[darcs-users] darcs patch: fpstring.c: switch a memchr for memrchr

Tue Apr 29 02:50:24 UTC 2008

On 2008.04.26 04:30:32 -0700, David Roundy <droundy at darcs.net> scribbled 2.8K characters:
> On Fri, Apr 25, 2008 at 07:40:49PM -0400, Gwern Branwen wrote:
> > That's a workable solution, but I think I already suggested that we
> > simply limit how much of the string we look at. This is fundamentally a
> > trade-off that I think only Dr. Roundy can answer: is it alright to risk
> > increasing the false negative rate for is_binary, for even a potentially
> > significant space-time improvement?
>
> I think this is a reasonable option.  Why not look (as I think Jason
> suggested) in min(4k,len) of the contents? I think 4k is quite a resonable
> cutoff, as it's probably at or under a page on all architectures, which
> means that most likely it takes no extra IO to read 4k instead of less.  Of
> course, this only helps for folks with files larger than 4k, but that
> solves problems like your 8g file that is disk-access-dominated.

OK. I've actually done things a little differently than your suggestion, since I discovered that apparently 'min' isn't any sort of built-in or standard C function. I noticed that the call stack looked like A -> B -> C -> D and E. Since D :: PackedString -> Bool, I just plopped in a 'takePS 4096'.

I think technically this won't fit 'at or under a page', because this would pass has_funky_char a list of 4096 Word8s, and surely there's overhead there? I dunno.

(Also, I've sent the patches for this today, at least, I thought I did.)

> > Until we hear back from him, all we can do is tweak things and preserve
> > the current behaviour of inspecting each byte in the file, and there our
> > options are limited - basically changing a memchr to memrchr (or a
> > rewrite of memrchr if we want to be portable), changing the two loops to
> > a single loop (see my other email to dons), or possibly doing something
> > funky like inlining some asm.
>
> A single loop does sound like it might be the best option (although I'd be
> interested in seeing timings: I'm rather hesitant to drop use of the
> portable highly-optimized functions if we can avoid doing so).  It is
> certainly true that a single pass through the contents is always going to
> be easier on the cache.  But if we limit the size scanned to 4k, then two
> optimized traversals (even both in the forward direction) may be faster
> than a single less-optimized traversal, since 4k will easily fit in any L2
> cache, so we're only talking about L1 cache now.  At which point quite
> likely memrchr is slower than memchr on some architectures...
> --
> David Roundy

I did a bunch of timings using Jaak's single loop, once I had made the foregoing changes. Intuitively, it looked to me like it had more consistent timings, but was slower. But honestly, the numbers all fell in more or less the same range, so I just restored the memchr || memchr version. No real point to changing.

--
gwern
Pesec E.T. InfoSec Stanley Firewalls EO propaganda Type SERT Taiwan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://lists.osuosl.org/pipermail/darcs-users/attachments/20080428/e2d2a06f/attachment.pgp