[darcs-users] darcs patch: fpstring.c: switch a memchr for memrchr

David Roundy droundy at darcs.net
Sat Apr 26 11:30:32 UTC 2008


On Fri, Apr 25, 2008 at 07:40:49PM -0400, Gwern Branwen wrote:
> That's a workable solution, but I think I already suggested that we
> simply limit how much of the string we look at. This is fundamentally a
> trade-off that I think only Dr. Roundy can answer: is it alright to risk
> increasing the false negative rate for is_binary, for even a potentially
> significant space-time improvement?

I think this is a reasonable option.  Why not look (as I think Jason
suggested) in min(4k,len) of the contents? I think 4k is quite a reasonable
cutoff, as it's probably at or under a page on all architectures, which
means that most likely it takes no extra IO to read 4k instead of less.  Of
course, this only helps folks with files larger than 4k, but it does solve
problems like your 8g file, which is dominated by disk access.
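
The min(4k,len) cutoff could be sketched roughly as below.  This is only an
illustration, not the code in fpstring.c: the function name and the
SCAN_LIMIT macro are hypothetical, and it assumes the "funky" bytes being
searched for are NUL and Ctrl-Z (26).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff: the 4k suggested in this thread. */
#define SCAN_LIMIT 4096

/*
 * Hypothetical helper: returns 1 if a NUL or Ctrl-Z byte occurs in the
 * first min(SCAN_LIMIT, len) bytes of s, and 0 otherwise.  Bytes past the
 * cutoff are never inspected, so a binary file whose first 4k happens to
 * look like text is reported as text (a false negative).
 */
int has_funky_char_prefix(const char *s, size_t len)
{
    size_t n = len < SCAN_LIMIT ? len : SCAN_LIMIT;

    /* Assumption: NUL (0) and Ctrl-Z (26) are the bytes that mark a
     * file as binary. */
    return memchr(s, 0, n) != NULL || memchr(s, 26, n) != NULL;
}
```

With this shape the cost is bounded by the cutoff regardless of file size,
which is the point of the trade-off: an 8g file costs the same as a 4k one.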

> Until we hear back from him, all we can do is tweak things and preserve
> the current behaviour of inspecting each byte in the file, and there our
> options are limited - basically changing a memchr to memrchr (or a
> rewrite of memrchr if we want to be portable), changing the two loops to
> a single loop (see my other email to dons), or possibly doing something
> funky like inlining some asm.

A single loop does sound like it might be the best option (although I'd be
interested in seeing timings: I'm rather hesitant to drop use of the
portable highly-optimized functions if we can avoid doing so).  It is
certainly true that a single pass through the contents is always going to
be easier on the cache.  But if we limit the size scanned to 4k, then two
optimized traversals (even both in the forward direction) may be faster
than a single less-optimized traversal, since 4k will easily fit in any L2
cache, so we're only talking about L1 cache now.  And at that point memrchr
is quite likely slower than memchr on some architectures...
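
For anyone wanting to produce the timings asked for above, the two
candidates might look something like the following.  This is a sketch
under assumptions rather than the actual darcs code: both function names
are hypothetical, and the bytes tested for (NUL and Ctrl-Z) are assumed.

```c
#include <stddef.h>
#include <string.h>

/*
 * Two forward traversals using the portable library routine.  Each
 * memchr call may be heavily optimized by the C library, but the buffer
 * is walked twice when the first byte value is absent.
 */
int funky_two_pass(const char *s, size_t len)
{
    return memchr(s, 0, len) != NULL || memchr(s, 26, len) != NULL;
}

/*
 * One hand-written traversal that tests both byte values per position.
 * The buffer is walked once, but the loop relies on the compiler rather
 * than a tuned library routine for its speed.
 */
int funky_one_pass(const char *s, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (s[i] == 0 || s[i] == 26)
            return 1;
    }
    return 0;
}
```

Both functions return the same answer for any input, so they can be swapped
freely in a benchmark; which one wins on a 4k buffer is exactly the open
question and would have to be measured per architecture.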
-- 
David Roundy
Department of Physics
Oregon State University