[darcs-users] darcs patch: fpstring.c: switch a memchr for memrchr

Gwern Branwen gwern0 at gmail.com
Fri Apr 25 23:40:49 UTC 2008


On 2008.04.25 16:32:53 -0700, Jason Dagit <dagit at codersbase.com> scribbled 5.7K characters:
>    On Fri, Apr 25, 2008 at 3:53 PM, Gwern Branwen <gwern0 at gmail.com> wrote:
>
>      On 2008.04.25 13:42:04 -0700, Jason Dagit <dagit at codersbase.com> scribbled 1.9K characters:
>      >    On Fri, Apr 25, 2008 at 1:26 PM, Don Stewart <dons at galois.com> wrote:
>      >
>      >      gwern0:
>      >      > Fri Apr 25 16:01:53 EDT 2008  gwern0 at gmail.com
>      >      >   * fpstring.c: switch a memchr for memrchr
>      >      >   See <http://bugs.darcs.net/issue814>; memrchr speeds up is_funky quite a bit and
>      thus
>      >      helps whatsnew -s. It doesn't seem to break (any more) tests.
>      >
>      >      memrchr isn't available on BSDs.
>      >
>      >    Furthermore, I didn't see enough tests in Gwern's bug update to convince me that the
>      >    difference in run-times is attributable to the use of memrchr.
>      >
>      >    Jason
>
>      I'm not sure how I could've tested it any more: I used a fresh darcs repo, made the simple
>      change, compiled and installed, and tested it on a 9.5GB file 3 or 4 times, each time
>      getting a roughly 40 or 50 second result from 'time'; I then in the same repo tested three
>      or four times using my system's darcs binary (or installing from one of my other darcs darcs
>      repos) and get roughly 1m30s timings.
>
>      A single change, which leads to an expected difference multiple times; what did I miss?
>
>    I was commenting on this:
>    http://bugs.darcs.net/msg4338
>
>    Which doesn't indicate (or if it does I'm overlooking it) that you ran each test multiple
>    times, either to get the standard deviation or an intuitive sense of it.

Well, I went for the median, specifically.

>    Looking that over, I see this as the time it takes to run without changes:
>    (; darcs whats -s; ) 2.03s user 4.43s system 7% cpu 1:27.42 total
>
>    I'll just assume then that this timing is representative.
>
>    And then I see in your previous email that the times range from ~53 seconds to ~68 seconds.
>    That's a pretty good improvement in time although it's for a 9.5 GB file so on smaller files
>    it may be negligible.  But, yeah I'm all for this change.  I think I would prefer a change
>    where we don't read more than 1 page of memory (remembering that we mmap files) looking for
>    the magic bytes.  If we look at, say 4k, of a file and don't see funky bytes, probably darcs
>    should give up so that we don't cause the OS to swap everything out.
>
>    Jason

That's a workable solution, but I think I already suggested that we simply limit how much of the string we look at. This is fundamentally a trade-off that I think only Dr. Roundy can answer: is it alright to risk increasing the false negative rate for is_binary, for even a potentially significant space-time improvement?

Until we hear back from him, all we can do is tweak things and preserve the current behaviour of inspecting each byte in the file, and there our options are limited - basically changing a memchr to memrchr (or a rewrite of memrchr if we want to be portable), changing the two loops to a single loop (see my other email to dons), or possibly doing something funky like inlining some asm.

--
gwern
E.T. CDA SAI LEETAC Hague 50BMG Meade SIGDASYS guest 26
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://lists.osuosl.org/pipermail/darcs-users/attachments/20080425/9e2fb44f/attachment.pgp 


More information about the darcs-users mailing list