[darcs-users] GSoC: network optimisation vs cache vs library?

Eric Kow kowey at darcs.net
Thu Apr 15 13:07:04 UTC 2010

On Wed, Apr 14, 2010 at 17:42:53 -0400, Isaac Dupree wrote:
> Eric makes a fair point about focusing (on performance). But also the  
> existence of side-projects is what will keep us going, should  
> performance someday lead towards dead-end or completion or boredom.  I'm  
> not sure what role GSoC has in this.

Me neither.

I tend to think that the "side" projects (hunk editor, skip-conflicts,
UTF-8) are actually taking good care of themselves regardless of the
master narrative.

I think I should probably also provide my criteria for when I will
consider the "performance-obsessed" phase of Darcs hacking to come
to an end:

 - "daily" operations tend to complete in less than a second
   (well under way with the hashed-storage arc coming to a close)
 - darcs get http://... should no longer induce finger drumming
 - annotate / show contents --match hash becomes usable
 - all benchmarks unambiguously faster than when we started

This is not a complete list.  I just wanted to convey the idea that this
performance-hacking stuff is finite.  We're not trying to squeeze
every last microsecond out of Darcs here, just nudge it over into some
definition of "good enough".  Maybe one way to look at "good enough" is
performance being actually fairly snappy on GHC-sized repos.

Anyway, the point being that there is a chance we'll start to see a
shift in focus by mid-2011 (the usual dangers of excessive
prognostication apply here).

> As a darcs user, personally, I don't care about first impressions. It'd  
> be nice but not that important for darcs get/pull to be faster (This  
> would require new darcs to be deployed on the server, correct? -- in  
> addition to on the client).  On the other hand, I can't tell from Eric's  
> description *any* benefit that I would get from "cache cleanup"; as far  
> as I can tell my cache works just fine. Eric, can you elaborate, or show  
> us a link to the proposal, or something?

Yes, one aspect of the cache project is that it's a bit user-invisible.
The idea is (a) to improve code and internal design quality, which
should (b) reduce flakiness in edge cases and thus (c) improve
reliability for users.

See http://wiki.darcs.net/GoogleSummerOfCode/2010-Cache for more.

My comments on the project: there are a handful of edge cases which I
don't think our cache code handles well, and which lead to seemingly
unpredictable behaviour in Darcs:

- issue1599 : obsolete sources cause Darcs to timeout on each fetch
- issue1176 : cache interferes with remote-repo flag
- issue1210 : global cache recorded in sources
- issue1159 : cache not adjusted wrt local paths on remote server
- issue1503 : darcs should prefer local sources to remote ones
- issue1536 : repo-level cache should be bucketed [hard?]
- no ticket yet : interaction between cache and different filesystems?
                  particularly NFS, for example, hard-linking doesn't
                  work across fs boundaries, so support multiple global
                  caches and pick the one which lets you hard-link?
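On that last point, here is a minimal sketch of the fallback a
filesystem-aware cache could use (illustrative Python, not darcs's
actual Haskell; the function name is made up): attempt a hard link,
and fall back to a plain copy when the link fails across an fs
boundary.

```python
import os
import shutil

def link_or_copy(src, dst):
    """Populate dst from a cache entry at src.

    Hard links are cheap, but they only work within a single
    filesystem: os.link fails with EXDEV across an NFS or other
    mount boundary (and with EEXIST if dst already exists).  In
    either case we fall back to a plain copy.
    """
    try:
        os.link(src, dst)
    except OSError:
        shutil.copyfile(src, dst)
```

A multiple-global-caches scheme like the one floated above would then
try each cache in turn and keep using the first one for which the
hard link actually succeeds.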

Adolfo's project would nominally be about the first issue (which leads
to the seemingly random behaviour of darcs pull taking forever), but I
hope that working on that one issue will position him to tackle the
other issues or to re-think the cache code in general.
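For the curious, the fix for that issue1599 failure mode can be
sketched roughly as follows (again illustrative Python, not darcs
code; the names and timings are made up): give each cache source a
short deadline, so that one obsolete entry cannot stall every fetch.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def fetch_first(sources, deadline=0.1):
    """Try each (name, fetch) source in order; skip any source that
    does not answer within `deadline` seconds instead of hanging on
    it, and return the first successful result."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        for name, fetch in sources:
            future = pool.submit(fetch)
            try:
                result = future.result(timeout=deadline)
            except TimeoutError:
                continue  # dead or slow source: demote and move on
            if result is not None:
                return result
    return None

# An obsolete source that hangs, followed by a healthy local one:
sources = [
    ("stale-http-source", lambda: time.sleep(0.5)),  # never answers in time
    ("local-cache", lambda: "patch data"),
]
```

Without the per-source deadline, the stale entry at the head of the
list is exactly what makes darcs pull appear to take forever.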

Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9