[darcs-users] Which commands can change output of darcs query contents for a particular hash?

Max Battcher me at worldmaker.net
Sat Jul 11 05:58:44 UTC 2009


Gwern Branwen wrote:
> On Fri, Jul 10, 2009 at 6:43 PM, Max Battcher<me at worldmaker.net> wrote:
>> I've got a partial implementation of this myself in my "darcsforge" code.
>> The pristine object names/hashes as cache keys seems a useful tool for
>> caching historical data. Dealing with new integration states (er, revisions)
>> is easy (during a cache walk through pristine), and I was fretting how to
>> deal with historical integration states but just recently had some ideas
>> involving context file caching.
>>
>> I certainly think that modeling revision tracking around pristine.hashed is
>> the current best way forward. I do feel for those that haven't thrown out
>> the bathwater and trying to shoehorn such a solution into a traditional
>> revision number/hash-based design. I think gitit would be a different animal
>> if it were designed with darcs in mind from the beginning.
> 
> I'm just a humble filestore dev, but when I look into pristine.hashed,
> aren't I seeing hashes like that filestore/gitit use?

Er... No. I'll try to explain as best I can: Basically, gitit/filestore 
right now is using the hashes of individual patches to represent files 
at different states in time. In darcs that doesn't quite match actual 
file states because patches can be reordered, particularly during 
merges, and may not 1-to-1 correspond to useful/interesting/'real' file 
states.

Pristine.hashed files are specific hashes based on files at specific 
points in time with specific contents. Pristine hashes are useful as 
cache keys because that is essentially what they are: the pristine is 
darcs' cache for what git/hg might call HEAD or TIP. The pristine hashes 
don't encode any revision information, so they don't work entirely on 
their own, but they serve as the current best cache naming scheme in 
darcs for storing historical versions of files.
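
To make the "cache key" idea concrete, here is a minimal sketch of a content-addressed store. It is not how darcs implements pristine.hashed; the ``ContentCache`` class is hypothetical and SHA-256 is only a stand-in hash. The point it shows is that the key depends solely on the bytes stored, so identical file contents always map to the same key and carry no revision information.

```python
import hashlib


class ContentCache:
    """Toy content-addressed store (hypothetical, not darcs' own code)."""

    def __init__(self):
        self._objects = {}

    def put(self, content: bytes) -> str:
        # The key is derived from the content alone: no patch, no revision.
        key = hashlib.sha256(content).hexdigest()
        self._objects[key] = content
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]


cache = ContentCache()
k1 = cache.put(b"hello\n")
k2 = cache.put(b"hello\n")         # same bytes, same key
k3 = cache.put(b"hello, world\n")  # different bytes, different key
```

Because a key can never come to refer to different contents, an entry stored under it never needs invalidating.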

Perhaps an illustration might help... I'll use letters for patches (and 
their hashes) and numbers for pristine hashes.  A simple repo might see 
a few early patches like so:

Patch  /    a.txt  b.txt
A      00   01     02
B      03   01     04
C      05   06     04


(Keeping in mind that in reality the numbers here are UUIDs/hashes, with 
timestamps, and during an operation many of the pristine objects listed 
above will be ephemeral and garbage collected quickly leaving only the 
most current (pristine).)

So here we see three patches, one which touches both files a.txt and 
b.txt (with the containing root directory thrown in for completeness), 
and two which touch only one file or the other. Across the three patches 
you end up with 7 pristine objects (4 if you ignore the root directory 
object). ``darcs show contents b.txt --match C.hash`` will, for the time 
being, return pristine object 04, but due to patch reordering that may 
not always be the case. As a simple, similarly contrived example (one that 
may not necessarily reflect reality; I'm not actually testing this, merely 
illustrating): if you pulled in patches D and E from a branch that doesn't 
have patch C, you could quite easily end up with:

Patch  /    a.txt  b.txt  c.txt
A      00   01     02
B      03   01     04
D      07   01     08     09
E      0a   01     0b     0c
C      0d   06     0b     0c

At this point, because darcs commuted D and E before C, you suddenly find 
that ``darcs show contents b.txt --match C.hash``, invoked exactly as before, 
returns pristine object 0b, not 04! This is obviously not what you want in a 
cache key, and it can (and will) happen during an ordinary push or pull 
(commutation is fundamental to the way darcs operates), not just during 
optimize --reorder. The file states at any given patch hash should not be 
considered immutable: they can and will change. Patch hashes are 
currently "good enough" for darcsweb and Trac+darcs, and gitit could 
probably keep using them, but don't trust that a file state for a given 
patch hash will always remain the same. (That is, I for one would not 
use "lifetime" caching of file states based upon just the file name and 
a patch hash.)
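
The effect can be reproduced with a toy model. This is a sketch under loud assumptions: the ``PATCHES`` table and ``file_hash_at`` helper are invented for illustration, each "patch" just appends text to files, and the "state at a patch" is taken to mean everything applied up to and including that patch in the repository's current order. Real darcs patches and commutation are far richer than this.

```python
import hashlib

# Hypothetical patches: each maps a file name to the text it appends.
PATCHES = {
    "A": {"a.txt": "a1\n", "b.txt": "b1\n"},
    "B": {"b.txt": "b2\n"},
    "C": {"a.txt": "a2\n"},
    "D": {"b.txt": "b3\n", "c.txt": "c1\n"},
    "E": {"b.txt": "b4\n", "c.txt": "c2\n"},
}


def file_hash_at(order, patch, filename):
    """Hash of `filename` after applying `order` up to and including `patch`."""
    tree = {}
    for name in order:
        for path, text in PATCHES[name].items():
            tree[path] = tree.get(path, "") + text
        if name == patch:
            break
    return hashlib.sha256(tree.get(filename, "").encode()).hexdigest()[:8]


before = file_hash_at(["A", "B", "C"], "C", "b.txt")
after = file_hash_at(["A", "B", "D", "E", "C"], "C", "b.txt")
print(before != after)  # True: same (file name, patch) pair, different contents
```

In this model a.txt at C is identical in both orders, since only C touches it after A; it is b.txt, which C never touches, whose "state at C" silently changes.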

If you want a glimpse at what I've been working on, during small spurts 
of spare time: I've been trying to find the best way(s) to capture 
interesting repository states and snapshot pristine as my backing cache. 
Unfortunately darcs doesn't (yet) make this easy and won't store this 
information for you; there certainly is no current way to query for 
specific 'archived' pristine objects, for instance (other than by 
context file). (...and worse there is no way at all to query for 
useful/meaningful historical states other than tags.) So rather than try 
to represent both scenarios above by referring to patch C's hash, I'm 
hoping to have some nice interface to provide something like:

Repo State                                a.txt  b.txt  c.txt  Context
...
2009-06-10 Pulled in first three patches  06     04            [A, B, C]
2009-06-11 Pulled in D and E from branch  06     0b     0c     [A, B, D, E, C]
...
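
As a rough sketch of what such a log could look like as data, the two rows above can be written as records and queried by date. Everything here (``RepoState``, ``LOG``, ``pristine_at``) is a hypothetical name chosen for illustration; nothing like it exists in darcs today.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RepoState:
    date: str              # ISO date, so plain string comparison orders it
    note: str
    files: Dict[str, str]  # file name -> pristine hash
    context: List[str]     # patch order at the time of the snapshot


LOG = [
    RepoState("2009-06-10", "Pulled in first three patches",
              {"a.txt": "06", "b.txt": "04"}, ["A", "B", "C"]),
    RepoState("2009-06-11", "Pulled in D and E from branch",
              {"a.txt": "06", "b.txt": "0b", "c.txt": "0c"},
              ["A", "B", "D", "E", "C"]),
]


def pristine_at(log: List[RepoState], date: str, filename: str) -> Optional[str]:
    """Pristine hash of `filename` in the latest state recorded on or before `date`."""
    candidates = [s for s in log if s.date <= date]
    if not candidates:
        return None
    return max(candidates, key=lambda s: s.date).files.get(filename)
```

With this, ``pristine_at(LOG, "2009-06-10", "b.txt")`` gives ``"04"`` and the same query for 2009-06-11 gives ``"0b"``, so the cache lookup is keyed on a recorded repository state rather than on patch C's hash.
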

Wouldn't you know it, but my "integration log" thing here looks somewhat 
like a merge log... I know others have pushed for darcs to store an 
explicit merge log before, and maybe it is time to revisit those 
discussions. Certainly we can build such a beast as a third-party 
component and iterate on it without it necessarily being a part of 
darcs' codebase.

--
--Max Battcher--
http://worldmaker.net

