[darcs-users] Which commands can change output of darcs query contents for a particular hash?
Max Battcher
me at worldmaker.net
Sat Jul 11 05:58:44 UTC 2009
Gwern Branwen wrote:
> On Fri, Jul 10, 2009 at 6:43 PM, Max Battcher <me at worldmaker.net> wrote:
>> I've got a partial implementation of this myself in my "darcsforge" code.
>> The pristine object names/hashes as cache keys seem a useful tool for
>> caching historical data. Dealing with new integration states (er, revisions)
>> is easy (during a cache walk through pristine), and I was fretting how to
>> deal with historical integration states but just recently had some ideas
>> involving context file caching.
>>
>> I certainly think that modeling revision tracking around pristine.hashed is
>> the current best way forward. I do feel for those that haven't thrown out
>> the bathwater and are trying to shoehorn such a solution into a traditional
>> revision number/hash-based design. I think gitit would be a different animal
>> if it were designed with darcs in mind from the beginning.
>
> I'm just a humble filestore dev, but when I look into pristine.hashed,
> aren't I seeing hashes like the ones filestore/gitit use?
Er... no. I'll try to explain as best I can. Basically, gitit/filestore
currently uses the hashes of individual patches to represent files at
different points in time. In darcs that doesn't quite match actual file
states, because patches can be reordered, particularly during merges,
and they may not correspond 1-to-1 with useful/interesting/'real' file
states.
The objects in pristine.hashed are named by hashes of specific file and
directory contents at specific points in time. Pristine hashes are
useful as cache keys because that is essentially what they are: the
pristine is darcs' cache of what git/hg might call HEAD or TIP. The
pristine hashes don't encode any revision information, so they don't
work entirely on their own, but they are currently the best cache naming
scheme darcs offers for storing historical versions of files.
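To make the cache-key idea concrete, here is a minimal Python sketch of
content addressing. It assumes SHA-256 over the raw bytes purely for
illustration (darcs' real pristine.hashed naming scheme is not
reproduced here), and pristine_key is a hypothetical helper. It shows
that the key depends only on the contents: two different revisions that
leave a file with the same bytes share one key, and the key alone says
nothing about which revision produced it.

```python
import hashlib


def pristine_key(contents: bytes) -> str:
    """Content-address a blob: the key depends only on the bytes.

    Illustrative only: SHA-256 is an assumption here, not a claim
    about darcs' actual pristine.hashed file naming.
    """
    return hashlib.sha256(contents).hexdigest()


# The same bytes reached through two different histories...
state_after_revision_1 = b"hello\n"
state_after_revision_7 = b"hello\n"

# ...map to the same cache key, which carries no revision information.
print(pristine_key(state_after_revision_1) == pristine_key(state_after_revision_7))  # True

# Any change to the contents yields a different key.
print(pristine_key(b"hello\n") == pristine_key(b"hello world\n"))  # False
```

This is why such a key is safe to cache forever, and also why it needs
some other record (such as a context) to tie it back to history.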
Perhaps an illustration will help. I'll use letters for patches (and
their hashes) and numbers for pristine hashes. A simple repo might start
with a few early patches like so:
Patch  /    a.txt  b.txt
A      00   01     02
B      03   01     04
C      05   06     04
(Keep in mind that in reality the numbers here are UUIDs/hashes, with
timestamps, and that during an operation many of the pristine objects
listed above are ephemeral and quickly garbage collected, leaving only
the most current ones, i.e. the pristine.)
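The object count for the first table can be checked with a small Python
model. Everything in it is a hypothetical stand-in: patches are plain
functions from one tree state to the next, file objects are hashed by
contents (SHA-256, an assumption), and the root directory object is a
hash of the sorted name-to-hash listing. This is an illustration, not
darcs' on-disk format.

```python
import hashlib
from typing import Callable, Dict, List, Set, Tuple

Tree = Dict[str, bytes]                      # file name -> contents
Patch = Tuple[str, Callable[[Tree], Tree]]   # (name, tree -> next tree)


def h(data: bytes) -> str:
    # Illustrative content hash; SHA-256 is an assumption, not darcs' scheme.
    return hashlib.sha256(data).hexdigest()


def snapshot(tree: Tree) -> Tuple[str, Dict[str, str]]:
    """Hash every file, then hash the root directory's sorted listing."""
    file_hashes = {name: h(body) for name, body in tree.items()}
    listing = "".join(f"{n}:{fh}\n" for n, fh in sorted(file_hashes.items()))
    return h(listing.encode()), file_hashes


def replay(patches: List[Patch]) -> Tuple[Set[str], Set[str]]:
    """Apply patches in order, collecting every pristine object produced."""
    tree: Tree = {}
    roots: Set[str] = set()
    files: Set[str] = set()
    for _name, apply_patch in patches:
        tree = apply_patch(tree)
        root, file_hashes = snapshot(tree)
        roots.add(root)
        files.update(file_hashes.values())
    return roots, files


# Hypothetical patch contents chosen to mirror the first table.
A: Patch = ("A", lambda t: {**t, "a.txt": b"a v1\n", "b.txt": b"b v1\n"})
B: Patch = ("B", lambda t: {**t, "b.txt": b"b v2\n"})
C: Patch = ("C", lambda t: {**t, "a.txt": b"a v2\n"})

roots, files = replay([A, B, C])
print(len(roots | files), len(files))  # 7 4
```

Three root objects plus four file objects gives the seven objects in the
table, or four once the root directory objects are set aside.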
So here we see three patches: one that touches both a.txt and b.txt
(with the containing root directory thrown in for completeness), and two
that each touch only one file or the other. Across the three patches you
end up with 7 pristine objects (4 if you ignore the root directory
objects). For the time being, ``darcs show contents b.txt --match
C.hash`` will return pristine object 04, but due to patch reordering
that may not always be the case. Here is a similarly contrived example,
which may not reflect reality (I'm not actually testing this, merely
using it as an illustration): if you pull in patches D and E from a
branch that doesn't have patch C, you could quite easily end up with:
Patch  /    a.txt  b.txt  c.txt
A      00   01     02
B      03   01     04
D      07   01     08     09
E      0a   01     0b     0c
C      0d   06     0b     0c
At this point, because darcs has commuted D and E ahead of C, exactly
the same command as before, ``darcs show contents b.txt --match
C.hash``, suddenly returns pristine object 0b, not 04! This is obviously
not what you want in a cache key, and it can (and will) happen during a
push or pull (commutation is fundamental to the way darcs operates), not
just during ``darcs optimize --reorder``. The file state at any given
patch hash should not be considered fixed: it can and will change. Patch
hashes are currently "good enough" for darcsweb and Trac+darcs, and
gitit could probably keep using them, but don't trust that the file
state for a given patch hash will always remain the same. (That is, I
for one would not use "lifetime" caching of file states keyed on just
the file name and a patch hash.)
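The reordering problem can be reproduced in a toy Python model. The
show_contents function below is hypothetical and only roughly imitates
matching by patch: it replays the current patch order and stops after
the named patch. Patch C is identical in both histories (it touches only
a.txt, so it commutes with D and E); only its position changes, and that
alone changes the b.txt state "at C".

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Tree = Dict[str, bytes]                      # file name -> contents
Patch = Tuple[str, Callable[[Tree], Tree]]   # (name, tree -> next tree)


def h(data: bytes) -> str:
    # Illustrative content hash; SHA-256 is an assumption, not darcs' scheme.
    return hashlib.sha256(data).hexdigest()


def show_contents(history: List[Patch], path: str, match: str) -> str:
    """Hash of `path` after applying `history` up to and including `match`.

    A rough model of matching by patch: the result depends on where the
    patch sits in the current ordering, not only on the patch itself.
    """
    tree: Tree = {}
    for name, apply_patch in history:
        tree = apply_patch(tree)
        if name == match:
            return h(tree[path])
    raise KeyError(f"no patch named {match!r} in history")


# Hypothetical patches mirroring the two tables above.
A: Patch = ("A", lambda t: {**t, "a.txt": b"a v1\n", "b.txt": b"b v1\n"})
B: Patch = ("B", lambda t: {**t, "b.txt": b"b v2\n"})
C: Patch = ("C", lambda t: {**t, "a.txt": b"a v2\n"})  # touches a.txt only
D: Patch = ("D", lambda t: {**t, "b.txt": b"b v3\n", "c.txt": b"c v1\n"})
E: Patch = ("E", lambda t: {**t, "b.txt": b"b v4\n", "c.txt": b"c v2\n"})

before = show_contents([A, B, C], "b.txt", match="C")
after = show_contents([A, B, D, E, C], "b.txt", match="C")

print(before == h(b"b v2\n"))  # True: the "04" object
print(after == h(b"b v4\n"))   # True: the "0b" object
print(before == after)         # False: same patch, different file state
```

A cache keyed on ("b.txt", C) would have stored the first answer and
then served it, stale, after the pull.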
If you want a glimpse of what I've been working on in small spurts of
spare time: I've been trying to find the best way(s) to capture
interesting repository states, using snapshots of the pristine as my
backing cache. Unfortunately darcs doesn't (yet) make this easy and
won't store this information for you; there is certainly no current way
to query for specific 'archived' pristine objects, for instance (other
than by context file). (Worse, there is no way at all to query for
useful/meaningful historical states other than tags.) So rather than
trying to represent both scenarios above by referring to patch C's hash,
I'm hoping to provide a nice interface offering something like:
Repo state                                 a.txt  b.txt  c.txt  Context
...
2009-06-10  Pulled in first three patches  06     04     -      [A, B, C]
2009-06-11  Pulled in D and E from branch  06     0b     0c     [A, B, D, E, C]
...
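One way such a log could be represented is sketched below in Python.
The LogEntry and IntegrationLog names, the record/file_hash methods, the
SHA-256 content hash, and the file contents are all hypothetical. The
point is only that each recorded state carries its own pristine hashes
plus the ordered context, so nothing depends on patch C's hash meaning
the same thing twice.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


def h(data: bytes) -> str:
    # Illustrative content hash; SHA-256 is an assumption, not darcs' scheme.
    return hashlib.sha256(data).hexdigest()


@dataclass
class LogEntry:
    """One recorded repository state (hypothetical structure)."""
    date: str
    description: str
    files: Dict[str, str]       # file name -> pristine (content) hash
    context: Tuple[str, ...]    # patch names in their applied order


@dataclass
class IntegrationLog:
    entries: List[LogEntry] = field(default_factory=list)

    def record(self, date: str, description: str,
               tree: Dict[str, bytes], context: List[str]) -> LogEntry:
        """Snapshot the pristine hashes at the moment of integration."""
        entry = LogEntry(date, description,
                         {name: h(body) for name, body in tree.items()},
                         tuple(context))
        self.entries.append(entry)
        return entry

    def file_hash(self, date: str, path: str) -> Optional[str]:
        """Look up a file's state by recorded integration, not by patch hash."""
        for entry in self.entries:
            if entry.date == date:
                return entry.files.get(path)
        return None


# Usage mirroring the two rows of the table above (contents are made up).
log = IntegrationLog()
log.record("2009-06-10", "Pulled in first three patches",
           {"a.txt": b"a v2\n", "b.txt": b"b v2\n"},
           ["A", "B", "C"])
log.record("2009-06-11", "Pulled in D and E from branch",
           {"a.txt": b"a v2\n", "b.txt": b"b v4\n", "c.txt": b"c v2\n"},
           ["A", "B", "D", "E", "C"])

# Both contexts end with patch C, yet each entry keeps its own b.txt state.
print(log.file_hash("2009-06-10", "b.txt") != log.file_hash("2009-06-11", "b.txt"))  # True
print(log.file_hash("2009-06-10", "c.txt"))  # None: c.txt did not exist yet
```

Keeping the ordered context next to the hashes preserves enough
information to relate an entry back to the patches, while the pristine
hashes remain the actual cache keys.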
Wouldn't you know it, my "integration log" here looks somewhat like a
merge log... I know others have pushed for darcs to store an explicit
merge log before, and maybe it is time to revisit those discussions.
Certainly we can build such a beast as a third-party component and
iterate on it without it necessarily being part of darcs' codebase.
--
--Max Battcher--
http://worldmaker.net