[darcs-users] DRAFT Proposal: Navigating the space of versions using Tree hashes

Thu Sep 9 03:26:50 UTC 2010

On Mon, Sep 6, 2010 at 7:08 AM, Eric Kow <kowey at darcs.net> wrote:

> Thanks for posting the links!  Now that you have issue992 as background,
> would you mind researching prior discussions to help us avoid covering
> old ground?  Perhaps what would be most useful is a summary of proposals so far
> (note the one by David) and how your proposal differs from it.

That should be pretty clear if you read my proposal or glance at the
older proposals.  My proposal does not use the context to generate the
hash; it uses the pristine state instead.  Everything else follows
from that.
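To make the distinction concrete, here is a minimal Python sketch of hashing a pristine state rather than a context. It assumes the pristine tree can be modeled as a flat mapping from paths to file contents; the function name `pristine_hash` and the serialization format are hypothetical and are not darcs's actual hashed-storage layout.

```python
import hashlib


def pristine_hash(tree: dict[str, bytes]) -> str:
    """Hash a pristine tree modeled as a flat {path: contents} mapping.

    Hypothetical format: for each file, sorted by path, feed
    "<path>\\0<hex sha256 of contents>\\n" into one SHA-256.
    This is NOT darcs's real hashed-storage layout.
    """
    h = hashlib.sha256()
    for path in sorted(tree):
        content_digest = hashlib.sha256(tree[path]).hexdigest()
        h.update(path.encode("utf-8") + b"\0" + content_digest.encode("ascii") + b"\n")
    return h.hexdigest()


# The hash depends only on the files, not on the order they were listed
# or on which patches produced them.
a = pristine_hash({"README": b"hello\n", "src/Main.hs": b"main = pure ()\n"})
b = pristine_hash({"src/Main.hs": b"main = pure ()\n", "README": b"hello\n"})
assert a == b
```

Because only file paths and contents enter the hash, the result says nothing about which patches produced the tree.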

If you haven't read my proposal (and you indicated you haven't), I can
see how you might miss that.

> I'm not sure what the second hash corresponds to, if it's pristine or if
> it's just the hash of the inventory contents, but in either case they
> would both uniquely identify equivalent repository states, unless I
> misunderstand.  If anybody finds out what the hash is in tag
> subinventory files, maybe updating http://wiki.darcs.net/Hashes could be
> a good idea.

I'm pretty sure that, before hashed storage, darcs never stored a
hash of any repository state.  David would have treated the hash of a
tag as equivalent in meaning.

> The trickiness is that Darcs repositories can often have patches in
> different order from each other -- that's the whole point -- but I'm not
> sure how important that is in practice.  I think a lot of the previous
> discussion goes back into how to deal with this fact, and that minimal
> contexts tend to come up.

My proposal intentionally avoids both of those things (the order of
the inventory and the mystical "minimal context", which has never been
shown to exist).  I was aware of those past discussions, and those two
topics seem to be exactly where the discussion always stagnated.
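For illustration, a minimal Python sketch of why inventory order is a problem: hashing an inventory exactly as listed gives different results for two repositories that hold the same commuting patches in a different order. The inventory is modeled as a list of strings, one per patch, and `naive_inventory_hash` is a hypothetical name; real darcs inventories carry more than this.

```python
import hashlib


def naive_inventory_hash(inventory: list[str]) -> str:
    """Hash an inventory exactly as listed, one entry per line.

    Hypothetical model: each entry is a string identifying one patch
    (e.g. a serialized patchinfo). Entries are assumed not to contain
    newlines, so the line-based encoding is unambiguous.
    """
    h = hashlib.sha256()
    for entry in inventory:
        h.update(entry.encode("utf-8") + b"\n")
    return h.hexdigest()


# Same two patches (assumed to commute), pulled in a different order.
repo1 = ["Add README", "Fix typo in Main.hs"]
repo2 = ["Fix typo in Main.hs", "Add README"]
assert naive_inventory_hash(repo1) != naive_inventory_hash(repo2)
```

Any identifier built this way inherits the ordering problem, which is exactly the ground the earlier discussions kept returning to.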

> Sorry, I haven't gotten around to reading the meat of your proposal yet.
> Hopefully later :-)

I'm assuming you didn't read the IRC discussion either.  We came to
the conclusion that it's actually really easy to get identical
pristine states from different *sets* of patches, not just from
different permutations of the inventory.  This means that an
identical pristine state says little about whether two repositories
contain the same patches.  When I wrote the proposal I was well aware
this was possible, but I thought such cases were rare enough to
ignore.  Petr pointed out that conflict resolution is one situation
where it is actually fairly likely to happen.
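A toy Python illustration of the conflict-resolution case, under a deliberately simplified patch model (an assumption, not darcs's patch theory): each patch overwrites one file. Repository B carries two extra patches, the second of which restores the original contents, so both repositories end up with the same pristine tree and therefore the same pristine hash. All names here are hypothetical.

```python
import hashlib

# Toy model (an assumption, not darcs's patch theory): a patch is a
# (name, path, new_contents) triple, and applying it overwrites that file.
Patch = tuple[str, str, bytes]


def apply_patches(patches: list[Patch]) -> dict[str, bytes]:
    tree: dict[str, bytes] = {}
    for _name, path, contents in patches:
        tree[path] = contents
    return tree


def pristine_hash(tree: dict[str, bytes]) -> str:
    # Sorted paths plus fixed-length content digests: depends only on the tree.
    h = hashlib.sha256()
    for path in sorted(tree):
        h.update(path.encode("utf-8") + b"\0" + hashlib.sha256(tree[path]).digest())
    return h.hexdigest()


repo_a: list[Patch] = [("p1: add greeting", "README", b"hello\n")]
repo_b: list[Patch] = repo_a + [
    ("p2: change greeting", "README", b"hi\n"),
    ("p3: resolve conflict, restore greeting", "README", b"hello\n"),
]

# Different sets of patches...
assert {name for name, _, _ in repo_a} != {name for name, _, _ in repo_b}
# ...but the same pristine tree, and therefore the same pristine hash.
assert pristine_hash(apply_patches(repo_a)) == pristine_hash(apply_patches(repo_b))
```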

It was generally agreed that hashing the inventory leads to the right
solution.  It's hard to say what order the inventory will be in, so as
a relaxation of hashing the inventory, it was suggested that we could
sort the patchinfos and then hash the inventory.  Hashing the entire
inventory might be expensive, so the concrete implementation schemes
proposed use optimizations such as hashing only the part of the
inventory after the most recent tag.  Personally, I'm not convinced
that those optimizations are guaranteed to work.  Perhaps incremental
hashes of the inventory could be used instead.  Also, initially I
thought that sorting the patchinfos before hashing was problematic,
but I'm less concerned about that now.  I was thinking it was an
aggressive over-approximation of the property that your repo is
defined by the set of patches.  Now I think it may be fine.
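A minimal Python sketch of the "sort the patchinfos, then hash" relaxation, assuming a hypothetical `PatchInfo` record with name, author, date and log fields (loosely modeled on darcs patch metadata, not its real on-disk format). Each patchinfo is hashed on its own, the per-patch digests are sorted, and the sorted list is hashed again, so the result depends on the set of patches but not on their order.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PatchInfo:
    # Hypothetical fields, loosely modeled on darcs patch metadata.
    name: str
    author: str
    date: str
    log: str = ""


def patchinfo_hash(info: PatchInfo) -> str:
    # Length-prefix each field so that field boundaries are unambiguous.
    h = hashlib.sha256()
    for field in (info.name, info.author, info.date, info.log):
        data = field.encode("utf-8")
        h.update(len(data).to_bytes(8, "big") + data)
    return h.hexdigest()


def sorted_inventory_hash(inventory: list[PatchInfo]) -> str:
    """Order-independent id: hash the sorted list of per-patch digests."""
    h = hashlib.sha256()
    for digest in sorted(patchinfo_hash(p) for p in inventory):
        h.update(bytes.fromhex(digest))  # fixed 32-byte chunks
    return h.hexdigest()


p = PatchInfo("Add README", "alice@example.org", "20100906070800")
q = PatchInfo("Fix typo", "bob@example.org", "20100907120000")
assert sorted_inventory_hash([p, q]) == sorted_inventory_hash([q, p])
assert sorted_inventory_hash([p]) != sorted_inventory_hash([p, q])
```

The cost is one pass over the whole inventory per identifier, which is what the "hash only after the last tag" optimizations try to avoid.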

So it seems that this proposal doesn't work, but maybe the old
proposal of using hashes of patchinfos could be resurrected in a way
that doesn't lead to dead-end discussions about "minimal context".
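One possible shape for an incremental, order-independent identifier built from patchinfo hashes is sketched below in Python; it is only one scheme, chosen for illustration. Each patch contributes its SHA-256 digest, read as an integer, to a running sum modulo 2^256, so recording or pulling a patch adds its digest and unpulling or obliterating subtracts it, without rehashing the whole inventory. The class and method names are hypothetical, and additive constructions like this are generally considered weaker against deliberately crafted collisions than hashing the full sorted list.

```python
import hashlib

MODULUS = 2 ** 256  # hypothetical choice: the output size of SHA-256


def patch_digest(patchinfo: str) -> int:
    # Hypothetical: a patch is identified by a serialized patchinfo string.
    return int.from_bytes(hashlib.sha256(patchinfo.encode("utf-8")).digest(), "big")


class IncrementalInventoryId:
    """Order-independent id that can be updated one patch at a time."""

    def __init__(self) -> None:
        self.value = 0

    def add(self, patchinfo: str) -> None:  # e.g. on record or pull
        self.value = (self.value + patch_digest(patchinfo)) % MODULUS

    def remove(self, patchinfo: str) -> None:  # e.g. on unpull or obliterate
        self.value = (self.value - patch_digest(patchinfo)) % MODULUS

    def hexdigest(self) -> str:
        return format(self.value, "064x")


a = IncrementalInventoryId()
a.add("Add README")
a.add("Fix typo")

b = IncrementalInventoryId()
b.add("Fix typo")
b.add("Add README")
assert a.hexdigest() == b.hexdigest()  # order does not matter
```

Removing a patch undoes its contribution exactly, so a repository that pulls and then unpulls a patch returns to its previous identifier.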

Jason