[darcs-users] repository state identifier(s) in "darcs show repo"

Tue Dec 15 20:18:01 UTC 2015

I think showing the XORed hashes is a good idea. We've previously held
back because there is no easy way to do the reverse mapping (from the
hash back to the set of patches), but something is better than nothing.

There's some discussion of using XOR in this Stack Overflow post:

http://stackoverflow.com/questions/5889238/why-is-xor-the-default-way-to-combine-hashes

The two main criticisms of XOR there are that it doesn't preserve
ordering (which for us is a feature), and that XORing two identical
things gives a 0. I don't think we'll have two identical patch hashes in
the same repo, so that also seems fine.
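
As a rough illustration of both points, here is a minimal Python sketch.
It assumes each patch is identified by a fixed-length hash of its
metadata; the SHA-256 digests of made-up strings only stand in for real
Darcs patch hashes, and metadata_hash and xor_fingerprint are
hypothetical helpers, not anything Darcs provides.

```python
import hashlib
from functools import reduce


def metadata_hash(metadata: str) -> bytes:
    """Stand-in for a patch metadata hash (assumption: SHA-256 of a string)."""
    return hashlib.sha256(metadata.encode("utf-8")).digest()


def xor_fingerprint(hashes) -> str:
    """XOR equal-length digests together and return the result as hex."""
    hashes = list(hashes)
    if not hashes:
        return ""
    size = len(hashes[0])
    acc = reduce(
        lambda a, b: bytes(x ^ y for x, y in zip(a, b)),
        hashes,
        bytes(size),  # all-zero start value, the identity for XOR
    )
    return acc.hex()


patches = [metadata_hash(m) for m in ("patch A", "patch B", "patch C")]

# Reordering the patches does not change the fingerprint.
order_independent = xor_fingerprint(patches) == xor_fingerprint(reversed(patches))

# A hash that appears twice cancels itself out, as if the patch were absent.
cancels = xor_fingerprint(patches + [patches[0]]) == xor_fingerprint(patches[1:])
```

Both order_independent and cancels come out True here: the first is the
property we want, the second is the failure mode that only matters if
the same patch hash could occur twice in one repo.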

One thing to make very clear: these hashes will *not* be secure, since
they cover only patch metadata, and anyone can fake a patch with the
right metadata but the wrong content.
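
To make that caveat concrete, a minimal sketch, assuming the per-patch
hash covers only metadata. The Patch fields, the sample values, and the
use of SHA-256 are illustrative choices, not Darcs's actual on-disk
format.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    author: str
    date: str
    name: str
    content: str  # the actual changes; NOT covered by the hash below


def patch_metadata_hash(patch: Patch) -> str:
    """Hash only the metadata fields (illustrative; not Darcs's real format)."""
    blob = "\n".join((patch.author, patch.date, patch.name))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


genuine = Patch("alice@example.org", "20151215201801", "fix typo",
                "hunk ./README 1\n-teh\n+the\n")
forged = Patch("alice@example.org", "20151215201801", "fix typo",
               "hunk ./Main.hs 1\n+something else entirely\n")

# Same metadata, different content: the two hashes are identical, so any
# XOR built from them cannot tell the patches apart.
collides = patch_metadata_hash(genuine) == patch_metadata_hash(forged)
```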

I'm not too keen on showing the pristine hash. I guess a good point of
comparison here would be git: does it ever expose just a tree hash (as
opposed to the full commit id)?

On 14/12/2015 16:48, Guillaume Hoffmann wrote:
> Hi everyone,
> 
> As said by David Leuschner in a previous mail, one of the shortcomings
> of Darcs is:
> 
> "it's not as easy to refer to a specific state of the repository using a hash".
> 
> As a developer I know that Darcs uses (internally) the pristine hash,
> which is a hash of the recorded working copy. However two repositories
> can have the same pristine hash and different histories (e.g., one being
> a superset of the other with patches and their corresponding
> rollbacks, or one having tags that the other lacks). But it can be
> good enough for some purposes (lazy cloning).
> 
> Should "darcs show repo" show that hash?
> 
> Now most importantly, we need a hash that would identify a set of
> patches independently of reordering, since it's what Darcs considers
> the history of a repository. Doing it right, e.g. building and hashing
> the dependency graph of all patches, is costly. Moreover we do not
> have any infrastructure to retrieve a set of patches from such a hash.
> (That's the scenario in http://darcs.net/Ideas/ShortSecureId ).
> 
> So can we just have one that would enable us to quickly check that two
> repos have the same patches, ignoring reordering?
> 
> I propose a simple checksum: XOR all patch metadata hashes!
> Probability of collision should be low enough since patch metadata
> hashes are good hashes. Calculating the XOR is as fast as reading the
> inventories of the current repo (which can be lazy) plus the overhead
> of generating and XOR'ing the hashes.
> 
> Darcs itself would not need to store this XOR, it seems. But there
> could be many uses of it by third-party tools, on the other hand.
> Darcsden could show it for comparison purposes. A development team
> could maintain an XOR-to-patchset map to identify repository states
> encountered by its members. Let the tools emerge later!
> 
> So, should "darcs show repo" show that XOR?
> 
> Opinions?
> 
> Guillaume