[darcs-devel] [issue2578] less weak and more useful hash for repo state

Wed Mar 21 03:16:21 UTC 2018

Ben Franksen <ben.franksen at online.de> added the comment:

Thanks Guillaume, the links are helpful and interesting reading. Yes
what I proposed seems to be the naive inventory version, basically. And
indeed, it occurred to me independently that it is not
"commutation-friendly" as the wiki page formulates it. But perhaps this
is not a big problem in practice. It limits the number of repos from
which Bob can get a positive reply, though.

Re compression: I made a mistake in the text (but not in the script). We
cannot first gzip then take the hash, this does not give reproducable
hashes. Anyway, compression doesn't buy us much.

My idea was that we store the state every time we mutate the repo, so we
have a handle for every past state. If we want to be able to restore a
state given its ID, we need to store the list of hashes in the inventory
(in the same order).

As David points out in the discussion of the ticket, this requires
O(N^2) space in the "length" (number of patches) of the repo, but this
is a bit too pessimistic. More precisely, for an average length N
(number of patches) of the head inventory and M repo-mutating operations
we need N*M*hashsize bytes to store the states, where hashsize=75. (This
assumes we store only the hashes, not the meta data.)

For the current screened we have 467 patches since the last tag. For a
public repo we can approximate M with the number of patches in the repo
(no amends, no reordering, only patch addition) which is currently
~12000 for us. If we had stored every state and assume an average length
of the head inventory of 500 we'd need 500MB disk space. That's a lot.

To save space, compression is one tool, but it gives us only a constant
factor. A more effective idea is to store the states so that we get
maximal sharing. I think this would reduce the space requirements to
2*M*hashsize for a linear repo (one hash for each patch and one hash for
its parent). We could also limit the length of the head inventory to
some constant. So we store an inventory not only when tagging but more
often. This would require that we mark clean tags specially.

The "minimal context hash" sounds interesting but I am missing an
explanation how exactly it would work. AFAIU contexts are always
understood to be relative to a specific change and minimising it means
to commute as many patches as possible past this specific patch (but
only up to the next clean tag).

__________________________________
Darcs bug tracker <bugs at darcs.net>
<http://bugs.darcs.net/issue2578>
__________________________________