[darcs-devel] [patch1887] remove the size prefix when writing hash... (and 14 more)

Ben Franksen bugs at darcs.net
Mon Aug 26 06:18:14 UTC 2019


Ben Franksen <ben.franksen at online.de> added the comment:

> Regarding dropping the size prefix, there's some history to its
> existence that I can't really remember or fully understand from
> searching old emails. The size prefix was dropped for pristine in 
> 2010:
> 
> http://bugs.darcs.net/patch156

Hm, interesting read, though mostly of historical interest.

> I guess my main questions about this would be
> (a) what do we gain by dropping it

Code gets simpler and more uniform (also slightly more efficient). The
same goes for the on-disk representation.

One practical advantage (not yet implemented in the patch I sent) is
that we can now share all pristine files with the cache, not just the
top-level pristine tree. We cannot currently do that, because of the
difference in format, unless we pull the case distinction into
Darcs.Repository.Cache, which would further complicate the code in that
module. Using the cache for all pristine files would speed up cloning of
unpacked repos (over http) quite a bit, I think.

> and
> (b) how will the transition work in existing repositories (will
> they end up with a mix of formats?) and in the cache (will we
> lose sharing?)

Note that we already read both formats. The only change would be that
we write without the size prefix.
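A minimal Haskell sketch of "read both, write one", assuming the legacy name is a zero-padded 10-digit size, a dash, and a 64-character hex SHA-256 hash, while the new name is the bare hash. The names HashedName, parseHashedName, hashOf and renderHashedName are hypothetical and not the actual darcs API.

```haskell
import Data.Char (isDigit, isHexDigit)

-- Hypothetical representation of a hashed file name.
-- ASSUMPTION: legacy = "<10-digit size>-<64 hex digits>", new = "<64 hex digits>".
data HashedName
  = Prefixed Int String -- legacy: size prefix plus hash
  | Plain String        -- new: hash only
  deriving (Eq, Show)

isHash :: String -> Bool
isHash h = length h == 64 && all isHexDigit h

-- Reading accepts both formats.
parseHashedName :: String -> Maybe HashedName
parseHashedName s
  | isHash s = Just (Plain s)
  | (sz, '-' : h) <- splitAt 10 s, all isDigit sz, isHash h =
      Just (Prefixed (read sz) h)
  | otherwise = Nothing

-- The hash identifies the content, whichever format the name uses.
hashOf :: HashedName -> String
hashOf (Prefixed _ h) = h
hashOf (Plain h) = h

-- Rendering; new files would only ever be written as 'Plain'.
renderHashedName :: HashedName -> String
renderHashedName (Plain h) = h
renderHashedName (Prefixed size h) = padded ++ "-" ++ h
  where
    digits = show size
    padded = replicate (10 - length digits) '0' ++ digits
```

Because lookups go through hashOf, code that only needs the content hash no longer has to care which of the two formats it found on disk.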

But yes, with the patch as I sent it, there will be a mix in existing
repos and the cache... for a while. There will also be a certain amount
of duplication of files in the cache.

We could make the transition even less noticeable by hard-linking old
and new formats whenever we find both. However, this would make it
somewhat harder to eventually get rid of the old size-prefixed files.
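One way the hard-linking idea could look, as a sketch only: find every hash that is present in a directory under both names, then replace the legacy file with a hard link to the new one. It assumes the same naming convention as the legacy/new format above; duplicatePairs and linkDuplicates are hypothetical names, and the remove-then-link step is not atomic.

```haskell
import Data.Char (isDigit, isHexDigit)
import qualified Data.Set as Set
import System.Directory (removeFile)
import System.FilePath ((</>))
import System.Posix.Files (createLink)

-- ASSUMPTION: legacy = "<10-digit size>-<64 hex digits>", new = "<64 hex digits>".
isHash :: String -> Bool
isHash h = length h == 64 && all isHexDigit h

-- The hash carried by a legacy (size-prefixed) name, if it is one.
legacyHash :: String -> Maybe String
legacyHash s = case splitAt 10 s of
  (sz, '-' : h) | all isDigit sz && isHash h -> Just h
  _ -> Nothing

-- Pairs (legacyName, newName) for hashes present in both formats.
duplicatePairs :: [String] -> [(String, String)]
duplicatePairs names =
  [ (old, h) | old <- names, Just h <- [legacyHash old], h `Set.member` newNames ]
  where
    newNames = Set.fromList (filter isHash names)

-- Make both names share one inode by re-creating the legacy name
-- as a hard link to the new file (same directory, so same file system).
linkDuplicates :: FilePath -> [String] -> IO ()
linkDuplicates dir names = mapM_ relink (duplicatePairs names)
  where
    relink (old, new) = do
      removeFile (dir </> old)
      createLink (dir </> new) (dir </> old)
```

Pairs that are already linked get relinked harmlessly; a real version would probably compare inode numbers first and skip them.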

Excursion:

The way 'darcs optimize cache' currently works makes it practically
unusable. Searching the home directory for darcs repos is extremely
slow, while manually listing all the repos is completely out of the
question for anyone who has any actual work to do. I guess what people
do instead is to just rm -rf ~/.cache/darcs and accept that the next few
clones take a bit longer than usual.

I think we should just delete all files in the global cache that have a
hard-link count of one. This would make it a pretty fast operation which
means we could perform it implicitly on a regular basis, say, every 100
times a darcs command accesses the cache. This would help keep the
size of the cache at a more reasonable level, even in the long run. Yes,
this removes cached files for repos that are not on the same file
system. For people who are regularly working in this kind of setting
(pretty uncommon nowadays, I guess) we can add an option to disable the
automatic garbage collection.
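A rough sketch of the proposed collection, assuming the unix package's getFileStatus/linkCount and an access counter that is persisted somewhere (where and how is left open here). The names selectGarbage, collectGarbage, shouldCollect and gcInterval are hypothetical.

```haskell
import System.Directory (doesDirectoryExist, doesFileExist, listDirectory, removeFile)
import System.FilePath ((</>))
import System.Posix.Files (getFileStatus, linkCount)
import System.Posix.Types (LinkCount)

-- Hypothetical trigger: collect on every 100th cache access.
gcInterval :: Int
gcInterval = 100

shouldCollect :: Int -> Bool
shouldCollect accesses = accesses > 0 && accesses `mod` gcInterval == 0

-- A file whose only link is the cache entry itself is not used by any
-- repository on this file system.
selectGarbage :: [(FilePath, LinkCount)] -> [FilePath]
selectGarbage entries = [ p | (p, n) <- entries, n == 1 ]

-- Recursively list regular files below a directory.
listFilesRec :: FilePath -> IO [FilePath]
listFilesRec dir = do
  entries <- map (dir </>) <$> listDirectory dir
  concat <$> mapM visit entries
  where
    visit p = do
      isFile <- doesFileExist p
      if isFile
        then pure [p]
        else do
          isDir <- doesDirectoryExist p
          if isDir then listFilesRec p else pure []

-- Delete all cache files with a hard-link count of one; return what was removed.
collectGarbage :: FilePath -> IO [FilePath]
collectGarbage cacheDir = do
  files <- listFilesRec cacheDir
  counts <- mapM (fmap linkCount . getFileStatus) files
  let victims = selectGarbage (zip files counts)
  mapM_ removeFile victims
  pure victims
```

Since it only needs one stat call per file and no search of the home directory, it stays cheap enough to run implicitly; an opt-out option would simply skip the call to collectGarbage.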

If there are hard-links between differently named files in the cache,
this becomes a bit more complicated and less efficient: we'd also have
to search for files with a hard-link count of two, then check if there
are both versions in the cache and that they are linked, before deleting
both.
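The link-count-two check can be phrased a little more generally: group cache entries by (device, inode), and treat a file as unreferenced when its link count equals the number of cache names pointing at it. This covers both the count-one rule and the linked old/new pair, without needing to parse the names. It is a sketch; Entry and unreferenced are hypothetical names, and the stat data is assumed to be gathered beforehand.

```haskell
import Data.List (sort)
import qualified Data.Map.Strict as Map
import System.Posix.Types (DeviceID, FileID, LinkCount)

-- One cache entry as reported by stat (gathered elsewhere).
data Entry = Entry
  { path   :: FilePath
  , device :: DeviceID
  , inode  :: FileID
  , links  :: LinkCount
  }

-- Paths whose inode has no links outside the cache: every link is one of
-- the cache names, so no repository references the content any more.
unreferenced :: [Entry] -> [FilePath]
unreferenced es =
  sort
    [ path e
    | grp@(first : _) <- Map.elems byInode
    , fromIntegral (length grp) == links first
    , e <- grp
    ]
  where
    byInode = Map.fromListWith (++) [ ((device e, inode e), [e]) | e <- es ]
```

Grouping needs the whole listing in memory, which is the "less efficient" part mentioned above, but it is still a single pass over the cache.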

__________________________________
Darcs bug tracker <bugs at darcs.net>
<http://bugs.darcs.net/patch1887>
__________________________________

