[darcs-users] darcs patch: Add forceHashSlurped that hashes the slu... (and 1 more)
David Roundy
droundy at darcs.net
Mon Sep 1 15:29:57 UTC 2008
On Mon, Sep 01, 2008 at 05:16:31PM +0200, Petr Rockai wrote:
> "David Roundy" <daveroundy at gmail.com> writes:
> > Another alternative would be to simply always keep hash files that
> > refer to file or directory content in a fixed directory. The latter
> > option wouldn't be a bad idea at all, as it would also promote sharing
> > of disk space. In fact, that would be my leaning at the moment. If
> > we keep all HashedIO files in "pristine.hashed", then the whole
> > problem can't occur, and I don't see any downside. Well, I guess
> > there's the downside that cleaning the pristine.hashed directory could
> > be more complicated, as we wouldn't know how many other files might be
> > stashed away there. But we only cleanRepository in optimize, and
> > optimize doesn't need to be either amazingly fast or amazingly simple.
>
> I have thought about that approach before, but discarded it on the paranoia
> argument. It still might be worthwhile. The downside is that repair would
> pollute pristine.hashed with a very large number of stale files: basically,
> every intermediate repository state that we write out, and there are quite a
> few of those. So we really want to call clean_hashed every now and then while
> running repair (I think ext3 might have issues with very large directories,
> too, and also for space reasons...).
Yes, this is a definite problem, and I'm not sure of the best way to
handle it. It shows up in get as well, I believe, and I think the
recent change to stop calling clean_hashed in
finalizeRepositoryChanges (which *was* the right thing to do) broke
some of the "optimizations" in this regard that I had put into the get
code, so a similar fix might be in order there.
But my gut feeling is that the solution (as you suggest) of calling
clean_hashed every now and then is the best that we can come up with
on short notice. Ideally we'd have some interface to apply all the
patches in the repository without either keeping everything in memory,
or leaving excess files on disk, or having to periodically do garbage
collecting. But I don't see how to do that, so we're stuck with
something sub-optimal (at least until someone has a better idea). And
I'm thinking that just cleaning periodically is a good interim
solution.
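
To make the trade-off concrete, here is a minimal sketch of periodic cleaning, written in Python purely for illustration. It is not darcs code: `write_hashed`, `replay`, and the `clean_every` parameter are hypothetical names, a plain dict stands in for the pristine.hashed directory, and SHA-256 is only an assumed naming scheme.

```python
import hashlib

def write_hashed(store: dict, content: bytes) -> str:
    """Store content under its SHA-256 name, like a content-addressed file."""
    name = hashlib.sha256(content).hexdigest()
    store[name] = content
    return name

def replay(states, clean_every: int):
    """Write each intermediate state, sweeping stale entries every `clean_every` steps.

    Returns (final_root, peak_entry_count).
    """
    store = {}
    root = None
    peak = 0
    for step, content in enumerate(states, start=1):
        root = write_hashed(store, content)
        peak = max(peak, len(store))
        if step % clean_every == 0:
            # Periodic garbage collection: keep only what the current root needs.
            for name in list(store):
                if name != root:
                    del store[name]
    return root, peak
```

In this toy model, replaying 100 distinct states with `clean_every=10` never holds more than 11 entries at once, whereas never cleaning leaves all 100 behind.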
> > One major advantage of this approach is that it decreases the number
> > of arguments to these functions, which should make them simpler and
> > safer to use.
>
> True, and it doesn't actually matter too much to have newpristine at all, and
> as long as we don't update the root pointer in hashed_inventory, we should be
> safe. (Come to think of it, we might need to extend clean_hashdir to take a
> list of root hashes instead of a single hash.)
Right, accepting a list of root hashes would be a very good idea--but
that could be added later, I think.
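
For illustration only, a sweep that accepts several root hashes could look roughly like the following Python sketch. This is not the darcs clean_hashdir: `sweep_unreachable` is a hypothetical name, the store is modelled as a dict mapping each entry's name to the names it refers to, and readable names stand in for real hashes.

```python
def sweep_unreachable(store: dict, roots) -> list:
    """Delete every entry not reachable from any root; return the removed names."""
    live = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in live or name not in store:
            continue  # already marked, or a dangling reference
        live.add(name)
        stack.extend(store[name])  # directories list their children; files list nothing
    dead = sorted(n for n in store if n not in live)
    for n in dead:
        del store[n]
    return dead

# Two live roots share "dir1"; "old_root" and "stale" are left over from an earlier state.
store = {
    "rootA": ["dir1", "f1"],
    "rootB": ["dir1", "f2"],
    "dir1": ["f3"],
    "f1": [], "f2": [], "f3": [],
    "old_root": ["stale"],
    "stale": [],
}
removed = sweep_unreachable(store, ["rootA", "rootB"])
```

Passing both roots keeps the shared `dir1` subtree alive, which a single-root sweep from either root would also do, but it additionally protects `f1` and `f2` at the same time.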
> >> However, there is another issue with that approach. If we encounter hash
> >> failures in pristine, we might want repair to fix those -- which it will under
> >> forceHashSlurped as proposed, but won't under hashSlurped with a doesFileExist
> >> call in it.
>
> > I don't think this is a problem.
>
> (Well, I have had experience of seeing some hash failures in http://darcs.net
> recently, although I haven't investigated too much, as I had some repository
> corruption issues before, so I have been so far assuming it's debris from that
> incident. It might still be worthwhile to have the ability to repair such
> corruption if it happens. Probably rm-ing any such files at the start of repair
> would be enough.)
I know there are occasional hash failures (e.g. from partial or broken
downloads), but they ought to be reasonably automatically cleaned
up--if they aren't, then we should fix it so they are. We always
check the hash, and I think we try downloading the file from a remote
repo if it fails, so some of these things should be self-healing. We
probably also ought to automatically remove or rename hash files that
fail the hash check. But I think that these issues should be
fixed/fixable without special treatment in repair. In the worst-case
scenario, if the user deletes the file (or the entire contents of
pristine.hashed), repair should work fine.
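
The self-healing behaviour described above (check the hash, move a failing file aside, retry from a remote) can be sketched as follows. This is an illustrative Python model, not how darcs implements it: `read_verified` and `fetch_remote` are hypothetical names, the `.corrupt` suffix is an invented convention, and SHA-256 is an assumed hash.

```python
import hashlib

def _matches(name: str, data: bytes) -> bool:
    return hashlib.sha256(data).hexdigest() == name

def read_verified(store: dict, name: str, fetch_remote=None) -> bytes:
    """Return the content named `name`, healing the local copy if it fails its hash."""
    data = store.get(name)
    if data is not None and _matches(name, data):
        return data
    if data is not None:
        # Rename the bad file out of the way so it is never trusted again.
        store[name + ".corrupt"] = store.pop(name)
    if fetch_remote is not None:
        fresh = fetch_remote(name)
        if fresh is not None and _matches(name, fresh):
            store[name] = fresh  # replace the local copy with the verified one
            return fresh
    raise KeyError(name)
```

A caller would pass something like a remote repository lookup as `fetch_remote`; with no remote available the function raises `KeyError`, which is the point at which a tool such as repair would have to regenerate the content.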
David