[darcs-users] Re: Whitespace in filenames

David Roundy droundy at abridgegame.org
Fri Aug 8 17:45:36 UTC 2003


On Thu, Aug 07, 2003 at 05:57:06PM -0700, John Meacham wrote:
> > It would mean that darcs get couldn't retrieve that repository on any
> > platform that has the problem with an old filename.  This could be worked
> > around by writing a version of get that doesn't require that the repository
> > be consistent.
> > 
> > One way to do this would be to implement a sort of checkpointing-like
> > scheme in which we would store (optionally and perhaps only occasionally)
> > "snapshots" of the repository at tags.  This would allow doing a darcs get
> > without downloading the entire repository history, which would be nice,
> > since I don't care for the idea that get is an O(n) process where n is the
> > age of the repository.  Actually, I think that it is likely that eventually
> > I'll get around to implementing this idea, which could be used to
> > effectively throw away old repository history in a nice controlled manner.
> > 
> > I had originally been thinking that I'd want to implement this by storing a
> > snapshot tarball, but after my recent testing with large repos I think I
> > can store the snapshot as one big patch, which is much nicer, since it
> > means that it can store any data that darcs is made to support (and no
> > more--which is as it should be).  I've been looking into using zlib to
> > compress patches, which would make the storing and transferring of large
> > snapshot patches somewhat less painful...
> > 
> > But for the moment, the painful situation is that a repository with an
> > invalid filename anywhere in its history cannot effectively be used.  You
> > could hack around this, for example by copying the repository manually,
> > but then you wouldn't be able to use check to see if you had done so
> > correctly (since you'd still have a corrupt repo).
> 
> Sounds like darcs could use array-indirection patching. I was going to
> suggest it as a speed improvement but it would also solve this problem.
> 
> The basic idea is that you apply a sequence of patches not directly to
> the repository files, but to a list of indexes into an array of lines.
> Basically, when you come across a hunk which adds new lines, you add the
> lines to the end of the array and add indexes to those lines into the
> list which represents that file. You can apply as many patches as you
> like and they all just end up modifying these lists of indexes. When
> done, you iterate over each list in order, dumping the lines they
> point to to the appropriate file. No intermediate versions of the
> repository need to exist and composing patches is fast-fast. I have seen
> this technique used to drastically speed up single patches; with many
> composed patches like those used in darcs, the improvement should be
> much greater. A string with the file's eventual filename should be kept
> alongside its list of line-indexes and would be manipulated as
> appropriate by the various file rename patchparts.

This sounds similar to what darcs actually does most of the time.
The catch is that it means that you have to hold the modified lines all in
memory while doing this.  For a few patches this is all right.  But if all
the files in the repo are modified, it means holding the entire repo in
memory, which is not usually a good plan.  For this reason, when you do a
darcs get, it applies each patch to disk in sequence.  If you did a pull
instead, it would fetch all the patches and compose them in memory (pretty
much as you describe, except using lists of lines rather than arrays, so
it's a bit slower), and then write the result to disk.
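
As a rough illustration of the index-list scheme discussed above, here is a
minimal sketch in Python.  It is only a sketch under stated assumptions: the
names LineStore, Tree, and compose and the tuple-based patch format are made
up for this example and are not darcs internals (darcs itself is written in
Haskell).  It shows patches being composed entirely in memory, so a filename
that exists only in an intermediate state never has to be created on disk.

```python
# Hypothetical sketch: none of these names or formats come from darcs itself.

class LineStore:
    """Append-only array of line contents shared by all files."""

    def __init__(self):
        self.lines = []

    def add(self, text):
        # Store the line and return its index in the array.
        self.lines.append(text)
        return len(self.lines) - 1


class Tree:
    """Maps each current filename to a list of indexes into the LineStore."""

    def __init__(self):
        self.store = LineStore()
        self.files = {}

    def add_file(self, name):
        self.files[name] = []

    def rename(self, old, new):
        # Only the in-memory name changes; nothing touches the filesystem.
        self.files[new] = self.files.pop(old)

    def hunk(self, name, start, n_removed, new_lines):
        # Replace n_removed lines at position start with new_lines.
        # Removed lines stay in the store; only the index list shrinks.
        idx = [self.store.add(text) for text in new_lines]
        self.files[name][start:start + n_removed] = idx

    def render(self):
        # Resolve indexes to text once, after every patch has been applied.
        return {name: [self.store.lines[i] for i in idxs]
                for name, idxs in self.files.items()}


def compose(patches):
    """Apply a sequence of (operation, *args) tuples without touching disk."""
    tree = Tree()
    for op, *args in patches:
        getattr(tree, op)(*args)
    return tree.render()


patches = [
    ("add_file", "bad name?.txt"),
    ("hunk", "bad name?.txt", 0, 0, ["hello", "world"]),
    ("rename", "bad name?.txt", "good.txt"),
    ("hunk", "good.txt", 1, 1, ["darcs"]),
]
result = compose(patches)
```

Here result maps only "good.txt" to its final lines; the problematic
intermediate name is never written anywhere.  Note that the store still
holds the replaced line "world", which is the memory cost mentioned above:
every line any patch ever added stays in memory until the final write.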

So yes, I could write a variant of get (chosen by a command-line flag) that
wouldn't write intermediate states to disk, so that people could deal with
repos that have states in the past that can't be expressed on their
filesystem.  Or such people could do

darcs inittree && darcs pull -a badrepo

which would have the same effect.

There would still be a problem when running darcs check, but that's as it
should be, since you do have a corrupt repo (for at least one definition of
corruption).
-- 
David Roundy
http://www.abridgegame.org



