[darcs-users] Re: Whitespace in filenames

Fri Aug 8 00:57:06 UTC 2003

> It would mean that darcs get couldn't retrieve that repository on any
> platform that has the problem with an old filename.  This could be worked
> around by writing a version of get that doesn't require that the repository
> be consistent.
> 
> One way to do this would be to implement a sort of checkpointing-like
> scheme in which we would store (optionally and perhaps only occasionally)
> "snapshots" of the repository at tags.  This would allow doing a darcs get
> without downloading the entire repository history, which would be nice,
> since I don't care for the idea that get is an O(n) process where n is the
> age of the repository.  Actually, I think that it is likely that eventually
> I'll get around to implementing this idea, which could be used to
> effectively throw away old repository history in a nice controlled manner.
> 
> I had originally been thinking that I'd want to implement this by storing a
> snapshot tarball, but after my recent testing with large repos I think I
> can store the snapshot as one big patch, which is much nicer, since it
> means that it can store any data that darcs is made to support (and no
> more--which is as it should be).  I've been looking into using zlib to
> compress patches, which would make the storing and transferring of large
> snapshot patches somewhat less painful...
> 
> But for the moment, the painful situation is that a repository with an
> invalid filename anywhere in its history cannot effectively be used.  You
> could hack around this, for example by copying the repository manually,
> but then you wouldn't be able to use check to see if you had done so
> correctly (since you'd still have a corrupt repo).

Sounds like darcs could use array-indirection patching. I was going to
suggest it as a speed improvement but it would also solve this problem.

The basic idea is that you apply a sequence of patches not directly to
the repository files, but to lists of indexes into an array of lines.
When you come across a hunk which adds new lines, you append the lines
to the end of the array and insert indexes to those lines into the list
which represents that file. You can apply as many patches as you like
and they all just end up modifying these lists of indexes. When done,
you iterate over each list in order, dumping the lines they point to
into the appropriate file. No intermediate versions of the repository
need to exist, and composing patches is fast. I have seen this
technique used to drastically speed up single patches; with many
composed patches like those used in darcs, the improvement should be
much greater. A string with the file's eventual filename should be kept
alongside its list of line indexes and would be manipulated as
appropriate by the various file rename patch parts.
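
To make the idea concrete, here is a minimal sketch in Python (chosen
only for brevity; darcs itself is written in Haskell). The class name,
the method names, and the hunk representation (a start position, a count
of removed lines, and a list of added lines) are all made up for
illustration and are not the darcs patch format.

```python
class IndirectRepo:
    """Apply hunks to lists of line indexes instead of to files on disk."""

    def __init__(self):
        # Shared, append-only array holding every line ever added.
        self.lines = []
        # Current filename -> list of indexes into self.lines.
        self.files = {}

    def add_file(self, name):
        self.files[name] = []

    def rename(self, old, new):
        # Only the in-memory name changes; nothing touches the filesystem.
        self.files[new] = self.files.pop(old)

    def apply_hunk(self, name, start, n_removed, added):
        """Replace n_removed lines at 0-based position start with added."""
        first = len(self.lines)
        self.lines.extend(added)
        new_indexes = list(range(first, first + len(added)))
        # Splice indexes only; removed lines simply stay unreferenced.
        self.files[name][start:start + n_removed] = new_indexes

    def materialize(self):
        """Resolve each index list into the final contents of each file."""
        return {name: [self.lines[i] for i in indexes]
                for name, indexes in self.files.items()}


repo = IndirectRepo()
repo.add_file("my file.txt")                        # name with whitespace
repo.apply_hunk("my file.txt", 0, 0, ["a", "b", "c"])
repo.rename("my file.txt", "my_file.txt")           # later rename patch
repo.apply_hunk("my_file.txt", 1, 1, ["B", "B2"])   # replace line "b"
print(repo.materialize())
# {'my_file.txt': ['a', 'B', 'B2', 'c']}
```

Note that the intermediate name "my file.txt" is never written anywhere;
only the final name would reach the filesystem, which is why this would
sidestep the old-filename problem as well as being faster.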
        John

-- 
---------------------------------------------------------------------------
John Meacham - California Institute of Technology, Alum. - john at foo.net
---------------------------------------------------------------------------