[darcs-devel] Proposal for new format to store patch files

Ian Lynagh igloo at earth.li
Thu May 12 19:12:32 PDT 2005


On Thu, May 12, 2005 at 12:22:34PM -0700, Jason Dagit wrote:
> 
> I apologize if this topic comes up often, but the current patch
> format is very simple and I think we could do better.  (I've been
> bitten by the efficiency of the current way many times.)

A lot of the problems are to do with the way we have been using what we
have rather than the format itself. A number of improvements have been
made in darcs-unstable, and a few more have been discussed here.

> My proposal is to use a format similar to ar (or we could maybe use ar
> as is).
> 
> Currently, binary files are stored ASCII-armored inside the patch
> file which is then gzip'd.  This can be extremely slow to process if
> the binary file is large.  And it's not only a problem when the file
> needs to be extracted from the patch, but when darcs needs information
> about entries in the patch file that come after the large entries.  My
> proposed way of dealing with this is to have some sort of length
> encoding so that file system calls such as lseek can be used to jump
> to the next file in what amounts to essentially constant time (all the
> OS needs to do is change an offset associated with the file descriptor
> and then issue a read).

When applying patches etc this isn't a problem.

If we just want to know what files a patch affects, or what patches
affect a file, then we should have a separate index for this info.
(we have to be careful re: renames, of course).
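
A minimal sketch of what such an index could look like, in Python for
brevity. Everything in it is an assumption made for illustration: the
function name build_index, the ("touch", path) / ("move", old, new)
operation tuples, and the patch ids are hypothetical and are not darcs's
actual data structures. It only shows how a rename can carry a file's
history over to its new name.

```python
def build_index(patches):
    """Map each current file path to the ids of the patches affecting it.

    `patches` is a list of (patch_id, ops) in application order, where each
    op is ("touch", path) or ("move", old_path, new_path). These shapes are
    hypothetical, chosen only for this sketch.
    """
    index = {}

    def record(path, patch_id):
        # Note the patch against the path once, keeping application order.
        ids = index.setdefault(path, [])
        if patch_id not in ids:
            ids.append(patch_id)

    for patch_id, ops in patches:
        for op in ops:
            if op[0] == "move":
                _, old, new = op
                # A rename moves the whole history to the new name.
                index[new] = index.pop(old, []) + index.get(new, [])
                record(new, patch_id)
            else:
                record(op[1], patch_id)
    return index


patches = [
    ("p1", [("touch", "a.txt")]),
    ("p2", [("move", "a.txt", "b.txt")]),
    ("p3", [("touch", "b.txt"), ("touch", "c.txt")]),
]
index = build_index(patches)
```

With this input, b.txt ends up listed against p1, p2 and p3 even though
p1 only ever saw it as a.txt, which is the rename case that needs care.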

It might be worth doing something like what you suggest for when we want
to see exactly how patch P affects file F.

I think we'd just want our own header at the start though, so rather
than having

    gzip(foo_hunks, bar_hunks)

we would have

    gzip(foo=0\nbar=sizeof(foo_hunks)\n, EOH, foo_hunks, bar_hunks)

(this would also give us one of the above indices).
(Also, we have to be careful that either all the hunks for one file are
stored together or that the header lets us find every one of them.
Probably best to try to do both.)

This would have the advantage that you can still open the files in a
pager/editor.
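
The header layout above can be sketched roughly as follows, in Python
for brevity. The names pack_patch and read_hunks and the exact bytes of
the EOH marker are assumptions; only the name=offset lines, the marker,
and the order of the pieces come from the layout above. The sketch also
assumes file names contain no newline, and that each file's hunks are
stored contiguously.

```python
import gzip

# Hypothetical end-of-header marker; its actual bytes are not specified.
EOH = b"EOH\n"


def pack_patch(sections):
    """Build gzip(header, EOH, hunks...) from a list of (name, hunk_bytes)."""
    header = []
    offset = 0
    for name, hunks in sections:
        # Each header line records where this file's hunks start in the body.
        header.append(f"{name}={offset}\n".encode())
        offset += len(hunks)
    body = b"".join(hunks for _, hunks in sections)
    return gzip.compress(b"".join(header) + EOH + body)


def read_hunks(packed, wanted):
    """Return the hunk bytes for one file, located through the header."""
    data = gzip.decompress(packed)
    header, _, body = data.partition(EOH)
    entries = []
    for line in header.decode().splitlines():
        name, _, start = line.rpartition("=")
        entries.append((name, int(start)))
    for i, (name, start) in enumerate(entries):
        if name == wanted:
            # A section ends where the next begins, or at the end of the body.
            end = entries[i + 1][1] if i + 1 < len(entries) else len(body)
            return body[start:end]
    raise KeyError(wanted)


packed = pack_patch([
    ("foo", b"hunk foo\n"),
    ("bar", b"hunk bar 1\nhunk bar 2\n"),
])
bar_hunks = read_hunks(packed, "bar")
```

Note that the offsets here refer to the uncompressed data, so with a
single gzip stream the reader still has to decompress everything before
the section it wants; the header only saves parsing the earlier hunks.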

> Another feature I hear people asking about from time to time is
> storing of permission information (or more generally meta-data).  This
> could also be easy to do using the ar format which has fields for
> storing a small bit of meta-data.

Storing the info isn't the problem here; the problem is working out what
to do with it.


Thanks
Ian




