[darcs-devel] Idea: a new type of patch

David Roundy droundy at abridgegame.org
Sat Apr 30 05:44:07 PDT 2005


On Fri, Apr 29, 2005 at 11:43:32PM +0100, Ian Lynagh wrote:
> On Fri, Apr 29, 2005 at 06:18:28PM +0200, Juliusz Chroboczek wrote:
> >   Hunk <filename> <line> <n1> <n2>
> >   <n1 bytes of data><n2 bytes of data>
> 
> We also need the number of lines so we can commute it when possible
> without having to count newlines.

I'm thinking that we could have alternative on-disk representations of the
same patch--where users would have to explicitly request the new format, so
we'd have backward interoperability.

As far as the new hunk format goes, I think I'd lean towards something more
like

hunk-block <filename> <line> <lines1> <n1> <lines2> <n2>
old:
<n1 bytes of data>\n
new:
<n2 bytes of data>\n

which would be moderately more human-readable, and just as fast to parse.
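
To make the "just as fast to parse" point concrete, here is a minimal sketch of a reader for this layout. It assumes the filename contains no whitespace, and it works on String rather than PackedString, so the "byte" counts are really character counts here. The names HunkHeader, parseHunkHeader and hunkBody are hypothetical and not taken from the darcs source.

```haskell
import Data.List (stripPrefix)
import Text.Read (readMaybe)

-- Header fields of the proposed "hunk-block" line.
-- Record and field names are hypothetical, chosen for this sketch only.
data HunkHeader = HunkHeader
  { hhFile     :: FilePath
  , hhLine     :: Int
  , hhOldLines :: Int
  , hhOldBytes :: Int
  , hhNewLines :: Int
  , hhNewBytes :: Int
  } deriving (Eq, Show)

-- Parse "hunk-block <filename> <line> <lines1> <n1> <lines2> <n2>".
-- Assumes the filename contains no whitespace.
parseHunkHeader :: String -> Maybe HunkHeader
parseHunkHeader s = case words s of
  ["hunk-block", f, l, ls1, n1, ls2, n2] ->
    HunkHeader f <$> readMaybe l <*> readMaybe ls1 <*> readMaybe n1
                 <*> readMaybe ls2 <*> readMaybe n2
  _ -> Nothing

-- Split the body that follows the header using only the stored counts:
-- "old:\n" <n1 bytes> "\n" "new:\n" <n2 bytes> "\n".
-- Returns (old data, new data, unconsumed remainder).
hunkBody :: HunkHeader -> String -> Maybe (String, String, String)
hunkBody h body0 = do
  r1 <- stripPrefix "old:\n" body0
  let (old, r2) = splitAt (hhOldBytes h) r1
  r3 <- stripPrefix "\nnew:\n" r2
  let (new, r4) = splitAt (hhNewBytes h) r3
  rest <- stripPrefix "\n" r4
  return (old, new, rest)
```

Neither function scans the data for newlines: the line counts come from the header, and the data is cut out by length alone.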

I guess my proposed new Hunk !Int ([PackedString],Maybe PackedString)
... would also need to store the number of lines.  At which point maybe
we'd consider defining a type like

data SomeLines = SL {length_sl :: Int,
                     lines_sl :: [PackedString],
                     cat_sl :: Maybe PackedString}

which would basically be a FileContents, and we'd then have

Hunk !Int SomeLines SomeLines

and could write utility routines involving SomeLines.
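
As an illustration of what such utility routines might look like, here is a rough sketch. It rests on two assumptions: String stands in for PackedString so that it needs only base, and cat_sl is read as "trailing text with no final newline", which is one possible interpretation rather than something settled in this thread. The type is declared with data, since a newtype can hold only a single field. fromText and toText are hypothetical names.

```haskell
-- String stands in for darcs's PackedString (an assumption of this sketch).
data SomeLines = SL
  { length_sl :: Int           -- number of complete lines
  , lines_sl  :: [String]      -- the complete lines, newlines stripped
  , cat_sl    :: Maybe String  -- assumed: trailing text without a newline
  } deriving (Eq, Show)

-- Build a SomeLines from raw text, counting lines once, up front.
fromText :: String -> SomeLines
fromText s = SL (length ls) ls rest
  where
    (ls, rest) = go s
    go t = case break (== '\n') t of
      (l, '\n' : t') -> let (more, r) = go t' in (l : more, r)
      ("", "")       -> ([], Nothing)
      (l, _)         -> ([], Just l)

-- Inverse of fromText: put the newlines back.
toText :: SomeLines -> String
toText (SL _ ls r) = concatMap (++ "\n") ls ++ maybe "" id r
```

With the count stored, code that commutes hunks could read length_sl directly instead of walking lines_sl.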

> > > _patches/YYYYMMDDxxxxx -> _patches/YYYYMMDD/YYYYMMDDxxxxx
> > > (or perhaps drop the duplication?)
> > 
> > Again, I'd be more radical -- allow arbitrary filenames, store the
> > filename in the inventory.  This would allow different versions of the
> > client to choose where to put the patch file depending on what
> > filesystem they are running on.
> 
> I'm not hugely fussed as long as we cut down the number in one
> directory. I do like being able to find them by date, though.

Hmmm.  I really think we should make this change at the same time as we
introduce the hashed inventory.  On the other hand, if we change the
inventory to support

[Patch name...

]
YYYYMMDD/xxxxx
[Next patch...
]
YYYYMMDD/yyyyy

we could *later* modify xxxxx to be a hash of the patch contents, so there
wouldn't actually need to be a second format transition, except that darcs
would then have to deal with older repositories in which the filename
*isn't* a hash of its contents.  So I think I'd still rather do the two
changes in one go.

If we store the filename in the inventory, it isn't hard to make the
filename be the hash of its contents--except I guess that this would ruin
our lazy patch writing.  :( Ian, how hard is it to compute a sha1 "as you
go"? In particular, could you implement a

writeSha1File :: Doc -> IO PackedString

which writes (consuming the Doc as it goes) to a temporary file, computing
the sha1 of the file as it is written, and then renames the temporary file
based on the computed sha1, and returns that sha1 as its return value? You
could of course choose a different return type, such as String or FilePath
or even FileName.

Ideally we'd want two of these, one of which creates a gzipped file, where
the sha1 is of the uncompressed contents.
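
The control flow asked for here can be sketched as below, under loud assumptions. A list of String chunks stands in for Doc. A toy FNV-1a checksum stands in for SHA-1, because base has no SHA-1; it is NOT SHA-1 and is not collision resistant. renameFile comes from the directory package. writeHashedFile and the hash* names are hypothetical.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl')
import Data.Word (Word64)
import Numeric (showHex)
import System.Directory (renameFile)
import System.IO (hClose, hPutStr, openTempFile)

-- Stand-in for a SHA-1 context: 64-bit FNV-1a over characters.
-- NOT SHA-1; a real version would carry a SHA-1 state here instead.
type HashState = Word64

hashInit :: HashState
hashInit = 14695981039346656037

hashUpdate :: HashState -> String -> HashState
hashUpdate = foldl' (\h c -> (h `xor` fromIntegral (ord c)) * 1099511628211)

hashFinal :: HashState -> String
hashFinal h = showHex h ""

-- Write the chunks to a temporary file in dir, folding each chunk into
-- the hash as it is written, then rename the file after the digest and
-- return that digest.
writeHashedFile :: FilePath -> [String] -> IO String
writeHashedFile dir chunks = do
  (tmp, hdl) <- openTempFile dir "patch.tmp"
  final <- go hdl hashInit chunks
  hClose hdl
  let name = hashFinal final
  renameFile tmp (dir ++ "/" ++ name)
  return name
  where
    go _ st [] = return st
    go hdl st (c : cs) = do
      hPutStr hdl c
      let st' = hashUpdate st c
      st' `seq` go hdl st' cs
```

Each chunk is consumed once, so the input can stay lazy. A gzipped variant would presumably update the hash with the uncompressed chunk and hand the same chunk to the compressor before writing.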

> > Do we have the necessary versioning in the repo and client to make these
> > migrations as easy as possible?

> No.  There was a discussion on that between David and me a good year
> ago, but we never converged on a consensus.  (David did suggest a
> solution that was significantly different from mine, and I never
> managed to grok its fullness well enough to implement it.)

Right, if I recall correctly, the big sticking point was that I wanted a
sort of keyword-based format specifier, which would specify an allowed
subset of features in the repository.

In the case of a "hunk-block" on-disk format, one would add a keyword
"hunk-block" to the format file of the repository, which would mean that
hunk-blocks are allowed.

My idea was that if a darcs can't understand all the keywords, it'll fail
to read the repository.  When writing to a repository, darcs looks at the
keywords to decide what format to use (e.g. whether to write hunk-blocks).

I imagine something like

_darcs/format containing:

keyword1
keyword2

Perhaps we could also allow either-or keywords which would indicate that
both must be written, but either can be read, so we'd have something like

old-format|hashed-format

which would mean that both formats must be written, but only one of them
need be read.  This presumes that the hashed inventory file has a different
name, which allows for a transition.  This last bit is an idea I just had,
not from our old discussion (lest you think there was *too* much that you
didn't grok).  :)
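
A minimal sketch of the keyword check, covering both plain and either-or lines, might look like the following. The function names are hypothetical. The rule that a writer must understand every alternative is inferred from "both must be written" and is an assumption of this sketch. Like the parsing in the sketch module mentioned below, it does not ignore whitespace or allow comments.

```haskell
-- One line of _darcs/format: alternatives separated by '|'.
-- A plain keyword is simply the one-alternative case.
type FormatLine = [String]

-- Split a string on a separator character.
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (w, [])       -> [w]
  (w, _ : rest) -> w : splitOn c rest

-- Parse the format file contents, skipping empty lines.
parseFormat :: String -> [FormatLine]
parseFormat = map (splitOn '|') . filter (not . null) . lines

-- Reading needs at least one understood alternative on every line.
canRead :: [String] -> [FormatLine] -> Bool
canRead known = all (any (`elem` known))

-- Writing must produce every alternative, so all must be understood
-- (an inference from "both must be written").
canWrite :: [String] -> [FormatLine] -> Bool
canWrite known = all (all (`elem` known))
```

For a format file containing "hunk-block" and "old-format|hashed-format", a darcs that knows only hunk-block and old-format could read the repository under these rules but would have to refuse to write to it.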

I'm sending to the list a sketch of a repository format module for your
(and other people's) perusal.  I decided to make something a bit more
concrete, but stopped before finishing when I realized that it's strongly
overlapping with your darcs-git code, where you have identifyRepository and
all that.  That work seems like the perfect place to integrate this.  If
you'd take a look at it and see whether it seems useful, that would be
great.  The parsing is abominable (whitespace isn't ignored and comments
aren't allowed).
-- 
David Roundy
http://www.darcs.net



