[darcs-devel] darcs patch: add an unused RepoFormat module. (and 13 more)

Juliusz Chroboczek Juliusz.Chroboczek at pps.jussieu.fr
Sat Jul 9 17:07:21 PDT 2005


> I don't really understand much of the git code...

Okay, here's a quick explanation.  I guess I'll have to document all
of this when I finally get around to finishing the merge.

A GitFileInfo is a very lightweight structure that contains just a
filename, a SHA1, and a file mode.  File contents are held in a
different structure, the GitFile, which contains file contents and
``type''; the type can be one of ``blob'' (an array of bytes),
``tree'' (conceptually a sorted list of GitFileInfo, used for
implementing directories) or ``commit'' (which is basically a log
message and a pointer to a tree).
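A minimal Haskell sketch of these structures, following the description above. The field names, the SHA1 newtype and the mkTree helper are hypothetical and are not taken from darcs-git. The commit is simplified the same way the text simplifies it.

```haskell
import qualified Data.ByteString as BS
import Data.List (sortOn)

-- Hypothetical representation of a SHA1: its 40-character hex string.
newtype SHA1 = SHA1 String
  deriving (Eq, Ord, Show)

-- The lightweight descriptor: a filename, a SHA1 and a file mode.
data GitFileInfo = GitFileInfo
  { gfiName :: String
  , gfiSha1 :: SHA1
  , gfiMode :: Int  -- e.g. 0o100644 for a regular file, 0o40000 for a tree
  } deriving (Eq, Show)

-- The contents, tagged with their type.
data GitFile
  = Blob BS.ByteString  -- an array of bytes
  | Tree [GitFileInfo]  -- a directory: entries sorted by name
  | Commit SHA1 String  -- pointer to a tree plus the log message
                        -- (real commits also carry parents, author, committer)
  deriving (Eq, Show)

-- Hypothetical smart constructor that keeps tree entries sorted by name.
-- Real Git compares directory names as if they ended in '/'; ignored here.
mkTree :: [GitFileInfo] -> GitFile
mkTree = Tree . sortOn gfiName
```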

I've tried very hard to avoid keeping the file contents in memory, so
GitFiles are computed on the fly by gitFileInfoToGitFile whenever
necessary.  (This could be made more efficient by using a weak
hashtable, but I don't know how to do weak hashtables in Haskell.)
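To illustrate computing GitFiles on the fly, here is a sketch that assumes a pure lookup in a Data.Map standing in for the object database. The real gitFileInfoToGitFile reads from the repository, so its actual signature almost certainly differs from this one. Nothing is cached, so each call goes back to the store.

```haskell
import qualified Data.Map as Map

newtype SHA1 = SHA1 String deriving (Eq, Ord, Show)

data GitFileInfo = GitFileInfo { gfiName :: String, gfiSha1 :: SHA1, gfiMode :: Int }
  deriving (Eq, Show)

data GitFile = Blob String | Tree [GitFileInfo]
  deriving (Eq, Show)

-- Hypothetical stand-in for the object database; real code would read and
-- decompress the object from the repository instead of consulting a Map.
type ObjectStore = Map.Map SHA1 GitFile

-- The tree of GitFileInfos never holds contents; they are fetched from the
-- store on every call and can be dropped as soon as the caller is done.
gitFileInfoToGitFile :: ObjectStore -> GitFileInfo -> Maybe GitFile
gitFileInfoToGitFile store gfi = Map.lookup (gfiSha1 gfi) store

-- Example consumer (hypothetical): the size of a blob, fetched on demand.
blobSize :: ObjectStore -> GitFileInfo -> Maybe Int
blobSize store gfi = case gitFileInfoToGitFile store gfi of
  Just (Blob s) -> Just (length s)
  _             -> Nothing
```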

When a tree is read from the repository, it is clean.  It will get
dirtied by applyToGitSlurpy.

A GitSlurpy has two kinds of nodes: clean nodes, which contain just a
GitFileInfo, and dirty nodes, which contain the full FileContents.
PurifyGitSlurpy walks a GitSlurpy, writing out data to disk and
discarding the in-memory FileContents.
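A minimal sketch of clean and dirty nodes and of a purify pass, under stated assumptions. The GitSlurpy constructors, the toyHash placeholder (not a real SHA1) and the IORef-held Map standing in for the on-disk store are all hypothetical. Children are written before their parent, because a directory's listing needs the children's hashes.

```haskell
import Data.IORef
import qualified Data.Map as Map

newtype SHA1 = SHA1 String deriving (Eq, Ord, Show)

data GitFileInfo = GitFileInfo { gfiName :: String, gfiSha1 :: SHA1 }
  deriving (Eq, Show)

-- Hypothetical node type: a clean node holds only a descriptor, a dirty node
-- holds its contents (or, for a directory, its children) in memory.
data GitSlurpy
  = Clean GitFileInfo
  | DirtyFile String String      -- name, file contents
  | DirtyDir String [GitSlurpy]  -- name, children

-- Placeholder hash (an assumption): real code would compute the object's SHA1.
toyHash :: String -> SHA1
toyHash = SHA1 . show . foldl (\acc c -> (acc * 31 + fromEnum c) `mod` 1000000007) (7 :: Int)

-- A Map in an IORef stands in for the on-disk object store.
type Store = IORef (Map.Map SHA1 String)

-- Write every dirty node out, bottom-up, and replace it by a clean node.
purify :: Store -> GitSlurpy -> IO GitSlurpy
purify _ node@(Clean _) = pure node  -- already on disk: nothing to do
purify store (DirtyFile name contents) = do
  let h = toyHash contents
  modifyIORef' store (Map.insert h contents)
  pure (Clean (GitFileInfo name h))
purify store (DirtyDir name kids) = do
  kids' <- mapM (purify store) kids  -- children first, so their hashes are known
  let listing = show [ (gfiName i, gfiSha1 i) | Clean i <- kids' ]
      h = toyHash listing
  modifyIORef' store (Map.insert h listing)
  pure (Clean (GitFileInfo name h))
```

After purify returns, the in-memory contents are no longer referenced by the tree and can be garbage-collected.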

There's an additional twist, which is the Git cache.  Roughly
speaking, the Git cache is just a list of pairs (gfi, mtime), where
gfi is a GitFileInfo and mtime is the pristine modtime.  Right now,
darcs-git never modifies the Git cache, it just consults it when doing
gitSlurpyToSlurpy.  It will survive the absence of the Git cache, but
of course then it will do full file comparison rather than just
comparing mtimes.  (You can build an up-to-date Git cache by using
read-tree.)
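A sketch of how the cache could be consulted, keeping the list-of-pairs shape described above. The path-keyed lookup, the Integer modtimes and the caller-supplied full comparison are assumptions, not darcs-git code. Note that Git's actual index records more stat data than a single mtime (for instance the file size).

```haskell
import Data.List (find)

newtype SHA1 = SHA1 String deriving (Eq, Show)

data GitFileInfo = GitFileInfo { gfiName :: FilePath, gfiSha1 :: SHA1 }
  deriving (Eq, Show)

-- The cache as described: pairs of a descriptor and the modtime recorded for it.
-- Modtimes are plain Integers here (an assumption, e.g. seconds since the epoch).
type GitCache = [(GitFileInfo, Integer)]

cachedMtime :: FilePath -> GitCache -> Maybe Integer
cachedMtime path cache = snd <$> find ((== path) . gfiName . fst) cache

-- Decide whether a working file still matches pristine.  If the cache has an
-- entry whose mtime equals the file's current mtime, the file is not read;
-- otherwise (no cache at all, no entry, or a different mtime) we fall back to
-- the caller-supplied full comparison.
unchanged :: Maybe GitCache -> FilePath -> Integer -> IO Bool -> IO Bool
unchanged mcache path mtime fullCompare =
  case mcache >>= cachedMtime path of
    Just m | m == mtime -> pure True
    _                   -> fullCompare
```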

For future reference, here's what needs to be done:

 - restore the writing functionality (which you merged in Git.lhs, but
   not in GitRepo.lhs);
 - restore the merge functionality;
 - fix the laziness issues;
 - work out what to do with the ``committer'' and ``author'' duality
   in Git;
 - implement writing to the Git cache in sync_repo;
 - implement writing Darcs mergers as Git merges (?).

> data Slurpy = SlurpDir !FormatSpecificData !FileName [Slurpy]
>             | SlurpFile !FormatSpecificData !FileName FileContents

Hmm...  That's an idea indeed.

Then you'd have

  data Match = DoesMatch | NoMatch | UnknownMatch

  matchFormatSpecificData :: FormatSpecificData -> Match

and that would avoid having explicit knowledge of matching, and allow
backend-specific heuristics to avoid comparing files.
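A sketch of how generic code might use matchFormatSpecificData, assuming a hypothetical FormatSpecificData in which the backend has already recorded a verdict (for example from the Git cache) while slurping. FileContents is simplified to String. Only UnknownMatch triggers a content comparison.

```haskell
type FileName = String
type FileContents = String  -- simplification; not the real darcs type

-- Hypothetical backend data: no information, or a verdict from the Git cache.
data FormatSpecificData = NoData | GitCacheVerdict Bool

data Slurpy = SlurpDir !FormatSpecificData !FileName [Slurpy]
            | SlurpFile !FormatSpecificData !FileName FileContents

data Match = DoesMatch | NoMatch | UnknownMatch
  deriving (Eq, Show)

matchFormatSpecificData :: FormatSpecificData -> Match
matchFormatSpecificData NoData                  = UnknownMatch
matchFormatSpecificData (GitCacheVerdict True)  = DoesMatch
matchFormatSpecificData (GitCacheVerdict False) = NoMatch

-- Generic code: ask the backend first, compare contents only if it doesn't know.
fileMatches :: Slurpy -> FileContents -> Bool
fileMatches (SlurpFile fsd _ contents) working =
  case matchFormatSpecificData fsd of
    DoesMatch    -> True
    NoMatch      -> False
    UnknownMatch -> contents == working
fileMatches (SlurpDir _ _ _) _ = False  -- a directory never matches a file
```

Because the FileContents field is lazy, a DoesMatch or NoMatch verdict means the contents are never forced at all.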

>> By the way, there's the invariant that the children of a dirty tree
>> node are necessarily clean;

> That seems odd.  Why is that? I'd have thought that they'd be necessarily
> dirty, if anything...

Sorry for that.  The children of a clean node are clean.  Or,
equivalently, the parent of a dirty node is dirty.
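The invariant can be checked, and maintained, on a generic tree in which every node carries a dirty flag. The Node type and both functions below are hypothetical illustrations, not darcs-git code. dirtyPath dirties a node together with all of its ancestors, which is what a function like applyToGitSlurpy has to do to keep the invariant.

```haskell
-- Hypothetical tree with a dirty flag on every node.
data Node = Node { dirty :: Bool, name :: String, children :: [Node] }

-- Holds iff no clean node has a dirty child, i.e. the parent of dirty is
-- dirty; applied recursively, all descendants of a clean node are clean.
invariantHolds :: Node -> Bool
invariantHolds n =
  all invariantHolds (children n)
    && (dirty n || not (any dirty (children n)))

-- Dirty the node at the given path (relative to n) and every node on the way.
dirtyPath :: [String] -> Node -> Node
dirtyPath [] n = n { dirty = True }
dirtyPath (p:ps) n = n { dirty = True, children = map step (children n) }
  where
    step c | name c == p = dirtyPath ps c
           | otherwise   = c
```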

                                        Juliusz
