[darcs-users] soc progress 5

Petr Rockai me at mornfall.net
Tue Jun 23 21:46:42 UTC 2009


Hi!

Last week has been more active again. I have mostly finished my 2.3 agenda,
have been thinking about the repository format, and have started doing some
post-2.3 work as well.

In the pre-2.3 department, these have been mostly API cleanups and finishing
bits. There are also a few pieces missing in the puzzle. I wanted to add a magic
word to the start of the index, so we could quickly identify an old (or future)
version and discard it immediately (triggering a full index rebuild). This is
not particularly destructive, since we can easily use a magic word that won't
match any realistic index start (for the current format). The other thing that
missed the 2.3 beta 1 is the endianness conversion for the on-disk format. This
can be done together with the magic word.
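
The magic-word check could look roughly like the sketch below. This is only an
illustration of the idea, not the hashed-storage code: the magic value "HSI1"
and the names indexMagic, IndexStatus and checkIndex are made up for this
example, and the byte-order handling is left out entirely.

```haskell
import qualified Data.ByteString.Char8 as BC

-- Hypothetical magic word; the real value is not decided in this mail.
indexMagic :: BC.ByteString
indexMagic = BC.pack "HSI1"

-- Result of looking at the first bytes of an on-disk index.
data IndexStatus
  = Usable BC.ByteString  -- magic matched; the payload follows it
  | NeedsRebuild          -- old, future or damaged index: discard and rebuild
  deriving (Eq, Show)

-- Accept the index only if it starts with the expected magic word.
checkIndex :: BC.ByteString -> IndexStatus
checkIndex raw
  | indexMagic `BC.isPrefixOf` raw = Usable (BC.drop (BC.length indexMagic) raw)
  | otherwise                      = NeedsRebuild
```

An index written before the magic word existed simply fails the prefix test and
gets rebuilt once, which is the "not particularly destructive" part.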

For repository format, so far I have been thinking about how to efficiently
pack mostly static data (think git object database) on-the-fly (as opposed to
manual, git-gc style approach) while avoiding excessive re-downloading (and
re-packing). So far, the best I could think of is to keep, say, at most
16 objects in completely unpacked form. When this threshold is reached, I take
8 of these files and pack them up into a single indexed object (with these 8
items as sub-objects). Now I can have a fixed-size header, keeping hashes and
offsets (and sizes) of those 8 sub-objects. This would work recursively:
composed objects could be again composed to bigger composed objects. This way,
we would have around 16 files on average representing the whole repository. It
would also be easy to only download the relevant parts of the newly appeared
files from remote repositories: we can grab the header and then the unknown
sub-objects with HTTP range requests. We would also need to map from primitive
object hashes to their current locations (basically all their parent
patches). I'll have to think more about a suitable data structure for this
purpose. Finally, we would still need a gc-style command, since an object
filesystem used by darcs would accumulate unreferenced garbage, especially if
we also used the system for the pristine cache. Moreover, the purely academic
N-ary tree approach would suffer from performance problems, so some
real-world hacks will be needed to make things work out in practice. (But the
tree structure should be useful to show some bounds on the complexity of
particular operations.)
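
To make the packing idea a bit more concrete, here is a small pure model of it.
It is a sketch under several assumptions that are not fixed anywhere above:
hashes are plain strings, the 8 objects chosen for packing are the 8 oldest
ones, a header entry takes 32 + 8 + 8 bytes, and the "hash" of a composed
object is just a readable placeholder. All names (Object, Entry, addObject,
packHeader, missingRanges) are hypothetical.

```haskell
import Data.List (intercalate)

type Hash = String  -- stand-in for a real content hash

data Object
  = Blob Hash Int   -- primitive object: hash and payload size in bytes
  | Pack [Object]   -- composed object holding its sub-objects
  deriving (Eq, Show)

-- One header slot: where a sub-object lives inside the packed file.
data Entry = Entry { entryHash :: Hash, entryOffset :: Int, entrySize :: Int }
  deriving (Eq, Show)

maxLoose, packWidth, headerSize :: Int
maxLoose   = 16                        -- threshold of top-level files
packWidth  = 8                         -- objects combined into one pack
headerSize = packWidth * (32 + 8 + 8)  -- assumed fixed-size header layout

objectHash :: Object -> Hash
objectHash (Blob h _)  = h
objectHash (Pack subs) = "pack(" ++ intercalate "," (map objectHash subs) ++ ")"

objectSize :: Object -> Int
objectSize (Blob _ n)  = n
objectSize (Pack subs) = headerSize + sum (map objectSize subs)

-- Header of a pack: sub-objects are laid out back to back after the header.
packHeader :: [Object] -> [Entry]
packHeader subs = zipWith3 Entry (map objectHash subs) offsets sizes
  where
    sizes   = map objectSize subs
    offsets = scanl (+) headerSize sizes  -- zipWith3 drops the extra last sum

-- Add a top-level file (store is oldest first); on reaching the threshold,
-- fold the oldest packWidth files into one composed object.
addObject :: Object -> [Object] -> [Object]
addObject new store
  | length store' < maxLoose = store'
  | otherwise                = Pack oldest : rest
  where
    store'         = store ++ [new]
    (oldest, rest) = splitAt packWidth store'

-- Inclusive byte ranges (as in an HTTP Range header) of sub-objects whose
-- hashes we do not have locally yet.
missingRanges :: [Hash] -> [Entry] -> [(Int, Int)]
missingRanges known hdr =
  [ (off, off + size - 1) | Entry h off size <- hdr, h `notElem` known, size > 0 ]
```

Since packs are ordinary objects in the store, they get packed again later on,
which gives the recursive composition. Always taking the oldest files makes the
tree very lopsided, though; picking objects of similar size or age is one of
the places where the practical hacks would have to come in.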

Now for the post-2.3 bits. In darcs-hs, I have bitten the bullet and
flipped all unrecorded-state (basically pristine -> working copy diffing)
machinery over to Gorsvet's unrecordedState (implemented using Index and
hashed-storage). This might have introduced some performance regressions,
sadly. However, the thing now completely passes the testsuite (after I fixed a
bug in the mmap package... I have to submit a patch to the upstream
author). Nevertheless, this also means obliteration of a chunk of old code, and
a complete fix for the timestamp de-synchronisation issue of current
darcs. There's still a bunch of work to do, which would allow complete removal
of unsafeDiff and a bunch of related functionality.
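
For readers unfamiliar with the idea, hash-based diffing of pristine against
working copy boils down to something like the following. This is a deliberately
simplified model and not Gorsvet's actual code: trees are flat maps from path
to hash, and the names Tree, Change and diffTrees are invented for the example.

```haskell
import qualified Data.Map as Map

type Hash = String                    -- stand-in for a real content hash
type Tree = Map.Map FilePath Hash     -- flattened tree: path -> content hash

data Change = Added FilePath | Removed FilePath | Modified FilePath
  deriving (Eq, Show)

-- Compare two trees by hash only; file contents are never read here.
diffTrees :: Tree -> Tree -> [Change]
diffTrees pristine working =
     map Removed (Map.keys (Map.difference pristine working))
  ++ map Added   (Map.keys (Map.difference working pristine))
  ++ [ Modified p
     | (p, True) <- Map.toList (Map.intersectionWith (/=) pristine working) ]
```

Only the paths reported by such a comparison need their contents read and
diffed, which is where the hashes cached for the working copy pay off.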

Finally, changes for this week... hashed-storage:

  * Move darcs-specific utilities to separate module (Storage.Hashed.Darcs).
  * Export the TreeIO alias from Monad.
  * Also parametrise the Tree hashing function in readIndex.
  * Replace all unfold terminology with expand (breaks API).
  * Remove unused bit in Index.
  * Fix a silly bug in AnchoredPath parents.
  * Fix compilation of tests.
  * Further simplify AnchoredPath parents.
  * Do not forget to include Storage.Hashed.Test in distribution.
  * Fix AnchoredPath parents *again*.
  * Bump version to 0.3.3.1.
  * Fix build with GHC 6.8.2 (needs extension field in cabal). Bump version.

... and darcs-hs:

  * Basic "show index" implementation.
  * Also curse haskell_policy in Czech.
  * Clean up unused bits in Darcs.Gorsvet.
  * Use TreeIO alias in instance declarations (do not spell out the type).
  * Import darcsFormatHash from Storage.Hashed.Darcs.
  * Update to reflect Index API change, provide darcs-specific readIndex in Gorsvet.
  * Unfold has been renamed to 'expand' in Storage.Hashed.Tree.
  * Also provide "darcs show pristine" to go with darcs show index.
  * Put blank lines between command groups in "darcs help".
  * Cut down descriptions, so that darcs help does not wrap on an 80-column TTY.
  * Make "darcs clone" a hidden alias for "darcs get".
  * Flip "darcs changes" to index-based diffing.
  * Flip "darcs mark-conflicts" over to index-based diffing.
  * Use index-based diffing in Remove.
  * Flip AmendRecord to index-based diffing, too.
  * Use index-based diffing in unrevert.
  * Make revert use index-based diffing.
  * Also use index-based diffing in unrecord/obliterate.
  * Provide readRecorded in Gorsvet as well.
  * Factor out applyToTree in Gorsvet.
  * Use index-based diffing in "darcs wh -l".
  * Unexport get_unrecorded* from Repository, remove unused functions from
    Internal.
  * Move tentativelyMergePatches and friends to a new module, Repository.Merge.
  * Move add_to_pending to Repository, use unrecordedChanges.
  * Clean up unused bits from Repository.Internal.
  * Invalidate the index in add_to_pending, as it was getting rebuilt too soon.
  * Remove unused import from Gorsvet.

And I need to sleep now. I'm in Berlin now, so I'll probably be fairly
unproductive until about Saturday. I'll sort out the 2.3 beta 1 tomorrow, since
I really really need to sleep *now*. Goodnight!

Yours,
   Petr.

-- 
Petr Ročkai | http://web.mornfall.net
A physicist is an atom's way of knowing about atoms. (George Wald)

