[darcs-users] David's darcs

Petr Rockai me at mornfall.net
Wed Jul 22 13:55:32 UTC 2009

Jason Dagit <dagit at codersbase.com> writes:
> Just curious, what is a "hard" merge?  Computationally difficult or is it a
> forceful merge?  If someone could look into why that hangs it would be valuable
> information I suspect.  I bet David would be interested in that bug report, for
> instance.
A hard merge is defined as one that makes darcs hang or crash. ;)

> If you run the ghc profiler on this merge, do you know where we spend all the
> time?
I'll try to get to that later, but I have other things to get to first. :|

>     There are likely some choices:
>     - abandon current darcs core altogether
> Could you clarify how you define 'darcs core' in terms of modules?
Hard call. Basically everything that defines things about patches. This is
unfortunately not very confined within darcs.

>     - try to fix it ourselves

> It sounds like you've looked at this the most recently, do you have any ideas
> on how the fix would go?  get_extra has always been a nasty little bit of
> code.  I've tried to document some of the assumptions I'm aware of in that bit,
> but I've always suspected it to be a little brittle.  It's also crucial to
> optimize that code because it is such a common code path and needs to do some
> real work.
The whole core is a little muddy to me. I don't expect to delve into this
myself, though.

>     - try to merge David's work manually (and hope that he eventually fixes the
>      bugs in the core we care about)
> A manual transplant (aka diff + patch) may be a good idea.
Basically the only reasonable way to get there for now.

> I'm not sure what you're saying :)  You feel the current core is dead; okay
> fair enough.  But, what are you proposing beyond asserting its death?  Are you
> saying someone needs to rewrite it?
Sort of. I am saying that (a) we need to take measures to make replacing the
core feasible in the mid-term and (b) that someone eventually has to rewrite it.

> Rewriting the core comes with a big burden of backwards compatibility.  You
> could bump the repo format number (like we did with darcs2) and we could go
> through the process again with a darcs 3 format.  But, other approaches, like
> rewriting all the core code while calling it darcs-2 format is risky because
> you could end up with different semantics (before and after the rewrite) for
> the same repo format.  We also probably want to maintain the current code for
> quite a while so that we have true backwards compatibility with previous repo
> formats.  Again, like the darcs-1 format to darcs-2 format transition.
This will absolutely need a new, incompatible format. Some of the bugs are
likely hardcoded in the "patch" format (as opposed to repo format). Things like
file removals conflicting with edits, or hunks depending on adds (while adds
can conflict). Both are sources of problems that are nearly intractable for
current darcs. We may as well learn from those, since these have trivial

(That is, hunks operating on abstract files that are never added or
removed. Instead, the abstract files are added or removed from the "working"
set by a separate patch type, say "incarnate <some-uuid-thingy>
<path>". Conflicts on this level are very simple, and all other conflicts are
confined within a given abstract file. It may involve some trickiness on the UI
level, but the core part is trivial and fixes a whole lot of existing "core"
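To make the abstract-file idea a bit more concrete, here is a minimal sketch in
Haskell. Everything in it is hypothetical and not existing darcs or Camp code:
the names FileId, Patch, Repo and apply are made up, and the Disincarnate
constructor is my assumption for the "removed from the working set" direction.
The point it tries to show is that a hunk only needs the abstract file, so it
never depends on (or conflicts with) the patch that gives the file a path.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical names throughout; this is an illustration, not darcs API.
newtype FileId = FileId String deriving (Eq, Ord, Show)

data Patch
  = Incarnate FileId FilePath          -- put an abstract file into the working set
  | Disincarnate FileId                -- assumed inverse: take it out again
  | Hunk FileId Int [String] [String]  -- 0-based line offset, old lines, new lines
  deriving (Eq, Show)

data Repo = Repo
  { contents :: M.Map FileId [String]  -- abstract files; a missing key means empty
  , working  :: M.Map FileId FilePath  -- which abstract files are visible, and where
  } deriving (Eq, Show)

emptyRepo :: Repo
emptyRepo = Repo M.empty M.empty

-- Nothing signals a conflict (the patch does not apply in this context).
apply :: Patch -> Repo -> Maybe Repo
apply (Incarnate f p) r
  | M.member f (working r)       = Nothing  -- file is already in the working set
  | p `elem` M.elems (working r) = Nothing  -- path is already taken
  | otherwise                    = Just r { working = M.insert f p (working r) }
apply (Disincarnate f) r
  | M.member f (working r) = Just r { working = M.delete f (working r) }
  | otherwise              = Nothing
apply (Hunk f off old new) r =
  let ls              = M.findWithDefault [] f (contents r)
      (before, rest)  = splitAt off ls
      (found, after)  = splitAt (length old) rest
  in if length before == off && found == old
       then Just r { contents = M.insert f (before ++ new ++ after) (contents r) }
       else Nothing
```

In this sketch, applying a Hunk and an Incarnate for the same FileId in either
order gives the same Repo, and the only conflicts at the Incarnate level are
"file already incarnated" and "path already taken".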

Also, we probably want Camp to provide us with new core bits, at least for the
things that happen inside these abstract files -- commuting and merging
hunks. I have no idea what happens to the megapatches (changesets) of Camp and
how we map things to darcs. I don't know what the camp-core API will look
like. Most of this is an open question.

Now of course, there is the question of backwards compatibility, that is
basically (a) from above. This is not only about the user level, but also about
code. So to preserve both user-level compatibility and developer-friendly
hackability, I would propose some changes for the 2.4 and maybe 2.5 horizon:

We already have the "format" file, and we decide certain things upon this
format. What I would like to see is to push the format distinction to a much
higher level, to the point that we can have separate command implementations
depending on the format in force. This will make it possible to build up a new
core besides the existing one, without having to do a lot of extra
work. Basically this just needs a refactor of the UI layer, and probably not
even a hard one.
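A rough sketch of what format-level dispatch could look like, in Haskell. All
names here (RepoFormat, detectFormat, Command, commandsFor, dispatch) are
hypothetical, the "new-core" format property is invented for illustration, and
the command bodies are pure stand-ins for what would really be IO actions.

```haskell
-- Hypothetical sketch; none of these names exist in darcs.
data RepoFormat = Legacy | NewCore deriving (Eq, Show)

-- Decide the format from the contents of the format file.
-- "new-core" is a made-up property line, assumed for this example.
detectFormat :: String -> RepoFormat
detectFormat formatFile
  | "new-core" `elem` lines formatFile = NewCore
  | otherwise                          = Legacy

data Command = Command
  { cmdName :: String
  , cmdRun  :: [String] -> String  -- stand-in for the real IO action
  }

-- Each format gets its own, fully separate set of command implementations.
commandsFor :: RepoFormat -> [Command]
commandsFor Legacy  = [Command "record" (const "legacy record")]
commandsFor NewCore = [Command "record" (const "new-core record")]

-- The UI layer picks the command table first, then the command.
dispatch :: String -> String -> [String] -> Maybe String
dispatch formatFile name args =
  case filter ((== name) . cmdName) (commandsFor (detectFormat formatFile)) of
    (c:_) -> Just (cmdRun c args)
    []    -> Nothing
```

The design choice is that the format check happens exactly once, at the top,
so the two implementations do not need to share anything below the UI layer.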

When we have this, we can start building a new, more bottom-up library, and
also use it in commands for new repository formats, without compromising
backwards compatibility or stability. To get the hackability, we could move
everything into Darcs.Legacy and start with a clean Darcs.* namespace (the
alternative is to keep Darcs. as it is and come up with a new namespace). We
would then probably move and refactor parts of the Legacy, moving them up in
the hierarchy to new Darcs. modules (out of Legacy). We can have certain unit
testing coverage requirements for such modules. We can of course hack on things
in Legacy to improve them or fix bugs or such -- nothing is frozen just because
of this move.

To back the process a little with actual experience, we have done something
very similar in a different (C++) project. We basically decided that a complete
rewrite would probably be quite useful, but quite expensive. So we stashed away
all existing code and re-used it as we could, sometimes wrapping things,
sometimes refactoring things and moving them out of legacy, sometimes replacing
them with new implementations.

This will be a little more challenging in darcs, since for the other project,
we could afford to throw away the UI, since it was mostly trivial, and write it
from scratch, so it was not part of the legacy code library. But I assert this
would definitely be possible.

> I like the idea of the core getting attention and cleanup.  In particular, I
> would love to see what impact there is to switching to a left fold based io
> system (the Oleg iteratee stuff).  It's a massive rewrite as near as I can
> tell, but I think it would give us more explicit control over the algorithms
> and allow us to fine tune our performance better.  My intuition is that we
> could apply the iteratee approach to both streaming data from the harddrive/
> network (for parsing) and also to the task of streaming patches from the
> repository (after parsing we generate patch sequences in an iteratee fashion).

Well, left fold is a great abstraction, but it has some issues when it comes to
darcs... That is, folding is nice for streaming, but many darcs operations
require random access -- this is not something I expect to translate well into
an iteratee-like approach. Basically, the only operations that would really take
advantage of this would be those that work from the top of the patch stack
linearly downwards: applying and unapplying patches, and even then only for the
patch input -- the working copy needs to stay random-access.
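To illustrate the split, here is a small Haskell sketch with a plain strict
left fold standing in for an iteratee. The Change type and the Map-based
working copy are hypothetical simplifications, not darcs data structures. The
patch input is consumed strictly in order, one element at a time, while each
step looks up the working copy by path.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as M

-- Hypothetical, simplified stand-ins for patches and the working copy.
type WorkingCopy = M.Map FilePath [String]

data Change
  = Put FilePath [String]  -- create or overwrite a file
  | Delete FilePath        -- remove a file
  deriving (Eq, Show)

-- One fold step: random access into the working copy, keyed by path.
step :: WorkingCopy -> Change -> WorkingCopy
step wc (Put p ls) = M.insert p ls wc
step wc (Delete p) = M.delete p wc

-- The change stream itself is only ever walked linearly.
applyAll :: WorkingCopy -> [Change] -> WorkingCopy
applyAll = foldl' step
```

Only the list argument fits the streaming model; the accumulator has to
support lookups at arbitrary paths, which is the part that stays random-access.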

