[darcs-devel] bit more of a summary of new conflict-handling ideas

David Roundy droundy at darcs.net
Wed Jul 12 05:37:15 PDT 2006


Hi all,

(This is a rather lengthy email, and not terribly well-organized, as I
haven't time to go back and rewrite the beginning of the email.  At
the end are some concrete things that I think need doing.  If anyone
wants to volunteer on any of these, that would be great.  I think
there is now some work that we can do that'll make the coding of darcs
2.0 much easier.  And concrete discussion of concrete issues is
welcome.  I'd like to move more of the discussion of this stuff onto
the mailing list rather than IRC, so more people will be involved, and
so there'll be a bit better of a record.)

More discussion's been going on over IRC.  Simon Marlow (JaffaCake)
asked why we need a real tree, what if there were just one branch
point.  We didn't have much of an answer, and are thinking this may
be the best (and simplest) solution.

One issue is that this solution doesn't handle "dead ends" at all.
But since a dead end is just an already-resolved conflict, we don't
want heavy machinery for it, if at all possible.  One solution would
simply be to store the contents of all resolved conflicts in the
resolution patch itself.  I lean towards this solution.

If we go with this, it'll mean that the shape of a repository is as
follows.  An unconflicted repository will be precisely the shape of
our current repositories, which is to say a sequence of patches.
It'll still be a bit more tricky, because some of those patches will
be hiding other patches, which will need to reappear if the resolution
is unpulled, or if a patch that depends upon them is pulled into the
repo.  But storage-wise it's dead simple.
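
As a rough illustration of the shape described above, here is a minimal
Haskell sketch.  It assumes the "store resolved conflicts inside the
resolution patch" option, and it reduces patches to bare names.  The
names Entry, Repo, resolve and unpullResolution are made up for this
note and are not darcs's actual types.  The case of pulling a patch
that depends on a hidden patch is not modelled.

```haskell
-- Hypothetical model only; none of these names exist in darcs.
type Name = String

data Entry
  = Plain Name                -- an ordinary patch
  | Resolution Name [[Name]]  -- a resolution carrying the sequences it hides
  deriving (Eq, Show)

data Repo = Repo
  { mainline  :: [Entry]   -- the usual linear sequence of patches
  , conflicts :: [[Name]]  -- conflicting sequences; empty when unconflicted
  } deriving (Eq, Show)

-- Recording a resolution moves the conflicting sequences inside it.
resolve :: Name -> Repo -> Repo
resolve n (Repo ml cs) = Repo (ml ++ [Resolution n cs]) []

-- Unpulling a resolution makes the hidden sequences reappear as conflicts.
unpullResolution :: Name -> Repo -> Repo
unpullResolution n (Repo ml cs) =
  Repo (filter (not . isTarget) ml) (cs ++ reappearing)
  where
    isTarget (Resolution m _) = m == n
    isTarget _                = False
    reappearing = concat [hidden | Resolution m hidden <- ml, m == n]
```

In this model an unconflicted repository is just the mainline list, so
storage stays as simple as it is today.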

When there's a conflict, the repository will have, besides the
aforementioned sequence of patches, a set of conflicting patch
sequences.  My leaning (although it's only one of several options) is
to restrict these patch sequences to be "minimal depth".  That is to
say: (a) Each non-terminal patch in a conflicting sequence must be
depended upon by the terminal patch in that sequence, and (b)
Every terminal patch must not be depended upon by any other patch in
the repository.  This certainly isn't the only route we could take,
but it seems to me like the simplest.
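
To make conditions (a) and (b) concrete, here is a small Haskell
sketch of a "minimal depth" check.  It assumes patches are identified
by a plain string and that a dependency relation is handed in by the
caller; PatchId, DependsOn and the function names are invented for
illustration only.

```haskell
-- Hypothetical names; real darcs patches are much richer than a String.
type PatchId = String

-- dependsOn p q is True when patch p depends on patch q.
-- The relation is assumed to be supplied (and already transitive).
type DependsOn = PatchId -> PatchId -> Bool

-- (a) Every non-terminal patch is depended upon by the terminal patch.
conditionA :: DependsOn -> [PatchId] -> Bool
conditionA _ [] = True
conditionA dependsOn ps = all (dependsOn (last ps)) (init ps)

-- (b) No other patch in the repository depends on the terminal patch.
conditionB :: DependsOn -> [PatchId] -> [PatchId] -> Bool
conditionB _ _ [] = True
conditionB dependsOn allPatches ps =
  not (any (\p -> p /= t && p `dependsOn` t) allPatches)
  where
    t = last ps

-- A repository is "minimal depth" if every conflicting sequence
-- satisfies both conditions.
isMinimalDepth :: DependsOn -> [PatchId] -> [[PatchId]] -> Bool
isMinimalDepth dependsOn mainline conflicting =
  all ok conflicting
  where
    allPatches = mainline ++ concat conflicting
    ok s = conditionA dependsOn s && conditionB dependsOn allPatches s
```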

We could consider disallowing pulling from conflicted repositories, in
which case the "minimal depth" choice wouldn't need to be set in
stone.  (This option--requiring resolutions--is particularly appealing
if we go with the crazy idea below.)

In the new scheme, I imagine an interactive "darcs resolve" which
prompts the user with alternative conflicting possibilities, and
allows the user to choose between them.

Implementation issues:
=====================

1. Do we store "dead" patches within the resolution patch itself? I
think Arjan and I lean towards this idea.

2. We're going to have to be doing considerably more rearranging and
modifying of patches than in current darcs, in which a patch file is
untouched after it's created in a repository.  This is dangerous if
someone is getting while that is happening (since we have no read
locking).  I think the best solution to this is to move to the "hashed
inventory" idea we've discussed before, which allows an in-place
update while gets and pulls are going on, provided we don't delete the
old files.  Or if we do delete them, the worst that happens is that
the gets or pulls might end up failing due to inability to read the
repo.  This is a feature that can be added before we do the new
conflict handling code, or in parallel.  And it's a good thing to have
in either case.
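
For item 2, here is a toy Haskell model of why a hashed inventory
tolerates concurrent readers.  Files are named by a hash of their
contents, so rewriting a patch adds a new file instead of overwriting
an old one.  Everything here is an assumption made for illustration:
the store is an in-memory list, and the hash is a small FNV-1a-style
stand-in, not whatever hash darcs would actually use.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl')
import Data.Word (Word64)
import Numeric (showHex)

-- Toy stand-in hash (FNV-1a style over characters); a real
-- implementation would want a cryptographic hash.
hashContents :: String -> String
hashContents = flip showHex "" . foldl' step 14695981039346656037
  where
    step :: Word64 -> Char -> Word64
    step h c = (h `xor` fromIntegral (ord c)) * 1099511628211

type Store     = [(String, String)]  -- file name (the hash) and contents
type Inventory = [String]            -- ordered list of hashes

-- Storing a patch only ever adds a file; existing files are untouched.
storePatch :: String -> Store -> (String, Store)
storePatch contents store
  | any ((== h) . fst) store = (h, store)
  | otherwise                = (h, (h, contents) : store)
  where
    h = hashContents contents

-- A reader resolves every hash in the inventory it started with.
-- Nothing means a file was deleted underneath it, so the get fails.
readInventory :: Store -> Inventory -> Maybe [String]
readInventory store = mapM (`lookup` store)
```

A reader that fetched the old inventory keeps seeing a consistent
repository while a writer publishes a new inventory, as long as the old
files are kept around.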

3. With the RepoFormat framework, we can plan ahead so that older
versions of darcs will be able to interact with darcs 2.0 (the new
conflict-handling version) as long as there aren't conflicts involved.
The idea is that we'll define a format feature "no-new-mergers", and a
repo that has that feature (which will restrict writes only, not
reads) will be writeable only by darcs 1.0.9 or later (or whenever we
do this), and darcs won't be able to deal with conflicts in that repo.
But it'll be writeable by both darcs 1.0.9 and darcs 2.0 (as long as
there are no conflicts), and will be readable by even older versions
of darcs.  So we can implement this now, and benefit later by allowing
a certain amount of interaction between the new and old versions of
darcs, which will probably be crucial in allowing a sane transition
plan for our users.
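
For item 3, the compatibility rule can be sketched in a few lines of
Haskell.  This is only a model of the idea, not the RepoFormat code:
Client, canRead, canWrite and mayRecordConflict are hypothetical names,
and a format is reduced to a list of write-restricting feature strings.

```haskell
-- Hypothetical model of format features; not the actual RepoFormat API.
type Feature = String

data Client = Client
  { clientName    :: String
  , knownFeatures :: [Feature]  -- features this version understands
  }

-- "no-new-mergers" restricts writes only, so any client may read.
canRead :: Client -> [Feature] -> Bool
canRead _ _ = True

-- A client may write only if it understands every feature of the repo.
canWrite :: Client -> [Feature] -> Bool
canWrite client = all (`elem` knownFeatures client)

-- A repo carrying "no-new-mergers" must never receive a conflict.
mayRecordConflict :: [Feature] -> Bool
mayRecordConflict repoFeatures = "no-new-mergers" `notElem` repoFeatures

-- Example clients, following the version numbers mentioned above.
olderDarcs, darcs109, darcs20 :: Client
olderDarcs = Client "older darcs" []
darcs109   = Client "darcs 1.0.9" ["no-new-mergers"]
darcs20    = Client "darcs 2.0"   ["no-new-mergers"]
```

With a repo whose features are ["no-new-mergers"], olderDarcs can read
but not write, while darcs109 and darcs20 can both write as long as
they never record a conflict there.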

4. Disallow pulling from (or to?) conflicted repos? If we do this,
much of darcs' logic will remain the same.  We can certainly start
with this restriction, and implement pulling from conflicted repos
later.

5. (The crazy idea) The new scheme is going to have to treat primitive
patches as the "first-class" objects, rather than named composite
patches as is currently the case.  When I refer above to sequences of
patches, I now mean sequences of *primitive* patches, quite different
from the situation in current darcs.  This is going to require quite a
shift in the code, particularly in how we deal with patch names.  An
idea I floated before was to move composite patches up to the UI
level, eliminating them entirely at the fundamental level of the darcs
core code, with each primitive patch having a unique name.
   I've now got a new related idea, which has a very strong appeal.
How about we make the "name" (patch id, or PatchInfo) of a patch no
longer be part of its identity, but instead be a sort of tag that's
attached to it?  So that a given primitive patch could now have more
than one name.  This would give us "for free" the feature that patches
that are identical except in name do not conflict.  Primitive patches
could now be members of more than one "named" patch.  It'll require a
restructuring, but that restructuring is required already by the whole
new approach to conflicts.
   What would this mean? I'm not quite sure.  We'd need to attach some
sort of set of names to each primitive patch.  It also means that we
wouldn't necessarily need to attach a number to each primitive patch to
give it a separate unique identifier.  We still have open the danger
of accidentally pulling just part of a named patch and not realizing
it, since patches will be split up.  One solution to that would be to
include in a patch name the number of primitive patches included, and
then count to make sure we got them all.  Which would also need to be
included if we numbered all the primitive patches.
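
For item 5, here is a small Haskell sketch of names as tags on
primitive patches, including the "count to make sure we got them all"
check.  The types PatchName and Prim and both functions are invented
for this note; a primitive patch is reduced to a string body.

```haskell
import Data.List (nub)

-- Hypothetical types; the name carries the expected number of prims.
data PatchName = PatchName
  { label     :: String
  , primCount :: Int  -- how many primitive patches the named patch has
  } deriving (Eq, Show)

data Prim = Prim
  { primBody  :: String       -- what the primitive patch does
  , primNames :: [PatchName]  -- every named patch it belongs to
  } deriving (Eq, Show)

-- Two prims that are identical except in name do not conflict:
-- they are the same prim, and their name sets are combined.
mergePrim :: Prim -> Prim -> Maybe Prim
mergePrim a b
  | primBody a == primBody b =
      Just (Prim (primBody a) (nub (primNames a ++ primNames b)))
  | otherwise = Nothing

-- Guard against pulling only part of a named patch: count the prims
-- tagged with the name and compare with the count stored in the name.
isComplete :: PatchName -> [Prim] -> Bool
isComplete n prims =
  length (filter ((n `elem`) . primNames) prims) == primCount n
```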

6. Question: How will we store the inventory with these primitive
patches going every which way? Perhaps we don't need to change
anything? Perhaps we can still store each named patch in a single
file, with an annotation that the patch is missing something.  In
fact, perhaps we can arrange commutation with "resolutions" to handle
everything for us, so we don't need to do anything with the on-disk
format.
-- 
David Roundy



