[darcs-users] Defining patch types modularly

Marnix Klooster mklooster at baan.nl
Sat Mar 27 15:04:39 UTC 2004

Hello all,

As promised here a long time ago, here is an idea that I have for
making darcs patch types 'modular', so that they don't refer to other
patch types.  Let me know what you think.

Terminology as in


The first observation that I make is that any patch as it currently
exists in darcs can conceptually be separated in the following parts:
 - change GUID, description, and more administrative details
 - pointer to the previous patch in the repository, and thereby
   indirectly to the entire context of the patch
 - patch procedure: type (e.g., replace) and arguments (e.g.,
   original, replacement)
 - patch scope: the part of the tree that is potentially affected by
   the patch

 - diff: like a 'diff' output, but also capable of describing adding,
   renaming, and deleting files and directories.

The next observation is that the diff is theoretically redundant:
given a repository of patches, we can sequentially compute the diff of
all patches.  But in practice we don't want to do that for reasons of
efficiency.  And it turns out that we can use this diff effectively
for a number of things.


There are two ways in which we can use a diff.  Usually we apply it to
a specific tree, resulting in another tree.  But we can also use the
'position info' in the diff to 'move' something else.  For example, if
a patch applies only to file foo (so its scope is foo), and we have a
diff that renames foo to bar, then we can move 'the scope foo' by 'the
diff that renames foo to bar', resulting in 'the scope bar'.

In the following I will call this 'move X by Y', which is short for
'that which results when position info in X is moved using diff Y'.

What seems to me to be the fundamental operation that can be applied
to a patch, is 'move the patch by a given diff D'.  With this
operation we can always commute patches P and Q (P applied first, then
Q) into Q' and P' as follows:

 - move Q by the inverse of the diff of P, resulting in Q'
 - move P by the diff of Q', resulting in P'.

Both of these steps are performed as described below.  If any of these
moves fails , then the commute fails.


How can such a 'move patch by diff' be computed? We need to compute
the (1) procedure, (2) scope, and (3) diff of the new patch.

(1) The new procedure has to be computed using a computation that is
patch type specific: 'move patch procedure by a given diff'. This move
can fail (for example if the diff removes a place in the tree that is
mentioned in the procedure). (2) Moving the old scope by the given
diff D is a generic computation.  This move can also fail (for example
if the scope is the contents of a file, and the diff removes that file

With this information, we now have a new (moved) patch.  (3) Now the
most complex thing is computing the new diff.  That's what most of the
rest of this mail is about.

To sum up the problem that we have to solve now: we are given a patch,
complete with its context, but with only the diff missing.  To compute
the diff, we have several options.  (3c) It is of course possible to
use the brute-force attach: compute the tree (or even only the scope-
part of the tree) that is given by the context, and call the patch
procedure to compute the diff.  But often we can do better, because we
also have the old diff, and the diff D by which the patch is moved.

(3a) First we can check whether the diff D has any influence on the
(new) scope of the patch.  If this is not so, then we only have to
move the old diff by D.  (3b) But if D influences the scope, then we
have a more complex situation, where we cannot move the old diff, but
have to re-compute the new diff from scratch.

As an example, suppose that the patch procedure is a search-and-
replace, and the scope is the contents of file foo.  If the diff D
then removes a line from foo that contains the search expression, then
the new diff should not replace that specific occurrence anymore.

So what do we do?  In certain cases (such as for the above search-and-
replace examples), it is possible for patch type specific code to look
at the contents of diff D, and compute the new diff for the new patch
from it.  And only if that fails, we have to recompute the patch using
the brute-force attack described above.


To summarize: for each patch type we need to have implementations for
the following:
 - Move a patch procedure by a diff (this can fail).
 - Given a patch's procedure and scope, the old diff and the move-by
   diff, compute the new diff (this can fail);
 - Given a patch's procedure and scope, and a tree, compute the diff
   for the patch.  (This is used for the initial computation of a
   patch's diff, and for the brute-force recomputation after a move.)

All other computations are independent of any specific patch type.


I hope this was a little clear; feel free to ask questions, because
they will help both your and my understanding.  My apologies for
keeping this a fairly theoretical story; I don't have the time right
now to expand this with examples.

I'm currently developing a bit of proof-of-concept Haskell code to
validate the ideas outlined above, so that I can test it on various
examples, and all kinds of test types.  If I get around to putting
that out, it will contain working examples.

[In the above I focused on commutes.  I have not taken merges into
account, because I don't yet understand enough of how darcs does

What do you think?

Marnix Klooster
mklooster at baan.nl

More information about the darcs-users mailing list