[darcs-users] The theory of patches

Stephen J. Turnbull stephen at xemacs.org
Fri Jun 3 09:01:44 UTC 2005


It's very nice to see somebody applying categoric ideas (like
groupoids) to patch theory!

I've thought about applying category theory to the theory of patches a
little.  As yet I'm not very expert in either but perhaps the
following comments will be of use.

>>>>> "Antonio" == Antonio Regidor García <a_regidor at yahoo.es> writes:

    Antonio> A patch is a change to a directory tree. For example, the
    Antonio> patches "rename the file (with path) X to Y", "delete
    Antonio> file Z", "replace token C in each file of the tree by
    Antonio> token D", "replace lines 13 to 21 of file X by the text W
    Antonio> (consisting of Y lines)", or more complex patches that
    Antonio> can be defined combining these ones.

Substitute "tarfile" (or "MIME message with attachments" or "darcs
patch against the null tree"<wink>) for "directory tree".  Now all of
those patches can be expressed as hunk replacements (with tar meta
data considered to implicitly define lines), except for the token
replace patch.

    Antonio> Mathematically, a patch is a function that transforms a
    Antonio> directory tree to other directory tree. It is defined for
    Antonio> every tree--for example, a patch that tries to delete a
    Antonio> non-existing file does nothing (or prompts an error
    Antonio> message, implemented using arrows or monads).

Mathematically, this makes life simpler, of course.  But one way to
look at the theory of patches is that the whole point is that erroring
is unacceptable.  _Every patch must always succeed._  However, the
product of the patch may be something that is not a valid tree in some
sense, but nonetheless must be deterministic and "useful."
(Alternatively, you could say that such a patch can't be applied
without user interaction to resolve some indeterminism.)  Eg, a simple
but not so useful<wink> case is CVS's familiar ">>>>", "====", "<<<<"
merge markers.  Furthermore, it really should be invertible, which a
constant substitution result (such as "substitute the identity") will
not give you---it will mean that undo needs to know the context of the
failed patch as well as the patch itself.

    Antonio> It is easy to check that concrete patches form a
    Antonio> groupoid. We will denote it by "Con".

If I take your notation seriously, so that really a concrete patch is
simply a pair of trees, you can say a lot more.  Eg, for any concrete
patch (S,T), the inverse is (T,S), no?  As far as I can tell, however,
Con has no "useful" structure.  Everything is possible, all trees are
both initial and terminal, and every tree is isomorphic to every other
tree.

    Antonio> The other solution is to restrict the domain and codomain
    Antonio> of a patch (as a function) to two sets where it is
    Antonio> invertible. For example, we can require that the patch
    Antonio> "replace token C by D in every file" be applied only to
    Antonio> trees not containing the token D in any file.

I don't think it's appropriate in general.  Token replace is a good
example.  Suppose that you've discovered that as far as users are
concerned, ERR_A and ERR_B are semantically indistinguishable (ie,
they indicate the same kind of bug, but the context is a little
different), but ERR_B is rarely emitted in practice (even though it
may occur in many places in the code).  Then a hunk deletion of the
place where ERR_B is mapped to an actual message plus
token_replace(ERR_B,ERR_A) is a very plausible description of the
desired patch.

I suspect this means that a more general theory of patches will have
to give up on universal invertibility.  It may be possible to do
something useful with left and right inverses, even though they need
not be unique.

    Antonio> We say that two concrete patches (R, S) and (S, T)
    Antonio> commute if there are two general patches P representing
    Antonio> (R, S) and Q representing (S, T) that commute for R.

All composable concrete patches commute by that definition.  Take P =
{(R,S) (T,T)} and Q = {(S,T),(R,T)}.  (This kind of triviality is what
I meant above by "Con has no useful structure.")

You need to have some kind of regularity condition on allowable
general patches, and it has to be stronger than "generated by
composition of basic patches" because you can always achieve arbitrary
concrete (S,T) with a hunk patch, which I don't think we can disallow.
darcs patch theorists express this condition intuitively by saying
that "two patches have the same effect".  On the darcs-conflicts
mailing list, there has been some progress in defining "have the same
effect" in algebraic terms, but I haven't had time to follow the
actual math closely.


-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.




More information about the darcs-users mailing list