[darcs-users] The theory of patches

Antonio Regidor García a_regidor at yahoo.es
Sat Jun 4 09:48:42 UTC 2005


 --- "Stephen J. Turnbull" <stephen at xemacs.org> escribió: 
> It's very nice to see somebody applying categoric ideas (like
> groupoids) to patch theory!
> 
> I've thought about applying category theory to the theory of patches a
> little.  As yet I'm not very expert in either but perhaps the
> following comments will be of use.

Thanks, let us try to make an useful and soundless theory! I also considered to use category
theory, but soon decided to pass to the simpler framework of groupoids. The arrows of a category
form a groupoid (well, if we consider only isomorphisms), but I didn't find an use for functors
and natural transformations in darcs, so I used only what is apparently needed.

Now, thinking again about it, I realized that the operation of finding the common ancestor tree of
two parallel patches can be defined as a functor: The set Gen of general patches can be given a
category structure in the obvious way. We can consider the cartesian product Gen x Gen and give it
the product category structure (that is, (P,Q) (P',Q') = (P P', Q Q') and the identity is (id,
id)). Now, let P and Q be two patches so that P = A1 ... An B1 ... Bm and Q = A1 ... An C1 ... Ck
(the order of composition is the inverse of the usual, that is, left first) with Bi and Ci
different for any i, the patches that form P commute and also the patches that form Q commute.
Then, if we define the common ancestor tree of P and Q as the patch A1 ... An, this is a functor
from a subcategory of Gen x Gen to itself. Anyway, maybe functors are not the more natural way to
describe this operation and we need something more general.

>     Antonio> A patch is a change to a directory tree. For example, the
>     Antonio> patches "rename the file (with path) X to Y", "delete
>     Antonio> file Z", "replace token C in each file of the tree by
>     Antonio> token D", "replace lines 13 to 21 of file X by the text W
>     Antonio> (consisting of Y lines)", or more complex patches that
>     Antonio> can be defined combining these ones.
> 
> Substitute "tarfile" (or "MIME message with attachments" or "darcs
> patch against the null tree"<wink>) for "directory tree".  Now all of
> those patches can be expressed as hunk replacements (with tar meta
> data considered to implicitly define lines), except for the token
> replace patch.

No, the directory tree is a repository, and patches are changes to this repository. How patches
arrive to the repository (as tarfiles, messages, ...) is irrelevant for the theory I described. If
you describe all patches as hunk patches you lose information. A general hunk patch and a general
token replace patch can generate the same concrete patch, but if you choose the hunk patch instead
of the token replace patch, the properties the patch satisfies are weaker. 

>     Antonio> Mathematically, a patch is a function that transforms a
>     Antonio> directory tree to other directory tree. It is defined for
>     Antonio> every tree--for example, a patch that tries to delete a
>     Antonio> non-existing file does nothing (or prompts an error
>     Antonio> message, implemented using arrows or monads).
> 
> Mathematically, this makes life simpler, of course.  But one way to
> look at the theory of patches is that the whole point is that erroring
> is unacceptable.  _Every patch must always succeed._  However, the
> product of the patch may be something that is not a valid tree in some
> sense, but nonetheless must be deterministic and "useful."
> (Alternatively, you could say that such a patch can't be applied
> without user interaction to resolve some indeterminism.)  Eg, a simple
> but not so useful<wink> case is CVS's familiar ">>>>", "====", "<<<<"
> merge markers.  Furthermore, it really should be invertible, which a
> constant substitution result (such as "substitute the identity") will
> not give you---it will mean that undo needs to know the context of the
> failed patch as well as the patch itself.

I think that a merger should not be considered a patch but an operation that given two patches and
a tree constructs other patch (that maybe produces a non-compilable-non-usefull-wathever tree). In
the previous description of the theory, the operation and its result are confused (if I understood
it well).

If concrete patches are always succesful, this makes things more regular. For example, suppose
that P is a token replace patch and Q is a patch that commutes with P. Suppose someone applies the
patch P Q to the repository. Later, other person, not knowing of this (Q could be a series of
patches, submited during a long period of time by several developers), tries to apply P again.
Darcs can show an error message "there is no tokens to be replaced" and the patch fails, but it
would be nicer if darcs transforms P Q P to Q P P (commutativity) and this to Q P (two equal token
replace patches cancel) and show a warning explaining that the patch P was applied some time ago
and thus this new P does nothing.

Commutativity, merging, etc. any transformation of patches can fail, but I don't see the need for
patches that can fail. Also, it depends on what do you consider is "to fail". If to show a warning
is to fail, yes, patches can fail.

Also, it would be useful that darcs can say when a tree is
useful-clean-withoutConflicts-compilable-wathever. It can do this? (I'm still learning darcs) 

> If I take your notation seriously, so that really a concrete patch is
> simply a pair of trees, you can say a lot more.  Eg, for any concrete
> patch (S,T), the inverse is (T,S), no?  As far as I can tell, however,
> Con has no "useful" structure.  Everything is possible, all trees are
> both initial and terminal, and every tree is isomorphic to every other
> tree.

Maybe we are talking about different things here. There are two definitions of groupoid, I use the
definition explained here http://en.wikipedia.org/wiki/Groupoid The other definition can't be
applied in this setting.

Both concrete and general patches are needed, I think. Concrete patches are needed because the
merging algorithm uses trees (e.g., it computes the common initial tree of two patches), and if
you are dealing with a general patch and a tree, you are really dealing with a concrete patch,
they are isomorphic concepts. On the other hand, general patches are need because when commuting,
concrete patches are not preserved, only general patches. The pair of trees changes but the
description ("replace token X for Y") does not change. This "thing that does not change" is a
general patch. They can be defined as functions (or programs that modify a directory tree). But
when trying to commute a hunk patch and a token replace patch, the general hunk patch can change,
so maybe we need a more general definition for general patches, or get rid of commutativity and
consider transformations of lists of patches, commutativity being only one type of such
transformations. These transformations are not functors. What is needed? I don't know.

>     Antonio> The other solution is to restrict the domain and codomain
>     Antonio> of a patch (as a function) to two sets where it is
>     Antonio> invertible. For example, we can require that the patch
>     Antonio> "replace token C by D in every file" be applied only to
>     Antonio> trees not containing the token D in any file.
> 
> I don't think it's appropriate in general.  Token replace is a good
> example.  Suppose that you've discovered that as far as users are
> concerned, ERR_A and ERR_B are semantically indistinguishable (ie,
> they indicate the same kind of bug, but the context is a little
> different), but ERR_B is rarely emitted in practice (even though it
> may occur in many places in the code).  Then a hunk deletion of the
> place where ERR_B is mapped to an actual message plus
> token_replace(ERR_B,ERR_A) is a very plausible description of the
> desired patch.

I don't understand this. Why do you need a hunk replace?

> I suspect this means that a more general theory of patches will have
> to give up on universal invertibility.  It may be possible to do
> something useful with left and right inverses, even though they need
> not be unique.

Non invertibility probably is needed if you use patches that can fail.

>     Antonio> We say that two concrete patches (R, S) and (S, T)
>     Antonio> commute if there are two general patches P representing
>     Antonio> (R, S) and Q representing (S, T) that commute for R.
> 
> All composable concrete patches commute by that definition.  Take P =
> {(R,S) (T,T)} and Q = {(S,T),(R,T)}.  (This kind of triviality is what
> I meant above by "Con has no useful structure.")

Q is not invertible, so it is not a general patch.

> You need to have some kind of regularity condition on allowable
> general patches, and it has to be stronger than "generated by
> composition of basic patches" because you can always achieve arbitrary
> concrete (S,T) with a hunk patch, which I don't think we can disallow.
> darcs patch theorists express this condition intuitively by saying
> that "two patches have the same effect".  On the darcs-conflicts
> mailing list, there has been some progress in defining "have the same
> effect" in algebraic terms, but I haven't had time to follow the
> actual math closely.

Thanks, I will look onto this list. Can you give a concrete example where the weakness of the
definition of general patches is a problem? The definition maybe is weaker than needed, but
apparently this is not a problem. 

Best regards,

Antonio Regidor García

Antonio Regidor García

---------------------------------------------
Enciclopedia Libre: http://enciclopedia.us.es


		
______________________________________________ 
Renovamos el Correo Yahoo! 
Nuevos servicios, más seguridad 
http://correo.yahoo.es




More information about the darcs-users mailing list