[darcs-users] The theory of patches

Sat Jun 4 14:29:38 UTC 2005

>>>>> "Antonio" == Antonio Regidor García <a_regidor at yahoo.es> writes:

    Antonio> Thanks, let us try to make an useful and soundless
    Antonio> theory! I also considered to use category theory, but
    Antonio> soon decided to pass to the simpler framework of
    Antonio> groupoids. The arrows of a category form a groupoid
    Antonio> (well, if we consider only isomorphisms), but I didn't
    Antonio> find an use for functors and natural transformations in
    Antonio> darcs, so I used only what is apparently needed.

Well, the thing about working in category theory is that you don't
have to restrict yourself to derived objects of the same type.

    Antonio> No, the directory tree is a repository, and patches are
    Antonio> changes to this repository. How patches arrive to the
    Antonio> repository (as tarfiles, messages, ...) is irrelevant for
    Antonio> the theory I described.

It's not delivery I'm talking about, it's representation.  You can
represent all the directory information that darcs cares about with
ls -alR.  So you can represent file deletions and renames as hunk
patches to the listing.  This simplifies the theory because we only
need to worry about string manipulations, now.

On the other hand, one difference between a hunk patch and a token
replace patch is that a hunk patch can only result in a bounded number
of changes, while a TR patch clearly may result in an arbitrarily
large number of changes.  So there is no representation where hunk
patches can replace TR patches.  Or vice versa.

    Antonio> If concrete patches are always succesful, this makes
    Antonio> things more regular.

Hold it there a second!  Your notation for concrete patches is (S,T)
where S and T are trees.  You also said that a concrete patch has
"unitary" domain, which I took to mean the singleton {S}.  Ie, it's
"diff -urN S T" (plus some stuff for directory manipulations).  Such a
patch is _always_ successful because it can only be applied to S.
(Ie, it's the arrow from S to T.)

Is that you mean by a "concrete patch"?

    Antonio> Maybe we are talking about different things here. There
    Antonio> are two definitions of groupoid, I use the definition
    Antonio> explained here http://en.wikipedia.org/wiki/Groupoid The
    Antonio> other definition can't be applied in this setting.

The definition of "groupoid" is the same.  I'm not sure I understand
the definitions of the specific groupoids such as Con and Gen well.

    Antonio> But when trying to commute a hunk patch and a token
    Antonio> replace patch, the general hunk patch can change, so
    Antonio> maybe we need a more general definition for general
    Antonio> patches, or get rid of commutativity and consider
    Antonio> transformations of lists of patches, commutativity being
    Antonio> only one type of such transformations. These
    Antonio> transformations are not functors.

I suspect they're "natural transformations," or at least that will be
a plausible condition on them.

    >> I don't think it's appropriate in general.  Token replace is a
    >> good example.  Suppose that you've discovered that as far as
    >> users are concerned, ERR_A and ERR_B are semantically
    >> indistinguishable (ie, they indicate the same kind of bug, but
    >> the context is a little different), but ERR_B is rarely emitted
    >> in practice (even though it may occur in many places in the
    >> code).  Then a hunk deletion of the place where ERR_B is mapped
    >> to an actual message plus token_replace(ERR_B,ERR_A) is a very
    >> plausible description of the desired patch.

    Antonio> I don't understand this. Why do you need a hunk replace?

Because you have a switch like this:

    switch (error_code) {
    case ERR_A:
        fprintf (stderr, "Encountered error A\n");
        break;
    case ERR_B:
        fprintf (stderr, "Encountered error B\n");
        break;
    default:
        fprintf (stderr, "Well, excuse me!!\n");
        exit (-42);
    }

And after the TR you need to get rid of the second "case ERR_A" clause.

    >> All composable concrete patches commute by that definition.
    >> Take P = {(R,S) (T,T)} and Q = {(S,T),(R,T)}.  (This kind of
    >> triviality is what I meant above by "Con has no useful
    >> structure.")

    Antonio> Q is not invertible, so it is not a general patch.

You mean because the target of both components is T?  That's easy to
fix (see below).

It's true that both of the patch types darcs currently implements
(hunks and token replace) are injective (subject to the restriction
that TR(A,B) may not be applied to S if S contains instances of B),
but the "dangerous" token replace where S contains some Bs is probably
desirable.

Here's another.  Consider the general patch P defined by "sort the
characters in each line in alphabetical order."

Then P("ABC\nCBA\n") = P("CBA\nCBA\n") = "ABC\nABC\n".

It is not obvious to me that we wish to rule that out.

    Antonio> Can you give a concrete example where the weakness of the
    Antonio> definition of general patches is a problem?

Sure.  Pick a file in R.  Rename it, and call the resulting tree R'.
Now the concrete patches (R,S) and (S,T) will commute via P = {(R,S)
(R',T)} and Q = {(S,T),(R,R')}.  But if (S,T) is a hunk patch that
only affects some other file, there's no way we can say they have "the
same effect".

-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.