[darcs-users] Explanation: What does commute mean?

Jason Dagit dagit at codersbase.com
Sat Oct 18 00:47:24 UTC 2008

```Hello,

People often want to understand how commute on patches works.  Usually we
start by explaining:
Given two patches, A and B, if A and B commute then: AB <--> B' A', for some
B' and A'.

Naturally people ask, "But what is the relationship between A and A' or B
and B'?"

This is a very important question and I'll provide you with some insight.

Suppose we have a repository with 2 files, a and b.  We could then make the
following operations:
mv b c
mv a b
mv c a

You can think of each operation as a transformation on the 'state' of your
repository.

Suppose also, that we make an edit to a, and an edit to b.

Let's name the above, using T for transformation:
T_bc = mv b c
T_ab = mv a b
T_ca = mv c a
T_a = edit a
T_b = edit b

You can imagine that if I gave the diff for T_a and the diff for T_b that
you could apply those diffs in either order to your repository and get the
same final 'state'.  Meaning, a and b are the same whether you update a
first or b first.

But, suppose instead that I performed T_bc, T_ab, and then T_ca.  This has
the effect of swapping a and b by name.  Now suppose you applied the diffs
T_a and T_b.  What would you want the outcome to be?

It turns out, that it matters which operations were created first.  If you
created the diffs T_a and T_b *before* you did the operations of the swap,
then you should expect that after the swap the diff for T_a actually
modifies b, whereas T_b should modify a.  On the other hand, if you created
the diffs T_a and T_b *after* the swap, then you expect T_a to modify a and
T_b to modify b.

We have an intuitive idea of 'context' now.  As in, what is the context that
T_a and T_b were created in?  Knowing this will tell us how they transform
the repository state.

Intuitively, it seems as though we need to remember the 'context' in which
T_a and T_b were created.  So let's say that the operations performed up to
the point where T_a is created is the context of T_a.  In other words, the
context for T_a is sequence of transformations that existed when T_a was
created.  Similarly, since T_a is a transformation, creating it results in a
new context, which is the old context plus T_a.  We could say that T_b has
this context.  Going a bit further, it seems like we should talk about how
T_a has a pre-context and it also has a post-context.

For example, if we created T_a before doing the swap, then the pre-context
might include two transformations, one that creates a and another one that
creates b.  The post-context would then include those two transformations
and T_a itself.  If we created T_a after doing the swap, the pre-context and
post-contexts of T_a would include T_bc, T_ab and T_ca also.

Now a side note about commutative functions.  Consider the function created
by composing T_a and T_b, let's write T_a . T_b.  Recall, that with function
composition parameters start on the right and pass through the sequence to
the left.  As discussed in the intro, T_a . T_b is equal to T_b . T_a.  This
is because T_a and T_b are independent of each other.  Thus, we would say
that the functions T_a and T_b are commutative functions.  This means, that
changing their order of application does not change the result.

We are saying that:
T_a . T_b = T_b . T_a

Because T_a and T_b are commutative it doesn't matter which order we compose
them.  If we restrict our view to just the repository above with only the
files a, b and no c, then on this restricted set of repository state how do
these two compare?
T_b . T_a
T_a . T_b . (T_ca . T_ab . T_bc)

In plain English, the first one edits a and then b, the second one swaps a
and b, edits b and finally edits a.

As far as the mathematics of it is concerned, the first one will edit a and
b, while the second one will have T_a editing a different a than the first
one and T_b editing a different b than the second one.

Going a bit further, let's say that T_a and T_b were created without any of
T_bc, T_ab or T_ca in their context.  So we could have two scenarios.

We could, for example, start with T_b and T_a, swap their order and then do
the swap of a and b afterwards.  That would give us:
T_b . T_a
and
(T_ca . T_ab . T_bc) . T_a . T_b

Intuitively, it seems like T_a and T_bc are commutative functions, eg., T_bc
. T_a = T_a . T_bc.  So we could rewrite the second one as this:
T_ca . T_ab . T_a . T_bc . T_b

Now, suppose when we commute the function T_a with T_ab, that we replace T_a
with T_a'.  T_a' is like T_a except that T_a' makes the edits of T_a to b
instead of a.  After all, this results in T_a' editing the correct file
after the rename. Similarly, when we commute T_b with T_bc, T_b is replaced
with T_b' that edits c instead of b.  When we commute T_b' with T_ca we
replace T_b' with T_b'' that edits a instead of c.

So, the above goes through these steps:
T_ca . T_a' . T_ab . T_bc . T_b (commute T_a to the left)
T_a' . T_ca . T_ab . T_bc . T_b (commute T_a' to the left)
T_a' . T_ca . T_ab . T_b' . T_bc (commute T_b to the left)
T_a' . T_ca . T_b' . T_ab . T_bc (commute T_b' to the left)
T_a' . T_b'' . T_ca . T_ab . T_bc

The last one will then have T_a' and T_b'' making edits the same file
contents as T_a and T_b respectively, even though the names of the files
were changed by the swap.

So, if you've followed me to this point, then you now have the intuition for
what we mean when two patches A and B, commute to B' and A', as AB <--> B'
A'.  You can think of a patch as being one of the above transformations
along with the context of the transformation.  You might also notice that
commute of patches must be doing something to the context of the patches.

Patch commute has the potential to update the context and transformation the
patches it swaps OR it could update the context and leave the state
transformations equal to what they were in the input.  Patch commute can
also fail, but we're ignoring that case for the moment.

Thinking back to how we arrived at the need for context, you might notice
that for each context, that is each sequence of operations, we get one
unique repository state.  This is a very important property of context.
Without it, context wouldn't really be useful.  Also, notice that the
opposite is not true, repository state does not determine the context.
Which makes sense, because there are lots of operations you can do that get
the repository to a particular state, so given a state how do you know what
was done?

The next important property we want for commuting patches is that once two
patches have been commuted, you can commute them again to undo the
commutation.  In fact, it turns out the examples above are saying we want
contexts to determine the same state if you commute the patches inside the
context (again, context is a sequence of patches!).

For R to be an equivalence relation, we need three things:
1) x R x, is true for all x
2) if x R y then y R x
3) if x R y and y R z then x R z

Here, we replace x R y with "the sequencing, or order, of x can be obtained
by commuting adjacent elements of y".  Roughly how to prove each:
1) either claim that 0 commutes satisfies definition of R or check that
commute is self-inverting
2) relies on self-inverting nature, I think
3) messier but should still be provable

I'm pretty sure both (2) and (3) could be done with a brute force proof that
considered all the pairings of patch types in their general cases.  Start
with all sequences of length 2, then 3 and I think at that point you could
make an inductive argument to hit sequences of length n.  This would be a
lot of work, and I'm not convinced it could be fully automated.

Why would we want to show the above?  Showing that R is a relation would
tell us that sequences of patches are equivalent under commute.  Now,
combine this with the idea that context determines the state uniquely and
now we know sets of patches uniquely determine your repository.

Now we hit the circular bit, I think.  In the darcs implementation we want
all the above to work out so we've defined all the pairwise commutes so that
R should be an equivalence relation for us.  I don't know of any formal
work, other than our QuickCheck properties that have tried to verify this.
While it's perfectly fine and natural to define the desired properties then
prove that some particular operations match those properties it's that proof
by inspection that we have yet to do to make this formal.

My arguments here apply to all the primitive patches that modify your
repository but other patch types, like conflictors, do something else and I
really don't know how to fit them into the theory.

Thanks,
Jason
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.osuosl.org/pipermail/darcs-users/attachments/20081017/8d6132cc/attachment.htm
```