[darcs-devel] Semantics of witnesses

Mon Aug 12 12:27:03 UTC 2019

Hi Ganesh

your last remark may point a way to reconcile our two views. Indeed it
is true that every patch has a unique minimal context and that this
minimal context is an invariant property. (I am using the traditional
definition of context here: the set of patches preceding a patch in a
repo.) This in turn gives us a /canonical/ initial state for each patch
(apply the minimal context to the empty state), which is then also an
invariant property of the patch.

Here is a proof (sketch) for existence and uniqueness of minimal contexts:

We can always minimize any particular context in which we find a patch
p by calculating (commuteWhatWeCanRL context p). It is easy to see that
the result is indeed minimal i.e. none of its patches can be commuted
past p, using permutivity. This gives us existence. To see uniqueness,
suppose we have the same patch in two repos with minimal contexts as;p
and bs;p. Now we invoke the invariant that all common patches in two
repos can be commuted to a common trunk, i.e. findCommon (as;p) (bs;p) =
Fork (xs;p) as' bs'. Since as and bs were minimal we know that as' = bs'
= Nil and therefore that as = xs = bs.

While this appears to require explicit identities for patches, I think
we can actually define findCommon for raw unnamed prim patches, too,
using patch equality, and I think the repo invariant holds for RLs of
unnamed prims as well. I should add a test to check that.

I think this means that we can use both the minimal context or the
canonical start state as representatives for the set of all states for
which a patch is defined. It is really a matter of taste: I prefer the
set of states because it is immediately obvious that this is an
invariant property and thus mathematically simpler to define. You prefer
the canonical state because it is immediately obvious how to calculate
it for a concrete patch in context, using commutation. And indeed, if we
wanted to make witnesses more precise (in a theorem proover or with some
future dependent Haskell) we should probably do it based on the notion
of minimal context.

So much for the theory.

On the practical side, as you noted, calculating minimal contexts for a
complete repo from scratch is prohibitively expensive. Even if we
calculate and store the minimal context incrementally each time we add a
patch to a repo, this is still quadratic in the repo size and therefore
too expensive. (We need to do roughly (n-m) * m commutations in the
worst case, where n is the number of patches in the repo and m is the
size of the minimal context.)

In Pijul they claim they can do it in logarithmic time... I wonder how
they do that. Here is my guess: they use a richer internal
representation of repo states, one in which every sequence of patches
always leaves a "trace" of all its actions, even if these partly cancel
each other when the state is finally mapped to a user-visible Tree.
Together with UUIDs for all parts of the state (including file content)
this may allow them to calculate the canonical representation without
having to commute patches at all. In fact, if their internal
representation of states is history-aware in this sense, then they may
indeed get away with identfying set (2) with set (3) and still maintain
consistency and associativity of merging.

Cheers
Ben