[darcs-devel] Semantics of witnesses

Mon Aug 12 21:26:49 UTC 2019

Am 12.08.19 um 22:54 schrieb Ganesh Sittampalam:
> On 12/08/2019 13:27, Ben Franksen wrote:
>> your last remark may point a way to reconcile our two views. Indeed it
>> is true that every patch has a unique minimal context and that this
>> minimal context is an invariant property. (I am using the traditional
>> definition of context here: the set of patches preceding a patch in a
>> repo.) This in turn gives us a /canonical/ initial state for each patch
>> (apply the minimal context to the empty state), which is then also an
>> invariant property of the patch.
> 
> The minimal context is definitely an invariant property of the patch
> (i.e. the thing that can be commuted into lots of different contexts,
> possibly changing representation). I don't think it helps us with
> invariant properties of patch representations though.

Oops. You are right, of course: the minimal context rather corresponds
to set (4) than set (2).

>> Here is a proof (sketch) for existence and uniqueness of minimal contexts:
>>
>> We can always minimize any particular context in which we find a patch
>> p by calculating (commuteWhatWeCanRL context p). It is easy to see that
>> the result is indeed minimal i.e. none of its patches can be commuted
>> past p, using permutivity. This gives us existence. To see uniqueness,
>> suppose we have the same patch in two repos with minimal contexts as;p
>> and bs;p. Now we invoke the invariant that all common patches in two
>> repos can be commuted to a common trunk, i.e. findCommon (as;p) (bs;p) =
>> Fork (xs;p) as' bs'. Since as and bs were minimal we know that as' = bs'
>> = Nil and therefore that as = xs = bs.
>>
>> While this appears to require explicit identities for patches, I think
>> we can actually define findCommon for raw unnamed prim patches, too,
>> using patch equality, and I think the repo invariant holds for RLs of
>> unnamed prims as well. I should add a test to check that.
> 
> I think both for named and unnamed patches, we have to rely on the
> global invariant that darcs will never let you commute something past a
> patch it depends on (and of course this is not technically true for
> V1+V2 patches, so really we have to close our eyes and ignore that!).
> 
> Without that invariant, findCommon can fail and all bets are off.

Right again. I should know better, having hacked on RepoPatchV3 for so long!

> And with raw unnamed prim patches, you still have to make up some notion
> of an invisible identity assigned at record-time. Because you can record
> exactly the same prim twice in two different contexts and they can have
> two different minimal contexts.

Yup. Minimal contexts require patch identities.

>> I think this means that we can use both the minimal context or the
>> canonical start state as representatives for the set of all states for
>> which a patch is defined. It is really a matter of taste: I prefer the
>> set of states because it is immediately obvious that this is an
>> invariant property and thus mathematically simpler to define. You prefer
>> the canonical state because it is immediately obvious how to calculate
>> it for a concrete patch in context, using commutation. And indeed, if we
>> wanted to make witnesses more precise (in a theorem proover or with some
>> future dependent Haskell) we should probably do it based on the notion
>> of minimal context.
> 
> Well, I still think witnesses are all about patch representations we> don't need to commute, and not about patches in general.

Indeed. Sorry for the confusion.

>> On the practical side, as you noted, calculating minimal contexts for a
>> complete repo from scratch is prohibitively expensive. Even if we
>> calculate and store the minimal context incrementally each time we add a
>> patch to a repo, this is still quadratic in the repo size and therefore
>> too expensive. (We need to do roughly (n-m) * m commutations in the
>> worst case, where n is the number of patches in the repo and m is the
>> size of the minimal context.)
> 
> This comment reminds me of some of the ideas I was thinking about to
> make this faster. If you think about commuting with X as a partial
> function - commute[X] (Y) = Y' iff XY <-> Y'X' - then you might also
> imagine coming up with some higher-level structure for the behaviour of
> that partial function and how to compose it - it would have to be
> specific to the patch type. And maybe that could be composed
> efficiently, i.e. you could precompute a commute[XS] that would be
> faster than just commuting with each X in XS in turn. But it all got
> quite complicated and I gave up on it.
> 
>> In Pijul they claim they can do it in logarithmic time... I wonder how
>> they do that. Here is my guess: they use a richer internal
>> representation of repo states, one in which every sequence of patches
>> always leaves a "trace" of all its actions, even if these partly cancel
>> each other when the state is finally mapped to a user-visible Tree.
>> Together with UUIDs for all parts of the state (including file content)
>> this may allow them to calculate the canonical representation without
>> having to commute patches at all. In fact, if their internal
>> representation of states is history-aware in this sense, then they may
>> indeed get away with identfying set (2) with set (3) and still maintain
>> consistency and associativity of merging.
> 
> My understanding from talking to Pierre-Etienne and Florent is that
> their internal representation is implicitly in a minimal context

This is what I thought.

> and they calculate the actual state of a repo on the fly as needed.

Interesting. I wonder how they do that.

> As you
> say this must be based on a richer internal representation but I don't
> have a good intuition of how it works.

I thought I did, at least for operations on the tree i.e. things like
addfile. But when I tried to write it down I realised that it wouldn't
work the way I imagined it.