[darcs-users] Getting ready to sprint, camp team
Florent Becker
florent.becker at ens-lyon.org
Wed Apr 15 13:30:02 UTC 2009
Hi all,
here are some points i'd like to discuss before the sprint concerning camp.
It is both my view of the current situation of camp and camp-theory
(especially camp-coq) and a call to see what we're going to do during
the sprint. There we go.
A/ camp mailing-list versus darcs-users mailing list.
I discovered yesterday the existence of camp's own mailing list, which
sees very few usage. darcs-user's traffic is also not that big, and given
that it would be nice for everyone here to know in what shape the future
darcs-3 is. I propose to use darcs-user for both camp and darcs for
normal discussion, and respectively the camp and darcs-devel lists for
automated traffic. Is it ok with everybody?
Side question: does camp want a space on darcs-wiki, what'd we do with
it?
B/ camp as darcs-3 / hashed-storage
This leads us into the second point: do we have an estimation for the maturity
of camp-core (and additionally hashed-storage)? Is it the right time to
marry them?
Do we have a plan for the various transitions?
When do we start coding new features
for camp rather than darcs?
I don't have an opinion on these questions, but I think
the crucial points to decide are:
-does camp become darcs-3, or do camp-ideas get ported to darcs'(s?)
code?
-does camp adopt hashed-storage? is it a goal for the sprint?
-if both answers are yes, when do we switch to camp? Do we plan a darcs-2.5 with
darcs-core and hashed-storage?
C/ camp-paper and camp-coq.
Now for the theory of camp. These are mostly internal to camp, but should be decided
soon.
The current situation:
camp's theory lives in two repositories:
-Ian's camp repository http://code.haskell.org/camp/devel/paper/, which
contains both the latex paper with the theory, and a (very partial) coq
formalisation.
-My repository http://patch-tag.com/publicrepos/dic (where dic
stands for darcs-in-coq), which contains a (very partial) formalisation
of Ian's theory, but diverges with it in several points (for some of which, in my
understanding, camp's implementation diverges too from the paper).
This situation, with 4 different sources of "truth" about camp must be resolved,
preferably before the end of the sprint.
For the sake of clarity, camp-code refers to camp's implementation in haskell,
camp-paper to the theory in Ian's paper, camp-coq to Ian's coq formalization, and
dic to mine. Note that the situation is not that desperate, since camp-coq and dic
are on the way to converging, but their fusion entails a couple of decisions, which
are going to affect camp-paper, as we put it back into sync. I feel that camp-code
should not be (yet) affected by these theory changes.
Here is my plan for the fusion of dic and camp-coq, Ian, i want your
opinion on that.
1/ coqdoc integration
This is a must for the final result, which we'll inherit from camp-coq.
2/ modules organisation
dic and camp-coq seem to agree that modules are the way to go. Inherit
from both.
3/ sensible sequences versus states
In darcs-theory as well as in camp-theory, not every sequence of patches
is acceptable, since not every patch applies in every situation (Not that the word
"context" has another meaning in camp-paper).
On that point, camp-paper takes the view that acceptable sequences are
defined by a bunch of axioms, and are called "sensible sequences". Sensible
sequences are thus a subset of lists of patches defined by a bunch of axioms.
Note that, according to Ian, a sensible sequence goes from the empty
repostate to somewhere (it is "grounded")
In camp-code and dic (and also in darcs), the view is that patches have
a from and a to states, and that sequences are encoded with a FL type
that makes sure that these from and to states are legally chained.
camp-coq does not seem to care about that issue (ie, at the moment, it
does not encode this constraint).
My view on this point is:
-camp-paper's axiom are incomplete. We need at least one more axioms
that says that if sp and sq are sensible sequences, then so is spp^q.
Without this axiom, nothing says that p^ undoes p, we only know its
behaviour with respect to commutation.
I also think that at some point, we'll need to introduce ungrounded sensible
sequences, if only as a device in proofs, with the axiom that if sp and ps'
are sensible, so is sps'. If we do so, then I think that one can show that
there is a set of states such that each patch can be associated with a from
and a to state, and every ungrounded sequence is sensible if and only if the
from and to states match.
-camp-code's (and dic's) stateful patches (where states are an
_abstract_ type) are much easier to work with in coq (and in code)
than a bunch of axioms, as the states prevent most mistakes.
Note that using states makes proving that every sequence you produce is
good automatic, whereas you need a proof with each operation when
using sensibility.
My preferred course of action would be to check the consistency between
states and sensible sequences (if only as a check of consistency of camp-code
and camp-paper). Then use states everywhere (if only because otherwise, proving
camp-code is going to be hell).
4/ Unnamed marked sequences
before introducing names (they don't exist yet in dic), dic relies on a
concept which is not present in camp-paper, nor in camp-coq.
That concept is that of marked sequences (shuffle_mark_[1,2] in
dic). The idea is that if s (with p marked) can be shuffled into t (with q marked),
then when shuffling s into t, p "lands" on q. (There is also a version with
two marks). Additionally, one will need a lossy version, with
s (with p marked) can be turned into t (with q marked) if by shuffling
patches in s, adding and removing patches, p "lands" on q. This allow to
prove some more lemmas, which are going to be useful for named patches
manipulation.
Notation: in the following, " s!p!s' " means " sps' " with p marked.
This concept does not make sense for camp-code, since it's a proposition
that says "it's possible to get from here to there". We're not manipulating
such propositions in the algorithm, only proving that they hold.
I also feel that this kind of properties is the real use behind names:
comparing two patches' names is just a way to know whether they
come from the same record. That's why I want to give them a bigger
role, and make them explicitly appear in the paper.
Hence, i suggest take the definitions and lemmas from dic, and add them
to camp-paper. This is especially important in the light of point 7.
5/ Zigzag sequences (and their absence)
This is a point that's absent in all sources alike, but which we will need
to take into account somehow. We can sum it up by saying that the
subsequence pp^p is evil, and should probably not appear in a repository
(at least, when camp is not being invoked). We need to be able to mark
which sequences are good in that respect.
I propose to say that a _named_ sequence has a zigzag if the same name
appears twice in it. What this means in _unnamed_ terms is that a sequence
has a zigzag if it can be decomposed into sps'p's'' and for some p'', the
sequence !p''! can be turned both into s!p!s'p's'' and into sps'!p'!s''.
A sequence without zigzag is straight. This second formalisation seems a
bit more complex when written into ascii, but it really is the kind of
reasonning you do on paper when you treat patches with the same name
("this is really the same guy that went from here to there with a serie of
commutations"). It is natural once you get through the slight notational
noise.
Then we need to check the fact that all operations preserve straightness.
6/ names as a function or a component of patches
This really is a technical coq point (even moreso than the previous
ones).
In camp-coq:
a: We have axioms about unnamed patches
b: From these given a set of names, we can build named patches
c: Then we can show (as lemmas) the unnamed patches axioms for named
patches
In dic:
a: We have a set of axioms about patches
b: We have additionnal axioms for named patches, so that named patches
are a subset of general patches. Among these axioms is the existence of a
"name" function that is preserved by commutation.
I feel dic's approach is better.
First, it allows to prove that a system where names are computed on the
fly, whereas camp-coq's approach gets us stuck with stored names
Second, Ian's proofs that 'c' holds are difficult, whereas they should
be easy (blame coq). I pretend that given his namedPatch datatype,
and projection as the name function, showing they respect my axioms of
named patches is trivial. Given such proof, i think the most reasonable
course of action is to go with my approach.
7/ Where to introduce names?
This is the biggest point of divergence between my approach and
Ian's. What I propose is quite a
radical change, but i feel that:
-My approach is more elegant, both for paper and computer proofs.
-Given my experience of manipulating functions and equality in coq, it's
good practice to try to avoid them.
Currently, we (camp-paper) only define the merge _algorithm_ and catches
on top of named patches.
This means that we have a lot of places where we say "if the names are
equal, do this, else do that". This is a very good way to define algorithms,
but horrible for when we need to prove them: it does not carry what we
gain from knowing that the two patches have
the same name or not.
The best witness of this is the existence of patch
universes. Universe. In the plural! That's a sign
that something is not going right. They are not quite convincing on paper
(try getting a paper reviewed with "the allowed operations are…"), but on
coq, it's going to be horrible to deal with them. The simple thought that i
might have to invert those guys makes me shiver. And them we'd have to
reprove the axioms a third time for patches in patch universes!
Also, if the patch universe appears in the type of patches, then this type
becomes dangerously self-referential. Otherwise, we will have a lot
of spurious "exists u: universe, … p in u -> q in u -> …"
hypotheses. With these hypotheses, everytime you have a new patch
in your proof you need to prove that it is in the same universe as p and q. Two
solutions:
-add the hypothesis that u is in fact "saturated": it is closed under
all operations but record. But then you need to saturate your universes
to use your theorems :-(
-construct u' that contains p, q and your new patch. This is going to
get boring really quick, and will make proofs ugly.
My proposal is to end the madness. I propose to add an axiom that says
that we can decide whether two patches are related, and draw the consequences.
That is, given two marked sequences s!p!s' and t!q!t', we get
-either !r! and 2 sequences of commutations/patch addition/patch
deletions getting us onto s!p!s' and t!q!t' (ie, name p = name q)
-or u!p'!v!q'!w (or u!q'!v!p'!w) without zigzag, and two sequences of
operations going from u!p'!vq'w to s!p!s' and up'v!q'!w to t!q!t'
(ie, name p <> name q).
This corresponds to "getting back in time" to where the two (maybe one)
patches were first recorded.
Then we add the requirement that comparing names does give us that decision.
This is something we'll have to do anyway, and it's important to be able to
isolate it from the construction of catches and other stuff. It's one additional
axiom, but one less hypothesis for every theorem. I also think it's
the right solution if we want to prove some weakened version of
camp. For example, instead of wanting distributed generation of unique names
(which is impossible), we might want cryptographically unfalsifiable name
(that is, if an attacker can break cryptography, he can create a patch with the
same name as a previous patch). I think that the fact that it works would be
provable in my context, but much harder if we go the "traditionnal" road.
This is the most important point for us to settle i think (maybe there
is another solution to make patch universes workable, but i wouldn't bet
much on it). Ian, i expect your opinion «de pied ferme».
Well, thanks for reading down here folks
Florent.
More information about the darcs-users
mailing list