[darcs-devel] Flattening of patches

David Roundy droundy at abridgegame.org
Wed Oct 26 06:14:26 PDT 2005


On Tue, Oct 25, 2005 at 08:06:09PM +0200, Juliusz Chroboczek wrote:
> I'm trying to get Darcs-Git to an actually usable state.  Number 2 on my
> to-do list is allowing pushing from Darcs to Git.
> 
> Pushing from Darcs to Git fails whenever it finds data that is not
> convertible.  This is a necessary property, in order to ensure
> invariance of the Darcs->Git->Darcs round-trip.
> 
> I'm thinking of adding two new flags to push, pull, send and apply:
> 
>   - --flatten, which replaces all mergers by their merger_equivalent;
>   - --flatten-all, which additionally replaces all moves with their
>     rm/add equivalent, all replaces with their hunk equivalent, and
>     discards all changeprefs (am I missing any?).
> 
> The reason for two distinct flags is that --flatten might be useful
> independently of Git in order to work around the pathological
> performance cases we've been witnessing sometimes.

This sounds at least moderately reasonable.  You'll still get conflicts
when pulling back into darcs, but at least you won't get corruption.  On
the other hand, I think there's a better approach (see below).

> Now in order to preserve Darcs' invariants, the flattened patch must
> have a new patch identity.  In order to prevent spurious conflicts,
> flattening must generate patch ids that are deterministically
> generated from the original patch id.
>
> I'm thinking of doing that by simply appending a line that says either
> 
>   Darcs-Flatten: yes

This isn't sufficient, since the identity of the flattened patch depends
not only on the original patch, but also on the context in which it was
flattened.  Somehow the new patch ID also needs to encode the context in
which the patch lived when it got flattened.  This greatly increases the
odds of introducing spurious conflicts when flattening, since the same
patch flattened in two different contexts gets two different IDs.  Perhaps
with appropriate heuristics one could arrange that patches usually get
flattened in the same context.

Unfortunately, context tends to be large, so this doesn't make for an
appealing solution.  One could hash the context, but then you'd lose
information indicating the difference between two distinct flattened
patches.
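
To make the trade-off concrete, here is a minimal sketch of the
hash-the-context idea.  It is not darcs code: the function name, the use of
SHA-1, and putting the digest on the Darcs-Flatten line are all assumptions
made for illustration, and patch IDs are simplified to plain strings.

```python
import hashlib


def flattened_patch_id(original_id: str, context_ids: list[str]) -> str:
    """Hypothetical scheme: original ID plus a digest of the context."""
    # Sort so the digest depends on which patches are in the context,
    # not on the order in which they happen to be listed.
    joined = "\n".join(sorted(context_ids))
    context_digest = hashlib.sha1(joined.encode("utf-8")).hexdigest()
    return f"{original_id}\nDarcs-Flatten: {context_digest}"


# Same patch, same context (listed in a different order): same ID.
same_a = flattened_patch_id("fix parser", ["p1", "p2"])
same_b = flattened_patch_id("fix parser", ["p2", "p1"])

# Same patch, larger context: a different ID, and the digest alone
# does not say which patches account for the difference.
other = flattened_patch_id("fix parser", ["p1", "p2", "p3"])
```

The ID is deterministic, but comparing same_a with other only tells you
that the contexts differ, not how.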

For getting conflicting merges from darcs into git, I think the real
solution isn't to flatten, but rather to recontext.  For any patch that has
a merger in it, you can always commute that patch back in time until it
doesn't have a merger, and that patch can be applied safely to git.  It's
not an easy problem, but it preserves the actual change information, which
is particularly helpful when you've got conflicts.
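
The shape of that search can be sketched as follows.  This is a toy model
only: patches are (name, conflicts_with) pairs, and toy_commute is a
made-up stand-in for real commutation, which is much more involved and can
fail in ways this ignores.  All names here are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Toy patch: (name, conflicts_with).  conflicts_with is None for a plain
# patch, or the name of the patch it was merged against.
Patch = Tuple[str, Optional[str]]
Commute = Callable[[Patch, Patch], Optional[Tuple[Patch, Patch]]]


def has_merger(patch: Patch) -> bool:
    return patch[1] is not None


def toy_commute(first: Patch, second: Patch) -> Optional[Tuple[Patch, Patch]]:
    """Given `first` then `second`, return (second', first')."""
    if second[1] == first[0]:
        # Moving a merger past the patch it conflicts with recovers the
        # plain patch; the other patch now carries the merger instead.
        return (second[0], None), (first[0], second[0])
    # Toy assumption: every other pair commutes unchanged.
    return second, first


def recontext(history: List[Patch], index: int,
              commute: Commute) -> Optional[Tuple[List[Patch], int]]:
    """Move history[index] earlier until it no longer contains a merger."""
    patches = list(history)
    i = index
    while has_merger(patches[i]):
        if i == 0:
            return None  # ran out of history
        swapped = commute(patches[i - 1], patches[i])
        if swapped is None:
            return None  # the patches do not commute
        patches[i - 1], patches[i] = swapped
        i -= 1
    return patches, i


# B was merged against A, and an unrelated patch C sits in between.
history = [("A", None), ("C", None), ("B", "A")]
result = recontext(history, 2, toy_commute)
```

In this toy run B ends up first in the history as a plain patch, which is
the form that could be handed to git.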

For other sorts of changes (replace, mv, etc.), I think the best solution
would be to add annotations to the git commit message so that we could
read the patch back losslessly.
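
One way such annotations might look, purely as a sketch: the "Darcs-Patch:"
line prefix, the JSON encoding, and the field names are all invented for
this example and are not an existing darcs or git convention.

```python
import json

PREFIX = "Darcs-Patch: "  # hypothetical annotation marker


def annotate(message: str, changes: list[dict]) -> str:
    """Append one annotation line per non-hunk change to a commit message."""
    # JSON keeps each change on a single line and escapes odd file names.
    lines = [PREFIX + json.dumps(change, sort_keys=True) for change in changes]
    return message.rstrip("\n") + "\n\n" + "\n".join(lines) + "\n"


def read_back(commit_message: str) -> tuple[str, list[dict]]:
    """Split a commit message into the original text and its annotations."""
    kept, changes = [], []
    for line in commit_message.splitlines():
        if line.startswith(PREFIX):
            changes.append(json.loads(line[len(PREFIX):]))
        else:
            kept.append(line)
    return "\n".join(kept).rstrip("\n"), changes


message = "Rename parser module"
changes = [
    {"kind": "move", "from": "src/parse.hs", "to": "src/Parser.hs"},
    {"kind": "replace", "file": "src/Parser.hs", "old": "foo", "new": "bar"},
]
annotated = annotate(message, changes)
```

Reading annotated back with read_back gives the original message and
change list.  A real format would also have to cope with a changelog line
that happens to start with the marker itself.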

> (David: I'll take the opportunity to mention that I'm still convinced
> that patch ids should be arbitrary opaque tokens, not hashes of the
> changelog message.  All that nonsense would be unnecessary if they were.)

I'm not sure I follow.  The patch ID *is* the changelog message (plus date
and author).  Making the patch ID opaque would mean that users would have
no way of identifying patches, as far as I can tell.  That seems to me like
a major step backwards (but as I said, I'm not sure I understand you).  I'm
also not clear as to how an arbitrary opaque patch ID would help with this
issue.
-- 
David Roundy
http://www.darcs.net



