[darcs-devel] Rebase implementation: patch vs. repo level

Sat Jul 4 10:06:54 UTC 2015

BTW, this is extremely helpful for me to understand how things in darcs 
actually work! We should condense this into some sort of documentation, 
perhaps on the wiki.

Ganesh Sittampalam wrote:
> On 03/07/2015 09:13, Ben Franksen wrote:
>> Ganesh Sittampalam wrote:
>> [myself:]
>> 
>>          --> u
>>        /
>> s --> r
>>        \
>>          --> t
>> 
>> instead of
>> 
>> s --> r --> u --> t.
> 
> I'm not quite sure where to fit in tentative actually :-)
> 
> An alternative view might be this:
> 
>    --> r
>  /      \
> s         --> u
>  \      /
>    --> t
> 
> reflecting the possibility that tentative could end up behind recorded
> (e.g. for an obliterate) rather than ahead of it.

Right I forgot about obliterate (and amend). Your picture looks as if it is 
much closer to the truth. BTW, do we actually _move_ things from unrecorded 
to tentative during record or amend, so that during an operation we maintain

   --> r
 /
s
 \
   --> t --> u

And at the end of the operation, if it was successful, discard 'r' and 
replace it with the new 't', ending up with the same picture we started 
with?

>>> The main technical difficulty is simply tracking whether the patches
>>> being rebased have been "affected" or not. This is actually quite
>>> subtle; for example if patch B depends on patch A and patch A changes
>>> identity, then B has to too [modulo the darcs-2 duplicates hackery that
>>> means you can sometimes cheat the system, but we're both agreed that's a
>>> misfeature].
>>>
>>> So concretely, for example if B is suspended and you amend-record A
>>> that's still in the main repository, you now need to check whether B
>>> actually depended on A, presumably by doing a test commute. There are
>>> lots of other cases like this to track, and while I think this could all
>>> be done in theory, it feels hard to do reliably in practice. Also, users
>>> might be intending not to change the identity of their patches and be
>>> surprised by some dependency.
>> 
>> I wonder about this. My initial take would be: whenever we add a patch to
>> the main repo, we also add its inverse to the front of the rebase list,
>> then try to commute it as far as possible to the right. If we reach the
>> end we can drop it and know that nothing needs to be amended. If not, we
>> have a conflict and can warn the user that the rebase patch where the
>> commute failed (and perhaps some of those which follow it) may have to be
>> amended. Wouldn't that work?
> 
> That's roughly the idea, but there are plenty of subtleties that make me
> worried about getting it right - and the consequences of leaving a patch
> with the old identity could be significant.
> 
> For example:
> 
>  - As I think I mentioned in another thread recently, for an
> amend-record, the amendment is what we commute through the rebase list,
> to get the fewest possible conflicts. That could easily lead to a
> changed dependency not being spotted.

Let me see if I understand this. You are talking about an amend with a 
rebase in progress. One or more of the suspended patches might depend on the 
one we amend, which invalidates them (where by "invalidate" I mean that a 
suspended patch must be amended when finally unsuspended). But we currently 
commute only the difference between the original and the amended patch to 
the right, which might mean we don't detect these dependencies.

I think the obvious solution is to first try to commute the whole patch to 
the right _before_ it gets amended. This is just a tentative commute that we 
immediately discard; it serves solely to warn the user about any possible 
amend-on-unsuspend. This would be done before we actually start the 
operation; the commutation of the difference and coalescing with neighboring 
fixups happens later and is not affected. (BTW, obliterate would need the 
same treatment.)

>  - If we just "dirty" all the following patches once the commuting
> fails, we might miss some that we actually didn't need to dirty because
> they would have commuted with both the current fixup and the patch that
> depended on it. So we have to do some extra commuting to check that.

My idea was not to "dirty" suspended patches in any permanent way. Rather, 
we should be conservative when warning the user, possibly listing more 
patches that would need to be amended on unsuspend than would actually be 
the case. This means we (and the user) are on the safe side. I still want us 
to amend them (when they are finally unsuspended) only if this is actually 
necessary.

However, thinking about this a bit more I think we need not be conservative 
at all:

We could first move the affected patch P (the one we want to amend or 
obliterate) as far to the right as possible until we hit a suspended patch, 
say Q, as explained above. Then we try to commute --one by one-- what 
remains on the right of Q leftward until it is just before P, if possible. 
Every patch for which this fails is added to the needs-amend-on-unsuspend 
set we report to the user. I think this scheme should let us report the 
minimal set of affected ("dirty") patches.

>  - Currently, we try to simplify the fixups as much as possible using
> coalescing. I think that might complicate things, though perhaps it
> doesn't as it only matters if existing fixups are encountered in which
> case the patches to the right should already be dirtied.

See above. I may be wrong, but I think (permanently) dirtying suspended 
patches is not needed and would only complicate things.

> That said, having thought this through once more while writing this
> email to try to come up with problems, I think maybe the algorithm is
> plausible with some careful treatment for amend-record. But I still
> would feel happier with a "proper" stash command that guarantees to
> protect identities.

I am not opposed to 'stash' at all, quite the contrary. I very much like the 
idea of giving the user guarantees.

I just think that it would be nice if we could (a) also avoid unnecessary 
amending on rebase unsuspend and (b) unify rebase suspend with stash to 
avoid conceptual (and implementation) bloat. In my picture 'stash' would be 
an alias for 'rebase suspend --keep-identity'. This sets an internal flag on 
the suspended patch which turns any warning about possible amend-on-
unsuspend into an error saying that you must first rebase unsuspend (or 
unstash or rebase obliterate) these patches because I (darcs) promised that 
we would be able to re-apply them without first amending them. We could 
additionally offer a --force-amend option for the case where the user 
expressly wishes darcs to forget about its promise, i.e. reset the no-amend 
flag, effectively turning a stashed patch into a regular suspended one.

Cheers
Ben
-- 
"Make it so they have to reboot after every typo." ― Scott Adams