[darcs-devel] Rebase implementation: patch vs. repo level

Fri Jul 3 06:29:36 UTC 2015

Hi Ben,

On 02/07/2015 22:02, Ben Franksen wrote:

> a while ago we talked about the rebase implementation. You mentioned that 
> rebase should have been implemented on the repo level (but was not for 
> various practical reasons). I was thinking about this a few days ago when I 
> noted that a rebased patch always looses its identity, i.e. gets amended, 
> even if it's just suspend + immediate unsuspend.
> 
> Which makes rebase a bit less useful than it could be. I am now using 
> obliterate -O / apply if I just want to temporarily remove changes without 
> amending them. A repo-level implementation could be better in this regard, 
> making rebase suspend/unsuspend the tool of choice for operations that 
> _potentially_ rebase a number of patches.

I've had other people mention this too and it's something I'm actually
thinking about at the moment. However I'm not sure that a repo-level
implementation is the real issue here.

The main technical difficulty is simply tracking whether the patches
being rebased have been "affected" or not. This is actually quite
subtle; for example if patch B depends on patch A and patch A changes
identity, then B has to too [modulo the darcs-2 duplicates hackery that
means you can sometimes cheat the system, but we're both agreed that's a
misfeature].

So concretely, for example if B is suspended and you amend-record A
that's still in the main repository, you now need to check whether B
actually depended on A, presumably by doing a test commute. There are
lots of other cases like this to track, and while I think this could all
be done in theory, it feels hard to do reliably in practice. Also, users
might be intending not to change the identity of their patches and be
surprised by some dependency.

So what I'd prefer to do is implement a separate 'darcs stash' command
where the user is explicitly expressing their desire to preserve
identity, at the time of suspension.

I actually started on this, along the same lines as the rebase
implementation. But then I realised that following the same pattern
involved repeating quite a few of the same hacks as rebase uses, but in
a different form, to the extent that the extra technical debt created
seemed unjustified. So then I started thinking about simplifying the
rebase hacks, and I had an idea that perhaps moving the 'Suspended'
constructor into the 'Named' patch type was the right way to go. So I'm
currently exploring that particular rabbit hole....

> Now to a related question:
> 
> Am I right that implementing rebase on the repo level would mean adding yet 
> another phantom type to Repository, something like
> 
> data Repository (p :: * -> * -> *)
>   wRecordedstate wUnrecordedstate wTentativestate wRebasestate

Hmm. To be honest I'm not sure, but I guess so. Or maybe we don't really
care about the "end" of the rebase state because most operations happen
at the beginning of the state where it meets up with the main
repository, in which case we don't need a new witness at all.

> ? Or do we need two new states in order to also represent the "fixup" needed 
> to adapt the rebase patches to new records or pulls (or even just 
> modifications of the working tree, not sure if that is needed). And the list 
> of patches that makes up the rebase state, where would it start? At the 
> wTentativestate or at wUnrecordedstate or even at wRecordedstate?

I think it would start at wTentativeState. I don't think we would need a
witness for the fixups specially, that's already handled by them being
interleaved in a list along with the suspended patches.

> One thing I do not understand about the Repository witnesses: I may have a 
> wrong understanding of what "tentative" means, but judging from the comment 
> 
> -- the recorded state of the repository (i.e. what darcs get would 
> retrieve),
> -- the unrecorded state (what's in the working directory now),
> -- and the tentative state, which represents work in progress that will
> -- eventually become the new recorded state unless something goes wrong.
> 
> shouldn't wUnrecordedstate and wTentativestate always be equal whenever we 
> start executing a command and be again equal when we finish with it? And if 
> yes, could we perhaps express this more directly in the types of the 
> functions involved?

Yes, that's right, assuming you meant wRecordedState and wTentativeState.

You can see this in the type of the constructors of
'Darcs.Repository.Job.RepoJob', e.g:

      RepoJob (forall p wR wU . (RepoPatch p, ApplyState p ~ Tree,
ApplyState (PrimOf p) ~ Tree)
               => Repository p wR wU wR -> IO a)

I think Darcs.Repository.Internal.finalizeRepositoryChanges and
D.R.I.revertRepositoryChanges also re-establish this invariant, but
because they take the starting state as a parameter and the ending state
isn't explicit, you can't see that in the type.

The basic point of 'tentative' is to allow for atomic changes to the
repository - if an operation aborts halfway, the repo isn't left in a
half-done state.

Cheers,

Ganesh