[darcs-users] My current work on darcs...

Thu Sep 14 08:28:23 UTC 2017

Am 10.09.2017 um 19:58 schrieb Guillaume Hoffmann:
> thanks for walking us through another (set of) aspect of Darcs'
> codebase. I am so glad someone is taking the sufficient amount of time
> to look into this in a global way!

You are welcome!

> Among the things you have listed, almost everything sound sensible to
> me. For instance I knew about removeFromPatchSet but never took the
> time to try and solve it. I am less aware of the implementation
> details of rebase. But indeed it seems weird that rebase be stored as
> a normal patch, as opposed to a special patch like pending or
> unrevert, since they are all relevant only for the local user of a
> repository.

I have been working somewhat more on this and things became quite messy
rather quickly, so I temporarily gave up (but I may restart it, taking
into account what I learned along the way). Pulling the rebase patch
into PatchSet didn't work out as nicely as I imagined. The reason is
that we often need to manipulate (commute, merge) patch lists at the end
of a Repository/PatchSet. The most natural way to do that (and we do it
like that in many many places) is to first split the list off the
PatchSet and later re-add the changed list back. In most cases, the net
effect to the rebase patch is rather small, but if we keep the invariant
that it always cleanly applies to the end of the PatchSet to which it
belongs, then this will introduce a lot of unneeded work (and also
causes problems with amend, for which adapting the rebase patch is
somewhat special). That suggests we should allow to extract MaybeRebase
from a PatchSet and re-add it later (with the minimally needed fixups);
but then there is no point in adding it to the PatchSet in the first place!

The conclusion is that, yes, treating rebase as a special (incompatible)
patch, handled independently from the main root inventory, is the way to
go. And yes, this should be pretty similar to the pending patch. But to
do this refactor really requires that we completely rehaul the
Repository API first. My current plan is still somewhat vague, but here
is the the idea, roughly:

Like I said last time, whether we operate on the tentative or the "real"
recorded state should not be visible outside the Repository layer. Thus,
we should not distinguish between them in the witnesses for the
Repository type. Instead, we should add a witness for the pending state.
(The repo type parameter should be kept as long as we mix rebase and
normal patches.) This gives us

  Repository rt p wR wP wU

which should be understood to mean that when we read the complete repo
we get

  (recorded :> pending :> working)
  :: PatchSet rt p Origin wR :> Pending p wR wP :> Working p wP wU

where both Pending and Working are simple wrapper types around lists of
(PrimOf p) with no meta data. [An alternative would be to hide the wP in
a common PendingWorking patch but I think we would need to split this
into separate pending and working often enough that having them separate
in the first place makes more sense.] Pending and Working are
incompatible and also both incompatible to p, and neither must contain
conflicts.

Adding a file (or dir) means moving the prim patch that adds it from
working to pending, removing is the opposite. Both change only wP.
Renaming a file (or dir) changes wP and wU since we rename them in
working and add the move patch to pending.

It should now be possible to give precise types to --look-for-moves and
--look-for-replaces, too. Both remove changes from working and add
different ones to pending, affecting only wP. We should always do this
operation before we record anything, even if the options are for record
or amend. (The witnesses for these operations are currently completely
wrong.)

> Now, I recall that Ganesh had a branch that implemented a stash
> command, that modified the implementation of rebase too, but he never
> ended up sharing it. I recall that it had something to do with the rt
> type parameter you mentioned. It would be nice to have his opinion on
> your proposal indeed :)

Right. I was thinking about how stash would fit into the scheme, but not
too deeply.

> Maybe you could also prepare a first patch for the refactor of
> Darcs.Patch.Match you mention at the beginning of you mail, even if
> parts of it would later be changed again if we decide to refactor
> rebase as you propose.

Yes. I think factoring access to PatchSet internals from D.P.Match and
others into D.P.Set is a good independent move and should be done first.
I will extract this prepare a bundle.

> Also, let us not forget your current patch bundles, especially
> http://bugs.darcs.net/patch1593 (separating display and storage of
> patches)!

I am really uncertain about this one, taking Ganesh's comments into
account. My feeling is that the way I did things is the "correct/clean"
way to solve the problem, but I also agree that it seems a bit
heavy-weight for the simple distinctions it allows. I have not yet
looked in detail at Ganesh's alternative attempts. Should do so and then
come to some conclusion.

> I am telling this because, we should start preparing a new release of
> Darcs, since GHC 8.2 has been out for a while now. Maybe by next month
> we can have the codebase in an releasable state.

I can't promise I will be able to sort out all the things you mentioned
until then but I can try.

Cheers
Ben

> 2017-09-08 8:58 GMT-03:00 Ben Franksen <ben.franksen at online.de>:
>> I am currently working on an number of experimental and potentially
>> far-reaching changes.
>>
>> The starting point was a refactor of Darcs.Patch.Match to avoid the
>> repetitive low-level recursion on the PatchSet type. I extracted the
>> recursion mechanics into a few general functions that I moved to
>> Darcs.Patch.Set, with the result that Darcs.Patch.Match no longer needs
>> to import the data constructors of PatchSet. Then I went through the
>> rest of the code base and finally reduced direct access to the PatchSet
>> data constructors to a few modules, mostly D.P.Set and D.P.Depends and
>> those where PatchSets are read and written (D.P.Bundle, D.R.Hashed).
>>
>> This led me to the next thing I am working on, which is rebase. This may
>> seem unrelated at first, but you will see that it is not.
>>
>> The rebase patch (that contains the suspended patches and the fixups) is
>> currently mixed in with normal patches, so it can and will be commuted
>> with them. This leads to a number of problems:
>>
>>  * Normal patches and the rebase patch need to be made compatible by
>> wrapping both in the WrappedNamed type. This is ugly and complicates the
>> type hierarchy for patches.
>>
>> * We have to make sure the rebase patch always gets commuted back to the
>> end before we write changes to disk. This affects all commands that can
>> modify the recorded state.
>>
>> * The standard patch commutation interfaces aren't aware of the fact
>> that there should only ever be a single rebase patch, so implementing
>> them leads to "impossible" cases.
>>
>> There is also some fairly elaborated type machinery at work to
>> distinguish repos with and without a rebase in progress. I am not quite
>> sure this is necessary, though I do understand the motivation; more on
>> that below.
>>
>> I had the idea that a better place to store the rebase patch is to make
>> it an optional member of the PatchSet data type. This has two
>> consequences: the rebase patch is no longer mixed with normal patches in
>> patch sequences; so WrappedNamed can be removed and PatchInfoAnd reverts
>> back to (hopefully) containing a Named patch. I experimented a while
>> with various representations and finally came up with
>>
>> data PatchSet rt p wStart wY where
>>     PatchSet :: RL (Tagged rt p) wStart wX
>>              -> RL (PatchInfoAnd rt p) wX wY
>>              -> MaybeRebase p wY
>>              -> PatchSet rt p wStart wY
>>
>> data MaybeRebase p wX where
>>     NoRebase  :: MaybeRebase p wX
>>     YesRebase :: (PrimPatchBase p, FromPrim p, Effect p, Commute p)
>>               => FL (RebaseItem p) wX wY
>>               -> MaybeRebase p wX
>>
>> It would be possible to integrate this with the existing repo type
>> distinction by adding an rt parameter to MaybeRebase i.e.
>>
>> data MaybeRebase rt p wX where
>>     NoRebase  :: MaybeRebase ('RepoType 'NoRebase) p wX
>>     YesRebase :: (PrimPatchBase p, FromPrim p, Effect p, Commute p)
>>               => FL (RebaseItem p) wX wY
>>               -> MaybeRebase ('RepoType 'IsRebase) p wX
>>
>> But I would rather get rid of RepoType and the rt type parameter because
>> I think that this kind of type distinction is not useful enough in
>> practice to justify the cost in complexity.
>>
>> MaybeRebase is rather similar in structure to WrappedNamed: It hides the
>> right witness behind an existential, expressing that it "dangles off the
>> main repo"; and it also packs the constraints with the rebase patch that
>> are needed to adapt the rebase patch when (normal) patches are removed
>> from or added to the the end of the patch set (using slightly modified
>> versions of addFixupsToSuspended and removeFixupsFromSuspended).
>>
>> Since access to PatchSet internals is no longer spread all over the
>> place (see the beginning of this message), it is now feasible to adapt
>> the remaining functions that do access them so as to add/remove the
>> necessary fixups to the rebase patch. This work led me consider how
>> patches are writen and read to/from disk. So I studied
>> Darcs.Repository.Hashed. It exports a very large number of functions
>> that operate on the disk representation of repositories, mostly
>> inventories (storage of the patches themselves is handled by the Cache
>> subsystem; there are also the pending patch and the unrevert bundle). I
>> looked at the implementation in detail and discovered the following
>> interesting fact:
>>
>> The functions that remove patches from a repo/inventory are in no way
>> more efficient than writing out a complete (modified) PatchSet as a new
>> tentative state, but in fact rather less so. They all end up calling
>> removeFromTentativeInventory, which is, more or less,
>>
>> readTentativeRepo >>= removeFromPatchSet to_remove >>=
>> writeTentativeInventory
>>
>> And removeFromPatchSet will re-do all the commutations that the caller
>> has already done (in order to provide the to_remove FL of patches). Same
>> goes for functions that replace patches since they first remove and then
>> add them. It makes sense therefore to re-write commands to no longer use
>> the remove and replace functionality and instead directly calculate new
>> PatchSets and then write them to the tentative inventory. The latter is
>> already optimized to avoid writing inventories (which correspond to
>> Tagged sections of the PatchSet) that were not touched during
>> commutation, since these still have their hash attached. This will
>> considerably cut down on the number functions exported by D.R.Hashed. It
>> will also reduce exposure to infelicities in the patch theory for
>> RepoPatchV2 (whether commutation of Duplicate patches fails or succeeds
>> may depend on the order in which the commutation is performed). This can
>> and should be done before adding separate read and write operations for
>> the rebase patch.
>>
>> (BTW, adding patches (at the end) really just appends them to an
>> existing inventory, so we should keep these functions.)
>>
>> Back to rebase. Scrapping WrappedNamed will make things a lot simpler. I
>> would also like to get rid of the 'rt' (for RepoType) type parameter
>> together with all the type level distinctions between repos with and
>> without a rebase in progress. This is certainly debatable and Ganesh may
>> have a different opinion. My reasoning is as follows: assuming the
>> rebase patch is no longer mixed in with normal patches, and instead
>> reading and writing of inventories will handle the rebase patch
>> separately (like e.g. the pending patch is handled separately now), the
>> extra wrapping of repo jobs in order to move the rebase patch to the end
>> is also no longer necessary. The latter was, I think, the main
>> motivation for the repo type distinction. Without it, the benefits
>> (extra type safety) are not great enough to justify the cost (extra
>> complexity). Similar levels of safety can be had with more traditional
>> methods: modularity and pattern matching e.g. on MaybeRebase.
>>
>> Apropos modularity. I think this is one of the major weaknesses of the
>> current code base of darcs. We have almost no type abstraction: the
>> modules usually export all data constructors and we use pattern matching
>> and low-level recursion all over the place. That makes it almost
>> impossible to guarantee non-trivial invariants, unless they are
>> expressed directly in the types. This in turn motivates the introduction
>> of more and more advanced type shenanigans. I guess one reason darcs
>> took this direction is that in the past it allowed quick and dirty
>> addition of new features without much thought about coherent (internal)
>> API design. (I am /not/ referring to rebase here, the design of which I
>> like pretty much, except for the way it is integrated with the rest of
>> darcs).
>>
>> And a last point about typing: I have come to the conclusion that it
>> doesn't make much sense to distinguish between recorded and tentative
>> state at the type level (the wR and wT type parameters for the
>> Repository type). Here is a short description of the transaction
>> protocol: when changing the repository state, first copy the recorded
>> state to tentative; make changes only there; if all goes well and the
>> user doesn't abort, finalize by moving tentative back to recorded,
>> otherwise delete tentative and restore the original recorded state. As
>> far as I know, all commands except rebase run in one single such
>> transaction. The RepoJob machinery takes care of starting the
>> transaction, so the commands only ever see the tentative state, so how
>> could they ever accidentally mix recorded and tentative? The exception
>> for rebase is when patches are suspended in a rebase pull or rebase
>> apply. Here we commit before suspending patches for reasons that aren't
>> quite clear to me (a comment in the code suggests that it is because
>> readUnrecorded lies about the state it reads). I am pretty sure that
>> with a bit of refactoring this hole can be closed at which point I would
>> like get rid of the wT type parameter and instead move finalization
>> (commit or abort) to the RepoJob, using the standard Haskell tools
>> (finally, bracket, etc) to make sure commands never touch the recorded
>> state except through a transaction.
>>
>> Cheers
>> Ben
>>
>> _______________________________________________
>> darcs-users mailing list
>> darcs-users at osuosl.org
>> https://lists.osuosl.org/mailman/listinfo/darcs-users