[darcs-devel] darcs-3 compatibility

Mon Mar 29 13:49:21 UTC 2021

On 29.03.21 at 09:58, Ganesh Sittampalam wrote:
> On 20/02/2021 16:07, Ben Franksen wrote:
> 
>> I completely agree with that as far as the patch format is concerned.
>> This part is pretty well abstracted and keeping the old formats in the
>> code base is not a serious maintenance burden.
>>
>> What /is/ a serious burden is keeping compatibility at the Repository
>> layer.
> 
> Agreed to both. What I'd hope for is some kind of transition like we had
> with darcs-1 to darcs-2 and old-fashioned to hashed, but perhaps faster.
> 
> From what I recall, hashed format supported darcs-1 and darcs-2 (and
> indeed still does!) whereas old-fashioned remained darcs-1 only. It was
> possible to exchange patches between old-fashioned darcs-1 and hashed
> darcs-1 without any problems. Eventually, old-fashioned was retired.
> 
> If I name your new repository format 'new-shiny'

I prefer 'branched' ;-)

> for now then what I
> would hope for is that new-shiny would support darcs-2 and darcs-3 (and
> maybe darcs-1), and that hashed wouldn't need to support darcs-3 but
> could if it wasn't painful. We could retire hashed after a shorter
> time-frame than we did with old-fashioned, but still not immediately.
> And it would be easy to exchange patches between hashed darcs-2 and
> new-shiny darcs-2.
> 
> That way I could start using new-shiny with all my darcs-2 repos without
> being worried about the painful conversion and the risk of becoming
> trapped. Once confident in the new setup I would probably be happy to
> abandon all the old hashed copies of the repos. But I could defer the
> darcs-2 => darcs-3 migration, which I think will be painful and scary,
> and do it gradually starting with smaller repos and working up as I
> built confidence.

What you describe here as desirable is, I think, quite typical of the
user POV. I readily admit that my idea of restricting support for older
patch formats benefits mostly us as developers, as it means we (that is,
mostly, I) could forge ahead with repository changes, mostly
disregarding compatibility issues. I also like that coupling the
transition to darcs-3 with that to the new branched repo format gently
pushes users toward upgrading to darcs-3, so they can take advantage of
new features.

But I won't insist on it. Instead, let me drop that proposal for the
moment and discuss how to manage an independent transition to the
'branched' repo format.

To quickly recap, in a branched repo, the current repo state consists of
a small file with a few "root" hashes (a "branch file"). All repo state
(even the unrevert state, the pending state, and the rebase state) will
become hashed files; unrevert will be represented as an inventory
instead of a patch bundle. Instead of _darcs/hashed_inventory we'll have
_darcs/branch, which contains one word: the name of the current branch.
That name coincides with a branch file under _darcs/branches/. (At first
we will have only one branch file, probably named "master"; support for
multiple branches in the UI will be added later.)
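
To make the layout concrete, here is a minimal sketch in Python (chosen
only for brevity, darcs itself is Haskell). The paths _darcs/branch and
_darcs/branches/ are as described above; the "key hash" line format of
the branch file and both function names are assumptions made for
illustration, not a settled design.

```python
from pathlib import Path


def current_branch_file(repo: Path) -> Path:
    """Return the branch file that _darcs/branch points at."""
    # _darcs/branch contains one word: the name of the current branch.
    name = (repo / "_darcs" / "branch").read_text().strip()
    return repo / "_darcs" / "branches" / name


def read_root_hashes(repo: Path) -> dict:
    """Parse the root hashes of the current branch.

    Assumption: one "key hash" pair per line. The real on-disk
    format of a branch file is not specified in this mail.
    """
    roots = {}
    for line in current_branch_file(repo).read_text().splitlines():
        if line.strip():
            key, value = line.split(None, 1)
            roots[key] = value.strip()
    return roots
```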

(BTW, a "pure", that is, incompatible branched repo could dispense with
storing intermediate state in "tentative" files altogether and instead
keep the handful of hashes in memory.)

(Note to self: there is also _darcs/index and _darcs/patch_index; need
to think about how to properly integrate them into the transaction
protocol. Thankfully both are just caches and can be thrown away and
re-created without losing any information.)

Converting back and forth between 'hashed' and 'branched' (or perhaps
'hashed|branched') is possible, as both formats contain the same
information. During the transition period we could support read-only
access to 'branched' repos for older versions of darcs. To do that,
every time we finish a transaction, before we (atomically) write the new
hashes into the current branch file, we *also* write the new state in
the current 'hashed' format by re-creating (if necessary)
_darcs/hashed_inventory, _darcs/patches/pending, _darcs/rebase, and
_darcs/patches/unrevert, just as we do now.
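
The ordering described above could be sketched roughly as follows, in
Python for brevity. The helper names (atomic_write, finish_transaction)
and the use of a temporary file plus rename are assumptions for
illustration; this is not how darcs implements its transactions.

```python
import os
import tempfile
from pathlib import Path


def atomic_write(path: Path, data: str) -> None:
    """Write data to a temp file in the same directory, then rename it."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # Readers see either the old file or the new one, never a mix.
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def finish_transaction(repo: Path, branch_name: str, branch_data: str,
                       legacy_files: dict) -> None:
    """Re-create the legacy 'hashed' state, then commit the branch file.

    legacy_files maps paths relative to _darcs (for example
    "hashed_inventory") to their new contents; hypothetical interface.
    """
    darcs = repo / "_darcs"
    # Step 1: write the state that older darcs versions read.
    for rel, content in legacy_files.items():
        atomic_write(darcs / rel, content)
    # Step 2: the write of the branch file is the actual commit point.
    atomic_write(darcs / "branches" / branch_name, branch_data)
```

Writing the legacy files first means that a crash between the two steps
leaves the branch file, and thus the authoritative state, unchanged.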

We could even allow write access. This requires a way to reliably detect
whether an old version of darcs has modified e.g.
_darcs/hashed_inventory (or one of the other legacy files mentioned
above). I would rather not rely on timestamps for that; instead we could
store the hash sum of the old files in addition to the new ones. It
seems to be only a small extra effort to do that. This means we can
avoid adding a new format altogether, and instead automatically
"upgrade" the repo whenever we detect modification of a legacy file.

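The hash-based detection of legacy modifications might look roughly
like this (Python, for brevity). The choice of SHA-256, the function
names, and where the recorded digests would be stored are all
assumptions for illustration; the list of legacy files is the one
mentioned above.

```python
import hashlib
from pathlib import Path

# Legacy files named in this mail, relative to _darcs.
LEGACY_FILES = [
    "hashed_inventory",
    "patches/pending",
    "rebase",
    "patches/unrevert",
]


def file_digest(path: Path):
    """Return the SHA-256 hex digest of a file, or None if it is absent."""
    if not path.exists():
        return None
    return hashlib.sha256(path.read_bytes()).hexdigest()


def legacy_digests(repo: Path) -> dict:
    """Digests to record alongside the new hashes when a transaction ends."""
    return {rel: file_digest(repo / "_darcs" / rel) for rel in LEGACY_FILES}


def modified_legacy_files(repo: Path, recorded: dict) -> list:
    """List legacy files whose content differs from the recorded digests."""
    current = legacy_digests(repo)
    return [rel for rel in LEGACY_FILES if current[rel] != recorded.get(rel)]
```

A non-empty result would be the trigger for the automatic "upgrade":
re-read the legacy state and rebuild the branch file from it.
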
Directly exchanging patches via 'darcs pull' amounts to read-only
compatibility: to pull patches from a non-branched into a branched repo
we have to be able to read the old repo state; and to allow old darcs
versions to pull patches from a branched repo requires that we maintain
the non-branched state as well as the branched one. Supporting 'darcs
push' to branched repos on the same machine amounts to full
compatibility: in that case the 'darcs apply' is done using the same
darcs version as the one used for the 'darcs push', i.e. one that may
not know anything about branched repos.

So far so good. Regardless of whether we aim for full compatibility or
read-only compatibility, these are the downsides:

(1) We have to continue to maintain (and invoke) large amounts of legacy
    code during the transition period.

(2) Users may never upgrade their repos to darcs-3 because they don't
    see enough practical benefit to outweigh the (perceived or real)
    risk.

(3) It is difficult to systematically test against regressions, since we
    cannot currently invoke a legacy darcs version from the test suite.

Cheers
Ben
-- 
I would rather have questions that cannot be answered, than answers that
cannot be questioned.  -- Richard Feynman