[darcs-devel] Re: A darcs that can pull from git

David Roundy droundy at abridgegame.org
Thu Apr 28 05:42:35 PDT 2005


On Wed, Apr 27, 2005 at 04:22:53PM +0200, Juliusz Chroboczek wrote:
> There are three different handles on a patch.  One is the tuple (date,
> author, name), which I'll call the ``user-visible handle''.  One is
> the patch id, which is an abstract blob of data that is only compared
> for equality.  And one is the patch filename, which is only compared
> for equality and has the additional property of being in a format that
> can be written to the filesystem.

[...]

I'm not sure that allowing different patches with the same "user-visible
handle" is a worthy goal.  How would the user differentiate between the
two? Obviously to *some* extent, the "patch id" must be user-visible, or
our users would be simply stuck.  I hope you'll also agree that the
"user-visible handle" must be immutable.  Given that the "patch id" must
be user-visible, and the "user-visible handle" must be immutable, why not
add the "patch id" to the "user-visible handle", and vice versa, and have a
single blob which is both the "user-visible handle" and the "patch id"?
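
To make the "single blob" idea concrete, here is a minimal sketch in Python.
It is not how darcs stores patch info; the class name, the field names, the
length-prefixed encoding and the choice of SHA-1 are all assumptions made
for illustration.  The point is only that the identity is derived from
every user-visible field, and that the record is immutable.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: the handle cannot change after recording
class PatchInfo:
    """Hypothetical record that is both the user-visible handle and the id."""
    date: str
    author: str
    name: str
    log: tuple = ()  # long-comment lines, including any extra id line

    def patch_id(self) -> str:
        # Hash every field with a length prefix, so that no two distinct
        # field sequences can produce the same input to the hash.
        h = hashlib.sha1()
        for field in (self.date, self.author, self.name, *self.log):
            data = field.encode("utf-8")
            h.update(str(len(data)).encode("ascii") + b":" + data)
        return h.hexdigest()
```

With this shape, two patches that agree on date, author and name but differ
in one comment line still get different ids, and the user can see why.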

> > In the case of git, the question would be whether there could end up in
> > git a pair of commits that are converted to identical darcs patch IDs,
> > but are not themselves identical.
> 
> Because of the mismatch between the Darcs and Git models, I'm making some
> arbitrary decisions; most notably, when I see a Git commit with multiple
> parents, I arbitrarily choose one of the parents as ``the'' predecessor
> of the Darcs patch.

Argh.  I don't like this at all.  (Switching subjects completely to the
technicalities of the git-darcs translation.)  Does this mean that you
aren't encoding the entire tree of the git history? Do both parents end up
being encoded?

What I've envisioned (in my crazy-tagging scheme) is that the conversion
would be equivalent to the following (for commit C with parents P1 and P2).

1. We must have already converted parents P1 and P2.  If not, go back and
convert them first.

2. darcs get -t P1 converted converted-temp

3. cd converted-temp
   darcs pull -t P2 ../converted

4. echo y | darcs revert -a (in case there were conflicts)

5. rm -rf everything but _darcs

6. git get C (this being a very crude paraphrase)

7. darcs record -a -m C-name

8. darcs tag -m C

9. cd ../converted
   darcs pull -a ../converted-temp

This would mean that the darcs repository could reproduce any state in the
(branched) history of the git repository, and would have all the parent
information.  The catch (and there are many) is that it might be slow, and
isn't simple either.  But I don't like the idea of having to "pick" one of
the two parents as being special.  Of course, you need to do one of them
first, but that's reversible by simple commutation.
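
Step 1 above ("go back and convert them first") amounts to walking the git
commit graph parents-first.  A rough sketch of just that ordering, in
Python: the `parents` mapping is an assumed input (how it is read out of
git is not shown), and the function name is made up for this example.

```python
def conversion_order(parents, head):
    """Return the commits reachable from head, each after all its parents.

    parents maps a commit name to the list of its parent commit names.
    """
    order, done = [], set()
    stack = [(head, False)]
    while stack:
        commit, parents_done = stack.pop()
        if commit in done:
            continue
        if parents_done:
            # Every parent has been emitted, so this commit can be converted.
            done.add(commit)
            order.append(commit)
        else:
            # Revisit this commit once its parents have been handled.
            stack.append((commit, True))
            for p in reversed(parents.get(commit, ())):
                if p not in done:
                    stack.append((p, False))
    return order
```

For a commit C with parents P1 and P2 that share an ancestor R, this gives
R, P1, P2, C, so steps 2 through 9 can then be run once per commit in that
order.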

> > What if we were to add (when recording or tagging) an extra
> > bit at the end of a long comment of the form:
> 
> > darcs-extra-id: adsagagdhhdlkd
> 
> > We could then make newer versions of darcs hide this ID in the user
> > interface (or optionally show it, and perhaps allow searching by
> > it).
> 
> That's not useful, unless the extra ID can be efficiently found in the
> inventory.  It's also much more confusing than the simple scheme of
> making ids opaque.

Okay, I'm confused now.  How is it either not useful or confusing?  I'm
thinking perhaps I was unclear.  I don't mean that this extra-id will be
treated as special in any sense except that it isn't displayed to the user,
so there's never a need to "find" it in the inventory.  I'd be keeping the
equivalence of the first two IDs, the user-visible one and the one that
must be unique (i.e. not the filesystem one), except that I'd make one line
not actually user-visible.

You could stick here

git-parent: adggdh

or whatever you like.
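
As a sketch of what "not displayed to the user" could mean in practice,
the following Python splits a long comment into the lines that are shown
and the trailing `key: value` lines that are hidden.  The function name
and the rule "only lines at the very end count" are assumptions; the two
keys are simply the ones used as examples in this thread.

```python
HIDDEN_KEYS = ("darcs-extra-id", "git-parent")  # keys from the examples above


def split_long_comment(comment):
    """Split a long comment into (visible_lines, hidden key -> value)."""
    lines = comment.splitlines()
    hidden = {}
    # Walk backwards: only lines at the very end of the comment qualify.
    while lines:
        key, sep, value = lines[-1].partition(": ")
        if not sep or key not in HIDDEN_KEYS:
            break
        hidden[key] = value
        lines.pop()
    return lines, hidden
```

Nothing here looks the id up in the inventory.  The full comment, hidden
lines included, stays part of the patch's identity, and only the display
code drops the trailing lines.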

> And now, back to our programmed nit-picking.
> 
> > E.g. mv patches or replace patches.
> 
> I have no doubt that the Git people will evolve a standard way of
> representing mv within Git, so I'll rather wait until that happens.

The prevailing opinion (Linus') seems to be that there's no need to
represent mv within git.

> I think that if we want to be able to push to arbitrary VC systems (as
> I've mentioned, it's not Git itself that I'm dreaming about), we need
> a way to discard Darcs-specific information -- replace, mergers,
> setpref patches are not likely to be representable in alien repository
> types.

Linus has expressed willingness to allow something like

darcs changes:
mv foo bar

at the end of the log message, provided the content is human-readable and
interesting to developers, and I'd say we can just go with that way of
representing certain patches.  Patches like setpref or replace are
definitely easy enough to handle this way.  I would like it to be possible
that darcs is the "main" repository, and the git users are the "clients".
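
A minimal sketch of that log-message convention, in Python.  Only the
"darcs changes:" marker and the "mv foo bar" line come from the example
above; the function names, the blank line before the marker, and the rule
that the last marker line wins are assumptions.  The line format for
replace or setpref is deliberately left open.

```python
MARKER = "darcs changes:"


def append_darcs_changes(message, changes):
    """Return message with a human-readable block of changes at the end."""
    if not changes:
        return message
    return message.rstrip("\n") + "\n\n" + MARKER + "\n" + "\n".join(changes) + "\n"


def extract_darcs_changes(message):
    """Return (body, changes); changes is [] when there is no marker."""
    lines = message.rstrip("\n").split("\n")
    # Search from the end so that the final marker line is the one used.
    for i in range(len(lines) - 1, -1, -1):
        if lines[i] == MARKER:
            body = "\n".join(lines[:i]).rstrip("\n")
            return body, lines[i + 1:]
    return message.rstrip("\n"), []
```

Pushing to git would call the first function, and pulling back into darcs
would call the second, so the mv survives the round trip even though git
itself does not record it.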

For mergers see below.

> With the arbitrary patch ids outlined above, that's easy: you take the
> Darcs patch, compute its flattening (merger_equivalent and friends),
> generate a new patch id -- without changing the user-visible handle
> --, and push the result.  Of course, if that patch gets pulled back
> into Darcs you'll get a merge conflict, but what did you expect?

Here I strongly disagree.  I don't want to lose *any* information when
passing a change from darcs to git, and certainly we shouldn't get a
conflict when pulling back again!

If there are conflicts in a patch, then it's always possible to commute it
back such that the conflicts disappear, and then insert it into the git
history at that stage.  This isn't easy, but at the first stage we can
simply refuse to push to git patches that we don't yet know how to
translate.
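
That first stage could look roughly like the Python below.  This is only
a sketch: the kind labels, the exception name and the
`(name, kinds)` representation of a patch are all invented for the
example, not taken from darcs.

```python
class UntranslatablePatch(Exception):
    """Raised when a patch contains something we cannot yet send to git."""


# Hypothetical labels for the kinds of change we already know how to push.
KNOWN_KINDS = frozenset({"hunk", "addfile", "rmfile", "move", "replace", "setpref"})


def check_pushable(patches):
    """Return the patch names in order, or raise on the first bad patch.

    patches is a list of (name, kinds) pairs, where kinds lists the kinds
    of change found inside that patch.
    """
    for name, kinds in patches:
        unknown = sorted(set(kinds) - KNOWN_KINDS)
        if unknown:
            raise UntranslatablePatch(name + ": cannot translate " + ", ".join(unknown))
    return [name for name, _ in patches]
```

A patch containing a merger would be rejected here until the commuting
trick described above is implemented.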

If we knew at record time that we were planning on pushing to git later, we
could tag after each record, and by pushing those tags we could ensure that
we never had to push a merger.  I wouldn't mind this constraint on
git-interoperable repositories.
-- 
David Roundy
http://www.darcs.net



