[darcs-users] Colin Walters blogs on Arch changesets vs Darcs

Andrew Pimlott andrew at pimlott.net
Sat Nov 20 00:16:20 UTC 2004

On Fri, Nov 19, 2004 at 03:47:19PM +0000, Mark Stosberg wrote:
> You may be interested in this thoughtful blog entry by Colin Walters,
> explaining how Arch changesets work compared to darcs, and his thoughts
> on the matter:
> http://verbum.org/blog/freesoftware/orthogonal-changesets
> He prefers the Arch system, and gives a valid 'use case' where he
> thinks it would be preferable. 
> Anyone care to follow up with their own blog entry, highlighting
> some of the unmentioned benefits of the darcs changeset design? :)

I will follow up here and leave it to someone else to publish the list
archives as a blog.  :-)

On Thu, 18 Nov 2004, Colin Walters wrote:
> I mentioned in my previous blog entry about revision control that I
> thought that the Arch model of changesets which are independent of
> project history is crucial. But why is that?
> And as I mentioned before, an Arch changeset is basically just a
> super-patch that handles binary files and renames. If projects include
> just a bit of constant-sized metadata in their tarballs, the logical
> file identity, you can run tla mkpatch old-tree new-tree to generate a
> changeset between those two trees. You do not need access to the Arch
> repository.

This could be done with darcs:  Publish the "darcs changes --context" in
the tarball, and create a darcs command that records and sends a patch
based on two trees.  Essentially, it would trust that the context file
is accurate and the old-tree has not been modified, and treat the
old-tree like _darcs/current.  To make this robust, darcs should be more
defensive when applying patches than (I believe) it currently is:  It
should verify that the removed lines in hunks match the current file
contents.  Other than that, I don't see any big issues.  (In particular,
I don't see a problem with renames, as Colin suggested.  Of course, you
have to tell the command what you've renamed, but I don't know how arch
could avoid that.)
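
As a rough illustration of the defensive check described above, here is
a minimal Python sketch.  The names Hunk, HunkMismatch and apply_hunk
are made up for this example; this is not how darcs represents or
applies patches internally, only the idea of refusing a hunk whose
removed lines do not match the file.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hunk:
    """Hypothetical hunk: at 0-based line `start`, replace `old` with `new`."""
    start: int
    old: List[str]
    new: List[str]


class HunkMismatch(Exception):
    """The lines the hunk claims to remove are not what the file contains."""


def apply_hunk(lines: List[str], hunk: Hunk) -> List[str]:
    """Apply `hunk` to `lines`, verifying the removed lines first."""
    if hunk.start < 0 or hunk.start > len(lines):
        raise HunkMismatch(f"hunk starts at line {hunk.start}, outside the file")
    end = hunk.start + len(hunk.old)
    # The defensive step: the removed lines must match the current contents.
    if lines[hunk.start:end] != hunk.old:
        raise HunkMismatch(f"removed lines do not match at line {hunk.start}")
    return lines[:hunk.start] + hunk.new + lines[end:]
```

With this check, a patch generated against a modified old-tree (or an
inaccurate context file) fails loudly instead of silently corrupting
the file.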

It admittedly goes somewhat against the grain of darcs.  Furthermore, it
assumes the user has darcs installed (just as the quoted example assumes
the user has tla installed), which raises the question of why he didn't
use darcs to get the code in the first place.  (And you can do a darcs
get --partial to avoid downloading the entire history.)  Nevertheless, I
think we can have this feature if people want it.  I don't see it as a
reason for preferring arch's less strict notion of patches.

> Consider, for example, the Emacs/XEmacs fork. According to the XEmacs
> history, they forked from an early version of Emacs 19. A bit of
> research on the web places that around 1992. Let's say that RMS and
> the XEmacs leaders get together and decide that forking sucks, and
> decide to merge. Suppose further that Emacs had been using Arch at the
> time of the fork, and the XEmacs people created their fork by simply
> creating a tag from the Emacs archive into their own archive (as would
> be the sensible way to do it).
> In order for the XEmacs people to merge these changesets, this is as
> simple as doing: tla replay $changeset. That goes out to the Emacs
> Arch archive, and retrieves a single changeset tarball, and adds it to
> the XEmacs branch. This merging style is history-sensitive, because
> now the XEmacs branch records that e.g.
> emacs@savannah.gnu.org--2000/emacs--main--21--patch-347 has been
> applied. Note that crucially, one did not need to download all of the
> thousands of changesets gathered in the just 4 years of history since
> ibuffer.el was created, not to mention the 14 years of Emacs history
> since the fork.

With darcs, the XEmacs people would say

    darcs pull --patch 'some patch to ibuffer.el' http://darcs.gnu.org/...

And the darcs patch would probably be more likely to merge correctly (if
there are no conflicts) due to its more strict notion of patches.  (The
darcs command might still be running when our sun dies, but that's just
a performance detail.)  However ...

> In Darcs, this would, as far as I can tell, not work. The reason is
> that in order to correctly merge these individual changesets, you
> would require access to the entire history at once (in memory, no
> less!). That's because Darcs needs that history in order to correctly
> reorder patches and infer renames.

If I understand correctly, your basic point is correct, though it is not
quite that bad.  With the above command, darcs would have to download
all of the patches (or at least all of them up to "some patch") not in
the current XEmacs repo, i.e. all since the branch.  And it would indeed
suck lots of memory commuting them (again, just the patches since the
branch, but on both forks); however, this again is a performance detail.
(I say this jokingly because I do believe that the performance issues
can and will be solved, and they don't bite me hard; but if you need
performance now or can't take the risk that they never get solved, darcs
may not be for you yet.)
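
For readers unfamiliar with the term, "commuting" two patches means
reordering them so that the combined effect is unchanged.  The Python
sketch below shows the idea for two line-based hunks that do not
overlap.  It is a deliberately simplified model with made-up names
(Hunk, apply_hunk, commute), not darcs's actual algorithm, which deals
with many more patch types and cases.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass
class Hunk:
    """Hypothetical hunk: at 0-based line `start`, replace `old` with `new`."""
    start: int
    old: List[str]
    new: List[str]


def apply_hunk(lines: List[str], hunk: Hunk) -> List[str]:
    """Apply `hunk`, refusing if the removed lines do not match."""
    end = hunk.start + len(hunk.old)
    if lines[hunk.start:end] != hunk.old:
        raise ValueError("hunk does not match file contents")
    return lines[:hunk.start] + hunk.new + lines[end:]


def commute(p: Hunk, q: Hunk) -> Optional[Tuple[Hunk, Hunk]]:
    """Given p followed by q, return (q2, p2) with the same combined
    effect when applied in the order q2, p2.  Return None if they overlap."""
    p_delta = len(p.new) - len(p.old)  # how far p shifts the lines after it
    q_delta = len(q.new) - len(q.old)  # how far q shifts the lines after it
    if q.start >= p.start + len(p.new):
        # q lies after the region p wrote: undo p's shift on q.
        return replace(q, start=q.start - p_delta), p
    if q.start + len(q.old) <= p.start:
        # q lies before p: once q goes first, p is shifted by q's change.
        return q, replace(p, start=p.start + q_delta)
    # q touches lines that p introduced, so q depends on p.
    return None
```

Pulling one patch out of a long history amounts to repeating this step
against every intervening patch, which is where the memory and time go.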

What I think this really calls for is a smart darcs server that can do
all the necessary commuting on darcs.gnu.org, and send just the
appropriate representation of "some patch".  I think this will happen
some day.  Until then, I think you're right that darcs loses in this
case.  (I'm not convinced this should be a show-stopper, since if you're
going to be doing a lot of merging, downloading the gnu.org repo is a
modest price.)

I hope someone will correct any mistakes in the above.

