[darcs-users] Colin Walters blogs on Arch changesets vs Darcs

Mon Nov 22 08:48:58 UTC 2004

On Fri, Nov 19, 2004 at 09:53:15PM -0500, Colin Walters wrote:
> On Fri, 2004-11-19 at 19:16 -0500, Andrew Pimlott wrote:
> 
> Well, as I mentioned, the changeset functionality could fairly easily be
> broken out into a separate program.  I think that would be a useful
> thing to do.  For example, if the Linux kernel added arch-tag headers to
> their files, then you could use 'tla mkpatch' to create a changeset to
> send to them, *even though they still use Bitkeeper*.  They could still
> apply that changeset, and to Bitkeeper, it wouldn't be any different
> than them applying a regular GNU patch, and manually adding in e.g. a
> firmware binary file that would have to be a separate attachment before
> changesets.

Ok, this usage is probably outside what darcs can or should do.  There
will probably always be a place for a "better patch".

> > (And you can do a darcs
> > get --partial to avoid downloading the entire history.)  
> 
> Sure; but as I understand the theory of patches, the lack of a logical
> file identity concept means that if a number of renames have occurred
> upstream since the .tar.gz release, when they receive this Darcs patch,
> it could happen to apply to a totally different file that was moved into
> the same name (a good example would be Makefile.am).  Either that, or
> the patch completely fails to apply.

I think another poster answered this, but let me stress:  A darcs patch
cannot fail to apply (assuming its context is in the target repository)!
The patch algebra ensures that when applied, the patch (actually the
patch representation) specifies the precise files and lines to
modify--no fuzz!  For example, when a patch to file A commutes with a
patch renaming A to B, the patch (representation) changes to refer to
file B.
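
To make the renaming case concrete, here is a minimal sketch of that one
commutation rule.  It is not darcs's actual code (darcs is written in
Haskell); the names Hunk, Move and commute_hunk_past_move are hypothetical
and exist only for this illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Hunk:
    """A line-level change to one file (hypothetical, simplified)."""
    path: str
    line: int
    old: tuple
    new: tuple


@dataclass(frozen=True)
class Move:
    """A rename of src to dst (hypothetical, simplified)."""
    src: str
    dst: str


def commute_hunk_past_move(hunk, move):
    """Given the sequence (hunk, move), return (move, hunk2) with the
    same overall effect, where hunk2 applies after the rename."""
    if hunk.path == move.dst:
        # The hunk edits a file already sitting at the rename target,
        # so the two patches cannot simply be reordered.
        raise ValueError("hunk and move do not commute")
    if hunk.path == move.src:
        # Same edit, but it must now name the file by its new path.
        return move, replace(hunk, path=move.dst)
    # Unrelated file: the hunk's representation is unchanged.
    return move, hunk


edit_a = Hunk(path="A", line=3, old=("foo",), new=("bar",))
rename = Move(src="A", dst="B")
moved_first, edit_b = commute_hunk_past_move(edit_a, rename)
print(edit_b.path)
```

Only the path changes; the line number and contents of the hunk are carried
over exactly, which is why no fuzz is needed.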

Of course, this requires all the intervening history.  That is
fundamental to darcs.

> > > In Darcs, this would, as far as I can tell, not work. The reason is
> > > that in order to correctly merge these individual changesets, you
> > > would require access to the entire history at once (in memory, no
> > > less!). That's because Darcs needs that history in order to correctly
> > > reorder patches and infer renames.
> > 
> > If I understand correctly, your basic point is correct, though it is not
> > quite that bad.  With the above command, darcs would have to download
> > all of the patches (or at least all of them up to "some patch") not in
> > the current XEmacs repo, ie all since the branch.  
> 
> That's what I thought; you need all 14 years of history since the fork.

Oops, I think I slipped into assuming that 14 years was the pre-fork
history.  I'm not used to projects as old as Emacs!  So your point is
exactly correct.

> So in Darcs, I would assume that the "horizon" means that patches before
> that are simply dropped?

The notion of dropping history is conceptually incompatible with darcs.
However, you can exclude the patch files themselves from a repo (this is
the checkpoint and partial get features).  This makes operations that
need those patches fail, which usually is not a practical problem.  It
is assumed that somebody keeps around the full history. :-)

> I think that characterizing this as simply "performance" is a bit
> disingenuous.

It's a defense mechanism from being on this list too long.  These are
hard problems and I wish I had more time to work on them.

> > What I think this really calls for is a smart darcs server that can do
> > all the necessary commuting on darcs.gnu.org, and send just the
> > appropriate representation of "some patch". 
> 
> I'm surprised you didn't call me on this, but I realized after I posted
> that there's an obvious question - how do the XEmacs people determine
> which revisions are applicable to ibuffer.el in the first place, without
> traversing the Emacs history anyways?  I think the answer to that would
> have to be some sort of smart server, so Arch doesn't have a magic
> bullet for the initial problem either.

Good catch. :-)

> But for Arch, determining revisions applicable to a file is a problem
> that is bounded by the history *relating to that file*, whereas in
> darcs, it's bounded by the total size of history (in order to do renames
> correctly, right?).

I'm not sure quite what you mean.  It seems like (if you don't have any
fancy indices) you just have to scan linearly through the patches
looking for ones that affect the relevant file.  With darcs, you just
have to keep track of that file's current name.  Maybe I misunderstand
your example.
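
Roughly what I have in mind, as a sketch only: walk the history from newest
to oldest and update the name whenever a rename is crossed.  The history
format here (an oldest-first list of (patch_name, changes) pairs, with
changes tagged "add", "hunk" or "move") is an assumption made up for the
example, not how darcs stores patches.

```python
def patches_touching(history, filename):
    """Return the names of patches affecting the file that is called
    `filename` at the end of `history`, oldest first.

    history: oldest-first list of (patch_name, changes); each change is
    ("add", path), ("hunk", path) or ("move", src, dst).
    """
    name = filename  # the file's name at the point we are looking at
    touched = []
    for patch_name, changes in reversed(history):
        hit = False
        born = False
        for change in reversed(changes):
            kind = change[0]
            if kind == "hunk" and change[1] == name:
                hit = True
            elif kind == "move" and change[2] == name:
                # Before this rename the file was known as `src`.
                name = change[1]
                hit = True
            elif kind == "add" and change[1] == name:
                # The file did not exist before this change; stop here.
                hit = born = True
                break
        if hit:
            touched.append(patch_name)
        if born:
            break
    touched.reverse()
    return touched


history = [
    ("p1", [("add", "Makefile.am"), ("hunk", "Makefile.am")]),
    ("p2", [("move", "Makefile.am", "old/Makefile.am")]),
    ("p3", [("add", "Makefile.am"), ("hunk", "Makefile.am")]),
    ("p4", [("hunk", "old/Makefile.am")]),
]
print(patches_touching(history, "old/Makefile.am"))
print(patches_touching(history, "Makefile.am"))
```

The scan is linear in the number of patches examined, so without an index
its cost does grow with the total history rather than with the history of
the one file.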

But there will inevitably be cases where darcs doesn't scale as
well.  I think darcs is truly an experiment in this regard, but a very
promising one.

> In Arch, you just do:
> 
> modified = []
> for archive in archives:
>     for changeset in changesets(archive):
>         if changeset.modifies(filename):
>             modified.append(changeset.name)

Watch it, we don't take kindly to mutation around these parts!

Andrew