[darcs-users] metarepositories and disk space

Scott L. Burson Gyro at zeta-soft.com
Tue Sep 7 02:17:18 UTC 2004


On Sun August 29 2004 04:20 am, David Roundy wrote:
> On Sun, Aug 22, 2004 at 10:58:48PM -0700, Scott L. Burson wrote:
> > Thus there are several kinds of information that are of interest:
> >
> > () What patches form the difference between two configurations.  Darcs
> > might be able to compute this for two configurations in the same line,
> > but can't do it between branches, since it has no representation for
> > multiple branches.
>
> Yes, it can.  It can compare any two darcs repositories with common
> history.  Multiple branches are represented as multiple repositories, so
> there's no reason you couldn't compare them.  Try darcs push/pull
> --dry-run.
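
A minimal sketch of that comparison, assuming each branch is an ordinary darcs repository directory and that `darcs pull --dry-run` and `darcs push --dry-run` only list patches without transferring them. The helper names (`dry_run_command`, `compare`) and the injectable `run` parameter are illustrative, not part of darcs; the report text is returned raw because its format may differ between darcs versions.

```python
import subprocess


def dry_run_command(direction, other_repo):
    """Build the argv for a darcs dry run; nothing is transferred."""
    if direction not in ("push", "pull"):
        raise ValueError("direction must be 'push' or 'pull'")
    return ["darcs", direction, "--dry-run", other_repo]


def compare(branch_dir, other_repo, run=subprocess.run):
    """Return (would_pull, would_push) as raw darcs output.

    would_pull lists patches other_repo has that branch_dir lacks;
    would_push lists the reverse. `run` is injectable (hypothetical
    parameter) so the function can be exercised without darcs installed.
    """
    reports = []
    for direction in ("pull", "push"):
        result = run(
            dry_run_command(direction, other_repo),
            cwd=branch_dir,
            capture_output=True,
            text=True,
        )
        reports.append(result.stdout)
    return tuple(reports)
```

Calling `compare("branch-a", "../branch-b")` from a script would give the two one-way differences between the branches.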

Okay, point taken, but...

> > () What configurations contain a given patch.  Again, darcs can't compute
> > this across branches.
>
> True, but since darcs can tell you for any given branch whether it has the
> given patch, all you'd need to do is call darcs on all your branches...
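
Calling darcs on every branch could be sketched as below. This assumes a darcs that accepts `changes --patches REGEXP` and prints the names of matching patches; the patch name is passed as the regular expression unescaped, so names containing regex metacharacters would need extra care. `branch_has_patch` and `branches_containing` are hypothetical helper names.

```python
import subprocess


def branch_has_patch(branch_dir, patch_name, run=subprocess.run):
    """Ask darcs whether the repository in branch_dir has the patch.

    Assumption: matching patch names appear verbatim in the output of
    `darcs changes --patches`, and nothing containing the name is
    printed when there is no match.
    """
    argv = ["darcs", "changes", "--patches", patch_name]
    result = run(argv, cwd=branch_dir, capture_output=True, text=True)
    return patch_name in result.stdout


def branches_containing(branch_dirs, patch_name, has_patch=branch_has_patch):
    """Return the branches (in the given order) that contain the patch."""
    return [d for d in branch_dirs if has_patch(d, patch_name)]
```

The `has_patch` argument exists only so the loop can be tried against an in-memory table of branches before pointing it at real repositories.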

Again, point taken, but...

> Of course, if you are set up for
> write access, two-way synchronization is also pretty much trivial to
> script.  If it can be done right in under two hundred lines of python or
> perl (take your pick), I figure it doesn't need to be done in darcs.

... but what if many users find themselves writing or wanting to write those 
two hundred lines?  I guess that hasn't happened yet, and it's not clear that 
it will -- but personally I think it's likely, eventually.
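
For concreteness, the core of such a script might look like the sketch below. It assumes write access to the other repository and relies on the `--all` flag of `darcs pull` and `darcs push` to skip the interactive patch selection; conflict handling, which is where most of the two hundred lines would presumably go, is left out. The function names are illustrative.

```python
import subprocess


def sync_commands(other_repo):
    """Commands for a two-way sync: fetch everything, then send everything."""
    return [
        ["darcs", "pull", "--all", other_repo],
        ["darcs", "push", "--all", other_repo],
    ]


def two_way_sync(branch_dir, other_repo, run=subprocess.run):
    """Run pull then push from branch_dir; stop at the first failure.

    `run` is injectable (hypothetical parameter) for testing without darcs.
    """
    for argv in sync_commands(other_repo):
        run(argv, cwd=branch_dir, check=True)
```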

> Furthermore, you suggest that the most useful primitive would be two-way
> synchronization between "public repo collections".
> Of course, this is only possible if you have write access to both repos,
> which rarely happens in distributed development.

In the system I envision there's no reason to deny anyone the necessary access 
for two-way synchronization, except that it opens one to a denial-of-service 
attack by someone transferring huge masses of patches -- which can be blocked 
in other ways.  Since synchronization only adds information, and never 
deletes or alters it, nothing can be lost and no harm can be done, and the 
state of the local tree remains under the control of its user, unless s/he 
specifically cedes that control to a specific other repo (and even then, 
recovery from any damage -- a patch, say, that should not have been published 
-- is trivial).
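
To make the "only adds information" claim concrete, here is a toy model, not darcs code. It assumes each repo keeps a set of patches it knows about separately from the set it has applied to its working tree; the `Repo` class and `synchronize` function are hypothetical names for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    known: set = field(default_factory=set)    # every patch this repo has seen
    applied: set = field(default_factory=set)  # patches the user chose to apply


def synchronize(a, b):
    """Two-way sync: both repos end up knowing the union of patches.

    Nothing is removed from either side, and `applied` is never touched,
    so the state of each local tree stays under its owner's control.
    """
    everything = a.known | b.known
    a.known = set(everything)
    b.known = set(everything)
    return everything
```

Because the operation is a set union, repeating it changes nothing, and an unwanted patch can simply be left unapplied.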

> > To me this actually simplifies the darcs user interface, because no
> > decisions have to be made at repo synchronization time.  I don't have to
> > fiddle with patches whose names match a particular regexp, or anything
> > like that -- I just get everything.  I don't have to worry much about the
> > bandwidth or disk space involved, because no information is transmitted
> > more than once or copied more than once locally (I can choose to have
> > multiple local copies for my own reasons, of course, but darcs won't
> > impose this on me).  I don't even have to decide in advance which
> > configurations I might be interested in -- the configuration definitions
> > and associated patch sets are likely to be small enough (since
> > programmers can create patches only at a certain rate) that I don't have
> > to worry about them.  And once I have everything, I can (but don't have
> > to) look at all of it in as much detail as I wish before deciding what to
> > use.
>
> It seems that here you come back to your real point, which is bandwidth.

No, I'm still really talking about user-interface complexity.  The bandwidth 
point is subordinate: it's simpler if the user doesn't have to think about 
it.  But I think that's only one of several simplifications (which is why 
I've left my entire paragraph in).

> Each layer that is added means one more possibility for every command.
> Once you've added the "collection" layer, how do you compare branches in
> two different collections?

That wouldn't make sense; it would be like comparing branches in unrelated 
repos.

> You need commands to add branches to
> collections, to remove branches from collections.

It's not clear to me that you need either of these.  A branch is created 
within a collection rather than being added subsequently.  I don't really see 
why you'd want to remove it, or even exactly what that would mean for 
synchronization, but I can see that one might want to depublish an abandoned 
branch so as not to trouble other repo-owners with it.

I suppose you _could_ try to give branches an existence separate from 
collections, but even to me that seems to be pointless complexity.  It's not 
what I'm proposing.

Anyway, you've written a considered reply to my previous message, which I 
appreciate, and so I've tried to do the same, but I'm happy to drop this 
discussion if you're finding it tedious.  Alas, I haven't yet had the time to 
really get to know darcs as it is, which means this whole discussion is 
somewhat ungrounded on my end (though not entirely, since I do have experience 
with other SCM systems).  Give me a few months to get to know it, and perhaps 
it will make sense to return to this conversation, or perhaps I will agree 
there is no need.

-- Scott



