[darcs-users] Cheap branches (was: Re: post-1.0: "isolated directories"?)

David Roundy droundy at abridgegame.org
Sun Oct 17 18:11:23 UTC 2004


On Sun, Oct 17, 2004 at 05:36:24PM +1000, Anthony Towns wrote:
> > I like the fact that you have perl code to create patch names, etc.  In
> > general, perhaps a perl module to parse (and maybe apply, eventually)
> > darcs patches would be interesting.
> 
> Yeah, parsing seems necessary so you can work out when the "}" line is
> closing a patch rather than the contents of a "setpref" line or
> similar. And applying seems fairly easy too; possibly apart from the
> "merger/regrem" patches?

Indeed, everything but the mergers is pretty simple.  Unfortunately,
dealing with the mergers is quite important.  Someone has suggested that we
could include the equivalent patch for mergers, and I'm beginning to think
this would be a good idea.  If we did that, then although you'd need full
darcs to do merges and commutes, at least you'd be able to apply merger
patches without all of darcs' machinery.  On the other hand, perhaps
working towards a "darcs library" that could be used by programs in other
languages would be wiser, since then you'd *have* the full darcs machinery
in other languages.

> > > [0] The repo structure looks like:
> > > 	/srv/darcs-repo/
> > > 		{project name}/
> > Could we do without the {project name}? I don't think there's any reason to
> > break the darcs-repo into separate projects, each of which has its
> > branches.  In a sense, we could view all projects as being branches of the
> > null repository.  I think this would make things simpler, without any
> > significant cost.
> 
> Hrm, that's true. I kind-of like the idea of having separate repositories
> on a single server, though, just so I can keep track of things -- like
> say "du darcs-repo/foobar" to see how much space the foobar repo's taking
> up. Splitting them up by project seems about the right level to do that;
> since sharing patches across projects is pretty unlikely.
>
> > Also, if we allow the branch name to contain /'s (perhaps escaped), we
> > could perhaps emulate a full directory structure, so the user could have
> > the effect of branches by naming their projects foo/head, foo/unstable,
> > etc.
> 
> Yeah. For the moment it's a two-level thing -- you get "project foo"
> and "branch head"; if you want more levels you have to use hyphens or
> similar instead of slashes. *shrug* I prefer to start off limited and
> add features later :)

(I made the mistake of replying to the email before reading the entire
thing, and see that you make a similar suggestion below.  I won't cut this
section though, in case I had any good ideas...)

It just looks to me too much like starting off complicated, and there'll be
no hope of simplifying it later! I think if we can come up with a decent
design, we shouldn't be stuck with these limitations.  I fear that later,
when two levels turn out to be insufficient, a third level will be
added with a third name... and eventually we'll have something very
arch-like.  :(

Perhaps we could eliminate the project-name in favor of a base directory?
Or equivalently eliminate the /srv/darcs-repo as a distinct entity.  So
then when I do a push the pathname I use is still broken into two parts,
equivalent to your project name and branch name, but either or both of
these could have '/' in them.  The darcs-repo admin decides where the
distinction is, and to the user it just looks like there are a bunch of
repository directories.

So as a user (someone who pushes, but doesn't admin the server), I know to
push with

darcs push darcs-repo:theserver:foo/bar/baz/myproject

but I don't know (or care) whether foo is the "project name" and
"bar/baz/myproject" is the "branch name" or perhaps "foo/bar/baz" is the
project name and "myproject" is the branch name.

I guess supporting this general case isn't needed immediately, as long as
it's the sort of thing you're thinking when you're thinking of adding
features later--that is, simplifying the user interface by removing
restrictions.

> > > 			inventory-{branch name}
> > It looks like you only support here a flat inventory? This can be a major
> > slowdown as repositories get old (and have lots of patches), which is why
> > it's broken up into _darcs/inventory followed by
> > _darcs/inventories/{patchname}. 
> 
> Right, I don't understand this. Hrm, the manual doesn't seem to mention
> "inventories" anywhere; is there an english explanation of what's going
> on with that anywhere? I guess this is what happens when you
> "checkpoint"?

Ah, that's because I added the "inventories" after writing the docs (an
embarrassingly long time ago).  Basically, the problem was that every pull
needed to download the entire history, so once the history got long, pulls
were getting pretty slow.  So the solution was to break up
the inventory into a series of files.  This can be done in an efficient way
as long as the breaking point is on a tag, and as long as all patches
"before" that tag are included in the tag.  This means that breaking up
the inventory requires both reading and understanding the contents of tag
patches.

The same sort of logic is used when creating the context of a patch
bundle.  The context always starts with a tag, so that tag is effectively a
break point between inventory files.  If your _darcs/inventory isn't broken
up, you can break it up by running darcs optimize with no other options.
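
To make the rule concrete, here is a rough sketch in Python (not darcs'
actual code).  It assumes each inventory entry is just a name, and that a
tag carries the set of patch names it includes; the Entry class and
split_inventory function are made up for illustration.  A break is made
only at a tag that includes every entry before it, and the new file starts
with that tag.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One inventory entry; `covers` is only meaningful for tags."""
    name: str
    is_tag: bool = False
    covers: frozenset = frozenset()


def split_inventory(entries):
    """Split an ordered inventory into chunks, breaking only at clean tags.

    A tag is "clean" when it covers every entry that precedes it.  Each
    break closes the chunk of older entries and starts a new chunk with
    the tag itself.
    """
    chunks, current, seen = [], [], set()
    for entry in entries:
        # Break before a clean tag, unless there is nothing to close off.
        if entry.is_tag and seen <= entry.covers and current:
            chunks.append(current)
            current = []
        current.append(entry)
        seen.add(entry.name)
    chunks.append(current)
    return chunks
```

A tag that misses even one earlier patch is simply kept in the current
chunk, so the last chunk plays the role of _darcs/inventory and the earlier
ones the role of the files in _darcs/inventories/.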

Checkpoints are different: they store the actual contents of a repository,
and are stored in _darcs/checkpoints/.  They are created using darcs
optimize --checkpoint, and allow you to avoid downloading older patches at
the cost of not having the older history (although you'll still have the
patch names).

> > I don't think this is a fundamental problem, but needs to be done
> > right, since you can only break the inventory at tags, and only if the
> > tag contains all the patches prior to itself (poke me if I am unclear
> > here).
> 
> So, it sounds like this is what I want to be able to hook my
> "mainline--0.1" and "mainline--0.2" branches together, without bogging
> down in all the old history unless I decide I need it?

Well, one thing storing "inventories/" would gain you is that you'd save on
disk space for the inventory files as well as for the patches, since the
inventory will have a lot in common between various branches.  I think this
may be what you're saying, but I'm not sure...

> Hrm, maybe it'd make more sense to me if I asked this slightly
> differently. One of the DarcsWiki wishlists is "I wish the repo
> browser had a 'download tarball'"; I'd like my "darcs-repo" to be able
> to do that too, though obviously it's a way away from that now. So, say
> I have versions 0.1.1 - 0.1.17 and 0.2.1 - 0.2.45; where each version
> has a tag, and there are possibly many patches between each version. I
> don't mind having to download every patch since 0.1.1 to darcs get 0.1.17,
> but I don't want to have to get all those patches when I'm getting 0.2.1.
> 
> How would I structure this? As two branches? Or as a single branch,
> with a checkpoint at 0.2.1? If I do it as two branches (so I can later
> release a 0.1.18 if I need to), can I still get "darcs changes" to go
> all the way back to 0.1.1? If someone else adds a patch to 0.1.16 and I
> want to pull that forward into 0.2.46, can darcs magically get the patch
> for 0.1.17 so it can commute it up to 0.2.1 then all the way to 0.2.46?

Hmmm.  A checkpoint is what you want here, and whether you have two
branches or not really is an independent decision.  With a checkpoint you
won't have to download all the old patches, and you'll still have all the
old history... well, you'll still have the old inventories; the patches,
by design, you obviously won't have.  But you'll be able to convert your
repo to a "full" (as opposed to "partial" which is what I call a repo
gotten via a checkpoint) repository later if you decide you want to examine
the old history.

> Hrm, I took the opportunity to add
> 
>    http://www.scannedinavian.org/DarcsWiki/BestPractices
> 
> and included some babble. Maybe others would like to add questions
> or improve answers or just add some links to other threads that are
> enlightening?

Hmmm.  There's also a "best practices" section in the manual, which is
empty waiting for advice.  I think I may soon add a section or two in there
myself.

> > > 			patches/
> > > 				{original patch name (uncompressed)}
> > > 				{original patch name (uncompressed)}.{branch}
> > In my less-nested picture, there would be ambiguity about the original
> > patch name.  
> 
> Not really, it'd just mean doing, say:
> 
> 	/srv/darcs-repo/
> 		patches/
> 			{patch name}
> 		some/branch/name/
> 			inventory
> 			commuted-patches/
> 				{patch name}
> 		some/other/branch/
> 			inventory
> 			commuted-patches/
> 				{patch name}
> 				{patch name}
> 
> (The theory is you either just grab the first checked in patch of the
> right name; or else there's a branch specific diff that you apply to
> the first checked in patch to get what you need.)

I see, yes this is a better design.
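
As I read that layout, the lookup could be sketched roughly like this in
Python.  The locate_patch name is made up, and I'm assuming the file under
commuted-patches/ is the branch specific diff rather than a full patch.

```python
import os


def locate_patch(repo_root, branch, patch_name):
    """Return (shared_patch_path, branch_diff_path_or_None).

    The shared copy lives in patches/.  If the branch has an entry of the
    same name under commuted-patches/, the caller would apply it as a diff
    on top of the shared copy (an assumption about the layout above).
    """
    shared = os.path.join(repo_root, "patches", patch_name)
    diff = os.path.join(repo_root, branch, "commuted-patches", patch_name)
    return shared, (diff if os.path.isfile(diff) else None)
```

With no branch specific file, the second element is None and the shared
patch is used as-is.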

> But like I said, I do like having each project's patches physically
> separated.
> 
> Maybe you could instead have a file in /etc that maps branch-prefixes to
> different repositories? So you could have
> 
> 	/etc/darcs-repo.conf
> 		repo "foobar/experimental" = /srv/foobar/experimental/darcs
> 		repo "foobar" = /srv/foobar/darcs
> 		repo "aj" = /home/aj/darcs
> 		default-repo = /srv/darcs-repo/
> 
> and then say
> 
> 	darcs-repo get foobar/experimental/blah/mainline/0.1 inventory
> 
> or whatever you like, with darcs-repo automatically deciding whether
> that's part of the default repository, or the foobar repository, or the
> "foobar/experimental" repository, or whatever?
>
> Then if you want all your patches shared for all projects, you just set
> a default-repo, and ignore the other issues.

Sounds good, but I wonder if we could avoid the conf file and just have a
flag file at the head of each darcs-repo (which really should get some sort
of a clearer name).  Darcs does a similar search if you call darcs add or
something within a subdirectory, to find what repository you're talking
about.  It's a less flexible solution, but would make the "repository path"
map directly to the file path, which should make things simpler.  Symlinks
could always be used to allow the flexibility of putting the actual
darcs-repo directories wherever one likes.
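
A minimal sketch of that search, in Python for illustration.  The flag file
name ".darcs-repo" is purely hypothetical, as is the find_repo_root
function; the idea is just to walk up from the full path until a directory
holding the flag file is found, and to treat the rest as the branch name.

```python
import os

FLAG_FILE = ".darcs-repo"  # hypothetical marker name, not an existing convention


def find_repo_root(root, repo_path):
    """Return (repo_dir, branch_name), or None if no flag file is found.

    Searches from the deepest directory upward, so a nested darcs-repo
    wins over the one that contains it.
    """
    parts = [p for p in repo_path.split("/") if p]
    if ".." in parts:
        raise ValueError("repository path must not contain '..'")
    for i in range(len(parts), -1, -1):
        candidate = os.path.join(root, *parts[:i])
        if os.path.isfile(os.path.join(candidate, FLAG_FILE)):
            return candidate, "/".join(parts[i:])
    return None
```

Since os.path.isfile follows symlinks, a symlinked darcs-repo directory
would be found the same way as a real one.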

I imagine there may be options one would like to set on a per-darcs-repo
basis, such as (eventually) caching limits, since I imagine one might
benefit from adding caching to the cgi script.  On the other hand, I guess
a central conf file would still be needed to determine the root directory
of the darcs-repo path, so maybe your suggestion is better anyways.
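
For comparison, the conf file lookup might be sketched like this (Python
again, with the table hard-coded rather than parsed from
/etc/darcs-repo.conf).  The resolve function is made up, and two details
are my assumptions: the longest matching prefix wins, and the prefix is
stripped from the branch name.

```python
# Entries copied from the example darcs-repo.conf quoted above.
REPOS = {
    "foobar/experimental": "/srv/foobar/experimental/darcs",
    "foobar": "/srv/foobar/darcs",
    "aj": "/home/aj/darcs",
}
DEFAULT_REPO = "/srv/darcs-repo/"


def resolve(branch_path, repos=REPOS, default=DEFAULT_REPO):
    """Return (repository directory, remaining branch name).

    Prefixes are compared on whole path components, longest first, so
    "foobar/experimental/x" matches "foobar/experimental" before "foobar",
    and "foobarbaz/x" matches neither.
    """
    parts = branch_path.strip("/").split("/")
    for i in range(len(parts), 0, -1):
        prefix = "/".join(parts[:i])
        if prefix in repos:
            return repos[prefix], "/".join(parts[i:])
    return default, "/".join(parts)
```

So foobar/experimental/blah/mainline/0.1 would land in
/srv/foobar/experimental/darcs as branch blah/mainline/0.1, and anything
unmatched falls through to the default-repo.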

> Oh, the other reason I like having projects in different directories is
> so I can use permissions to control access.

Good point.

> > Might it be easier to use rcs for this? 
> 
> Hrm, diff/red are pretty easy; but rcs is pretty easy too. Though rcs
> is more designed for incremental changes, whereas I think darcs' usage
> is more along the lines of a variety of minor changes to an original,
> rather than cumulative changes?

Yeah, that's true.  Probably easiest to use diff.
-- 
David Roundy
http://www.abridgegame.org