[darcs-users] Cheap branches (was: Re: post-1.0: "isolated directories"?)

Tue Oct 19 08:41:49 UTC 2004

On Tue, Oct 19, 2004 at 02:36:59AM +1000, Anthony Towns wrote:
> On Sun, Oct 17, 2004 at 02:11:23PM -0400, David Roundy wrote:
> Unfortunately I find I can't push that to my darcs-repo repositories
> because I can't cope with the fact I tagged the 0.1 release. Pretty
> obvious what's next on the TODO :)

:)

> > Ah, that's because I added the "inventories" after writing the docs (an
> > embarrassingly long time ago).  Basically, there was a problem in that when
> > the history gets long, since every pull needed to download the entire
> > history, pulls were getting pretty slow.  So the solution was to break up
> > the inventory into a series of files.  This can be done in an efficient way
> > as long as the breaking point is on a tag, and as long as all patches
> > "before" that tag are included in the tag.  This means that breaking up
> > the inventory requires both reading and understanding the contents of tag
> > patches.
> 
> > The same sort of logic is used when creating the context of a patch
> > bundle.  The context always starts with a tag, so that tag is effectively a
> > break point between inventory files.  
> 
> Hrm, so is the rule "if the only patches that would follow a TAG in
> a context are the exact patches the TAG depends on, then don't bother
> listing them" ?

Right, except that it's really optional whether to bother listing them.
There's nothing wrong with listing all the patches in the repo.  It's just
wasteful.
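
A minimal sketch of that rule, not darcs' actual code. It assumes a repository is modelled as a list of patch names (oldest first) and that a hypothetical `tag_deps` table records which patches each tag depends on. The context is then the shortest suffix starting at a tag that covers everything before it.

```python
def trim_context(patches, tag_deps):
    """Return the shortest context for `patches` (oldest first).

    `tag_deps` maps each tag name to the set of patch names that tag
    depends on; both structures are illustrative assumptions.
    """
    # Walk backwards so the most recent usable tag wins.
    for i in range(len(patches) - 1, -1, -1):
        name = patches[i]
        # A tag is a valid starting point only if every patch listed
        # before it is already included in the tag.
        if name in tag_deps and set(patches[:i]) <= tag_deps[name]:
            return patches[i:]
    # No usable tag: list every patch (correct, just wasteful).
    return list(patches)
```

With `<a, b, TAG_1, c>` this yields `<TAG_1, c>`; with `<a, c, b, TAG_1>` the tag does not cover `c`, so the whole list is kept.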

> > If your _darcs/inventory isn't broken up, you can break it up by
> > running darcs optimize with no other options.
> 
> Hrm. If I have one tree that goes <a, b, TAG_1>, and another that goes
> <a, c>; and I pull from the former to the latter to get <a, c, b, TAG_1>;
> can optimize split up my inventory still? It doesn't seem to be able to?
> Does that matter? I guess not.

No, making optimize able to reorder patches has long been on the TODO list,
but it has never seemed sufficiently high priority.  There are plenty of
situations where it would be a real performance advantage to be able to do
so, but I've never really run into any of them.  I guess I have, but the
repo was small enough that darcs' speed didn't bother me even when it had
to do a lot of commutation for even the simplest operations.

> Anyway. So the inventory format then seems to be:
> 
> 	"Starting with tag:" [tag patch id]
> 	[any other inventory stuff]
> 
> and that's interpreted by converting the tag patch id into a filename in
> the usual way, and looking for it in _darcs/inventories; and... iterating
> or recursing? Can "Starting with tag:" appear anywhere but at the very
> top of an inventory? If it can't, that means it's iterative (easy!) not
> recursive (haskell...)

Correct.  The "Starting with tag:" can only happen at the beginning, so
it's pretty easy.
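
A sketch of the iterative walk, under simplifying assumptions. Each inventory is a list of already-parsed lines, a split inventory has the marker on its first line and the tag id on its second, and `older` is a hypothetical dict standing in for the files under _darcs/inventories. It also assumes the file stored for a tag ends with that tag. The real on-disk format is more involved.

```python
MARKER = "Starting with tag:"

def read_full_inventory(current, older):
    """Return all patch names, oldest first.

    `current` models _darcs/inventory; `older` maps a tag id to the
    inventory stored for that tag (illustrative stand-ins only).
    """
    chunks = []
    seen = set()
    lines = current
    # The marker can only appear at the top, so a plain loop suffices.
    while lines and lines[0] == MARKER:
        tag_id = lines[1]
        if tag_id in seen:
            raise ValueError("inventory chain loops at " + tag_id)
        seen.add(tag_id)
        chunks.append(lines[2:])   # patches recorded after the tag
        lines = older[tag_id]      # continue with the older inventory
    chunks.append(lines)           # oldest inventory has no marker
    # Chunks were collected newest first; emit them oldest first.
    return [name for chunk in reversed(chunks) for name in chunk]
```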

> Hrm, these inventories files can change quite a bit too. If I have, say:
> 
> 	repo1: <a, TAG_1, b, c, TAG_2>
> 	repo2: <a, TAG_1, b, d>
> 
> 	repo2: <a, TAG_1, b, d, c, TAG_2> (pull from repo1)
> 	repo2: <a, TAG_1, b, d, c, TAG_2, TAG_3> (tag)
> 
> 	repo1: <a, TAG_1, b, c, TAG_2, d, TAG_3> (pull from repo2)
> 
> then repo1 and repo2 have the same contents; but repo1's inventories
> have three optimized tag inventories which just have patches (d, TAG_3),
> (b, c and TAG_2) and (a and TAG_1); whereas repo2's inventories match
> the names of the (d, TAG_3) and (a, TAG_1) inventories from repo1,
> but the first one has (b, d, c, TAG_2, TAG_3) as its contents instead.
> 
> But, on the other hand, that's completely coincidental -- if we're looking
> for the "TAG_3" inventories, it seems like any one would do equally well,
> so once we've committed one, we don't need to bother worrying if someone
> wants something different later -- the differences don't matter any more
> than 3/6ths is different to 1/2. It might be worth trying to keep the
> simplest inventory for a tag, but otherwise doesn't make a difference.
> 
> Is that about right?

Yeah.  In practice, you don't want to break your inventory into the
smallest pieces, since there's an overhead for each file that is
downloaded.  Especially if there are lots of tags, breaking everywhere that
is possible could be very bad, giving almost as many inventory files as
patches.  Ideally, your cgi script (or the darcs-repo script it calls)
should be smart enough to return just two or three inventory files, since
normally you only need the first one, and if you need more than that, quite
likely you need the entire inventory.  But that's an optimization.
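
To illustrate the trade-off, here is a sketch of splitting at tags while refusing to create very small pieces. The list-of-names model, the `tag_deps` table and the `min_chunk` threshold are all assumptions made for the example, not anything darcs optimize is known to expose.

```python
def split_inventory(patches, tag_deps, min_chunk=3):
    """Split `patches` (oldest first) into inventory chunks.

    A break is allowed only after a tag that includes every patch
    before it, and only if the resulting chunk has at least
    `min_chunk` entries. The last chunk is the current inventory.
    """
    chunks = []
    start = 0
    for i, name in enumerate(patches):
        covers_all = name in tag_deps and set(patches[:i]) <= tag_deps[name]
        if covers_all and (i + 1 - start) >= min_chunk:
            chunks.append(patches[start:i + 1])
            start = i + 1
    chunks.append(patches[start:])  # may be empty if we end on a break
    return chunks
```

With `min_chunk=1` this breaks everywhere it can, which reproduces the repo1 and repo2 layouts from the example above; a larger threshold merges neighbouring pieces so fewer files need to be fetched.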

> > Checkpoints are different, they store the actual contents of a
> > repository, and are stored in _darcs/checkpoints/.  They are created
> > using darcs optimize --checkpoint, and allow you to avoid downloading
> > older patches at the cost of not having the older history (although
> > you'll still have the patch names).
> 
> Okay, so for my 0.1.1 - 0.1.17; 0.2.1 - 0.2.45 case, what I'd want is
> two branches, one that goes from 0.1.1 to 0.1.17; and the other going
> from 0.1.1 to 0.2.45, with a checkpoint at (at least) 0.2.1. And I'd
> want to share _darcs/inventories/* and _darcs/patches/* to save space.
> I'd only want to bother having two branches if I actually want to release
> a 0.1.18 at some point; and I could easily create the additional branch
> when I decide to release 0.1.18.
> 
> And when I do regular development, I just want to get --partial the 0.2.45
> branch, which'll get me my revision history from the latest checkpoint.
> 
> Cool, I think that makes sense.

Right, except that for real development I'd not get --partial.  I'd really
only recommend get --partial for either occasional development (e.g. someone
sending in doc patches for darcs) or read-only use (e.g. if you just want
to keep up with the latest darcs).  For real development, sooner or later
you'll want to do something requiring a full repo.  On the other hand, the
more people who use partial repos, the sooner the partial-repo-related bugs
will be found...

> > > Not really, it'd just mean doing, say:
> > > 	/srv/darcs-repo/
> > > 		patches/
> > > 			{patch name}
> > > 		some/branch/name/
> > > 			inventory
> > > 			commuted-patches/
> > > 				{patch name}
> > > 		some/other/branch/
> > > 			inventory
> > > 			commuted-patches/
> > > 				{patch name}
> > > 				{patch name}
> > > (The theory is you either just grab the first checked in patch of the
> > > right name; or else there's a branch specific diff that you apply to
> > > the first checked in patch to get what you need.)
> > I see, yes this is a better design.
> 
> And now there should, it seems, be an "inventories/" directory at
> the top level too, with a single inventory file for each TAG in the
> repository. (Well at most a single inventory file for each TAG. Some
> mightn't have any, maybe.)
> 
> Hrm. That should also let me keep fewer commuted patches around; but then
> I'd need to store a commuted patch per TAG instead of per branch. Tricksy.

Either that, or you'd need to store commuted-inventories.  *Something*
tricksy needs to be done, either storing the inventory of a given tag
multiple times, or making sure that when a tag moves from "inventory" into
"inventories" in a given branch that the patches in that tag have the
correct representation.

> Can a patch ever be applied to a branch twice? It can't, can it? (Because
> _darcs/patches/ can only hold one copy of the patch)

Right, it can't.
-- 
David Roundy
http://www.abridgegame.org