[darcs-users] Cheap branches (was: Re: post-1.0: "isolated directories"?)

Anthony Towns aj at azure.humbug.org.au
Mon Oct 18 16:36:59 UTC 2004


On Sun, Oct 17, 2004 at 02:11:23PM -0400, David Roundy wrote:
> Indeed, everything but the mergers is pretty simple.  Unfortunately,
> dealing with the mergers is quite important.  Someone has suggested that we
> could include the equivalent patch for mergers, and I'm beginning to think
> this would be a good idea.  If we did that, then although you'd need full
> darcs to do merges and commutes, at least you'd be able to apply merger
> patches without all of darcs' machinery. 

Yeah. That'd provide an incentive to keep the darcs repo format fairly
constant, rather than introducing new features when you work out better
ways of doing merging too.

> On the other hand, perhaps
> working towards a "darcs library" that could be used by programs in other
> languages would be wiser, since then you'd *have* the full darcs machinery
> in other languages.

I like the idea of being able to get at my source code without all the
darcs machinery, personally (in the same way you can get at the contents
of a .deb with ar, gzip and tar; whereas you need rpm to get at the
contents of an .rpm -- even if only to convert it to a cpio first). But
libraries are good too.

> > Yeah. For the moment it's a two-level thing -- you get "project foo"
> > and "branch head"; if you want more levels you have to use hyphens or
> > similar instead of slashes. *shrug* I prefer to start off limited and
> > add features later :)
> (I made the mistake of replying to the email before reading the entire
> thing, and see that you make a similar suggestion below.  I won't cut this
> section though, in case I had any good ideas...)

:)

> It just looks to me too much like starting off complicated, and there'll be
> no hope of simplifying it later! 

Nah, I just don't care about breaking compatibility completely in some
places. :) For example, the darcs-repo: "URLs" staying the same seems
much more important than keeping the command line darcs-repo accepts
constant, since it's only called by scripts.

> Perhaps we could eliminate the project-name in favor of a base directory?
> Or equivalently eliminate the /srv/darcs-repo as a distinct entity.  So
> then when I do a push the pathname I use is still broken into two parts,
> equivalent to your project name and branch name, but either or both of
> these could have '/' in them.  The darcs-repo admin decides where the
> distinction is, and to the user it just looks like there are a bunch of
> repository directories.

Yeah; I think Tim Berners-Lee once said he wished he'd made URLs work
that way too, so that "http://au/org/humbug/azure/aj/foo/bar" could
be on a machine called any of "humbug.org.au", "azure.humbug.org.au",
"aj.azure.humbug.org.au" depending on what happens to be convenient.
Darcs -- better than the web! :)

> So as a user (someone who pushes, but doesn't admin the server), I know to
> push with
> 
> darcs push darcs-repo:theserver:foo/bar/baz/myproject

(atm it's "darcs push darcs-repo:theserver/foo/bar/baz/myproject" -
slash not colon)
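
A minimal sketch of how such a location could be taken apart, assuming the slash form above. The function names and the `bases` set (the directories the darcs-repo admin has chosen as split points) are hypothetical, not part of darcs-repo itself:

```python
def parse_location(location):
    """Split 'darcs-repo:server/some/path' into (server, path)."""
    scheme, _, rest = location.partition(":")
    if scheme != "darcs-repo" or not rest:
        raise ValueError("not a darcs-repo location: %r" % location)
    server, _, path = rest.partition("/")
    return server, path


def split_repo_path(path, bases):
    """Split a path into (base, remainder) at the longest configured base.

    `bases` is a hypothetical admin-side setting; the user only ever
    sees the full path.
    """
    parts = path.strip("/").split("/")
    # Try the longest candidate prefix first, leaving at least one
    # component for the remainder.
    for cut in range(len(parts) - 1, 0, -1):
        base = "/".join(parts[:cut])
        if base in bases:
            return base, "/".join(parts[cut:])
    raise ValueError("no configured base directory matches %r" % path)


server, path = parse_location("darcs-repo:theserver/foo/bar/baz/myproject")
base, rest = split_repo_path(path, {"foo", "foo/bar"})
```

Either half of the split can contain '/', and moving the split point on the server does not change what the user types.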

> I guess supporting this general case isn't needed immediately, as long as
> it's the sort of thing you're thinking when you're thinking of adding
> features later--that is, simplifying the user interface by removing
> restrictions.

Oh. I already implemented it. My bad. :)

Unfortunately I find I can't push that to my darcs-repo repositories
because I can't cope with the fact I tagged the 0.1 release. Pretty
obvious what's next on the TODO :)

> Ah, that's because I added the "inventories" after writing the docs (an
> embarrassingly long time ago).  Basically, there was a problem in that when
> the history gets long, since every pull needed to download the entire
> history, pulls were getting pretty slow.  So the solution was to break up
> the inventory into a series of files.  This can be done in an efficient way
> as long as the breaking point is on a tag, and as long as all patches
> "before" that tag are included in the tag.  This means that breaking up
> the inventory requires both reading and understanding the contents of tag
> patches.

> The same sort of logic is used when creating the context of a patch
> bundle.  The context always starts with a tag, so that tag is effectively a
> break point between inventory files.  

Hrm, so is the rule "if the only patches that would follow a TAG in
a context are the exact patches the TAG depends on, then don't bother
listing them" ?

> If your _darcs/inventory isn't broken
> up, you can break it up by running darcs optimize with no other options.

Hrm. If I have one tree that goes <a, b, TAG_1>, and another that goes
<a, c>; and I pull from the former to the latter to get <a, c, b, TAG_1>;
can optimize split up my inventory still? It doesn't seem to be able to?
Does that matter? I guess not.
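
The rule quoted above (break only on a tag, and only if every patch before the tag is included in it) can be sketched as follows. This assumes an inventory is just an ordered list of patch names and that `tag_covers` maps each tag to the full set of patches it includes; both are simplifications for illustration, and the sketch does no commuting or reordering:

```python
def split_points(inventory, tag_covers):
    """Return the indices of tags at which the inventory could be broken.

    A tag qualifies only if every patch that precedes it in the
    inventory is among the patches the tag covers.
    """
    points = []
    for i, name in enumerate(inventory):
        if name in tag_covers and set(inventory[:i]) <= tag_covers[name]:
            points.append(i)
    return points


covers = {"TAG_1": {"a", "b"}}
clean = split_points(["a", "b", "TAG_1"], covers)        # tag covers a and b
mixed = split_points(["a", "c", "b", "TAG_1"], covers)   # c is not in TAG_1
```

Under these assumptions the second inventory has no usable break point as it stands, because c sits before the tag without being part of it.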

Anyway. So the inventory format then seems to be:

	"Starting with tag:" [tag patch id]
	[any other inventory stuff]

and that's interpreted by converting the tag patch id into a filename in
the usual way, and looking for it in _darcs/inventories; and... iterating
or recursing? Can "Starting with tag:" appear anywhere but at the very
top of an inventory? If it can't, that means it's iterative (easy!) not
recursive (haskell...)
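
If the marker can only appear at the top, following the chain is a plain loop. A rough sketch, with several assumptions that are not taken from the darcs source: the tag id sits on the same line as the marker, every other non-empty line is one patch id, the older file for a tag ends with that tag, and `load_older` is a caller-supplied stand-in for "convert the tag id to a filename and read it from _darcs/inventories":

```python
PREFIX = "Starting with tag:"


def read_full_history(current_text, load_older):
    """Follow 'Starting with tag:' links iteratively, oldest patch first."""
    chunks = []
    seen = set()
    text = current_text
    while True:
        lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
        tag_id = None
        if lines and lines[0].startswith(PREFIX):
            tag_id = lines[0][len(PREFIX):].strip()
            lines = lines[1:]
        chunks.append(lines)
        if tag_id is None:
            break  # reached the oldest inventory
        if tag_id in seen:
            raise ValueError("inventory chain loops at %r" % tag_id)
        seen.add(tag_id)
        text = load_older(tag_id)
        if text is None:
            raise LookupError("no inventory stored for tag %r" % tag_id)
    # Chunks were collected newest first, so reverse them.
    return [patch for chunk in reversed(chunks) for patch in chunk]


older = {
    "TAG_1": "a\nTAG_1\n",
    "TAG_2": "Starting with tag: TAG_1\nb\nc\nTAG_2\n",
}
history = read_full_history("Starting with tag: TAG_2\nd\n", older.get)
```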

Hrm, these inventory files can change quite a bit too. If I have, say:

	repo1: <a, TAG_1, b, c, TAG_2>
	repo2: <a, TAG_1, b, d>

	repo2: <a, TAG_1, b, d, c, TAG_2> (pull from repo1)
	repo2: <a, TAG_1, b, d, c, TAG_2, TAG_3> (tag)

	repo1: <a, TAG_1, b, c, TAG_2, d, TAG_3> (pull from repo2)

then repo1 and repo2 have the same contents; but repo1's inventories
have three optimized tag inventories which just have patches (d, TAG_3),
(b, c and TAG_2) and (a and TAG_1); whereas repo2's inventories match
the names of the (d, TAG_3) and (a, TAG_1) inventories from repo1,
but the first one has (b, d, c, TAG_2, TAG_3) as its contents instead.

But, on the other hand, that's completely coincidental -- if we're looking
for the "TAG_3" inventories, it seems like any one would do equally well,
so once we've committed one, we don't need to bother worrying if someone
wants something different later -- the differences don't matter any more
than 3/6ths is different to 1/2. It might be worth trying to keep the
simplest inventory for a tag, but otherwise doesn't make a difference.

Is that about right?
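
The 3/6ths versus 1/2 point can be made concrete with a small sketch. It models each repository's inventories as a list of chunks, using the repo1 and repo2 example above, and treats two chains as interchangeable when they name the same set of patches. This is an illustration of the argument, not darcs' own comparison:

```python
def patch_set(chunks):
    """Flatten a chain of inventory chunks into the set of patches it names."""
    return frozenset(patch for chunk in chunks for patch in chunk)


def same_history(chunks_a, chunks_b):
    """Two chains are equivalent if they name exactly the same patches."""
    return patch_set(chunks_a) == patch_set(chunks_b)


# repo1 is split at every tag; repo2 could only be split at TAG_1.
repo1 = [["a", "TAG_1"], ["b", "c", "TAG_2"], ["d", "TAG_3"]]
repo2 = [["a", "TAG_1"], ["b", "d", "c", "TAG_2", "TAG_3"]]

equivalent = same_history(repo1, repo2)
```

The chunk boundaries differ, but both chains describe the same repository contents, which is why either stored TAG_3 inventory would serve.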

> Checkpoints are different, they store the actual contents of a repository,
> and are stored in _darcs/checkpoints/.  They are created using darcs
> optimize --checkpoint, and allow you to avoid downloading older patches at
> the cost of not having the older history (although you'll still have the
> patch names).

Okay, so for my 0.1.1 - 0.1.17; 0.2.1 - 0.2.45 case, what I'd want is
two branches, one that goes from 0.1.1 to 0.1.17; and the other going
from 0.1.1 to 0.2.45, with a checkpoint at (at least) 0.2.1. And I'd
want to share _darcs/inventories/* and _darcs/patches/* to save space.
I'd only want to bother having two branches if I actually want to release
a 0.1.18 at some point; and I could easily create the additional branch
when I decide to release 0.1.18.

And when I do regular development, I just want to "darcs get --partial" the 0.2.45
branch, which'll get me my revision history from the latest checkpoint.

Cool, I think that makes sense.

> > So, it sounds like this is what I want to be able to hook my
> > "mainline--0.1" and "mainline--0.2" branches together, without bogging
> > down in all the old history unless I decide I need it?
> Well, one thing storing "inventories/" would gain you is that you'd save on
> disk space for the inventory files as well as for the patches, since the
> inventory will have a lot in common between various branches.  I think this
> may be what you're saying, but I'm not sure...

Nah, I was on completely the wrong page. In the right book though! :)

> > Hrm, I took the opportunity to add
> >    http://www.scannedinavian.org/DarcsWiki/BestPractices
> > and included some babble. Maybe others would like to add questions
> > or improve answers or just add some links to other threads that are
> > enlightening?
> Hmmm.  There's also a "best practices" section in the manual, which is
> empty waiting for advice.  I think I may soon add a section or two in there
> myself.

Yeah; I was thinking the wiki page could be the place to dump "good
practices" for editing so that they could be included in the "best
practices" manual. Maybe it should've been called GoodPractices :)

Anyway, I've added some stuff about checkpointing to the page now too.

> > Not really, it'd just mean doing, say:
> > 	/srv/darcs-repo/
> > 		patches/
> > 			{patch name}
> > 		some/branch/name/
> > 			inventory
> > 			commuted-patches/
> > 				{patch name}
> > 		some/other/branch/
> > 			inventory
> > 			commuted-patches/
> > 				{patch name}
> > 				{patch name}
> > (The theory is you either just grab the first checked in patch of the
> > right name; or else there's a branch specific diff that you apply to
> > the first checked in patch to get what you need.)
> I see, yes this is a better design.

And now there should, it seems, be an "inventories/" directory at
the top level too, with a single inventory file for each TAG in the
repository. (Well at most a single inventory file for each TAG. Some
mightn't have any, maybe.)
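
A rough sketch of how a lookup against that layout might go, following the quoted theory: always start from the shared copy under patches/, and add the branch-specific file from commuted-patches/ only if the branch has one. The function name and the `exists` callback are made up for illustration, and how the branch-specific diff is actually applied is left out:

```python
from pathlib import PurePosixPath


def files_for_patch(root, branch, patch_name, exists):
    """List the files needed to reconstruct a patch as seen by one branch.

    `exists` is a caller-supplied predicate (hypothetical) that says
    whether a path is present in the shared store.
    """
    shared = root / "patches" / patch_name
    branch_diff = root / branch / "commuted-patches" / patch_name
    if exists(branch_diff):
        # The shared copy plus the diff that adapts it to this branch.
        return [shared, branch_diff]
    return [shared]


root = PurePosixPath("/srv/darcs-repo")
present = {root / "some/branch/name" / "commuted-patches" / "p2"}
plain = files_for_patch(root, "some/branch/name", "p1", present.__contains__)
adapted = files_for_patch(root, "some/branch/name", "p2", present.__contains__)
```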

Hrm. That should also let me keep fewer commuted patches around; but then
I'd need to store a commuted patch per TAG instead of per branch. Tricksy.

Can a patch ever be applied to a branch twice? It can't, can it? (Because
_darcs/patches/ can only hold one copy of the patch)

Sounds pretty doable.

Cheers,
aj

-- 
Anthony Towns <aj at humbug.org.au> <http://azure.humbug.org.au/~aj/>
Don't assume I speak for anyone but myself. GPG signed mail preferred.

``[S]exual orgies eliminate social tensions and ought to be encouraged.''
      -- US Supreme Court Justice Antonin Scalia (http://tinyurl.com/3kwod)