[darcs-users] Conceptual questions about REPOs

Thu Dec 7 04:39:08 UTC 2006

Sean writes:

 > The first thing I need to get my head around, is what to put in a
 > repo ?

At the bitflicking level, start by putting nothing that's generated in
the repo; instead, make sure it gets regenerated whenever it is out of
date (typically with a Makefile, but there are other build systems out
there).  E.g., Postscript figure -> PNG image: the .ps goes in the
repo, the .png does not.
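A Makefile rule such as `fig.png: fig.ps` rebuilds the PNG only when it is missing or older than its PostScript source.  Below is a minimal Python sketch of that same staleness test.  It assumes figures live as `name.ps`/`name.png` pairs in one directory, which is a hypothetical layout.  The sketch only finds what needs rebuilding; the actual conversion is left to your build tool.

```python
import os
from pathlib import Path


def is_stale(source: Path, target: Path) -> bool:
    """True if `target` is missing or older than `source`: the same
    test make applies to a rule like `fig.png: fig.ps`."""
    if not target.exists():
        return True
    return os.path.getmtime(target) < os.path.getmtime(source)


def stale_figures(directory: Path) -> list:
    """List the .ps files whose .png counterpart needs regenerating.

    Only the .ps files are meant to live in the repo; the .png files
    are build outputs that get rebuilt on demand.
    """
    return sorted(
        ps for ps in directory.glob("*.ps")
        if is_stale(ps, ps.with_suffix(".png"))
    )
```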

If you are using the SCM to deliver improvements to your clients,
they may have only a minimal runtime (without the tools to regenerate
those files), or you may want to control exactly how the translation
is done.  In that case you will want to put such generated files in
the repo, too.

 > Do I put all of my development code in a single repo, or do I maintain
 > different repos for each specific code project ? (Does it matter if
 > you end up with 10's or 100's of repos)

Different SCMs do this in different ways.  Darcs has a model where
branch == repo == workspace.  It's possible to keep multiple branches
in a single darcs repo, but it's somewhat artificial, and requires
work and attention from you.

Darcs is excellent at managing renames and copies, so the direct cost
of reorganizing your projects (splitting a project into two projects,
combining two projects into a single one) is low.  Of course you still
face the conceptual issues (reflected in the question "now in which
file in which directory did I put that code, anyway?"), but Darcs will
not get in your way if you decide that "A and B belong together here,
but C should be over there" after originally setting it up as A and
B+C.  The overhead is sufficiently low that the one time I wanted to
do this, I simply did something like "cd Aparent; cp A Bparent/; darcs
remove A; cd Bparent; darcs add A" for each file that was clearly in
the wrong place until I achieved tolerable sanity.  (Caveat: there are
potential problems with my approach; if you need to do this, come back
here and ask for advice!)
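The per-file move described above can be written down as a reviewable plan before anything is run.  This is a minimal sketch, not a darcs feature.  `move_plan` is a hypothetical helper, and it makes two assumptions: the repo paths are absolute, and each file's parent directory already exists and is tracked in both repos.  `darcs remove` only stops tracking a file; it does not delete it.  Neither `remove` nor `add` takes effect until you `darcs record`.

```python
from pathlib import PurePosixPath


def move_plan(files, src_repo, dst_repo):
    """Return (cwd, argv) steps moving `files` between two darcs repos.

    Hypothetical helper: repo paths should be absolute, and each file's
    parent directory is assumed to exist and be tracked in both repos.
    Nothing is executed here, so the plan can be reviewed first.
    """
    src, dst = PurePosixPath(src_repo), PurePosixPath(dst_repo)
    plan = []
    for f in files:
        # Copy the file into the destination working tree.
        plan.append((str(src), ["cp", f, str(dst / f)]))
        # Stop tracking it in the source; darcs leaves the file on disk.
        plan.append((str(src), ["darcs", "remove", f]))
        # Start tracking it in the destination.
        plan.append((str(dst), ["darcs", "add", f]))
    # remove/add are only pending changes until recorded as patches.
    plan.append((str(src), ["darcs", "record"]))
    plan.append((str(dst), ["darcs", "record"]))
    return plan
```

You could run each step with `subprocess.run(argv, cwd=cwd, check=True)`.  Without options, `darcs record` prompts interactively for the changes and a patch name in each repo.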

Darcs will not have a problem handling as many repos as you have leaf
directories, but you probably want to organize at a somewhat higher
level.  The issue is entirely how to remember which repos communicate
with each other.  (I.e., which are branches of a single project.)

 > I tend to keep most work in customer specific directories and then in
 > project specific subdirs
 > 
 > customers/
 > customers/abc/
 > customers/abc/web
 > customers/def/
 > customers/def/db
 > customers/ghi/
 > customers/ghi/db
 > customers/ghi/web
 > 
 > I suspect, that I should create _darcs repos for each specific
 > customer project dir.

If there's substantial commonality across customers in the web
directories, and similarly in the db directories, I would have each
"web" and "db" as a separate repo (interpret each as a workspace/
branch of the mainline kept in /dev, not as a separate project).

If the great majority of customers have both web and db, and most of
them use the generic code, then a customer-level repo containing both
web and db might be best.

It really depends on how much dependency there tends to be between
your "web" changes and your "db" changes.

 > This code tends to live out of the customer hierarchy, in my personal
 > development workspace.
 > 
 > Therein lies the problem. I may have a generic code module in:
 > dev/python/formail.py and subtle modifications of the same in various
 > customer/project subdirs.

If you can identify a subset of "web" that is generic, then you'd want
to collect that code in the "web" project mainline, and then derive
the "abc/web" branch from that, I think.  For this to work well within
Darcs, I think you want to start by organizing that way, rather than
having loose collections of python modules and perl CGIs and C++ db
adaptors as your "dev/python" naming suggests.

 > One way to handle this might be to create a repo in the main dev
 > directory, and keep different customer-specific revisions in the same
 > repo? That way, I am saving the slight changes between the
 > customer-specific versions as patches.

I don't think this works very well with Darcs's model; at least, I've
always handled such persistent variants by creating a new repo.  The
patches are reasonably space-conserving; unless you have a huge number
of patches, most of the space will be taken by the customer workspace,
which doubles as the repo.

One caveat: to make operations such as diff fast, darcs keeps a
"pristine" copy of the tree, approximately doubling the space of a
bare workspace.  As a rule of thumb, a darcs branch with all the
trimmings will cost about three times as much as the bare source tree.
But once you add in generated files, test scaffolding, editor
backups, and other detritus of the development process, I found that a
darcs repo with about 500 patches added much less than 100% overhead.
(In my case that figure is a bit optimistic, since the "detritus"
includes CVS byproducts: the central project repo is in CVS.)

 > This may get difficult to manage however (How do I know that
 > customer xyz's formail handler is actually maintained under
 > dev/python/formail.py) ?

If xyz's formail handler is *not* maintained in /dev, I'd darcs remove
it from the xyz branch.  Then in an arbitrary customer's branch you do
a darcs pull.  If formail.py is generic, pulling a patch to
formail.py from /dev into the customer branch will succeed; if it's a
local version, you'll get a conflict.

"Conflict" sounds ugly, but in fact it's safe enough in darcs.  It's
just an error meaning that darcs can't apply the patch, and darcs
maintains the necessary metadata to allow you to choose how to handle
the issue quite flexibly.
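Darcs answers the "is this copy generic?" question by whether a pull succeeds or conflicts.  As a rough complementary check, you could list which customers' copies of a module currently match the mainline copy.  This uses plain byte comparison with Python's `filecmp` and is not a darcs mechanism.  The function name and the `customers/<name>/...` layout are assumptions based on the tree quoted above.  A match only says the files are identical today, not how they got that way.

```python
import filecmp
from pathlib import Path


def diverged_copies(mainline_file, customer_root, relpath):
    """Compare each customer's copy of a module with the mainline copy.

    Returns (identical, diverged, missing) lists of customer names.
    `customer_root` is assumed to hold one subdirectory per customer,
    e.g. customers/abc/, customers/def/ (hypothetical layout).
    """
    identical, diverged, missing = [], [], []
    customers = sorted(p for p in Path(customer_root).iterdir() if p.is_dir())
    for customer in customers:
        copy = customer / relpath
        if not copy.exists():
            missing.append(customer.name)
        # shallow=False forces a byte-by-byte comparison of contents.
        elif filecmp.cmp(mainline_file, copy, shallow=False):
            identical.append(customer.name)
        else:
            diverged.append(customer.name)
    return identical, diverged, missing
```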

 > If I have different repos for each customer, and I patch the main
 > repo, can I automatically push the changes out to the customer
 > specific repos ?

That depends on the semantics of the patch.  If your organization is
such that all customers should have identical versions of common code,
then you can script this easily enough.  But in that case, why isn't
the common code organized as a linkable or importable library, or as
file system links to the generic code?  I think that would be more
reliable if feasible.

If you are in the habit of tweaking the different customers' versions
in-place, automatically pushing changes will be unsafe.
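Suppose every customer really should carry identical common code.  In that case the scripting can be as small as the sketch below.  It assumes a hand-maintained registry (with hypothetical paths) that records which customer branch tracks which mainline; the same registry also answers the "which repos communicate?" question.  It uses `darcs pull --all`, which accepts every patch without prompting.  It defaults to `--dry-run`, so a first run only reports what would be pulled.

```python
import subprocess
from pathlib import PurePosixPath

# Hypothetical registry: customer branch -> mainline repo it tracks.
# Absolute paths, since each pull runs inside the branch directory.
ROOT = PurePosixPath("/srv/work")
BRANCHES = {
    ROOT / "customers/abc/web": ROOT / "dev/web",
    ROOT / "customers/ghi/web": ROOT / "dev/web",
    ROOT / "customers/def/db": ROOT / "dev/db",
    ROOT / "customers/ghi/db": ROOT / "dev/db",
}


def pull_commands(branches, dry_run=True):
    """Return (cwd, argv) pairs: one `darcs pull` per branch from its mainline."""
    cmds = []
    for branch, mainline in sorted(branches.items()):
        # --all answers "yes" to every patch instead of prompting.
        argv = ["darcs", "pull", "--all"]
        if dry_run:
            # --dry-run only reports which patches would be pulled.
            argv.append("--dry-run")
        argv.append(str(mainline))
        cmds.append((str(branch), argv))
    return cmds


def sync(branches, dry_run=True, run=subprocess.run):
    """Run the pulls in turn, stopping at the first command that fails."""
    for cwd, argv in pull_commands(branches, dry_run):
        run(argv, cwd=cwd, check=True)
```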

 > Can links be established between the repos ?

What do you mean by "links"?  If you mean, "Can patches be exchanged
without special effort?", the answer is "yes".  If you mean, "Can I
include one repo in another by just 'pointing' to it?", no, Darcs
doesn't support that directly yet, although you can probably hack up a
personal framework of scripts to make it work.
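Here is one possible shape for such a personal framework, sketched under assumptions.  A hypothetical manifest maps subdirectories of an outer project to the repos they mirror.  The script then either fetches each one with `darcs get` (later darcs releases call it `darcs clone`) or updates it with `darcs pull --all`.  The nested directories would also need to be kept out of the outer repo, e.g. via its boring file; the script does not handle that.

```python
from pathlib import Path

# Hypothetical manifest: subdirectory of the outer project -> source repo.
MANIFEST = {
    "lib/formail": "/srv/work/dev/formail",
}


def include_commands(outer, manifest):
    """Return (cwd, argv) steps that fetch or refresh each nested repo."""
    outer = Path(outer)
    steps = []
    for subdir, source in sorted(manifest.items()):
        target = outer / subdir
        if (target / "_darcs").is_dir():
            # Already present: pull any new patches from the source.
            steps.append((str(target), ["darcs", "pull", "--all", source]))
        else:
            # Not yet fetched: copy the source repo (parent dir must exist).
            steps.append((str(target.parent), ["darcs", "get", source, target.name]))
    return steps
```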

                                *****

In the end I don't think I've said anything that Max Batcher didn't
say, but I said it different.  :-)  HTH.

Steve