[darcs-users] sharing files

Fri Jul 25 03:06:37 UTC 2003

On Thu, Jul 24, 2003 at 09:45:02AM -0400, David Roundy wrote:
> > sharing code between projects is very easy, just add an already existing
> > guid to your '_darcs/map' and pull in all the patches which reference
> > that guid. you could even have the same file multiple places in a single
> > repository by repeating a guid.
> > 
> > no need to arbitrarily split up or rename patches based on whether a
> > file is shared or not. submit the whole patch which modifies the shared
> > file as well as whatever else needs to be modified atomically with it;
> > when someone else pulls that patch, the parts which modify guids which
> > are not in their _darcs/map will naturally be ignored. if they add those
> > files to their repository later, then the necessary patches will be
> > applied.
> >
> > no need to be careful about which patches you pull or push with regard
> > to file naming or unexpected side effects. pull will grab all patches
> > which reference any of the guids in your _darcs/map; any extra guids
> > mentioned in said patches won't affect you. compare the current
> > case, where they might mention file names which exist in your repository
> > but are distinct from the version in the other person's repository.
> 
> This isn't quite true.  You'd still need to be careful not to pull certain
> changes between repositories.  You would have to be careful never to pull a
> patch which contains a file add between repositories (unless you really
> want the file in both repos), and you'd have to be moderately careful not
> to pull file renames between repos.

I think we are talking about different things.
file renames and additions are nothing more than edits in the _darcs/map
text file. since each repository's _darcs/map file is different (has a
different guid), changes to one repository's version will not be propagated
to another. any patches which reference the remote _darcs/map will be
handled like any other, and the irrelevant parts (those referring to the
guid you don't care about, the other map file) will be ignored.

The only way file rename, delete and addition operations can affect
another repository is if you explicitly use the same map or specify
the other map file as a submap. in both cases you want the
rename/addition/deletion operations, so it is okay.
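To make this concrete, here is a rough Python sketch. It rests on several assumptions. The names `Repo`, `rename`, `apply` and `path_contents` are hypothetical and do not come from darcs. A patch is reduced to a list of (guid, lines) hunks that append lines. File contents are stored per guid, not per path. The sketch shows that a rename touches only the local map, and that hunks for guids missing from that map are skipped.

```python
# Hypothetical model of the proposed guid scheme; none of these names are darcs APIs.
# Each repository keeps its own map (the proposed _darcs/map) from file GUIDs to
# local paths, and stores file contents by GUID rather than by path.

class Repo:
    def __init__(self):
        self.map = {}        # guid -> local path
        self.contents = {}   # guid -> list of lines

    def rename(self, guid, new_path):
        # A rename is only an edit to this repository's map. Contents and the
        # GUID are unchanged, so nothing about it needs to reach other repos.
        self.map[guid] = new_path

    def apply(self, patch):
        # A patch is a list of (guid, lines) hunks that append lines to a file.
        # Hunks for GUIDs absent from our map are skipped, not treated as errors.
        for guid, lines in patch:
            if guid in self.map:
                self.contents.setdefault(guid, []).extend(lines)

    def path_contents(self):
        # View of the working tree: local path -> lines.
        return {path: self.contents.get(guid, []) for guid, path in self.map.items()}


shared, private = "guid-shared", "guid-private"
a, b = Repo(), Repo()
a.map = {shared: "lib/util.c", private: "main.c"}
b.map = {shared: "src/util.c"}   # b shares one file, under a different name

# One atomic patch touching both the shared and the unshared file.
patch = [(shared, ["int f(void);"]), (private, ["f();"])]
a.apply(patch)
b.apply(patch)
print(b.path_contents())         # {'src/util.c': ['int f(void);']}

a.rename(shared, "common/util.c")
print(sorted(a.path_contents())) # ['common/util.c', 'main.c']
print(b.map[shared])             # src/util.c
```

In this model, b receives the whole patch but keeps only the hunk for the guid it maps, and a's rename never leaves a.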

> It seems to me that with these caveats you haven't gained a whole lot.  You
> could get as much by modifying darcs to ignore any FilePatches to files
> that don't exist (rather than choking as it does now--intentionally, since
> such a situation would indicate a bug).  You'd also have to modify the
> commute routine to accept a directory of which files exist so that it would
> know that FilePatches to files that don't exist always commute.

it can handle several situations that the current scheme cannot.
imagine a project forks and both forks add a 'foo.c' file; suddenly they
cannot pull each other's patches which involve foo.c, since they would
conflict. things get more complicated as projects branch: you would be
unable to pull patches back and forth which should not conflict but do.
the solutions are not easy, since they involve knowing beforehand that
other people branching your project are not creating files of the same
name, so that you can 'break up' patches into separate ones which modify the
contested files independently. you then must remember not to pull
patches which reference them.

with the system I was talking about, this would be a non-issue. every
'darcs add' creates a new unique file, independent of the name given.
two developers doing 'darcs add foo.c' create two distinct files with
distinct histories.
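The fork scenario can be sketched in a few lines of Python. The sketch assumes that 'darcs add' would mint a random guid (here `uuid.uuid4()`) and record it in the local map. The helper names `add` and `relevant_hunks` are hypothetical. Both forks add a file called foo.c, yet the two files receive different guids. A patch to one fork's foo.c is therefore simply irrelevant to the other fork, rather than a conflict.

```python
import uuid

# Hypothetical sketch: in the proposed scheme "darcs add" would mint a fresh
# GUID for every new file, independent of the name given.

def add(repo_map, path):
    # Record a brand-new file identity under the requested local name.
    guid = str(uuid.uuid4())
    repo_map[guid] = path
    return guid

def relevant_hunks(repo_map, patch):
    # Keep only the hunks that touch files this repository actually has.
    return [(guid, change) for guid, change in patch if guid in repo_map]

fork1, fork2 = {}, {}
foo1 = add(fork1, "foo.c")
foo2 = add(fork2, "foo.c")   # same name, different file

patch_from_fork1 = [(foo1, "+ static int x;")]
print(foo1 != foo2)                              # True
print(relevant_hunks(fork2, patch_from_fork1))   # []
```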

> But on top of this, you'd start running into all sorts of other problems.
> What happens when you add a file to a repository and you've already pulled
> patches that would have modified those files? Do those old patches get
> reinterpereted? One of the basic ideas of darcs is that a patch always
> means the same thing (regardless, for example, of the order in which
> patches are pulled and merged).

darcs add would ALWAYS create a new unique file. There is no way to
darcs add something which will conflict with something in another
repository, since they would have different guids. the only way to get
the 'same' file from another repository is to explicitly 'pull' it in,
which grabs all appropriate patches. those patches must already exist in
the other repository for the file being pulled to exist there in the first place.
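Pulling a file in this way might look roughly like the following Python sketch. The assumptions are these. Patches are modeled as (name, set of guids touched) pairs. The helper `pull_file` is hypothetical. Selecting patches amounts to filtering the remote history by the guid being adopted. The local name may differ from the remote one, because only the guid carries the file's identity.

```python
# Hypothetical sketch: "pulling" an existing file means adding its GUID to the
# local map and then selecting every patch in the remote history that
# references that GUID. Patches are modeled as (name, set_of_guids) pairs.

def pull_file(local_map, remote_map, remote_history, guid, local_path=None):
    # Adopt the remote file's identity; the local name may differ.
    local_map[guid] = local_path or remote_map[guid]
    # The remote repository must already hold every patch that built this
    # file, so filtering its history by GUID yields the complete set.
    return [name for name, guids in remote_history if guid in guids]

remote_map = {"g1": "util.c", "g2": "main.c"}
remote_history = [
    ("add util.c", {"g1"}),
    ("add main.c", {"g2"}),
    ("change API", {"g1", "g2"}),
]

local_map = {}
needed = pull_file(local_map, remote_map, remote_history, "g1", "lib/util.c")
print(needed)      # ['add util.c', 'change API']
print(local_map)   # {'g1': 'lib/util.c'}
```

Here the "change API" patch is selected whole, even though it also touches main.c. Under the scheme described above, its g2 hunk would simply be ignored locally.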

> > Just an idea... perhaps you thought about this and there are good
> > reasons not to do it? I am more just brainstorming. I find darcs very
> > inspiring and always found prcs's guid filename thing to be a very clear
> > and easy way to handle meta-data.
> 
> I have thought about this, since I got started working on darcs through
> disillusionment with arch, which uses file ids.  File ids are definitely
> intuitive, and are easy in some senses, but are also unnecessary in a
> sufficiently advanced revision control system, because the *current*
> filename always uniquely identifies a file.  File ids are only necessary if
> you don't bother paying attention to what version of the repository the
> patch refers to.

arch's file ids are used differently than in PRCS AFAICT. the important
thing is not the file-ids, but the ability to store meta-info (the
entire directory tree layout) in its own editable text file and
abstract the concept of a file's contents away from how it is used in
anyone's particular repository. tags and preferences also fall easily
into this scheme.

> In practice I think that forcing people to make patches to a shared file
> not include changes to other (unshared) files is a good thing.  A patch
> should (and this isn't revision control, but best use of revision control)
> include only one logical change.  If the file is shared then any changes to
> it *can't* depend on changes to an unshared file, although it may require
> changes to unshared files if, for example, you change an API.  But if you
> change the API provided by a shared file, all the repositories using that
> file but one will have to have two patches anyways, one to change the API
> and one to support the change to the API.  So there's no compelling reason
> to put the two into one patch (although if it weren't shared I'd definitely
> want to do so).

but this requires knowing a priori that a file will be shared, as well
as communication between distributed branches. the safe way to do
things is to create one patch per file, which leads to patchspace
crowding and a lot of effort on the developer's part to remember which
patches they should not apply.

> So to summarize what I think is the most important reason, I guess the only
> advantage I see in your proposal would be for shared files (or nested
> repos, which I haven't addressed), and the advantage here would be simply
> to allow you to make a single patch modifying both shared and unshared
> files.  Since I think this would be bad practice, it doesn't stand up as an
> argument in favor of such a change.

I think we were talking about slightly different things; arch does not
use file-ids to their real potential (among other things), which is part
of the reason I looked for something better: darcs :)

        John

-- 
---------------------------------------------------------------------------
John Meacham - California Institute of Technology, Alum. - john at foo.net
---------------------------------------------------------------------------