[darcs-users] darcs conflicts/dependencies -- is patch theory the place to start?

Stephen J. Turnbull stephen at xemacs.org
Thu Sep 20 08:56:59 UTC 2012


AntC writes:
 > Stephen J. Turnbull <stephen <at> xemacs.org> writes:

 > > bzr does this already.  It only uses it to track renames/copies, and
 > > for most users these really aren't that big a deal.  It also can give
 > > confusing results if used improperly.  For example, if starting at
 > > version N you copy file A to B, then delete 5% of A and 95% of B, and
 > > pull a bunch of related functionality into B from various places, you
 > > can get very voluminous (and therefore confusing) diffs of tip
 > > vs. version N.
 > 
 > Careful here! A move/rename is not the same as a file copy. Tentatively:
 > - move/rename must retain the file's identity

Which means just *what* if, from the author's point of view, the
functional set of changes ends up with two files, both with new
names, each containing 50% of the pre-move file?

I have heard of some cases (eg, the Java refactoring case I mentioned)
where it's pretty obvious that identity is the same.  But I myself
have rarely seen cases where renaming was *not* accompanied by
significant role change for the renamed file.  Not to mention that in
almost all such cases git's content-tracking copy-move detector would
get it right.
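
To make the 50/50 split concrete, here is a minimal sketch of a
content-similarity score.  It uses difflib's ratio as a stand-in; it is
not git's actual heuristic, and the file contents and names are made up
for illustration.

```python
from difflib import SequenceMatcher


def similarity(old_lines, new_lines):
    """Score shared content between two files, 0.0 to 1.0 (difflib's ratio)."""
    return SequenceMatcher(None, old_lines, new_lines, autojunk=False).ratio()


# Hypothetical pre-move file, split into two new files of equal size.
old = [f"line {i}" for i in range(10)]
first_half = old[:5]
second_half = old[5:]

scores = {
    "first_half": similarity(old, first_half),
    "second_half": similarity(old, second_half),
}
print(scores)
```

Both halves get the same score, so content alone gives no reason to
call one of them "the" continuation of the original file.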

 > - file copy creates a new file identity
 > - but (tentatively) copy retains the identity of the contents
 >   (lines)

Again, which one is the original file identity, if the purpose of the
copy is to divide the file into two by deleting a range of lines from
each?  Suppose that you have decided on an interface-implementation
split.  Then it is quite likely that the interface file retains the
"identity" (ie, the relationship to other modules in the application)
but gets a new name.

This actually happened historically in XEmacs, or maybe Epoch before
it.  The original GNU Emacs 19 code had rather different interfaces
for TTY and X consoles, but XEmacs modeled its "virtual console" API
on the more powerful X console.  So first, tty.c was thrown away, and
console-tty.c was written to conform to the middleware API in
console-x.c.  Finally, console-x.c was split into console.c (the
virtual console, which maintains its relationships to the redisplay
engine and to the new tty implementation) and console-x.c, which was
relatively unchanged content-wise (deletion of about 20% API-related
code, and some formerly static APIs had to become externally visible)
but has a completely different role in XEmacs than it did previously.

On the other hand, how often do you see a file copy within a project
where the file content stays mostly the same across copies?

In all cases, git is able to track code motion across files as long as
lines are unchanged.  If lines are changed, but you need to track code
motion, many cases can be handled by the so-called "pickaxe", which
tracks redistribution of lines matching a regexp.  (Of course, the
"pickaxe" is quite expensive in wall time.)
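
A rough sketch of the two ideas, not git's implementation: trace_lines
follows lines that survive verbatim, and pickaxe reports snapshots where
the per-file match counts for a regexp change.  The function names and
file contents are hypothetical (only the file names come from the
XEmacs story above); the nearest real commands are "git blame -C" and
"git log -S" / "git log -G".

```python
import re


def trace_lines(old_text, new_files):
    """Map each non-blank old line to the new files that contain it verbatim."""
    where = {}
    for line in old_text.splitlines():
        if not line.strip():
            continue
        where[line] = sorted(
            name for name, text in new_files.items()
            if line in text.splitlines()
        )
    return where


def pickaxe(pattern, snapshots):
    """Return indices of snapshots whose per-file match counts changed."""
    rx = re.compile(pattern)
    hits, prev = [], None
    for i, files in enumerate(snapshots):
        counts = {name: len(rx.findall(text)) for name, text in files.items()}
        counts = {name: n for name, n in counts.items() if n}  # drop zeros
        if prev is not None and counts != prev:
            hits.append(i)
        prev = counts
    return hits


old = (
    "static int open_console(void);\n"
    "void console_resize(void);\n"
    "void x_init(void);\n"
)
snapshots = [
    {"console-x.c": old},
    {
        # "static" was dropped here, so this line is no longer verbatim.
        "console.c": "int open_console(void);\nvoid console_resize(void);\n",
        "console-x.c": "void x_init(void);\n",
    },
]

moved = trace_lines(old, snapshots[1])
print(moved)
print(pickaxe(r"\bopen_console\b", snapshots))
```

The verbatim tracker loses open_console because the line changed, while
the regexp search still notices that it moved from console-x.c to
console.c.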

 > Similarly if we rename a file, we also want to change any #include
 > in other files that refer to it. Perhaps this means token replaces
 > apply to the file system _and_ file contents?

This is precisely the problem that "container tracking" is intended to
address.  It clearly is *not* a token replace as Darcs knows it,
because it needs to be syntax-aware.  For example, you wouldn't want
it changing a comment like

    In version 42, this class was renamed from Foo to Bar.  It is
    possible to enable a backwards-compatibility layer by also
    importing the new Foo module, which simply wraps Bar.

to

    In version 42, this class was renamed from Bar to Bar.  It is
    possible to enable a backwards-compatibility layer by also
    importing the new Bar module, which simply wraps Bar.

Much hilarity would ensue....
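
The difference can be sketched in a few lines.  This is an illustration
only: the regexp substitution stands in for a plain token replace (it is
not how Darcs implements one), rename_identifier is a hypothetical
helper, and Python's tokenize module is used simply because it makes
"syntax-aware" cheap to demonstrate on Python source.

```python
import io
import re
import tokenize


def rename_identifier(source, old, new):
    """Rename identifier tokens equal to `old`; comments and strings are kept."""
    lines = source.splitlines(keepends=True)
    starts = [
        tok.start
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == tokenize.NAME and tok.string == old
    ]
    # Splice right-to-left so earlier column offsets stay valid.
    for row, col in sorted(starts, reverse=True):
        line = lines[row - 1]
        lines[row - 1] = line[:col] + new + line[col + len(old):]
    return "".join(lines)


SOURCE = (
    "# In version 42, this class was renamed from Foo to Bar.\n"
    "widget = Foo()\n"
)

naive = re.sub(r"\bFoo\b", "Bar", SOURCE)       # blind token replace
aware = rename_identifier(SOURCE, "Foo", "Bar")  # only touches identifiers

print(naive)
print(aware)
```

The blind replace rewrites the history in the comment; the
tokenizer-based version changes only the code that refers to the class.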

 > Thanks Steve for the info re bzr. The usefulness of the
 > unique/persistent file id is all for the VCS's housekeeping, not
 > for the programmer. (Or at least the only use for the programmer is
 > to be able to show that in comparing repos, two files with the same
 > dir/name are not really the same, or vice versa that two
 > different-named files are really the same.)

Indeed.  My (rather restricted) experience is that this facility isn't
needed so often.  Other people (including the bzr devs and Mark
Shuttleworth himself) say it's very important.


