[darcs-users] questions on patch theory

Sun Sep 21 22:18:08 UTC 2003

droundy at abridgegame.org (David Roundy) writes:
>
> Well, it is possible to acheive "trivial" commutation whenever there is no
> conflict or dependency, but the cost would be maintaining an archive of
> metadata.  We'd have to have a unique ID number for each file and each line
> of every file, and then every patch would commute trivially... well, there
> might be problems with getting this to work with the token replace
> patch.

Right.  This is what Tom Lord's Arch does, and gets around the problem
of maintenance by storing the metadata as ordinary files in the tree,
which get diff'ed and patched just like any other file.  It does have
the benefit that moving files and directories around pretty much "just
works" without costly book keeping.  But it comes at the expense of
polluting the user's tree with revision-control metadata, which I find
highly objectionable (to the point that it actually puts me off using
Arch).  It also serves to place constraints on the user's data, and
means that Arch-based archives really only interoperate well with
other Arch-based archives.

> I personally think that the nontrivial commutation is a lot cleaner, and
> it's certainly less trouble, since you don't have to bother keeping track
> of a lot of IDs.  Also, I don't think anyone would seriously consider
> giving a unique ID to each line of each file, so we need non-trivial
> commutation in any case.

Well, it is possible that people would be willing to define regions in
a file with the use of textual tags, but it would be less than optimal
to have to depend on that.

>> Given the above, what I have been calling "influence" is perhaps more
>> aptly something like "distortion".  Though the theoretical ideal of
>> zero distortion is probably not universally achievable, it may be
>> approachable, given a suitably powerful grammar for describing
>> changes.
>
> :) Hmmm.  That's an interesting idea.  I'm not sure that zero distortion is
> always desirable.  I guess probably where you can reduce distortion without
> introducting metadata it's probably generally good.

I agree.  More to the point, I would argue that introducing metadata
is simply introducing distortion in another form, so in reality it
doesn't help.

What I mean by approaching zero distortion is approaching a
representation which precisely defines the user's unchanging intent.
In a hypothetical English-language patch format, we could define a
patch as, for example, "change the return type of the function 'foo'
to 'void'".  A patch like that would commute without distortion with
any other patch except one that changed that function's name.  So, to
address that case, we could change that patch from referring to the
function by what it does, rather than what it is named.  But then,
what if that changes in a way that we haven't anticipated?  So, even
with a specification language as powerful as English, and an intimate
knowledge of the file's syntax, we cannot achieve zero distortion
across the board.  On the other hand, it is always possible to define
a finite state grammar with zero distortion for some finite set of
changes.

>  This would be
> something to consider when thinking about how to define patches on a
> different data type, such as XML (which people have shown interest in).  In
> the case of XML patches one would have to figure out how to specify which
> element is modified (or moved), and it would definitely be nice to be able
> to do so in a distortion-free way...

That would be nice, if it could be done in a way that doesn't
completely blow its mind when an XML file is archived. :)

> For example, there is no way to sort a sequence of primitive patches in
> less than O(n^2), which is extrememly painful.  Actually, one could do
> better by going through and classifying them into "distortion sets",
> i.e. sets that have certain distortion properties.  So if they are all
> filepatches, you could bin them by filename and then sort each bin
> separately, and then put them together in whatever order you like.  But
> when binning them by filename you'd have to do better than using a list to
> store the patches of each filename, or the binning process would be
> O(n^2)... This actually could work, and I should look into it.  Currently
> if you have more than a few hundred file adds, if you run a darcs record
> without the --all flag, it churns forever trying to sort all the adds so it
> can present them in a nice order, and an algorithm like this could pretty
> much eliminate the problem.  I'd just need to be careful in defining the
> distortion sets, since there could be a file rename in the middle of the
> list.

That sounds promising; if we can eventually establish a partial
ordering which obeys transitivity, then quicksort will work.

> In case you couldn't tell, I like your distortion terminology.  :)

Thank you.  Time will tell whether it eventually pans out as a useful
concept.

cheers,
--Mirian