[darcs-users] locking

Stephen J. Turnbull stephen at xemacs.org
Fri Feb 24 06:57:41 UTC 2006


>>>>> "Ketil" == Ketil Malde <ketil.malde at bccs.uib.no> writes:

    Ketil> So it is not a complete end-all solution.  However, it
    Ketil> would localize changes to smaller pieces, and thus be
    Ketil> easier to commute patches across.

Not necessarily.  The easier it is to commute syntactic patches, the
more carefully you need to check semantic prerequisites.  Cf. the
"files in the target of a tokreplace patch must not contain the
replacement token" restriction.  I strongly suspect the reason that
line-oriented hunks work as well as they do is that good style, if not
syntax, in most languages is line-oriented.
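
To make that restriction concrete, here is a minimal sketch in Python.
It is not darcs's actual code: the function names are hypothetical,
and the default token-character class is an assumption chosen for
illustration.  The point is that if the replacement token already
occurs in the file, the replace cannot be inverted exactly, because
afterwards you can't tell the old occurrences from the new ones.

```python
import re


def tokens(text, token_chars=r"[A-Za-z_0-9]"):
    """Return the set of maximal runs of token characters in text."""
    return set(re.findall(f"{token_chars}+", text))


def can_replace(text, old, new, token_chars=r"[A-Za-z_0-9]"):
    """True if replacing token `old` by `new` could be undone exactly.

    The check is the restriction quoted above: the target must not
    already contain the replacement token.
    """
    return new not in tokens(text, token_chars)


def replace_token(text, old, new, token_chars=r"[A-Za-z_0-9]"):
    """Replace whole tokens equal to `old` by `new`, or refuse."""
    if not can_replace(text, old, new, token_chars):
        raise ValueError(f"target already contains token {new!r}")
    return re.sub(
        f"{token_chars}+",
        lambda m: new if m.group(0) == old else m.group(0),
        text,
    )
```

With src = "foo = bar + foo_bar\n", replacing foo by baz touches only
the first token (foo_bar is a single token under this character
class), and replacing baz by foo afterwards restores src.  Replacing
foo by bar is refused, since bar is already present.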

    >> - In quite a few instances (such as, e.g., the typical mmm-mode
    >> emacs files, or literate programming source files) the notion
    >> of a token must be different in different parts of the file.

    Ketil> Must?  I'm not sure what mmm-mode is (my Xemacs doesn't
    Ketil> seem to have it),

It's in the packages.  Check M-x list-packages.

    >> - Whitespace (or indentation) is obviously significant in some
    >> languages like, e.g., Haskell.  How would token diffing
    >> distinguish between things that differ only in indentation?

    Ketil> By keeping track of indentation.  Just like current diff
    Ketil> deals with blank lines.

I've thought about this (for a long time, though never in big enough
chunks to come to firm conclusions, unfortunately) and have the
intuition that trying to make darcs (or any of the other text-oriented
"advanced" SCMs) do more than handle line-oriented text is the wrong
idea (unless you want to turn it into Yet Another Editor That's Smart).

Git has it right: define some object types (git is missing the "patch"
type, but I think that's typical Linus "don't do what you don't yet
know how to do" brilliance), and delegate intelligence to a
combination of smart scripts and humans.

In other words, make darcs a library callable from your favorite
scriptable editor.

    Ketil> The obvious downside is the cost of diff, the standard
    Ketil> algorithm is O(n²) using dynamic programming, which, at ten
    Ketil> words per line would make it two orders of magnitude more
    Ketil> costly.  I seem to remember diff tools cheating a bit,
    Ketil> though.

You can probably get that down to O(n lg n) or better on average by
defining a git "file" type which would simply handle sequences of
blobs.  Pick some syntactic partitioning scheme for the file objects
(e.g., groups of what Emacs calls "defuns", each group bigger than
some fixed block size in bytes), where a file object is a sequence of
names of block objects.  Do a diff on the file object, then refine to
diffs on the differing blocks.
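
A rough sketch of that two-level scheme in Python, under loud
assumptions: blocks are closed at blank lines rather than at real
"defun" boundaries, block names are SHA-1 hashes of the block
contents, and difflib.SequenceMatcher stands in for whatever diff
algorithm you prefer (it is not the textbook dynamic-programming
diff).  The function names and the min_size parameter are
hypothetical.  Input lines are assumed to have no trailing newlines,
as returned by str.splitlines().

```python
import difflib
import hashlib


def split_blocks(lines, min_size=200):
    """Group lines into blocks, closing a block at a blank line once
    it holds at least min_size characters."""
    blocks, current, size = [], [], 0
    for line in lines:
        current.append(line)
        size += len(line)
        if size >= min_size and not line.strip():
            blocks.append(current)
            current, size = [], 0
    if current:
        blocks.append(current)
    return blocks


def block_name(block):
    """Name a block by the hash of its contents."""
    return hashlib.sha1("\n".join(block).encode("utf-8")).hexdigest()


def two_level_diff(old_lines, new_lines, min_size=200):
    """Diff the sequences of block names first, then diff lines only
    inside the regions where the block names differ."""
    old_blocks = split_blocks(old_lines, min_size)
    new_blocks = split_blocks(new_lines, min_size)
    old_names = [block_name(b) for b in old_blocks]
    new_names = [block_name(b) for b in new_blocks]

    changes = []
    outer = difflib.SequenceMatcher(None, old_names, new_names,
                                    autojunk=False)
    for tag, i1, i2, j1, j2 in outer.get_opcodes():
        if tag == "equal":
            continue  # identical blocks are never compared line by line
        old_part = [l for b in old_blocks[i1:i2] for l in b]
        new_part = [l for b in new_blocks[j1:j2] for l in b]
        inner = difflib.SequenceMatcher(None, old_part, new_part,
                                        autojunk=False)
        for t, a1, a2, b1, b2 in inner.get_opcodes():
            if t == "equal":
                continue
            changes.extend(("-", l) for l in old_part[a1:a2])
            changes.extend(("+", l) for l in new_part[b1:b2])
    return changes
```

The outer diff runs over block names, so it sees far fewer elements
than a line or token diff would, and the expensive comparison is
confined to regions that actually changed.  Note that an edit near the
top of a block changes only that block's name here, because the
boundaries follow the text rather than fixed byte offsets.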

You'd want to be a little tricky about splitting and coalescing blocks
to localize areas under active development.  Maybe something a little
smarter than merely a sequence of blocks.  Which is another reason why
the algorithms should be under control of a scriptable editor....

-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.
