[darcs-devel] Re: [issue267] Wishlist item for darcs

Tommy Pettersson ptp at lysator.liu.se
Tue Sep 19 03:35:52 PDT 2006


On Mon, Sep 18, 2006 at 04:08:28PM +0000, Tuomo Valkonen wrote:
>   * UTF-16 is, of course, a rather different case than just a change in
>     encoding. The way I'd go about it, is to make the current patch type
>     polymorphic to input in arbitrary character types, if it isn't already,
>     and add skeleton support for plugging in and specifying different patch
>     type for files of arbitrary formats. (So, one day, support could be 
>     written for structural formats to have structural instead of line-based
>     patches, and so on.)

I think this could be a worthwhile task, although not so easy.
If darcs could handle MS Word documents and other "industrial"
file formats, it would become a "real" RCS in one more sense of
the word real. And it would probably boost the development of
new patch types, which would be interesting.

One complication is the diff algorithm. It forms hunks, and
would form the UTF-16 hunks and many of the eventual plug-in
structural format hunks. It needs to be polymorphic as well, or
worse...
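
To make the "polymorphic diff" point concrete, here is a minimal sketch in Python (chosen only for brevity; darcs itself is written in Haskell). It relies on the standard library's difflib, which compares sequences of any hashable items. The function name `hunks` and the `(start, removed, added)` tuple layout are assumptions made for illustration, not darcs' actual hunk format.

```python
from difflib import SequenceMatcher


def hunks(old, new):
    """Return (old_start, removed_items, added_items) for each changed region.

    `old` and `new` may be sequences of any hashable items: raw byte
    lines, decoded text lines, or the nodes of some structured format.
    """
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            result.append((i1, list(old[i1:i2]), list(new[j1:j2])))
    return result


# The same function, used with two different item types.
raw_old = b"a\nb\nc\n".splitlines()
raw_new = b"a\nB\nc\n".splitlines()
print(hunks(raw_old, raw_new))

# Round-trip through UTF-16 to stand in for a file stored in that encoding.
utf16_old = "a\nb\nc\n".encode("utf-16").decode("utf-16").splitlines()
utf16_new = "a\nB\nc\n".encode("utf-16").decode("utf-16").splitlines()
print(hunks(utf16_old, utf16_new))
```

The diff itself never looks inside an item, so the only per-format work is deciding how a file is split into items.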

My number one wish for a new patch type, once I finally get time
to finish the replace-with-space patch type, is a hunk-move
patch type that can move a block of lines between files, and
trivially within the same file. This would sort of be a higher
order patch type, since one would want it to be able to move
_any_ kind of hunk, also the plug-in ones.
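
A hunk-move of this kind could be sketched roughly as follows, again in Python for brevity. The class `MoveHunk`, its fields, and the convention that the destination position is counted after the block has been removed are all hypothetical choices for this illustration; nothing here reflects an existing darcs patch type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoveHunk:
    """Move `length` items from one file to another (or within one file).

    Assumption: `dst_line` is counted in the state *after* the block has
    been removed from the source, which makes the inverse easy to state.
    """
    src_file: str
    src_line: int
    dst_file: str
    dst_line: int
    length: int

    def apply(self, tree):
        # `tree` maps file names to lists of items (lines, or anything else).
        tree = {name: list(items) for name, items in tree.items()}
        end = self.src_line + self.length
        block = tree[self.src_file][self.src_line:end]
        if len(block) != self.length:
            raise ValueError("source block out of range")
        del tree[self.src_file][self.src_line:end]
        if self.dst_line > len(tree[self.dst_file]):
            raise ValueError("destination out of range")
        tree[self.dst_file][self.dst_line:self.dst_line] = block
        return tree

    def invert(self):
        # Moving the block back is the same operation with the ends swapped.
        return MoveHunk(self.dst_file, self.dst_line,
                        self.src_file, self.src_line, self.length)


tree = {"a.txt": ["1", "2", "3", "4"], "b.txt": ["x", "y"]}
move = MoveHunk("a.txt", 1, "b.txt", 1, 2)
moved = move.apply(tree)
print(moved)
print(move.invert().apply(moved) == tree)
```

Because the move never inspects the items it carries, the same sketch would work for blocks of any item type, which is the "higher order" aspect.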

It would be nice if the user didn't have to ask for a specific
diff algorithm on each record. The diff function could take
functions that partition each file into one (or more) levels and
automatically produce hunks of a sort. But it would probably be
a waste of time to run the diff for each different type on each
different file. And it would be a weird dialogue with lots of
strange dependencies between changes when recording. And there
must be some way to guarantee that "general hunks" abide by the
rules of the patch algebra and to automatically compute the
commutings between them. This would probably limit the plug-in
"language" they'd be expressed in.
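
To illustrate what "computing the commutings" means for general hunks, here is a minimal Python sketch of commuting two hunks within one file. The rule below is deliberately conservative (hunks that touch or overlap simply fail to commute) and is an assumption made for this example; it is not claimed to match darcs' actual commutation rules. `Hunk` and `commute` are hypothetical names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Hunk:
    line: int   # index of the first item replaced
    old: tuple  # items removed
    new: tuple  # items inserted

    def apply(self, lines):
        end = self.line + len(self.old)
        if tuple(lines[self.line:end]) != self.old:
            raise ValueError("hunk does not apply")
        return lines[:self.line] + list(self.new) + lines[end:]


def commute(p, q):
    """Given p followed by q, return (q2, p2) with the same combined
    effect when applied as q2 followed by p2, or None if they conflict."""
    if q.line > p.line + len(p.new):
        # q lies strictly after p: undo the offset that p introduced.
        return replace(q, line=q.line - len(p.new) + len(p.old)), p
    if q.line + len(q.old) < p.line:
        # q lies strictly before p: p must absorb the offset q introduces.
        return q, replace(p, line=p.line - len(q.old) + len(q.new))
    return None  # touching or overlapping hunks do not commute here


lines = ["a", "b", "c", "d", "e"]
p = Hunk(0, ("a",), ("A", "A2"))
q = Hunk(4, ("d",), ("D",))
q2, p2 = commute(p, q)
print(q.apply(p.apply(lines)) == p2.apply(q2.apply(lines)))
```

Only the item counts of `old` and `new` enter the rule, so it does not depend on whether the items are byte lines, UTF-16 lines, or something structural. That is the kind of restriction a plug-in hunk description would have to satisfy for commutation to be computed automatically.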

Hand-coding a UTF-16 hunk would be easier, but there's still the
problem of how to do it in the diff algorithm and the "select
changes" dialogue, unless there should simply be _either_
Raw8-bit or UTF-16, which wouldn't be so nice, I think.
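
One way to avoid an either/or choice would be to pick the partition function per file. The Python sketch below does this by sniffing for a UTF-16 byte order mark, which is purely an assumption for illustration: UTF-16 files without a BOM would fall through to the raw case. The function names are hypothetical.

```python
import codecs


def split_raw(data):
    # Raw 8-bit: items are byte strings, split on line endings.
    return data.splitlines(keepends=True)


def split_utf16(data):
    # UTF-16: decode first (the codec consumes the BOM), then split into
    # text lines. Note that str.splitlines recognises more line
    # boundaries than just "\n".
    return data.decode("utf-16").splitlines(keepends=True)


def choose_partition(data):
    """Pick a partition function for one file's contents (BOM sniffing)."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return split_utf16
    return split_raw


utf16_file = "a\nb\n".encode("utf-16")
raw_file = b"a\nb\n"
print(choose_partition(utf16_file)(utf16_file))
print(choose_partition(raw_file)(raw_file))
```

The resulting item lists could then be fed to a single item-agnostic diff, leaving open the harder question of how such hunks should be presented in the "select changes" dialogue.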


-- 
Tommy Pettersson <ptp at lysator.liu.se>



