[darcs-devel] Re: [issue267] Wishlist item for darcs

Tuomo Valkonen tuomov at iki.fi
Tue Sep 19 09:03:34 PDT 2006


On 2006-09-19, Tommy Pettersson <ptp at lysator.liu.se> wrote:
> On Mon, Sep 18, 2006 at 04:08:28PM +0000, Tuomo Valkonen wrote:
>>   * UTF-16 is, of course, a rather different case than just a change in
>>     encoding. The way I'd go about it, is to make the current patch type
>>     polymorphic to input in arbitrary character types, if it isn't already,
>>     and add skeleton support for plugging in and specifying different patch
>>     type for files of arbitrary formats. (So, one day, support could be 
>>     written for structural formats to have structural instead of line-based
>>     patches, and so on.)
>
> I think this could be a worthwhile task, although not so easy.
> If darcs could handle MS Word documents and other "industrial"
> file formats, it would become a "real" RCS in one more sense of
> the word real. And it would probably boost the development of
> new patch types, which would be interesting.
>
> One complication is the diff algorithm. It forms hunks, and
> would form the UTF-16 hunks and many of the eventual plug-in
> structural format hunks. It needs to be polymorphic as well, or
> worse...

Of course the diff algorithm should be polymorphic to various character
types/strings (FastPackedString8, FastPackedString16, etc.?), but I 
don't see that as a problem. I have my doubts that the diff algorithm is 
even applicable to many structural formats, so I wouldn't worry about it.
Not that I am familiar with it, or have given any thought to what kind 
of patches different structural formats would have.

> My number one wish for new patch type, once I finally get time
> to finish the replace-with-space patch type, would be a hunk-move
> patch type 

I'd like that.

> It would be nice if the user didn't have to ask for a specific
> diff algorithm on each record. 
>
> ...
>
> Hand-coding a UTF-16 hunk would be easier, but there's still the
> problem of how to do it in the diff algorithm and the "select
> changes" dialogue, unless there should simply be _either_
> Raw8-bit or UTF-16, which wouldn't be so nice, I think.

I was thinking of something along the lines of 'darcs add
--format=text16 file.txt', 'darcs add --format=xml file.xml',
and so on, with this information stored somewhere. Then _only_
LineBasedPatch16 or XMLPatch or whatever would ever get used on that
file. Different patch types (in the sense of the 'data' directive) 
do not interact. Something like:

    data RepoPatch = Move ...
                   | ...
                   | FileChange FileName Dynamic

    class FilePatch a where
        commute :: a -> a -> ...
        ...

    data FPSType a => LineBasedPatch a = HunkLP ...
                                       | MergerLP ...
                                       ...

    instance FPSType a => FilePatch (LineBasedPatch a) where
        ...

(I wish Dynamic wasn't needed...)
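
To make the sketch above a bit more concrete, here is a minimal,
self-contained version of the Dynamic approach. It is only an illustration
under assumptions: LinePatch, InsertLine, commuteLines and commuteRepo are
hypothetical names, the pair-based commute signature is a guess at what the
elided one might look like, and the toy single-line insert stands in for
real hunks. Nothing here is taken from the darcs sources.

```haskell
import Data.Dynamic (Dynamic, fromDynamic, toDyn)
import Data.Typeable (Typeable)

type FileName = String

-- Hypothetical per-format interface. commute (p, q) takes "p then q" and
-- returns (q', p') with the same combined effect, or Nothing on failure.
class Typeable a => FilePatch a where
    commute :: (a, a) -> Maybe (a, a)

-- Toy stand-in for a line-based patch: insert a line at a 0-based index.
data LinePatch = InsertLine Int String
    deriving (Eq, Show)

instance FilePatch LinePatch where
    commute (InsertLine i x, InsertLine j y)
        | j > i     = Just (InsertLine (j - 1) y, InsertLine i x)
        | otherwise = Just (InsertLine j y, InsertLine (i + 1) x)

-- Only the file-change constructor is shown; the others are elided.
data RepoPatch = FileChange FileName Dynamic

-- Recover the concrete format from the Dynamic payloads. Every format
-- needs its own attempt like this one, which is the awkward part.
commuteLines :: Dynamic -> Dynamic -> Maybe (Dynamic, Dynamic)
commuteLines d1 d2 = do
    p <- fromDynamic d1 :: Maybe LinePatch
    q <- fromDynamic d2
    (q', p') <- commute (p, q)
    return (toDyn q', toDyn p')

-- Changes to different files commute trivially; changes to the same
-- file are delegated to the format-specific commute.
commuteRepo :: (RepoPatch, RepoPatch) -> Maybe (RepoPatch, RepoPatch)
commuteRepo (FileChange f d1, FileChange g d2)
    | f /= g    = Just (FileChange g d2, FileChange f d1)
    | otherwise = do
        (d2', d1') <- commuteLines d1 d2
        return (FileChange f d2', FileChange f d1')

-- Helper for inspecting a result: unwrap a file change as a LinePatch.
asLine :: RepoPatch -> Maybe LinePatch
asLine (FileChange _ d) = fromDynamic d

main :: IO ()
main = do
    let p = FileChange "a.txt" (toDyn (InsertLine 2 "foo"))
        q = FileChange "a.txt" (toDyn (InsertLine 5 "bar"))
    case commuteRepo (p, q) of
        Just (q', p') -> print (asLine q', asLine p')
        Nothing       -> putStrLn "patches do not commute"
```

If the two payloads are of different formats, or of a format that
commuteLines does not know about, fromDynamic yields Nothing and the
commute fails, which matches the idea that different patch types do not
interact.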

-- 
Tuomo




