[darcs-users] Character-based patch type
and_j_rob at yahoo.com
Tue Nov 18 04:05:24 UTC 2008
I have been working on a character-based diff algorithm for making certain hunk patches easier to read. In situations where there is quite a lot of tab-conversion, it may also make patches smaller. Initially, the idea was to make a separate "diff -u"-like tool that would print not only changes of lines, but also changes _within_ lines. I have talked with lispy about this, who encouraged me to post to this list. Since there is already a bug about this ( http://lists.osuosl.org/pipermail/darcs-devel/2006-October/004915.html ), I would like to incorporate these ideas into Darcs, if possible.
My proposal is to add one of the following patch types to FilePatchType:
(1) ModifyLine !Int !Int PackedString PackedString
(2) ModifyLine !Int !Int [PackedString] [PackedString]
(3) ModifyLine !Int [(Int, PackedString, PackedString)]
(4) ModifyLine !Int [(Int, [PackedString], [PackedString])]
where the first Int is the line number, the second Int (in forms (3) and (4), the Int inside each tuple) is the character offset within that line, and the strings are the text to remove and the text to insert.
For the sake of simplicity I would prefer the first, but then again, there may be some patch theory magic that would require the others for some reason. Regardless of which exact representation is chosen, I believe that all forms above have equivalent commutation properties.
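To make form (1) concrete, here is a minimal sketch of the type and of what applying it to a file would mean. This is not Darcs code: String stands in for PackedString, the line number is assumed to be 1-based and the character offset 0-based, and applyModifyLine and invertModifyLine are hypothetical names chosen for illustration.

```haskell
import Data.List (isPrefixOf)

-- Form (1): line number, character offset, text to remove, text to insert.
-- String is used here in place of PackedString (an assumption for brevity).
data ModifyLine = ModifyLine !Int !Int String String
  deriving (Eq, Show)

-- Apply a patch to a file given as a list of lines.
-- Returns Nothing if the line does not exist, the offset is out of range,
-- or the text to remove is not found at the given offset.
applyModifyLine :: ModifyLine -> [String] -> Maybe [String]
applyModifyLine (ModifyLine n off old new) ls
  | n < 1 || off < 0 = Nothing
  | otherwise =
      case splitAt (n - 1) ls of
        (before, l : after)
          | off <= length l
          , (pre, rest) <- splitAt off l
          , old `isPrefixOf` rest
          -> Just (before ++ (pre ++ new ++ drop (length old) rest) : after)
        _ -> Nothing

-- The inverse patch swaps the removed and inserted text.
invertModifyLine :: ModifyLine -> ModifyLine
invertModifyLine (ModifyLine n off old new) = ModifyLine n off new old
```

For example, under these assumptions ModifyLine 2 0 "\t" "    " replaces a leading tab on the second line with four spaces and leaves every other line untouched.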
The ModifyLine patch type affects a subset of what the Hunk patch type affects, so ModifyLine patches should commute and merge much like Hunk patches. Commuting two ModifyLine patches is much easier than commuting two Hunk patches, since they do not change the indices of any lines, only the indices of characters within a line. Merging two ModifyLine patches with each other, however, is about as hard as merging Hunk patches, since the complexity of context-sensitive indices that existed at the line level has been brought down to the character level. To represent these patches, one could use a list of ModifyLine patches of the form (1), which requires more processing after canonization, or a single ModifyLine patch of the form (3), which requires more processing during canonization. Each would have certain repercussions, and I think it is best to have a discussion about this.
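As a rough illustration of why commutation only has to adjust character offsets, here is a sketch for form (1). It is not the Darcs commute function: commuteModifyLine is a hypothetical name, String again stands in for PackedString, and the sketch conservatively refuses to commute edits that touch or overlap, which is an assumption rather than a settled rule.

```haskell
-- Form (1): line number, character offset, text to remove, text to insert.
data ModifyLine = ModifyLine !Int !Int String String
  deriving (Eq, Show)

-- Given p1 followed by p2, try to produce (p2', p1') such that applying
-- p2' and then p1' has the same effect. Line numbers never change; only
-- the character offset of one of the two patches may shift.
commuteModifyLine :: (ModifyLine, ModifyLine) -> Maybe (ModifyLine, ModifyLine)
commuteModifyLine (p1@(ModifyLine l1 o1 old1 new1), p2@(ModifyLine l2 o2 old2 new2))
  -- Different lines: the patches cannot interfere with each other.
  | l1 /= l2 = Just (p2, p1)
  -- p2 edits text strictly after what p1 inserted: undo p1's shift on p2.
  | o2 > o1 + length new1 =
      Just (ModifyLine l2 (o2 - length new1 + length old1) old2 new2, p1)
  -- p2 edits text strictly before p1: p1 must account for p2's shift.
  | o2 + length old2 < o1 =
      Just (p2, ModifyLine l1 (o1 - length old2 + length new2) old1 new1)
  -- Touching or overlapping edits: treated as a failure to commute here.
  | otherwise = Nothing
```

For instance, on the line "a\tb\tc", ModifyLine 1 1 "\t" "    " followed by ModifyLine 1 6 "\t" "  " commutes to ModifyLine 1 3 "\t" "  " followed by the unchanged first patch, because the second tab sits at offset 3 before the first replacement is made.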
I have already done some initial work on adding this new patch type, and I am willing to do anything needed to polish off all the commuting and merging algorithms to accommodate the new patch type. Just thought it would be a good idea to tell someone.
Any suggestions, comments?
Andrew Robbins (adu)
PS. Alternative names for the patch type could be: Hunk2, CharacterHunk, CharHunk, Chunk, Indent, Dent, Vertical, ModLine, or HappyPatch
PPS. I'm also a big fan of spinning off libraries.