[darcs-users] Detecting hunk moves [was: Automatic detection of file renames]

AntC anthony_clayden at clear.net.nz
Mon Aug 26 04:24:48 UTC 2013


> Ganesh Sittampalam <ganesh <at> earth.li> writes:
> 
> Another thought is that even though inodes are a really nice way to
> capture user actions, it would be good to also have an alternative
> approach such as detecting similar content, ...
> Detecting similar content would also be useful in
> future for inferring hunk move patches once we support those.
> 

I'm interested in how this "similar content" detection does or could work.
From previous discussions on darcs, I believe that git concentrates more on
matching up content than on matching source line numbers(?)

As I understand it, a 'hunk move' is a hunk delete in one place, and a 
hunk insert of the same content in a different place (possibly a different 
file).
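Here is a rough sketch of the 'detecting similar content' idea. It assumes a hypothetical `Hunk` record (file, line, removed lines, added lines) rather than darcs' real patch types. It pairs each pure deletion with the most similar pure insertion and treats each pair as a candidate hunk move. The similarity measure is Python's `difflib.SequenceMatcher` over whole lines, and the 0.8 threshold is an arbitrary choice. This is not how git or darcs actually do it.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Hunk:
    """Hypothetical hunk: at 1-based `line` of `file`, replace `old` with `new`."""
    file: str
    line: int
    old: tuple[str, ...]
    new: tuple[str, ...]


def similarity(a: tuple[str, ...], b: tuple[str, ...]) -> float:
    """Ratio in [0, 1] of matching lines between two line sequences."""
    return SequenceMatcher(None, a, b).ratio()


def find_moves(hunks: list[Hunk], threshold: float = 0.8) -> list[tuple[Hunk, Hunk]]:
    """Pair pure deletions with pure insertions whose text is similar enough.

    Each pair is a candidate 'hunk move': text deleted in one place and
    inserted (identically or nearly so) elsewhere, possibly in another file.
    """
    deletions = [h for h in hunks if h.old and not h.new]
    insertions = [h for h in hunks if h.new and not h.old]
    used: set[int] = set()  # insertions already claimed by some deletion
    moves = []
    for d in deletions:
        best, best_score = None, threshold
        for i, ins in enumerate(insertions):
            if i in used:
                continue
            score = similarity(d.old, ins.new)
            if score >= best_score:
                best, best_score = i, score
        if best is not None:
            used.add(best)
            moves.append((d, insertions[best]))
    return moves


h1 = ("int a;", "int b;", "int c;", "int d;")
# Repo B's cut+paste of H1 from F into G, seen as two separate hunks:
print(len(find_moves([Hunk("F", 1, h1, ()), Hunk("G", 1, (), h1)])))  # 1
```

Allowing a score below 1.0 is what lets a lightly edited move still be paired, at the cost of occasional false matches.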

Suppose we have this sequence in Repo A:
+ create file F
+ add hunk text H1 to F
+ insert hunk text H2 into F (into the middle of H1)

The author knows (but darcs can't) that the content of H2 'links to' H1.
(For example, H2 is program code that refs names declared in H1.)
(I'm not sure whether H2 counts as dependent on H1 in the darcs sense on the
grounds that you can't commute the two hunk operations; to commute them you'd
have to split H1 into two 'hunklets' of text.)
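In darcs' model, as I understand it, a later patch depends on an earlier one when the pair cannot be commuted. The sketch below shows a simplified commute rule for two hunks in the same file. It suggests why H2, inserted into the middle of H1, cannot be moved in front of it. The `Hunk` record is hypothetical, and adjacent hunks are simply treated as non-commuting here, which may be stricter than darcs itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hunk:
    """Hypothetical hunk in one file: at 1-based `line`, replace `old` with `new`."""
    line: int
    old: tuple[str, ...]
    new: tuple[str, ...]

    def shifted(self, delta: int) -> "Hunk":
        return Hunk(self.line + delta, self.old, self.new)


def commute(a: Hunk, b: Hunk) -> Optional[tuple[Hunk, Hunk]]:
    """Try to turn 'a then b' into an equivalent 'b2 then a2'.

    Returns None when the hunks overlap or touch, i.e. when b depends on a.
    """
    growth_a = len(a.new) - len(a.old)
    growth_b = len(b.new) - len(b.old)
    if a.line + len(a.new) < b.line:
        # b lies wholly after a's new text: undo a's line shift on b.
        return b.shifted(-growth_a), a
    if b.line + len(b.old) < a.line:
        # b lies wholly before a: applying b first pushes a down by b's growth.
        return b, a.shifted(growth_b)
    return None


h1 = Hunk(1, (), ("int a;", "int b;", "int c;", "int d;"))  # H1 fills F
h2 = Hunk(3, (), ("use(a, b);",))  # H2 goes between "int b;" and "int c;"
print(commute(h1, h2))  # None: H2 sits inside H1's text
```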

Then there is this sequence in Repo B:
+ pull create file F
+ pull add hunk text H1
+ create file G
+ hunk move text H1 to file G (-cut+paste)
- this leaves file F empty
? pull hunk text H2

I'm guessing that darcs will put text H2 into file F, as its only content,
so compiling would then fail(?) Would git do the same?

Is there any validation or warning from darcs that the 'content/context'
text for H2 doesn't match, even though that text does exist in the target
repo, in a different place?

If in Repo B file F had instead been _renamed_ to G, rather than having its
content cut and pasted, then I'm guessing darcs would insert H2 into file G
correctly (in the middle of H1), even if a new file F was created after the
rename(?)
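Here is a rough sketch of how a patch-based system could carry a hunk over a rename. When the hunk being pulled is merged past Repo B's rename patch, it is retargeted from F to G. A file F created later is a different file, so it does not attract the hunk. `Move`, `AddFile`, `Hunk` and `carry_over` are all hypothetical names. The sketch models only which file the hunk targets. It ignores line offsets and the hunk patches that made up a cut+paste.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Move:
    """Hypothetical rename patch: file `src` becomes `dst`."""
    src: str
    dst: str


@dataclass(frozen=True)
class AddFile:
    """Hypothetical patch creating an empty file at `path`."""
    path: str


@dataclass(frozen=True)
class Hunk:
    """Hypothetical hunk: insert `new` lines at 1-based `line` of `file`."""
    file: str
    line: int
    new: tuple[str, ...]


def carry_over(hunk: Hunk, other_patches: list) -> Hunk:
    """Move a hunk past patches the target repo already has, in order.

    Only renames are modelled: crossing Move(src, dst) retargets a hunk on
    `src` to `dst`. Any other patch leaves the hunk's target unchanged.
    """
    for patch in other_patches:
        if isinstance(patch, Move) and patch.src == hunk.file:
            hunk = replace(hunk, file=patch.dst)
    return hunk


h2 = Hunk("F", 3, ("use(a, b);",))
# Rename in Repo B, then a fresh F: H2 follows the renamed file.
print(carry_over(h2, [Move("F", "G"), AddFile("F")]).file)  # G
# Cut+paste in Repo B: no Move to cross, so H2 still targets F.
print(carry_over(h2, [AddFile("G")]).file)  # F
```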

(I'm assuming no editing of the text; that would make the hunk move harder
to trace.)

At first sight, perhaps the pull of H2 should 'follow the content/context'
into H1 in file G. But it's easy to change this example slightly: after
pulling H2 into file F, insert a line " #include G ". Now all is well.

Perhaps a VCS should say in the target Repo B: I could follow the file
name/line numbering, or I could follow the content/context; the two give
different results; which would you prefer? (It's important in that case
that Repo B 'knows' that hunk H1 has already been pulled.)
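Here is a sketch of that "which would you prefer?" check. As far as I know, darcs hunks record a line number plus removed and added lines, not surrounding text. So the sketch instead assumes a hypothetical diff-style `Insertion` that also remembers the line just before and the line just after it. It computes two placements. The first is the recorded file and line, kept only if the context still matches there, so `None` is the warning case. The second is every place in the repo where the context matches. It asks the user only when the two disagree. All names are illustrative, and insertions at the very start or end of a file are not handled.

```python
from dataclasses import dataclass
from typing import Optional

Place = tuple[str, int]  # (file name, 1-based line number)


@dataclass(frozen=True)
class Insertion:
    """Hypothetical insertion of `new` lines at 1-based `line` of `file`,
    recorded between the lines `before` and `after`."""
    file: str
    line: int
    before: str
    after: str
    new: tuple[str, ...]


def _context_at(text: list[str], line: int, ins: Insertion) -> bool:
    """True if `ins.before` is just above `line` and `ins.after` is at `line`."""
    i = line - 1  # 0-based index of the line that will follow the new text
    return 0 < i < len(text) and text[i - 1] == ins.before and text[i] == ins.after


def candidate_placements(
    files: dict[str, list[str]], ins: Insertion
) -> tuple[Optional[Place], list[Place]]:
    """Return (follow file name/line, follow content/context) placements."""
    recorded = files.get(ins.file, [])
    by_name = (ins.file, ins.line) if _context_at(recorded, ins.line, ins) else None
    by_content = [
        (name, line)
        for name, text in files.items()
        for line in range(2, len(text) + 1)
        if _context_at(text, line, ins)
    ]
    return by_name, by_content


def needs_user_choice(by_name: Optional[Place], by_content: list[Place]) -> bool:
    """True when following name/line and following content give different answers."""
    return by_content != ([by_name] if by_name else [])


h1 = ["int a;", "int b;", "int c;", "int d;"]                # H1
h2 = Insertion("F", 3, "int b;", "int c;", ("use(a, b);",))  # H2 from Repo A
repo_b = {"F": [], "G": h1}                                  # after cut+paste
by_name, by_content = candidate_placements(repo_b, h2)
print(by_name, by_content, needs_user_choice(by_name, by_content))
# None [('G', 3)] True
```

The tool deliberately does not choose automatically. In the " #include G " variant, the content search would still point at G even though F is where the user wants H2.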

AntC


