[darcs-users] Detecting hunk moves [was: Automatic detection of file renames]

Ganesh Sittampalam ganesh at earth.li
Tue Aug 27 20:54:11 UTC 2013


Hi,

On 26/08/2013 05:24, AntC wrote:
>> Ganesh Sittampalam <ganesh <at> earth.li> writes:
>>
>> Another thought is that even though inodes are a really nice way to
>> capture user actions, it would be good to also have an alternative
>> approach such as detecting similar content, ...
>> Detecting similar content would also be useful in
>> future for inferring hunk move patches once we support those.
>>
> 
> I'm interested in how this "similar content" does or could work.

The general idea of detecting patches like renames or hunk moves is to
record better patches that the user might have recorded for themselves.
Note that "hunk move" is still hypothetical as a darcs patch type.

Once a patch has been inferred by whatever heuristics and recorded, it
would be treated in future merges just as if the user had recorded it by
hand. The heuristics would play no further part; darcs would have to
follow deterministic rules in merging and commuting the patch in order
to maintain the key properties of darcs.

So, given that "similar content" detection would just be a heuristic, my
general idea would be to look at the unrecorded changes purely as
changes to file contents, and then search for ways to turn them into
shorter patches by making use of renames and (in future) hunk moves.
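As a rough illustration of that search, here is a minimal Python sketch,
assuming a hypothetical repository state of the form {path: list of lines}
and using the standard library's difflib as a stand-in for darcs's own
diff. It diffs each file, then pairs every removed block with an identical
added block as a candidate hunk move. The names Block, changed_blocks,
candidate_moves and min_lines are made up for illustration; this is not
darcs code.

```python
import difflib
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    path: str
    line: int                 # 1-based line where the block starts
    lines: tuple[str, ...]


def changed_blocks(old: dict[str, list[str]], new: dict[str, list[str]]):
    """Diff every file and collect the removed and added runs of lines."""
    removed, added = [], []
    for path in sorted(set(old) | set(new)):
        a, b = old.get(path, []), new.get(path, [])
        matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag in ("delete", "replace"):
                removed.append(Block(path, i1 + 1, tuple(a[i1:i2])))
            if tag in ("insert", "replace"):
                added.append(Block(path, j1 + 1, tuple(b[j1:j2])))
    return removed, added


def candidate_moves(old, new, min_lines=2):
    """Pair each removed block with an identical, not yet used added block.

    Exact matching only; blocks shorter than min_lines are ignored so that
    trivial lines (blank lines, braces) are not reported as moves.
    """
    removed, added = changed_blocks(old, new)
    moves, used = [], set()
    for r in removed:
        if len(r.lines) < min_lines:
            continue
        for k, a in enumerate(added):
            if k not in used and a.lines == r.lines:
                moves.append((r, a))
                used.add(k)
                break
    return moves
```

For example, emptying F and putting its four lines into a new file G
yields one candidate: the block at F line 1 moved to G line 1. A real
heuristic would presumably also tolerate small edits inside the block,
which is where "similar" rather than "identical" content comes in.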

> From 
> previous discussions on darcs, I believe that git concentrates more on 
> matching up content than matching source line numbers(?)

git will apply heuristics at merge time, whereas darcs has to apply the
heuristics at record time, if at all. I'm not particularly familiar with
git's merge algorithms, but my gut feeling is that beyond that
difference, they behave quite similarly to darcs.

> As I understand it, a 'hunk move' is a hunk delete in one place, and a 
> hunk insert of the same content in a different place (possibly a different 
> file).

Yep.
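To make that concrete, here is a small Python sketch of what such a
patch might look like, assuming hunks are represented as (path, 1-based
start line, removed lines, added lines) and a repository as {path: list
of lines}. HunkMove is a hypothetical type, since darcs has no such patch
yet; the sketch just shows that applying it is the same as applying a
delete hunk and then an insert hunk.

```python
from dataclasses import dataclass

Files = dict[str, list[str]]  # hypothetical repository state: path -> lines


@dataclass(frozen=True)
class Hunk:
    path: str
    line: int              # 1-based start line
    old: tuple[str, ...]   # lines removed
    new: tuple[str, ...]   # lines added

    def apply(self, files: Files) -> Files:
        content = files.get(self.path, [])
        i = self.line - 1
        # The removed lines must actually be present at that position
        if tuple(content[i:i + len(self.old)]) != self.old:
            raise ValueError(f"hunk does not apply to {self.path}")
        result = dict(files)
        result[self.path] = content[:i] + list(self.new) + content[i + len(self.old):]
        return result


@dataclass(frozen=True)
class HunkMove:
    """Hypothetical patch type: cut `lines` from `src`, paste into `dst`."""
    src: str
    src_line: int
    dst: str
    dst_line: int          # position in dst, counted after the cut
    lines: tuple[str, ...]

    def as_hunks(self) -> tuple[Hunk, Hunk]:
        # Same effect as a delete hunk followed by an insert hunk
        return (Hunk(self.src, self.src_line, self.lines, ()),
                Hunk(self.dst, self.dst_line, (), self.lines))

    def apply(self, files: Files) -> Files:
        delete, insert = self.as_hunks()
        return insert.apply(delete.apply(files))
```

The effect on the files is identical either way; the difference is that
two separate hunks only tell darcs about a deletion and an unrelated
insertion, whereas a single move patch would record that the lines are
the same content, which is what would let a later merge carry edits to
those lines along with them.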

> Suppose we have this sequence in Repo A:
> + create file F
> + add hunk text H1 to F
> + insert hunk text H2 into F (into the middle of H1)
> 
> The author knows (but darcs can't) that the content of H2 'links to' H1.
> (For example, H2 is program code that refs names declared in H1.)
> (I'm not sure if H2 is dependent on H1 in a darcs sense, because you can't 
> commute the two hunk operations -- you'd have to split H1 into 
> two 'hunklets' of text.)
> 
> There's then this sequence in Repo B.
> + pull create file F
> + pull add hunk text H1
> + create file G
> + hunk move text H1 to file G (-cut+paste)
> - this leaves file F empty
> ? pull hunk text H2
> 
> I'm guessing that darcs will put text H2 into file G, as the only content
> -- then compile would fail(?) Would git do the same?

Yes, that is what I would expect to happen. If H2 depends on H1, then
this seems like exactly the right thing to do, and I don't see why
compilation would fail.
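Whether H2 depends on H1 can be checked against the usual hunk
commutation rule: in darcs, one patch depends on another exactly when the
two cannot be reordered. Below is a simplified Python sketch, assuming
hunks are reduced to (file, start line, lines removed, lines added) and
leaving out the extra adjacent-hunk cases that darcs's real rule handles;
the names are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hunk:
    path: str
    line: int   # 1-based start line
    old: int    # number of lines removed
    new: int    # number of lines added


def commute(first: Hunk, second: Hunk) -> Optional[tuple[Hunk, Hunk]]:
    """Try to reorder `first; second` into `second'; first'`.

    Simplified rule: hunks in the same file commute only if their line
    ranges are disjoint and not adjacent. None means `second` depends
    on `first`.
    """
    if first.path != second.path:
        return second, first
    if second.line > first.line + first.new:
        # second lies entirely after the lines first produced:
        # undo first's shift in line numbers
        shift = first.new - first.old
        return Hunk(second.path, second.line - shift, second.old, second.new), first
    if second.line + second.old < first.line:
        # second lies entirely before first, so first's position moves
        shift = second.new - second.old
        return second, Hunk(first.path, first.line + shift, first.old, first.new)
    return None
```

With H1 = Hunk("F", 1, 0, 4) adding four lines to an empty F, and
H2 = Hunk("F", 3, 0, 2) inserting two lines into the middle of them,
commute(H1, H2) returns None, so H2 depends on H1 in the ordinary darcs
sense; darcs does not split H1 into smaller pieces to make them commute.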

Given what you say below you might have meant to say "file F" here; but
I don't think H2 would ever end up in F automatically. Either the merge
would fail with a conflict or it would end up in G.
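Since hunk move does not exist yet, any merge rule for it is
speculative. The Python sketch below assumes one plausible, deliberately
conservative rule that gives exactly those two outcomes: a hunk strictly
inside the moved block follows it to the new file, hunks well clear of it
are only renumbered, and anything touching the edges of the block (or
the paste point) is reported as a conflict. Hunk, HunkMove, Conflict and
rebase_hunk are all hypothetical, and the source and destination files
are assumed to differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hunk:
    path: str
    line: int                 # 1-based start line
    old: tuple[str, ...]      # lines removed
    new: tuple[str, ...]      # lines added


@dataclass(frozen=True)
class HunkMove:               # hypothetical patch type
    src: str
    src_line: int             # first moved line in src
    dst: str                  # assumed to differ from src
    dst_line: int             # block is pasted before this line of dst
    count: int                # number of lines moved


class Conflict(Exception):
    pass


def rebase_hunk(h: Hunk, m: HunkMove) -> Hunk:
    """Rewrite hunk `h` so that it applies after the parallel move `m`."""
    start, end = h.line, h.line + len(h.old)        # h covers [start, end)
    if h.path == m.src:
        lo, hi = m.src_line, m.src_line + m.count   # moved block [lo, hi)
        if end < lo:
            return h                                 # clear of the block, before it
        if start > hi:
            return Hunk(h.path, start - m.count, h.old, h.new)  # clear, after it
        if lo < start and end < hi:
            # strictly inside the block: follow the content to dst
            return Hunk(m.dst, m.dst_line + (start - lo), h.old, h.new)
        raise Conflict(f"hunk at {h.path}:{start} touches the moved block's edge")
    if h.path == m.dst:
        if end < m.dst_line:
            return h
        if start > m.dst_line:
            return Hunk(h.path, start + m.count, h.old, h.new)
        raise Conflict(f"hunk at {h.path}:{start} touches the paste point")
    return h                                         # unrelated file
```

In AntC's scenario, moving H1's four lines from F to G is
HunkMove("F", 1, "G", 1, 4), and an H2 inserted before line 3 of F is
rewritten to insert before line 3 of G, i.e. still in the middle of H1.
An insertion at the very start of the block would instead raise Conflict,
because it is ambiguous whether it belongs with the moved lines.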

Nonetheless, there are certainly more complex scenarios where darcs
would do the wrong thing, just as there are without hunk moves: for
example, if H2 depended on other content in F that was not moved to G.

> Is there any validation or warning from darcs that the 'content/context' 
> text for H2 doesn't match -- even though it does exist in the target repo, 
> in a different place?
> 
> If in Repo B file F was _renamed_ to G, rather than copy+paste content, 
> then I'm guessing darcs would insert H2 into file G OK (in the middle of 
> H1) -- even if there was a file F created after the rename(?)
> 
> (I'm assuming no editing of the text -- that would make the hunk move 
> harder to trace.)
> 
> At first sight, perhaps the pull of H2 should 'follow the content/context' 
> into H1 in file G. But it's easy to change this example slightly: after 
> pulling H2 into file G, insert a line " #include G ". Now all is well.
> 
> Perhaps a VCS should say in the target Repo B: I could follow the file 
> name/line numbering, or I could follow the content/context; that gives a 
> different result; which would you prefer? (It's important in that case, 
> that Repo B 'knows' that hunk H1 is already pulled.)

If I understand your terminology correctly, then Darcs always follows
content/context. I don't think following file names and line numbers
would work well in general; merges would usually produce bad results.
For example, merging a file change to A and a rename of A to B should
result in a file change to B, not to A.
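A minimal Python sketch of that rename example, assuming the same
simplified hunk representation as above and a rename patch reduced to
(old path, new path), roughly what darcs records as a "move" patch. The
function name is hypothetical, and directory renames are ignored for
simplicity.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Hunk:
    path: str
    line: int
    old: tuple[str, ...]
    new: tuple[str, ...]


@dataclass(frozen=True)
class Move:          # file rename: src is renamed to dst
    src: str
    dst: str


def merge_hunk_with_rename(h: Hunk, mv: Move) -> Hunk:
    """Return the version of `h` that applies after the parallel rename `mv`.

    The hunk follows the file's identity rather than its old name, so an
    edit to A becomes an edit to B.
    """
    if h.path == mv.src:
        return replace(h, path=mv.dst)
    return h
```

Following names instead would leave the edit pointing at A, a path that
no longer exists, or, worse, at an unrelated file later created under
the name A.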

Cheers,

Ganesh

