# [darcs-users] Re: an interface for splitting hunks

Benedikt Schmidt ry102 at rz.uni-karlsruhe.de
Wed Mar 30 15:21:31 UTC 2005

Ketil Malde <ketil.malde at bccs.uib.no> writes:

> Mark Stosberg <mark at summersault.com> writes:
>
>> I think we are talking about the same thing. Right now I believe that
>> hunks equal all contiguous changed lines.
>
> But that's not unambiguous.  A hunk like
>
[...]
>
> but one of them is definitely nicer.

Here is how the diff code I'm working on for darcs (similar to the one in
GNU diff) creates the diff:

oldfile: CAA
newfile: AAAA
1. Search for a minimal edit script changing oldfile to newfile.

-C
+A
A
+A
A
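
As a rough illustration of step 1, here is a minimal sketch in Python that
finds one minimal edit script with a plain quadratic LCS table. This is an
assumption made for readability: it is not the darcs code and not the
algorithm GNU diff uses, and the name minimal_edit_script is hypothetical.

```python
def minimal_edit_script(old, new):
    """Return one minimal edit script as (op, line) pairs.

    op is '-' (delete), '+' (insert) or ' ' (unchanged line).
    Illustrative only: a simple O(n*m) table, not the darcs or GNU diff code.
    """
    n, m = len(old), len(new)
    # lcs[i][j] = length of the longest common subsequence of old[i:] and new[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if old[i] == new[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    script, i, j = [], 0, 0
    while i < n or j < m:
        if i < n and j < m and old[i] == new[j]:
            script.append((" ", old[i]))      # keep a common line
            i, j = i + 1, j + 1
        elif i < n and (j == m or lcs[i + 1][j] >= lcs[i][j + 1]):
            script.append(("-", old[i]))      # delete from oldfile
            i += 1
        else:
            script.append(("+", new[j]))      # insert from newfile
            j += 1
    return script


script = minimal_edit_script(list("CAA"), list("AAAA"))
changed = [op for op, _ in script if op != " "]
```

For the CAA/AAAA example this yields a script with three changed lines. It
need not be the same script as the one shown above, since several minimal
scripts exist, which is exactly why step 2 is needed.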

2. This script is minimal in the number of inserted and deleted lines, but
the hunks should also be as long as possible.
Try to shift hunks forward and backward to get the maximal number of
consecutive insertions/deletions (a).

This one is maximal:

-C
A
A
+A
+A

but this one is better since the deletion lines up with the insertion (b).

-C
+A
+A
A
A

That's why, among the edit scripts where (a) is maximal, the algorithm
selects one where deletions and insertions line up, if there is one.
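
The selection rule can be sketched in Python as follows. Note the swapped-in
technique: instead of shifting hunks, this sketch simply enumerates every
minimal edit script (exponential, so only usable on tiny inputs) and ranks
them. Reading (a) as "fewest runs of consecutive insertions or deletions" and
(b) as "a deletion run directly followed by an insertion run" is my
assumption, and all function names are hypothetical.

```python
from functools import lru_cache


def all_minimal_scripts(old, new):
    """Enumerate every minimal edit script (brute force, tiny inputs only)."""
    old, new = tuple(old), tuple(new)

    @lru_cache(maxsize=None)
    def cost(i, j):
        # Fewest inserted plus deleted lines turning old[i:] into new[j:].
        if i == len(old):
            return len(new) - j
        if j == len(new):
            return len(old) - i
        best = 1 + min(cost(i + 1, j), cost(i, j + 1))
        if old[i] == new[j]:
            best = min(best, cost(i + 1, j + 1))
        return best

    def walk(i, j, after_insert):
        if i == len(old) and j == len(new):
            yield []
            return
        here = cost(i, j)
        if i < len(old) and j < len(new) and old[i] == new[j] \
                and cost(i + 1, j + 1) == here:
            for rest in walk(i + 1, j + 1, False):
                yield [(" ", old[i])] + rest
        # Never emit '-' right after '+', so each hunk lists deletions first.
        if i < len(old) and not after_insert and 1 + cost(i + 1, j) == here:
            for rest in walk(i + 1, j, False):
                yield [("-", old[i])] + rest
        if j < len(new) and 1 + cost(i, j + 1) == here:
            for rest in walk(i, j + 1, True):
                yield [("+", new[j])] + rest

    return list(walk(0, 0, False))


def score(script):
    """Lower is better: (a) fewest change runs, then (b) most lined-up runs."""
    runs = []
    for op, _ in script:
        if not runs or runs[-1] != op:
            runs.append(op)
    change_runs = sum(1 for op in runs if op != " ")
    lined_up = sum(1 for a, b in zip(runs, runs[1:]) if (a, b) == ("-", "+"))
    return (change_runs, -lined_up)


def pick_script(old, new):
    return min(all_minimal_scripts(old, new), key=score)


best = pick_script(list("CAA"), list("AAAA"))
```

Under these assumptions, best for CAA/AAAA is the script -C +A +A A A, the
same one chosen above.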

> ----------
>  \end
>
> +\begin
> +foo
> +\end
> +
>  \begin
> ----------
>
> is equivalent to
>
> ----------
>  \end
>
>  \begin
> +foo
> +\end
> +
> +\begin
> ----------

In this case neither (a) nor (b) helps to identify the first edit-script as
the better one, because the empty line isn't special in any way. It should
be possible to add "starts/ends with an empty line" as a further criterion
after (a) and (b) and see if that helps to get the first diff in most cases.
Does anyone have sample files for testing where darcs and/or GNU diff get it
wrong?
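
To make the proposed extra criterion concrete, here is a small Python sketch.
It is only an illustration under my own assumptions: it handles a pure
insertion hunk, looks only at "ends with an empty line" (not "starts with"),
and the names slide_positions and prefer_blank_boundary are hypothetical.

```python
def slide_positions(new, start, end):
    """All (start, end) ranges that the inserted block new[start:end] can
    slide to without changing the resulting file (requires start < end)."""
    positions = [(start, end)]
    s, e = start, end
    # Slide up: the line above the hunk must equal the hunk's last line.
    while s > 0 and new[s - 1] == new[e - 1]:
        s, e = s - 1, e - 1
        positions.append((s, e))
    s, e = start, end
    # Slide down: the hunk's first line must equal the line just below it.
    while e < len(new) and new[s] == new[e]:
        s, e = s + 1, e + 1
        positions.append((s, e))
    return sorted(positions)


def prefer_blank_boundary(new, start, end):
    """Pick the slid position whose last inserted line is empty, if any."""
    def rank(pos):
        s, e = pos
        # False sorts before True, so blank-terminated hunks win;
        # ties go to the earliest position.
        return (new[e - 1].strip() != "", s)
    return min(slide_positions(new, start, end), key=rank)


# The newfile from the quoted example; lines 3..6 are the inserted hunk
# as it appears in the second, less pleasant diff.
new = [r"\end", "", r"\begin", "foo", r"\end", "", r"\begin"]
chosen = prefer_blank_boundary(new, 3, 7)
```

On this input the hunk can sit at four positions, and only the range (2, 6),
which corresponds to the first diff, ends with an empty line.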

Benedikt