[darcs-users] revert slowness

David Roundy droundy at abridgegame.org
Sat Sep 25 13:46:37 UTC 2004


On Sat, Sep 25, 2004 at 02:56:19AM -0400, Andrew Pimlott wrote:
> I took a stab at reproducing the conflict slowness reported by Lee
> Cantey, and ran into slowness in revert.  I was trying to simulate the
> case of merging a small change with a large "reindent" change.

The problem here is in the LCS (longest common subsequence) algorithm
used.  (And this is distinct from Lee's problem.)  The trouble is
twofold.  First, I use a "real" LCS algorithm, which finds a truly
optimal common subsequence, while GNU diff (for example) settles for an
approximate LCS.
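
For concreteness, an exact LCS over lines can be written as the
textbook O(n*m) dynamic program.  This is only an illustrative sketch
(the `lcs' name and all the details are mine, not darcs's actual
code):

    import Data.Array

    -- Exact longest common subsequence via the classic dynamic
    -- program.  table!(i,j) is the LCS length of the first i lines
    -- of xs and the first j lines of ys.
    lcs :: Eq a => [a] -> [a] -> [a]
    lcs xs ys = reverse (walk n m)
      where
        n = length xs
        m = length ys
        xa = listArray (1, n) xs
        ya = listArray (1, m) ys
        table = listArray ((0, 0), (n, m))
                  [ cell i j | i <- [0 .. n], j <- [0 .. m] ]
                    :: Array (Int, Int) Int
        cell i j
          | i == 0 || j == 0 = 0
          | xa ! i == ya ! j = table ! (i - 1, j - 1) + 1
          | otherwise        = max (table ! (i - 1, j))
                                   (table ! (i, j - 1))
        -- Walk back from the far corner, collecting the matches.
        walk i j
          | i == 0 || j == 0 = []
          | xa ! i == ya ! j = (xa ! i) : walk (i - 1) (j - 1)
          | table ! (i - 1, j) >= table ! (i, j - 1) = walk (i - 1) j
          | otherwise = walk i (j - 1)

Running `lcs (lines old) (lines new)' gives the lines to keep; the
remaining lines become the changes in the patch.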

Second, the particular LCS algorithm I used is O(n log n) when all
lines in the files are unique (i.e. each appears only once in each
file), but degrades to O(n^2 log n) in the limit where a small finite
set of lines is reused throughout the files, which is exactly the limit
that your test case stresses.
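
To see where the two regimes come from, consider the standard
reduction used by the Hunt-Szymanski family of algorithms (this is my
sketch of that idea, not necessarily exactly what darcs implements,
and the names are mine): for each line of the first file, list the
positions at which it occurs in the second file, per line in
decreasing order, and take a longest strictly increasing subsequence
of the resulting stream.  The work is roughly O((r + n) log n), where
r is the number of matching line pairs; unique lines give r ~ n, while
a small repeated alphabet gives r ~ n^2.

    import qualified Data.Map.Strict as M
    import qualified Data.Set as S

    -- All positions in ys of each line of xs, per line in
    -- decreasing order (so the LIS below can use at most one
    -- match per line of xs).
    matchPairs :: Ord a => [a] -> [a] -> [Int]
    matchPairs xs ys = concatMap positions xs
      where
        posMap = M.fromListWith (flip (++))
                   [ (y, [j]) | (j, y) <- zip [0 ..] ys ]
        positions x = reverse (M.findWithDefault [] x posMap)

    -- Longest strictly increasing subsequence length by the
    -- patience method: keep the set of pile tops, replacing the
    -- smallest top >= x with x.
    lisLength :: [Int] -> Int
    lisLength = S.size . foldl step S.empty
      where
        step piles x = case S.lookupGE x piles of
                         Just y  -> S.insert x (S.delete y piles)
                         Nothing -> S.insert x piles

    lcsLength :: Ord a => [a] -> [a] -> Int
    lcsLength xs ys = lisLength (matchPairs xs ys)

The length of matchPairs is exactly r, which is why a reindent-style
change (many identical blank or brace-only lines) hits the bad case.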

> I don't know if this is important in practice, but I thought it could
> provide a clue to Lee's issue.

Mostly, I suspect that this isn't much of a problem in practice.  Usually,
the fraction of lines in a file that are repeated is pretty small.

The solution would be to implement another LCS algorithm, perhaps
optionally.  It would be nice to be able to intelligently switch algorithms
when the optimal one seems likely to be slow, but the problem is that I
have no idea how one could recognize pathological situations.
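
One plausible heuristic (hypothetical, not something darcs does) would
be to estimate r, the number of matching line pairs, from per-line
occurrence counts before choosing an algorithm, since r is what
separates the O(n log n) and O(n^2 log n) regimes:

    import qualified Data.Map.Strict as M

    -- Estimate r = sum over distinct lines of
    --   (occurrences in xs) * (occurrences in ys),
    -- and call the input pathological when r dwarfs the file sizes.
    looksPathological :: Ord a => [a] -> [a] -> Bool
    looksPathological xs ys = r > threshold
      where
        counts zs = M.fromListWith (+) [ (z, 1 :: Integer) | z <- zs ]
        cy = counts ys
        r = sum [ c * M.findWithDefault 0 x cy
                | (x, c) <- M.toList (counts xs) ]
        n = fromIntegral (length xs + length ys)
        threshold = 16 * n   -- arbitrary cut-off, for illustration only

On a pathological input one would then fall back to an approximate
(GNU-diff-style) algorithm rather than insisting on the optimal one.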

The LCS is a very "easy" and safe part of the code to play with, since
it's quite easy to test that your LCS is valid (i.e. is at least a
common subsequence).  On the other hand, I spent a long time looking
into how to implement this efficiently a couple of years back, and am
not eager to work on it any more.  Still, it's not a bad project for
someone who just wants some Haskell practice and/or algorithm
experience.
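
The validity test is a one-line property: whatever the candidate
`lcs' returns must be a subsequence of both inputs.  A QuickCheck
sketch (assuming an `lcs' like the one above; this checks validity,
not optimality):

    import Test.QuickCheck

    -- True when the first list is a subsequence of the second.
    isSubseqOf :: Eq a => [a] -> [a] -> Bool
    isSubseqOf []     _  = True
    isSubseqOf _      [] = False
    isSubseqOf (x:xs) (y:ys)
      | x == y    = isSubseqOf xs ys
      | otherwise = isSubseqOf (x:xs) ys

    prop_lcsIsCommon :: [Int] -> [Int] -> Bool
    prop_lcsIsCommon xs ys =
      let c = lcs xs ys
      in c `isSubseqOf` xs && c `isSubseqOf` ys

Run it with `quickCheck prop_lcsIsCommon'.  Optimality is harder to
test, but can be checked against a slow reference implementation on
small inputs.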
-- 
David Roundy
http://www.abridgegame.org