[darcs-users] optimising darcs annotate

Ganesh Sittampalam ganesh at earth.li
Thu Oct 23 21:29:37 UTC 2008


I've been thinking in detail about how to optimise darcs annotate, which I 
hope to do at the sprint this weekend. I'd appreciate comments on my 
current plan, which is not yet fully fleshed out.

- The basic idea is to keep track of what patches might touch what files, 
as has been suggested in past discussions about this topic.

- In each repo, maintain a local numbering of all the patches in the repo. 
This will facilitate referring to the patches efficiently. If a patch is 
deleted or commuted, there is no need to remove it from the numbering, so 
this might end up as a superset of the patches currently present.

- Maintain equivalence classes of (a) filenames and (b) directory names. I 
still haven't quite worked out the details of this, but the basic idea is 
that two names are related if there's ever been a move operation from one 
to the other in the repo history (again, supersets are allowed), and then 
we take the reflexive-symmetric-transitive closure to make the equivalence 
relation. We need to track directories and files separately because a 
directory move can implicitly cause file moves, and commutation means that 
we can't be sure what the complete set of affected files is at the point 
of the move.

- Maintain a mapping from the equivalence classes to sets of patches 
(referred to by the numbering). A patch is in the relevant set if it 
touches any member of the equivalence class.

- Implement some kind of restricted repository/patch reader (perhaps by 
adding parameters to the existing code) that is constrained to a set of 
patches and a set of names. This should run much quicker than a full 
repository read, but be equivalent from the point of view of the annotate 
code itself if given an appropriate restriction based on the file the user 
wanted to be annotated. The equivalence classes should ensure that we 
don't care if the user specifies a patch as well as a file.

- I'm planning on using Data.Binary for persisting the structures 
described above. Probably with a version number at the beginning for any 
backwards compatibility issues.



More information about the darcs-users mailing list