[darcs-users] Possibly a very simplistic solution
droundy at abridgegame.org
Fri May 21 11:07:49 UTC 2004
On Fri, May 21, 2004 at 09:10:35AM +0200, Ketil Malde wrote:
> David Roundy <droundy at abridgegame.org> writes:
> > This could be avoided by creating a simultaneous diff-and-sync function,
> > but that would be a bit nasty, since diff itself is an ugly function.
> > Also, it would eliminate the laziness in diff, which would be unfortunate.
> > Another way around it would be to not sync every time--we could randomly
> > decide whether or not to sync, which would speed things up most of the time
> > by a factor of two on large repos. The catch would be that if you record a
> > very very large change, you may find afterwards that whatsnew/record are
> > very very slow for a while, since they would keep running diff on all the
> > files that you touched in that previous record.
> Couldn't you sync (lazily) after a diff has been needlessly run on
> identical files, instead of randomly? Then only the first whatsnew
> will be slow.
The trouble is that the diff function is a "pure" function (i.e. no IO),
which is nice, and is what will make the proposed transition to a database
for _darcs/current relatively easy, but means that the syncing can't
easily be done in response to the diff. Basically I'd need to create a
separate "IO" version of the diff routine that would also do a sync.
On the other hand, this would definitely be the most efficient way to
go--it would eliminate all unnecessary directory traversals and stats.
There might even be an "unsafePerformIO" way I could do this with minimal
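
To illustrate the split between a pure diff and a separate "IO" version that also syncs, here is a minimal sketch. It is written in Python rather than darcs's Haskell, and it is not darcs code: the names FileInfo, pure_diff and diff_and_sync are hypothetical, and each tree is modelled as a dict from path to (mtime, content). It assumes that "sync" means copying the working file's timestamp onto an identical file in _darcs/current.

```python
from typing import Dict, List, NamedTuple, Tuple


class FileInfo(NamedTuple):
    mtime: int
    content: str


Tree = Dict[str, FileInfo]  # hypothetical model: path -> (mtime, content)


def pure_diff(current: Tree, working: Tree) -> Tuple[List[str], List[str]]:
    """Side-effect free: returns (changed paths, paths compared needlessly)."""
    changed: List[str] = []
    needless: List[str] = []
    for path in sorted(set(current) | set(working)):
        old, new = current.get(path), working.get(path)
        if old is not None and new is not None and old.mtime == new.mtime:
            continue  # timestamps agree: assume unchanged, skip the comparison
        if old is None or new is None or old.content != new.content:
            changed.append(path)
        else:
            needless.append(path)  # identical content, only the timestamp differs
    return changed, needless


def diff_and_sync(current: Tree, working: Tree) -> List[str]:
    """Effectful wrapper: runs the pure diff, then updates `current` in place."""
    changed, needless = pure_diff(current, working)
    for path in needless:
        # Sync: make the next diff skip this file via the timestamp check.
        current[path] = current[path]._replace(mtime=working[path].mtime)
    return changed
```

The point of the design is that pure_diff stays free of side effects; only the thin wrapper touches state, so the pure function can still be used wherever no sync is wanted.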
> > A third possibility would be to run the sync after each record, pull or
> > apply, rather than before each whatsnew or record. The advantage here is
> > that the "fast" whatsnew isn't slowed down, but instead the "slow" pulls
> > and applies are slowed down. Also, it eliminates redundant syncs.
> Sounds like the right thing to do, doesn't it?
It does sound like the right thing to do (and I hadn't thought of it until
writing the email). The only catch is that if a user does something like
copy a repository by hand, the timestamps will be messed up and whatsnew
will be slow until a record or pull is done, which could be confusing.
I've decided to implement this choice, since it's pretty easy, and for
large repos will significantly speed up whatsnew without slowing down other
operations, the only catch being that sometimes the repo might not get
re-synced as soon as one might like.
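
The trade-off can be sketched with a toy model. This is an illustration in Python, not darcs's implementation: the Repo class, its methods and the content_compares counter are all hypothetical, and a hand copy is simplified to "every working file gets a new timestamp". Here whatsnew never syncs, while record syncs once it has finished.

```python
class Repo:
    """Toy model: each tree maps path -> (mtime, content)."""

    def __init__(self, files):
        # Both trees start out identical and in sync.
        self.working = {path: (0, content) for path, content in files.items()}
        self.current = dict(self.working)
        self.content_compares = 0  # counts the expensive comparisons

    def whatsnew(self):
        """Read-only: never syncs, so it stays as cheap as the timestamps allow."""
        changed = []
        for path, (mtime, content) in self.working.items():
            old = self.current.get(path)
            if old is not None and old[0] == mtime:
                continue  # timestamps agree, skip the content comparison
            self.content_compares += 1
            if old is None or old[1] != content:
                changed.append(path)
        return sorted(changed)

    def sync(self):
        """Copy working timestamps onto identical files in current."""
        for path, (mtime, content) in self.working.items():
            old = self.current.get(path)
            if old is not None and old[1] == content:
                self.current[path] = (mtime, content)

    def record(self):
        """Record the changes, then sync once at the end."""
        changed = self.whatsnew()
        for path in changed:
            self.current[path] = self.working[path]
        self.sync()
        return changed

    def copy_by_hand(self, new_mtime):
        """Simplified hand copy: contents are kept, timestamps are replaced."""
        self.working = {p: (new_mtime, c) for p, (_, c) in self.working.items()}
```

In this model, every whatsnew after copy_by_hand compares all file contents again, and the cost only goes away after the next record. Deleted files are ignored to keep the sketch short.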
> (BTW, I'm not really bothered by darcs's speed, for the small scale
> stuff I'm doing, it seems plenty fast enough.)
Yeah, it makes a huge difference how big the repo is. My "big" test repo
is the Linux kernel repository, and on that repo (on my iMac--which seems
to have slower system calls than Linux) just reading the directories
and stat'ing files to get modification times takes about 45 seconds--so whatsnew
takes a minute and a half--when there are no changes, and the repository is
already in sync! Thus the motivation to speed this up... :)
Another great benefit would be to take full advantage of situations where
the user asks for changes only in a certain directory or to a certain file
(when doing whatsnew, record, revert etc). Currently darcs just calculates
the entire diff and then filters out the changes that aren't in the
specified files or directories. The trouble is that mvs make it a little
awkward figuring out how to compute the diff of a single directory
(e.g. what happens when a file was moved into that directory). Perhaps
darcs could simply check whether any "interesting" patches like mv are
pending, and then in the common case avoid traversing unneeded directories...
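
A minimal sketch of that check, again in Python and purely illustrative: the patch tuples, the INTERESTING set and both function names are assumptions, not darcs's data structures. If any pending patch is an mv, the sketch falls back to the current behaviour (diff everything, then filter); otherwise only the requested directory is traversed.

```python
from typing import List, Tuple

# Hypothetical patch encoding: ("mv", src, dst) or ("hunk", path).
Patch = Tuple[str, ...]

# Patch kinds that make a single-directory diff unsafe (assumption: only mv).
INTERESTING = {"mv"}


def dirs_to_traverse(requested: str, pending: List[Patch]) -> List[str]:
    """Decide which directories the diff has to walk."""
    if any(patch[0] in INTERESTING for patch in pending):
        return ["."]  # conservative fallback: whole tree, filter afterwards
    return [requested]  # common case: nothing moved, walk only what was asked for


def filter_to(requested: str, changed_paths: List[str]) -> List[str]:
    """Keep only the changes inside the requested directory (or the file itself)."""
    root = requested.rstrip("/")
    return [p for p in changed_paths if p == root or p.startswith(root + "/")]
```

A finer-grained rule, such as falling back only when an mv crosses the boundary of the requested directory, would save more work but needs more care to get right.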