[darcs-users] bug reports for darcs diff

Fri Sep 26 14:52:23 UTC 2003

On Friday 26 September 2003 15:24, David Roundy wrote:
> Hello.
>
> On Thu, Sep 25, 2003 at 09:26:37PM +0300, Aggelos Economopoulos wrote:
> > here are some questions and a few bug reports that were delayed during
> > the exams period.
> >
> > i) 'darcs diff' produces bogus trailing output. Note that I haven't been
> > able to reproduce it with trivial repositories, so I can't provide a simple
> > test case.  Some sample output that demonstrates the problem is attached
> > in "bogus_diff".
> >
> > ii) darcs passes absolute path names to the diff program, producing very
> > long diff header lines; see "bogus_diff". Running it with relative paths
> > and _darcs as the current directory would produce much more readable
> > output that doesn't reveal the absolute pathname of the repository, and
> > would probably be friendlier to anyone who tries to apply the patch.
>
> The fix for the first two was the same!  :) The bogus output was caused by
> the temporary file used to hold the output of diff, so when I set the
> current directory to _darcs, everything is fine.

Thanks for fixing them so quickly!

> > iii) Is darcs diff -m 'whatever' supposed to produce a diff to the
> > 'current' tree? Please let me know, so I can submit a documentation patch
> > to DiffCommand.lhs.
>
> Yes.

OK, I'll push some doc patches sometime after Monday (when my exams are
over).

[...]
> > (I assume there isn't an easy way of using the defines in the system
> > header files for the constants?)
>
> Actually it's not hard, now that I have a C file for FastPackedString
> support functions.  I've just coded it up (and have probably broken the
> win32 compile again).  In the process I noticed why I was getting bus
> errors, so now mmap should work considerably better.
>
> I'm still not sure it gives any noticeable improvement.  It would probably
> be most helpful when reading in the patch files, because they can be pretty
> large and get held in memory for quite a while; but because they are large
> I support compressing them, and I haven't yet implemented code to check
> whether they are compressed and mmap them if they aren't.
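A minimal C sketch of the missing piece described above: check whether a patch file is compressed, and mmap it only if it isn't. It assumes compressed patches are gzip files (recognised by the magic bytes 0x1f 0x8b); `struct loaded_file`, `load_patch_file` and `release_patch_file` are hypothetical names, not the actual FastPackedString code. Doing this in a C support file also means PROT_READ and MAP_PRIVATE come straight from <sys/mman.h>.

```c
#define _POSIX_C_SOURCE 200809L   /* for pread() */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* A loaded patch file: either a read-only mapping or a malloc()ed copy. */
struct loaded_file {
    unsigned char *data;
    size_t len;
    int is_mapped;   /* 1: release with munmap(), 0: release with free() */
    int is_gzip;     /* 1: contents still need to be decompressed */
};

/* gzip streams begin with the magic bytes 0x1f 0x8b (RFC 1952). */
static int has_gzip_magic(int fd)
{
    unsigned char magic[2];
    return pread(fd, magic, sizeof magic, 0) == 2
        && magic[0] == 0x1f && magic[1] == 0x8b;
}

/* mmap() the file if it is uncompressed; otherwise (or if mmap() fails)
   read() it into a heap buffer.  Returns 0 on success, -1 on error. */
int load_patch_file(const char *path, struct loaded_file *out)
{
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    out->data = NULL;
    out->len = (size_t)st.st_size;
    out->is_mapped = 0;
    out->is_gzip = has_gzip_magic(fd);

    if (out->len == 0) {          /* mmap() rejects zero-length mappings */
        close(fd);
        return 0;
    }
    if (!out->is_gzip) {
        void *p = mmap(NULL, out->len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            out->data = p;
            out->is_mapped = 1;
            close(fd);            /* the mapping outlives the descriptor */
            return 0;
        }
    }
    out->data = malloc(out->len);
    if (out->data == NULL) {
        close(fd);
        return -1;
    }
    for (size_t done = 0; done < out->len; ) {
        ssize_t n = read(fd, out->data + done, out->len - done);
        if (n <= 0) {             /* error, or the file shrank underneath us */
            free(out->data);
            out->data = NULL;
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }
    close(fd);
    return 0;
}

void release_patch_file(struct loaded_file *f)
{
    assert(f != NULL);
    if (f->data == NULL)
        return;
    if (f->is_mapped)
        munmap(f->data, f->len);
    else
        free(f->data);
    f->data = NULL;
}
```

The compressed data would still have to be inflated into its own buffer, so in this sketch only uncompressed patches get the benefit of the mapping.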

Well, there are other advantages too. In modern operating systems, mapping a 
file does not start I/O until you actually take the page fault (i.e. until 
you actually try to read the data), and even then you only get the pages you 
access (but I think darcs needs to access all the file data anyway, so no big 
gain there). If you were mapping many files and _then_ accessing their 
pages (that is, if the pattern were 'map A, B, C, ...; access A, B, C, ...' 
instead of 'map A; access A; map B; access B; ...'), you could probably save 
more than a few disk seeks by calling madvise(MADV_WILLNEED), which starts 
I/O asynchronously (Linux does this; I'm not sure about the BSDs) and thus 
gives the kernel a chance to batch the page transfers. Have you tried 
measuring darcs performance (e.g. on a 'get') with and without mmap? Obviously, 
that's the only way to see whether it matters performance-wise (and I should 
definitely add this to my TODO file).
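The 'map A, B, C, ...; access A, B, C, ...' pattern with a read-ahead hint might look roughly like this in C. It is a sketch assuming a Linux/glibc system (hence `_DEFAULT_SOURCE`, which exposes madvise() there); `sum_files_with_readahead`, `map_file` and `MAX_FILES` are hypothetical, and summing the bytes merely stands in for whatever real processing would follow. MADV_WILLNEED is only advice, so its return value is ignored.

```c
#define _DEFAULT_SOURCE   /* glibc: expose madvise() and MADV_WILLNEED */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define MAX_FILES 64

struct mapping {
    const unsigned char *data;
    size_t len;
};

/* Map a file read-only.  No disk I/O is started here: pages are only
   read in when they are first touched (or when madvise() asks for them). */
static int map_file(const char *path, struct mapping *m)
{
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    m->data = NULL;
    m->len = (size_t)st.st_size;
    if (m->len > 0) {             /* mmap() rejects zero-length mappings */
        void *p = mmap(NULL, m->len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        m->data = p;
    }
    close(fd);
    return 0;
}

/* 'map A, B, C, ...; access A, B, C, ...' with a read-ahead hint in between.
   Returns the sum of all bytes (a stand-in for real work), or -1 on error. */
long sum_files_with_readahead(const char *const *paths, size_t n)
{
    struct mapping maps[MAX_FILES];
    size_t mapped = 0;
    long sum = -1;

    assert(paths != NULL || n == 0);
    if (n > MAX_FILES)
        return -1;

    /* Phase 1: map every file; nothing is read from disk yet. */
    for (; mapped < n; mapped++)
        if (map_file(paths[mapped], &maps[mapped]) != 0)
            goto cleanup;

    /* Phase 2: ask the kernel to start asynchronous read-ahead on all of
       them, giving it a chance to batch the transfers.  Purely advisory. */
    for (size_t i = 0; i < n; i++)
        if (maps[i].len > 0)
            (void)madvise((void *)maps[i].data, maps[i].len, MADV_WILLNEED);

    /* Phase 3: actually touch the data. */
    sum = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < maps[i].len; j++)
            sum += maps[i].data[j];

cleanup:
    for (size_t i = 0; i < mapped; i++)
        if (maps[i].len > 0)
            munmap((void *)maps[i].data, maps[i].len);
    return sum;
}
```

Whether the hint actually saves seeks in practice is exactly what a with/without-mmap timing of a 'get' would show.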

Another advantage of using mmap is that you're being friendly to the operating 
system when there is memory pressure. If you use read(2), any page of your 
buffer that the OS decides to evict has to be written out to swap. Clean 
memory-mapped pages (pages that have not been written to), however, can simply 
be thrown away, and dirty pages in a shared mapping can be written back to the 
file instead of to swap space. So, even if mmap doesn't noticeably improve 
performance, I'd still prefer it to read() calls.

Aggelos