[darcs-users] darcs patch: make reading the pending lazy in summary mode

Gwern Branwen gwern0 at gmail.com
Tue Apr 29 22:10:21 UTC 2008


On 2008.04.28 09:27:27 -0700, David Roundy <droundy at darcs.net> scribbled 1.7K characters:
> On Mon, Apr 28, 2008 at 8:39 AM, Gwern Branwen <gwern0 at gmail.com> wrote:
> >  > I'm not sure what you mean by co_slurpy being strict.  It looks to me like
> >  > it's got adequate unsafeInterleaveIO to make it lazy.
> >  > --
> >  > David Roundy
> >
> >  Well, it does have plenty of unsafeInterleaveIO, that is true, but the issue here is readFilePS, which is completely strict: it reads the entire file into memory (per docs and implementation). So actually running readFilePS may get delayed to the last second, but once its result gets inspected, it'll immediately do its best to suck in all 9 gigs or whatever.
> >
> >  This is why replacing readFilePS in co_slurp_helper with mmapFilePS is such a time saver: it is lazy and only pretends to read in all 9 gigs immediately. Since with -s we ultimately only read the first 4096 characters, only a little bit ever actually gets page-faulted into memory.
> >
> >  (The problem with mmapFilePS is that, as lispy mentions, on my 64-bit system it can no longer handle >3 gig files, while readFilePS scaled up to at least 9 gigs, albeit slowly.)
>
> The other problem is that mmapFilePS will cause darcs to fail entirely
> on large repositories (with more than 1k files) due to sucking up all
> the system's file handles.  I think this is a more common use case in
> darcs than 9g files.  Of course, we could refuse to mmap small files
> (we already do this for very small files), and that could alleviate
> the problem considerably.

(Just a side note; with Lispy's type sig changes, I can now handle >3 gig files just fine, albeit more slowly than with readFilePS.)
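
(To make the strict/lazy difference concrete, here is a minimal sketch. It uses the bytestring library rather than darcs's own packed-string code, the function names are made up for illustration, and lazy ByteString reading is only a stand-in for mmapFilePS: it reads chunks on demand instead of mapping pages.)

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import System.IO (IOMode (ReadMode), withBinaryFile)

-- | Strict: reads the whole file into memory, then keeps a prefix.
-- This mirrors the behaviour described above for readFilePS.
prefixStrict :: Int -> FilePath -> IO B.ByteString
prefixStrict n path = B.take n <$> B.readFile path

-- | Bounded: asks for at most n bytes and closes the handle right away.
prefixBounded :: Int -> FilePath -> IO B.ByteString
prefixBounded n path = withBinaryFile path ReadMode (\h -> B.hGet h n)

-- | Lazy: chunks are read only when demanded, so taking a prefix
-- touches just the start of the file. The handle stays open until
-- the rest is consumed or garbage-collected.
prefixLazy :: Int -> FilePath -> IO B.ByteString
prefixLazy n path = do
  contents <- BL.readFile path
  let prefix = BL.toStrict (BL.take (fromIntegral n) contents)
  prefix `seq` return prefix
```

With a multi-gigabyte file, prefixStrict 4096 has to read everything before it can return, while the other two only touch the beginning of the file.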

As for running out of file handles: hm, I'm not sure about that. Perhaps you mean it'll fail on 32-bit systems? It works for me:

gwern at localhost:2849~/foo>echo "make sure we're using lispy's mmap version" && duh bigtempfile [ 6:04PM]
make sure we're using lispy's mmap version
3.9G bigtempfile
3.9G total
gwern at localhost:2850~/foo>cd ~/bin/ghc && darcs query manifest | wc [ 6:05PM]
aclocal.m4    compat/       configure.ac  distrib/           ghc.spec.in  install-sh      LICENSE   quickcheck/  validate
ANNOUNCE      compiler/     _darcs/       docs/              gmp/         InstallShield/  Makefile  README       WindowsInstaller/
bindisttest/  config.guess  darcs-all     driver/            HACKING      libffi/         mk/       rts/
boot          config.sub    darcs.prof    extra-gcc-opts.in  includes/    libraries/      push-all  utils/
   1191    1234   33726
gwern at localhost:2851~/bin/ghc>echo "ok, so there's 1200 files here. Let's see whether whatsnew -s fails due to filehandles" && darcs whatsnew -s
ok, so there's 1200 files here. Let's see whether whatsnew -s fails due to filehandles
No changes!
gwern at localhost:2847~/bin/ghc>echo "maybe the problem was masked by the lack of changes?" && rm HACKING ANNOUNCE LICENSE README [ 6:07PM]
maybe the problem was masked by the lack of changes?
gwern at localhost:2848~/bin/ghc>whatsnew -s [ 6:07PM]
R ./ANNOUNCE
R ./HACKING
R ./LICENSE
R ./README
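
For what it's worth, the size cutoff David suggests might look something like the following. This is only a sketch: the 1 MiB threshold and all the names are hypothetical (the thread doesn't say what cutoff darcs actually uses), and BL.readFile again stands in for mmapFilePS.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import System.IO (IOMode (ReadMode), hFileSize, withBinaryFile)

data Strategy = ReadStrict | MapLazily
  deriving (Eq, Show)

-- | Hypothetical cutoff in bytes; not darcs's real value.
mmapThreshold :: Integer
mmapThreshold = 1024 * 1024

-- | Files below the threshold are read strictly; the rest are mapped.
chooseStrategy :: Integer -> Integer -> Strategy
chooseStrategy threshold size
  | size < threshold = ReadStrict
  | otherwise        = MapLazily

-- | Look up the file's size and pick a strategy for it.
strategyFor :: FilePath -> IO Strategy
strategyFor path =
  chooseStrategy mmapThreshold <$> withBinaryFile path ReadMode hFileSize

-- | Read a file with the chosen strategy.
readContents :: FilePath -> IO BL.ByteString
readContents path = do
  strategy <- strategyFor path
  case strategy of
    -- B.readFile closes its handle before returning, so small
    -- files never hold on to a descriptor.
    ReadStrict -> BL.fromStrict <$> B.readFile path
    -- Stand-in for an mmap-backed read of a large file.
    MapLazily  -> BL.readFile path
```

The idea is that in a repository with thousands of small files, nearly everything takes the ReadStrict branch, so only the few large files would keep a handle or mapping alive.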

> Another problem is that using mmap on files in the working directory
> can lead to segfaults, since the user is allowed to edit files in the
> working directory while darcs runs--or at least I don't want to
> segfault if the user does this.
>
> David

Hm, that does sound bad. Is there no way to handle this (set read-only, catch exceptions, etc.)? I'll admit I've never tried to edit files while using Darcs, but that's just me.

--
gwern
Kilo remailers BOSS Medco mass CIDA Fetish bullion USCODE spies