[darcs-users] [issue1067] darcs 2.0.2 (+ 76 patches) fails where darcs 1.0.9(release) succeeds (windows, ghc repo, get)

Eric Kow kowey at darcs.net
Thu Sep 11 08:46:52 UTC 2008


On Wed, Sep 10, 2008 at 00:24:14 +0100, Claus Reinke wrote:
> >(1) files which are introduced by third party tools are ignored
> >    because they do not appear in the index
> >(2) modifications to pristine files can be detected through hash failure 
> 
> Neither of these depend on changing the file names, just on a 
> second index to check against for presence and contents.

I agree.  It doesn't really matter how the files are stored as long as
we don't rely on what the filesystem tells us.
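
To make (1) and (2) concrete, here is a minimal Python sketch of checking a
directory against a separate index.  It is an illustration only, not darcs
code: the index is assumed to be a plain mapping from relative file name to
the SHA-256 of the file contents, and the names sha256_of and
check_against_index are made up for this example.

```python
import hashlib
import os


def sha256_of(path):
    """Return the hex SHA-256 digest of the file at path."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def check_against_index(root, index):
    """Compare a directory with a hypothetical {relative_name: sha256} index.

    Returns (corrupt, missing).  Files on disk that the index does not
    mention are never looked at, so stray files are simply ignored.
    """
    corrupt, missing = [], []
    for name, expected in sorted(index.items()):
        path = os.path.join(root, name)
        if not os.path.isfile(path):
            missing.append(name)
        elif sha256_of(path) != expected:
            corrupt.append(name)
    return corrupt, missing
```

Because only names listed in the index are ever opened, a file dropped in by
a third-party tool is invisible to the check, and a modified file shows up as
a hash mismatch regardless of what name it is stored under.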
 
> >(3) relying on darcs-internal filenames avoids the case
> >sensitivity issue as well as http://bugs.darcs.net/issue53
> 
> This is the only reason where encoding the file names could make a
> difference.

There is also a psychological (4), which you would see as a bug.
Changing the file names also discourages people from trying to do things
with the pristine cache.

One silly example is that I occasionally like to make mass modifications
to my source code by looping sed over, say, all files that end in .hs.
To detect these files I would just run Unix find on the current
directory, forgetting that there is a pristine cache.  My script runs
over all the .hs files in pristine too, and there we have corruption.  Now
due to (2), this is less of a problem because the files would just trigger a
hash failure, but it is nice if we can avoid this kind of trouble in the
first place.  This example is slightly hokey, I grant you, because we
were spared the trouble only by virtue of the .hs extension being
missing in our darcs-internal filenames.

More broadly, this removes the temptation from third party tool authors
to rely on the directory listing, which would work most of the time, but
then break on the occasion that a file was renamed (your suggestion
below).

> But pristine is only a copy of a version of working, so you
> don't avoid the trouble at all.

Ah but we do! I know you don't like the tolerant application of patches
to the working directory, and that you have made some nice suggestions
for it to become a little sturdier.  In the meantime, having both the
darcs-internal filenames (to avoid trouble in pristine) and high
tolerance for working directory failures means that this works in
practice.

> Unless you adopt a system that applies the same (and user-readable)
> renaming to working and pristine, warning about non-portable naming
> sounds like the best option (actually, I'd like that to be on by
> default - better safe than sorry and all that).

Yes! This is the kind of solution I want to see for
http://bugs.darcs.net/issue53 and the case-sensitivity issue.  Darcs
should keep some kind of mapping between human-readable renamings and
expected filenames.  It might be a pretty invasive code change -- we
would have to make sure the table gets consulted at the right time --
and there could be pitfalls to watch for (what happens when we have a
darcs mv?), but I do definitely want to see this happen.
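
For the warning half of this, a check for names that would collide on a
case-insensitive filesystem could be as small as the Python sketch below.
This is an illustration only, not darcs code: case_collisions is a
hypothetical helper, and str.casefold is just an approximation of how a real
filesystem compares names.

```python
from collections import defaultdict


def case_collisions(names):
    """Group names that differ only by letter case.

    Returns a sorted list of sorted groups, one per set of colliding names.
    """
    groups = defaultdict(set)
    for name in names:
        # casefold() approximates a case-insensitive filesystem comparison.
        groups[name.casefold()].add(name)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)
```

A real mapping table would need more than this (it has to survive renames,
for a start), but even a check this small could drive a warning when a patch
adds a name that clashes with an existing one.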

It's also where the manpower shortage comes in.  Help wanted!

> Yes, it would simply be a question of concrete vs abstract interface,
> but for: (a) the encoding of file names addresses none of the issues
> you listed and (b) being able to bypass darcs improves the chances
> of survival when darcs screws up, thereby improving confidence (that
> applies for all darcs ops, eg, some issues would be devastating if one
> couldn't bypass darcs and remove bogus pending files, left-over locks,
> etc.).

For what it's worth, I think it should be a pretty straightforward and
potentially useful Perl script to write: one that recreates a directory
out of a hashed pristine directory and its starting hash.
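
To give an idea of the shape of such a script, here is a minimal sketch, in
Python rather than Perl.  Everything about the on-disk format in it is an
assumption that should be checked against a real repository: objects are
taken to live in one directory, named by their hash and possibly gzipped,
and a directory listing is taken to be a sequence of three-line entries
("file:" or "directory:", then a name, then a hash).  The names read_object
and restore are made up, and names are used as-is, with no decoding of any
darcs-internal escaping.

```python
import gzip
import os


def read_object(store, obj_hash):
    """Read one object from the store, gunzipping it if it looks compressed."""
    with open(os.path.join(store, obj_hash), "rb") as handle:
        data = handle.read()
    # Gzip streams start with the magic bytes 1f 8b.
    return gzip.decompress(data) if data[:2] == b"\x1f\x8b" else data


def restore(store, dir_hash, target):
    """Recreate the tree described by the listing dir_hash under target."""
    os.makedirs(target, exist_ok=True)
    lines = read_object(store, dir_hash).decode("utf-8").splitlines()
    # ASSUMED listing layout: repeated (kind, name, hash) line triples.
    for kind, name, obj_hash in zip(lines[0::3], lines[1::3], lines[2::3]):
        path = os.path.join(target, name)
        if kind == "directory:":
            restore(store, obj_hash, path)
        elif kind == "file:":
            with open(path, "wb") as out:
                out.write(read_object(store, obj_hash))
        else:
            raise ValueError("unexpected entry kind: %r" % kind)
```

Calling restore(store, root_hash, "recovered") would then walk the listings
from the starting hash and write the files out.  A real tool would also want
to verify each object against its hash and refuse names that escape the
target directory.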

> $ darcs +RTS --info
> [("GHC RTS", "Yes")
> ,("GHC version", "6.8.3")
> ,("RTS way", "rts_thr")
> ,("Host platform", "i386-unknown-mingw32")
> ,("Build platform", "i386-unknown-mingw32")
> ,("Target platform", "i386-unknown-mingw32")
> ,("Compiler unregisterised", "NO")
> ,("Tables next to code", "YES")
> ]
> 
> My first bet would be memory usage. I have only 1Gb, so once there
> are 1.5Gb in use, someone is going to suffer.

Hmm... on second thought, issue973 is not related to the slowness in
getting hashed repositories (it was a whatsnew issue, an optimisation
being crippled so that darcs mistakenly thought that /every/ file in the
working directory was potentially new).

It would be interesting to see if this slowness was limited to Windows
and/or getting from old-fashioned repositories to hashed ones.  As I
mentioned in the bugtracker, getting hashed repositories from hashed
repositories seems to be a lot faster (and allows for things like global
cache use and lazy patch fetching).

Anyway, Claus, you seem to have quite a few nice ideas for darcs.  As
you can see we don't have the manpower we need to implement all the nice
ideas we want, so we could definitely use some help.

If you only have one weekend to spare, please consider coming to the
darcs sprint! (25-26 Oct, Portland, Brighton, possibly Paris)... or
consider being there over email/IRC :-)

Cheers,

-- 
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9