[darcs-users] case insensitivity and inconsistent state

Wed May 13 02:43:53 UTC 2009

Hi Simon,

I'm replying on the darcs-users mailing list because I think this is of
general interest and also the beginning of a discussion.

For those interested, this is a follow-up to Simon's comments on issue1453:
  http://bugs.darcs.net/issue1453

On Thu, Apr 30, 2009 at 08:53:36 +0100, Simon Peyton-Jones wrote:
> a) Rather than saying "darcs failed:  File './stackTransitions.pdf' already exists!", could you not emit a message saying
>         "Looks as if your repo contains two files that differ only in
>         the case of their filename.  Use darcs get --hashed"

That's true.  This scenario is now covered by darcs using the --hashed
switch by default (Trent Buck just submitted a patch yesterday).

> b) Similarly, in the other case I had, the failure said that the repo
> was now in an inconsistent state.  That's alarming, if (as was happily
> not the case) I'd had a lot of uncommitted changes in the tree.  *Was*
> it inconsistent?  Couldn't the message say something about using
> --hashed too?

It was inconsistent.

You're right that the user interface for this isn't very good.  It's
a question of generality and also of ignorance.  The error message
covers the general situation of "uh-oh! something unexpected happened
in the pristine cache".  It is also ignorant of the fact that the
underlying cause of this unexpected thing was a case clash on a
case-insensitive file system.

If this is a very frequent cause of unexpectedness (and it is *fairly*
frequent), it may indeed make sense to introduce an additional check
with a nicer, special-case error message before throwing the more
general error.
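A special-case check along those lines might look roughly like the
sketch below.  It is Python rather than darcs' actual Haskell, and
`case_collisions`, `explain_failure` and the message text (borrowed
from Simon's suggestion) are hypothetical.  The idea is only to group
the repository's paths by their case-folded form and, on the failure
path, report any group with more than one member before falling back
to the general error.  Note that `str.casefold()` only approximates
the folding rules of real case-insensitive file systems.

```python
from collections import defaultdict


def case_collisions(paths):
    """Return groups of paths that differ only in letter case.

    Hypothetical helper, not part of darcs.  str.casefold() approximates,
    but does not exactly match, real file-system case folding.
    """
    groups = defaultdict(list)
    for path in paths:
        groups[path.casefold()].append(path)
    # Only groups with two or more members are actual clashes.
    return [sorted(names) for names in groups.values() if len(names) > 1]


def explain_failure(paths, general_message):
    """Prefer a specific message when a case clash is the likely cause."""
    clashes = case_collisions(paths)
    if not clashes:
        return general_message  # fall back to the general error
    listed = "; ".join(" vs ".join(group) for group in clashes)
    return ("Looks as if your repo contains files that differ only in the "
            f"case of their filename ({listed}).  Use darcs get --hashed")
```

Because the check only runs once something has already gone wrong,
the general error message stays in place for every other cause.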

> [These suggestions apply even if --hashed is the default, in case you use --no-hashed.]

Better output would be helpful, but it comes at the price of
special-casing the code.  I'm not sure what the right way to handle
these sorts of tradeoffs is.  My inclination is to say that the joint
probability of somebody explicitly using --old-fashioned to force an
old-fashioned repo, getting a repository with filenames that differ
only in case, and being on a case-insensitive file system is low
enough that it is more worthwhile to keep the code general.  But I
could be wrong!  At the very least, the joint probability of just the
latter two combined is high enough to matter.

> c) What *does* happen if you use --hashed and there are two files with
> the same name? Does one overwrite the other?  Do you make X and X_0?
> Should darcs not at least tell the user that something
> presumably-unintended has happened.  Silence is not golden here.

As far as internal consistency is concerned (by this I mean the patches
and pristine cache), darcs will do the right thing.  Now for the working
directory, it depends.  I don't mean to be flippant and write off the
working directory.  Of course we should take the working directory
seriously, but however seriously we take the working directory, we need
to take the long-term history (pristine + patches) even more seriously
than that.

So what happens in the working directory?  That depends: darcs 2 and up
have a habit of applying batches of patches (perhaps 100 at a time?) in
memory and only then writing the lot to disk.  This means that if the
simultaneous coexistence of these files arises and vanishes within the
same batch of patches, the net effect on disk is a no-op; in other
words, you also get exactly the right behaviour in the working
directory.  On the other hand, if the simultaneous coexistence happens
to cross a batch boundary, then some sort of havoc is possible: you
could accidentally apply patches for two different files (foo and Foo)
to the same working directory file.
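The batching effect can be illustrated with a toy model (Python, not
darcs code).  The patch format, the `batch_size` parameter and the
dict standing in for a case-insensitive disk are all simplifying
assumptions; in particular, the model keys the disk by case-folded
name and ignores the case the file system would actually preserve.

```python
"""Toy model (not darcs code): flushing patch batches to a case-insensitive disk."""


def apply_patch(tree, patch):
    """Apply one toy patch to a case-sensitive in-memory tree.

    A patch is ("add", name, content), which creates or overwrites a
    file, or ("rm", name), which removes it.
    """
    op, name = patch[0], patch[1]
    if op == "add":
        tree[name] = patch[2]
    elif op == "rm":
        del tree[name]
    else:
        raise ValueError(f"unknown patch type {op!r}")


def flush(before, after, disk):
    """Write one batch's net change to a simulated case-insensitive disk.

    `disk` is keyed by case-folded name, so "foo" and "Foo" share a
    single slot, as they would on a case-insensitive file system.
    """
    for name in before.keys() - after.keys():
        disk.pop(name.casefold(), None)
    for name, content in after.items():
        if before.get(name) != content:
            disk[name.casefold()] = content


def apply_in_batches(patches, batch_size):
    """Apply patches in memory, flushing the net change after each batch."""
    tree, disk = {}, {}
    for start in range(0, len(patches), batch_size):
        before = dict(tree)
        for patch in patches[start:start + batch_size]:
            apply_patch(tree, patch)
        flush(before, tree, disk)
    return tree, disk


patches = [("add", "foo", "a"), ("add", "Foo", "B"), ("rm", "Foo")]
print(apply_in_batches(patches, batch_size=3))  # ({'foo': 'a'}, {'foo': 'a'})
print(apply_in_batches(patches, batch_size=1))  # ({'foo': 'a'}, {})
```

With all three patches in one batch, Foo appears and disappears in
memory and the disk ends up correct.  With one patch per batch, writing
Foo clobbers the contents of foo, and removing Foo then deletes foo
from disk even though the in-memory tree still has it.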

The attached script illustrates the problem of two patches accidentally
applying to the same file.  I think I should probably add this to our
list of known bugs as a regression testing script.  Unexpected things
are happening in the user's working directory, which is clearly far
from ideal!  This has convinced me that Ganesh's plan (which he thought
up during darcs hacking sprint #2) is the right way to go.  We need to
allow multiple files (each with a unique ID) to be associated with the
same name, and somehow keep track of the correspondence between these
unique IDs, the expected file name, and the actual file name in the
working directory (foo-1 vs foo-2).  This is long-term Darcs 3 work,
but at least we do have some ideas on the matter.
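For the bookkeeping side, here is a very rough sketch, assuming every
file already carries a unique ID.  The function (Python;
`assign_working_names` and its suffixing scheme are hypothetical, only
loosely following the foo-1/foo-2 example, and not an agreed design)
maps each ID to a working-directory name that stays unique when case
is ignored.

```python
from collections import defaultdict


def assign_working_names(expected):
    """Map file IDs to working-directory names that are unique ignoring case.

    `expected` maps a (hypothetical) unique file ID to the name recorded
    in the repository.  IDs whose names clash only in case each get a
    numbered suffix, e.g. "foo-1" and "Foo-2"; all other files keep
    their expected name.
    """
    by_folded = defaultdict(list)
    for file_id, name in expected.items():
        by_folded[name.casefold()].append(file_id)

    taken = set(by_folded)  # case-folded names already spoken for
    actual = {}
    for ids in by_folded.values():
        if len(ids) == 1:
            actual[ids[0]] = expected[ids[0]]
            continue
        suffix = 0
        for file_id in sorted(ids):
            while True:  # skip suffixes that clash with existing names
                suffix += 1
                candidate = f"{expected[file_id]}-{suffix}"
                if candidate.casefold() not in taken:
                    break
            taken.add(candidate.casefold())
            actual[file_id] = candidate
    return actual


working = assign_working_names({1: "foo", 2: "Foo", 3: "bar"})
print(working)  # {1: 'foo-1', 2: 'Foo-2', 3: 'bar'}
```

Patches would then refer to file IDs, and the working directory would
be updated through this mapping, so changes to file 2 land in Foo-2
rather than in whatever file happens to answer to the name foo.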

Thanks much for your comments!  Let us know if there is anything in
particular we can do to help.

-- 
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9