[darcs-users] darcs patch: Add LANGUAGE pragmas for explicit langua... (and 3 more)

Wed Oct 29 04:12:53 UTC 2008

On Tue, Oct 28, 2008 at 7:39 PM, David Roundy <droundy at darcs.net> wrote:
> On Tue, Oct 28, 2008 at 03:36:32PM -0700, Eric Kow wrote:
>> Salvo 6!
>>
>> Sun Oct 26 00:01:40 BST 2008  Don Stewart <dons at galois.com>
>>   * Add LANGUAGE pragmas for explicit language extensions
>>
>> Sun Oct 26 16:40:09 GMT 2008  tux_rocker at reinier.de
>>   * add a get_unrecorded_in_files to check for unrecorded changes in a subset of working directory
>>
>> Sun Oct 26 19:06:12 GMT 2008  tux_rocker at reinier.de
>>   * make get_unrecorded_private work with type witnesses again
>>
>> Sun Oct 26 19:46:36 GMT 2008  tux_rocker at reinier.de
>>   * make whatsnew use the lstat-saving functions to scan the working copy
>
> These look very appealing, and have some nice ideas, but are also buggy.  :(
>
>> ] hunk ./src/Darcs/Repository/Internal.lhs 417
>>      ftf <- filetype_function
>>      Sealed pend <- read_pending repository
>>      debugMessage "diffing dir..."
>> +    -- the unsafeCoerceP below is necessary to be able to concatenate
>> +    -- pend with NilFL to form dif. See http://hpaste.org/11480
>>      let diffs = if null files
>> hunk ./src/Darcs/Repository/Internal.lhs 420
>> -                  then [smart_diff opts ftf cur work]
>> -                  else catMaybes (map (diff_at_path opts ftf cur work) (map fn2fp files))
>> -    let workdiff = foldl (+>+) NilFL diffs
>> +                  then smart_diff opts ftf cur work
>> +                  else let diffsPerFile = catMaybes (map (diff_at_path opts ftf cur work) (map fn2fp files))
>> +                       in foldr (+>+) (unsafeCoerceP NilFL) diffsPerFile
>>          dif = if AnyOrder `elem` opts
>
> There may be other bugs, but this one is sufficient to cause trouble.  This
> unsafeCoerceP really is unsafe.  Combining patches together in this way has
> serious potential for repository corruption! In particular, if a file is
> referenced more than once in the paths, you can get multiple copies of the
> same change.

It's not actually the unsafeCoerceP that makes this case unsafe, and
I'll explain why I think that.  The unsafeCoerceP is just there to
deal with the problem in this psuedo code:
foo :: a -> FL Prim C(x y)
foo _ = NilFL
foo a = -- do something legit with 'a' to create a sequence

The problem being that this won't actually type check unless x = y.
Basically, our type signature prevents foo from returning a list, such
as the empty one, that claims to not modify context.  If we add
unsafeCoerceP on the NilFL the problem 'goes away'.  I'm calling this
the 'foo idiom' for now.

You might argue that adding unsafeCoerceP to foo is unsafe in terms of
Haskell's types but this argument depends on two reasons: 1) We have
unsafeCoerce inside (=\/=) and (=/\=) and with it you can use foo to
reimplement unsafeCoerce. 2) x and y are unconstrained by a
relationship to the input of foo.

Now, while we shouldn't encourage functions that make it possible to
implement unsafeCoerce the above foo idiom isn't really a problem by
itself.  The usefulness of functions like foo is evidenced by unsafe
functions like smart_diff.  Being able to create a patch, with its
context, 'out of thin air' is useful and we just have to be careful
about it because of the next point.

The next level of unsafety is the one we assign to x and y.  We are
embedding our patch theory constraints in Haskell's types and
sometimes we can't do this flawlessly.  Sometimes what we want to do
is type safe in Haskell's type system but does not respect the
invariants of patch theory.  In this case the flaw is that smart_diff
is really unsafe_smart_diff.  Our type signature for smart_diff is
equivalent to replacing 'smart_diff' with '(unsafeCoerceP .
smart_diff)'.

Again, the problem here isn't with the unsafeCoerceP on NilFL.  Which
is my way of saying, the problem isn't the fact that we are doing
something that can be used to implement unsafeCoerce (while admitting
that it is undesirable).  The real problem is that darcs has an
invariant that our type encoding is ignorant of.  That invariant, the
one you point out, seems to be about multiple copies of the same
change.  I agree that we need to fix the code above to respect that
invariant.

> The get_unrecorded_in_files idea is great, and looks like the right sort of
> interface, but concatenating patches from separate diffs is way too
> fragile.  A better approach would be to pass all the interesting files into
> a diff_at_paths.

Okay, so that pushes the invariant down a level.  I can't say for
sure, but I think you're saying  that makes it easier to deal with.  I
certainly can't think of a way off the top of my head to make a
context threaded version of smart_diff.  Well, with the possible
exception of returning Sealed (FL Prim C(x)) like we do with
readPatch, which is flawed but harder to misuse.  And one more
exception for dependent types; I'm pretty sure Agda could represent
this even though I don't have an implementation to prove it.

Generalizing your suggestion I think this is what you're saying: To
make our flawed type encoding easier to work with, pass all the
information necessary to construct the sequence to just one function.
So that instead of O(n) calls to a (patch theory) unsafe function
(smart_diff or diff_at_path) which can be threaded/concatenated in a
(patch theory) unsafe way we make O(1) (patch theory) unsafe call(s).
The code above is already sound in Haskell's type system (but could be
made unsound by introducing (=\/=) or (=/\=)).

Note: I'm hence forth making a distinction between things that are
patch-theory unsafe and things that are haskell-type-system unsafe.
Where the latter is sufficient for the former.  It would also be handy
to make a special case for the foo idiom in the places where it is
audited and known to be used safely.

> I'm right now pushing a test that demonstrates this bug in whatsnew.
> (Except it passes, since I'm not applying these patches just yet.)

Ah, tests are good.  They help us document these invariants.  Patch
review is good, but I would certainly feel more comfortable with test
centric patch acceptance over patch-review centric.  I certainly miss
a lot of things when I review patches.  Also the research I read in my
first year of graduate school, about psychology of programming,
agrees.  If I recall correctly Nancy Pennington has some great studies
on the subject.

We could of course have the best of both worlds when we get the code
more factored so that we have a small kernel that must be correct at
all costs and libraries built on top by contributors that just want
feature X where X is tolerant to bugs, eg., doesn't corrupt data (like
interactive change viewing, which is buggy by the way but it won't
cause corruption, mild confusion at best, I made a ticket for this but
I can't find it).

> P.P.S. It's nice to see a case where the type witnesses really did catch us
> a bug... Note that this code is right at the border between where the type
> witnesses protect us and where they don't, since Diff itself isn't
> protected by type witnesses.  Fortunately, it was on the safe side of the
> border, or I might not have caught it... also fortunately, the bit on the
> diff side of the border is sufficiently simple so as to be easily
> reviewable.

I disagree that we are on the useful side of the type witnesses here.
I've made my case for why above.  It is good that you saw the
unsafeCoerceP and started investigating, but it's really the
diff_at_path and smart_diff that should have caused the audit in this
case.  Here I think it's unfair to say that unsafeCoerceP is the
culprit, given it's usage.  It's solving a different problem, namely
the foo idiom.

I'd argue that since I also reviewed this during the sprint but
couldn't spot the bug that this is evidence that an invariant is
missing from the type encoding.

I'm not arguing that the type witness stuff is without use.  It's
incredibly handy, I just think we have to be careful about attribution
here :)

Thanks,
Jason