[darcs-users] darcs-hs/hashed-storage review

Petr Rockai me at mornfall.net
Thu Aug 6 14:36:54 UTC 2009


Ganesh Sittampalam <ganesh at earth.li> writes:
>> The line/line diff is not a problem -- as you say, this is done through hunk
>> canonization. The problem is that unsafeDiff does unhealthy amount of magic
>> about empty files, tacking newlines here and there. I have tried to simplify
>> the logic while retaining the behaviour, but bugs could have happened, even
>> though unlikely, as I'd expect these to show up on the testsuite. (I
>> previously got it wrong and it has shown up... but, you never know.)
>
> I guess some QuickCheck is in order then, but it's not necessarily crucial that
> treeDiff actually does replicate unsafeDiff, so long as it does actually
> properly represent the diffs (otherwise users might be left with unrecorded
> changes which would be very weird/confusing).
I have read the code again and I take that back. There are just four
problematic cases that need checking:

- empty -> text with trailing newline
- empty -> text without trailing newline
- text with trailing newline -> empty
- text without trailing newline -> empty

I have thrown together a small testcase that checks these four cases
explicitly. (I must say that darcs creates quite curious hunk patches for the
no-newline cases...) It is available in tests/trailing-newlines.sh in current
darcs-hs.

> Also, darcs check should check the index (and darcs repair fix it).
Hm, it is always safe to rm _darcs/index, so repair could just do that. As for
check, it should be easy to implement, I'll do that in a bit (all it needs to
do is read the index and pristine, check that the file lists match and then
recompute the hashes from actual file contents of working files... did I miss
something?).

> I've read through it again and I now understand what's going on. The fact that
> IO actions (or whatever monad) are all over the place - in the Tree, in the
> sync operation, and as a lower layer of the monad - is rather confusing, though
> I can see why it's that way now I've got to grips with it. I think the main
> improvement to the haddocks that would have helped me is to change "get somehow
> reflected in the actual real filesystem" to "might get somehow reflected ...".
Aha. Yes, virtualTreeIO is an exception in this case. I'll work on the Monad a
little more, probably over the weekend. (I do use virtualTreeIO in place of
apply_to_slurpy in darcs-hs... or more precisely, we use it to implement
applyToTreei -- that's basically one of the intended use-cases. The other is to
do read-only things -- it may be however better to have a read-only monad for
that use-case -- basically a counterpart to ReadableDirectory.) This will be
made easier by newtype-wrapping the monad I guess.

>> Well, hashedTreeIO is currently not used. Well, it is used by darcs
>> check/repair in darcs-hs, which does its own cleanup. This is however
>> something that will need addressing.
>
> OK, I'm now rather lost about what is needed for what. I'd have expected
> hashedTreeIO to be needed for everything that touches pristine in a hashed
> repo. Is this just a work in progress?
Eventually, yes. For now, it's a work in progress. (In darcs-hs, I have a patch
to port check/repair to hashedTreeIO and it even works, although it's slower
than current code by a fair margin.)

>> > - I'm very dubious about semantics of finish - why doesn't expandPath >
>> call it? Presumably the documented usage semantics of readIndex have >
>> something to do with this, but it feels fragile, and I only sort of >
>> understand the whole dirs_changed/finish stuff in readIndex'
>> You are right about being dubious. I however haven't found a better way to
>> reasonably implement index updates. In retrospect, it may have been better to
>> just provide a specialised "expand" for the index tree, that would also do
>> the update. I'll check if there's a reason preventing that, and if no, I'll
>> go for it in hashed-storage 0.4 and just get rid of finish.
>
> That would definitely be an improvement, but readIndex would still return
> a Tree that wasn't quite like normal Trees, which makes me uncomfortable.
> We should discuss a bit more what it's trying to achieve (perhaps on IRC).
This is now done as well. The readIndex function now gives a pair: the Tree
object, and an action to "update" it. I have documented the new behaviour [1]
in haddock. It is indeed a little different from the usual Tree objects in the
sense that it gives you what is in the index file -- which may be out of
date. It would however be a waste to update all of the index every time
(although this may be what git does, IIUIC). To make it possible to update only
a part of the index, you can trim down the index Tree and then update the
result. It however wouldn't be much of a problem to also provide a function
that will always give you full, up-to-date index. For darcs however, we need
(when we say darcs whatsnew subtree) to use the partial update functionality.

[1]: http://repos.mornfall.net/hashed-storage/dist/doc/html/hashed-storage/Storage-Hashed-Index.html#v%253AreadIndex

Yours,
   Petr.


More information about the darcs-users mailing list