[darcs-users] darcs patch: Use index-based diffing in Record. (and 57 more)

Ganesh Sittampalam ganesh at earth.li
Sat Sep 12 22:11:03 UTC 2009

On Sat, 12 Sep 2009, Petr Rockai wrote:

> Petr Rockai <me at mornfall.net> writes:
>> I am sending the darcs-hs patchbomb, since I just went over the code and
>> changes and fixed a few outstanding issues. Now darcs-hs passes the testsuite
>> again, and also -- compared to mainline -- fixes issue1488 and issue252.
> Ok, on the darcs-hs repo (http://repos.mornfall.net/darcs/darcs-hs) I have now
> also purged the Darcs.Gorsvet module. I have also updated
> http://wiki.darcs.net/Review/DarcsHS -- we are now down to 8 bullet points.

I should mention that I haven't looked at the actual changes to darcs-hs 
in much detail yet, so that side shouldn't be considered properly reviewed 
yet. I will however do my best to do so asap. In addition my general 
feeling is that the changes to darcs itself are intrinsically likely to be 
correct because they just build on the correctness of hashed-storage.

Overall I really like the hashed-storage work and I think it's a strong 
step towards making darcs faster and more maintainable. As Petr says 
below, there are some quite significant points of disagreement that remain 
between us over a few aspects of the overall design, but those shouldn't 
obscure the essential success of his work, and I would be willing to sign 
off on the current state of things, albeit somewhat reluctantly.

I think that Petr's summary below is pretty fair. I've added my view.

> Out of these, 2, 3 and 8 are mostly trivialities that can be addressed with
> just a little bit more time (over the next week, I suppose). I believe 5 is
> just a documentation issue, and 6 is partially documentation and partially
> thinking on how to make the situation better.


> Point 7 is a lot more work, and the question whether it's worth the investment
> is a hard one. The problem can definitely be fixed by keeping an extra file
> with hashes of the pristine files. This will require a certain amount of
> infrastructure work to make a clean interface -- it would basically constitute
> a new format in hashed-storage. The result would be a format that is basically
> plain and addresses the consistency issues (it'll keep hashes around) for the
> most part. It however still breaks down on case-insensitive filesystems (even
> though this time, it'll likely lead to unusable repository in all cases, unlike
> the existing behaviour of sometimes producing silent corruption). Go
> figure... However, I'd like to ask that whatever we decide, we should not block
> the merge on this.

I also agree about not blocking the merge, because keeping the branch 
alive for too long is a waste of Petr's time. The only issue is that if we 
decide that the performance regression needs to be fixed, then we need 
to be reasonably sure that it will happen in time for 2.4.

> However, the situation with 1 and 4 is even more complicated. We are basically
> locked up in disagreement with Ganesh about, I believe, basically two things:
> (a) Ganesh believes that it's vital to move Storage.Hashed.Darcs (the 
> current module that holds everything darcs-specific in hashed-storage) 
> into the darcs darcs repository. I have both practical issues (the 
> hashed-storage testsuite relies on this code)
> and theoretical reservations about such move (I think it's
> fair to offer this functionality without dragging in libdarcs). A compromise
> would have a separate cabal package in the darcs repo, but this does not fix
> the testsuite problem. I still prefer to keep the Darcs module in
> hashed-storage proper to make life easier for non-darcs users of hashed-storage
> (but there currently aren't that many, it's just about lowering the bar...),
> but I guess a separate cabal package in the darcs repo would be acceptable as
> well (a different compromise would have a separate cabal package outside the
> darcs repo, but hosted on darcs.net... hard to weigh pros and cons here).
> (b) Ganesh thinks it would be better to overload the Tree type and a number of
> associated functions over the Hash type, while I favour the current variant
> with a single global Hash data type. We both think that our preferred version
> is more readable. I currently can't really think of any serious advantages of
> one over the other, as of the current status, with darcs-specific code factored
> into a separate module.

In my preferred world, hashed-storage would be a library that users can 
instantiate with whatever specific hash type and format they choose. Darcs 
would then be a client with a specific instantiation that takes account of 
the general weirdnesses of the darcs hashed format (it uses both SHA1 and 
SHA256 hashes, the hashed filenames in existing repos are prefixed by a 10 
digit size specifier, and the way that directories listings are hashed is 
also somewhat specific to darcs). The testsuite could also be general and 
instantiated appropriately, which I think addresses Petr's point above, 
though I'm not certain.

Clients of hashed-storage that have nothing to do with darcs
could use a cleaner hash type and format (e.g. just plain SHA256). 
hashed-storage could export this "standard" instantiation itself.

For clients that want to work with the darcs repo format, I think the 
solution Petr mentions above of having a separate cabal package 
(hashed-storage-darcs?) in the darcs darcs repo is the way to go. It's a 
good step towards the goal of having proper API for darcs, but without 
getting us into a position where changes to the darcs format potentially 
have to be coordinated between hashed-storage and darcs. I think that if 
darcs supported nested repos so that we could nicely write patches that 
spanned both repos, this would be less of a consideration, but it doesn't. 
The downside of this idea is that it further complicates the build process 
for darcs - but hashed-storage-darcs should be small and quick to build so 
hopefully this wouldn't be too much of a problem.

I should mention that I've already prototyped much of the parametrisation 
in a throwaway fork of Petr's repo and I'd be happy to bring this up to 
date and extend it to the testsuite in order to minimise any extra work 
this change would place on Petr.



More information about the darcs-users mailing list