[darcs-users] David's darcs

Petr Rockai me at mornfall.net
Wed Jul 22 15:45:41 UTC 2009

Jason Dagit <dagit at codersbase.com> writes:
> It would be interesting to see someone write down the commutation rules for
> those patches relative to the other patch types.
Yeah, well, let's see...

- hunk commutation is the same as in darcs (the file path is replaced with an id)
- anything other than (de)manifest vs (de)manifest is always trivial

- manifest/demanifest commutations are either trivial or fail:

    manifest id1 path1 <-> manifest id2 path2
    manifest id1 path1 <-> demanifest id2 path2
    demanifest id1 path1 <-> demanifest id2 path2

  - trivial when id1 /= id2 && path1 /= path2
  - fail when id1 == id2 || path1 == path2

- inv(manifest id path) is demanifest id path and vice-versa
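
To make the (de)manifest rules above concrete, here is a minimal Haskell
sketch. It is only an illustration under stated assumptions: the names
FileId, Prim, primId, primPath, invert and commute are hypothetical and are
not taken from darcs or Camp, ids are modelled as plain Ints, and hunks and
all other patch types are left out entirely.

```haskell
-- Hypothetical sketch of the (de)manifest rules; all names are illustrative.
type FileId = Int  -- assumption: ids are plain integers

data Prim
  = Manifest FileId FilePath    -- bind a file id to a path
  | Demanifest FileId FilePath  -- unbind a file id from a path
  deriving (Eq, Show)

primId :: Prim -> FileId
primId (Manifest i _)   = i
primId (Demanifest i _) = i

primPath :: Prim -> FilePath
primPath (Manifest _ p)   = p
primPath (Demanifest _ p) = p

-- inv(manifest id path) is demanifest id path and vice versa.
invert :: Prim -> Prim
invert (Manifest i p)   = Demanifest i p
invert (Demanifest i p) = Manifest i p

-- Commute an adjacent pair. Just means the pair commutes trivially
-- (the patches swap places unchanged); Nothing means commutation fails.
commute :: (Prim, Prim) -> Maybe (Prim, Prim)
commute (p, q)
  | primId p /= primId q && primPath p /= primPath q = Just (q, p)
  | otherwise = Nothing  -- id1 == id2 || path1 == path2
```

With these definitions, commute (Manifest 1 "a", Manifest 2 "b") swaps the
two patches unchanged, while commute (Manifest 1 "a", Demanifest 2 "a") is
Nothing because the paths collide.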

That's a quick sketch; it may have some holes in it. But it shouldn't be hard
to think this through...

> Interesting. I didn't realize there were plans to incorporate bits from Camp.
Well, Eric has been talking about camp as the darcs-3 core for a while. I am
not generally opposed, at least not to using the primitives from Camp if
nothing else. If they emerge, they will likely emerge with much better
correctness and complexity guarantees than darcs-2.

> The darcs UI has some non-trivial bits in places like SelectChanges, but for
> the most part the stuff in the Commands/* modules is pretty light.  We have
> some implicit zippers in the UI code too that could probably be farmed out to a
> library modulo some low level unsafePerformIO hacks we have in there for
> optimizations.  (At some place we have a pure interface and under the hood it
> stores references and mutates them but the caller can't tell so it's still
> safe.)
Yes and no -- as far as I can tell, many assumptions about sequencing
requirements are coded into the command code, and at times so are details of
the inner workings of the "core". Most of these things are due to the
monolithic architecture and overly broad interfaces. It would, however, be hard
to fix all these things without introducing new bugs (that's why I suggest
changing the existing code only conservatively: no one really knows whether
changing something is going to explode in someone's face later, elsewhere).

> I think we basically agree about the approach to clean things up.  My opinion
> is that more unit tests, more minispecs, and better source level documentation
> would improve the situation more than a rewrite would.  We need those artifacts
> (unit tests, minispecs, documentation) in both scenarios (refactor vs.
> rewrite).  There is a lot of code that I wanted to refactor more aggressively
> when I was doing the type witness stuff, but I just didn't know why the code
> was there or what API it was expected to have versus what it really had.  I
> know from experience that when I don't understand a bunch of source code and
> it's hard to read that I usually just want to throw it all away and start
> over.  But, I also know from experience that refactoring in place until the
> code is readable again is usually better.  In other words, I'm hesitant to take
> a bunch of code from Camp or throw away a bunch of stuff.  Incremental
> refactoring is better in my opinion in this case.
We'll have to agree to disagree. Incremental refactoring is good if you have
working code. But the darcs core is quite buggy and very poorly
understood. Moreover, we don't even know if there is a polynomial-time solution
to the conflict resolution defined by the (implicit) theory behind darcs-2. You
may know more than we do, but you have also stated that you are not willing to
spend too much time on this. We are not going to obliterate that code: if you
change your mind and want to fix it later (or David wants to), you are
absolutely welcome to do that -- a better implementation of the darcs-2 format
will definitely have its uses.

> For things like grabbing a file and inserting/deleting a hunk this may very
> well be true.  But, I also know that a lot of the execution paths that eat up a
> ridiculous amount of resources tend to stream things "lazily".  Yes patch
> formats could be optimized so that we can fseek to the data, take slices from
> the inventory, and we need to jump to directory entries efficiently, but I
> don't think this is the end of the story.  I'd love to be able to tell people
> that darcs never uses more than X megs of ram and you can specify X on the
> command line.  I think iteratees will get us much closer to this goal than lazy
> IO will.  Partially, I simply think lazy IO is broken and I'd love to try just
> about anything that can replace it.
Hashed-storage does not use lazy IO, if that makes you happy. It also doesn't
use iteratees, since they are not the right tool for the problem at
hand. Either way, putting hard bounds on memory use is impossible with the
current design and impractical with any design, since you have to commute an
unbounded number of patches. You would just find yourself re-implementing
virtual memory in darcs itself. The reason lazy IO is used in darcs is that
people wanted to pass complete working or pristine trees as input to pure code.
This is neither necessary nor a good idea. That's why I got rid of it in
hashed-storage...

