[darcs-users] darcs patch: Begin using RIO (and 39 more)

David Roundy droundy at darcs.net
Sun Sep 14 21:51:18 UTC 2008


On Fri, Sep 12, 2008 at 01:37:22PM -0700, Jason Dagit wrote:
> > I haven't found where the code you're quoting is located.  Maybe I
> > haven't got all your changes?
> 
> Hmm....if you applied the whole patch bundle it should have been in
> there.  I generated this email by looking at the submitted patch
> bundle and a repository where the patches were applied.  Strange.

Can we revisit this one later? I'm at home now, and don't really have the
computational power to make the creation of an extra fork pleasant.  (Just
compiling with witnesses will take a half hour or hour.)

> > I would break this into two stages, with two calls to withRepoLock.
> > In the first one, you revert all unrecorded changes, and then in the
> > second one you can simply verify that unrecorded and recorded are
> > identical.  It's a bit hokey, but it's safe and relatively simple.
> 
> Okay, so the final version will have some sort of:
>   do unrec <- get_unrecorded_unsorted
>        case sloppyIdentity unrec of
>         IsEq -> do add_to_pending ....
>         _ -> bug "there are still unrecorded changes"
> 
> I think I understand how to fix this one up.  Thanks.

Yes, but I'd use impossible rather than bug, to indicate that you have
sound reason to believe that case is impossible.  Also, that avoids the
need for grep in order to find the bug (in case it does turn out to be
possible).

> > It might be a good idea to introduce a second monad for functions that
> > modify the unrecorded state.  I'd rather keep this distinct from RIO,
> > because it's not generally safe to intermingle changes to the working
> > directory with changes to the recorded/tentative state.
> 
> Interesting point about intermingling changes.  I guess the 'cute'
> name for such a monad would be UIO, for Unrecorded Repository IO.  I
> think URIO is too long.  If we only need it for this one spot it seems
> sort of heavy handed if the above trick suffices.

Agreed.  Of course, many commands only modify the working directory, so
add, replace, revert, unrevert and remove could all use it as well.  But I
agree that it's pretty low priority at the moment, as these are all pretty
well-served by a single function (or two) that does the updating.

> > Part of the problem may relate get_patches_beyond_tag has a wrong
> > type.  Its type (and in future, when introducing new functions, please
> > try to give them new names) is only right if you've already called it
> > once to find out the type of the tag in its minimal context.
> 
> I wasn't trying to introduce a new function and that is why I stole
> the name.  It wasn't until quite a bit down the road that I realized
> I'd still need the old type signature is several places.  I really did
> mean for it to replace the old type signature.  In the final version
> of these changes I can give this more typeful one a different name.
> Actually, just adding get_patches_around_tag and phasing out the old
> ones seems best.  You don't think doing so will introduce unwanted
> inefficiencies do you?
> 
> I'm glad you've had a chance to give me feedback on it now though.  I
> didn't  catch on to this minimal context bit from reading the code.
> My next iteration of patches will have some haddocks about it as it
> seems highly relevant and non-obvious to at least some people
> (myself).
> 
> I'm confident there are still issues here, but you'll need new patches
> from me to discuss them I reckon.

Sounds reasonable.  The get_patches_around_tag below seems like a good
place for starting a refactor of some of this stuff.

> >> Actually, I think that middle element needs to have the same context
> >> as the PatchInfo, so:
> >> get_patches_around_tag :: RepoPatch p => PatchInfoAnd p C(x y) ->
> >> PatchSet p C(z) -> (RL (RL (PatchInfoAnd p)) C(() x),  (PatchInfoAnd p) C(x y),  RL (RL (PatchInfoAnd p)) C(y z))
> >
> > No, this would be wrong, for the same reason that
> > get_patches_beyond_tag is wrong, which is that there's no guarantee
> > that the tag input is in its canonical order (i.e. that there aren't
> > any patches preceding it that are not in the tag).
> 
> Ah, hmm.  I didn't catch on to that from reading the code.  My
> proposed solution of get_patches_around_tag may not be sufficient now
> as I seem to recall the reason I changed the type signature was to get
> the return value to have a specific context.  Either way, it seems
> that adding a correct get_patches_around_tag would still get me closer
> to the right solution.  So that is still something I should do it
> seems.

It'll be interesting to see.

> > This one is tricky.  The canonical answer (which is wrong in this
> > case) would be to avoid computing the patchset twice.  The type
> > witnesses tend to fail when we do redundant work, which is often a
> > good thing, since it means that the type system helps us to catch
> > inefficiencies.
> 
> I only know of uniqueness typing very vaguely, but is that what we're
> recreating here?  I should probably find a paper or two and read them
> for the RIO section I'm writing...

Hmmm.  I'm not certain that uniqueness typing would be relevant here, but I
also know very little about it.

> > In this case, the inefficiency is very intentional, however, in order
> > to avoid a memory leak.  So the simple solution of eliminating
> > patchset2 and replacing it with patchset, which would satisfy the type
> > checker and save on disk IO is a very bad idea.
> 
> Is this related to the lazy scan_bundle stuff or did you find a
> different memory issue here?  I don't (yet) spot these memory leaks,
> so if you have some time to comment on this one I would appreciate it
> as it would be very educational.

No, unrelated to the lazy scan_bundle.  It's just that when putting, we
need to create and use a patch bundle describing the entire repository.
Holding the entire parsed repository in memory is not a feasible option, so
we instead read the repository twice, so as to consume each copy lazily.
If we used lazy bytestrings in hash_bundle, we wouldn't even need to hold
the entire patch bundle in memory, which would be pretty handy.  But the
patch bundle itself is around an order of magnitude cheaper than holding
the parsed repository, since it's a flat packed string, so we're still
gaining hugely in memory use.

You can look at make_bundle2 to see how the two copies are used.

> > I think the best solution may be one that is designed for precisely
> > this problem, something like:
> [snipped examples]
> > etc.  But I'm just brainstorming at the moment.
> >
> > This is a tricky problem, or at least it seems tricky to find an
> > elegant and general solution.
> 
> If I get to this point you'd be okay with the less general
> moderatelySafePerformIOtwice/moderatelySafePerformRIOtwice solutions?
> 
> I'm hesitant to go for the generality of the other solutions on the
> basis that we're not likely to need it.

Right.

David (skipping comments of yours that didn't seem to require comment)


More information about the darcs-users mailing list