[darcs-users] darcs patch: Begin using RIO (and 39 more)

Fri Sep 12 17:13:20 UTC 2008

Hi Jason,

You've guilted me into responding to your questions, although I'm not
clear what good it'll do you.

On Mon, Sep 08, 2008 at 05:08:20PM -0700, Jason Dagit wrote:
> Moving on to the next one, still in Changes.lhs.  I added a comment to
> this one hoping I would get some feedback.  It appears to happen
> because we have this function:
> get_changes_info :: RepoPatch p => [DarcsFlag] -> [FilePath] -> PatchSet p C(x)
>                 -> ([Sealed2 (PatchInfoAnd p)], [FilePath], Doc)
>
> And it gets used like this:
> +  case mF (:<:NilRL) (headRL (slightly_optimize_patchset r)) of
> +   FlippedSeal ps -> do
> +    when (not $ nullRL r || unsealFlipped nullRL (headRL r)) $
> +     putDocLnWith simplePrinters $ changelog opts' NilRL $
> +                  -- This unsafeCoerceP seems to be neccessary due to
> +                  -- inventory splits at tags.  Without the context
> +                  -- would be too long (and redundant) for practical
> +                  -- usage.
> +                  get_changes_info opts' [] (unsafeCoerceP ps)
> +  where opts' = if HumanReadable `elem` opts || XMLOutput `elem` opts
> +                then opts
> +                else MachineReadable : opts
> +        mF :: (FORALL(y) a C(y x) -> b C(y x)) -> FlippedSeal a C(x)
> -> FlippedSeal b C(x)
> +        mF f (FlippedSeal a) = flipSeal $ f a
>
> The problem here is that we don't give a PatchSet to get_changes_info,
> we give it a patch sequence (specifically the starting context is not
> ()).  It seems as though we are trying to treat all the patches since
> the tag as being all the patches in the repository, as if starting
> from ().  The easiest fix here is to get rid of what I interpreted as
> an optimization and just pass the result of slightly_optimize_patchset
> directly to get_changes_info.  That seemed like a pretty big change
> for me to make without your feedback.

I haven't found where the code you're quoting is located.  Maybe I
haven't got all your changes?

> Next we have to Add.lhs, in particular addp is problematic, but I
> think I may have that one figured out finally.
>
> After that is MarkConflicts.lhs:
>      -- These unsafeCoercePs are safe because above, if there were
>      -- unrecorded changes we reverted them by applying their inverse
>      -- to the working copy.  Thus, r = u.
>      do add_to_pending repository (unsafeCoerceP res :: FL Prim C(u x))
>         applyToWorking repository opts (unsafeCoerceP res :: FL Prim
> C(u x)) `catch` \e ->
>             bug ("Problem marking conflicts in mark-conflicts!" ++ show e)
>
> Again we have an issue with which state is tracked in RIO.  Just a few
> lines up from that we check that pending is NilFL and if it's not
> apply the inverse pending, but doing that in RIO won't help.  What
> should we do about that other than unsafeCoerceP or extending RIO?

I would break this into two stages, with two calls to withRepoLock.
In the first one, you revert all unrecorded changes, and then in the
second one you can simply verify that unrecorded and recorded are
identical.  It's a bit hokey, but it's safe and relatively simple.

It might be a good idea to introduce a second monad for functions that
modify the unrecorded state.  I'd rather keep this distinct from RIO,
because it's not generally safe to intermingle changes to the working
directory with changes to the recorded/tentative state.

> I moved on to Mv.lhs and I was able to get rid of one more
> unsafeCoerceP by using the Maybe (FL a C(x y)) trick above, but I
> don't see how to eliminate this one:
> mapMFL :: Monad m => (FORALL(u v) a -> m (Maybe (b C(u v)))) -> [a] ->
> m (FL b C(x y))
> mapMFL _ [] = return $ unsafeCoerceP NilFL
> mapMFL f (a:as) = do b <- f a
>                     bs <- mapMFL f as
>                     if isJust b
>                       then return (fromJust b :>: bs)
>                       else return bs
>
> I spent an entire morning trying to get rid of the unsafeCoerceP in
> the above function but so far I have failed to remove it.

As usual, you change the return type to be sealed:

mapMFL :: Monad m => (FORALL(u v) a -> m (Maybe (Sealed (b C(u)))))
       -> [a] -> m (Sealed (FL b C(x)))
mapMFL _ [] = return $ seal NilFL
mapMFL f (a:as) = do b <- f a
                     case b of
                       Nothing -> mapMFL f as
                       Just (Sealed b) -> do Sealed bs <- mapMFL f as
                                             return $ seal (b:>:bs)

> The next one I see is in Optimize.lhs:
> choose_order :: forall p C(z). RepoPatch p => PatchSet p C(z) -> PatchSet p C(z)
> choose_order ps = case last_tag of
>  Sealed NilRL -> ps
>  Sealed (lt:<:_) ->
>      co lt $ mapSeal slightly_optimize_patchset $ get_patches_in_tag
> (info lt) ps
>  where last_tag = dropWhileRL (not . is_tag . info) (concatRL ps)
>       dropWhileRL :: (FORALL(x y) a C(x y) -> Bool) -> RL a C(r v)
>                   -> Sealed (RL a C(r))
>       dropWhileRL _ NilRL = seal NilRL
>       dropWhileRL p xs@(x:<:xs')
>                   | p x       = dropWhileRL p xs'
>                   | otherwise = seal xs
>       co :: (PatchInfoAnd p) C(a b) -> SealedPatchSet p -> PatchSet p C(z)
>       co lt' (Sealed ((t:<:NilRL):<:pps)) =
>            case get_patches_beyond_tag lt' ps of
>             -- This unsafeCoerceP is safe because t = lt'.  We would use
>             -- (=\/=) to prove it, except that get_patches_in_tag must first
>             -- be fixed to return a meaniful context.
>             p :<: NilRL -> (p+<+((unsafeCoerceP t):<:NilRL)) :<: pps
>             _ -> impossible
>       co _ _ = impossible

Part of the problem may relate get_patches_beyond_tag has a wrong
type.  Its type (and in future, when introducing new functions, please
try to give them new names) is only right if you've already called it
once to find out the type of the tag in its minimal context.

> My comment is talking about get_patches_in_tag which is passed a
> PatchInfo.  I looked at refactoring get_patches_in_tag but it was
> quite daunting.  It's a 40 line Haskell function that seems to be
> quite related to get_patches_beyond_tag.  I think, that we should
> probably have something like,
>
> get_patches_around_tag :: RepoPatch p => PatchInfo -> PatchSet p C(z)
> -> (RL (RL (PatchInfoAnd p)) :> (PatchInfoAnd p) :> RL (RL (PatchInfoAnd p))) C(() z)

Yes, something like this would definitely be the way to go.

> Actually, I think that middle element needs to have the same context
> as the PatchInfo, so:
> get_patches_around_tag :: RepoPatch p => PatchInfoAnd p C(x y) ->
> PatchSet p C(z) -> (RL (RL (PatchInfoAnd p)) C(() x),  (PatchInfoAnd p) C(x y),  RL (RL (PatchInfoAnd p)) C(y z))

No, this would be wrong, for the same reason that
get_patches_beyond_tag is wrong, which is that there's no guarantee
that the tag input is in its canonical order (i.e. that there aren't
any patches preceding it that are not in the tag).

> Actually, since the first param should be a tag, it could be
> PatchInfoAnd p C(x x), but I think that would require a proof every
> time we pass something to the above function which may get tiresome
> without adding any real safety.  This is another possibility:
>
> get_patches_around_tag :: RepoPatch p => PatchInfoAnd p C(x y) ->
> PatchSet p C(z) -> (PatchInfoAnd p C(x x), RL (RL (PatchInfoAnd p)) :>
> RL (RL (PatchInfoAnd p))) C(() z)
>
> But, ultimately, I've didn't do this yet because get_patches_in_tag
> didn't explain itself enough to me and I just didn't feel confident
> with which types it should have.  Does it really return all the
> patches that the tag depends on or not?

Yes, it returns the patches that the tag depends on.

> How is the unsafeCoerceP in Put.lhs avoidable?  patchset == patchset2,
> but due to the unsealing their ending contexts are considered
> different by the type system.  The only way to fix it is by updating
> this function:
> get_one_patchset :: RepoPatch p => Repository p C(r u t) ->
> [DarcsFlag] -> IO (SealedPatchSet p)
> get_one_patchset repository fs =
>    case nonrange_matcher fs of
>        Just m -> do ps <- read_repo repository
>                     if nonrange_matcher_is_tag fs
>                        then return $ get_matching_tag m ps
>                        else return $ match_a_patchset m ps
>        Nothing -> (seal . scan_context) `liftM` mmapFilePS
> (toFilePath $ context_f fs)
>    where context_f [] = bug "Couldn't match_nonrange_patchset"
>          context_f (Context f:_) = f
>          context_f (_:xs) = context_f xs
>
> That function seems to call several Sealed returning functions.  What
> is the solution here?  Obviously the unsafeCoerceP is currently quite
> safe and is the easiest solution.

This one is tricky.  The canonical answer (which is wrong in this
case) would be to avoid computing the patchset twice.  The type
witnesses tend to fail when we do redundant work, which is often a
good thing, since it means that the type system helps us to catch
inefficiencies.

In this case, the inefficiency is very intentional, however, in order
to avoid a memory leak.  So the simple solution of eliminating
patchset2 and replacing it with patchset, which would satisfy the type
checker and save on disk IO is a very bad idea.

I think the best solution may be one that is designed for precisely
this problem, something like:

data SealedDuple a where
    SealedDuple :: a C(x) -> a C(x) -> SealedDuple a

moderatelySafePerformIOtwice :: IO (Sealed p) -> IO (DupSealed p)
moderatelySafePerformIOtwice j = do Sealed x <- j
                                    Sealed y <- j
                                    return $ DupSealed x (unsafeCoerceP y)

Where the names here are very stupid, but I'm not sure the idea is so
very bad.  It's pretty ugly, since it is only correct if the input
function makes no changes.  So a better type might be something more
like:

moderatelySafePerformRIOtwice :: RIO C(r u t t) (Sealed p)
                              -> RIO C(r u t t) (DupSealed p)
moderatelySafePerformRIOtwice j = do Sealed x <- j
                                     Sealed y <- j
                                     return $ DupSealed x (unsafeCoerceP y)

which at least ensures that its argument doesn't modify the tentative
state.

Another not unrelated option which is less scary would be to introduce
a new

data ManyIdenticalSealed a where
  ManyIdenticalSealed :: a C(x) -> [a C(x)] -> ManyIdenticalSealed a

and then we could gradually switch functions to return ManySealed
instead of Sealed.  With care, we might be able to ensure that all the
objects inside a ManyIdenticalSealed are actually identical.

e.g. we could write the safe function

sealMany :: a C(x) -> ManyIdenticalSealed a
sealMany x = ManyIdenticalSealed x (repeat x)

and

data ManyIdentical a = ManyIdentical a [a] -- hide this constructor

manyIdentical :: a -> ManyIdentical a
manyIdentical x = ManyIdentical x (repeat x)
unsafeManyIdentical :: [a] -> ManyIdentical a
unsafeManyIdentical (x:xs) = ManyIdentical x xs

mapManySealed :: (a -> Sealed b)
              -> ManyIdentical a -> ManyIdenticalSealed b

readRepositoryMany :: RIO C(r u t t) (ManyIdentical (PatchSet C(r))

etc.  But I'm just brainstorming at the moment.

This is a tricky problem, or at least it seems tricky to find an
elegant and general solution.

> I was able to remove the unsafeCoerceP from Remove.lhs.

I don't see this unsafeCoerceP...

David