[darcs-users] darcs patch: Is this Monadish?

David Roundy droundy at darcs.net
Thu Aug 28 18:22:34 UTC 2008


On Thu, Aug 28, 2008 at 10:30:03AM -0700, Jason Dagit wrote:
> > Something like this (it's equivalent to a Reader Monad, but it's nicer
> > in my opinion to avoid those nasty transformers, and make behavior
> > explicit.  Monads aren't actually hard to define.
>
> Er.  Transformers are nasty?  In my experience, newtype deriving
> plus transformers = modern Haskell.  I've used it to great success
> in some of my side projects.  Sure, defining your own instances is
> easy, but why do something the compiler can do for you?  Do you only
> refer to it as nasty because you want to do some type hackery?

I just don't like modern Haskell, that's all.  It's nasty.  Maybe it's
my bias in favor of portable, comprehensible code.  In this case,
using newtype deriving, you got a Monad instance that was wrong, and
it was very hard to see from your code that it was wrong, since you
can't see what is being derived.

> > data RIO p C(r u t t') a = UnsafeRIO (Repository p C(r u t) -> IO a)
> >
> > (of course you won't be able to use t' because of the C
> > preprocessor, but I'll use it anyhow in this email)
>
> So, t' is a phantom-phantom type?

It's an ordinary phantom type, just like any other phantom type.

> > The type here reflects the fact that the RIO (stands for Repository
> > IO) can only modify the tentative state.  That's how we manage
> > atomicity in darcs, is by setting up a "tentative" state and then
> > calling finalizeRepositoryChanges.  Of course, this constructor cannot
> > be safely exported, which is why it's marked UnsafeRIO.
>
> Hmm...I think you must use "tentative state" to refer to the tentative
> *recorded* state.  Over the history of our discussions this has been a
> big source of confusion for me.  For a long time I thought it was the
> pending, then when I found it didn't correspond to the pending, I
> couldn't figure out how to use it consistently.  Then I noticed we
> have several tentative "things", such as the tentative pending.
> Hence, why I think it's important to be exact here and give "tentative
> state" a clear definition, otherwise I could waste even more time
> being clueless.

Pending is not a state, it's a patch.  It describes a change that
starts at the recorded state, and goes part way to the unrecorded
state, with the rest of that change being something we can determine
by diffing the working directory.  Since pending starts at the
recorded state, when you change the recorded state, you need to change
pending at the same time, or you've got a corrupt repository.

The tentative state is the state that will be the new recorded state
when finalizeRepositoryChanges is run.

> > First off, we should change the type of
> >
> > withRepository :: [DarcsFlag] -> (forall p C(r u). RepoPatch p
> >               => RIO p C(r u r r) a -> IO a
> >
> > to reflect the fact that we can't modify repositories unless we've got
> > a lock, and the fact that the tentative state always starts out as the
> > recorded state.
> >
> > We can now modify withRepoLock to be of a type something like
> >
> > withRepoLock :: [DarcsFlag] -> (forall p C(r u). RepoPatch p
> >             => RIO p C(r u r t) a -> IO a
> >
> > Where the type system will now allow us to make modifications to the
> > repository.  Note that we could do better than this, and could ensure
> > that any modifications are protected by withGutsOf, but that can be a
> > later refactor.
>
> Hmm...I don't see (yet) how this is helpful to solving our real
> problems.  I agree that it's good that withRepository documents that
> the tentative recorded state starts out equal to the recorded state.

It seems to me that our real problem is that in order for type
witnesses to ensure that IO operations on repositories are performed
in a safe manner, and that means ensuring that correct code can
compile.  Since correct code cannot possibly compile unless we include
the proof that the initial tentative state is the same as the recorded
state, this seems critical to me.  The fact that my withRepository
type also ensures that we cannot modify a repository without taking a
lock on it is also a Good Thing, but is considerably less critical.

But perhaps there is some real problem that I'm not aware of?

> > So now we need our monad-like functions (and I agree that we don't
> > need or want a class for these).
> >
> > unsafeUnRIO :: RIO p C(r u t t') a -> Repository C(r t u) -> IO a
> > unsafeUnRIO (RIO f) = f
> >
> > This is your lowerIORepo function, but marked (properly) as unsafe.
> >
> > (>>>=) :: RIO p C(r u t t') a -> (a -> RIO p C(r u t' t'') b)
> >       -> RIO p C(r u t t'') b
> > (RIO f) >>>= g = RIO (\ (Repository ...) ->
> >                      do a <- f (Repository ...)
> >                         unsafeUnRIO (g a) (Repository ...))
> >
> > (>>>) :: RIO p C(r u t t') a -> RIO p C(r u t' t'') b
> >      -> RIO p C(r u t t'') b
> > (RIO f) >>> (RIO g) = RIO $ \ (Repository ...) ->
> >                            do f (Repository ...)
> >                               g (Repository ...)
>
> And this one I think can just be defined by (>>>=) like in the Monad
> class.  Is there a reason to prefer writing it not in terms of (>>>=)?

No, there's not a strong reason, although I don't see that you gain
much.  It would depend on whether the compiler is able to do any
tricky optimizations with the (>>) operator in the IO monad.

> > We'd also want an
> >
> > instance Functor (RIO p C(r u t t')) where
> >  fmap f (RIO g) = RIO (\repo -> fmap f (g repo))
>
> So you're not deriving here so that t' on the RHS is different than on
> the LHS?  As I explain below in the (>>=) case, I think the type
> signature of fmap forces t' to be equal on both sides.  In which case,
> if we allow this I think we can just derive it.

Yes, this could be derived (unlike the Monad case).

> > And as you suggest, an
> >
> > instance Monad (RIO p C(r u t t)) where
> >  (>>=) = (>>>=)
> >  (>>) = (>>>)
> >  return = returnR
> >  fail = io fail
> >
> > But we definitely don't want this monad instance derived from newtype
> > deriving, because that would break the type system (since you could
> > sequence two (RIO p C(r u t t')) actions to get a single (RIO p C(r u
> > t t') action, which strongly violates the type witnesses.  By writing
> > the Monad instance using the safe RIO operators (>>>=) and (>>>), we
> > ensure that this instance is safe.
>
> Hmm.  I thought the plan was that if it modifies the repository we
> make it an RIO action instead of an IO action.  So then using the
> Monad instance is for when you do IO but you're not modifying the
> repository in a way we *choose* to track.  Here I'm planning to work
> incrementally.  We need a strategy that allows us to update the code
> in pieces, right?

Yes, updating the code in pieces is by far the best approach.  But I'd
say that the unconverted code shouldn't be using RIO at all.  We can
leave the existing IO functions in place as unsafeTentatively... with
a simple rename, and then develop new and truly safe RIO functions.

But the Monad instance could be quite useful for safe functions that
happen to have no effect.  e.g. we could (not necessarily should) put
a read_repo variant (probably renamed to readRepo to match the newer
naming scheme) that is in the RIO, which would be

readRepo :: RIO p C(r u t t) (PatchSet C(r))

This function could be use in the RIO monad safely, because it is
known to have no effects.

> Also, I'm not sure the above type checks in a meaningful way, for example,
>
> (>>=) :: (Monad m) => m a -> (a -> m b) -> m b
>
> Abusing the notation so I can be explicit:
> (>>>=) :: (Monad (RIO p C(r u t t)) = m, Monad (RIO p C(r u t t')) =
> m', Monad (RIO p C(r u t' t'')) = m'')
> => m a -> (a -> m' b) -> m'' b
>
> So for that to unify with the type of (>>=) it would require t = t' =
> t''.  Isn't that the very case you're trying to avoid?  If not, do you
> think you could help me understand the difference?

That is exactlythe type that I specified.  Note:

> > instance Monad (RIO p C(r u t t)) where

there is no t' or t'' here, and there must not be.  The Monad only
allows actions that don't change state.

> Maybe that's what you mean in the next paragraph (but I don't think so...)?
>
> > I'm a bit afraid this monad instance won't work after all, since I
> > don't think that we can define an instance with 't t' repeated like
> > this.  :(
>
> I'm not following you.

See below about FlexibleInstances.

> > In which case we may need to switch to a different approach that I
> > wrote about and then deleted (when I saw your idea of a Monad
> > instance).
>
> That sounds like Fermat's last theorem.  Email needs wider margins I
> suppose :)

Sorry, I wrote a long description and then edited it away, and then
realized it might be useful after all.  :(

> > A little checking shows that instances like
> >
> > instance Monad (RIO p C(r u t t))
> >
> > are allowed if we turn on FlexibleInstances.  I'm not sure how safe
> > that is, though.
>
> I don't know the ramifications either.  Does this get around t = t' =
> t'' that I mentioned above?

That's the point.  We can only have a safe Monad instance for t = t',
which requires FlexibleInstances.

> > Anyhow, these are my thoughts.  The backup plan--if we can't define a
> > safe Monad instance--would be to use RIO only in the bits of the code
> > that modify the repository, in which case withRepository and
> > withRepoLock would both look like:
> >
> > withRepository :: [DarcsFlag] -> (forall p C(r u). RepoPatch p
> >               => Repository p C(r u r) -> IO a) -> IO a
> >
> > which itself is an improvement over the current type (without changing
> > anything else), and we'd just change withGutsOf to run RIO:
> >
> > withGutsOf :: Repository p C(r u r) -> RIO p C(r u r t)
> >           -> IO (Repository p C(t u t)
> >
> > and then we'd be safe so long as withGutsOf is only called once (which
> > we could check in haskell_policy.sh).  This lesser approach might be a
> > reasonable stepping stone, as it'd give us almost-complete safety, and
> > would involve modifying far less code.  Oh, and note that withGutsOf
> > would ideally to be modified to call finalizeRepositoryChanges.  In
> > this latter approach, only repository-modifying command would need to
> > be moved into the RIO.  This has an additional advantage in that it
> > would help motivate us to keep the repo-modifying commands together in
> > a tight sequence--which is important on darcs-1 repositories, where we
> > don't have atomicity, and want to keep the critical section of the
> > code (where a crash would corrupt the repository) as small as
> > possible.
>
> The problem for me is that I don't see how to apply this approach.  I
> thought we were trying to fix things like tentativelyMergePatches.
> But, here we're now concerned with withGutsOf.

withGutsOf brackets all the functions like tentativelyMergePatches.
Or rather, it currently brackets all such cases that are dangerous on
darcs-1 repositories (amend, obliterate, optimize --reorder).  I'm
proposing that we could extend it to bracket all uses of tentative
functions, which would enable us to move those functions into RIO,
without touching the rest of the code.

> Even if we fix withGutsOf to export a safe interface, what would be
> the RIO (or IORepo) type of tentativelyMergePatches?  The entire
> reason I went down this road again, was to make it so that we can
> continue with the other type witness work.  Without tracking the state
> manipulations I see no reason to resend that big patch bundle that
> tries to make the commands safer.  Without RIO we either have "buggy"
> type signatures in Internal or bogus unsafeCoercePs in the commands.

This is the point, in that it would allow us to write
tentativelyMergePatches and all its tentative friends with a safe
interface.

> From my point of view that means:
> 1) the RIO approach we take needs to have a way to address
> tentativelyMergePatches
> 2) we need an incremental transition plan like C()
> 3) it has to make sense to me
>
> IORepo had #3 more or less, and I think by using the monad instance
> (modulo safety) we could achieve #2.  Then again, I don't understand
> how RIO's monad instance is more or less safe than one for IORepo.

RIO's monad instance is the following

instance Monad (RIO p C(r u t t))

where your IORepo instance, as far as I can tell, is like

instance Monad (RIO p C(r u t t'))

which is unsafe and wrong.

> With RIO, I almost have #3 but I don't see how #2 is possible.

We could easily leave both functions in place.  So we'd have

unsafeTentativelyMergePatches :: ... -> IO (Sealed (FL Prim C(u)))

and

tentativelyMergePatches :: ... -> RIO (Sealed (FL Prim C(u)))

and most likely tentativelyMergePatches would be implemented in using
unsafeTentativelyMergePatches.

> I don't have #1 for either approach which is a show stopper both for
> type witnessing the commands and implementing RIO.

Why is tentativelyMergePatches particularly problematic?

David


More information about the darcs-users mailing list