[darcs-users] darcs patch: Is this Monadish?

Thu Aug 28 17:30:03 UTC 2008

On Thu, Aug 28, 2008 at 7:58 AM, David Roundy <droundy at darcs.net> wrote:
> On Wed, Aug 27, 2008 at 09:17:36PM -0700, Jason Dagit wrote:
>> David,
>>
>> Is this a type safe Monadish that you would accept?
>
> It's very close, but it seems to be considerably more invasive than
> I'd hope we could achieve.  In particular, I don't see that we need to
> add any new types other than an RIO (or IORepo, but I'd rather keep it
> short) type.
>
>> Note that RepoInState is broken in the same way that Context was and maybeToFL
>> is.  But, what else can we do?  I don't have any other bright ideas other
>> than maybe you'll feel like we don't need it at all?  As in, we could make it:
>> IORepo p r u tr a
>
> You're right that I don't see any reason why we'd need it.
>
>> and drop RepoInState like a bad habbit.  I think that presents new problems
>> in the type of (>>>=).
>>
>> I can think far enough ahead to see how we'll instantiate the RepoInStates,
>> but maybe we could export,
>> newRepoState :: Sealed3 RepoInState
>> newRepoState = Sealed3 RepoInState
>>
>> And then let it trickle down from pattern matches?  Of course, we probably
>> also want to add, unsafeCoerceR :: r x y z -> r a b c
>> In the event that we need to coerce a repository or a RepoInState.
>>
>> Probably one function that is missing is
>> lowerIORepo :: IORepo p i j a -> IO a
>>
>> I also may have export evil goblins from Repositoy.InternalTypes.  It's even
>> possible that by using newtype deriving that I've open backed doors.  Perhaps
>> there is a way to derive lowerIORepo already?  Deriving MonadIO is
>> nice because it lets us use liftIO (check the patch for why that's nice).
>> Deriving Monad is nice because when then as long as the type witnesses are
>> not changing we can just use the monad instance.
>
> I'd definitely prefer not to use newtype deriving, and don't see any
> reason to do so.  I don't see why MonadIO is nice, I'm afraid.  Is it
> any different from the one-liner io function I define below?

It's nice precisely for io/liftIO.

> Something like this (it's equivalent to a Reader Monad, but it's nicer
> in my opinion to avoid those nasty transformers, and make behavior
> explicit.  Monads aren't actually hard to define.

Er.  Transformers are nasty?  In my experience, newtype deriving plus
transformers = modern Haskell.  I've used it to great success in some
of my side projects.  Sure, defining your own instances is easy, but
why do something the compiler can do for you?  Do you only refer to it
as nasty because you want to do some type hackery?

> data RIO p C(r u t t') a = UnsafeRIO (Repository p C(r u t) -> IO a)
>
> (of course you won't be able to use t' because of the C preprocessor,
> but I'll use it anyhow in this email)

So, t' is a phantom-phantom type?

> The type here reflects the fact that the RIO (stands for Repository
> IO) can only modify the tentative state.  That's how we manage
> atomicity in darcs, is by setting up a "tentative" state and then
> calling finalizeRepositoryChanges.  Of course, this constructor cannot
> be safely exported, which is why it's marked UnsafeRIO.

Hmm...I think you must use "tentative state" to refer to the tentative
*recorded* state.  Over the history of our discussions this has been a
big source of confusion for me.  For a long time I thought it was the
pending, then when I found it didn't correspond to the pending, I
couldn't figure out how to use it consistently.  Then I noticed we
have several tentative "things", such as the tentative pending.
Hence, why I think it's important to be exact here and give "tentative
state" a clear definition, otherwise I could waste even more time
being clueless.

> First off, we should change the type of
>
> withRepository :: [DarcsFlag] -> (forall p C(r u). RepoPatch p
>               => RIO p C(r u r r) a -> IO a
>
> to reflect the fact that we can't modify repositories unless we've got
> a lock, and the fact that the tentative state always starts out as the
> recorded state.
>
> We can now modify withRepoLock to be of a type something like
>
> withRepoLock :: [DarcsFlag] -> (forall p C(r u). RepoPatch p
>             => RIO p C(r u r t) a -> IO a
>
> Where the type system will now allow us to make modifications to the
> repository.  Note that we could do better than this, and could ensure
> that any modifications are protected by withGutsOf, but that can be a
> later refactor.

Hmm...I don't see (yet) how this is helpful to solving our real
problems.  I agree that it's good that withRepository documents that
the tentative recorded state starts out equal to the recorded state.

> So now we need our monad-like functions (and I agree that we don't
> need or want a class for these).
>
> unsafeUnRIO :: RIO p C(r u t t') a -> Repository C(r t u) -> IO a
> unsafeUnRIO (RIO f) = f
>
> This is your lowerIORepo function, but marked (properly) as unsafe.
>
> (>>>=) :: RIO p C(r u t t') a -> (a -> RIO p C(r u t' t'') b)
>       -> RIO p C(r u t t'') b
> (RIO f) >>>= g = RIO (\ (Repository ...) ->
>                      do a <- f (Repository ...)
>                         unsafeUnRIO (g a) (Repository ...))
>
> (>>>) :: RIO p C(r u t t') a -> RIO p C(r u t' t'') b
>      -> RIO p C(r u t t'') b
> (RIO f) >>> (RIO g) = RIO $ \ (Repository ...) ->
>                            do f (Repository ...)
>                               g (Repository ...)

And this one I think can just be defined by (>>>=) like in the Monad
class.  Is there a reason to prefer writing it not in terms of (>>>=)?

> returnR :: a -> RIO p C(r u t t) a
> returnR = io . return
>
> io :: IO a -> RIO p C(r u t t) a
> io = RIO . const
>
> (You could also use the MonadIO class, I suppose.)

Yeah, io/liftIO are handy for integration with existing IO actions.

> Note that these definitions themselves are entirely un-typechecked,
> but we can audit them pretty carefully--since they're pretty small.
> Also note that the type definitions of RIO functions can only be fully
> checked if they do not use the RIO constructor.  So maybe we should
> rename the RIO constructor as UnsafeRIO.  Any primitives that use that
> constructor need to be carefully audited.

Right, that's typical of one-way monads.

> We'd also want an
>
> instance Functor (RIO p C(r u t t')) where
>  fmap f (RIO g) = RIO (\repo -> fmap f (g repo))

So you're not deriving here so that t' on the RHS is different than on
the LHS?  As I explain below in the (>>=) case, I think the type
signature of fmap forces t' to be equal on both sides.  In which case,
if we allow this I think we can just derive it.

> And as you suggest, an
>
> instance Monad (RIO p C(r u t t)) where
>  (>>=) = (>>>=)
>  (>>) = (>>>)
>  return = returnR
>  fail = io fail
>
> But we definitely don't want this monad instance derived from newtype
> deriving, because that would break the type system (since you could
> sequence two (RIO p C(r u t t')) actions to get a single (RIO p C(r u
> t t') action, which strongly violates the type witnesses.  By writing
> the Monad instance using the safe RIO operators (>>>=) and (>>>), we
> ensure that this instance is safe.

Hmm.  I thought the plan was that if it modifies the repository we
make it an RIO action instead of an IO action.  So then using the
Monad instance is for when you do IO but you're not modifying the
repository in a way we *choose* to track.  Here I'm planning to work
incrementally.  We need a strategy that allows us to update the code
in pieces, right?

Also, I'm not sure the above type checks in a meaningful way, for example,

(>>=) :: (Monad m) => m a -> (a -> m b) -> m b

Abusing the notation so I can be explicit:
(>>>=) :: (Monad (RIO p C(r u t t)) = m, Monad (RIO p C(r u t t')) =
m', Monad (RIO p C(r u t' t'')) = m'')
=> m a -> (a -> m' b) -> m'' b

So for that to unify with the type of (>>=) it would require t = t' =
t''.  Isn't that the very case you're trying to avoid?  If not, do you
think you could help me understand the difference?

Maybe that's what you mean in the next paragraph (but I don't think so...)?

> I'm a bit afraid this monad instance won't work after all, since I
> don't think that we can define an instance with 't t' repeated like
> this.  :(

I'm not following you.

> In which case we may need to switch to a different approach
> that I wrote about and then deleted (when I saw your idea of a Monad
> instance).

That sounds like Fermat's last theorem.  Email needs wider margins I suppose :)

> A little checking shows that instances like
>
> instance Monad (RIO p C(r u t t))
>
> are allowed if we turn on FlexibleInstances.  I'm not sure how safe
> that is, though.

I don't know the ramifications either.  Does this get around t = t' =
t'' that I mentioned above?

> Anyhow, these are my thoughts.  The backup plan--if we can't define a
> safe Monad instance--would be to use RIO only in the bits of the code
> that modify the repository, in which case withRepository and
> withRepoLock would both look like:
>
> withRepository :: [DarcsFlag] -> (forall p C(r u). RepoPatch p
>               => Repository p C(r u r) -> IO a) -> IO a
>
> which itself is an improvement over the current type (without changing
> anything else), and we'd just change withGutsOf to run RIO:
>
> withGutsOf :: Repository p C(r u r) -> RIO p C(r u r t)
>           -> IO (Repository p C(t u t)
>
> and then we'd be safe so long as withGutsOf is only called once (which
> we could check in haskell_policy.sh).  This lesser approach might be a
> reasonable stepping stone, as it'd give us almost-complete safety, and
> would involve modifying far less code.  Oh, and note that withGutsOf
> would ideally to be modified to call finalizeRepositoryChanges.  In
> this latter approach, only repository-modifying command would need to
> be moved into the RIO.  This has an additional advantage in that it
> would help motivate us to keep the repo-modifying commands together in
> a tight sequence--which is important on darcs-1 repositories, where we
> don't have atomicity, and want to keep the critical section of the
> code (where a crash would corrupt the repository) as small as
> possible.

The problem for me is that I don't see how to apply this approach.  I
thought we were trying to fix things like tentativelyMergePatches.
But, here we're now concerned with withGutsOf.

Even if we fix withGutsOf to export a safe interface, what would be
the RIO (or IORepo) type of tentativelyMergePatches?  The entire
reason I went down this road again, was to make it so that we can
continue with the other type witness work.  Without tracking the state
manipulations I see no reason to resend that big patch bundle that
tries to make the commands safer.  Without RIO we either have "buggy"
type signatures in Internal or bogus unsafeCoercePs in the commands.

>From my point of view that means:
1) the RIO approach we take needs to have a way to address
tentativelyMergePatches
2) we need an incremental transition plan like C()
3) it has to make sense to me

IORepo had #3 more or less, and I think by using the monad instance
(modulo safety) we could achieve #2.  Then again, I don't understand
how RIO's monad instance is more or less safe than one for IORepo.

With RIO, I almost have #3 but I don't see how #2 is possible.

I don't have #1 for either approach which is a show stopper both for
type witnessing the commands and implementing RIO.

thanks,
Jason