[darcs-devel] [patch455] Generalize withSignalsBlocked and withGutsOf

Gabriel Kerneis bugs at darcs.net
Wed Jan 19 17:49:15 UTC 2011


Gabriel Kerneis <kerneis at pps.jussieu.fr> added the comment:

Eric,

Let me first state clearly what this patch is about:
- This patch is about getting a more general type for the
  withSignalsBlocked (and the related withGutsOf) function.

That's all.  In particular :
- This patch is *not* about changing the semantics, behaviour, or
  fixing a bug in this function.  
- This patch is also *not* about switching to the "mask" primitive of
  GHC 7 (although understanding what it does makes it obvious what
  should be changed to use mask).

(But, as it turns out, there is a bug in this patch. More on that below.)

Why?
----

- Having a more general type is a good thing because it means the
  function composes better: currently, you cannot run any function job
  through withSignalsBlocked and get a result back; the result is
  (arbitrarily, because of programming style) restricted to IO ().
- Focusing on this piece of code paves the road for the up-comming
  change to the mask primitive of GHC 7 (block/unblock are *deprecated*
  in GHC 7).

What is a signal?
-----------------

There is a lot to say about signals, I'll focus on what is relevant from
a Haskell point-of-view.  I might be wrong sometimes, but here is what I
understand about them anyway.

A signal is an asynchronous interrupt.  This means that you could be
interrupted anywhere in your code.

As a result, there are very few things you can do when a signal occurs.
You cannot, for instance, call functions because you have no guarantee
about the state of your stack, heap, and resources in general.  In fact,
what you do generally boils down to two strategies:
- print an error message and exit(), or
- set a flag variable to 1, and make sure that you check its value
  regularly to notice when signals happen.
(Of course, you must be very careful because it's easy to get caught in
a race-condition and other dark-corners of signal handling.)

Signals in Haskell
-------------------

I am not aware of GHC's internals, but I suspect it follows the second
strategy.  Anyway, as a Haskell programmer, you do not need to worry
about those gorry details: signals are exposed as some kind of
exceptions, and you can handle them as such.

Just like signals, those exceptions might be raised at any time.  They
are called "asynchronous":
http://www.haskell.org/ghc/docs/7.0-latest/html/libraries/base-4.3.0.0/Control-Exception.html#10

It is, of course, impossible to add exception handlers everywhere in
your code.  Instead, you chose one strategy out of two, as above:
- accept that signal might occurr and kill your application at any point
  (this is okay as long as do not perform critical operations), or
- ignore signals during your critical operations, and check afterwards
  which signals happened, and if you want to do something about them.

Ignoring signals is done with the "block" primitive:
   block job
will compute job, record any signal happening during the computation,
and raise the corresponding exceptions once job has returned.  You have
the guarantee that job has not been interrupted (hence no critical
operation is aborted too early), and you get either the result of job or
an asynchronous exception (that might be caught somewhere else).

What about withSignalsBlocked?
------------------------------

If you want to catch those signals, you do:
   (block job) `catch` ...
but this comes at a cost:  the two branches of "catch" must have the
same type, so what you return cannot depend on the result of job.

This is exactly what withSignalsBlocked does: it computes an IO (), and
then catches any signal and prints a (warning) message.  Then it goes on
(signals are ignored).

What about patch455?
--------------------

Now, if you want to catch signals, but return the result of job anyway,
you should do the following:
  block signals
  r <- job
  unblock signals and try to return r
  catch exceptions, if any, print a message and return r eventually

In real Haskell code, this is done with the "unblock" primitive:
  block (
  job >>= \r ->
  unblock (return r)
  `catch`
  (\signal -> hPutStrLn stderr "error" >> return r))

If some signals have been queued, they are raised when entering
"unblock".  Note that if a signal happens after entering unblock but
before returning r, it will also be catched by "catch".  On the other
hand, a signal cannot happen in the signal handler (the right hand-side
of `catch`) because there is an implicit "block" protecting it (added by
GHC itself).

End of the story.  Except there is a bug...

The bug (or why signals are tricky)
-----------------------------------

There is a bug in patch455.  It does follow the above scheme, but rather
the following one:

  block (
  job >>= \r ->
  unblock (return r
  `catch`
  (\signal -> hPutStrLn stderr "error" >> return r)))

Notice the slightly incorrect bracket after unblock...

What happens is that a signal might occurr after we enter unblock but
before we return r.  In that case, it will not be caught by
`catch`.  As Simon Marlow explained on IRC:
> yes that can leak async exceptions - the catch needs to fully surround
> the unblock

To summarize, patch455 is something like:
  unblock (catch (return r) handle_signals)
whereas you need:
  catch (unblock $ return r) handle_signals

I shall submit shortly an amended patch fixing this issue.

End of the story.  What follows is about what patch455 does not do.


The other potential bug (or why mask is better than unblock)
------------------------------------------------------------

Actually, patch455 does change the semantics of withSignalsBlocks.  This
has to do with why block/unblock have been deprecated, and the new
"mask" primitive.

Consider the following code, which does some critical work, and calls
withSignalsBlocked at some point:

  withSignalsBlocked $ 
    ...
    withSignalsBlocked job
    ...

(If you find this stupid, imagine withSignalsBlocked is not called
directly, but burried in a few function calls.)

What happens if a signal is raised in "job"?
- With the current version, it will be caught by the external
  withSignalsBlocked because signals are still blocked by the external
  block.
- After patch455, it will be caught by the inner withSignalsBlocked
  because we explicitely call "unblock", which unblocks every signal
  even if there is an enclosing call to "block"!.

This looks like a small change, not much of concern (we are just
changing the point where exceptions are caught).

Now assume that at some point in darcs' code you use unblock.  For
instance, assume that my bug went unnoticed: you might leak an
asynchronous exception from the inner withSignalsBlocked, thus breaking
your external critical computation.

To retain the semantics of withSignalsBlocked, and avoid issues with
leaking exceptions, you shall use mask instead of block/unblock
(introduced in GHC 7):

  mask $ \restore -> (
  job >>= \r ->
  restore (return r)
  `catch`
  (\signal -> hPutStrLn stderr "error" >> return r))

mask/restore is exactly like block/unblock except restore will not do
anything if signals were already blocked before entering block.

You must also be careful not to use unblock anywhere (otherwise there is
no point in using mask).

How to handle legacy GHC versions?
----------------------------------

It turns out that mask is a very simple wrapper around block/unblock.
We could therefore provide it in Workaround.hs for GHC < 7 (probably the
best thing to do).  Or implement the same logic directly in
withSignalsBlocked (not a good idea, IMHO).

http://www.haskell.org/ghc/docs/7.0-latest/html/libraries/base-4.3.0.0/src/GHC-IO.html#mask

Open questions
--------------

- Some functions are interruptible.  The documentation explains it, but
  I do not fully understand what it means. I/O functions seem to be
  interruptible, so this is probably something we should understand.
  There is uninterruptibleMask:
  http://www.haskell.org/ghc/docs/7.0-latest/html/libraries/base-4.3.0.0/Control-Exception.html#v:uninterruptibleMask
  which might be necessary (the doc says: THIS SHOULD BE USED WITH GREAT
  CARE).
- Shall we ignore signals (as we do currently) or abort once the
  sensitive operation has completed?

Best,
-- 
Gabriel

__________________________________
Darcs bug tracker <bugs at darcs.net>
<http://bugs.darcs.net/patch455>
__________________________________


More information about the darcs-devel mailing list