[darcs-devel] [issue318] Crash on bogus input

Thu Aug 5 16:27:22 UTC 2010

Jason Dagit <aoeu> added the comment:

>
> I'll address one specific point quickly, as I want to take time to reply
> to the rest properly.
>
> On Wed, 4 Aug 2010, Jason Dagit wrote:
>
> > So, if I implement the above space leak fix, do you plan to accept the
> > patches?  I ask because if you are not planning to accept them I won't
> > bother with the fix :)
>
> I think there are two issues to consider here: whether your use of try is
> ok, and whether it's a good idea to expose try in our parser code at all,
> for fear that someone will come along later and make seemingly innocuous
> use of it causing a serious problem.
>
> I've actually realized now that my definition of mplus has a 'try' baked
right in.  That is, it's like a two parameter version of try.  Maybe that's
what you were getting at when you said, "try foo <|> return Nothing = foo".
 So while that statement isn't completely accurate as I explained
previously, it does capture how mplus and try are similar.  The parser ended
up this way because our parsers are of the form ByteString -> Maybe (a,
ByteString).  In the failure case, you can't know how much string was
consumed before the failure.  I think it would have to be ByteString ->
(Maybe a, ByteString) for try and mplus to be significantly different.  The
immediate result, is that I can stop exposing try.  I don't know if that
version of my parser will be compatible with other parser libraries (which
was one of the initial motivating points of this refactor).  I assume that
in parsec/attoparsec that try and mplus are significantly different.  I
would need a chance to read the source code to determine if this is true.

My benchmarks suggest that mplus is not introducing a space leak that is
detectable in normal usage patterns, eg., by darcs-benchmark:
http://wiki.darcs.net/Benchmarks/ParserRefactor

I deleted the old suspect benchmarks and all that remains is my benchmark on
a ramdisk.

As I have time I'll do a benchmark with trys in there.

>
> IMO the latter point is one that should be discussed a bit more widely. My
> general impression - and I'm by no means an expert - about the existence
> of try in parser libraries is that it can be quite problematic. Here it
> would be changing the interface of the parsing library away from one which
> pretty much enforces a single pass scan. On the other hand, the benefits
> in compositionality are nice, as we can see from your patches. So I'm
> torn, but I think other people should get involved in the decision, and
> that we should perhaps investigate what parser libraries out there offer
> it.
>
> While peekInput allows us to write code that does not leak, it also makes
it really easy to write leaky code.  All you have to do is peek and then
look at the result later after applying some other parsers and you have a
leak.  Your objection to try seems to be that try is doing this for you in a
controlled way (really I mean encapsulated way).  So, try automatically
creates a leak, but the scope and size of the leak is also more predictable
because it's only the scope of the try.  The leak also goes away once the
parser finishes.  On the other hand, with try the leak is inescapable.

I would consider alterInput to be unsafe because it allows arbitrary updates
to the parser's state.  I would call that a leaky abstraction or abstraction
violation.  My software engineering sensibilities tell me it's evil.

In summary:
  * the leaks created by try/mplus seem to be unavoidable as long as we
expose them
  * the leaks created by mplus do not seem to be noticeable, probably
because they are short lived in our parser
  * alterInput is evil
  * peekInput allows code that is even leakier than try

I would say that, as long as darcs isn't using more memory or being slower
that the new parser API is an improvement.

As for other libraries, these are the ones I know of:
  * parsec2 and parsec3
  * parsimony
  * attoparsec
  * uu parser lib (I don't think we can use this based on a comment by the
author on Haskell-Cafe)

Happy and alex could potentially apply, but it seems that our consumption of
whitespace and our token sets seem to vary depending on where we are in the
input stream.  We may also lose control over knowing if we have back
tracking in our parser.

One of the things I wanted to accomplish by refactoring the parser is the
ability when parsing from files to track how much of the input has been
consumed and be able to later re-open the file and seek back to that point
in the stream and start reading again from there.  Granted, I don't know
which of the above parser libraries actually support this.  I know
attoparsec does not support it.  I may need to implement that in the parser
we use instead of using an existing library.  Implementing it is easier if
alterInput doesn't exist.  I should say, with uncontrolled use of alterInput
I don't know how I would implement it.  I suspect it wouldn't be possible.

Jason

----------
nosy: +dagit
status: resolved -> unknown

__________________________________
Darcs bug tracker <bugs at darcs.net>
<http://bugs.darcs.net/issue318>
__________________________________
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osuosl.org/pipermail/darcs-devel/attachments/20100805/693c8a42/attachment.html>