[darcs-users] [patch318] tune the patch parser (and 10 more)

Ganesh Sittampalam bugs at darcs.net
Tue Aug 3 20:46:18 UTC 2010


Ganesh Sittampalam <ganesh at earth.li> added the comment:

> Sun Jul 25 09:56:10 BST 2010  Jason Dagit <dagit at codersbase.com>
>   * tune the patch parser
>   This patch is based on a few observations:
>     * alterInput (const foo), completly replaces the parser's input
>     * after peekInput, the above trick avoids redundant processing
>     * best to leave a packed ByteString packed


BC.unpack xs == "<" vs BC.singleton '<' (new)

is the latter really better?  I'd like to see the generated 
code or get word from a ByteString expert.

You've removed initial calls to 'work myLex' in many places - isn't 
this behaviour changing? OTOH the tests pass...

the whole 'alterInput' thing feels messy. Since it's barely used
after the whole chain of patches I won't worry further about that.

> Sun Jul 25 13:06:51 BST 2010  Jason Dagit <dagit at codersbase.com>
>   * add utility functions to ReadMonads
>   The intention of these functions is that in the future we will be 
able
>   to use more conventional notation to express parsers in the darcs 
source.

Generally fine.

Are the rather ugly separate definitions of bindSM 
etc necessary for performance? If so it's a shame GHC can't 
handle this better. My uninformed speculation would be that the 
definitions (whether in the typeclass or not) would get inlined 
as small anyway.

I also have concerns about the existence of try - see below.

> Sun Jul 25 14:31:32 BST 2010  Jason Dagit <dagit at codersbase.com>
>   * refactor Read and ReadMonads to remove peekInput from Read

> -          Nothing -> do s <- peekInput
> -                        case myLex s of
> -                          Just (ps', r) | ps' == BC.pack "merger" ->
> -                                          alterInput (const r) >>
> -                                          (liftM (Just . seal) $ 
readMerger True)
> -                                        | ps' == BC.pack "regrem" ->
> -                                          alterInput (const r) >>
> -                                          (liftM (Just . seal) $ 
readMerger False)
> -                          _ -> liftM (fmap (mapSeal PP)) $ readPatch' 
want_eof
> +          Nothing -> choice [ liftM (Just . seal) $ try $ readMerger 
True
> +                            , liftM (Just . seal) $ try $ readMerger 
False
> +                            , liftM (fmap (mapSeal PP)) $ try $ 
readPatch' want_eof
> +                            , return Nothing ]

The final return Nothing and the try in the previous case seem 
to be new here; it's correct since try foo <|> return Nothing = 
foo, but seems gratuitous.

I like the fact that each parser piece is now self-contained 
(knows about its own intro text). However doesn't it lead to 
myLex being called repeatedly where previously it was only 
being called once or sometimes twice? We should be sure this is 
benchmarked adequately and not a problem in practice before we 
go down that route.

Also, doesn't the use of try introduce a memory leak, since the 
old string has to be held onto until we know whether the entire 
sub-parser succeeded? Previously we would commit once we'd 
checked the first token.

----------
status: needs-review -> review-in-progress

__________________________________
Darcs bug tracker <bugs at darcs.net>
<http://bugs.darcs.net/patch318>
__________________________________


More information about the darcs-users mailing list