[darcs-users] darcs patch: optimize firstspace/firstnonspace to remove boxed retu...

Ian Lynagh igloo at earth.li
Fri Oct 2 17:18:27 UTC 2009


On Fri, Oct 02, 2009 at 12:31:32PM +0100, Eric Kow wrote:
> 
> Ian: do you think could help me understand some more?  Are your objections
> about riskiness (as in 1, 2, or other), or more about long term
> maintainability?

Maintainability. I have no reason to think the patch actually introduces
any bugs.

> As I understand it, the discussion on IRC came to suggest that a lot of these
> performance issues are just GHC problems and should be fixed there, rather than
> in Darcs

Rather, *if* they are GHC performance-of-generated-code bugs, then
rather than making the code less readable I would advocate:
* Definitely make a small, self-contained test case and file it as a GHC
  ticket, so it can be fixed.
* Living with darcs being a few percent slower for a year rather than
  obfuscating the code.
  
In general, in my opinion, small constant factor improvements are not
very important (I don't think it's important if darcs takes 9M or 10M
RAM, 9G or 10G RAM, 0.09s or 0.1s, 9 mins or 10 mins), so are outweighed
by almost any downside. Of course, all else being equal, a constant
factor better is better.

Personally, I wouldn't accept any code you wouldn't be proud to print
out and hang on your wall unless there's a very good reason. But that's
just my opinion.

If it is the difference between constant memory use, and memory use
linear in the total pristine size, then more drastic steps are
justifiable.

> For comparison, here's that new code again.
> 
>   firstspace :: Ptr Word8 -> Int -> Int -> Int
>   firstspace (Ptr p) (I# n) (I# m) = I# (first p n m)
>     where
>     first :: Addr# -> Int# -> Int# -> Int#
>     first addr n' m'
>         | n' >=# m' = n'
>         | otherwise = if isSpaceWord8 ch
>                         then n'
>                         else first addr (n' +# 1#) m'
>       where
>       ch = indexWord8OffAddr# addr n'

A side point is that the addr being passed around here is just p.

> > +                   ghc-prim
> 
> Ian again: why is it objectionable to be using GHC primitives?  Is it because
> of the likelihood of instability (things shifting under our feet?).  Or is it
> just bad form to be doing things at this low a level?

Both.

> > -isSpaceWord8 :: Word8 -> Bool
> > +isSpaceWord8 :: Word# -> Bool
> >  isSpaceWord8 w =
> > hunk ./src/ByteStringUtils.hs 139
> > -    w == 0x20 ||    -- ' '
> > -    w == 0x09 ||    -- '\t'
> > -    w == 0x0A ||    -- '\n'
> > -    w == 0x0D       -- '\r'
> > +    i ==# 0x20# ||    -- ' '
> > +    i ==# 0x09# ||    -- '\t'
> > +    i ==# 0x0A# ||    -- '\n'
> > +    i ==# 0x0D#       -- '\r'
> > +  where
> > +  i = word2Int# w
> 
> I guess I understand this, just use raw machine types to do this comparison?
> What's the harm, really?

Can't you feel the hashes burning your eyes?
The manual unboxedness devouring your soul?

Up to you, anyway  :-)

> > hunk ./src/ByteStringUtils.hs 147
> > -firstnonspace :: Ptr Word8 -> Int -> Int -> IO Int
> > +firstnonspace :: Ptr Word8 -> Int -> Int -> Int
> 
> In my naive, purely high-level mental model of Haskell programming, when I see
> 'Ptr', I think "ooh, deep down murky machine things" which makes in turn think
> "err, shouldn't that be in IO?" How does this work?  Am I to treat things like
> 'indexWord8OffAddr#' as having an implicit unsafePerformIO somewhere?

Yes, it would be good to see if "unsafePerformIO $ someIOPeekFunction ..."
gives the same benefit. And perhaps file a GHC bug if not.


Thanks
Ian



More information about the darcs-users mailing list