[darcs-users] darcs patch: configuring author spelling variations w... (and 3 more)

Eric Kow kowey at darcs.net
Sun Feb 8 22:43:38 UTC 2009


Thanks, Simon,

I've long been hoping that somebody would work on this.

Just some quick superficial comments...

Also, I'm assuming the goal is for list_authors to either disappear
or offload the bulk of its work to darcs show authors?  Why not just
go that route directly?

Thanks!

configuring author spelling variations was complicated, now easier
------------------------------------------------------------------
I see some entries with only one possible spelling.  Would you have to
update this (or the corresponding spellings file) every time somebody
new comes into the project?  If so, I think it would be nicer to pass
any author string that matches nothing through unchanged.
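
To make the pass-through idea concrete, here is a rough sketch, not the
patch's actual code.  It assumes plain case-insensitive substrings in
place of compiled regexes, and the names Spelling, canonize and
exampleSpellings are made up for illustration:

```haskell
import Data.Char (toLower)
import Data.List (isInfixOf)
import Data.Maybe (fromMaybe, listToMaybe)

-- | Canonical form plus extra patterns.  ASSUMPTION: patterns are plain
-- substrings here, not regexes as in the patch.
type Spelling = (String, [String])

-- | Case-insensitive substring test.
containsCI :: String -> String -> Bool
a `containsCI` b = map toLower b `isInfixOf` map toLower a

-- | Text between '<' and '>' in the canonical form ("" if there is none).
emailOf :: String -> String
emailOf = takeWhile (/= '>') . drop 1 . dropWhile (/= '<')

matches :: String -> Spelling -> Bool
matches author (canonical, pats) =
    (not (null email) && author `containsCI` email)
      || any (author `containsCI`) pats
  where email = emailOf canonical

-- | First matching canonical form, or the author string unchanged when
-- nothing matches, so new contributors need no entry of their own.
canonize :: [Spelling] -> String -> String
canonize spellings author =
    fromMaybe author $ listToMaybe [c | s@(c, _) <- spellings, matches author s]

-- | Illustrative table only.
exampleSpellings :: [Spelling]
exampleSpellings = [("Sue Bragg <d@e.f>", ["d@g.h"])]
```

With this shape, single-spelling entries are only needed when the
canonical form differs from what the author actually typed.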

canonical authors may be defined in an .authorspellings file
------------------------------------------------------------
> Simon Michael <simon at joyful.com>**20090207221321
>  Example:
>  
>  Joe Blogg <a at b.c>
>  -- authors containing d at e.f or d at g.h or matching just "sue" are Sue Bragg
>  Sue Bragg <d at e.f>, d at g.h, ^sue$
>  
> ] hunk ./src/list_authors.hs 26
> -import Data.List ( sort, group, isInfixOf )
> -import Data.Char ( toLower )
> +import Data.List ( sort, group, isInfixOf, isPrefixOf )
> +import Data.Char ( toLower, isSpace )
> hunk ./src/list_authors.hs 36
> -          mapM_ putStrLn $ sort_authors use_statistics $ mapRL (pi_author.info)
> -                         $ concatRL darcs_history
> +          spellings <- compiled_spellings
> +          mapM_ putStrLn $ sort_authors use_statistics spellings
> +                         $ mapRL (pi_author.info) $ concatRL darcs_history
> hunk ./src/list_authors.hs 54
> -sort_authors :: Bool -> [String] -> [String]
> -sort_authors use_stats as = reverse $ map shownames $ sort $
> -                            map (\s -> (length s,canonize_author $ head s)) $
> -                            group $ sort as
> +sort_authors :: Bool -> [(String,[Regex])] -> [String] -> [String]
> +sort_authors use_stats spellings as = 
> +    reverse $ map shownames $ sort $
> +    {- group and count again after canonizing -}
> +    map (\s -> (length s,head s)) $ group $ sort $ concat $ map (\(n,a) ->  replicate n a) $
> +    map (\s -> (length s,canonize_author spellings $ head s)) $ group $ sort as
> hunk ./src/list_authors.hs 64
> -canonize_author :: String -> String
> -canonize_author a | null author_spellings = a
> -canonize_author a = safehead a $ canonicalsfor a
> +canonize_author :: [(String,[Regex])] -> String -> String
> +canonize_author [] a = a
> +canonize_author spellings a = safehead a $ canonicalsfor a
> hunk ./src/list_authors.hs 69
> -      canonicalsfor s = map fst $ filter (ismatch s) $ compiled_spellings
> +      canonicalsfor s = map fst $ filter (ismatch s) spellings
> hunk ./src/list_authors.hs 72
> -          where email = takeWhile (/= '>') $ tail $ dropWhile (/= '<') canonical
> +          where email = takeWhile (/= '>') $ drop 1 $ dropWhile (/= '<') canonical
> hunk ./src/list_authors.hs 82
> -compiled_spellings :: [(String,[Regex])]
> -compiled_spellings = map compile author_spellings
> +compiled_spellings :: IO [(String,[Regex])]
> +compiled_spellings = do
> +  fs <- author_spellings_from_file
> +  return $ map compile $ fs ++ author_spellings
> hunk ./src/list_authors.hs 92
> --- containing the canonical name and email address optionally followed
> --- by additional regular expression patterns. An author string which
> --- contains the canonical email address or any of the patterns will be
> --- replaced by the canonical form.  All matching is case-insensitive,
> --- to match the whole author string use ^ and $.
> +-- containing the canonical name and email address in angle brackets,
> +-- optionally followed by additional regular expression patterns. An
> +-- author string which contains the canonical email address or any of
> +-- the patterns will be replaced by the canonical form.  All matching
> +-- is case-insensitive. To match the whole author string use ^ and $.
> hunk ./src/list_authors.hs 192
> +-- Canonical author spellings may also be defined in this file, one
> +-- per line. Fields are as above, comma-separated. Blank lines and
> +-- lines beginning with -- are ignored. The file takes precedence over
> +-- the built-in spellings.
> +authorspellingsfile = ".authorspellings"
> +
> +author_spellings_from_file :: IO [[String]]
> +author_spellings_from_file = do
> +  s <- readFile authorspellingsfile `catch` (\e -> return "")
> +  let noncomments = filter (not . ("--" `isPrefixOf`)) $ 
> +                    filter (not . null) $ map strip $ lines s
> +  return $ map (map strip . split_on ',') noncomments
> +
> +split_on :: Eq a => a -> [a] -> [[a]]
> +split_on e l = 
> +    case dropWhile (e==) l of
> +      [] -> []
> +      l' -> first : split_on e rest
> +        where
> +          (first,rest) = break (e==) l'
> +
> +strip :: String -> String
> +strip = dropWhile isSpace . reverse . dropWhile isSpace . reverse
> +
> +
> 

add list_authors-style canonicalizing to the show authors command
-----------------------------------------------------------------
> -           map (\s -> (length s,head s)) $ group $ sort authors
> +           map (\s -> (length s,head s)) $ group $ sort $ concat $ map (\(n,a) ->  replicate n a) $

concatMap and uncurry might be nice here
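
Roughly what I have in mind, as a standalone sketch (the names
expandLambda, expandUncurry and recount are mine, not from the patch):

```haskell
import Data.List (group, sort)

-- | The patch's formulation: expand (count, author) pairs back into a
-- flat list so they can be grouped and counted again.
expandLambda :: [(Int, a)] -> [a]
expandLambda = concat . map (\(n, a) -> replicate n a)

-- | Same thing with concatMap and uncurry.
expandUncurry :: [(Int, a)] -> [a]
expandUncurry = concatMap (uncurry replicate)

-- | Group and count again after canonizing, as in sort_authors.
-- 'head' is safe here because 'group' never yields empty lists.
recount :: Ord a => [(Int, a)] -> [(Int, a)]
recount = map (\s -> (length s, head s)) . group . sort . expandUncurry
```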

> +           map (\s -> (length s,canonize_author spellings $ head s)) $ group $ sort authors

> hunk ./src/Darcs/Commands/ShowAuthors.lhs 72

> +      ismatch s (canonical,regexps) =
> +          (not (null email) && (s `contains` email)) || (any (s `contains_regex`) regexps)

Superfluous parentheses (the ones around the `any ...` application, for instance)

> +          where email = takeWhile (/= '>') $ drop 1 $ dropWhile (/= '<') canonical

> +contains :: String -> String -> Bool
> +a `contains` b = lower b `isInfixOf` (lower a) where lower = map toLower

Superfluous parentheses around `lower a`

> +contains_regex :: String -> Regex -> Bool
> +a `contains_regex` r = case matchRegex r a of
> +                         Just _ -> True
> +                         _ -> False

I think this could be written as:
  a `contains_regex` r = maybe False (const True) (matchRegex r a)

(Up to you to decide if it's wise to do so)
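
For comparison, a small sketch of the three spellings of this test.  To
keep it free of the regex library, matchPrefix (a thin wrapper around
stripPrefix) is a made-up stand-in for matchRegex; the only thing that
matters is that it returns a Maybe.  Data.Maybe.isJust is probably the
shortest way to say it:

```haskell
import Data.List (stripPrefix)
import Data.Maybe (isJust)

-- | HYPOTHETICAL stand-in for matchRegex: succeeds when the pattern is
-- a prefix of the string.  Only the Maybe result matters here.
matchPrefix :: String -> String -> Maybe String
matchPrefix = stripPrefix

-- | Explicit case analysis, as in the patch.
viaCase :: String -> String -> Bool
a `viaCase` p = case matchPrefix p a of
                  Just _ -> True
                  _      -> False

-- | The 'maybe' formulation; note the parentheses around the call.
viaMaybe :: String -> String -> Bool
a `viaMaybe` p = maybe False (const True) (matchPrefix p a)

-- | The same test with isJust.
viaIsJust :: String -> String -> Bool
a `viaIsJust` p = isJust (matchPrefix p a)
```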

> +compiled_author_spellings = do
> +  ss <- author_spellings_from_file
> +  return $ map compile $ ss

Perhaps a better formulation:
  map compile `fmap` author_spellings_from_file
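
A sketch of why the two are equivalent.  Assumptions: compile here
lowercases patterns instead of building a Regex, and the loader is
passed in as an argument over any Functor/Monad (rather than fixed to
IO) so the sketch can be tried without a spellings file:

```haskell
import Data.Char (toLower)

-- | Simplified stand-in for the patch's compile: lowercases patterns
-- instead of compiling them to Regex values (ASSUMPTION).
compile :: [String] -> (String, [String])
compile []               = error "each author spelling should contain at least the canonical form"
compile (canonical:pats) = (canonical, map (map toLower) pats)

-- | do-notation version, shaped like the patch.
compiledDo :: Monad m => m [[String]] -> m [(String, [String])]
compiledDo load = do
  ss <- load
  return $ map compile ss

-- | fmap version: bind-then-return of a pure function is just fmap.
compiledFmap :: Functor f => f [[String]] -> f [(String, [String])]
compiledFmap load = map compile `fmap` load
```

In the real code the argument would be author_spellings_from_file and
the functor would be IO.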

> +      compile [] = error "each author spelling should contain at least the canonical form"
> +      compile (canonical:pats) = (canonical, map mkregex pats)
> +      mkregex pat = mkRegexWithOpts pat True False

> +-- Canonical author spellings can be defined in this file, to clean up

Sounds like this should be a haddock comment
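
That is, something along these lines, where the comment opens with
"-- |" and sits directly above the declaration it documents.  The
wording is borrowed from the list_authors.hs hunk above, and the
FilePath signature is my addition:

```haskell
-- | Canonical author spellings may also be defined in this file, one
-- per line. Blank lines and lines beginning with @--@ are ignored.
authorspellingsfile :: FilePath
authorspellingsfile = ".authorspellings"
```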

> +split_on :: Eq a => a -> [a] -> [[a]]
> +split_on e l =
> +    case dropWhile (e==) l of
> +      [] -> []
> +      l' -> first : split_on e rest
> +        where
> +          (first,rest) = break (e==) l'

We could consider using the split package on Hackage (though I don't know
if it's wise to introduce a dependency just for that)

-- 
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9