[darcs-users] darcs patch: Add GChart graphs to reports

Wed May 5 00:14:34 UTC 2010

On 05/04/2010 05:38 PM, Eric Kow wrote:
> Unfortunately, my review isn't going to be particularly helpful as
> it's (a) subjective Eric style, which may be meaningless and (b)
> consists of saying "huh?" followed by the realisation that the "huh"
> bits are code I wrote.

Yeah, that's probably the biggest problem I had in writing this was 
figuring out for myself what exactly Report.hs was doing. (Plus still 
having to google/grep my way around most Haskell command names.)

> Add GChart graphs to reports
> ----------------------------
>> +-- A Formatter with Double output instead of String
>> +type Selector = TimeUnit ->  Maybe MemTimeOutput ->  Double
>
> I'm a bit skeptical about type synonyms like this that don't get used
> very often.
>
> It's sort of a delicate balance, and I never know which way ultimately
> leads to more readability, but it does sometimes happen that actually
> the long expanded version is more transparent.  I wish I could figure
> out when :-/
>
> Heh, of course, you were just following the type synonym I wrote
> but just be warned that maybe I was making a mistake :-)

At one revision of the module I expected the Selectors here to be 
exported (before settling on the simple, but ugly Bool "flag" instead). 
The point was to mimic the Formatters of Report/Definition, but the 
issue I encountered was that it makes even less sense in Graph, because 
the unit name winds up in an entirely different place (the graph title) 
than the selected/formatted values.

>> +selectMemory :: Selector
>> +selectMemory _ (Just mt) = fromRational (mtMemMean mt) / (1024 * 1024)
>> +selectMemory _ _ = 0.0
>> +
>> +selectTime :: Selector
>> +selectTime Milliseconds (Just mt) = mtTimeMean mt * 1000
>> +selectTime MinutesAndSeconds (Just mt) = mtTimeMean mt
>> +selectTime _ _ = 0.0
>
> What do the Nothing cases correspond to?

MissingCell in Report (formatted to "--")... while there is a "missing" 
value in some GChart encodings, hs-gchart doesn't seem to support them, 
and even if it did, for a bar graph 0 is a adequate substitute.

>> +graph :: Bool -- ^ Select for Time, else Memory
>> +  ->  [(Test a, Maybe MemTimeOutput)] ->  [(String, String)]
>> +graph timeresults results = map graphs coreNames
>> + where
>> +  graphs cname = (cname, mkGraphs cname)
>> +  coreNames = nub [ trCoreName r | (Test _ r _, _)<- results ]
>> +  mkGraphs cname = graphRepo timeresults cname results
>
> One graph set per repo (eg. tabular).
>
> I think mkGraphs is a little superfluous and
>
>    graphs cname = (cname, graphRepo timeresults cname)
>
> would be lighter

I nearly did that, but this was another case of simply duplicating Report.

>> +graphRepo :: Bool -- ^ Select for Time, else Memory
>> +  ->  String -- ^ Repo name
>> +  ->  [(Test a, Maybe MemTimeOutput)] ->  String
>
> Maybe this should return [String] and avoid not use graphDirective.
> Just focus on generating the graph itself...

I considered that, but wasn't sure if there was a specific use case yet 
for non-reST output. Similarly, rather than getGraphUrl, the hs-graph 
graph objects could be passed higher and functions applied to produce 
PNGs directly, for example.

>> +  -- the results which correspond to the repo in question (or a variant thereof)
>> +  interesting = [ test | test@(Test _ r _, _)<- results, trCoreName r == repo ]
>> +  columns = nub [ (b, trName tr) | (Test _ tr b, _)<- interesting ]
>> +  rownames = map description . filter hasBenchmark $ benchmarks
>> +  hasBenchmark b = any (\ (Test tb _ _, _) ->  description tb == description b) interesting
>
> Huh? [This is an expression of sheepishness... I know this comes
> straight from Report.hs, but looks like a bit of write-only code
> on my part.  Sorry!]

Yeah, I tried unsuccessfully to write a prettier version of this 
(attempting a groupBy, among other things), but instead resorted to 
Report code. Certainly this is a clear place to refactor both 
Report/Graph in sync.

Basically, the data structure here isn't very useful for how we're 
reporting/graphing it. We have a loose "bag" of tests and maybe their 
results: [(Test a, Maybe MemTimeOutput)]. (It is an ordered list, but 
for the intents and purposes of Report/Graph the order isn't useful: its 
the test order.)

``interesting`` is in order to (ad hoc) partition the results by tested 
repository. (Probably should just do a real ``partition`` in 
graph/tabulate and save graphRepo/tabulateRepo the trouble...)

``columns`` is in order to get all of the distinct (binary, repo 
variant) pairs.

``rownames`` is in order to get all of the distinct benchmark names.

Some of this is so that we can label the graph (or edges of the table), 
but its also needed to sort the results into the actual order we are 
interested in: grouped into rows by benchmark, in turn sorted by 
binary/variant order. Which is what this does:

>> +  rows = [catMaybes [find (match row col) interesting | col<- columns]
>> +    | row<- rownames]
>> +  match bench (binary,name) (Test bench' tr binary', _) =
>> +    bench == description bench'&&  binary == binary'&&  trName tr == name

>> +  rs = [(title (head row) row, map mkColName columns, rowdata row)
>> +    | row<- rows]

Once all of the real organization is done, graphRepo then does one last 
ordering to get the actual data structure it is interested in: 
(graphTitle, graphLabels, graphData), which is input of graphUrl.

>> +  rowdata row = [select (tu row) mt | (_, mt)<- row]
>> +  title (Test b _ _, _) row = description b ++ " (" ++ unitName (tu row) ++ ")"
>> +  select = if timeresults then selectTime else selectMemory
>> +  unitName tu = case (timeresults, tu) of
>> +    (True, Milliseconds) ->  "ms"
>> +    (True, MinutesAndSeconds) ->  "s"
>> +    (False, _) ->  "MiB"
>> +  tu row = case timeresults of
>> +    True ->  case mapMaybe getTime row of
>> +      [] ->  Milliseconds
>> +      xs ->  appropriateUnit (minimum xs)
>> +    False ->  Milliseconds
>> +  getTime (_, Just mt) = Just $ mtTimeMean mt
>> +  getTime _ = Nothing
>
> This sort of selection logic makes me wonder if you'd not be better off
> having separate functions for memory and timing graphs.  Maybe you can
> divide this function up in a different way?

Yeah, I started by following the Formatter of Report, but that didn't 
work. This Bool selection technique was a quick hack just to get 
something working.

> Presumably you tried to put it all in one function because the logic for
> picking out all of the benchmarks is common to graphs.  But could you
> not just isolate *that* (rs) into a function of its own and then have
> two functions that wrap around that instead?  Do you see what I mean?

I've been thinking something along those lines, particularly to pull out 
the commonality from tabulateRepo as well. We're doing a lot of work to 
break apart then "square" the data table, and certainly that probably 
makes a lot more sense as a separate, common function(s).

>> +graphDirective :: String ->  String
>> +graphDirective = (++) ".. image:: "
>
> This could be a little bit clearer using an operator section:
>
> graphDirective = ("image:: " ++)

Ah, nice. I'm obviously still getting used to Haskell operators.

> Do we want all the graphs in one big block?  Could work...

It'll still be organized by repo, just like the blocks proceeding it... 
I don't think there is a clear organization principle just yet for where 
to place the graphs.

As I said in the first email, I started wondering if perhaps the best 
place may instead be inside the tables (presumably as thumbnails with 
links to the full graph) themselves next to the rows they represent. 
(Does pandoc support reST substitutions?)

--
--Max Battcher--
http://worldmaker.net