[darcs-users] annotate output

Mon Sep 6 10:32:25 UTC 2010

Lele Gaifax <lele at nautilus.homeip.net> writes:
> With respect, that's not true. Just because some of us are more
> productive with other languages, it does not mean that "nothing
> happened"! I bet most of us had an opinion on the subject since the
> beginning, but given that a workaround has existed for a long time, there
> were better ways to spend hacker's time...

There are better ways to spend hacker's time than to argue about details
of the format. I probably spent about equal time writing the new
annotate (complete, with the speed improvements and all) as replying on
this thread. I wish I knew why I am still doing that... anyway, now that
I have the reply written, I'll send it -- but that's the last one.

Of course, people are still free to submit patches, or seriously
researched proposals showing that we could do better than we already
do. For example, if someone wants to argue that parsing YAML is easier
than parsing a regular language, they should provide both parsers in
code (it doesn't have to be Haskell).

> As already said, I still think it [YAML] is a viable alternative when it
> comes to the task: dumping *structured* data that you want to
> re-elaborate later with some other machine code, without sacrificing
> human readability.

Viable, sure, but best? Probably not. YAML is tailored to representing
context-free languages (CFLs, i.e. arbitrarily nested data). But we don't
ever need CFLs in
darcs. Most of computer science is not about finding a tool that's
powerful enough: we already have that for all the problems we know how to
solve -- the Turing machine. It is about finding the weakest tool that still
works. The reason for that is that simpler (weaker) tools are more
transparent and have nicer properties. More things about them are
decidable, they are easier to implement (have fewer bugs), there are
more and better tools to handle them, etc.

What I am arguing against is using CFLs to represent inherently regular
data. There is nothing wrong with regular... I know you all have mixed
feelings about regular expressions, since people abuse them a lot to
parse CFL data. The reason for that is that there simply aren't tools
with comparable power and flexibility that would apply to general CFLs
(the other part of the reason is that people insist on dumping simple
regular data embedded in a CFL of some sort). Tools for specific CFLs
(XML, JSON, YAML) are comparatively rare, unwieldy and buggy, each comes
with its own learning curve, etc.

While you claim that tools for parsing YAML are universally available,
tools for parsing *any* regular language are actually much more
ubiquitous. And even though they are, within their class of languages,
more general (they cover all RLs, not just a specific subset), the
parsers are not any more complex (actually to the contrary), as can be
demonstrated by writing parsers for the various proposed formats (YAML
and regular) in this thread.

> a) dumping "annotate" information in a way that's easy to elaborate
> b) dumping an "annotated" content that's easy on the human eye

No-one ever argued against that split (well, not me anyway). For humans,
we have the output I quoted in the original mail about annotate rewrite,
with a numbered list of patches at top and lines marked with patch
numbers.

The machine-readable version looks like

20100903001327-fb03a-045b1923d4b1b1b432d3e3b03840101f4f9891e3 | text
20100903001327-fb03a-045b1923d4b1b1b432d3e3b03840101f4f9891e3 | more text

It is so trivial to parse that there's no way you could make it more
trivial with YAML or any other format you can think of. If the patch
hash format changes, either both YAML and RE parsers cope, or both
break, but neither can do better than the other in principle.
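
To make that concrete, here is a minimal sketch in Python of a parser for
the line format above. It rests on two assumptions of mine that the format
description does not spell out: the patch hash contains no whitespace, and
the separator is exactly " | ". The names ANNOTATE_LINE and parse_annotate
are made up for the illustration.

```python
import re

# Assumption: the hash has no whitespace and is followed by exactly " | ".
# Anything after the first separator is the line's content, verbatim.
ANNOTATE_LINE = re.compile(r"^(\S+) \| (.*)$")


def parse_annotate(output):
    """Return a list of (patch_hash, line_text) pairs, one per input line."""
    lines = output.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # ignore the empty piece after a trailing newline
    pairs = []
    for number, line in enumerate(lines, start=1):
        match = ANNOTATE_LINE.match(line)
        if match is None:
            raise ValueError("line %d is not 'hash | text': %r" % (number, line))
        pairs.append((match.group(1), match.group(2)))
    return pairs
```

Because the hash cannot contain a space, a " | " inside the file content is
harmless: only the first separator is consumed and the rest stays in the
text part.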

> For subproblem b), I'll ideally code something like
>
>     $ darcs annotate --yaml README
>     - 20100903001327-fb03a-045b1923d4b1b1b432d3e3b03840101f4f9891e3
>     - 20100904030947-fb03a-fac657d51ee80c33488031d8bdcfab769aa7f83c
>     - 20100904151247-a433a-oac77ade86ab7772cbb1932acdffc65f11abcd81
>     lines_patch: [1, 3, 2]
>
> where "lines_patch" (an attribute with a better name to come) is an
> array where each item is the index to the patch that touched the
> "positional-corresponding" line of content. No fancy escape needed,
> very compact, and basically all that tracdarcs needs to do its work.

There are certainly tools that would benefit from getting the file
contents as part of the annotate, especially those that work with
multiple VCSes (I believe all other tools give you the lines). Although
I see how you could apply that argument to the patchinfos for the
hashes. Nevertheless, the general practice seems to be to give complete
file contents in annotate.
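
For what it's worth, a tool that prefers the compact form quoted above can
derive it from the line-oriented output in a few lines. The following is
only a sketch under my own assumptions: it takes already-parsed
(hash, text) pairs, numbers patches from 1 as in the quoted example, and
lists them in order of first appearance; the function name to_indexed is
hypothetical.

```python
def to_indexed(pairs):
    """Split (patch_hash, text) pairs into a patch list plus per-line indices.

    Returns (patches, lines_patch, contents), where lines_patch[i] is the
    1-based position in `patches` of the patch that touched line i.
    """
    patches = []      # distinct hashes, in order of first appearance
    index_of = {}     # hash -> 1-based position in `patches`
    lines_patch = []
    contents = []
    for patch_hash, text in pairs:
        if patch_hash not in index_of:
            patches.append(patch_hash)
            index_of[patch_hash] = len(patches)
        lines_patch.append(index_of[patch_hash])
        contents.append(text)
    return patches, lines_patch, contents
```

Going this way loses nothing, whereas a consumer handed only the index
array would have to fetch the file contents separately.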

Yours,
   Petr.