[darcs-users] machine-readable formats (Was: the state of the adventure)

Lele Gaifax lele at nautilus.homeip.net
Sat Sep 4 09:27:53 UTC 2010


On Fri, 03 Sep 2010 17:13:30 -0400
Max Battcher <me at worldmaker.net> wrote:

> Just because it is "easy to parse" doesn't mean I want to maintain a 
> parser for it. In my experience "easy to parse" things often mean a 
> long-term maintenance hell. What happens when someone needs more 
> information added to the feed? What happens when data-types change?
> Does darcs need to keep a large zoo of parsers in a myriad of
> languages and platforms as a regression suite as a guard against
> breaking users' tools out "in the wild"? Or would the preference be
> to rubber-stamp and thus maintain "official" parser libraries for a
> large number of languages/platforms?

In the context of darcs (or more generally of VCSs), I doubt that's
the usual situation: how often do data types change? When was the last
time you wanted more information from any of darcs' output streams?

When I started developing tailor (and some time later trac-darcs), I
had to parse the "human-biased" format of "darcs changes" (it was darcs
1.0.x IIRC), and I had to recompute the patch hash from that. It
wasn't that much work, after all. When David (or whoever, thanks!)
implemented "changes --xml", I rewrote my parsers mainly to get rid of
the need to recompute the hash. Very roughly, the amount of code was
comparable (not counting the XML parsing library itself, that is).
Unfortunately, the hardest and most annoying part was working around
the imprecise encoding of non-ASCII characters, and as far as I
remember it is still the only source of occasional glitches in the
code, which has needed some tweaks now and then.

Darcs internals have changed dramatically since then, but I have not
had to adapt anything in the parsers.
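The message does not say how tailor deals with the non-ASCII glitches mentioned above. One common, generic workaround (an assumption here, not tailor's actual code) is to decode the raw bytes as UTF-8 and fall back to Latin-1 when that fails; `decode_darcs_output` is a hypothetical helper name, meant to be applied for example to the bytes captured from "darcs changes --xml":

```python
def decode_darcs_output(raw: bytes) -> str:
    """Decode bytes read from a darcs subprocess (hypothetical helper).

    Try UTF-8 first; if the bytes are not valid UTF-8, fall back to
    Latin-1, which maps every byte to a character and therefore never
    raises.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: keep the parser running at the cost of
        # possibly showing the wrong characters.
        return raw.decode("latin-1")
```

The price of the fallback is that text in some other 8-bit encoding may be rendered as the wrong characters, which is exactly the kind of occasional glitch that needs a tweak now and then.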

> > * Human-readable (even if it's machine-oriented) could be a nice
> >    minor feature [it lends a sort of transparency]
> 
> This seems like a rather low priority. I think that you have to
> assume that any machine-oriented output is primarily destined to be
> piped directly into tools with little or no human involvement.
> Certainly the human readable output may always be preferable.
> 
> I've pointed out before that if this is considered something of a 
> priority, however, I think YAML is a good candidate. YAML has 
> human-readable configurations and goals within its spec. I've pointed 
> out before that ``darcs show repo`` as it already is, is nearly YAML 
> already [1] and I have pointed out before that I think it may be
> worth tweaking it to make it valid YAML.

As I already said, I think the particular case of annotate makes the
point: in the majority of cases its output gets slurped by humans (and
in the remaining cases it's just because you want to decorate it in
some fancy way, again for human consumption). That's why I think a
format like XML is ill-suited, and it forced Alberto to write
ann2ascii.py :-)

Given the simplicity of the output, a much simpler format, like the
one proposed by Petr, or bzr's [1], or even git's [2], fits the need
better than any XYZ markup ever could. There is no possible ambiguity
that could hinder understanding.
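To illustrate how little code such a format needs, here is a minimal Python sketch that splits lines shaped like the "bzr annotate --long" sample in footnote [1] into (revno, author, date, text) tuples. The function name and the exact column layout it assumes are illustrative, not part of bzr:

```python
def parse_long_annotate(lines):
    """Split `bzr annotate --long`-style lines into (revno, author, date, text).

    Assumes each line is "<revno> <author> <date> | <text>", where a blank
    prefix means "same revision as the line above".
    """
    records = []
    last = None  # (revno, author, date) of the most recent annotated line
    for line in lines:
        prefix, sep, text = line.partition("|")
        if not sep:
            raise ValueError(f"no '|' separator in {line!r}")
        if text.startswith(" "):
            text = text[1:]  # drop the single space after the bar
        fields = prefix.split()
        if fields:
            # First token is the revno, last is the date; everything in
            # between is the author (which may itself contain spaces).
            last = (fields[0], " ".join(fields[1:-1]), fields[-1])
        elif last is None:
            raise ValueError("continuation line before any annotated line")
        records.append((*last, text))
    return records


SAMPLE = [
    "2545.1.1 mbp at sourcefrog.net              20070625 | =================",
    "                                                      | README for Bazaar",
    "                                                      | =================",
    "1904.3.1 dato at net.com.org.es             20060811 |",
    "3092.2.1 mbp at sourcefrog.net              20071207 | Bazaar (``bzr``)...",
]

for revno, author, date, text in parse_long_annotate(SAMPLE):
    print(f"{revno:>10} {date} {author:<24} {text}")
```

Partitioning on the first bar keeps any "|" inside the annotated text intact, and no markup library is involved at all.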

> > * As far as I'm concerned, "not-XML" is a feature.
> >    I think that's just a silly knee-jerk reaction on my part, though
> 
> XML is not the enemy here. We're talking about passing around data
> that other applications can read and XML is a fine solution for that, 
> particularly when implemented correctly.

Yes, I agree, but I still think that XML is the worst choice when
you just need to encode lists of dictionaries (in Python jargon).

As someone already showed, even to extract the hash of a patch from
"darcs changes --xml" you need either an XML parsing library or a
pipeline of greps/cuts... By contrast, the same information dumped as
YAML is much easier to process, either in a shell or with an Emacs
macro :)
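A minimal Python sketch of the contrast, using a hand-written, abbreviated sample shaped like "darcs changes --xml" output (the patches and hashes are made up, and this is not verbatim darcs output) next to a hypothetical line-oriented, YAML-like dump of the same data. Both helper names are illustrative:

```python
import xml.etree.ElementTree as ET

# Hand-written sample shaped like `darcs changes --xml`; values are made up.
XML_SAMPLE = """<changelog>
<patch author='lele' date='20100904092753' inverted='False' hash='20100904092753-aaaa.gz'>
  <name>First patch</name>
</patch>
<patch author='max' date='20100903171330' inverted='False' hash='20100903171330-bbbb.gz'>
  <name>Second patch</name>
</patch>
</changelog>"""

# Hypothetical line-oriented (YAML-like) dump of the same information.
FLAT_SAMPLE = """hash: 20100904092753-aaaa.gz
name: First patch

hash: 20100903171330-bbbb.gz
name: Second patch
"""


def hashes_from_xml(text: str) -> list:
    # Needs a real XML parser: quoting, entities and nesting are its job.
    root = ET.fromstring(text)
    return [patch.get("hash") for patch in root.iter("patch")]


def hashes_from_flat(text: str) -> list:
    # One "key: value" pair per line; a plain prefix test is enough.
    return [line[len("hash: "):] for line in text.splitlines()
            if line.startswith("hash: ")]


print(hashes_from_xml(XML_SAMPLE))
print(hashes_from_flat(FLAT_SAMPLE))
```

The flat version needs no parser at all: the list comprehension does what "grep '^hash: ' | cut -d' ' -f2" would do in a shell.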

ciao, lele.

[1] $ bzr annotate README
    2545.1.1 mbp at sou | =================
                     | README for Bazaar
                     | =================
    1904.3.1 dato at ne | 
    3092.2.1 mbp at sou | Bazaar (``bzr``) is a ...

    $ bzr annotate --long README |head
    2545.1.1 mbp at sourcefrog.net              20070625 | =================
                                                      | README for Bazaar
                                                      | =================
    1904.3.1 dato at net.com.org.es             20060811 | 
    3092.2.1 mbp at sourcefrog.net              20071207 | Bazaar (``bzr``)...

[2] $ git annotate README {NB: slightly edited}
    90543503 (lkcl 2010-04-25 13:16:32 +0000 1)Current Release: 0.7
    59285777 (lkcl 2009-07-10 21:46:17 +0000 2)---------------
    59285777 (lkcl 2009-07-10 21:46:17 +0000 3)
    90543503 (lkcl 2010-04-25 13:16:32 +0000 4)This is the 0.7 release-...
-- 
nickname: Lele Gaifax    | When I live on what I thought yesterday,
real: Emanuele Gaifas    | I will start to fear those who copy me.
lele at nautilus.homeip.net |                 -- Fortunato Depero, 1929.

