[darcs-users] machine-readable formats (Was: the state of the adventure)

Max Battcher me at worldmaker.net
Fri Sep 3 21:13:30 UTC 2010


On 09/03/2010 04:39 AM, Eric Kow wrote:
 > Is there a way to have both cake (very very easy to parse) and eating
 > (sufficiently expressive to do anything Darcs would reasonably want to
 > do with machine-readable outputs)?

Short answer: No.

> * "Very very easy to parse" seems like a good feature.
>    And there is nothing easier to parse than simple line-based
>    like the above. Even JSON (with the json library) imposes a
>    little bit of friction...

Just because it is "easy to parse" doesn't mean I want to maintain a
parser for it. In my experience, "easy to parse" formats often turn into
long-term maintenance hell. What happens when someone needs more
information added to the feed? What happens when data types change? Does
darcs need to keep a large zoo of parsers in a myriad of languages and
platforms as a regression suite, as a guard against breaking users' tools
out "in the wild"? Or would the preference be to rubber-stamp, and thus
maintain, "official" parser libraries for a large number of
languages/platforms?

The markup/object formats (JSON, YAML, XML, etc) are large, ugly and 
verbose for a reason: they try their best to support both 
backwards-compatibility and forwards-compatibility. Schema changes are a 
lot easier to make backwards-compatible than "parser changes".
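
To make the compatibility point concrete, here is a minimal Python
sketch. The pipe-separated record and the JSON field names
(num_patches, format, format_version, root) are hypothetical
illustrations, not anything darcs actually emits. The sketch only shows
what happens to each kind of consumer when a field is added.

```python
import json

# Hypothetical line-based record: the meaning of each column is positional.
OLD_LINE = "155|hashed|/home/worldmaker/repos/darcsforge"
# A later release inserts a new column in the middle (assumed change).
NEW_LINE = "155|hashed|darcs-2|/home/worldmaker/repos/darcsforge"


def parse_line(line):
    """Positional parser: breaks as soon as the column count changes."""
    num_patches, fmt, root = line.split("|")
    return {"num_patches": int(num_patches), "format": fmt, "root": root}


# The same hypothetical data as JSON objects with named fields.
OLD_JSON = json.dumps({
    "num_patches": 155,
    "format": "hashed",
    "root": "/home/worldmaker/repos/darcsforge",
})
NEW_JSON = json.dumps({
    "num_patches": 155,
    "format": "hashed",
    "format_version": "darcs-2",  # the added field
    "root": "/home/worldmaker/repos/darcsforge",
})


def parse_json(text):
    """Keyed consumer: reads the fields it knows and ignores the rest."""
    record = json.loads(text)
    return {"num_patches": record["num_patches"], "root": record["root"]}


def line_parser_survives(line):
    """Report whether the positional parser can still read the line."""
    try:
        parse_line(line)
        return True
    except ValueError:
        return False
```

With these definitions, line_parser_survives(NEW_LINE) is False, while
parse_json(NEW_JSON) returns the same result as parse_json(OLD_JSON).
The line format could of course be designed to append fields only, but
that is exactly the sort of rule a markup spec already gives you.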

Any attempt at a smaller format is eventually going to have to answer
the same questions that a markup/object format deals with in its spec,
and without a *lot* of planning beforehand (effectively remaking a
markup format in the process) it is doomed to do so haphazardly and with
ever so much "cruft".

I do think the best bet is to pick an existing markup standard with a
good specification and support it *well*. Even if the answer is just to
beef up the current XML output.

> * Human-readable (even if it's machine-oriented) could be a nice
>    minor feature [it lends a sort of transparency]

This seems like a rather low priority. I think that you have to assume
that any machine-oriented output is primarily destined to be piped
directly into tools with little or no human involvement. For people, the
regular human-readable output will presumably always be preferable.

If this is considered something of a priority, however, I think YAML is
a good candidate. YAML counts human readability among the goals in its
spec. I've pointed out before that ``darcs show repo``, as it already
is, is nearly YAML [1], and that I think it may be worth tweaking it to
make it valid YAML.

> * Perhaps another feature would be a sort of uniformity, that all of
>    Darcs machine-readable outputs work the same way.  Can we achieve
>    such a uniformity with just a regular language?

I think that in trying something like that you end up with either N 
mini-languages that are "mostly similar" or a half-baked markup with a 
poor specification.

> * As far as I'm concerned, "not-XML" is a feature.
>    I think that's just a silly knee-jerk reaction on my part, though

XML is not the enemy here. We're talking about passing around data that 
other applications can read and XML is a fine solution for that, 
particularly when implemented correctly.

I see no problem in picking a markup language that has simpler
dependencies than XML or that is easier to output validly than XML, but
I do think it makes more sense to stick with a markup language of one
sort or another, with existing specifications and existing well-known
parsers in the wild, than to build an arbitrary new one without
*strong* reason to do so.

----

[1] Using ``darcs show repo`` from darcs 2.3.0 as an example:

           Type: darcs
         Format: hashed, darcs-2
           Root: /home/worldmaker/repos/darcsforge
       Pristine: HashedPristine
          Cache: thisrepo:/home/worldmaker/repos/darcsforge, cache:/home/worldmaker/.darcs/cache
 Default Remote: code.worldmaker.net:repos/pub/darcsforge/main/
    Num Patches: 155

Primarily, YAML barfs on the right-aligned keys (because YAML's more 
human readable formatting is whitespace-dependent). Reformatted to valid 
YAML, but preserving the attempted alignment:

Type:           darcs
Format:         [hashed, darcs-2]
Root:           /home/worldmaker/repos/darcsforge
Pristine:       HashedPristine
Cache:
                 - thisrepo:/home/worldmaker/repos/darcsforge
                 - cache:/home/worldmaker/.darcs/cache
Default Remote: code.worldmaker.net:repos/pub/darcsforge/main/
Num Patches:    155

I think this is just as readable, but now YAML also parses it, with both
Format and Cache being (correctly) interpreted as lists, and YAML also
picks up that 155 is a numerical literal (and thus is an integer in
Python, for example). Of course, this example is easy because there
aren't any special characters in my paths above that would require
escaping.
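
For contrast, here is a rough sketch of what a hand-rolled parser for
the original right-aligned output might look like in Python. The
function names parse_show_repo and split_list are made up for
illustration, and the path containing ", " is a hypothetical example,
not taken from a real repository. It shows why the escaping question
comes up as soon as values can contain the separator.

```python
def parse_show_repo(text):
    """Naive parser: split each line on the first ': ' and strip padding."""
    info = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, _, value = line.partition(": ")
        info[key.strip()] = value.strip()
    return info


def split_list(value):
    """Naive list handling: assume ', ' never occurs inside an item."""
    return [item.strip() for item in value.split(", ")]


SAMPLE = """\
           Type: darcs
         Format: hashed, darcs-2
           Root: /home/worldmaker/repos/darcsforge
    Num Patches: 155
"""

info = parse_show_repo(SAMPLE)
formats = split_list(info["Format"])

# Hypothetical cache path that happens to contain ", ".
TRICKY = "          Cache: thisrepo:/home/worldmaker/repos/a, b\n"
caches = split_list(parse_show_repo(TRICKY)["Cache"])
```

Here formats comes out as the expected two-item list, but
info["Num Patches"] is still the string "155" (every consumer has to
know which fields are numbers), and caches contains two entries even
though TRICKY describes only one cache location. An existing markup
spec already defines quoting and typing rules for both cases.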

-- 
--Max Battcher--
http://worldmaker.net

