[darcs-users] machine-readable formats

Petr Rockai me at mornfall.net
Fri Sep 3 22:43:43 UTC 2010

Max Battcher <me at worldmaker.net> writes:

>> Ok, another downside of this that I discovered while reading the YAML
>> spec is that we can't keep the lines verbatim, even modulo non-printable
>> escapes, since these would be ambiguous. We probably need to use quoted
>> scalars, which comes with a significant downside, namely that it's much
>> harder to parse now without a real YAML parser... :(

> I'm *for* encouraging people to use a real YAML parser, if we are going for a
> YAML-like output. It would add a form of consistency and easy sign-post for
> best-supported experiences (to parse, just use your local YAML parser). YAML
> parsers exist already for a number of languages/platforms. Since YAML's data
> schema is a clear superset of JSON's, it's also very easy to convert YAML to
> JSON (or XML for that matter) for those languages/platforms that don't provide
> YAML parsers.

Even the simplest YAML parser won't fit the 20 or so characters of shell
you need to extract useful data from the proposed regular format. Also,
your argument goes both ways. If we do the in-darcs API right, it'd be
trivial to use the same code to give both --machine (regular language,
no escaping) and --yaml output. The YAML won't be very spectacular (in a
number of cases, say annotate, it'll be a straight ugly mess), but would
probably work.

I guess the only reasonable way to YAML [(a, b)] is a list of singleton
maps, right... or is a list of two-element lists better? (As opposed to
[a -> b] which is an actual YAML map.)
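To compare the two candidate encodings of [(a, b)], here is a small hypothetical sketch (the names renderSingletonMaps, renderPairs and seqOf are made up for illustration). For brevity it assumes keys and values are already safe YAML plain scalars, i.e. it skips the quoting problem mentioned above.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as B

-- Render items as a YAML block sequence; an empty list becomes "[]".
-- Assumption: items need no quoting or escaping.
seqOf :: [ByteString] -> ByteString
seqOf [] = "[]\n"
seqOf items = B.concat ["- " <> i <> "\n" | i <- items]

-- Option 1: a sequence of single-entry maps, e.g. "- M: ./foo.txt"
renderSingletonMaps :: [(ByteString, ByteString)] -> ByteString
renderSingletonMaps ps = seqOf [k <> ": " <> v | (k, v) <- ps]

-- Option 2: a sequence of two-element flow sequences, e.g. "- [M, ./foo.txt]"
renderPairs :: [(ByteString, ByteString)] -> ByteString
renderPairs ps = seqOf ["[" <> k <> ", " <> v <> "]" | (k, v) <- ps]
```

Loaded by a generic YAML parser, the first gives a list of one-key maps and the second a list of two-element lists. Both keep repeated keys and their order, which a plain YAML map would not.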

In fact, we only really need the associative list type, since we can
represent anything we need that way. That said, the uniqueness
requirement on YAML maps is a bit unfortunate in this respect.
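The uniqueness requirement boils down to a simple check on an association list. A minimal sketch (isYamlMap is a hypothetical name):

```haskell
import Data.List (nub)

-- YAML requires mapping keys to be unique, so an association list can be
-- emitted directly as a YAML map only if no key occurs twice.
isYamlMap :: Eq k => [(k, v)] -> Bool
isYamlMap ps = length (nub keys) == length keys
  where keys = map fst ps
```

A patch summary such as [(M, ./foo.txt), (M, ./bar.txt)] fails this check, which is why it needs either folding or a list-based rendering.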

But we could:

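import Data.ByteString (ByteString)

-- One rendering per output mode, so --machine and --yaml share the same data.
class Serialize a where
      regular :: a -> ByteString   -- regular language, no escaping
      yaml    :: a -> ByteString   -- YAML rendering

-- Associative list: keys may repeat, order is kept.
newtype AList = AList [(ByteString, ByteString)]
-- Map-like list: for YAML output, values sharing a key are folded together.
newtype MList = MList [(ByteString, ByteString)]

instance Serialize MList where ...
instance Serialize AList where ...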

the difference being that the yaml output of MList will fold things like
[(M, ./foo.txt), (M, ./bar.txt)] into [(M, [./foo.txt, ./bar.txt])] to
create a valid YAML map, whereas AList will create (some rendering of)
an associative list (not a map).

For the regular language output, there's no difference.

For patch listings (changes --yaml, changes -s --yaml) we'd use MList,
for annotate --yaml, we'd use AList.
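One way the two instances might be filled in is sketched below. Everything beyond the class and the two newtypes is an assumption: the "key value" line format for regular output is hypothetical, foldKeys is a made-up helper, and no YAML quoting is done.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as B
import Data.List (nub)

class Serialize a where
  regular :: a -> ByteString
  yaml    :: a -> ByteString

newtype AList = AList [(ByteString, ByteString)]
newtype MList = MList [(ByteString, ByteString)]

-- Hypothetical regular format: one "key value" line per pair, no escaping.
regularPairs :: [(ByteString, ByteString)] -> ByteString
regularPairs ps = B.concat [k <> " " <> v <> "\n" | (k, v) <- ps]

-- Group every value under its key; keys keep their first-appearance order.
-- (nub is quadratic, which is acceptable for a sketch.)
foldKeys :: [(ByteString, ByteString)] -> [(ByteString, [ByteString])]
foldKeys ps = [(k, [v | (k', v) <- ps, k' == k]) | k <- nub (map fst ps)]

instance Serialize MList where
  regular (MList ps) = regularPairs ps
  -- A YAML map whose values are sequences, so each key appears once.
  yaml (MList []) = "{}\n"
  yaml (MList ps) = B.concat (concatMap entry (foldKeys ps))
    where entry (k, vs) = (k <> ":\n") : ["  - " <> v <> "\n" | v <- vs]

instance Serialize AList where
  regular (AList ps) = regularPairs ps
  -- A YAML sequence of single-entry maps; keys may repeat.
  yaml (AList []) = "[]\n"
  yaml (AList ps) = B.concat ["- " <> k <> ": " <> v <> "\n" | (k, v) <- ps]
```

In this sketch the regular output of MList and AList is identical, and only yaml differs: MList turns [(M, ./foo.txt), (M, ./bar.txt)] into a single "M:" key holding both paths, while AList keeps two separate entries.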

> I've got a gut feeling that I'd rather see darcs properly escape things in a
> clear character-oriented format for its machine-readable output than deal with
> verbatim output, anyway.

Yes, but that's just because you are a Python programmer and not a Perl
programmer (or a shell programmer, or an awk programmer). I suspect
emacs-lispers will prefer the regular language output too.


PS: We still need a YAML library. It's a shame that all the ones on Hackage
are essentially undocumented. I'll just hope that someone will be able to
implement the above two yaml :: a -> ByteString functions for me. They
could actually be a -> Handle -> IO () if that helps any.
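A Handle-based variant need not duplicate the rendering code. With Data.ByteString.Builder from the bytestring package, a single renderer can back both the pure a -> ByteString form and an a -> Handle -> IO () form that writes straight to the handle. The sketch below is hypothetical (alistYaml, yamlBytes, yamlTo and writeYamlFile are made-up names), does no YAML quoting, and renders an association list as a sequence of single-entry maps.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString (ByteString)
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Lazy as BL
import System.IO (Handle, IOMode (WriteMode), withFile)

-- One renderer, producing a Builder. Assumption: no quoting is needed.
alistYaml :: [(ByteString, ByteString)] -> BB.Builder
alistYaml [] = BB.string7 "[]\n"
alistYaml ps = mconcat
  [ BB.string7 "- " <> BB.byteString k <> BB.string7 ": "
      <> BB.byteString v <> BB.char7 '\n'
  | (k, v) <- ps ]

-- Pure form, matching  yaml :: a -> ByteString
yamlBytes :: [(ByteString, ByteString)] -> ByteString
yamlBytes = BL.toStrict . BB.toLazyByteString . alistYaml

-- Handle form, matching  a -> Handle -> IO ()
yamlTo :: [(ByteString, ByteString)] -> Handle -> IO ()
yamlTo ps h = BB.hPutBuilder h (alistYaml ps)

-- Convenience wrapper: write the YAML to a file, closing it afterwards.
writeYamlFile :: FilePath -> [(ByteString, ByteString)] -> IO ()
writeYamlFile path ps = withFile path WriteMode (yamlTo ps)
```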
