[darcs-users] machine-readable formats

Petr Rockai me at mornfall.net
Fri Sep 3 11:17:33 UTC 2010


Petr Rockai <me at mornfall.net> writes:

> - <hash>: line
> - <hash>: line
> ...
>
> which makes it a list of singleton hash->line maps... which may be about
> as convenient as it gets with the data model at hand. For now, I'll
> implement this and try to sell it as annotate --yaml in adventure. :)

Ok, another downside of this, which I discovered while reading the YAML
spec, is that we can't keep the lines verbatim, even modulo escaping
non-printable characters, since plain scalars would be ambiguous (for
instance, a line containing ": " or " #", or one starting with a quote
character). We probably need to use quoted scalars, which comes with a
significant downside: the output becomes much harder to parse without a
real YAML parser... :(

The result would look like

- hash: "..."

with ", \ and non-printable characters escaped inside the quotes. Undoing
those escapes correctly is really quite hard: the resulting language is
still regular, but the correct automaton becomes unwieldy, which defeats
the original purpose.

The alternative

- hash: |
   line

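does not work either, because of (a) leading spaces, which interfere
with the indentation of the block scalar, and (b) the lack of any
escaping inside block scalars, so a single unprintable character in your
version-controlled files will lead to invalid YAML.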

So while it's great that YAML is lightweight, it has very serious issues
dealing with arbitrary byte strings (as opposed to character strings),
which makes it basically useless for quoting actual file contents.

The originally proposed machine-readable format (fixed-length prefix, \n
as the only significant byte) does not suffer from any of those
problems.
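To illustrate why a fixed-length prefix is so easy to consume, here is a
minimal Python sketch. The exact layout is an assumption for
illustration only (a 64-byte prefix holding the 61-character patch hash
plus ".gz", then ": ", then the raw line), and parse_annotate is a
hypothetical helper, not part of darcs. The content bytes come through
untouched, so no unescaping step is needed.

```python
from typing import List, Tuple

# ASSUMED layout (illustrative only): a fixed-width patch hash, then ": ",
# then the raw line bytes, terminated by b"\n".
PREFIX_LEN = 64  # hypothetical: 61-character patch hash plus ".gz"
SEP = b": "


def parse_annotate(data: bytes) -> List[Tuple[str, bytes]]:
    """Split raw annotate-style output into (patch_hash, line_bytes) pairs."""
    records = []
    # Splitting on b"\n" is safe: a line of the annotated file cannot contain it.
    for raw in data.split(b"\n"):
        if not raw:
            continue  # skip empty pieces, e.g. the one after the final newline
        prefix = raw[:PREFIX_LEN]
        sep = raw[PREFIX_LEN:PREFIX_LEN + len(SEP)]
        if sep != SEP:
            raise ValueError(f"malformed record: {raw!r}")
        # Everything after the prefix is the line, byte for byte: leading
        # spaces, quotes and non-printable bytes need no unescaping.
        records.append((prefix.decode("ascii"), raw[PREFIX_LEN + len(SEP):]))
    return records
```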

I am afraid that the above disqualifies YAML: almost any darcs output
may contain non-printable characters, so consumers would need to deal
with "-quoted strings everywhere. Overall, I think that a truly
line-oriented format is probably much better suited (that way, the only
thing we need to escape is newlines in filenames). It also dispenses
with the key uniqueness requirement, so we can have something like:

patch: 20100903001327-fb03a-045b1923d4b1b1b432d3e3b03840101f4f9891e3.gz
  name: The name of the patch (no newlines allowed by darcs here)
  date: ...
  salt: ...
  comment: Some fancy comment
  comment: that spans multiple lines
  M -10 +3: some_file.txt
  M -8 +10: some file\nwith newlines and spaces in it
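A consumer for the format above needs little more than a line splitter.
The following Python sketch is one possible reading of it, under
assumptions the format description does not fix: a record starts at an
unindented "patch: " line, every field line is indented by two spaces,
and the key ends at the first ": ". Fields are kept as an ordered list
of (key, value) pairs so that repeated keys such as comment survive.
The function names are hypothetical, and \n escapes in filenames are
left as-is.

```python
from typing import List, Tuple

# Hypothetical types: one field is a (key, value) pair; keys may repeat.
Field = Tuple[str, str]
Patch = Tuple[str, List[Field]]  # (value of the "patch:" line, fields in order)


def parse_patches(text: str) -> List[Patch]:
    """Parse the line-oriented format into (hash, fields) records (a sketch)."""
    patches: List[Patch] = []
    for line in text.split("\n"):
        if not line:
            continue  # optional blank separators and the trailing newline
        if line.startswith("patch: "):
            # An unindented "patch: " line opens a new record.
            patches.append((line[len("patch: "):], []))
        elif line.startswith("  ") and patches:
            # Field line: two-space indent, key, ": ", value (value kept raw).
            key, sep, value = line[2:].partition(": ")
            if not sep:
                raise ValueError(f"field without ': ': {line!r}")
            patches[-1][1].append((key, value))
        else:
            raise ValueError(f"unexpected line: {line!r}")
    return patches


def comment_text(fields: List[Field]) -> str:
    """Join repeated comment fields back into one multi-line comment."""
    return "\n".join(value for key, value in fields if key == "comment")
```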

I don't know if the format has a name, but it's basically

/(^patch: [0-9a-f-]{61}\.gz\n  name: ([^\n]*)\n  date: ([^\n]*)\n[...])*/
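The same pattern can be written in Python's re syntax. This sketch only
covers the patch/name/date header, since the remaining fields are elided
as [...] above; headers is a hypothetical helper. With re.MULTILINE, ^
matches only at line starts, and because field lines are indented, a
value that happens to contain "patch: " cannot be mistaken for the start
of a record.

```python
import re
from typing import List, Tuple

# Header part of the pattern above; the fields elided as [...] are not matched.
HEADER = re.compile(
    r"^patch: ([0-9a-f-]{61})\.gz\n"
    r"  name: ([^\n]*)\n"
    r"  date: ([^\n]*)\n",
    re.MULTILINE,  # ^ matches at the start of every line, not just the input
)


def headers(text: str) -> List[Tuple[str, str, str]]:
    """Return (hash, name, date) for every patch header found in text."""
    return [m.groups() for m in HEADER.finditer(text)]
```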

You can add "^\n" as a separator between patches to make it even easier
to process.
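With that empty-line separator, a consumer can stream the output one
patch at a time without inspecting field contents at all, because a
field line always starts with two spaces and so is never empty. A
minimal Python sketch, with patch_blocks as a hypothetical helper:

```python
import io
from typing import Iterable, Iterator, List


def patch_blocks(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the lines of one patch at a time, using empty lines as separators."""
    block: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line:
            block.append(line)
        elif block:
            # An empty line ends the current patch; field lines are never empty.
            yield block
            block = []
    if block:
        yield block  # last patch, if the output does not end with a separator


# Works with any iterable of lines, e.g. an open text file or a StringIO
# (hashes shortened for brevity).
sample = "patch: A\n  name: one\n\npatch: B\n  name: two\n"
for block in patch_blocks(io.StringIO(sample)):
    print(block[0])
```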

Yours,
   Petr.

