On 9/5/2010 15:37, Petr Rockai wrote:
> Max Battcher<me at worldmaker.net>  writes:
>> 045b19	First line of the file
>> oac77a	second line, tweaked by Lele
>> fac657	third line, added by John
>> ...
>> I'm still wondering if the second document may also be better in a YAML format
>> of some sort... Another toy example:
>> - 045b19: |
>>    First line of file
>> - oac77a: |
>>    second line, tweaked by Lele
>>    third line, tweaked by Lele
>> - fac657: |
>>    man line, added by John
>> That doesn't seem that much worse than the existing human-readable annotate
>> output...
> I should remind you that what you propose is not legal YAML, which was
> also my reason against using it for annotate in the first place. You
> either mangle the file contents doing custom non-YAML quoting, or you
> use double-quoted YAML strings with escapes.
> The YAML spec explicitly says, that there's no other way to encode
> arbitrary strings than to use "..." with \-sequences, which will lead to
> really hard to read output (both for humans and non-YAML machines).

Is it really that hard to read? Surely \-sequences are not much harder
to read than the existing [_-sequences in darcs. I don't think quote
marks subtract much readability either (most of us are programmers,
after all; we see quoted strings all over the place).

Also, what exactly would we need to escape? YAML's default character set
is UTF-8 and the excluded characters seem quite explicit:

"The allowed character range explicitly excludes the C0 control block 
#x0-#x1F (except for TAB #x9, LF #xA, and CR #xD which are allowed), DEL 
#x7F, the C1 control block #x80-#x9F (except for NEL #x85 which is 
allowed), the surrogate block #xD800-#xDFFF, #xFFFE, and #xFFFF."

Everything in UTF-8 not explicitly excluded is fair game in a literal 
block... How likely are non-binary files to contain characters in the 
proscribed ranges? Certainly darcs could check for the proscribed ranges 
and switch to a quoted format only when truly necessary.

- 045b19:
   "First line of file\u0000"
- oac77a: |
   second line, tweaked by Lele
   third line, tweaked by Lele
- fac657: |
   man line, added by John

I think human-readability is preserved just fine. Sure, it makes "scrape
parsing" tougher, but "use a conforming parser if you need to parse more
complicated annotate output" doesn't sound like bad advice to me. In any
case, the presence or absence of "|" in the header is a clear, regular
indicator of which form follows, even if you do wish to keep some rough
per-line/regex scrape.
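
For what it's worth, here is a rough sketch (in Python, just for
illustration) of the "check for the proscribed ranges and only quote when
necessary" idea. All of the names below are made up and none of this is
darcs code. It only tests the character ranges from the spec passage
quoted above. It ignores other block-scalar subtleties (for example, a
first line that starts with spaces would need an indentation indicator),
so treat it as a sketch under those assumptions rather than a complete
emitter.

```python
# Hypothetical sketch; none of these names come from darcs itself.

# Control characters the quoted spec passage explicitly allows.
ALLOWED_CONTROLS = {0x09, 0x0A, 0x0D, 0x85}  # TAB, LF, CR, NEL


def is_excluded(cp):
    """True if code point cp is in one of the excluded ranges."""
    if cp in ALLOWED_CONTROLS:
        return False
    return (
        cp <= 0x1F                   # C0 control block
        or 0x7F <= cp <= 0x9F        # DEL and the C1 control block
        or 0xD800 <= cp <= 0xDFFF    # surrogate block
        or cp in (0xFFFE, 0xFFFF)
    )


def needs_quoting(text):
    """True if text contains any character a literal block cannot hold."""
    return any(is_excluded(ord(ch)) for ch in text)


def double_quoted(text):
    """Render text as a YAML double-quoted scalar with \\-escapes."""
    out = []
    for ch in text:
        if ch == "\\":
            out.append("\\\\")
        elif ch == '"':
            out.append('\\"')
        elif ch == "\n":
            out.append("\\n")
        elif is_excluded(ord(ch)):
            # Every excluded code point fits in four hex digits.
            out.append(f"\\u{ord(ch):04x}")
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'


def annotate_entry(patch_id, lines):
    """Emit one list entry: literal block if possible, quoted otherwise."""
    if any(needs_quoting(line) for line in lines):
        body = double_quoted("\n".join(lines))
        return f"- {patch_id}:\n   {body}"
    block = "\n".join("   " + line for line in lines)
    return f"- {patch_id}: |\n{block}"


print(annotate_entry("045b19", ["First line of file\x00"]))
print(annotate_entry("oac77a", ["second line, tweaked by Lele",
                                "third line, tweaked by Lele"]))
```

The two print calls reproduce the first two entries of the toy example
above: the hunk containing a NUL falls back to the quoted form, and the
ordinary text hunk stays in a literal block.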

