[darcs-users] annotate output
me at worldmaker.net
Sun Sep 5 21:08:13 UTC 2010
On 9/5/2010 15:37, Petr Rockai wrote:
> Max Battcher<me at worldmaker.net> writes:
>> 045b19 First line of the file
>> oac77a second line, tweaked by Lele
>> fac657 third line, added by John
>> I'm still wondering if the second document may also be better in a YAML format
>> of some sort... Another toy example:
>> - 045b19: |
>>     First line of file
>> - oac77a: |
>>     second line, tweaked by Lele
>>     third line, tweaked by Lele
>> - fac657: |
>>     man line, added by John
>> That doesn't seem that much worse than the existing human-readable annotate
> I should remind you that what you propose is not legal YAML, which was
> also my reason against using it for annotate in the first place. You
> either mangle the file contents doing custom non-YAML quoting, or you
> use double-quoted YAML strings with escapes.
> The YAML spec explicitly says, that there's no other way to encode
> arbitrary strings than to use "..." with \-sequences, which will lead to
> really hard to read output (both for humans and non-YAML machines).
Is it hard to read? Surely \-sequences are not much harder to read than
the existing [_-sequences in darcs. I don't think quote marks subtract
much readability either (most of us are programmers, after all, and we
see quoted strings all over the place).
Also, what exactly would we need to escape? YAML's default character set
is UTF-8, and the excluded characters are spelled out quite explicitly:
"The allowed character range explicitly excludes the C0 control block
#x0-#x1F (except for TAB #x9, LF #xA, and CR #xD which are allowed), DEL
#x7F, the C1 control block #x80-#x9F (except for NEL #x85 which is
allowed), the surrogate block #xD800-#xDFFF, #xFFFE, and #xFFFF."
Everything in UTF-8 not explicitly excluded is fair game in a literal
block... How likely are non-binary files to contain characters in the
proscribed ranges? Certainly darcs could check for the proscribed ranges
and switch to a quoted format only when truly necessary.
- 045b19:
    "First line of file\u0000"
- oac77a: |
    second line, tweaked by Lele
    third line, tweaked by Lele
- fac657: |
    man line, added by John
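The check suggested above (scan for the proscribed ranges, and fall back to a double-quoted scalar only when one turns up) might be sketched roughly as follows. This is a Python illustration, not anything darcs does today. The names `is_yaml_printable`, `quote` and `emit_entry`, the four-space indent, and the extra precaution of also quoting lines that contain CR or NEL (which YAML may treat as line breaks) are all assumptions of the sketch. It also ignores literal-block edge cases such as a first line that begins with a space, or trailing blank lines.

```python
def is_yaml_printable(ch):
    """True if ch is outside the ranges excluded by the passage quoted above."""
    cp = ord(ch)
    if cp in (0x09, 0x0A, 0x0D, 0x85):          # TAB, LF, CR, NEL are allowed
        return True
    if cp <= 0x1F or cp == 0x7F or 0x80 <= cp <= 0x9F:
        return False                             # C0 block, DEL, C1 block
    if 0xD800 <= cp <= 0xDFFF or cp in (0xFFFE, 0xFFFF):
        return False                             # surrogates, #xFFFE, #xFFFF
    return True


def quote(text):
    """Render text as a single-line double-quoted scalar with \\-escapes."""
    out = []
    for ch in text:
        if ch == "\\":
            out.append("\\\\")
        elif ch == '"':
            out.append('\\"')
        elif ch == "\n":
            out.append("\\n")
        elif is_yaml_printable(ch) and ch not in "\r\x85":
            out.append(ch)
        else:
            out.append("\\u%04x" % ord(ch))      # e.g. NUL becomes \u0000
    return '"' + "".join(out) + '"'


def emit_entry(patch_id, lines, indent="    "):
    """Emit one annotate entry; `lines` holds file lines without newlines."""
    must_quote = any(
        ch in "\r\x85" or not is_yaml_printable(ch)
        for line in lines
        for ch in line
    )
    if must_quote:
        return "- %s:\n%s%s\n" % (patch_id, indent, quote("\n".join(lines)))
    body = "".join(indent + line + "\n" for line in lines)
    return "- %s: |\n%s" % (patch_id, body)
```

With this sketch, `emit_entry("045b19", ["First line of file\x00"])` takes the quoted branch, while an entry made only of ordinary text keeps the `|` literal-block header.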
I think human-readability is preserved fine. Sure, it makes "scrape
parsing" tougher, but "use a conforming parser if you need to parse
more complicated annotate output" doesn't sound like bad advice to me.
Even so, the presence or absence of "|" in the header is a clear,
regular determinant if you do wish to maintain some rough
per-line/regex scrape.
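To make the "rough per-line/regex scrape" concrete, here is a small Python sketch that keys off the "|" in the header. It is only an illustration under several assumptions: each entry header sits at column 0, body lines are indented by exactly four spaces, and a quoted body fits on one line and uses only escapes that JSON also understands (\\, \", \n, \uXXXX), so `json.loads` can stand in for a real YAML decoder. The names `HEADER`, `INDENT` and `scrape` are made up for the example.

```python
import json
import re

INDENT = "    "  # assumed body indentation
# Header is "- <patch>:" with an optional " |" marking a literal block.
HEADER = re.compile(r"^- (?P<patch>[^:\s]+):(?P<literal> \|)?$")


def scrape(text):
    """Return a list of (patch_id, [file lines]) pairs."""
    entries = []
    literal = False
    for raw in text.split("\n"):
        match = HEADER.match(raw)
        if match:
            # The "|" in the header decides how the body is read.
            literal = match.group("literal") is not None
            entries.append((match.group("patch"), []))
        elif entries and raw.startswith(INDENT):
            body = raw[len(INDENT):]
            if literal:
                entries[-1][1].append(body)      # taken verbatim
            else:
                # Quoted body: decode the escapes, then split into lines.
                decoded = json.loads(body, strict=False)
                entries[-1][1].extend(decoded.split("\n"))
    return entries
```

Anything beyond this (folded scalars, other escape sequences, different indentation) is where a conforming YAML parser would be the better tool.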