[darcs-users] annotate output
me at mornfall.net
Sun Oct 24 12:30:20 UTC 2010
Alberto Bertogli <albertito at blitiri.com.ar> writes:
>> Instead, I could offer a list of key->info mapping as part of the
>> machine format at the start or end (probably end). I imagine it could
>> look like:
>> <the annotation>
>> <patch info>
>> <patch info>
>> I did not include it before because I only had feedback from Lele who said
>> it is redundant in his case (since he already maintains a <hash> ->
>> <info> map internally). I don't think it'd be too costly to add the map
>> for anyone (if the infos are not interesting, you can simply cut off at
>> the first empty line).
> That depends on the tool.
> Darcsweb relies on annotate --xml output to show the annotate page, and
> if it had only the hash ids, that would mean extra darcs invocations to
> get authorship information.
> That is so because darcsweb does not rely on any database, or persistent
> state. It's supposed to be a light, easy to install and read-only cgi
> I can imagine that, for example, a short-lived graphical annotate browser
> (like git gui blame) could have similar requirements.
Yes, see my suggestion above... what I proposed would look like (e.g.):
hash1 | line 1
hash2 | line 2
A patch author
D patch date
N patch name
C patch comment
C (comment continued...)
(the formatting of patch data is subject to further discussion I guess,
but the above looks quite reasonable to me... maybe we should come up
with different letter prefixes, so we have an empty intersection with the
status letters, which would also let us re-use the same format for darcs
changes --machine --summary)
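
A minimal sketch of how a tool might consume the layout above. It assumes
details that are still open in this thread: the annotation ends at the first
empty line, each annotated line uses " | " as the delimiter, and each patch
info line is a single letter, a space, and the value. How info records are
tied back to hashes is not settled here, so the sketch returns them as a flat
list. The name parse_annotate is hypothetical.

```python
def parse_annotate(text):
    """Split the proposed output into annotated lines and patch-info lines.

    Assumed layout (not a finalized darcs format):
      <hash> | <source line>
      ...
      <empty line>
      <letter> <value>
      ...
    """
    annotated = []   # (hash, source line) pairs
    info = []        # (prefix letter, value) pairs, in file order
    in_info = False
    for line in text.split("\n"):
        if not in_info:
            if line == "":
                # First empty line ends the annotation part.
                in_info = True
                continue
            key, _, content = line.partition(" | ")
            annotated.append((key, content))
        elif line:
            # Single-letter prefix, one space, then the value.
            info.append((line[0], line[2:]))
    return annotated, info


sample = (
    "hash1 | line 1\n"
    "hash2 | line 2\n"
    "\n"
    "A patch author\n"
    "D patch date\n"
    "N patch name\n"
    "C patch comment\n"
    "C (comment continued...)\n"
)
annotated, info = parse_annotate(sample)
```

A consumer that already keeps its own hash -> info map can ignore the second
return value, which matches the "cut off at the first empty line" idea.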
> I've read the discussion and I think most of the formats look great
> (both machine and readable), but it'd be nice if the machine-readable
> ones could export the same (or more) information than the current --xml,
> for the reasons stated above.
> These are some things in XML output that caused trouble for darcsweb in
> the past, and maybe could be avoided/improved in the new format:
> - Encoding of code: in particular non-utf8 files, or files with a mix.
> - Non-printable characters in code: things like ^L are common, if you
> are escaping some of them, please make it easy to handle.
The above format isn't escaping anything, since it's line-based, so it
doesn't need to. You get the literal source lines after the |, as
delimited by \n.
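
To illustrate why no escaping is needed, here is a small sketch that handles
the annotation as raw bytes, so non-UTF-8 content and control characters such
as ^L pass through untouched. It assumes the " | " delimiter from the example
above and that the hash itself is plain ASCII; split_annotated_line is a
hypothetical helper name.

```python
def split_annotated_line(raw):
    """Split one raw output line into (hash, untouched source bytes)."""
    key, _, content = raw.partition(b" | ")
    # Only the hash is decoded; the source line stays as bytes.
    return key.decode("ascii"), content


# One Latin-1 encoded line and one line starting with a form feed (^L).
data = b"hash1 | caf\xe9\nhash2 | \x0cint main()\n"

# Split on b"\n" only: str.splitlines() would also break on form feed.
rows = [split_annotated_line(line) for line in data.split(b"\n") if line]
```

Decoding the content for display (with whatever encoding or error handling
the tool prefers) is then a separate step that the format does not constrain.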
> - Date formats: please use a normalized date format (ISO would be IMHO
> a nice choice), and avoid timezone names if possible, using [+-]XXXX
> instead. Timezone names are very problematic to parse.
> - Encoding of the author's name. Remember that people may put weird
> characters in their name and it should be handled properly.
As long as we don't allow newlines there, shouldn't be a problem. (Even
if we do, it actually wouldn't be that much of a problem either.)
> - Names and email addresses: if you are putting names and email
> addresses together, please escape < and > in names, so finding out
> the email address is easier.
Git disallows < and > in author names. I suppose we don't, and < and >
have no special meaning to darcs. If you need to parse the email, going
for the last <...> pair would make most sense. (I.e. .*<(.*?)>.*?
written as a Perl regex.)
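
The "last <...> pair" rule can be sketched as follows. This is only an
illustration of the heuristic, not something darcs does; extract_email is a
hypothetical name, and the pattern is a slightly stricter variant of the one
above (it forbids < and > inside the captured address).

```python
import re

# Greedy ".*" pushes the match to the last "<" that is closed by a ">".
_LAST_ANGLE_PAIR = re.compile(r".*<([^<>]*)>")


def extract_email(author):
    """Return the contents of the last <...> pair, or None if there is none."""
    match = _LAST_ANGLE_PAIR.match(author)
    return match.group(1) if match else None


email = extract_email("Weird <name> Person <me@example.org>")
```

Names that themselves contain < or > still work, as long as the address is
the final bracketed part of the string.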
> - Binary files: while this has not been a problem, it's a very nice
> feature to know from darcs which files it considers binary.
I am not sure, but how was this indicated in the XML format? Do you mean
(In that case, it's actually not machine-parseable information, since it
coincides with a text file containing a single line, "Binary file".)
> Also, if you are going to deprecate --xml, please make sure there is a
> way to reliably detect the availability of the new output in a
> backwards-compatible way.
Is this good enough?
darcs failed: unrecognized option `--machine'
(it also gives error code 2, while other failures seem to give error
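
A front end could probe for the new output along these lines. This is a
sketch under assumptions: --machine is the flag proposed in this thread and
may not exist in any given darcs, the exact arguments passed to darcs
annotate are illustrative, and the classification relies only on the exit
code 2 and the "unrecognized option" message shown above. Both function names
are hypothetical.

```python
import subprocess


def machine_output_supported(returncode, stderr):
    """Classify the result of running darcs with the proposed --machine flag."""
    if returncode == 0:
        return True
    if returncode == 2 and "unrecognized option" in stderr:
        # Older darcs: fall back to --xml.
        return False
    raise RuntimeError("darcs failed for another reason: " + stderr.strip())


def probe(path):
    """Run darcs once on a tracked file and report whether --machine works."""
    # Hypothetical invocation; adjust to the final command-line syntax.
    result = subprocess.run(
        ["darcs", "annotate", "--machine", path],
        capture_output=True,
        text=True,
    )
    return machine_output_supported(result.returncode, result.stderr)
```

Keeping the classification separate from the subprocess call makes it easy
to check against the error text without having darcs installed.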