[darcs-users] annotate output

Alberto Bertogli albertito at blitiri.com.ar
Fri Oct 15 15:23:23 UTC 2010

Sorry for sending this so late, but I'm catching up with darcs-related

On Tue, Sep 14, 2010 at 02:12:13PM +0200, Petr Rockai wrote:
> Instead, I could offer a list of key->info mapping as part of the
> machine format at the start or end (probably end). I imagine it could
> look like:
> <the annotation>
> <hash1>
> <patch info>
> <hash2>
> <patch info>
> ...
> I did not include before because I only had feedback from Lele who said
> it is redundant in his case (since he already maintains a <hash> ->
> <info> map internally). I don't think it'd be too costly to add the map
> for anyone (if the infos are not interesting, you can simply cut off at
> the first empty line).

That depends on the tool.

Darcsweb relies on annotate --xml output to show the annotate page, and
if it had only the hash ids, that would mean extra darcs invocations to
get authorship information.

That is so because darcsweb does not rely on any database, or persistent
state. It's supposed to be a light, easy to install and read-only cgi

I can imagine that, for example, a short-lived graphical annotate browser
(like git gui blame) could have similar requirements.

I've read the discussion and I think most of the formats look great
(both machine-readable and human-readable), but it'd be nice if the
machine-readable ones could export at least as much information as the
current --xml, for the reasons stated above.

Maybe a short --machine, and an optional --machine=long or something
like that, where the latter would behave as you propose, with an initial
hash -> info map.
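
As a rough illustration of how a stateless tool might consume the long
form, here is a minimal sketch. It assumes the layout from the quoted
proposal (the annotation, then an empty line, then alternating hash and
patch info lines) and assumes that annotation lines are never empty.
Neither is a settled format, and the function name is hypothetical.

```python
def split_annotate_output(text):
    """Split machine output into (annotation_lines, {hash: patch_info}).

    Assumed layout (not a settled darcs format): annotation lines, one
    empty line, then alternating <hash> / <patch info> lines.
    """
    lines = text.splitlines()
    try:
        # Everything before the first empty line is the annotation itself.
        cut = lines.index("")
    except ValueError:
        # Short form: no map was appended.
        return lines, {}
    annotation = lines[:cut]
    rest = [line for line in lines[cut + 1:] if line != ""]
    if len(rest) % 2 != 0:
        raise ValueError("hash without a matching patch info line")
    # Pair each hash (even positions) with the info line that follows it.
    info = dict(zip(rest[0::2], rest[1::2]))
    return annotation, info
```

A tool that does not care about the infos can ignore the second value,
which matches the "cut off at the first empty line" idea.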

About the format in particular, as long as it is unambiguous and
reasonable to parse, I'm ok with it.

These are some things in XML output that caused trouble for darcsweb in
the past, and maybe could be avoided/improved in the new format:

 - Encoding of code: in particular non-utf8 files, or files with a mix.
 - Non-printable characters in code: things like ^L are common, if you
   are escaping some of them, please make it easy to handle.
 - Date formats: please use a normalized date format (ISO would be IMHO
   a nice choice), and avoid timezone names if possible, using [+-]XXXX
   instead. Timezone names are very problematic to parse.
 - Encoding of the author's name. Remember that people may put weird
   characters in their name and it should be handled properly.
 - Names and email addresses: if you are putting names and email
   addresses together, please escape < and > in names, so finding out
   the email address is easier.
 - Binary files: while this has not been a problem, it's a very nice
   feature to know from darcs which files it considers binary.
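
To show why the date and name/email points matter to a consumer, here is
a small sketch of the parsing side. It assumes dates are emitted as
YYYY-MM-DDTHH:MM:SS followed by a numeric [+-]XXXX offset, and that the
author field looks like "Name <email>". Both are assumptions for
illustration, not the actual darcs output, and the function names are
hypothetical.

```python
from datetime import datetime


def parse_patch_date(stamp):
    """Parse e.g. '2010-10-15T15:23:23+0000' into a timezone-aware datetime.

    A numeric offset needs no timezone-name lookup table.
    """
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S%z")


def split_author(author):
    """Split 'Name <email>' into (name, email); email is None if absent.

    Takes the text between the last '<' and a trailing '>'. This is only
    reliable if '<' and '>' inside the name are escaped by the producer.
    """
    author = author.strip()
    if author.endswith(">") and "<" in author:
        name, _, email = author[:-1].rpartition("<")
        return name.strip(), email
    return author, None
```

With timezone names instead of offsets, the first function would need a
name-to-offset table, and such names are ambiguous.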

Also, if you are going to deprecate --xml, please make sure there is a
way to reliably detect the availability of the new output in a
backwards-compatible way.

That is so tools can try to use the new format, and if it fails they can
fall back to the old one. One simple possibility is making sure
current/old darcs exits with code != 0 when invoked with the new flag,
and also different from the one used by darcs --machine (or whatever) to
signal an error.
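
A sketch of what that fallback could look like in a tool, under stated
assumptions: the --machine flag is only the name floated in this thread,
and UNSUPPORTED_EXIT is a placeholder for whatever status an old darcs
returns for an unknown flag. Neither value is taken from darcs itself.

```python
import subprocess

# Placeholder value, not taken from darcs: the exit status an old darcs
# is assumed to return when it does not know the new flag.
UNSUPPORTED_EXIT = 2


def run_command(cmd):
    """Run cmd and return (exit_status, stdout_bytes)."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return proc.returncode, proc.stdout


def annotate(path, run=run_command, unsupported_exit=UNSUPPORTED_EXIT):
    """Try the new format first; fall back to --xml on an old darcs.

    Returns (format_name, raw_output). '--machine' is a hypothetical flag.
    """
    code, out = run(["darcs", "annotate", "--machine", path])
    if code == 0:
        return "machine", out
    if code != unsupported_exit:
        # The flag exists but the command really failed: do not mask it.
        raise RuntimeError("annotate --machine failed with status %d" % code)
    code, out = run(["darcs", "annotate", "--xml", path])
    if code != 0:
        raise RuntimeError("annotate --xml failed with status %d" % code)
    return "xml", out
```

The fallback is only safe if the two exit statuses differ. Otherwise a
real error from a new darcs would silently trigger the XML path.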

Thanks a lot,
