[darcs-users] Escaping of hunks and file names
Alexander Staubo
alex at byzantine.no
Fri Nov 5 06:08:14 UTC 2004
Is there a reason why Darcs escapes file names and text differently?
$ darcs what
{
hunk ./test.xml 1
-<?xml version="1.0" encoding="UTF-8"?>
+\ef\bb\bf<?xml version="1.0" encoding="UTF-8"?>
addfile ./with\32\spaces.xml
}
The hunk shows a file that now starts with a UTF-8 byte-order mark (the
bytes EF BB BF). The addfile patch names a file called "with spaces.xml".
For the diff, Darcs uses \xx, where xx is a hexadecimal byte value. For
the file name, however, it seems to use \yy\, where yy is a decimal
number.
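Here is a minimal Python sketch of a decoder for each convention. It assumes the rules are exactly what the sample output suggests: two hex digits after a backslash in hunk text, and a decimal number between two backslashes in file names. The sample does not show how Darcs escapes a literal backslash, so input containing one may be mis-decoded. The function names are hypothetical.

```python
import re

# Assumption: both rules are inferred only from the sample output above,
# not from the Darcs source.
#   hunk text:  "\xx"  -> one raw byte, xx in hexadecimal
#   file names: "\nn\" -> one character, nn in decimal
HUNK_ESCAPE = re.compile(rb"\\([0-9a-fA-F]{2})")
NAME_ESCAPE = re.compile(r"\\(\d+)\\")


def unescape_hunk_line(line: bytes) -> bytes:
    """Replace each hex escape with the raw byte it stands for."""
    return HUNK_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), line)


def unescape_file_name(name: str) -> str:
    """Replace each decimal escape with the character it stands for."""
    return NAME_ESCAPE.sub(lambda m: chr(int(m.group(1))), name)


print(unescape_hunk_line(rb'\ef\bb\bf<?xml version="1.0" encoding="UTF-8"?>'))
print(unescape_file_name(r"./with\32\spaces.xml"))
```

The hunk decoder works on bytes because the escapes stand for raw octets. The file-name decoder maps each number to a single character. That is only known to be right for ASCII names like this one.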
I'm parsing Darcs' output in a tool I'm writing. For the sake of
simplicity and script-friendliness, it would be preferable if Darcs were
consistent here.
The escaping also occurs in the XML output:
$ darcs changes --xml-output
<changelog>
...
<add_file>with\32\spaces.xml</add_file>
...
</changelog>
In XML element content, only "<" and "&" must be escaped (">" usually
is too, by convention), so this extra escaping just pollutes the format.
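The following sketch uses Python's standard library to illustrate the point. The element name is copied from the sample above and the file name is made up. Entity escaping alone is enough for any XML parser to recover a name containing spaces and markup characters. One caveat: XML 1.0 cannot represent most control characters at all, so file names containing those would still need some extra convention.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Illustrative file name; it is written raw, with only XML's own
# entity escaping (&, <, >) applied.
name = "with spaces & <brackets>.xml"
fragment = "<add_file>%s</add_file>" % escape(name)
print(fragment)  # <add_file>with spaces &amp; &lt;brackets&gt;.xml</add_file>

# A standard parser restores the original name with no Darcs-specific rules.
assert ET.fromstring(fragment).text == name
```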
I'm surprised that Darcs even escapes file names. However, in many
places it *doesn't* escape anything:
$ darcs what -s
A ./with spaces.xml
$ darcs changes -s
Fri Nov 5 07:02:39 W. Europe Standard Time 2004 alex at byzantine.no
* More testing.
A ./with spaces.xml
Again, a consistent syntax would be great.
The *lack* of escaping actually causes ambiguity in the case of modified
lines:
$ darcs what -s
M ./with spaces.xml -2 +3
Here a script has to take the line counts from the last two
whitespace-separated fields (fields[-2] and fields[-1]) and treat
everything in between as the file name. And what if the file name (for
whatever reason) *ends* with a space? That is hardly a format scripts
can safely depend on.
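Here is a sketch of what such a parser ends up looking like. It assumes the layout in the sample: the status, then the path, then " -removed +added" on M lines, with single spaces between fields. `parse_summary_line` is a hypothetical name. The parser anchors on the last count pair so that spaces inside the path survive. A naive `line.split()` cannot do this, because it silently drops a trailing space from the name.

```python
import re

# Assumption: layout inferred from the sample output above, with exactly
# one space between fields.
MODIFIED = re.compile(r"^M (.*) -(\d+) \+(\d+)$")


def parse_summary_line(line: str):
    """Return (status, path, removed, added); counts are None when absent."""
    m = MODIFIED.match(line)
    if m:
        # The greedy (.*) leaves only the *last* " -N +N" pair to the
        # counts, so spaces inside the path are kept.
        return "M", m.group(1), int(m.group(2)), int(m.group(3))
    status, path = line.split(" ", 1)
    return status, path, None, None


print(parse_summary_line("M ./with spaces.xml -2 +3"))
print("M ./x  -2 +3".split())  # the naive split loses the trailing space
```

Callers should strip only the newline (`line.rstrip("\n")`), not all whitespace, or the trailing-space case is lost again. A file name containing a newline would still break any line-based parser.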
What, if anything, does Darcs do with Unicode file names? I suspect it
either outputs them as-is, or it escapes each octet individually.
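If the second guess is right, a script has to collect the escaped octets into bytes and decode the result as UTF-8, rather than turning each number into a character. The following sketch models that *hypothetical* scheme, reusing the "\nn\" decimal form seen for spaces. It is not confirmed Darcs behaviour, and both function names are made up.

```python
import re

# Hypothetical scheme: every octet of the UTF-8 encoded name that is not
# printable ASCII (plus space and backslash) becomes "\<decimal>\".


def escape_name_per_octet(name: str) -> str:
    out = []
    for byte in name.encode("utf-8"):
        if 0x21 <= byte <= 0x7E and byte != 0x5C:
            out.append(chr(byte))        # printable ASCII passes through
        else:
            out.append("\\%d\\" % byte)  # space, backslash, non-ASCII octets
    return "".join(out)


def unescape_name_per_octet(escaped: str) -> str:
    raw = bytearray()
    # Tokenize left to right: either an escape or a single plain character.
    for m in re.finditer(r"\\(\d+)\\|([^\\])", escaped):
        if m.group(1) is not None:
            raw.append(int(m.group(1)))  # one escaped octet
        else:
            raw.extend(m.group(2).encode("ascii"))
    return raw.decode("utf-8")           # reassemble multi-byte characters
```

Under this scheme, "with spaces.xml" escapes to "with\32\spaces.xml", which matches the sample. "café.xml" would become "caf\195\\169\.xml".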
Alexander.