[darcs-users] Escaping of hunks and file names

Alexander Staubo alex at byzantine.no
Fri Nov 5 06:08:14 UTC 2004


Is there a reason why Darcs escapes file names and text differently?

$ darcs what
{
hunk ./test.xml 1
-<?xml version="1.0" encoding="UTF-8"?>
+\ef\bb\bf<?xml version="1.0" encoding="UTF-8"?>
addfile ./with\32\spaces.xml
}

The hunk shows a line that now starts with the UTF-8 byte order mark 
(bytes EF BB BF). The addfile patch refers to a file named 
"with spaces.xml".

Darcs uses \xx for the hunk contents, where xx is a hexadecimal number, 
but it seems to use \yy\ (with a trailing backslash) for the file name, 
where yy is a decimal (base-10) number.
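
To make the difference concrete, here is a minimal Python sketch of what 
a parser currently has to do. It is based only on the two examples 
above: I am assuming hunk escapes are always exactly two hex digits and 
file name escapes are one to three decimal digits closed by a backslash. 
The function names are mine, not anything Darcs provides, and I have not 
checked how Darcs escapes a literal backslash.

```python
import re

# Assumption: in hunk text, a byte is written as backslash + two hex digits.
_HEX_ESCAPE = re.compile(rb"\\([0-9a-fA-F]{2})")
# Assumption: in file names, a byte is written as backslash + decimal + backslash.
_DEC_ESCAPE = re.compile(rb"\\([0-9]{1,3})\\")


def _decode_decimal(match):
    value = int(match.group(1))
    # Leave anything that cannot be a single byte untouched.
    return bytes([value]) if value <= 255 else match.group(0)


def unescape_hunk_line(raw: bytes) -> bytes:
    """Decode \\xx (hexadecimal) escapes as seen in hunk lines."""
    return _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)


def unescape_file_name(raw: bytes) -> bytes:
    """Decode \\yy\\ (decimal) escapes as seen in file names."""
    return _DEC_ESCAPE.sub(_decode_decimal, raw)
```

Two decoders for one output stream, and the caller has to know which 
kind of line it is looking at before picking one.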

I'm parsing Darcs' output in a tool I'm writing. For the sake of 
simplicity and script-friendliness, it would be preferable if Darcs were 
consistent here.

The escaping also occurs in the XML output:

$ darcs changes --xml-output
<changelog>
...
     <add_file>with\32\spaces.xml</add_file>
...
</changelog>

There is no need to escape anything in XML element content except "<" 
and "&" (and, by convention, ">"), so the backslash escaping just 
pollutes the format.
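
For comparison, this is roughly what I would expect an XML producer to 
do with a file name. It is a sketch using Python's standard library, not 
a description of how Darcs builds its output; the add_file_element 
helper is hypothetical.

```python
from xml.sax.saxutils import escape


def add_file_element(name: str) -> str:
    """Wrap a file name in <add_file>, escaping only &, < and >."""
    return "<add_file>" + escape(name) + "</add_file>"
```

With this, "with spaces.xml" comes out as 
<add_file>with spaces.xml</add_file>, and any XML parser hands the 
original name back without a second unescaping pass.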

I'm surprised that Darcs even escapes file names. However, in many 
places it *doesn't* escape anything:

$ darcs what -s
A ./with spaces.xml

$ darcs changes -s
Fri Nov  5 07:02:39 W. Europe Standard Time 2004  alex at byzantine.no
   * More testing.

     A ./with spaces.xml

Again, a consistent syntax would be great.

The *lack* of escaping actually causes ambiguity in the case of modified 
lines:

$ darcs what -s
M ./with spaces.xml -2 +3

Here you would need to split the line on whitespace, take the line 
counts from the last two fields (line[-2] and line[-1], respectively) 
and treat everything in between as the file name. And what if the file 
name (for whatever reason) *ends* with a space? Not a great way to 
ensure future compatibility with scripts.
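
A sketch of the workaround, in Python: anchor the match at the right-hand 
end of the line instead of splitting on whitespace. It only handles the 
exact "M <name> -<n> +<m>" shape shown above; whether Darcs ever prints 
other shapes for modified files is something I have not verified, and 
parse_modified is my own name.

```python
import re

# Assumption: a modified line is "M ", the name, " -<removed> +<added>".
_MODIFIED = re.compile(r"^M (?P<path>.*) -(?P<removed>\d+) \+(?P<added>\d+)$")


def parse_modified(line: str):
    """Return (path, removed, added), or None if the line does not match."""
    match = _MODIFIED.match(line)
    if match is None:
        return None
    return match.group("path"), int(match.group("removed")), int(match.group("added"))
```

This keeps a trailing space in the name, but only because it relies on 
there being exactly one separator space before the counts, which is 
precisely the kind of undocumented detail a script should not have to 
depend on.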

What, if anything, does Darcs do with Unicode file names? I suspect it 
either outputs them as-is, or it escapes each octet individually.

Alexander.



