[darcs-users] darcs changes --xml is not consistently encoded

Trent W. Buck trentbuck at gmail.com
Sat Oct 11 06:54:18 UTC 2008


There's at least one patch in Darcs's own darcs repo whose metadata is
not encoded in UTF-8.  As the output does not declare an encoding, I
believe the XML standard requires it to be UTF-8 (or UTF-16 with a
byte order mark).

  $ darcs changes --xml |
    xmlstarlet sel -t -m changelog/patches -v @author -n
  -:10179: parser error : Input is not proper UTF-8, indicate encoding !
  Bytes: 0xFC 0x6E 0x7A 0x6C
  <patch author='Daniel B�nzli &lt;daniel.buenzli at epfl.ch&gt;' date='2005112017015

This is in itself a problem, because it prevents me from working with
the darcs metadata programmatically.  But it may also indicate a much
more serious problem: darcs may not convert patch metadata to a single
internal encoding.
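
A minimal sketch of how one might locate the offending lines and work
around them, assuming the stray bytes are Latin-1 (a guess based only
on the 0xFC byte above, which is "ü" in Latin-1).  The function names
and the sample data are hypothetical, not part of darcs:

```python
from typing import List, Tuple


def find_non_utf8_lines(data: bytes) -> List[Tuple[int, bytes]]:
    """Return (1-based line number, raw line) for lines that are not valid UTF-8."""
    bad = []
    # Splitting on b"\n" is safe: a UTF-8 multi-byte sequence never contains 0x0A.
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append((lineno, line))
    return bad


def decode_line(line: bytes, fallback: str = "latin-1") -> str:
    """Decode as UTF-8 if possible; otherwise fall back to an ASSUMED legacy encoding."""
    try:
        return line.decode("utf-8")
    except UnicodeDecodeError:
        return line.decode(fallback)


# Hypothetical sample: the same author name, once as Latin-1 and once as UTF-8.
sample = (
    b"<changelog>\n"
    b"<patch author='B\xfcnzli'/>\n"
    b"<patch author='B\xc3\xbcnzli'/>\n"
    b"</changelog>\n"
)

for lineno, raw in find_non_utf8_lines(sample):
    print(lineno, raw, "->", decode_line(raw))
```

In practice the bytes could come from the standard output of
"darcs changes --xml", for example via subprocess.run(...,
capture_output=True).stdout.  The fallback is only a guess: a line
written in some other legacy encoding would still decode without an
error, but to the wrong characters.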

Suppose there's a two-man repo, and the contributors respectively use
Shift JIS and UTF-16.  Even if they use -*- coding -*- magic in their
source files to agree on a coding there, "darcs changes" might still
emit data that's partly encoded as UTF-16 and partly encoded as
Shift JIS!
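
To illustrate why such a mixture would be unusable, here is a small
sketch that encodes the same (made-up) text once per contributor and
concatenates the results, the way unconverted metadata might end up in
a single stream.  It shows nothing about what darcs actually does; the
helper name and the sample text are hypothetical:

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` decodes cleanly under `encoding`."""
    try:
        data.decode(encoding)
    except UnicodeError:
        return False
    return True


text = "日本語"  # made-up sample text, encodable in both encodings

sjis_part = text.encode("shift_jis")   # first contributor's bytes
utf16_part = text.encode("utf-16")     # second contributor's bytes (with BOM)

# Unconverted metadata from both contributors in one stream.
mixed = sjis_part + b"\n" + utf16_part

for enc in ("shift_jis", "utf-16", "utf-8"):
    print(enc, decodes_as(mixed, enc))
```

Each part decodes fine on its own, but no single encoding decodes the
combined stream, so a consumer has no way to recover the text without
knowing where one encoding ends and the next begins.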

