[darcs-users] darcs changes --xml is not consistently encoded
Trent W. Buck
trentbuck at gmail.com
Sat Oct 11 06:54:18 UTC 2008
There's at least one patch in the darcs repository itself whose
metadata is not encoded in UTF-8. As the output does not declare an
encoding, I believe the XML standard requires it to be UTF-8 (or
UTF-16 with a byte order mark).
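To illustrate the rule, here is a minimal sketch using Python's standard xml.etree.ElementTree parser. The one-element document is a made-up stand-in for the darcs output, not real "darcs changes --xml" data, and the helper name parses is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one patch element whose author name is Latin-1
# encoded, so the u-umlaut is the single byte 0xFC.
LATIN1_XML = "<patch author='Daniel B\u00fcnzli'/>".encode("latin-1")


def parses(data):
    """Return True if the XML parser accepts the given bytes."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# No encoding declaration: the parser assumes UTF-8 and rejects 0xFC.
undeclared_ok = parses(LATIN1_XML)

# Same bytes, but with a declaration naming the actual encoding.
declared = b"<?xml version='1.0' encoding='ISO-8859-1'?>" + LATIN1_XML
declared_ok = parses(declared)
author = ET.fromstring(declared).get("author")
```

The same bytes are rejected without a declaration and accepted with one, so the output needs either a declaration or consistent UTF-8.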
$ darcs changes --xml |
xmlstarlet sel -t -m changelog/patches -v @author -n
-:10179: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xFC 0x6E 0x7A 0x6C
<patch author='Daniel B�nzli <daniel.buenzli at epfl.ch>' date='2005112017015
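To locate every offending record rather than only the first one the parser trips over, a small scan along these lines might help. It is a sketch only: the function names are hypothetical, and it assumes the XML is split on newlines and otherwise intended to be UTF-8.

```python
def find_invalid_utf8_lines(data):
    """Yield (line_number, byte_offset, line) for lines that are not valid UTF-8."""
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start is the offset of the first undecodable byte in the line.
            yield number, err.start, line


def report(data):
    """Return one summary string per invalid line, showing up to four bad bytes."""
    return [
        f"line {number}, byte {offset}: {line[offset:offset + 4].hex(' ')}"
        for number, offset, line in find_invalid_utf8_lines(data)
    ]
```

One way to use it would be to capture the output of "darcs changes --xml" as bytes (for example with subprocess.run and capture_output=True) and pass the result to report.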
This is in itself a problem, as it prevents me from working with the
darcs metadata programmatically. But it may be indicative of a very
serious problem -- darcs may not convert patch metadata to a single
internal encoding.
Suppose there's a two-man repo, and the contributors respectively use
Shift JIS and UTF-16. Even if they use -*- coding -*- magic in their
source files to use the same coding there, it might be that "darcs
changes" emits data that's partly encoded as UTF-16, and partly
encoded as Shift JIS!
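The scenario can be sketched in a few lines of Python. The records and names below are invented for illustration, and the sketch assumes each record's source encoding is known, which is exactly what may not be recorded. It shows what converting to a single encoding would look like, not what darcs actually does.

```python
def to_utf8(raw, source_encoding):
    """Re-encode one metadata record from its source encoding to UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical records, stored as each contributor's tools wrote them.
records = [
    ("author: \u5c71\u7530".encode("shift_jis"), "shift_jis"),
    ("author: Trent".encode("utf-16"), "utf-16"),  # includes a byte order mark
]

# Emitting the stored bytes unchanged gives a stream in no single encoding.
mixed = b"\n".join(raw for raw, _ in records)

# Converting each record first gives a stream that is UTF-8 throughout.
normalized = b"\n".join(to_utf8(raw, enc) for raw, enc in records)
```

Without a recorded encoding per patch, a consumer can only guess, which is why the conversion would have to happen when the patch is recorded.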