[darcs-devel] [issue580] list files converts names to utf-8 even if they already are utf-8
Pekka Pessi
bugs at darcs.net
Wed Jan 9 04:19:29 UTC 2008
New submission from Pekka Pessi <ppessi at gmail.com>:
When the repo contents are shown with darcs list, file names that
contain 8-bit chars (UTF-8 or ISO-8859-* or whatever) are converted to
UTF-8 as if they were ISO-8859-1.
For example, a file named "Ääliö älä lyö ööliä läikkyy" in 8859-1 is the
byte string \e4\e4\6c\69\f6 ...
It is shown with, e.g., darcs changes --summary as the quoted bytestring
[_\e4_][_\e4_]li[_\f6_] ...
With darcs list files it is shown as
./[_\c3_][_\a4_][_\c3_][_\a4_]li[_\c3_][_\b6_] (in other words, it has
been converted into UTF-8 as if it were ISO-8859-1).
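
The conversion described above can be reproduced outside darcs. The
following is only a sketch in Python, not darcs code; the variable names
are made up for illustration, and the bytes are the ones quoted in this
report.

```python
# Leading bytes of the ISO-8859-1 file name as quoted in the report:
# \e4\e4\6c\69\f6
latin1_name = bytes([0xE4, 0xE4, 0x6C, 0x69, 0xF6])

# Interpret every byte as an ISO-8859-1 code point, then encode as UTF-8.
# This mirrors what "darcs list files" appears to do with the name.
converted = latin1_name.decode("iso-8859-1").encode("utf-8")

# Each accented character now takes two bytes.
print(converted.hex(" "))
```

For a name that really is ISO-8859-1 this conversion is harmless; the
trouble only starts when the input is already UTF-8.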
If the file name is encoded in UTF-8, it has the bytestring
\c3\84\c3\a4\6c\69\c3\b6 (each accented char is now encoded in two
bytes). It is shown with, e.g., darcs changes --summary as the quoted
bytestring [_\c3_][_\84_][_\c3_][_\a4_]li[_\c3_][_\b6_].
However, with darcs list files it is shown as
[_\c3_][_\83_][_\c2_][_\84_][_\c3_][_\83_][_\c2_][_\a4_]li[_\c3_][_\83_][_\c2_][_\b6_]
that is, darcs list assumes that the bytestring is an ISO-8859-1 string
and converts it into UTF-8, even though it already is UTF-8.
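
The double encoding can be checked by hand. The sketch below is Python,
not darcs code. The helper to_utf8_once is hypothetical and only
illustrates one possible guard (leave bytes alone when they already
decode as UTF-8); it is not a description of how darcs does or should
fix this.

```python
# UTF-8 bytes of the file name prefix as quoted in the report:
# \c3\84\c3\a4\6c\69\c3\b6
utf8_name = bytes.fromhex("c384c3a46c69c3b6")

# Misreading the UTF-8 bytes as ISO-8859-1 and encoding again doubles
# every non-ASCII byte, which matches the "darcs list files" output.
double_encoded = utf8_name.decode("iso-8859-1").encode("utf-8")
print(double_encoded.hex(" "))


def to_utf8_once(raw: bytes) -> bytes:
    """Hypothetical guard: convert only if raw is not valid UTF-8."""
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8, leave untouched
    except UnicodeDecodeError:
        # Assumption: anything that is not UTF-8 is ISO-8859-1.
        return raw.decode("iso-8859-1").encode("utf-8")


print(to_utf8_once(utf8_name).hex(" "))
print(to_utf8_once(bytes.fromhex("e4e46c69f6")).hex(" "))
```

Such a guard is a heuristic: a byte string can be valid UTF-8 by
coincidence while actually being in another 8-bit encoding.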
Script output from a UTF-8 terminal is attached.
----------
files: list-files-utf8
messages: 2385
nosy: beschmi, droundy, kowey, ppessi, tommy
status: unread
title: list files converts names to utf-8 even if they already are utf-8
__________________________________
Darcs bug tracker <bugs at darcs.net>
<http://bugs.darcs.net/issue580>
__________________________________
-------------- next part --------------
A non-text attachment was scrubbed...
Name: list-files-utf8
Type: application/octet-stream
Size: 1011 bytes
Desc: not available
Url : http://lists.osuosl.org/pipermail/darcs-devel/attachments/20080109/9864714b/attachment.obj