[darcs-users] Debugging issue1739-escape-multibyte-chars-correctly.sh on tn23

Mon Apr 5 19:59:07 UTC 2010

Hi all,

On Sunday 04 April 2010 15:45 you wrote:
> The earlier investigation indicated that darcs decodes the contents of files
> that it reads with readLocaleFile, such as the file read when specifying the
> --logfile option, using "the console's encoding". To me, this is a debatable
> choice.

At least on Linux, "the console's encoding" is the locale encoding, which is
configured by environment variables and data files and implemented by libc. It
is also used by the C library in its multibyte character string functions.
"less", for instance, uses it to decode its input and indicates an error when
decoding fails. It's also the encoding in which your text editor will save
your new files if you haven't specified an encoding.
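
To illustrate the point about the locale encoding coming from the environment,
here is a minimal Python sketch. It is not darcs code, and the function names
locale_encoding and decode_with_locale are made up for this example. It only
shows that the encoding name is derived from LC_ALL / LC_CTYPE / LANG through
the C library, and that a strict decode of bytes in some other encoding fails
instead of silently producing garbage.

```python
import locale


def locale_encoding() -> str:
    """Return the name of the encoding selected by the process locale."""
    try:
        # "" means: take the locale from LC_ALL / LC_CTYPE / LANG.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # The environment names a locale that is not installed;
        # keep whatever locale is currently active (normally "C").
        pass
    return locale.getpreferredencoding(False)


def decode_with_locale(raw: bytes) -> str:
    """Decode bytes the way a locale-aware reader would (strict errors)."""
    return raw.decode(locale_encoding())
```

Running this under LANG=en_US.UTF-8 and then under an ISO-8859-15 locale (if
one is installed) should report different encoding names for the same program.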

Maybe on Windows, it is not as clear that the console's encoding is also the
encoding we should assume for files whose encoding we don't know. But then I
think it's up to Windows users to submit a patch to do something more
sensible.

I'm now going to submit a patch that makes issue1739 skip on non-UTF-8
locales. This should be safe because darcs's behavior on non-UTF-8 locales is
tested in the utf8.sh test (it will skip on pretty much all stock Linux
systems because it requires an ISO-8859-15 locale to be available, but at
least it runs and still passes on my laptop :)).
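
The skip decision itself is simple. The sketch below shows the idea in Python
for illustration only; the actual test is a shell script and the actual patch
may look quite different. The helper names is_utf8_codeset and should_skip are
hypothetical. The codeset name would come from the locale (for example the
output of "locale charmap"), and spelling variants such as "utf8" and "UTF-8"
are treated as the same encoding.

```python
import codecs


def is_utf8_codeset(name: str) -> bool:
    """True if the codeset name is some spelling of UTF-8."""
    try:
        # codecs.lookup normalizes aliases: "UTF-8", "utf8", "U8" -> "utf-8".
        return codecs.lookup(name).name == "utf-8"
    except LookupError:
        # Unknown codeset name: treat it as not UTF-8.
        return False


def should_skip(codeset: str) -> bool:
    """Skip the UTF-8-specific test unless the locale codeset is UTF-8."""
    return not is_utf8_codeset(codeset)
```

With this check, an ISO-8859-15 or plain "C" locale would skip the test, and
any UTF-8 locale would run it.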

Reinier