[darcs-users] haskeline and character encodings
Trent W. Buck
trentbuck at gmail.com
Fri Feb 6 02:15:42 UTC 2009
Judah Jacobson <judah.jacobson at gmail.com> writes:
> On Thu, Feb 5, 2009 at 9:53 AM, Eric Kow <kowey at darcs.net> wrote:
>> Hi Judah,
>> Just a quick question about haskeline. I notice it uses iconv (and if I
>> remember correctly, gives us UTF-8 characters).
>> Any implications for our current character encoding woes?
> The way that Haskeline is currently used in Darcs should not change
> its encoding behavior at all. (In fact, Darcs still gets 8-bit Chars
> in whatever encoding the console is set to.)
If you want to fix this, I won't object ;-)
> However, Haskeline does provide extra functionality to help with
> - All the functions from System.Console.Haskeline (for both input and
> output) automatically convert the console's encoding to/from decoded
> (i.e., 32-bit) Unicode Chars.
By "Chars" there I guess you mean codepoints? I don't think Unicode
uses the term "character" anywhere.
> So for console input and printing, Haskeline can provide what you
> need, although its behavior when printing unencodable Chars might need
> to be improved slightly. (Currently, it drops them from the output.)
I would prefer it to follow iconv(1) and either throw an error or
attempt to transliterate. As a comparison, Mercurial (a competing VCS)
converts all unencodable codepoints to the question mark.
$ echo 'μ≥ü' | iconv --to ascii//translit
$ echo 'μ≥ü' | iconv --to ascii
iconv: illegal input sequence at position 0
I would certainly prefer it to print a question mark or some other
placeholder, instead of simply discarding the codepoints.
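
To make the three policies concrete, here is a minimal sketch in Python
(not Haskeline's API; the helper names are made up for illustration). It
uses Python's standard codec error handlers to show dropping, placeholder
substitution, and failing on an unencodable codepoint. Transliteration is
left out because the standard handlers do not provide it.

```python
# Hypothetical helpers illustrating three ways to handle codepoints
# that the target encoding cannot represent.

def encode_drop(text, encoding="ascii"):
    # Silently discard unencodable codepoints
    # (the behaviour described above as undesirable).
    return text.encode(encoding, errors="ignore")

def encode_placeholder(text, encoding="ascii"):
    # Substitute a question mark for each unencodable codepoint.
    return text.encode(encoding, errors="replace")

def encode_strict(text, encoding="ascii"):
    # Refuse to encode: raises UnicodeEncodeError on the first
    # unencodable codepoint, similar in spirit to plain iconv.
    return text.encode(encoding, errors="strict")

if __name__ == "__main__":
    sample = "a\u03bcb"  # 'a', GREEK SMALL LETTER MU, 'b'
    print(encode_drop(sample))
    print(encode_placeholder(sample))
    try:
        encode_strict(sample)
    except UnicodeEncodeError as err:
        print("strict encoding failed:", err.reason)
```

With the placeholder policy the reader can at least see that something
was there, which is the point of the preference stated above.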