[darcs-users] haskeline and character encodings

Trent W. Buck trentbuck at gmail.com
Fri Feb 6 02:15:42 UTC 2009


Judah Jacobson <judah.jacobson at gmail.com> writes:

> On Thu, Feb 5, 2009 at 9:53 AM, Eric Kow <kowey at darcs.net> wrote:
>> Hi Judah,
>>
>> Just a quick question about haskeline.  I notice it uses iconv (and if I
>> remember correctly, gives us UTF-8 characters).
>>
>> Any implications for our current character encoding woes?
>>  http://bugs.darcs.net/issue64
>
> The way that Haskeline is currently used in Darcs should not change
> its encoding behavior at all.  (In fact, Darcs still gets 8-bit Chars
> in whatever encoding the console is set to.)

If you want to fix this, I won't object ;-)

> However, Haskeline does provide extra functionality to help with
> encoding/decoding:
>
>  - All the functions from System.Console.Haskeline (for both input and
> output) automatically convert the console's encoding to/from decoded
> (i.e., 32-bit) Unicode Chars.

By "Chars" there I guess you mean codepoints?  I don't think Unicode
uses the term "character" anywhere.

> So for console input and printing, Haskeline can provide what you
> need, although its behavior when printing unencodable Chars might need
> to be improved slightly.  (Currently, it drops them from the output.)

I would prefer it to follow iconv(1) and either throw an error or
attempt to transliterate.  As a comparison, Mercurial (a competing VCS)
converts all unencodable codepoints to the question mark.

    $ echo 'μ≥ü' | iconv --to ascii//translit
    ?>=u
    $ echo 'μ≥ü' | iconv --to ascii
    iconv: illegal input sequence at position 0

I would certainly prefer it to print a question mark or some other
placeholder, instead of simply discarding the codepoints.
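
To make the placeholder idea concrete, here is a minimal sketch in
Haskell.  It assumes the target encoding is plain 7-bit ASCII, and the
name asciiWithPlaceholder is hypothetical; it is not part of Haskeline
or Darcs, and a real fix would need to ask the console's actual
encoding which codepoints it can represent.

```haskell
import Data.Char (ord)

-- Hypothetical helper (not a Haskeline function): replace every
-- codepoint that 7-bit ASCII cannot encode with the given placeholder,
-- rather than silently dropping it.
asciiWithPlaceholder :: Char -> String -> String
asciiWithPlaceholder placeholder = map replace
  where
    replace c
      | ord c < 128 = c            -- encodable in ASCII: keep as-is
      | otherwise   = placeholder  -- unencodable: substitute

main :: IO ()
main = do
  -- All three codepoints are outside ASCII, so this prints "???".
  putStrLn (asciiWithPlaceholder '?' "μ≥ü")
  -- Encodable text passes through unchanged.
  putStrLn (asciiWithPlaceholder '?' "darcs")
```

Unlike dropping, this keeps the output the same length in codepoints
as the input, so the reader can at least see that something was there.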


