[darcs-users] darcs patch: Re-encode the input from Haskeline as UTF8.

Judah Jacobson judah.jacobson at gmail.com
Sun Dec 28 02:09:14 UTC 2008


On Sat, Dec 27, 2008 at 8:53 AM, Eric Kow <kowey at darcs.net> wrote:
> Hi Judah,
>
> On Sat, Dec 27, 2008 at 08:37:01 -0800, Judah Jacobson wrote:
>> The following patch fixes Unicode input on POSIX systems with the
>> Haskeline backend.  Haskeline returns decoded Chars, but Darcs expects
>> the input to be encoded (which is the behavior of the non-Haskeline
>> backend).  This fix re-encodes the input received from Haskeline into
>> UTF-8.
>
> Thanks for the explanation, but I think I'm going to need a little more
> help understanding this patch.  What do you mean by "Darcs expects the
> input to be encoded"?  Is it because we use getLine which just assumes
> that the input encoding is ISO-8859-1 (or actually, as I understand it,
> the first 256 code points in the Unicode table, which happens to be the
> same)?

Yes, that's right.  For example:

Prelude> getLine
[user input:]α
"\206\177"
Prelude> :m +System.Console.Haskeline
Prelude System.Console.Haskeline> runInputT defaultSettings $ getInputLine ""
[user input:]α
Just "\945"

Haskeline decodes the two bytes into one Char, but Darcs expects to
receive the encoded string, since it uses standard functions like
writeFile, putStr, etc., which discard all but the low 8 bits of each
Char they are given.
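
To make the mismatch concrete, here is a minimal sketch of both halves:
a pure model of the 8-bit truncation described above, and a re-encoding
step that turns decoded Chars back into one Char per UTF-8 byte.  The
names truncateTo8Bits and encodeUtf8 are made up for this illustration
and are not necessarily what the patch uses; the sketch also assumes
the input contains only valid Unicode scalar values.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)

-- Hypothetical model of the truncation described above: keep only the
-- low 8 bits of each Char.
truncateTo8Bits :: String -> String
truncateTo8Bits = map (chr . (.&. 0xFF) . ord)

-- Hypothetical helper: encode decoded Chars as UTF-8, returning one
-- Char per output byte (each in the range '\0'..'\255').
encodeUtf8 :: String -> String
encodeUtf8 = concatMap (map chr . encodeChar . ord)

-- UTF-8 byte values for a single code point.
encodeChar :: Int -> [Int]
encodeChar c
  | c < 0x80    = [c]
  | c < 0x800   = [0xC0 .|. (c `shiftR` 6), cont c]
  | c < 0x10000 = [0xE0 .|. (c `shiftR` 12), cont (c `shiftR` 6), cont c]
  | otherwise   = [ 0xF0 .|. (c `shiftR` 18), cont (c `shiftR` 12)
                  , cont (c `shiftR` 6), cont c ]
  where
    -- Continuation byte: 10xxxxxx carrying the low 6 bits of x.
    cont x = 0x80 .|. (x .&. 0x3F)
```

With these definitions, truncateTo8Bits "\945" gives "\177", which is
how the decoded α gets mangled on output, while encodeUtf8 "\945" gives
"\206\177", the same string that plain getLine returned in the session
above.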

> Also, what if the user is not using UTF-8 (ugh!) in their terminal?

Haskeline currently can't handle encodings other than ASCII or UTF-8 on
POSIX systems.  (I'm hoping to fix that soon.)  How serious a concern
is this?

-Judah

