[darcs-users] haskeline and character encodings

Judah Jacobson judah.jacobson at gmail.com
Mon Feb 9 01:15:53 UTC 2009


On Sun, Feb 8, 2009 at 2:02 PM, Eric Kow <kowey at darcs.net> wrote:
> Thanks for your answer, Judah
>
>> The way that Haskeline is currently used in Darcs should not change
>> its encoding behavior at all.  (In fact, Darcs still gets 8-bit Chars
>> in whatever encoding the console is set to.)
>
> So my impression is that Haskeline implicitly decodes (from the
> console's locale) and then explicitly re-encodes it to emulate darcs's
> behaviour of being 100% ignorant about character encodings.

Right.

> Just one last fit of paranoia: are there any corner cases in which this
> decode-and-re-encode process could go wrong (i.e. any more wrong than
> being completely ignorant about encodings)?  For example, what if my
> locale is mis-set? Can that affect anything?  What if I'm passing user
> input to darcs from a file or something?

Haskeline will drop any characters it can't encode or decode.  (In the
next minor release I'll make it convert them to '?'.)  For entry from
the console, this isn't really an issue since the user will be able to
notice any missed characters while entering the line.

If input is piped in from a file whose encoding differs from the
locale's, it's true that some characters could be dropped or converted
to '?'.  But this sort of issue can already occur with the
encoding-ignorant backend (for example, if the file is saved as
UTF-16).  Handling it seems beyond the scope of what Haskeline should
be taking care of.
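
To make the dropped-versus-'?' distinction above concrete, here is a
rough sketch of a decode-and-re-encode round trip.  It is written in
Python and uses Python's own codec error handlers as a stand-in; it is
not Haskeline's code, and the `roundtrip` helper and the sample strings
are made up for illustration.

```python
# Illustration only: Python's codec error handlers stand in for the
# behaviour described in this message. This is not Haskeline's code.

def roundtrip(raw: bytes, locale_encoding: str, errors: str) -> bytes:
    """Decode bytes using the locale's encoding, then re-encode them."""
    text = raw.decode(locale_encoding, errors=errors)
    if errors == "replace":
        # Python marks undecodable bytes with U+FFFD; map that to '?'
        # to mimic the "convert them to '?'" behaviour.
        text = text.replace("\ufffd", "?")
    return text.encode(locale_encoding, errors=errors)

# Hypothetical inputs: the same word saved in two different encodings.
latin1_input = "café".encode("latin-1")  # b'caf\xe9'
utf8_input = "café".encode("utf-8")      # b'caf\xc3\xa9'

# Locale says UTF-8 but the bytes are Latin-1: the stray byte is lost.
dropped = roundtrip(latin1_input, "utf-8", "ignore")
# Same mismatch, but the bad byte is replaced instead of dropped.
marked = roundtrip(latin1_input, "utf-8", "replace")
# Bytes that match the locale survive the round trip unchanged.
intact = roundtrip(utf8_input, "utf-8", "ignore")
```

When the bytes match the locale, the round trip is lossless; only
mismatched input loses or replaces characters.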

> I'm guessing it should be ok, just want to make sure we give this a
> good vigorous shake... :-)

I'm grateful for all of the poking and shaking that integration into
Darcs has given Haskeline; it's made the library much more useful and
robust!

-Judah

