[darcs-users] blue color bug

Alex Shinn foof at synthcode.com
Mon Jul 5 06:21:45 UTC 2004


At 05 Jul 2004 08:03:45 +0200, Juliusz Chroboczek wrote:
> 
> > 1) current behaviour
> > 2) blind output
> > 3) pretend everything is in UTF8 and convert to current locale
> > 4) pretend everything is in 8859-1 and convert to current locale
> > 5) have a per-repository setting for encoding/charset and convert to
> >    current locale.
> 
> 6. If it's in [0x20..0x7E] ++ [0x80..0xFF], it's text, otherwise, it's
> binary.
> 
> The above test will correctly detect text in all Unix locales.  It
> will generate some false positives; however, as true binary data
> usually contains 0x00, I'd expect the false positives to be rather
> rare.

This is a very good way to determine whether the data is text at all,
but if it is, you still need to handle the text encoding, so it's more
of a pre-step to the other options than an alternative to them.
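
For concreteness, here is a rough Python sketch of the byte-range test
from option 6, used purely as a text/binary pre-check.  The function
name looks_like_text is made up for illustration, and allowing tab, LF
and CR is my own assumption: the rule as quoted lists only 0x20..0x7E
and 0x80..0xFF, which taken literally would reject any multi-line file.

```python
# Whitespace control bytes accepted in addition to the quoted ranges.
# This is an assumption of the sketch, not part of the quoted rule.
ALLOWED_CONTROL = frozenset(b"\t\n\r")


def looks_like_text(data: bytes) -> bool:
    """Return True if every byte is in 0x20..0x7E or 0x80..0xFF,
    or is one of the allowed whitespace control bytes."""
    for b in data:
        if 0x20 <= b <= 0x7E or 0x80 <= b <= 0xFF:
            continue
        if b in ALLOWED_CONTROL:
            continue
        # Anything else (NUL, other control bytes, DEL) counts as binary.
        return False
    return True


if __name__ == "__main__":
    print(looks_like_text(b"plain ASCII\n"))
    print(looks_like_text("caf\u00e9".encode("iso-8859-1")))
    print(looks_like_text(b"\x00\x01\x02"))
```

Note that a True result says nothing about which encoding the bytes are
in, which is why the encoding question still has to be answered
afterwards.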

> 7. Try to determine if it's text in the current locale.  If it's not,
> treat it as binary.
> 
> If the user is working with a repository in UTF-8 while living in an
> ISO-2022-JP locale, he's got other problems.

Projects very frequently have mixed encodings, especially if they are
internationalized (e.g. each .po file will generally be in a different
encoding that has nothing to do with the user's locale).  Also, many
projects are currently transitioning from a native encoding to UTF-8,
or have any number of other valid reasons for working with multiple
encodings.
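
To illustrate the mismatch, a small Python sketch follows.  The helper
names is_text_in and convert are hypothetical, and it assumes the
source encoding of each file is already known from somewhere (for
instance a per-repository setting as in option 5).  It is only meant to
show why testing against the current locale alone misclassifies
perfectly good text.

```python
def is_text_in(data: bytes, encoding: str) -> bool:
    """Option 7 style check: do the bytes decode cleanly in `encoding`?"""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


def convert(data: bytes, source: str, target: str) -> bytes:
    """Option 5 style conversion: decode from a known source encoding and
    re-encode for the target, replacing unrepresentable characters with '?'."""
    return data.decode(source).encode(target, errors="replace")


if __name__ == "__main__":
    latin1_text = "caf\u00e9".encode("iso-8859-1")

    # Valid text, but not valid UTF-8, so a UTF-8 locale would call it binary.
    print(is_text_in(latin1_text, "utf-8"))
    # ISO-8859-1 accepts every byte sequence, so this check cannot reject anything.
    print(is_text_in(latin1_text, "iso-8859-1"))
    # With the source encoding known, conversion to the locale is straightforward.
    print(convert(latin1_text, "iso-8859-1", "utf-8"))
```

The same file is "binary" or "text" depending only on which encoding
you guess, so the guess has to come from the repository or the file,
not from the user's locale.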

-- 
Alex



