[darcs-users] blue color bug

Peter "Firefly" Lund firefly at diku.dk
Sat Jul 3 01:05:28 UTC 2004

On Fri, 2 Jul 2004, Tommy Pettersson wrote:

> darcs 0.9.22.  In whatsnew output, if a hunk contains the
> first line of a file, chars above 127 (8:th bit set) in this
> particular hunk are not printed in current locale, but are
> printed as \xx and in bold blue.

[here follow some ramblings that try to lay out what the options are and
 which probably ignore what Tommy tried to report]

Hmmm, what /should/ darcs display?

It can't blindly assume that those bytes are part of an ISO 8859-1 message
(which is probably what they are in your case and definitely what they
would be in my case).  It can't blindly assume they are part of some UTF8
text either (as would probably be the case if you were a Red Hat user),
can it?

Should it just blindly send them to stdout?  As long as they are not lo/hi
control codes, things won't go /terribly/ wrong, it'll just look weird and
a few chars will be garbled.

Should it actively try to switch into/out of UTF8 with an escape code first?

Most of my stuff is either docs in English or source code (with comments
in English) so I wouldn't run into this problem.  If, however, I used
darcs for a document in Danish, I would.

As a user, I would then want darcs to treat the three extra Danish letters
just like the remaining 26 letters we use in Danish instead of as coloured
computer codes.  I would probably decide on a project-by-project basis
whether I would use 8859-1 or UTF8 for my documents. I expect 8859-1 text
documents to be a dying breed, btw.

I think these are the practical options in order of ease of implementation:

1) current behaviour
2) blind output
3) pretend everything is in UTF8 and convert to current locale
4) pretend everything is in 8859-1 and convert to current locale
5) have a per-repository setting for encoding/charset and convert to
   current locale.
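
To make option 5 concrete, here is a minimal sketch in Python (not darcs's
own Haskell).  It assumes the repository encoding is known from some
per-repository setting; the function name render_for_terminal and its
parameters are made up for illustration and are not part of darcs.  Only
the bytes sent to the terminal are converted, never the stored content.

```python
import locale
from typing import Optional


def render_for_terminal(raw: bytes,
                        repo_encoding: str = "iso-8859-1",
                        terminal_encoding: Optional[str] = None) -> bytes:
    """Convert patch bytes for display only (hypothetical helper).

    repo_encoding stands in for a per-repository setting; by default
    terminal_encoding is taken from the current locale.
    """
    if terminal_encoding is None:
        # Ask the locale which encoding the user's environment prefers.
        terminal_encoding = locale.getpreferredencoding(False)
    # Undecodable bytes become U+FFFD instead of raising an error.
    text = raw.decode(repo_encoding, errors="replace")
    # Characters the terminal encoding cannot represent become "?".
    return text.encode(terminal_encoding, errors="replace")
```

Options 3 and 4 are the same call with repo_encoding hard-wired to "utf-8"
or "iso-8859-1" respectively.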

There are other options, of course: assume (but check!) that everything is
in UTF8 (without overlong sequences), use the new Linux ioctl to ask the
terminal whether it is in UTF8 mode or not, use a simple VT100 sequence to
autodetect whether the terminal is in UTF8 mode or not (I have an
implementation of this I can dig up if somebody is interested), etc.
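
The "assume (but check!)" idea can be sketched in a few lines of Python.
This relies on Python's strict UTF-8 decoder, which rejects overlong
sequences; the function name looks_like_utf8 is made up for illustration.
Note that pure ASCII passes the check too, so a True result does not rule
out other encodings.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 (hypothetical helper)."""
    try:
        # Strict decoding fails on stray high bytes, truncated
        # sequences, and overlong encodings such as b"\xc0\xaf".
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True
```

ISO 8859-1 text with accented letters almost never happens to form valid
UTF-8, which is why the check is a reasonable heuristic rather than a
proof.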

In no case do I want to do any conversion of actual file/patch content.
This is entirely for output purposes!

ad 2) blind output is bad with binary files and some of the more insane
      Chinese/Japanese encodings (some of them use escape codes).
      /Some/ filtering/escaping seems to be necessary.
ad 3-5) can mostly be taken care of with iconv (which can handle pretty
      much any wacky encoding/charset known to Man).
ad 5) is probably the Right thing to do.  It's just that Gabriel's "Worse
      is Better" essay keeps echoing in my mind for some reason ;)
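
As a rough illustration of the filtering mentioned under "ad 2", the
Python sketch below escapes only control codes and lets ordinary letters
(including the Danish ones) through.  It works on already-decoded text;
the function name escape_controls and the \xNN output format are
assumptions for the example, not what darcs does.

```python
def escape_controls(text: str) -> str:
    """Escape terminal control codes, keep printable text (hypothetical)."""
    out = []
    for ch in text:
        cp = ord(ch)
        # "lo" control codes: C0 range, minus newline and tab.
        is_c0 = cp < 0x20 and ch not in "\n\t"
        # "hi" control codes: C1 range, which includes CSI (0x9b).
        is_c1 = 0x80 <= cp <= 0x9F
        if is_c0 or cp == 0x7F or is_c1:
            out.append(f"\\x{cp:02x}")
        else:
            out.append(ch)
    return "".join(out)
```

With this, an ESC byte in a hunk shows up as \x1b instead of being
interpreted by the terminal, while "blå" is printed unchanged.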
