[darcs-users] darcs.cgi 1.0.1 and character set: converting iso-8859-1 to utf-8 for old log entries and patch names?

Matthias Andree matthias.andree at gmx.de
Tue Jan 11 16:35:36 UTC 2005


On Tue, 11 Jan 2005, Matthias Andree wrote:

> Whenever the CGI script steps on these characters, apache2 logs errors
> like these (sorry - long lines). I'm not sure who exactly complains, if
> it's Perl, some Perl module, darcs or whoever.

Turns out it's xsltproc that complains.

I have hacked darcs.cgi to assume that every line that is not valid
UTF-8 is in ISO-8859-1 and to convert it to UTF-8. It's nothing more
than a hack, and I do not suggest this patch for inclusion because it
treats all text the same way, including patch names. It is my personal
stop-gap until I know how to convert the logs to UTF-8 after the
fact.

Besides, someone else might need to convert from ISO-8859-2 or
ISO-8859-7. The cvs2darcs stuff should be fixed instead -
that's the right place to convert $ANYTHING to UTF-8, if that's what
darcs should use.

Perl-5.8 only, no warranties except "appears to work for me":

Tue Jan 11 16:51:11 CET 2005  Matthias Andree <matthias.andree at gmx.de>
  * Automatically assume XML non-UTF-8 strings are ISO-8859-1 and convert them.
diff -rN -u darcs-old/cgi/darcs.cgi.in darcs-new/cgi/darcs.cgi.in
--- darcs-old/cgi/darcs.cgi.in	2005-01-11 17:30:14.000000000 +0100
+++ darcs-new/cgi/darcs.cgi.in	2005-01-11 16:49:02.000000000 +0100
@@ -31,6 +31,8 @@
 
 use strict;
 
+use utf8;
+use Unicode::String;
 use CGI qw( :standard );
 use CGI::Util;
 use File::Basename;
@@ -113,6 +115,9 @@
 
     seek ($xml, 0, 0);
     while (<$xml>) {
+      if (!utf8::decode(my $copy = $_)) {
+        $_ = Unicode::String::latin1($_)->utf8();
+      }
       print $pipe $_;
     }
 }
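
For illustration only, the check-and-convert step can also be sketched as
a standalone helper whose fallback charset is a parameter, so that
ISO-8859-2 or ISO-8859-7 input could be handled the same way. This is a
minimal sketch, not part of darcs.cgi: the name to_utf8 and its $fallback
argument are made up for the example, and it uses the Encode module that
ships with Perl 5.8 instead of Unicode::String. It tests whether the raw
bytes are well-formed UTF-8, not whether Perl's internal UTF8 flag is set.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of darcs.cgi): return $bytes unchanged if
# they are already well-formed UTF-8; otherwise interpret them in the
# $fallback charset (ISO-8859-1 by default) and re-encode as UTF-8.
sub to_utf8 {
    my ($bytes, $fallback) = @_;
    $fallback = 'iso-8859-1' unless defined $fallback;

    # A strict decode dies on malformed input; LEAVE_SRC keeps $bytes intact.
    my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();
    my $is_utf8 = eval { decode('UTF-8', $bytes, $check); 1 };
    return $bytes if $is_utf8;

    # Not UTF-8: decode from the fallback charset, then encode as UTF-8.
    return encode('UTF-8', decode($fallback, $bytes));
}
```

In the loop from the patch this would be used as "print $pipe to_utf8($_);".
The same caveat applies as for the hack itself: a line in a legacy
charset that happens to form valid UTF-8 byte sequences is passed
through untouched.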




