[darcs-users] darcs.cgi 1.0.1 and character set: converting iso-8859-1 to utf-8 for old log entries and patch names?
Matthias Andree
matthias.andree at gmx.de
Tue Jan 11 16:35:36 UTC 2005
On Tue, 11 Jan 2005, Matthias Andree wrote:
> Whenever the CGI script steps on these characters, apache2 logs errors
> like these (sorry - long lines). I'm not sure who exactly complains, if
> it's Perl, some Perl module, darcs or whoever.
Turns out it's xsltproc that complains.
I have hacked darcs.cgi to assume that every line that is not valid
UTF-8 is ISO-8859-1 and to convert it to UTF-8. It's nothing more than
a hack, and I do not suggest this patch for inclusion because it
treats all text the same, including patch names. It is my personal
stop-gap until I know how I can convert the logs to UTF-8 after the
fact.
Besides, someone else might need to convert from ISO-8859-2 or
ISO-8859-7. The cvs2darcs stuff should be fixed instead - that's the
right place to convert $ANYTHING to UTF-8, if that's what darcs
should use.
Perl-5.8 only, no warranties except "appears to work for me":
Tue Jan 11 16:51:11 CET 2005 Matthias Andree <matthias.andree at gmx.de>
* Automatically assume XML non-UTF-8 strings are ISO-8859-1 and convert them.
diff -rN -u darcs-old/cgi/darcs.cgi.in darcs-new/cgi/darcs.cgi.in
--- darcs-old/cgi/darcs.cgi.in 2005-01-11 17:30:14.000000000 +0100
+++ darcs-new/cgi/darcs.cgi.in 2005-01-11 16:49:02.000000000 +0100
@@ -31,6 +31,8 @@
 use strict;
+use utf8;
+use Unicode::String;
 use CGI qw( :standard );
 use CGI::Util;
 use File::Basename;
@@ -113,6 +115,9 @@
     seek ($xml, 0, 0);
     while (<$xml>) {
+        if (!utf8::decode(my $copy = $_)) {  # line is not well-formed UTF-8
+            $_ = Unicode::String::latin1($_)->utf8();
+        }
         print $pipe $_;
     }
 }
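
For anyone who needs a different source charset, the rule described
above (keep lines that already decode as UTF-8, treat everything else
as one fixed legacy charset) can be sketched standalone with the core
Encode module. This is only a sketch: to_utf8() and its $fallback
parameter are names I made up for illustration, they are not part of
darcs.cgi, and ISO-8859-1 as the default is an assumption.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper, not part of darcs.cgi.
# Returns $bytes unchanged if it is already well-formed UTF-8;
# otherwise reinterprets it in $fallback and re-encodes it as UTF-8.
sub to_utf8 {
    my ($bytes, $fallback) = @_;
    $fallback ||= 'iso-8859-1';    # assumed default

    # Strict trial decode on a copy: FB_CROAK dies on malformed input,
    # and decode() may modify its argument when a check value is given.
    my $copy = $bytes;
    my $ok = eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 };
    return $bytes if $ok;

    return encode('UTF-8', decode($fallback, $bytes));
}
```

Inside the darcs.cgi loop this would be called as
"print $pipe to_utf8($_);", or with 'iso-8859-2' or 'iso-8859-7' as
the second argument. It remains a heuristic: a legacy-encoded line
whose bytes happen to form valid UTF-8 is passed through untouched.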