[darcs-users] Interop with Darcs

Reinier Lamers tux_rocker at reinier.de
Fri Jun 5 10:53:52 UTC 2009


Hi all,

Thursday 04 June 2009 02:53:46 Trent W. Buck wrote:
> Gwern Branwen <gwern0 at gmail.com> writes:
> > There isn't any schema I know of. You really just have to parse it
> > kind of ad-hoc.
>
> And as we've seen in the Darcs repo, input isn't recoded into UTF-8, so
> in *one output document* from changes --xml you can have ISO 8859-1
> bytes, UTF-8 bytes, and JIS bytes.  Which basically means it's not XML :-(

But the contents of files in the repo are not text, they are bytes (even for 
text files, which are managed as lines of bytes delimited by a newline). How 
should we deal with that in XML?

A quick Google search turns up the suggestion to either use base64 or store 
the binary data outside the XML and make the XML refer to it. Both of those 
seem really bad for readability.
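
To make the readability complaint concrete, here is a minimal Python sketch of the base64 route. The sample bytes and the function name base64_for_xml are made up for illustration; nothing here is Darcs code.

```python
import base64


def base64_for_xml(raw: bytes) -> str:
    """Return raw bytes as base64 text, which is always plain ASCII."""
    return base64.b64encode(raw).decode("ascii")


# A made-up line of file content with one ISO 8859-1 byte (0xE9) in it.
line = b"caf\xe9 au lait"
encoded = base64_for_xml(line)

# The result is safe to put in XML, but none of the original ASCII
# text is recognisable to a human reader any more.
print(encoded)
assert base64.b64decode(encoded) == line
```

The round trip is lossless, but even the plain ASCII parts of the line become unreadable, which is the objection above.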

Perhaps we can use quoted-printable encoding(*) inside the XML? It sounds 
somewhat Frankensteinian, but we may have code for that lying around already, 
and it encodes the non-ASCII bytes while keeping the result readable as text. 
In fact, Google returns results about using quoted-printable in XML, so it's 
not that weird an idea.
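
A rough sketch of how that could look, in Python rather than Haskell and using only the standard library (quopri, xml.sax.saxutils, xml.etree). The helper names, the sample bytes and the <name> element are assumptions for illustration, not what changes --xml actually emits.

```python
import quopri
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def bytes_to_xml_text(raw: bytes) -> str:
    """Quoted-printable encode raw bytes, then escape XML metacharacters."""
    # quopri keeps most printable ASCII as-is and turns other bytes into =XX.
    qp = quopri.encodestring(raw).decode("ascii")
    # '<', '>' and '&' survive quoted-printable, so XML escaping is still needed.
    return escape(qp)


def xml_text_to_bytes(text: str) -> bytes:
    """Inverse; expects text as returned by an XML parser (already unescaped)."""
    return quopri.decodestring(text.encode("ascii"))


# One made-up "line" mixing an ISO 8859-1 byte, UTF-8 bytes and XML metacharacters.
sample = b"caf\xe9 <b> \xe6\x97\xa5 & a=b"

# <name> is a hypothetical element used only for this example.
doc = "<name>" + bytes_to_xml_text(sample) + "</name>"
element = ET.fromstring(doc)  # parses without error, so the document is well-formed
assert xml_text_to_bytes(element.text) == sample
```

The ASCII parts stay readable and the original bytes come back exactly. One caveat: quopri wraps long lines with soft line breaks ("=" followed by a newline), which decoding removes again, so the encoded text may contain newlines the original line did not have.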

Regards,
Reinier
(*): quoted-printable encoding is what is used for e-mail text in encodings 
other than ASCII. It preserves most ASCII characters, but escapes non-ASCII 
bytes (and the "=" character itself) as "=XX", where XX is the byte's value 
in hexadecimal.