[darcs-users] Interop with Darcs
Reinier Lamers
tux_rocker at reinier.de
Fri Jun 5 10:53:52 UTC 2009
Hi all,
Thursday 04 June 2009 02:53:46 Trent W. Buck wrote:
> Gwern Branwen <gwern0 at gmail.com> writes:
> > There isn't any schema I know of. You really just have to parse it
> > kind of ad-hoc.
>
> And as we've seen in the Darcs repo, input isn't recoded into UTF-8, so
> in *one output document* from changes --xml you can have ISO 8859-1
> bytes, UTF-8 bytes, and JIS bytes. Which basically means it's not XML :-(
But the contents of files in the repo are not text, they are bytes (even for
text files, which are managed as lines of bytes delimited by a newline). How
should we deal with that in XML?
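To make the problem concrete, here is a minimal Python sketch. It is not Darcs
code, and the element names (changelog, patch, name) are made up for
illustration rather than taken from the real changes --xml output. It puts the
same word into one document twice, once as UTF-8 bytes and once as ISO 8859-1
bytes, and asks a standard XML parser whether the result is well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names, for illustration only.
utf8_name = "café".encode("utf-8")          # b'caf\xc3\xa9'
latin1_name = "café".encode("iso-8859-1")   # b'caf\xe9'

def patch(name: bytes) -> bytes:
    """Wrap raw bytes in a patch element without recoding them."""
    return b"<patch><name>" + name + b"</name></patch>"

def is_well_formed(data: bytes) -> bool:
    """Return True if a standard XML parser accepts the document."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

# Only UTF-8 bytes: fine, since UTF-8 is the default XML encoding.
clean_doc = b"<changelog>" + patch(utf8_name) + b"</changelog>"

# UTF-8 and ISO 8859-1 bytes in one document: no single encoding fits.
mixed_doc = (b"<changelog>" + patch(utf8_name)
             + patch(latin1_name) + b"</changelog>")

print(is_well_formed(clean_doc))
print(is_well_formed(mixed_doc))
```

The parser rejects the mixed document because the lone 0xE9 byte is not valid
UTF-8, which is the sense in which such output is not XML.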
A quick Google search turns up the suggestion to either use base64 or store
the binary data outside the XML and make the XML refer to it. Both of those
seem really bad for readability.
Perhaps we can use quoted-printable encoding(*) inside the XML? It sounds
somewhat Frankensteinian, but we may have code for that lying around already,
and it encodes the non-ASCII bytes while keeping the result readable as text.
In fact, a Google search returns results about using quoted-printable in XML,
so it's not that weird an idea.
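A rough sketch of the idea in Python, using the standard quopri module. This
only illustrates the round trip, it is not a proposal for the actual Darcs
output format; the element name "line" and the helper names are assumptions of
mine.

```python
import quopri
import xml.etree.ElementTree as ET

def qp_encode(raw: bytes) -> str:
    """Encode arbitrary bytes as ASCII-only quoted-printable text."""
    return quopri.encodestring(raw).decode("ascii")

def qp_decode(text: str) -> bytes:
    """Recover the original bytes from quoted-printable text."""
    return quopri.decodestring(text.encode("ascii"))

def to_xml(raw: bytes) -> bytes:
    """Store the bytes in a (hypothetical) <line> element."""
    elem = ET.Element("line")
    elem.text = qp_encode(raw)
    # ElementTree escapes '<' and '&' itself when serializing.
    return ET.tostring(elem)

def from_xml(doc: bytes) -> bytes:
    """Parse the element and decode its text back to raw bytes."""
    return qp_decode(ET.fromstring(doc).text)

# ISO 8859-1 and UTF-8 bytes mixed in one line, plus XML-special characters.
raw = b"caf\xe9 na\xc3\xafve a<b"
doc = to_xml(raw)
print(doc)
print(from_xml(doc) == raw)
```

The ASCII part of the line stays readable in the XML, the document itself is
plain ASCII and therefore well-formed, and the original bytes come back
unchanged. The price is that a literal "=" in the data also has to be escaped
(as =3D), and very long lines get soft line breaks inserted by the encoder.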
Regards,
Reinier
(*): quoted-printable is the encoding used for e-mail text in character sets
other than ASCII. It leaves most ASCII characters as they are, but escapes each
non-ASCII byte as "=" followed by two hexadecimal digits.