XML format (was: Re: [darcs-users] GUI)
Thomas Zander
zander at kde.org
Thu Dec 16 17:29:45 UTC 2004
Hi Alexander,
On Thursday 16 December 2004 17:38, Alexander Staubo wrote:
> If we are going to settle on a flexible XML format, let's leverage the
> structural and semantic capabilities of XML! :)
...
> <whatsnew>
> <patch file="foo/bar">
> <hunk line="5">
> <delete>Lorem ipsum dolor "sit</delete>
> <add>amet, consectetuer adipiscing elit</add>
> </hunk>
> </patch>
> </whatsnew>
Much better than mine :)
You made one mistake, though. The </delete> needs to go on the next line,
since as written you lose all the line endings. As long as darcs is
line based, the XML has to carry the exact \n and/or \r of each line.
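A minimal Python sketch of what carrying the line endings in the XML could look like. The function names and the one-element-per-line layout are hypothetical, not an agreed darcs format. One catch: a conforming XML parser turns a literal CR or CRLF into LF when reading (XML 1.0 end-of-line handling), so the sketch writes CR as the character reference &#13;.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def escape_text(text: str) -> str:
    # Escape &, < and >; also write CR as &#13;, because XML parsers
    # normalise a literal CR or CRLF to LF when reading the document.
    return escape(text, {"\r": "&#13;"})


def hunk_xml(line: int, removed: list[str], added: list[str]) -> str:
    """Serialise one hunk (hypothetical layout: one element per line).

    Every line keeps its own terminator, so the closing tag lands after
    the newline rather than before it.
    """
    parts = [f'<hunk line="{line}">']
    parts += [f"<delete>{escape_text(t)}</delete>" for t in removed]
    parts += [f"<add>{escape_text(t)}</add>" for t in added]
    parts.append("</hunk>")
    return "".join(parts)


def parse_hunk(doc: str) -> tuple[int, list[str], list[str]]:
    """Read a hunk back; character references come back as real CRs."""
    root = ET.fromstring(doc)
    removed = [e.text or "" for e in root.findall("delete")]
    added = [e.text or "" for e in root.findall("add")]
    return int(root.get("line")), removed, added


removed = ['Lorem ipsum dolor "sit\r\n']            # a CRLF-terminated line
added = ["amet, consectetuer adipiscing elit\n"]   # an LF-terminated line
doc = hunk_xml(5, removed, added)
assert parse_hunk(doc) == (5, removed, added)
```

Encoding CR as a character reference, instead of trusting the raw bytes to survive parsing, is what lets a hunk from a CRLF file round-trip exactly.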
> The Darcs format also has no concept of encodings, which is a typical
> Unix problem
'Problem' is a bit overstated IMO. Parsing and treating everything as
latin1 avoids many conversions that would be unneeded anyway, since darcs
does not actually _use_ the text.
This will surely change if darcs stops being line based.
More below.
> with XML this problem doesn't automagically go away, but
> becomes easier to deal with as encoding metadata can be preserved and
> described in a standard manner; if my document begins,
>
> <?xml version="1.0" encoding="utf-8"?>
>
> then your parser had damn well better treat the document as UTF-8.
That's true, but it assumes that darcs knows how to read the user's files
correctly, i.e. knows the encoding of each file in the repo, which is a
problem that has not been addressed yet.
Until it is, the advantage is nil. Then again, it _is_ easy to assume
everything on disk is latin1 and store everything in the XML as UTF-8,
which postpones the real conversion until it can be implemented in a
forward-compatible manner.
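A rough Python sketch of the "assume latin1 on disk, write UTF-8 in the XML" idea, assuming a hypothetical `<add>` element as the container. Because latin1 maps every byte to exactly one code point, the original bytes can always be recovered later, whatever the file's real encoding was. This sketch does not handle bytes that XML 1.0 forbids outright (most control characters below 0x20) or bare CRs.

```python
import xml.etree.ElementTree as ET


def to_xml(raw: bytes) -> bytes:
    """Wrap on-disk bytes in a UTF-8 XML document, treating them as latin1."""
    # latin1 maps each byte 0x00-0xFF to exactly one code point, so this
    # decode never fails and loses nothing, whatever the real encoding is.
    text = raw.decode("latin-1")
    elem = ET.Element("add")
    elem.text = text
    # ElementTree escapes &, < and >; the declaration names the encoding.
    return ET.tostring(elem, encoding="utf-8", xml_declaration=True)


def from_xml(doc: bytes) -> bytes:
    """Recover the original bytes by reversing the latin1 decode."""
    text = ET.fromstring(doc).text or ""
    return text.encode("latin-1")


# A UTF-8 file misread as latin1 shows up as mojibake inside the XML,
# but the original bytes still come back unchanged.
raw = "café\n".encode("utf-8")
assert from_xml(to_xml(raw)) == raw
```

Once darcs does learn each file's encoding, the stored text could be reinterpreted without the XML format itself changing.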
> That said, the idea of using XML internally is not without merit. I see
> three advantages:
[snip]
> 2) Character encoding encapsulation is already handled by XML, so as
> long as Darcs is told the encoding of its input (eg., when adding files
> to a repo), the XML framework would handle the rest.
Right, which means that changing the encoding of a file will still allow
old patches to apply to later versions, as long as they are 'aware' of the
format-change patch. That's aware, not dependent!
Naturally, this point is not specific to XML; standardizing the encoding in
the current file format (the pending file) has the same advantage.
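To illustrate "aware, not dependent", here is a hypothetical Python sketch in which hunks store decoded text and a separate encoding-change patch only alters how that text is written to disk. `Hunk`, `ChangeEncoding` and `apply` are invented names for illustration, not darcs patch types. A hunk recorded before the switch replays unchanged after it.

```python
from dataclasses import dataclass


@dataclass
class Hunk:
    line: int            # 1-based, like <hunk line="5">
    removed: list[str]   # decoded text, each line with its terminator
    added: list[str]


@dataclass
class ChangeEncoding:
    encoding: str        # hypothetical "format-change patch"


def apply(patches: list, encoding: str = "latin-1") -> bytes:
    """Replay patches on an empty file and return its on-disk bytes."""
    lines: list[str] = []
    for p in patches:
        if isinstance(p, ChangeEncoding):
            # Only the on-disk representation changes; the text does not.
            encoding = p.encoding
            continue
        i = p.line - 1
        if lines[i:i + len(p.removed)] != p.removed:
            raise ValueError(f"hunk at line {p.line} does not apply")
        lines[i:i + len(p.removed)] = p.added
    return "".join(lines).encode(encoding)


old = Hunk(1, [], ["café\n"])          # recorded while the file was latin1
switch = ChangeEncoding("utf-8")
new = Hunk(2, [], ["naïve\n"])         # recorded after the switch
assert apply([old, switch, new]) == "café\nnaïve\n".encode("utf-8")
```

The old hunk never mentions an encoding, so it does not depend on the change-encoding patch; only the final serialization is aware of it.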
--
Thomas Zander