XML format (was: Re: [darcs-users] GUI)

Thomas Zander zander at kde.org
Thu Dec 16 17:29:45 UTC 2004


Hi Alexander,

On Thursday 16 December 2004 17:38, Alexander Staubo wrote:
> If we are going to settle on a flexible XML format, let's leverage the
> structural and semantic capabilities of XML! :)
...
> <whatsnew>
>    <patch file="foo/bar">
>      <hunk line="5">
>        <delete>Lorem ipsum dolor &quot;sit</delete>
>        <add>amet, consectetuer adipiscing elit</add>
>      </hunk>
>    </patch>
> </whatsnew>

Much better than mine :)
You made one mistake, though. The </delete> needs to be on the next line;
as written, you dropped all the line endings. As long as darcs is
line-based, the XML has to carry the exact \n and/or \r of each line.
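Here is a minimal Python sketch of the line-ending point. It is not darcs code. The function names hunk_xml and parse_hunk are hypothetical, and the hunk/delete/add element names come from Alexander's example. The sketch stores each removed and added line together with its terminator. There is one subtlety. A conforming XML parser normalizes a literal \r or \r\n in the document to \n, so the carriage return has to be written as the character reference &#13; to survive a round trip.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def hunk_xml(line_no, removed, added):
    """Serialize one hunk (hypothetical helper); each line keeps its terminator.

    escape() handles &, < and >; the extra mapping writes CR as &#13;
    because XML parsers turn a literal CR or CRLF into a plain LF.
    """
    def esc(lines):
        return escape("".join(lines), {"\r": "&#13;"})

    return ('<hunk line="%d"><delete>%s</delete><add>%s</add></hunk>'
            % (line_no, esc(removed), esc(added)))


def parse_hunk(text):
    """Parse a hunk back into (line, removed_text, added_text)."""
    hunk = ET.fromstring(text)
    return (int(hunk.get("line")),
            hunk.findtext("delete"),
            hunk.findtext("add"))


if __name__ == "__main__":
    xml_text = hunk_xml(5, ['Lorem ipsum dolor "sit\r\n'],
                        ["amet, consectetuer adipiscing elit\n"])
    print(xml_text)
    print(parse_hunk(xml_text))
```

Keeping the terminator inside the element text means a hunk that only changes CRLF to LF still shows up as a real difference.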

> The Darcs format also has no concept of encodings, which is a typical
> Unix problem

'Problem' is a bit overstated, IMO. Parsing and treating everything as
latin1 avoids many conversions that would be unneeded anyway, since darcs
does not actually _use_ the text.
This will surely be different if darcs stops being line-based.
More below.

> with XML this problem doesn't automagically go away, but 
> becomes easier to deal with as encoding metadata can be preserved and
> described in a standard manner; if my document begins,
>
>    <?xml version="1.0" encoding="utf-8"?>
>
> then your parser had damn well better treat the document as UTF-8.

That's true, but it assumes that darcs knows how to read the user's files
correctly, i.e. that it knows the encoding of each file in the repo. That
problem has not been addressed yet.
Until it is, the advantage is nil. Then again, it _is_ easy to assume
everything on disk is latin1 and write everything into the XML as UTF-8.
That postpones the real conversion until per-file encodings are
implemented, in a forward-compatible manner.
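Here is a minimal sketch of that interim scheme, assuming (as above) that every file on disk is latin1. The bytes are decoded as latin1, which can never fail because each byte maps to exactly one code point. The text then goes into a document declared as UTF-8. Reading it back and re-encoding as latin1 gives the original bytes, so nothing is lost until darcs learns per-file encodings. The function names file_to_xml and xml_to_file are hypothetical. There is one caveat the sketch does not handle. Most control bytes below 0x20 (other than tab, LF and CR) are not legal characters in XML 1.0, so such content would need a different representation.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def file_to_xml(raw: bytes) -> bytes:
    """Embed file contents in a UTF-8 XML document (hypothetical helper).

    Assumption: the file is latin1. Decoding latin1 never fails, since
    every byte 0x00-0xFF is one code point. CR is written as &#13; so
    the parser does not normalize it away.
    """
    text = raw.decode("latin-1")
    body = escape(text, {"\r": "&#13;"})
    doc = '<?xml version="1.0" encoding="utf-8"?>\n<file>%s</file>' % body
    return doc.encode("utf-8")


def xml_to_file(doc: bytes) -> bytes:
    """Recover the original bytes; the declaration tells the parser it is UTF-8."""
    text = ET.fromstring(doc).text or ""
    return text.encode("latin-1")


if __name__ == "__main__":
    raw = "caf\u00e9 na\u00efve\r\n".encode("latin-1")
    doc = file_to_xml(raw)
    print(doc)
    print(xml_to_file(doc) == raw)
```

If a file was really UTF-8, its text shows up garbled inside the XML (each multi-byte sequence becomes several latin1 characters), but its bytes still round-trip exactly. That is what makes the scheme safe to replace later.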

> That said, the idea of using XML internally is not without merit. I see
> three advantages:
[snip]
> 2) Character encoding encapsulation is already handled by XML, so as
> long as Darcs is told the encoding of its input (eg., when adding files
> to a repo), the XML framework would handle the rest.
Right, which means that changing the encoding of a file will still allow
old patches to apply to later versions, as long as they are 'aware' of the
format-change patch. That's aware, not dependent!
Naturally, this point is not specific to XML; standardizing the encoding in
the current file format (the pending file) has the same advantage.

-- 
Thomas Zander