[darcs-users] Re: XML format

Einar Karttunen ekarttun at cs.helsinki.fi
Sun Dec 19 11:51:50 UTC 2004


Alexander Staubo <alex at byzantine.no> writes:
> Indeed. As a rule, one should never manually parse XML; one should 
> always rely on a stable, well-tested, well-known parser such as Expat. 
> There's so much fine print in the spec and so much to get wrong.

But one of the strengths of darcs is that it needs very few
dependencies. Having darcs depend on Expat or Sablotron would 
at least make my life harder.

If we just want a simple serialization format, then the trip 
to XML might not be worth it. We would also need to either
base64-encode all data or decide what to do about illegal UTF-8 
sequences.
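
To make the encoding concern concrete, here is a minimal Python sketch
(not darcs code; `is_valid_utf8`, `to_xml_text` and `from_xml_text` are
made-up names for illustration). It shows that arbitrary file content need
not be valid UTF-8, and that base64 is one way to carry such bytes in XML
text, at the cost of roughly a third more space.

```python
import base64


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_xml_text(data: bytes) -> str:
    """Hypothetical helper: turn arbitrary bytes into ASCII-only base64 text."""
    return base64.b64encode(data).decode("ascii")


def from_xml_text(text: str) -> bytes:
    """Inverse of to_xml_text: recover the original bytes."""
    return base64.b64decode(text)


# A Latin-1 encoded word followed by bytes that can never appear in UTF-8.
raw = b"caf\xe9 \xff\x00"
encoded = to_xml_text(raw)
```

Here `is_valid_utf8(raw)` is false, while `from_xml_text(encoded)` gives
back `raw` byte for byte.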

> Because writing parsers is a waste of time.

Actually it is quite fast with the right tools.

> XML exists to describe these structures in a neutral, portable way. The 
> wealth of parsers, validators and transformation tools provide a 
> framework that allows you to get up and running very, very quickly.

Managing the library dependency and testing that it works on 
all supported systems takes time too. And XML is by no means
well suited to octet strings.
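
As an illustration only, a length-prefixed (netstring-style) format is one
example of a simple serialization that carries octet strings untouched and
needs only a few lines of hand-written parsing. This is an assumed example
format, not the darcs patch format, and `dump_fields` / `parse_fields` are
hypothetical names.

```python
from typing import List


def dump_fields(fields: List[bytes]) -> bytes:
    """Serialize each field as b'<length>:<bytes>,' with no escaping needed."""
    return b"".join(
        str(len(f)).encode("ascii") + b":" + f + b"," for f in fields
    )


def parse_fields(data: bytes) -> List[bytes]:
    """Parse the output of dump_fields; raise ValueError on malformed input."""
    fields = []
    pos = 0
    while pos < len(data):
        colon = data.index(b":", pos)  # raises ValueError if no ':' is left
        digits = data[pos:colon]
        if not digits.isdigit():
            raise ValueError("bad length prefix")
        start = colon + 1
        end = start + int(digits.decode("ascii"))
        # The payload is skipped by length, so it may contain any byte,
        # including ':' and ','.
        if data[end:end + 1] != b",":
            raise ValueError("missing field terminator")
        fields.append(data[start:end])
        pos = end + 1
    return fields
```

Because the parser jumps over each payload by its declared length, fields
such as `b"\x00\xff:,"` round-trip unchanged.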

> After spending several hours on this, I'm going back to writing a parser 
> by hand, which will waste another couple of hours. In the end, all I 
> want is a programmatically navigable model -- something I can traverse 
> and pull information from -- which is exactly what XML is.

Wouldn't writing a library interface to Darcs be the best and cleanest
solution to all of this?


> That said, the idea of using XML internally is not without merit. I see 
> three advantages:
>
> 1) Code reusability; ie. the same code is used for the internal patch 
> format as for the output of certain commands.

A library interface would accomplish the same, and anything touching the
Darcs repo would still need to go through darcs for locking reasons.

> 2) Character encoding encapsulation is already handled by XML, so as 
> long as Darcs is told the encoding of its input (eg., when adding files 
> to a repo), the XML framework would handle the rest.

Should we really handle encodings inside the repo? This would make
things more complex, with surrogates, lossy conversions, etc.
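
A small Python sketch of what a lossy conversion can look like (the
`roundtrip` helper and the sample bytes are assumptions for illustration,
not anything darcs does): decoding and re-encoding file content does not
always give back the same bytes, and Unicode normalization can change
bytes even when the text looks identical.

```python
import unicodedata


def roundtrip(data: bytes, encoding: str) -> bytes:
    """Decode to text and re-encode, replacing anything that does not fit."""
    text = data.decode(encoding, errors="replace")
    return text.encode(encoding, errors="replace")


# Latin-1 bytes that are not valid UTF-8.
original = b"na\xefve \xff"

# Latin-1 maps every byte to a code point, so nothing is lost.
same = roundtrip(original, "latin-1") == original

# Treating the same bytes as UTF-8 substitutes U+FFFD and changes the data.
changed = roundtrip(original, "utf-8") != original

# 'e' + combining acute accent versus the precomposed character:
# the same visible text, but different bytes after NFC normalization.
decomposed = "e\u0301".encode("utf-8")
recomposed = unicodedata.normalize(
    "NFC", decomposed.decode("utf-8")
).encode("utf-8")
```

Both `same` and `changed` are true here, and `decomposed` differs from
`recomposed`, so a repo that converts encodings would no longer store
exactly what the user recorded.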

- Einar Karttunen



