[darcs-users] Escaping of hunks and file names

David Roundy droundy at abridgegame.org
Sun Nov 7 12:46:45 UTC 2004


On Fri, Nov 05, 2004 at 07:08:14AM +0100, Alexander Staubo wrote:
> Is there a reason why Darcs escapes file names and text differently?

First off, darcs escapes either one only when it sees that it's outputting
to a terminal, so it shouldn't affect scripting.
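
A rough sketch of that terminal check, written in Python rather than darcs's
actual Haskell. The names should_escape and write_line are made up for
illustration and are not darcs functions; the only real API assumed is the
stream's isatty() method.

```python
import sys

def should_escape(stream) -> bool:
    """Return True only when the stream is an interactive terminal."""
    isatty = getattr(stream, "isatty", None)
    return bool(isatty and isatty())

def write_line(text: str, escape, stream=sys.stdout) -> None:
    """Write text, applying the escape function only for terminal output."""
    if should_escape(stream):
        text = escape(text)
    # Pipes and files get the text untouched, so scripts see raw output.
    stream.write(text + "\n")
```

With a check like this, output redirected to a pipe or file is never escaped,
which is why scripts should not be affected.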

File names are treated differently than file lines, because lines can be
treated just as a sequence of bytes.  File names, in the Haskell standard
libraries, are treated as sequences of Unicode characters.  Darcs follows
this convention, and always encodes them as UTF-8.
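
To make the difference concrete, here is a small Python illustration (not
darcs code; encode_name and is_valid_utf8 are hypothetical helpers). It shows
a name held as Unicode characters being written out as UTF-8, and a raw-octet
name that is not valid UTF-8 at all.

```python
def encode_name(name: str) -> bytes:
    """Encode a file name held as Unicode characters to UTF-8 bytes."""
    return name.encode("utf-8")

def is_valid_utf8(raw: bytes) -> bool:
    """Report whether a raw byte string can be read back as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

name = "caf\u00e9.txt"       # e-acute is a single Unicode character
stored = encode_name(name)   # b'caf\xc3\xa9.txt' (two bytes for e-acute)
raw_latin1 = b"caf\xe9.txt"  # the same name as Latin-1 octets on disk
```

The same visible name has two different byte representations, and
is_valid_utf8(raw_latin1) is False, so a "sequence of bytes" view and a
"sequence of Unicode characters" view do not round-trip into each other for
free.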

Sometimes I think this was a mistake... I think that we should view
filenames as being just a sequence of bytes, but this would mean that if we
wanted forward-compatibility with Haskell compilers we'd have to forgo
using the Haskell IO libraries.

In any case, it's pretty much irrelevant now, since I'm pretty sure there
are people out there with their file names in their repositories encoded as
UTF-8 instead of raw octets, and it's not really worth a repository format
transition.

> There is absolutely no need to escape anything in XML except "<", ">" 
> and "&", and the escaping pollutes the format.

On the XML formatting, I defer to others, who know about such things...

> I'm surprised that Darcs even escapes file names. However, in many 
> places it *doesn't* escape anything:

I'd say (and many people would complain) that most darcs commands are
*primarily* intended to be parsed by humans.  If you have crazy files with
spaces in them, you need to be careful.  It's still parseable, since darcs
formats things "predictably", it's just more of a pain.  But unless
characters are non-printable, I don't see any reason to escape them.
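
As a sketch of that rule (escape only what is non-printable, leave spaces and
the like alone), in Python and with a made-up escape_nonprintable helper; the
backslash notation is just one possible choice, not what darcs emits:

```python
def escape_nonprintable(text: str) -> str:
    """Replace non-printable characters with a visible backslash escape."""
    out = []
    for ch in text:
        if ch.isprintable():
            out.append(ch)  # spaces and ordinary punctuation pass through
        else:
            # e.g. the BEL character becomes the four characters \x07
            out.append(ch.encode("unicode_escape").decode("ascii"))
    return "".join(out)
```

A name like "my file.txt" comes out unchanged, while a control character
becomes visible.  A real scheme would also have to escape the escape
character itself to stay unambiguous.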

Ideally, scripts should use the XML output, which *is* intended to be
parsed by scripts.
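
For illustration only, here is how a script might read file names out of XML
with Python's standard ElementTree parser.  The sample document and its
element names (changes, file) are hypothetical and are not darcs's real
output format.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML; the element names are assumptions for this sketch.
sample = """<changes>
  <file>my notes &amp; drafts.txt</file>
  <file>a &lt;b&gt; c.txt</file>
</changes>"""

def file_names(xml_text: str) -> list:
    """Return the text of every <file> element, with entities decoded."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter("file")]
```

The parser undoes the XML escaping, so names containing spaces, "&", "<" or
">" come back intact without any guessing about where a name ends.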
-- 
David Roundy
http://www.darcs.net



