[darcs-users] Escaping of hunks and file names

David Roundy droundy at abridgegame.org
Sun Nov 7 22:42:27 UTC 2004

On Sun, Nov 07, 2004 at 09:01:11PM +0100, Alexander Staubo wrote:
> David Roundy wrote:
> >On Fri, Nov 05, 2004 at 07:08:14AM +0100, Alexander Staubo wrote:
> >
> >>Is there a reason why Darcs escapes file names and text differently?
> >
> >First off, darcs only escapes either when it sees that it's outputting to a
> >terminal, so it shouldn't affect scripting.
> By terminal, do you mean that this:
> $ darcs what >foo
> should not escape the output to foo?

Right, and darcs whats | less also shouldn't be escaped, except that white
space is always escaped, since otherwise the parsing of patches would be
made more complicated.

> Not what's happening here. Darcs 1) escapes text and file names, and, 
> oddly enough, 2) colourizes the diffs (though it only colourizes the 
> "addfile", etc. keywords when outputting to the terminal):
> $ python
> Python 2.3.4 (#2, Sep 24 2004, 08:39:09)
> [GCC 3.3.4 (Debian 1:3.3.4-12)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import os
> >>> inp, out = os.popen2("darcs what")
> >>> out.read()
> '{\naddfile ./with\\32\\space.txt\nhunk ./with\\32\\space.txt 
> 1\n+Hell\x1b[01;34m\\f8\x1b[00m\x1b[01;34m\\f8\x1b[00m!\n}\n'
> As you can see, those are "ANSI" escape codes in there.
> The above output is for a changeset with a file called "with space.txt" 
> containing the string "helløø". I should think that the above 
> Python-invoked command does not constitute "outputting to a terminal"?
> I know next to nothing about Unix terminal emulation, so forgive me if 
> this is the expected behaviour. I hadn't noticed the colourization 
> before, though.

No, this isn't obvious.  Oddly enough, it seems that perl does the same
thing.  I don't know what the haskell standard library "isTerminal"
function checks, but apparently when these languages call external
programs, they somehow are able to trick haskell into thinking it's in a
terminal.  :(
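
One way to see what a child process is told about its standard output is to ask it directly. The following is a minimal sketch in present-day Python 3 (not the Python 2.3 session quoted above); the helper name and the inline child script are made up for illustration, and it says nothing about what darcs or the Haskell library actually check.

```python
import subprocess
import sys

# Hypothetical child program: reports whether its own stdout is a terminal.
CHILD_SCRIPT = "import sys; print(sys.stdout.isatty())"


def child_sees_terminal_on_stdout() -> bool:
    """Run a child with stdout connected to a pipe and return its answer."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        stdout=subprocess.PIPE,  # the child's stdout is a pipe, not a tty
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    print(child_sees_terminal_on_stdout())
```

Only stdout is redirected in this sketch; stdin and stderr are inherited from the parent, so a program that tests a different descriptor could still conclude it is on a terminal.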

The ansi escape color codes are the easy way to tell if darcs thinks it's
in a terminal.
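
A script can apply the same test mechanically by looking for colour sequences in the captured output. This is a rough sketch, assuming only the "ESC [ ... m" colour codes seen in the output quoted earlier matter; the function names are hypothetical.

```python
import re

# Matches ANSI colour (SGR) sequences such as ESC[01;34m and ESC[00m.
SGR_RE = re.compile(r"\x1b\[[0-9;]*m")


def has_color_codes(text: str) -> bool:
    """True if the text contains at least one ANSI colour sequence."""
    return bool(SGR_RE.search(text))


def strip_color_codes(text: str) -> str:
    """Remove ANSI colour sequences, leaving everything else untouched."""
    return SGR_RE.sub("", text)


# The output reported in the quoted Python session.
sample = (
    "{\naddfile ./with\\32\\space.txt\nhunk ./with\\32\\space.txt 1\n"
    "+Hell\x1b[01;34m\\f8\x1b[00m\x1b[01;34m\\f8\x1b[00m!\n}\n"
)

if __name__ == "__main__":
    print(has_color_codes(sample))
    print(strip_color_codes(sample))
```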

> Outputting file names as UTF-8 is fine. However, why is Darcs escaping 
> the UTF-8, and in such a non-standard (\yy\) format?

Only whitespace (and backslashes) are escaped in that format, and the
stupid format is because that is what I came up with when I was coding this
ages back.  Technically, only spaces and newlines actually need to be
escaped, since they would mess up darcs' parsing of patches--tabs and
carriage returns aren't used in darcs patch format as delimiters.
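
For script writers, the escaping can be approximated as follows. This is a sketch, not darcs' own code: it assumes the escape is the decimal character code between two backslashes, as in the "with\32\space.txt" example quoted earlier, and the exact set of escaped characters is a guess.

```python
import re

# Assumed set of escaped characters: common whitespace plus the backslash.
ESCAPED_CHARS = " \t\n\r\\"

# An escape is a backslash, decimal digits, and a closing backslash.
ESCAPE_RE = re.compile(r"\\(\d+)\\")


def escape_name(name: str) -> str:
    """Replace each escaped character with \\<decimal code>\\."""
    return "".join(
        f"\\{ord(ch)}\\" if ch in ESCAPED_CHARS else ch for ch in name
    )


def unescape_name(escaped: str) -> str:
    """Reverse escape_name by turning each \\<decimal code>\\ back into a character."""
    return ESCAPE_RE.sub(lambda m: chr(int(m.group(1))), escaped)


if __name__ == "__main__":
    print(escape_name("with space.txt"))
    print(unescape_name("with\\32\\space.txt"))
```

Because backslashes are themselves escaped, any backslash left in the escaped form belongs to an escape, which is what makes the decoding unambiguous.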

Basically, I didn't put much thought into it, since at the time I was
thinking it wouldn't often come into play, since I consider white space in
filenames a bad idea, and backslashes in filenames also don't greatly
enhance the portability of your code.

> However, XML handles unescaped Unicode (or UTF-8) just fine, as long as 
> you declare the appropriate encoding at the beginning, e.g. <?xml 
> version='1.0' encoding='utf-8'?>.

We can't really declare the encoding, since we don't know what the encoding
of the user's data is.
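
The risk of declaring an encoding that may not match the data can be illustrated with a small sketch using Python's standard XML parser. The element name "name" is made up for the example, and the two byte strings stand in for user data in two different encodings.

```python
import xml.etree.ElementTree as ET

DECLARATION = b"<?xml version='1.0' encoding='utf-8'?>"


def parses(document: bytes) -> bool:
    """True if the document is accepted by the XML parser."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Same text, same declaration, different actual byte encodings.
utf8_doc = DECLARATION + "<name>helløø</name>".encode("utf-8")
latin1_doc = DECLARATION + "<name>helløø</name>".encode("latin-1")

if __name__ == "__main__":
    print(parses(utf8_doc))
    print(parses(latin1_doc))
```

The declaration only helps when it is true: the Latin-1 bytes are rejected because they are not valid UTF-8.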

> (As an XML user, I note that Darcs' output lacks both the XML header and 
> a DTD doctype declaration -- which means I can't automatically validate 
> it -- and you could do with a namespace declaration as well. I'm 
> inclined to submit a couple of patches for this once I got the hang of 
> this Haskell thang.)

Patches are certainly welcome.  Hopefully another developer will
take a look at what you submit to make sure you aren't taking advantage of
my ignorance of XML... :)

> >>I'm surprised that Darcs even escapes file names. However, in many 
> >>places it *doesn't* escape anything:
> >
> >I'd say (and many people would complain) that most darcs commands are
> >*primarily* intended to be parsed by humans.  If you have crazy files
> >with spaces in them, you need to be careful.  It's still parseable,
> >since darcs formats things "predictably", it's just more of a pain.  But
> >unless characters are non-printable, I don't see any reason to escape
> >them.
> Based on my previously reported findings, I would say that, strictly
> speaking, Darcs is formatting things predictably, but not consistently,
> and thus, to users and especially script-writing users, seemingly
> unpredictably; Darcs has three ways of outputting stuff: unescaped (as in
> "darcs changes -s"), hex-escaped (as in "darcs whatsnew"), and
> decimal-escaped (as in "darcs whatsnew"), and as a user I know that I'd
> have problems remembering which is happening where.

The hex escaped are how things show up on terminals--it's an attempt to
keep from messing up the terminal configuration by displaying escape
characters (except for color codes that are intentional).  On a terminal,
the hex escaped characters always show up blue...

If darcs isn't in a terminal, it never should escape.

> >Ideally, scripts should use the xml output, which *is* intended to be
> >parsed by scripts.
> Couldn't agree with you more. However, as far as I can see, only "darcs 
> changes" and "darcs annotate" have XML output.
> Significantly, "darcs whatsnew" does not do XML -- understandable as the 
> patch format might be considered Darcs' *canonical* patch description 
> language, but still damn awkward for scripts. (Try writing a LALR 
> grammar some time for a line-oriented, extremely context-dependent 
> format such as Darcs' -- I've got the hang of it now, but the idea of all 
> that state juggling isn't conjuring up any butterflies in *my* belly. :)
> Again, I'm interested in adding this functionality if you agree that 
> it's a good idea. Interestingly, the XML output of "whatsnew" would 
> share elements with the existing XML output of "changes", to the point 
> where you could say they share the same format. "changes" deals with 
> persistent patches, "whatsnew" expresses *potential* patches, so the 
> latter would not have metadata such as author, date, hash or name, but 
> it would have, say, modify_file with a detailed hunk-style diff.

Indeed, I definitely would appreciate patches to add XML output to
whatsnew.  For --summary, this would be easy enough.
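
As a rough illustration of what a summary in XML might look like, here is a sketch in Python. Only "modify_file" comes from the discussion above; the "summary" wrapper, the "add_file" element and the function name are assumptions for the example, not an existing darcs format.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple


def summary_to_xml(changes: Iterable[Tuple[str, str]]) -> str:
    """Render (change kind, path) pairs as one element per changed file."""
    root = ET.Element("summary")  # hypothetical wrapper element
    for kind, path in changes:
        ET.SubElement(root, kind).text = path
    return ET.tostring(root, encoding="unicode")


xml_text = summary_to_xml(
    [("add_file", "./with space.txt"), ("modify_file", "./README")]
)

if __name__ == "__main__":
    print(xml_text)
```

A file name containing spaces needs no special escaping here, since element boundaries delimit it and the serializer takes care of characters such as "&" and "<".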

> Speaking of output, Darcs also needs improvement when it comes to 
> detecting error conditions. For example, "darcs add a-non-existent-file" 
> will return with exit code 0, as will "darcs add a-file-already-added". 
> A script could perceive the lack of messages as meaning success and 
> everything else meaning error, but it's not exactly robust. One of the 
> things my code needs to do is determine whether a file is recorded in 
> the repository

The problem here is that often one will run

darcs add *

trusting darcs to add only the relevant files.  It's hard to see in this
case exactly what the error code should indicate.  I suppose if *no* files
are added, we could consider that failure.  It would be sufficient if you
were adding a single file, but mightn't that definition of failure cause
trouble when adding several files at once? I guess the answer may be that
careful scripts should add files one at a time? Or maybe we should fail if
any of the files couldn't be added, and figure that when run interactively
the error code will be ignored and the user will just read the message.
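
The two candidate definitions of failure can be written down side by side. This is only a sketch of the policies discussed above, in Python rather than darcs' Haskell; the function names and the dictionary of per-file results are hypothetical.

```python
from typing import Dict


def exit_code_fail_if_none_added(results: Dict[str, bool]) -> int:
    """Policy 1: fail only when no file at all could be added."""
    return 0 if any(results.values()) else 1


def exit_code_fail_if_any_failed(results: Dict[str, bool]) -> int:
    """Policy 2: fail as soon as any requested file could not be added."""
    return 0 if all(results.values()) else 1


# Hypothetical outcome of "darcs add *": one file added, one rejected.
mixed = {"new.txt": True, "already-added.txt": False}

if __name__ == "__main__":
    print(exit_code_fail_if_none_added(mixed))
    print(exit_code_fail_if_any_failed(mixed))
```

The two policies agree when a single file is added and differ only in the mixed case, which is exactly the "darcs add *" situation.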

David Roundy
