[darcs-users] Escaping of hunks and file names

Alexander Staubo alex at byzantine.no
Mon Nov 8 16:03:33 UTC 2004


David Roundy wrote:
[snip]
>>I know next to nothing about Unix terminal emulation, so forgive me if 
>>this is the expected behaviour. I hadn't noticed the colourization 
>>before, though.
> 
> No, this isn't obvious.  Oddly enough, it seems that perl does the same
> thing.  I don't know what the haskell standard library "isTerminal"
> function checks, but apparently when these languages call external
> programs, they somehow are able to trick haskell into thinking it's in a
> terminal.  :(

Given that Haskell's "isTerminal" apparently can't be relied on under 
Unix, I see two options here:

1) Figure out how to fix the terminal check. Clearly other programs ("ls 
--color=auto", for example) do this successfully, apparently by using 
isatty(fd). Is it easy to call arbitrary C library functions from 
Haskell? Can you get at the output stream's file descriptor?

2) Add something like a --non-terminal option to all of Darcs' commands, 
allowing one to force the desired behaviour.
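
To make the two options concrete, here is a minimal sketch of how they 
could combine. It is in Python rather than Haskell purely for 
illustration; the function name and the force_plain flag (standing in for 
a hypothetical --non-terminal option) are my own inventions, not anything 
Darcs has today.

```python
import os
import sys


def should_decorate(stream=sys.stdout, force_plain=False):
    """Decide whether colour codes and hex escapes should be written to `stream`.

    `force_plain` stands in for a hypothetical --non-terminal option.
    """
    if force_plain:
        # Option 2: the user explicitly asked for plain output.
        return False
    try:
        fd = stream.fileno()
    except (AttributeError, OSError, ValueError):
        # No underlying file descriptor (e.g. an in-memory buffer),
        # so this cannot be a terminal.
        return False
    # Option 1: ask the C library, as "ls --color=auto" appears to do.
    return os.isatty(fd)
```

A script piping Darcs' output would then get plain text either way: the 
isatty check fails on a pipe, and the flag covers the case where the 
check is fooled.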

> The hex escapes are how things show up on terminals--it's an attempt to
> keep from messing up the terminal configuration by displaying escape
> characters (except for color codes that are intentional).  On a terminal,
> the hex escaped characters always show up blue...
> 
> If darcs isn't in a terminal, it never should escape.

(Btw, possible bug: "darcs annotate" does not do the to-terminal hex 
escaping, ever.)

>>Outputting file names as UTF-8 is fine. However, why is Darcs escaping 
>>the UTF-8, and in such a non-standard (\yy\) format?
> 
> Only whitespace (and backslashes) are escaped in that format, and the
> stupid format is because that is what I came up with when I was coding this
> ages back.  Technically, only spaces and newlines actually need to be
> escaped, since they would mess up darcs' parsing of patches--tabs and
> carriage returns aren't used in darcs patch format as delimiters.
> 
> Basically, I didn't put much thought into it, since at the time I was
> thinking it wouldn't often come into play, since I consider white space in
> filenames a bad idea, and backslashes in filenames also don't greatly
> enhance the portability of your code.

Would you be willing, at this stage, to move to a more Unixy escaping 
syntax? The principle of least surprise and all that. Once people start 
writing scripts around Darcs, I think this will be one of the "little 
warts" that they complain about.
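
To show what I mean by "Unixy", here is one possible reading, sketched in 
Python: C-style backslash escapes for the characters that matter. The 
exact escape table and the function names are assumptions for the sake of 
illustration, not a proposal for the actual patch format.

```python
import re

# Assumed escape table: backslash, space and the common whitespace characters.
_ESCAPES = {"\\": "\\\\", " ": "\\x20", "\n": "\\n", "\t": "\\t", "\r": "\\r"}
_UNESCAPES = {"\\\\": "\\", "\\n": "\n", "\\t": "\t", "\\r": "\r"}
# One escape token: either \xHH or a backslash followed by \, n, t or r.
_TOKEN = re.compile(r"\\x([0-9a-fA-F]{2})|\\[\\ntr]")


def escape_name(name):
    """Replace each special character with its C-style escape."""
    return "".join(_ESCAPES.get(ch, ch) for ch in name)


def unescape_name(escaped):
    """Invert escape_name by decoding tokens from left to right."""
    def decode(match):
        if match.group(1) is not None:
            return chr(int(match.group(1), 16))
        return _UNESCAPES[match.group(0)]
    return _TOKEN.sub(decode, escaped)
```

With this table, "my file.txt" becomes my\x20file.txt, which contains no 
spaces or newlines and which most scripting languages can already decode.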

>>However, XML handles unescaped Unicode (or UTF-8) just fine, as long as 
>>you declare the appropriate encoding at the beginning, eg. <?xml 
>>version='1.0' encoding='utf-8'?>.
> 
> We can't really declare the encoding, since we don't know what the encoding
> of the user's data is.

The default encoding in XML is UTF-8. So whether or not you declare it, 
you must still adhere to a specific encoding.

For file names, enforcing UTF-8 -- and therefore pretty much outputting 
them verbatim -- might not be such a bad idea.

For actual file data, the best way to do this, I think, is to escape 
everything above 127 as character references, eg. &#128;. I think you 
can safely output everything below verbatim, apart from markup 
characters such as & and <, which need the usual escaping. But you can't 
output all characters as-is, because certain byte combinations can be 
misread as UTF-8 multi-byte sequences even when they aren't.
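
As a rough sketch of the character-reference approach (Python for 
illustration; the function name xml_text is made up, and I'm assuming the 
data has already been decoded into characters somehow):

```python
from xml.sax.saxutils import escape


def xml_text(text):
    """Return `text` as pure-ASCII XML character data.

    Markup characters (&, <, >) are escaped first, then every character
    above 127 is replaced by a numeric character reference.
    """
    return escape(text).encode("ascii", "xmlcharrefreplace").decode("ascii")
```

The result is plain ASCII, so it reads the same regardless of which 
encoding the XML document declares.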

>>Speaking of output, Darcs also needs improvement when it comes to 
>>detecting error conditions. For example, "darcs add a-non-existent-file" 
>>will return with exit code 0, as will "darcs add a-file-already-added". 
>>A script could perceive the lack of messages as meaning success and 
>>everything else meaning error, but it's not exactly robust. One of the 
>>things my code needs to do is determine whether a file is recorded in 
>>the repository
> 
> The problem here is that often one will run
> 
> darcs add *
> 
> trusting darcs to add only the relevant files.  It's hard to see in this
> case exactly what the error code should indicate.  I suppose if *no* files
> are added, we could consider that failure.  It would be sufficient if you
> were adding a single file, but mightn't that definition of failure cause
> trouble when adding several files at once? I guess the answer may be that
> careful scripts should add files one at a time? Or maybe we should fail if
> any of the files couldn't be added, and figure that when run interactively
> the error code will be ignored and the user will just read the message.

The problem really is that 1) you have a composite command that 
continues regardless of sub-failures, and 2) you want to capture the 
aggregate status in a single numeric result value. The only completely 
sane solution is to output the individual results, perhaps CVS-style:

$ darcs add *
? myboringfile
A newfile
R alreadyadded

or whatever, and then encapsulate the outcome in a three-state value: 0 
(everything added, perhaps boring files ignored), 1 (some added, perhaps 
some failed) or 2 (all failed).
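
A minimal sketch of that aggregation, in Python for illustration. The 
function name is made up, and I'm assuming that "?" means a boring file 
that was skipped, "A" means added, and any other marker (such as "R") 
counts as a failure. Returning 0 when there was nothing to add at all is 
also my assumption.

```python
def summarize(results):
    """Turn per-file outcomes into report lines plus a three-state status.

    `results` is a list of (file name, marker) pairs.
    Status: 0 = everything added, 1 = some added and some failed, 2 = all failed.
    """
    lines = [f"{marker} {name}" for name, marker in results]
    # Boring files are ignored when computing the status.
    considered = [marker for _, marker in results if marker != "?"]
    failed = sum(1 for marker in considered if marker != "A")
    if failed == 0:
        status = 0
    elif failed == len(considered):
        status = 2
    else:
        status = 1
    return lines, status
```

For the listing above this gives status 1: newfile was added, 
alreadyadded was not, and myboringfile doesn't count either way.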

Alexander.



