[darcs-users] Re: windows

Peter Simons simons at cryp.to
Mon Jul 7 18:52:56 UTC 2003


David Roundy writes:

 > Well this at least wouldn't be a problem (I think...) in the
 > conversion case as I envisioned it, since whenever darcs *reads* a
 > file it is ambivalent as to what type of endings it uses.

Deciding on the end-of-line marker on-the-fly is sometimes ambiguous.
For instance, what should be the result when interpreting a file like
this:

 | line 1\n
 | line 2\r\n
 | line 3\n

Is it [ "line 1\nline 2", "line 3\n" ]?
Or is it [ "line 1", "line 2\r", "line 3" ]?
Or even [ "line 1", "line 2", "line 3" ]?
Or maybe a syntax error?
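
To make the ambiguity concrete, here is a minimal Python sketch of the
first three readings. The helper name `split_lines` is made up for this
illustration and has nothing to do with how darcs actually reads files;
it only shows that each choice of marker yields a different list.

```python
import re

# The mixed-ending file from the example above.
data = "line 1\nline 2\r\nline 3\n"

def split_lines(text, marker_pattern):
    """Split text on the given end-of-line regex (hypothetical helper)."""
    parts = re.split(marker_pattern, text)
    # A trailing marker leaves an empty last element; drop it.
    if parts and parts[-1] == "":
        parts.pop()
    return parts

crlf_only = split_lines(data, r"\r\n")     # only '\r\n' ends a line
lf_only = split_lines(data, r"\n")         # only '\n' ends a line
either = split_lines(data, r"\r\n|\n")     # both are accepted

print(crlf_only)
print(lf_only)
print(either)
```

The same bytes produce two, three, or three lines with different
contents, depending purely on which marker the reader assumes.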

Another problem is that some file formats depend on the end-of-line
marker. Internet messages in the RFC 2822 format (e-mail, etc.), for
example, _must_ use '\r\n'. Any conversion would "destroy" the file's
integrity.
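
As a rough illustration of what "destroying integrity" means here, the
Python sketch below converts a small '\r\n'-terminated message to '\n'
and compares checksums. The message bytes are made up for the example.

```python
import hashlib

# A tiny, made-up message that uses '\r\n' line endings throughout.
original = b"Subject: test\r\n\r\nbody line\r\n"

# A naive end-of-line conversion, as a tool might apply on checkout.
converted = original.replace(b"\r\n", b"\n")

digest_before = hashlib.sha256(original).hexdigest()
digest_after = hashlib.sha256(converted).hexdigest()

# The content is no longer byte-identical, so any checksum or
# signature computed over the original no longer matches.
print(digest_before == digest_after)
```

Anything that relies on the exact bytes (signatures, checksums, strict
parsers) would reject the converted file.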

If you want to support "both" variants (and I put the "both" in quotes
because technically, there is no limit to what systems might use for
end-of-line; some systems, such as classic Mac OS, do in fact use a
bare '\r'), the only way to go is to add a "Content-Encoding:"-type of
header to the patch format, to denote how it is encoded.
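
A minimal sketch of that idea in Python follows. The header name
"Line-Ending", its values, and the `decode_patch` function are all
hypothetical; this is not the darcs patch format, just one way such a
header could remove the guesswork.

```python
# Hypothetical mapping from header value to end-of-line marker.
EOL_MARKERS = {"lf": "\n", "crlf": "\r\n", "cr": "\r"}

def decode_patch(raw):
    """Split the body into lines using the marker named in the header.

    Assumed layout: one 'Line-Ending: <value>' header line, a blank
    line, then the body. This layout is invented for the example.
    """
    header, _, body = raw.partition("\n\n")
    name, _, value = header.partition(":")
    if name.strip().lower() != "line-ending":
        raise ValueError("missing Line-Ending header")
    marker = EOL_MARKERS[value.strip().lower()]
    lines = body.split(marker)
    # A trailing marker leaves an empty last element; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines

patch = "Line-Ending: crlf\n\nline 1\nline 2\r\nline 3\n"
print(decode_patch(patch))
```

With the marker declared up front, the mixed-ending file from earlier
has exactly one reading instead of several.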

Once you start thinking about this, though, you might easily end up
distinguishing between different character sets as well! And the
second you do that, someone will ask about Unicode's multi-byte
characters -- and then Pandora's box is wide open. :-)

Peter




