[darcs-users] petition for '\0' to be removed from binary auto-detection code

David Roundy droundy at abridgegame.org
Tue Nov 16 11:47:40 UTC 2004


On Mon, Nov 15, 2004 at 06:06:21PM +0000, Mark Stosberg wrote:
> Hello,
> 
> While I like the idea of auto-detecting binary files, I realized that
> '\0' (aka NUL) is not a good test. 
> 
> It is sometimes used (at least) in Perl to put a bunch of things into a
> string that you may want to separate back out later. The character is
> used precisely because it doesn't occur in text.
> 
> In particular, it's still used in the modern "CGI.pm" library, to provide
> compatibility with the ancient 'cgi-lib.pl' library. 
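
To make the concern concrete, here is a small C sketch of packing several values into one NUL-separated string. It is only an illustration of the idea; the function names are hypothetical and this is not CGI.pm or darcs code. Any text file that stores data packed this way contains real NUL bytes, so a NUL-based binary check would flag it.

```c
#include <string.h>

/* Hypothetical helper: pack n strings into buf, separated by NUL bytes,
 * similar in spirit to how CGI.pm packs multi-valued parameters with a
 * "\0" separator for cgi-lib.pl compatibility.  The caller must supply a
 * large enough buffer.  Returns the number of bytes written; no trailing
 * separator is added. */
size_t pack_nul_separated(char *buf, const char *const *values, size_t n)
{
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(values[i]);
        memcpy(buf + off, values[i], len);
        off += len;
        if (i + 1 < n)
            buf[off++] = '\0';  /* the separator: a NUL inside "text" */
    }
    return off;
}

/* Hypothetical helper: count the fields in a packed buffer of length len
 * by counting separators (n separators means n + 1 fields). */
size_t count_fields(const char *buf, size_t len)
{
    size_t count = 1;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == '\0')
            count++;
    return count;
}
```

Packing "red", "green" and "blue" gives a 14-byte buffer containing two NUL bytes, exactly the kind of content a NUL scan would call binary.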

Sounds like a reasonable argument to me.  The only trouble is that this
pretty well guts the check for binary files, since we currently only check
for '\0' and '\26' (Ctrl-Z, the DOS end-of-file marker).  And I imagine
that it is usually the '\0' check that correctly identifies binary files.

> I tried to create this patch myself, but I couldn't figure out where
> this logic was located. :) 

It's in fpstring.c, actually written in C for blinding speed (well,
blinding may be an overstatement...).
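
The check itself is simple. Below is a minimal C sketch, assuming the scan looks for the byte values 0 and 26 as described above. `looks_binary` and its `check_nul` flag are hypothetical names, not the actual fpstring.c code; the flag shows what the petition would change.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of a "binary file" scan: the buffer counts as
 * binary if it contains a NUL byte (only when check_nul is nonzero)
 * or a Ctrl-Z byte (decimal 26). */
int looks_binary(const char *s, size_t len, int check_nul)
{
    /* Check NUL first: it is probably the test that usually catches
     * binary files, and if it hits we skip the second scan entirely. */
    if (check_nul && memchr(s, 0, len) != NULL)
        return 1;
    return memchr(s, 26, len) != NULL;
}
```

With `check_nul` set to 0, a file full of NUL bytes is classified as text unless it also happens to contain a Ctrl-Z, which is why dropping the NUL test weakens the detection so much.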

Another option would be to add a set of regexps that match files that are
*always* text.  This would be ugly, but it would let us keep \0 as a binary
test while special-casing .pl files so that they never get checked.
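
The always-text idea might look roughly like this: a minimal sketch using POSIX `regcomp`/`regexec` (so it assumes a POSIX system), in which files whose names match a pattern skip the byte scan entirely. The pattern list and function names are hypothetical; only the .pl case comes from the discussion, and .pm is an illustrative addition.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical filename patterns that are *always* treated as text. */
static const char *always_text_patterns[] = {
    "\\.pl$",
    "\\.pm$",
};

/* Returns 1 if filename matches any always-text pattern, else 0. */
static int is_always_text(const char *filename)
{
    size_t n = sizeof always_text_patterns / sizeof always_text_patterns[0];
    for (size_t i = 0; i < n; i++) {
        regex_t re;
        if (regcomp(&re, always_text_patterns[i], REG_EXTENDED | REG_NOSUB) != 0)
            continue;  /* skip a pattern that fails to compile */
        int matched = regexec(&re, filename, 0, NULL, 0) == 0;
        regfree(&re);
        if (matched)
            return 1;
    }
    return 0;
}

/* Hypothetical binary test: always-text files are never scanned;
 * anything else is binary if it contains NUL or Ctrl-Z (26). */
int is_binary_file(const char *filename, const char *contents, size_t len)
{
    if (is_always_text(filename))
        return 0;
    return memchr(contents, 0, len) != NULL || memchr(contents, 26, len) != NULL;
}
```

Compiling each pattern on every call keeps the sketch short; a real implementation would compile the list once and reuse it.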
-- 
David Roundy
http://www.darcs.net
