[darcs-users] Re: Whitespace in filenames

John Meacham john at repetae.net
Fri Aug 1 15:59:07 UTC 2003


On Fri, Aug 01, 2003 at 11:38:08AM -0400, David Roundy wrote:
> On Fri, Aug 01, 2003 at 05:15:31PM +0200, Peter Simons wrote:
> > David Roundy writes:
> > 
> >  > Is whitespace in filenames something that people tend to use?
> > 
> > IMHO, the patch format should be able to deal with _any_ character in
> > a file name. Some file systems use Unicode for file names, thus even
> > \0 is a legal value there -- and _will_ occur.
> 
> Hmmmm.  Dealing with multibyte characters in filenames may be a problem.
> But dealing with really strange characters shouldn't be, since the only
> problem (that I'm aware of anyways) is the parsing of filenames, and so
> that's only a problem with whitespace.  On the other hand, if the filenames
> are ever converted into C strings (by the haskell library), a \0 will wreak
> havoc, and could lead to weird errors.  On POSIX systems, C strings are
> generally used for filenames, so I imagine that a \0 in a filename *will*
> lead to errors.  The wrong file would be written to, which would cause
> really hard to track down errors.  :(
> 
> Currently there's nothing preventing null characters from being used in
> filenames, but it certainly scares me.  At the least, I would think that a
> special flag should be required in order to add such a file, as is
> currently the case with file names that differ only in case.  It would be
> nice to not have people accidentally create a repository that cannot be
> used on a different platform or filesystem.  It would be nice to be able to
> move a repository from one file system to another without it becoming
> invalid.
> 
> In short, it seems (upon reflection) that in general it would be better to
> restrict filenames to contain only those characters that are valid on
> *every* filesystem, except that we don't want to cause problems using darcs
> with already existing projects that might use such characters.

A null, although technically possible, is unlikely, mainly for the
reasons you give about C libraries. However, control characters and
multibyte ones are relatively common.

The appropriate thing to do is to take not the intersection of features
but the union. We don't want to restrict everyone to 8.3 filenames
because someone might want to pull to a FAT filesystem at some point. If
someone puts backslashes in their filenames, then darcs shouldn't care,
but it won't let you check out that file on a Windows machine.
Presumably if you are putting backslashes in filenames then you are
doing so for good reason and don't expect your repository to be used on
a Windows machine.
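
To make the union idea concrete, here is a rough sketch in Python (not
darcs code). The names FORBIDDEN, accept_into_repo and can_check_out are
made up for illustration, and the per-platform character sets are
deliberately incomplete. The point is only that the repository accepts
every name, and a name is rejected solely by the platform it is being
checked out on.

```python
# Hypothetical, incomplete per-platform rules for a single path component.
FORBIDDEN = {
    "posix": set("/\0"),
    "windows": set('\\/:*?"<>|\0'),
}


def accept_into_repo(name):
    """Union approach: the repository itself never rejects a name."""
    return True


def can_check_out(name, platform):
    """True if `name` uses no character forbidden on `platform`."""
    return not (set(name) & FORBIDDEN[platform])
```

With these assumed rules, a name such as 'a\b' is accepted into the
repository and can be checked out on a POSIX system, but not on Windows.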

darcs should accept anything passed to it. At most it should warn you on
mv and add that 'this file contains backslashes and might not be
compatible with some filesystems' or 'this file has the same name as
this other file except for case, it may cause problems on some systems'.
These constraints should only be checked on 'pull', against the
constraints of the filesystem/OS you are pulling to.
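
Something like the following is what I have in mind for the warnings on
mv and add. It is a sketch in Python under my own assumptions, not
anything darcs implements: add_warnings is a hypothetical helper, and it
only ever returns advisory messages, it never refuses the name.

```python
def add_warnings(new_name, existing_names):
    """Return advisory warnings for adding `new_name`; never rejects it."""
    warnings = []
    if "\\" in new_name:
        warnings.append(
            f"{new_name!r} contains backslashes and might not be "
            "compatible with some filesystems"
        )
    for other in existing_names:
        # Same name except for case: a problem on case-insensitive systems.
        if other != new_name and other.lower() == new_name.lower():
            warnings.append(
                f"{new_name!r} has the same name as {other!r} except for "
                "case, it may cause problems on some systems"
            )
    return warnings
```

The caller would print whatever comes back and carry on with the add
regardless.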
        John

-- 
---------------------------------------------------------------------------
John Meacham - California Institute of Technology, Alum. - john at foo.net
---------------------------------------------------------------------------



