[darcs-users] Re: Whitespace in filenames

David Roundy droundy at abridgegame.org
Sat Aug 2 10:12:59 UTC 2003


On Fri, Aug 01, 2003 at 08:59:07AM -0700, John Meacham wrote:
> a null although technically possible is unlikely, mainly due to the
> reasons you give about C libraries. however control characters and
> multibyte ones are relatively common.
> 
> The appropriate thing to do is to not take the intersection of features
> but the union. ...

Ok, I'm convinced.

> darcs should accept anything passed to it. at most it should warn you on
> mv and add that 'this file contains backslashes and might not be
> compatible with some filesystems' or 'this file has the same name as this
> other file except for case, it may cause problems on some systems'. these
> constraints should only be checked on 'pull' against the constraints of
> the filesystem/os you are pulling to.

Well, there's a problem in that I don't know that it's possible or
practical to check against the conventions of the filesystem you're pulling
to.  Among other things, the repository might span more than one
filesystem, each of which has different filename restrictions.  And on the
principle of letting users do whatever they want, that should be ok.  So I
don't see myself trying to figure out what the filename restrictions of a
given filesystem are to check on pull.  Patches to do this would be
welcome, but it doesn't interest me, and I don't see how it can guarantee
that the patch will apply properly (that is, that it will write to the
desired file).

The other problem is that for a repository to be portable, no file can
*ever* have had a filename portability problem.  Normally when a project
has a filename policy, I would think that they would enforce it by renaming
any files that violate it, but this wouldn't be enough to achieve
portability.  The patches would need to be unrecorded, which, for example,
the darcs-patcher won't do--mostly to avoid race conditions with people
pulling while you unrecord, but also because I think unrecording publicly
available patches is bad policy.  Anyhow, the point is recording a
"non-portable" patch is a very serious issue--if it goes unnoticed, and
later the repo is intended to be used on another platform, there are only
two options: either start a new repository or manually excise all mention of
the guilty file from the repository history.  If it's a directory, it's
even worse.  I had to do this myself with darcs, when I had two files named
diff.lhs and Diff.lhs (you can still see, like fossils, mention of diff.lhs
in Makefiles of old patches).

So the point is that I'll not just warn the users when they try to add
weird filenames, but will disallow it unless they use an override flag (as
currently --case-ok overrides the case-insensitive filename check).
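
As a rough illustration of the "disallow unless overridden" policy described
above, here is a minimal Haskell sketch.  It is not taken from the darcs
source: the types Problem and Override, the functions problems and checkAdd,
and the AnyNameOk flag are all hypothetical names made up for this example.
Only the existence of a --case-ok override comes from the message itself, and
which names count as "weird" (here: case clashes and backslashes) is an
assumption.

```haskell
import Data.Char (toLower)

-- Hypothetical problem categories; not taken from the darcs source.
data Problem
  = CaseClash FilePath  -- differs only by case from this existing file
  | HasBackslash        -- contains a backslash
  deriving (Eq, Show)

-- Hypothetical override flags; only --case-ok is mentioned in the message.
data Override = CaseOk | AnyNameOk
  deriving (Eq, Show)

-- List every portability problem with a new name, given the names
-- already present in the repository.
problems :: [FilePath] -> FilePath -> [Problem]
problems existing new =
  [ CaseClash old | old <- existing, old /= new, lower old == lower new ]
  ++ [ HasBackslash | '\\' `elem` new ]
  where
    lower = map toLower

-- Refuse the add unless every problem found is covered by an override.
checkAdd :: [Override] -> [FilePath] -> FilePath -> Either [Problem] FilePath
checkAdd overrides existing new =
  case filter (not . overridden) (problems existing new) of
    [] -> Right new
    ps -> Left ps
  where
    overridden (CaseClash _) = CaseOk `elem` overrides || AnyNameOk `elem` overrides
    overridden _             = AnyNameOk `elem` overrides
```

With these definitions, checkAdd [] ["diff.lhs"] "Diff.lhs" yields
Left [CaseClash "diff.lhs"], while checkAdd [CaseOk] ["diff.lhs"] "Diff.lhs"
yields Right "Diff.lhs".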

As far as timeframe and how to implement arbitrary filenames, I think I
have a good plan, which should fix the space issue and the unicode filename
issue in one go.  The idea is to fix these problems when I switch Patches
from using Strings to hold their filenames to using PackedStrings.  When I
do this, I'll have to add conversion functions all over the place anyways,
and I'll just use packFilePath and unpackFilePath, rather than packString
and unpackPS.  packFilePath will convert from unicode to utf8 and escape
whitespace into nonwhitespace (e.g. ' ' -> "\\s" or something like that),
and unpackFilePath will do the reverse.  This will mean I don't have to
change my parsing routines, which assume filenames are delimited by
whitespace.
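
The whitespace-escaping half of that plan might look something like the
sketch below.  This is only an illustration under stated assumptions: it
works on plain Strings rather than PackedStrings, it leaves out the
unicode/utf8 conversion entirely, and the names escapeName and unescapeName
are hypothetical stand-ins, not the eventual packFilePath and
unpackFilePath.  The backslash is escaped as well, which the message does
not mention, because otherwise the encoding could not be reversed
unambiguously.

```haskell
-- Replace spaces, tabs, newlines and backslashes with two-character
-- escapes.  Other whitespace characters are not handled in this sketch.
escapeName :: FilePath -> String
escapeName = concatMap esc
  where
    esc '\\' = "\\\\"  -- backslash becomes two backslashes
    esc ' '  = "\\s"   -- space becomes backslash, 's'
    esc '\t' = "\\t"
    esc '\n' = "\\n"
    esc c    = [c]

-- Inverse of escapeName.
unescapeName :: String -> FilePath
unescapeName ('\\' : c : rest) = unesc c : unescapeName rest
  where
    unesc 's' = ' '
    unesc 't' = '\t'
    unesc 'n' = '\n'
    unesc x   = x      -- covers the escaped backslash
unescapeName (c : rest) = c : unescapeName rest
unescapeName []         = []
```

An escaped name such as escapeName "my file.txt" contains no spaces, so a
parser that splits on whitespace (for example with words) sees it as a
single token, and unescapeName recovers the original name.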
-- 
David Roundy
http://www.abridgegame.org



