[darcs-users] darcs and non-ASCII characters

David Roundy droundy at darcs.net
Wed Oct 26 13:25:36 UTC 2005


On Tue, Oct 25, 2005 at 07:39:38PM +0200, Juliusz Chroboczek wrote:
> > Yes, filenames are just byte sequences to darcs.
> ...
> > I also believe that treating filenames as byte sequences is correct.
> 
> It is the right thing to do on Unix systems, as there filenames are
> just byte sequences.  (This is just a statement of fact -- arguably,
> this is a mis-feature, but that's what we're stuck with.)
> 
> This is not necessarily the right thing on Windows systems, where
> filenames are Unicode sequences, and might therefore be presented to
> user-space in different forms depending on the locale[1].  (This is
> just a statement of fact -- arguably, this is a mis-feature, but
> that's what we're stuck with.)

> [1] to the fooA functions; the fooW functions are locale-independent.

Indeed, that is the issue.

I'd say that Unix systems do it right, at least if you consider multi-user
systems desirable.  But yes, Windows is a pain in this way.  On the other
hand, my understanding is that on Windows the codepage of a filesystem
determines the actual encoding of its filenames, so depending on which disk
you're accessing you could get some sort of "illegal filename" error
regardless of whether you use the fooA or the fooW functions.
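
To make the encoding problem concrete, here is a small illustration in
Python (not darcs code; darcs is written in Haskell).  It shows that one
Unicode filename becomes different byte sequences under UTF-8 and Latin-1,
so a tool that treats names as bytes sees two distinct files, and that
bytes written under one encoding may not decode at all under another.  The
choice of UTF-8 and Latin-1 is just an example pair of encodings.

```python
# Illustration only: the same Unicode filename maps to different byte
# sequences depending on the encoding, and bytes produced under one
# encoding may be invalid (or decode to the wrong name) under another.

name = "caf\u00e9.txt"                 # "café.txt"

utf8_bytes = name.encode("utf-8")      # b'caf\xc3\xa9.txt'
latin1_bytes = name.encode("latin-1")  # b'caf\xe9.txt'


def decode_or_none(raw: bytes, encoding: str):
    """Return the decoded name, or None if `raw` is invalid in `encoding`."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    # A byte-oriented tool sees two different filenames here.
    print(utf8_bytes == latin1_bytes)             # False
    # Latin-1 bytes are not valid UTF-8, so a UTF-8 user cannot decode the name.
    print(decode_or_none(latin1_bytes, "utf-8"))  # None
    # UTF-8 bytes read as Latin-1 decode "successfully" but to the wrong name.
    print(repr(decode_or_none(utf8_bytes, "latin-1")))  # 'cafÃ©.txt'
```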

In any case, darcs' native filename behavior is unixy.  Adding a windowsy
option would be a bit of a pain, but ought to be doable.  In fact, with
just a bit of care it shouldn't be too hard: all of our writing IO goes
through the WriteableDirectory monad (where one could translate
filenames), and reading goes through either ReadableDirectory or slurp,
so I don't think the change need be too invasive.  It just needs a
developer who's interested in sharing non-ASCII filenames with people who
use different encodings.
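
As a rough sketch of "translate filenames at the IO boundary", the Python
below keeps every name inside the repository in one canonical encoding
(assumed here to be UTF-8, which is a hypothetical choice; darcs itself
stores raw bytes) and converts to and from the local encoding only inside a
single read/write layer.  TranslatingDirectory and its methods are invented
for illustration and are not darcs' WriteableDirectory or ReadableDirectory;
an in-memory dict stands in for the real filesystem.

```python
# Hypothetical sketch: repository-internal names use one canonical encoding,
# and translation to/from the local filesystem encoding happens only in this
# layer.  None of these names come from darcs.

from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TranslatingDirectory:
    """In-memory stand-in for a directory whose names use `local_encoding`."""
    local_encoding: str
    repo_encoding: str = "utf-8"  # assumed canonical repository encoding
    files: Dict[bytes, bytes] = field(default_factory=dict)  # local name -> data

    def _to_local(self, repo_name: bytes) -> bytes:
        # Raises UnicodeEncodeError if the local encoding cannot represent
        # the name: the analogue of an "illegal filename" error.
        return repo_name.decode(self.repo_encoding).encode(self.local_encoding)

    def _to_repo(self, local_name: bytes) -> bytes:
        return local_name.decode(self.local_encoding).encode(self.repo_encoding)

    def write_file(self, repo_name: bytes, contents: bytes) -> None:
        # Every write passes through here, so names are translated exactly once.
        self.files[self._to_local(repo_name)] = contents

    def read_file(self, repo_name: bytes) -> bytes:
        return self.files[self._to_local(repo_name)]

    def list_repo_names(self) -> List[bytes]:
        # Reading a directory listing translates back to repository names.
        return sorted(self._to_repo(n) for n in self.files)


if __name__ == "__main__":
    repo_name = "caf\u00e9.txt".encode("utf-8")
    d = TranslatingDirectory(local_encoding="latin-1")
    d.write_file(repo_name, b"hello")
    print(list(d.files))          # [b'caf\xe9.txt']  (Latin-1 on "disk")
    print(d.list_repo_names())    # [b'caf\xc3\xa9.txt']  (UTF-8 in the repo)
```

Because translation lives in one place, the rest of the code can keep
working with canonical names; the cost is that a name unrepresentable in
the local encoding fails at write time rather than silently.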
-- 
David Roundy
http://www.darcs.net



