[darcs-users] darcs and non-ASCII characters

Juliusz Chroboczek Juliusz.Chroboczek at pps.jussieu.fr
Thu Oct 27 17:18:21 UTC 2005


> my understanding is that the codepage for a filesystem on Windows
> determines the actual encoding of the filenames

I may be mistaken, but it was my understanding that both NTFS and VFAT
``long filenames'' use UTF-16.

(The ``short filenames'' hack has the behaviour you describe, though.)

> I'd say that unix systems do it right

{-# OPTIONS -fadvocacy #-}

I'd argue that both models (Unix's and Windows') are broken.

There are two properties that you want to have:

  (1) media are interchangeable between systems (and therefore locales);
  (2) the user-visible name of a given file doesn't depend on the locale.

Windows breaks property (1) (a file written in a Polish locale might
not be accessible under French Windows, which used to be a serious
problem when people sent me floppies from Poland), while Unix breaks
property (2) (a file called ``e-circumflex'' under a French locale
will be called ``e-ogonek'' under a Polish one).
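
A rough Python sketch of the two failure modes.  The codec names are my
assumptions, standing in for typical legacy French and Polish setups
(ISO-8859-1 and ISO-8859-2 on Unix, cp1252 and cp1250 on Windows); they
are not taken from any particular system.

```python
import unicodedata

# Unix model: a file name is a bare byte string, and the locale decides
# how those bytes are shown to the user.
raw_name = bytes([0xEA])
seen_in_french_locale = raw_name.decode("iso-8859-1")  # assumed French locale
seen_in_polish_locale = raw_name.decode("iso-8859-2")  # assumed Polish locale
print(unicodedata.name(seen_in_french_locale))
print(unicodedata.name(seen_in_polish_locale))

# Codepage model: a name written under a Polish codepage may contain a
# character that the French codepage cannot express at all.
polish_name = "\u0119"  # e-ogonek
bytes_under_polish_codepage = polish_name.encode("cp1250")
try:
    polish_name.encode("cp1252")
    representable_in_french_codepage = True
except UnicodeEncodeError:
    representable_in_french_codepage = False
print(representable_in_french_codepage)
```

The first half is property (2) failing: one byte, two visible names.  The
second half is property (1) failing: the name cannot even be spelled in
the other codepage.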

If you want to reconcile the two properties, some third property has
to give.  As far as I can see, this third property is the existence of
multiple distinct character encodings.  So you need to move to a model
where all locales use the same character encoding.

This is what we're doing, by switching to UTF-8 locales under Unix,
and deprecating the fooA functions (the codepage-based ``A'' variants
of the Win32 calls, as opposed to the UTF-16 ``W'' variants) under
Windows.
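
As an illustration only (my sketch, not anything darcs does): once every
locale agrees on UTF-8, the two names from the example above get distinct
byte strings, and those bytes decode to the same characters everywhere.

```python
# Both names under a single, shared encoding.
names = ["\u00ea", "\u0119"]  # e-circumflex, e-ogonek

# Each name maps to its own byte sequence; no byte string is ambiguous.
encoded = [name.encode("utf-8") for name in names]

# Decoding does not consult the locale, so every system sees the same names.
decoded = [data.decode("utf-8") for data in encoded]

for name, data in zip(names, encoded):
    print(ascii(name), data.hex())
```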

                                        Juliusz



