[darcs-users] darcs and non-ASCII characters

David Roundy droundy at darcs.net
Tue Oct 25 12:52:46 UTC 2005


On Sat, Oct 22, 2005 at 07:22:07PM +0200, Wolfgang Jeltsch wrote:
> I suppose that darcs stores the byte sequence which makes up a filename
> verbatim, not taking any encodings into account.  In other words, for
> darcs, filenames seem to be byte sequences instead of character
> sequences.  The question is whether this is good behavior.  At least it
> avoids problems if, for example, a Makefile refers to a file.  If
> filenames were treated as character sequences, the underlying byte
> sequence would change whenever a different encoding was used.  But the
> byte sequence in the Makefile wouldn't change, so the Makefile would no
> longer work correctly.
> 
> Is it really true that filenames are just byte sequences for darcs and no
> character encodings are taken into account when storing and retrieving
> filenames?  Or are filenames treated as character sequences?  Or are they
> treated non-uniformly which would mean that darcs is buggy at this point
> and one should avoid filenames with non-ASCII characters?

Yes, filenames are just byte sequences to darcs.  Due to some foolishness
on my part long ago, those byte sequences are stored as the UTF-8 encoding
of the original byte sequence interpreted as Latin-1.  That doesn't change
the fact that it's just a byte sequence, but it complicates parsing and
printing of patches.
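
To make the stored form concrete, here is a minimal Python sketch of the
transformation as described above.  It is an illustration only, not code
from darcs; the function names to_stored_form and from_stored_form are
hypothetical.

```python
def to_stored_form(raw: bytes) -> bytes:
    """Read each raw byte as a Latin-1 character, then encode the text as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")


def from_stored_form(stored: bytes) -> bytes:
    """Invert to_stored_form: decode the UTF-8, then map characters back to bytes."""
    return stored.decode("utf-8").encode("latin-1")


# A filename whose on-disk bytes happen to be UTF-8: U+00E9 is the bytes C3 A9.
raw_name = "caf\u00e9.txt".encode("utf-8")

# Each non-ASCII byte is itself re-encoded as two bytes in the stored form.
stored_name = to_stored_form(raw_name)

# Latin-1 maps all 256 byte values to characters, so the mapping is lossless
# for any byte sequence, whatever encoding the filename was really in.
every_byte = bytes(range(256))
round_trip_ok = from_stored_form(to_stored_form(every_byte)) == every_byte
```

Pure-ASCII names come out unchanged, which is why the extra layer goes
unnoticed until a filename contains a byte above 0x7F.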

Perhaps with the new repo format code we can arrange a pleasant transition
to storing filenames as raw byte sequences (which makes *much* more
sense).  But up to now it just hasn't been important, since it doesn't
affect darcs' functionality (and format transitions are a royal pain).

I also believe that treating filenames as byte sequences is correct.
There's no requirement on POSIX systems that filenames either can be or are
represented in your current locale.  On the other hand, it would also be
nice to allow an optional (and configurable) filter between the actual
filenames on disk and what darcs interprets as the "raw bytes".  But in
general, all internationalization waits on someone other than myself (who
never uses anything but ASCII).
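
One possible shape for such a filter, sketched in Python.  Nothing like
this exists in darcs as far as this message says; make_filename_filter and
its parameters are hypothetical, and the choice of UTF-8 as the
repository-side form is an assumption made for the example.

```python
from typing import Callable, Tuple

BytesFilter = Callable[[bytes], bytes]


def make_filename_filter(
    disk_encoding: str, repo_encoding: str = "utf-8"
) -> Tuple[BytesFilter, BytesFilter]:
    """Build a pair of functions translating filename bytes in each direction."""

    def disk_to_repo(name: bytes) -> bytes:
        # Raises UnicodeDecodeError if the on-disk name is not valid in
        # disk_encoding, which POSIX permits.
        return name.decode(disk_encoding).encode(repo_encoding)

    def repo_to_disk(name: bytes) -> bytes:
        # Raises UnicodeEncodeError if the name has no form in disk_encoding.
        return name.decode(repo_encoding).encode(disk_encoding)

    return disk_to_repo, repo_to_disk


# A user whose filesystem holds Latin-1 names: U+00E9 is the single byte E9.
disk_to_repo, repo_to_disk = make_filename_filter("latin-1")
on_disk = b"caf\xe9.txt"
in_repo = disk_to_repo(on_disk)
```

With the filter switched off, the "raw bytes" are simply the on-disk bytes.
Because names that are invalid in the configured encoding make the filter
fail, it would have to stay optional.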
-- 
David Roundy
http://www.darcs.net



