[darcs-users] [patch252] Resolve issue1763: use correct filename encoding in co...
tux_rocker at reinier.de
Tue Jun 1 06:20:46 UTC 2010
On Friday 28 May 2010 11:05, Eric Kow wrote:
> Plan for future work? (two kinds of read/show)
> Complementary plan: we should distinguish between decoding/encoding
> filepaths from the operating system, and decoding/encoding filepaths
> to patch files and patch bundles.
> Basically the picture looks like this:
> OS <--> darcs <---> patch files
> The reason why I initially thought that NewFormat was a step backwards
> was that I was thinking about the darcs <--> patch files part. IMHO,
> what you want is for darcs <--> patch files to always use UTF-8. On the
> other hand, the OS <--> darcs part needs some more thought.
> This is a little half-baked right now, but maybe somebody else can run with
> the idea?
As a Unix guy I never think of filenames as text. If our patch files are
UTF-8, how do we represent patches to the issue1763 repo with its Hungarian
characters in a single-byte encoding? Note that if I copy these files to my
machine that has a UTF-8 locale, the file names will still be valid single-
byte-hungarian and invalid UTF-8!
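
To make that concrete, here is a minimal sketch. It assumes ISO-8859-2 as the single-byte Hungarian encoding (the thread does not name it) and uses a made-up file name; is_valid_utf8 is a helper invented for the illustration.

```python
# Hypothetical file name with Hungarian characters.
name = "árvíztűrő.txt"

# Assumption: ISO-8859-2 stands in for the single-byte encoding of the
# issue1763 repo. These are the bytes such a system would store on disk.
raw = name.encode("iso8859_2")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Copying the file to a machine with a UTF-8 locale does not change the bytes:
# they still decode as ISO-8859-2, and they are still not UTF-8.
print(raw.decode("iso8859_2") == name)
print(is_valid_utf8(raw))
```

The bytes are the same on both machines; only the locale used to interpret them differs.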
The fact that even enterprisey Java does not have a good solution to this
problem makes me feel hopeless about finding one for darcs.
We could of course say that for darcs, filenames are Unicode text. Then we
should check upon darcs add that a filename is valid according to the current
locale, and encode the file name using the locale encoding whenever darcs does
filesystem operations on Unix. But refusing to 'darcs add' a file because its
name does not fit our model may anger some users. And I haven't even thought
about backward compatibility.
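
A rough sketch of that model, only to show where the refusal would happen. All three function names are hypothetical and nothing here is existing darcs code; the UTF-8 step follows Eric's suggestion for the darcs <--> patch files side.

```python
def decode_on_add(raw_name: bytes, locale_encoding: str) -> str:
    """Interpret raw file name bytes as Unicode text, or refuse the add."""
    try:
        return raw_name.decode(locale_encoding)
    except UnicodeDecodeError as err:
        raise ValueError(
            f"refusing to add {raw_name!r}: not valid {locale_encoding}"
        ) from err


def encode_for_filesystem(name: str, locale_encoding: str) -> bytes:
    """Turn a Unicode file name back into bytes for filesystem operations."""
    return name.encode(locale_encoding)


def encode_for_patch(name: str) -> bytes:
    """Patch files always store the name as UTF-8, whatever the locale."""
    return name.encode("utf-8")


# A name stored as ISO-8859-2 bytes (an assumed encoding for the example).
latin2_raw = "árvíztűrő.txt".encode("iso8859_2")

# Under a matching locale the add is accepted and the bytes round-trip.
accepted = decode_on_add(latin2_raw, "iso8859_2")
print(encode_for_filesystem(accepted, "iso8859_2") == latin2_raw)

# Under a UTF-8 locale the same bytes are refused.
try:
    decode_on_add(latin2_raw, "utf-8")
except ValueError as err:
    print(err)
```

The reverse direction has the same problem: encode_for_filesystem raises UnicodeEncodeError when a name from a patch has no representation in the local encoding, so a pull could fail the same way an add does.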