[darcs-devel] Failing Windows tests / encoding issues

Ben Franksen ben.franksen at online.de
Mon Jul 27 13:15:55 UTC 2020


>> Okay. There are unresolved issues, though. For instance
>> http://bugs.darcs.net/issue2591 is still open. Perhaps we gave up at
>> some point and skipped these tests on Windows?
> 
> Yes, true. I see the following encoding or name related tests that have
> abort_windows:
> 
> # invalid characters in filenames - almost certainly will never work
> git_quoted_filenames
> convert-import-export-non-ascii
> issue1932-colon-breaks-add
> 
> # these appear to be bugs or at least not properly understood
> issue1442_encoding_round-trip
> issue1739-escape-multibyte-chars-correctly
> issue2262-display_of_meta_data
> utf8-display
> 
> This was just a crude grep; I didn't check, for example, which tests are
> actually being skipped on my computer for some other Windows-related reason.

Okay, thanks for that overview. I guess until we have a convenient build
and test platform that supports Windows (e.g. Appveyor) we won't make
much progress with these issues. Our current "workflow" (I make changes
then ask you to test them on Windows) obviously doesn't scale.

>> Some of the test scripts go to great lengths to create repos with file
>> names that are not encodable within the current locale, switch
>> locale/encoding between different operations on the same repo etc. I
>> imagine this to be difficult on Windows. Doesn't NTFS store them
>> internally in UTF-16?
>>
>> But I may be wrong about that.
> 
> I see a few different challenges with encoding:
> 
>  (1) Make darcs work nicely just on Unix with all the different ways
>      people can use encodings there.
> 
>  (2) Make darcs work nicely just on Windows with all the different ways
>      people can use encodings there.
> 
>  (3) Make darcs interoperate nicely between Windows and Unix. This may
>      require users to restrict themselves to a defined subset of
>      functionality.
> 
>  (4) Keep our code decently abstracted even given the significant
>      differences between the two platforms.
> 
> So yes, your original statement that some of our encoding tests cannot
> be made to work properly on Windows is likely true in that they are
> relevant to (1) but fall outside the intersection required for (3).

Yes, that is what I meant, basically.

> I've been through some of the old discussions you pointed to, but
> unfortunately I am still quite hazy on all the details of this topic :-(

You are not alone. Not at all. I regularly get headaches trying to
understand how everything fits together in Darcs wrt encoding.

Regarding file names, I have been entertaining fairly radical ideas for
getting rid of a large part of these troubles. We could, in principle, put
limitations on file paths inside a repo. The fact that on POSIX systems
a file path may contain ASCII control characters does not mean we have
to support them. Nobody in their right mind would use such file names in
any half-serious project. I also tend to think that legacy encodings
like ISO-8859 are becoming obsolete pretty fast, so that we could
additionally require file names to be unicode strings, period. We would
then store file paths internally in UTF-8, like we do for meta data.
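To make the two rules concrete, here is a minimal sketch in Python (Darcs itself is written in Haskell; Python is used here only for brevity). The names decode_repo_path and encode_repo_path are hypothetical and not part of Darcs. The sketch assumes a path arrives as raw bytes, as it would from a POSIX file system, and checks only the two rules stated above: the bytes must be strict UTF-8, and the path must contain no ASCII control characters.

```python
from typing import Optional


def decode_repo_path(raw: bytes) -> Optional[str]:
    """Decode a raw file path under the two proposed rules (hypothetical helper).

    Returns the path as a str, or None if it violates either rule:
      1. the bytes must be valid UTF-8 (legacy encodings such as ISO-8859-1
         are rejected);
      2. the path must not contain ASCII control characters
         (U+0000..U+001F and U+007F).
    """
    try:
        text = raw.decode("utf-8")  # strict mode: invalid byte sequences raise
    except UnicodeDecodeError:
        return None
    if any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in text):
        return None
    return text


def encode_repo_path(path: str) -> bytes:
    """Internal storage form: UTF-8, as already used for patch meta data."""
    return path.encode("utf-8")
```

For example, b"caf\xe9" (an ISO-8859-1 "café") is rejected, while its UTF-8 spelling b"caf\xc3\xa9" is accepted and would be stored unchanged. The same check would run whenever a patch or the index is read.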

The only question is what to do with existing repos. My current thinking
is that these should be handled using a variant of 'darcs repair' that
/interactively/ fixes file names that do not conform to the above two
rules by asking the user to enter a valid replacement name. This means
users can avoid file name collisions that could otherwise (e.g. if we
used a replacement character) make a repo completely unusable (and
unfixable). (Instead of interactive use, we could also define a simple
file format for file name translations that can be used as input.) As
with the regular 'darcs repair' command, this should not create new
identities for patches (because we don't touch the meta data). Of course
we have to make sure we validate file names whenever we read a patch (or
the index).
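The collision problem, and the kind of check an input file of name translations would need, can be sketched as follows. This is a hypothetical illustration in Python, not Darcs code: it treats names as a flat set rather than full paths, and it assumes the translation table maps each old raw name (bytes) to a new name. replacement_collisions shows distinct non-UTF-8 names collapsing into one once invalid bytes are replaced by U+FFFD; check_renames rejects a table that leaves a name invalid or maps two files to the same name.

```python
from typing import Dict, List, Set


def has_control_chars(name: str) -> bool:
    # ASCII control characters U+0000..U+001F and U+007F (DEL)
    return any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in name)


def replacement_collisions(raw_names: List[bytes]) -> Dict[str, List[bytes]]:
    """Group raw names by their lossy decoding (invalid bytes -> U+FFFD).

    Any group with more than one member shows distinct files that an
    automatic replacement-character fix would merge into a single name.
    """
    groups: Dict[str, List[bytes]] = {}
    for raw in raw_names:
        groups.setdefault(raw.decode("utf-8", errors="replace"), []).append(raw)
    return {name: raws for name, raws in groups.items() if len(raws) > 1}


def check_renames(names: Set[bytes], renames: Dict[bytes, str]) -> List[str]:
    """Check a hypothetical translation table (old raw name -> new name).

    Returns a list of problems; an empty list means every name ends up valid
    and no two files end up with the same name.
    """
    problems: List[str] = []
    final: Dict[str, bytes] = {}  # new name -> old raw name that claimed it
    for raw in sorted(names):
        new = renames.get(raw)
        if new is None:  # no translation given: the name must already be valid
            try:
                new = raw.decode("utf-8")
            except UnicodeDecodeError:
                problems.append(f"no replacement given for non-UTF-8 name {raw!r}")
                continue
        if has_control_chars(new):
            problems.append(f"{new!r} (from {raw!r}) contains control characters")
        elif new in final:
            problems.append(f"{raw!r} and {final[new]!r} would both become {new!r}")
        else:
            final[new] = raw
    return problems
```

For instance, b"caf\xe9" and b"caf\xe8" both decode to "caf\ufffd" with a replacement character, so an automatic fix would merge two files, whereas a table renaming them to "cafe-acute" and "cafe-grave" passes the check.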

I think it would make the most sense to make these changes before we
release darcs-3.0.

For interoperability between systems, we could extend the existing
--case-ok logic to make these checks more thorough, forbidding (by
default) any file names that are invalid on any of Windows, Linux, or
macOS. This could include the reserved names on Windows (like "PRN") and
also things like ':' in file names.
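A rough idea of what such a stricter cross-platform check could look like is sketched below in Python. The names portability_problems and case_collisions are hypothetical, and this is not how Darcs implements --case-ok. The reserved device names, forbidden characters and trailing dot/space rule follow Microsoft's documented Windows naming rules; treating names as colliding when they differ only in case or in Unicode normalization (precomposed vs. decomposed accents) is an assumption modeled on the case-insensitive default file systems of Windows and macOS.

```python
import unicodedata
from typing import Dict, List

# Device names reserved on Windows, also when followed by an extension
# (Microsoft's documentation gives "NUL.txt" and "NUL.tar.gz" as examples).
WINDOWS_RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {"COM%d" % i for i in range(1, 10)}
    | {"LPT%d" % i for i in range(1, 10)}
)
# Printable characters Windows does not allow in a file name component.
WINDOWS_FORBIDDEN = set('<>:"\\|?*')


def portability_problems(component: str) -> List[str]:
    """Reasons why one path component is unusable on Windows; [] if none."""
    problems = []
    if component.split(".", 1)[0].upper() in WINDOWS_RESERVED:
        problems.append("reserved device name on Windows")
    bad = sorted(set(component) & WINDOWS_FORBIDDEN)
    if bad:
        problems.append("characters not allowed on Windows: " + "".join(bad))
    if component.endswith((".", " ")):
        problems.append("trailing dot or space is not supported on Windows")
    return problems


def case_collisions(paths: List[str]) -> List[List[str]]:
    """Groups of paths that a case-insensitive file system would treat as one."""
    groups: Dict[str, List[str]] = {}
    for p in paths:
        # NFC first so that precomposed and decomposed accents compare equal,
        # then casefold for case-insensitive comparison.
        key = unicodedata.normalize("NFC", p).casefold()
        groups.setdefault(key, []).append(p)
    return [g for g in groups.values() if len(g) > 1]
```

Such checks would be enabled by default and could be switched off per repository by users who only ever work on a single platform.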

Cheers
Ben


