[darcs-users] UTF-16 (was: Default binary masks)

Alex Shinn foof at synthcode.com
Fri Nov 28 02:41:31 UTC 2003

At Thu, 27 Nov 2003 06:49:30 -0500, David Roundy wrote:
> It seems that replace is safe if your tokens only contain single byte
> characters, but if you want to include multibyte characters in your tokens,
> the tokenizing code will split the bytes of a single character, which is
> definitely not a good thing.  A small extension to the tokenizing code to
> support [^ \n\t] type specification would make replace work all right with
> multibyte characters, as long as you specify the token delimiters rather
> than the valid token characters (or the valid token characters are all
> single-byte).

Ah, I'd prefer this style anyway, since I program in Scheme, which allows
almost anything in an identifier.  Technically, R5RS defines identifiers
in terms of an inclusive list of letters, digits and any of

 ! $ % & * + - . / : < = > ? @ ^ _ ~

but in my head I just think in terms of whitespace, comma, hash,
semi-colon, quotes and parens being delimiters.  And Gauche Scheme
allows arbitrary Unicode characters in identifiers (so you can define <=
using the single ≤ character if you want).
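
To make the difference concrete, here is a rough sketch in Python (not
darcs code; the function name, the character sets and the sample line
are all made up for illustration).  It assumes the tokenizer works byte
by byte on UTF-8 text.  The inclusive set contains the character ≥, and
the input contains ≤, which shares its first two bytes with ≥.

```python
import string


def split_tokens(data, is_token_byte):
    """Split a byte string into maximal runs of bytes accepted by is_token_byte."""
    tokens, current = [], bytearray()
    for byte in data:
        if is_token_byte(byte):
            current.append(byte)
        elif current:
            tokens.append(bytes(current))
            current.clear()
    if current:
        tokens.append(bytes(current))
    return tokens


# Inclusive spec (hypothetical): ASCII letters plus the character "≥".
# Working byte by byte, this really means "any byte that occurs in the
# UTF-8 encoding of those characters".
VALID_BYTES = set((string.ascii_letters + "≥").encode("utf-8"))

# Exclusive spec (hypothetical): whitespace, comma, hash, semicolon,
# quotes and parens are delimiters; every other byte is part of a token.
DELIMITER_BYTES = set(b" \t\n,#;\"'()")

line = "(define (≤ a b) (not (> a b)))".encode("utf-8")

inclusive_tokens = split_tokens(line, lambda b: b in VALID_BYTES)
exclusive_tokens = split_tokens(line, lambda b: b not in DELIMITER_BYTES)

print(inclusive_tokens)
print(exclusive_tokens)
```

With the inclusive set, the ≤ in the input is cut after its second byte,
leaving the fragment b"\xe2\x89" as a token.  With the delimiter set, ≤
comes through whole, because every byte of a multibyte UTF-8 character
is 0x80 or above and so can never match an ASCII delimiter.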

It might be nice to be able to record a default token specification per
project, along with whether it lists the valid token characters
(inclusive) or the delimiters (exclusive).

