[darcs-users] Default binary masks

David Roundy droundy at abridgegame.org
Sun Nov 23 17:22:27 UTC 2003

On Sat, Nov 22, 2003 at 05:59:04PM -0500, Sean E. Russell wrote:
> > Studio as well.) It would be nice to at least recognize files starting with
> > bytes FFFE or FEFF (hex), since these are most probably UTF-16 files.  But
> > perhaps I'm opening a can of worms here :-)
> Hm.  Does darcs do any sort of byte-level investigation of files?  I
> thought it relied on file endings.

Its primary mechanism is file endings, but a few weeks ago (or is it months
now?) I added support for checking for '\0' and ^Z (EOF) characters in
files and creating binary patches if either is found.  It probably wouldn't
be too hard to create rules for UTF-16 as well, but of course that would
require that we have a UTF-16 patch type.  This wouldn't be too hard to do,
but is definitely a job for post-1.0 (as all new patch types are).  I'm
starting to think that we'll be reaching 1.0 early next year (maybe
January or February?), and then we can open up development to interesting
new patch types.  (Not that UTF-16 is particularly interesting or
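
To make the content check concrete, here is a minimal sketch in Python (not
darcs's actual code, which is written in Haskell).  The function names are
made up for illustration; the only rules taken from this thread are "a '\0'
or ^Z byte means binary" and "a leading FFFE or FEFF suggests UTF-16".

```python
NUL = b"\x00"      # the '\0' character
CTRL_Z = b"\x1a"   # ^Z, the DOS end-of-file marker


def looks_binary(data: bytes) -> bool:
    """Return True if the content contains a NUL or ^Z byte."""
    return NUL in data or CTRL_Z in data


def has_utf16_bom(data: bytes) -> bool:
    """Return True if the content starts with the bytes FFFE or FEFF."""
    return data[:2] in (b"\xff\xfe", b"\xfe\xff")
```

Note that UTF-16 text made of ASCII characters is full of '\0' bytes, so
under the first rule alone it gets treated as binary; a UTF-16 rule would
have to be checked before the '\0' rule.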

> > One way to do this would be to store the following properties for every
> > file, or for each group of file names (using regexps):
> Store properties for every file "based on file ending"?  Or must the user 
> specify, when they add a file, what the character encoding and line ending 
> encoding is?

Currently it is "based on regexps", which for the defaults means mostly
based on file ending, but could also be based on complete filenames.  In
the general case, users can do whatever they want.
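
As a rough illustration of "based on regexps", here is a sketch in Python.
The patterns below are hypothetical examples, not the actual darcs
defaults, and is_binary_name is a name invented for this sketch.

```python
import re

# Hypothetical patterns, for illustration only; not darcs's real defaults.
BINARY_PATTERNS = [
    r"\.(png|jpe?g|gif)$",   # matched on file ending
    r"\.(gz|zip|tar)$",      # matched on file ending
    r"^core$",               # matched on a complete filename
]

_COMPILED = [re.compile(p, re.IGNORECASE) for p in BINARY_PATTERNS]


def is_binary_name(filename: str) -> bool:
    """Return True if any pattern matches the file's base name."""
    base = filename.rsplit("/", 1)[-1]
    return any(rx.search(base) for rx in _COMPILED)
```

With these patterns "img/logo.PNG" and "core" count as binary, while
"src/Main.hs" and "score" do not.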

> It isn't possible to programmatically determine the encoding from the
> byte-stream, of course, even if you know that you're working with
> non-binary files.

In some cases you can determine the encoding, provided you define the
encoding such that there are invalid files.  For example, a valid text file
should never have a '\0' in it...  Similarly, a file with no '\n' in it may
be a valid text file, but there's not much point in creating a text hunk
out of it, since it only has one line.  To be useful, all darcs would need
is a set of rules of thumb for guessing the encoding.
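
Such rules of thumb might look something like the following Python sketch.
The function name, the labels it returns, and the order of the checks are
all assumptions made for illustration; the UTF-8 test is just one example
of an encoding defined so that some byte sequences are invalid.

```python
def guess_kind(data: bytes) -> str:
    """Guess how a file's content should be treated, by rule of thumb."""
    # A leading FFFE or FEFF most probably means UTF-16.
    if data[:2] in (b"\xff\xfe", b"\xfe\xff"):
        return "utf-16"
    # A valid text file should never contain '\0' (or ^Z).
    if b"\x00" in data or b"\x1a" in data:
        return "binary"
    # No '\n' at all: possibly text, but a text hunk gains nothing.
    if b"\n" not in data:
        return "single-line"
    # UTF-8 has invalid byte sequences, so a failed decode rules it out.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown-8bit"
    return "utf-8"
```

These are guesses, not proofs: FFFE is also the start of a little-endian
UTF-32 byte order mark, and any plain ASCII file passes the UTF-8 check.
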
David Roundy
