[darcs-devel] Re: Line endings opinion poll (with bonus option)

David Roundy droundy at abridgegame.org
Sun Nov 7 04:16:31 PST 2004


On Thu, Nov 04, 2004 at 08:08:36PM +0100, Peter Strand wrote:
> 
> [moved to darcs-devel]
> 
> David Roundy wrote:
> >On Thu, Nov 04, 2004 at 08:06:02AM +0100, Peter Strand wrote:
> >>I also had to add an extra argument to a lot of the functions in
> >>SlurpDirectory.lhs to allow the desired line ending to be specified,
> >>since that is a repo-preference or command argument, which
> >>slurpdirectory knows nothing about.
> >>Not very pretty.
> >
> >I think I'd rather move towards abstracting away slurpies from the command
> >code.  I'd rather implement pairs of functions
> >
> >write_working_directory :: FilePath -> Slurpy -> IO ()
> >write_current :: Slurpy -> IO ()
> >
> >In both cases, it would be taken as a given that the working directory is
> >the repository directory (the FilePath argument would be used for things
> >like creating a test directory).
> 
> I tried to avoid making SlurpDirectory too dependent on knowledge of the
> repository (and preferences and such..).
> 
> And, I was stuck on the idea that the read and write functions in
> SlurpDirectory had to do the conversion, but just realized that that's
> not true at all, we could have a
>  convert_slurpy :: NewlineType -> Slurpy -> Slurpy
> function, and call that when reading or writing the working directory.
> That would leave the low-level code untouched.
> 
> Converting dos->unix should be reasonably efficient, just dropping the
> \r on lines. It's a bit harder to avoid reallocating every line when
> doing unix->dos conversion.
> 
> Perhaps SlurpyFile could have an extra field, describing its on-disk
> format when writing? Then reading and writing slurpys could be done
> without knowledge of repo preferences, while still allowing reasonably
> efficient "conversion" of slurpys.
> And it would allow us to decide a format on a per-file basis in the
> future (if that's ever desirable..).

If converting both ways were easy, I'd say go for it, but I don't like the
idea of adding an extra flag to deal with this.

> ...
> But I'll think a bit more about it..
> 
> What do you think?

I don't know, I don't think I really like the idea of sticking line-endings
preferences in the slurpies.  I'd rather do the conversion directly in the
reading/writing as you are currently doing.

> >The big reason for my feelings on this is that I think that _darcs/current/
> >should always have '\n' line endings.  If we don't have a "standard" set of
> >line endings in _darcs/current/, users will be able to corrupt their
> >repositories just by changing their line endings option, which would be
> >bad.  The downside is that there will be a performance penalty associated with
> >using windows line endings, since the file sizes in _darcs/current won't
> >match those in the working directory, so (unless we do something tricky)
> >darcs will always read in all files when doing a diff (e.g. whatsnew or
> >record).  But the benefit will be that the worst thing a user can do by
> >changing his line-endings preferences is record a patch modifying the line
> >endings of all files.  That's annoying, but it sure beats introducing
> >corruption to the repo, and when users start messing with line endings,
> >they're always going to be capable of messing up line endings.
> 
> It definitely makes sense to keep as much as possible of the internals
> in a canonical format, but it's more work than my current hack ;)
> But I'll think about it and dig around in the darcs code a bit more.
> 
> Regarding changing line-ending preference, I'm not sure it should even
> be allowed. "endingness" is something you decide when you "get" or
> "init" a repo. You can always do a "darcs get --dos-endings unix-repo
> dos-repo" to get a new repo in another format, with a well-defined
> behaviour.

I see.  So the line-ending preference would be stored *not* in _darcs/prefs/
but somewhere else in _darcs, so users aren't allowed to edit it (or if
they do edit it, they deserve what they get).  In that case I wouldn't
object to your patch.  Perhaps that's what it does? I may not have read
carefully enough... If the line-ending state of a repo is immutable, then
your implementation is actually the best way of going about this (except
that it would still be nice to create wrappers like write_working, etc.).

The only problem is that we'd still need to figure out how to deal with
"interesting" configurations.  We don't want users to be able to create an
"impossible" situation (see below).

> >I'm also a little concerned about your crlfLinesPS, which isn't really
> >robust with respect to files with messed up line endings.  In particular,
> >if there are any instances of '\r' that aren't followed immediately by '\n'
> >it looks like you could have trouble.  I'd rather parse dos line endings by
> >looking for the '\n' and ignoring the preceding '\r' if it exists rather
> >than the other way around, but in any case we had better check.
> 
> And we should perhaps complain loudly if the files aren't in the
> expected format. If someone creates \n files in an \r\n repo, that's
> probably either due to a mistake (and the user should be informed), or
> intentional (and should be kept that way).
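
The parsing approach quoted above (find each '\n', then ignore a preceding
'\r' if there is one) might look roughly like the sketch below.  The name
dosLines is hypothetical, and it works on plain Strings rather than on the
packed strings that crlfLinesPS operates on, so treat it as an illustration
of the idea only.

```haskell
-- Hypothetical helper, not taken from the darcs source.
-- Split on '\n' first, then drop at most one trailing '\r' from each line.
-- A '\r' that is not directly in front of a '\n' stays in the line as data.
-- Note: like 'lines', this does not record whether the input ended in '\n'.
dosLines :: String -> [String]
dosLines = map dropCR . lines
  where
    dropCR l
      | not (null l) && last l == '\r' = init l
      | otherwise                      = l
```

Because the split is driven by '\n' alone, a stray '\r' in the middle of a
line can never start or end a line by accident.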

Indeed, that sounds reasonable, but I'm thinking we'd also want to make it
non-lossy regardless of what stupid data users stick in their repo (with
'\n' line-endings being canonical).  Which is to say, that _darcs/current
remains well-behaved regardless of how many '\r's there are that may not be
paired with '\n' to form a line ending.  My thought is that regardless of
what users do, the only difference between a 'unixy' copy of a repo and a
'dossy' copy is that the dossy copy has an extra '\r' inserted in front of
every '\n' that exists in the unixy copy.
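
To make that invariant concrete, here is a minimal sketch under the
assumption that file contents are plain Strings.  The names unixToDos,
dosToUnix and isCleanDos are made up for illustration and are not existing
darcs functions.

```haskell
-- Hypothetical helpers, not taken from the darcs source.

-- Canonical ('\n') text to its dossy form: insert a '\r' before every '\n'.
unixToDos :: String -> String
unixToDos = concatMap expand
  where
    expand '\n' = "\r\n"
    expand c    = [c]

-- Dossy text back to canonical form: drop exactly one '\r' in front of each
-- '\n'.  Every other '\r' is ordinary data and is kept.
dosToUnix :: String -> String
dosToUnix ('\r' : '\n' : rest) = '\n' : dosToUnix rest
dosToUnix (c : rest)           = c : dosToUnix rest
dosToUnix []                   = []

-- True when every '\n' in a dossy file is preceded by a '\r', i.e. the file
-- is exactly what unixToDos would have produced from its canonical form.
isCleanDos :: String -> Bool
isCleanDos d = unixToDos (dosToUnix d) == d
```

With these definitions dosToUnix (unixToDos s) == s for any s, however many
unpaired '\r's it contains, so the canonical copy is never damaged.  The
other direction only round-trips when isCleanDos holds, which would be the
place to complain loudly about a bare '\n' in a dossy working directory.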
-- 
David Roundy
http://www.darcs.net



