[darcs-devel] OT: Haskell with style [was: darcs patch]

Thu Jul 1 08:42:51 PDT 2004

On Thu, Jul 01, 2004 at 04:46:43PM +0200, Juliusz Chroboczek wrote:
> > Anyhow, my feeling is that where things are confusing, it's usually because
> > there isn't really a clear model of how things are supposed to behave, and
> > features got added on incrementally...
> 
> Okay.  Since we're about style, let me go on.
> 
> As you say, things got ``added on incrementally'' in darcs, and
> somebody has been cut'n'paste programming a lot.  A lot of places
> contain the same or similar code, and they should be refactored into
> separate functions.  (Just look at repo creation in get and
> initialize.)
> 
> I'd actually go all the way and modularise all repo and current access
> into separate modules, and have a law that forbids direct filesystem
> access from the rest of darcs.  But I guess not before 1.0.

I agree.  You can probably see that I've tried to do some of this (see, for
example, write_inventory), but obviously haven't gone all the way.

> (Another advantage would be to make it possible to experiment with
> other representations for filesystem data -- I've got this clever
> design for ChunkedStrings which the margin of this e-mail is too
> narrow to contain.)

Yeah, that would be very nice.  One (semi-unrelated) performance change
I've been considering is changing FileName to store PackedStrings instead
of FilePaths.  I think this could make a big difference on large
repositories.  On the other hand, FileNames started out stored as
PackedString, and I switched them to FilePaths because the conversion
overhead was too much.  Really, Haskell needs a class of CString-convertible
data types that can be accepted in lieu of FilePath by the IO and Directory
functions... I should probably bring that idea up in libraries at haskell.org
or somewhere.
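
A minimal sketch of what such a class might look like.  Everything here is
an assumption made for illustration: the names CStringConvertible, withCStr,
PathString and echoPath are hypothetical, and ByteString stands in for
PackedString.  Only withCString, peekCString and useAsCString are existing
library functions.

```haskell
import Foreign.C.String (CString, peekCString, withCString)
import qualified Data.ByteString.Char8 as B

-- Hypothetical class (not part of any library): types that can be lent
-- to C code as a temporary NUL-terminated string.
class CStringConvertible a where
  withCStr :: a -> (CString -> IO b) -> IO b

-- Newtype wrapper around FilePath so the instance needs no extensions.
newtype PathString = PathString FilePath

instance CStringConvertible PathString where
  withCStr (PathString s) = withCString s

-- Packed representation: no detour through a list of Char
-- (useAsCString copies the bytes and appends the NUL terminator).
instance CStringConvertible B.ByteString where
  withCStr = B.useAsCString

-- Stand-in for an IO function that hands a path to C.  It only reads
-- the C string back, but it accepts any instance of the class.
echoPath :: CStringConvertible a => a -> IO String
echoPath p = withCStr p peekCString
```

With something along these lines, a function written once against the
class could take either a wrapped FilePath or a packed string, so the
caller would not have to convert first.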

> > As long as the names are well chosen, I find separate function names
> > easier.  Certainly, one can easily map fetchFilePS Cacheable to
> > fetchFilePS_Cacheable without greatly affecting readability, and this would
> > mean one less data type to remember, and one less order of arguments to
> > remember.
> 
> How do you pass around part of the name of a function?
> 
>   ssijURL :: String -> Cachable -> Ssanie
>   ssijURL x cache = ssijPS (getURL x cache)
> 
>   getURL x cache | isHTTPUrl x = getHTTPUrl x cache
>   getURL x _ = getLocalUrl x

Ok, that's a good point.

> > Switching subjects...
...
> > data DirectedList a = FL [a] | RL [a]
> 
> That's much, much better.
> 
>   mapDL f (FL a) = FL (map f a)
>   mapDL f (RL a) = RL (map f a)
> 
> Or perhaps simply
> 
>   data Direction = FW | BW
>   data DirectedList a = DL Direction [a]
> 
> as this avoids the clumsiness of the two cases when an operation
> doesn't depend on the direction.

The only catch here is that the other (somewhat nasty) idea (i.e. the :<:
etc) is nice in that it allows for cheap lazy reverses and with the right
set of accessors you'd never need to actually care which order the list is
in.  You just get the "oldest" or "newest" elements, regardless of which
order the list is actually in.  The annoying part, of course, is that this
doesn't allow for easy pattern matching... but there's no way around that,
at least that I'm aware of.

But in practice, probably FL/RL or FW/BW is the way to go.
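
The "right set of accessors" idea can be sketched on top of FL/RL.  This is
only an illustration under assumptions: the accessor names (oldestFirst,
newestFirst, oldest, newest, addNewest) are hypothetical and not taken from
darcs, and the sketch reads FL as "stored oldest first" and RL as "stored
newest first".

```haskell
-- Hypothetical sketch; the accessor names are illustrative only.
data DirectedList a
  = FL [a]  -- assumed: stored oldest first
  | RL [a]  -- assumed: stored newest first

-- View the elements oldest first, whatever the storage order.
oldestFirst :: DirectedList a -> [a]
oldestFirst (FL xs) = xs
oldestFirst (RL xs) = reverse xs

-- View the elements newest first, whatever the storage order.
newestFirst :: DirectedList a -> [a]
newestFirst (FL xs) = reverse xs
newestFirst (RL xs) = xs

-- Safe access to either end; a reverse happens only if the storage
-- order is the "wrong" one for the end being asked for.
oldest :: DirectedList a -> Maybe a
oldest dl = case oldestFirst dl of
  []      -> Nothing
  (x : _) -> Just x

newest :: DirectedList a -> Maybe a
newest dl = case newestFirst dl of
  []      -> Nothing
  (x : _) -> Just x

-- Add a new newest element.  O(1) on an RL; an FL is reversed once.
addNewest :: a -> DirectedList a -> DirectedList a
addNewest x dl = RL (x : newestFirst dl)
```

Callers that stick to these functions never inspect the constructor, at the
cost of not being able to pattern match on the list directly.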

> Yet, I think the ideal solution to that kind of problem would be to
> work out the algebra of patch lists and make the patch list an
> abstract data structure that is implemented in terms of lists.  The
> interface to the data structure would make most silly errors
> impossible.

With just plain patches, this is easily done by converting a patch list
into a patch using join_patches and flatten.

The problem with creating a patch list data structure is that I'd want to
also "safely" support [(PatchInfo, Maybe Patch)] and [[(PatchInfo, Maybe
Patch)]], and perhaps also [PatchInfo].  I guess it depends on how you'd

> Compare that to strings.  We all know that strings are just lists of
> characters; and yet, you seldom make mistakes with strings that you
> might make with arbitrary lists.  That's because you stick to a useful
> set of operators on strings -- (++), lines, etc. -- that form a safe
> algebra

With strings, you almost never need to deal with them in reverse order.
I'm not sure there is a set of operations that are "safe" when order
matters both in meaning and in efficiency.
-- 
David Roundy
http://www.abridgegame.org