[darcs-devel] performance idea: a darcs db?

Sat Jan 29 03:45:54 PST 2005

On Fri, Jan 28, 2005 at 08:46:25PM +0100, Juliusz Chroboczek wrote:
> >> Is this a crazy idea? I do a lot of database programming, so I think
> >> my mind naturally gravitates to solving problems like this. :)
> 
> > I think that the essence of this idea is the same as me wanting to use
> > Arrows to cache properties on the patch itself, and David's Perhaps
> > types.
> 
> I think that Mark's idea is different, since he wants to have a
> persistent cache of patch properties.  David's idea keeps the cached data
> for one Darcs invocation only.  I believe yours is similar.

I had been thinking that as an optimization we could store the
"domains"... but it was indeed mostly as an afterthought, and as something
to do later.

But I think we need the domain formalism if we're to use this database for
anything other than query purposes, since we need to *know* that what it
tells us about whether or not two patches commute is correct.
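
To make the "domain" idea a bit more concrete, here is a rough sketch.  It
assumes a domain is simply the set of paths a patch touches, and that this
set is a conservative over-approximation (a real definition would have to
deal with directory moves and the like).  Domain, Entry and
triviallyCommute are names made up for illustration, not existing darcs
code.

```haskell
import qualified Data.Set as Set

-- Assumption: a patch's "domain" is approximated by the set of paths it
-- touches.  This is an illustration, not the darcs definition.
type Domain = Set.Set FilePath

-- Hypothetical inventory entry that carries the cached domain next to
-- the patch name, so the patch file itself need not be read.
data Entry = Entry
  { entryName   :: String
  , entryDomain :: Domain
  } deriving (Show, Eq)

-- True when the two domains are disjoint; under the assumption above the
-- patches then commute trivially.  False only means "don't know": the
-- patches have to be read and commuted for real.
triviallyCommute :: Entry -> Entry -> Bool
triviallyCommute a b =
  Set.null (entryDomain a `Set.intersection` entryDomain b)
```

The point of returning only a "trivially commutes" answer is that a wrong
"yes" would be dangerous, while a "don't know" merely costs us the usual
work.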

> It's not a bad idea in principle, but I think it should be done by
> putting the extra data in the ``inventory'' file.  This way you can
> avoid hitting the remote repository more often than by using a
> separate file.

Hmmmm.  I don't like the idea of bloating the inventory, but this may be
the best way to go.  I can certainly see there would be major advantages to
being able to avoid reading the patch at all if its commutation is
trivial.  And keeping this in the inventory would (by breaking
compatibility) avoid the danger of it getting out of sync.  And we don't
want to store each patch's domain in a separate file, or we'd really pay
through the nose on remote accesses.

Which means that if someone were quick, I suppose this could get into the
major release that has the conflictor patches and the hashed inventory
(assuming those two features come simultaneously).

> I also don't think that a relational database is a good idea.  Haskell
> likes coinductive datatypes.  SQL likes relations.  I don't think the
> two mix well.

I'm not really sure what sort of back-end would be most useful.  We
definitely don't want to have to try to use sqlite over an arbitrary
network, so that would limit us to using it for local operations, which
seems... not so good.

On the other hand, having some sort of a redundant database as an
optimization for query operations (which I think is what Mark was
proposing) isn't a bad idea.  It would be an optional feature, and darcs
could check for its presence, and use it in annotate to avoid dealing with
redundant patches.
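
As a sketch of what "optional" could look like, assume the redundant
database is nothing more than a map from file path to the names of the
patches that touch it.  TouchIndex and patchesToRead are hypothetical
names, and file renames are ignored here.

```haskell
import qualified Data.Map as Map

type PatchName = String

-- Hypothetical optional index: file path -> names of patches touching it.
type TouchIndex = Map.Map FilePath [PatchName]

-- Decide which patches annotate would need to look at for one file.
-- Without an index every patch is returned; with one, only the patches
-- the index lists for that file, kept in their original order.
patchesToRead :: Maybe TouchIndex -> FilePath -> [PatchName] -> [PatchName]
patchesToRead Nothing    _    allPatches = allPatches
patchesToRead (Just idx) file allPatches =
  filter (`elem` Map.findWithDefault [] file idx) allPatches
```

Since the index is only ever used to skip work, a repository without it
behaves exactly as before.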

Actually, a function

apply_to_file :: Patch -> (FileName, FileContents)
              -> (FileName, FileContents)

would be a nice thing to have for annotate.  It shouldn't be hard, and in
fact it should be trivial once we have the "filesystem-like" monad class
that I've been envisioning.  But I'm drifting from the point here...
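
For illustration only, such a function might look like the following on a
toy patch type.  The Patch constructors here (Move, Hunk, Seq) and the
representation of FileContents as a list of lines are assumptions made for
the sketch, not the real darcs types.

```haskell
type FileName     = String
type FileContents = [String]  -- assumption: a file is a list of lines

-- Toy patch type, for illustration only.
data Patch
  = Move FileName FileName            -- rename a file
  | Hunk FileName Int [String] [String]
      -- file, 1-based line number, old lines, new lines
  | Seq [Patch]                       -- apply patches in order

-- Follow a single file through a patch.  Patches that touch other files
-- leave it unchanged.  The sketch does not check that the old lines of a
-- hunk actually match the file contents.
apply_to_file :: Patch -> (FileName, FileContents)
              -> (FileName, FileContents)
apply_to_file (Move from to) (name, body)
  | name == from = (to, body)
  | otherwise    = (name, body)
apply_to_file (Hunk file line old new) (name, body)
  | name == file =
      let (before, rest) = splitAt (line - 1) body
      in  (name, before ++ new ++ drop (length old) rest)
  | otherwise    = (name, body)
apply_to_file (Seq ps) f = foldl (flip apply_to_file) f ps
```

Annotate could then fold this over the history of one file without ever
building a full working tree.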
-- 
David Roundy
http://www.darcs.net