[darcs-users] Re: darcs and patches@ mail

Aggelos Economopoulos aoiko at cc.ece.ntua.gr
Sun Feb 15 21:49:45 UTC 2004


On Sun, 15 Feb 2004 15:10:11 -0500
"Sean E. Russell" <ser at germane-software.com> wrote:

> On Sunday 15 February 2004 10:50, Aggelos Economopoulos wrote:
> > Obviously, I claim you could get better performance, but I think
> > you'll agree with me that whether it is worth it or not can only be
> > answered by implementing it and comparing actual measurements.
> 
> By "worth it", I meant "worth implementing," and, yes, I agree that
> you'll only discover whether it was worth it or not by actually doing
> it.  By the same logic, I can only discover whether I'll die by
> getting hit by a truck by actually trying it.  That's why we make
> educated guesses about these things :-)

Ah, but in the truck case almost everybody makes the same educated
guess, so you don't feel the need to try it.

> But really, where it wouldn't be worth it would be in the TCO. 
> Maintaining a custom database; rewriting it to keep up with
> changes to the basic darcs architecture; discovering, later, that
> your custom implementation doesn't scale as well as you thought and
> having to re-implement the entire thing...

Sure, but think of all the fun you'd get out of it 8) And in any case if
changes in the architecture of the version control system required a
rewrite of the database code then you've done a bad job - making
extra assumptions about your usage patterns is one thing, bad
engineering is another.

> these are the reasons why people use general-purpose databases, rather
> than hand-rolling their own every time.

But I never suggested that everyone should write their own custom
database. I'm only saying that, in this case, there may very well be
measurable performance benefits and that this could matter for really
large repos (you think otherwise, but since neither of us has any data
to support his claim it mostly boils down to intuition).

> > > ReiserFS4-land; have you?  If so, what are your impressions? 
> > > (replies
> ...
> > You really should check out the code.
> 
> I've been meaning to; only, the folks who built my laptop installed it
> onto one huge EXT3 partition (curse them!), and I haven't taken the
> time to resize and repartition it yet to get a decent filesystem on
> the thing.  Also, I'm willing to take certain risks with my software,
> but I'm rather unwilling to tempt fate with my data on that scale.
> Even Hans offers copious warnings about RFS4, and although he's
> probably being alarmist on purpose, I'm scared. ;-)

I suggested you should read the code, not run it! (and exactly why do
you think ext3 isn't a decent filesystem?)

> > Furthermore, until it becomes the standard in most installations
> > (and I doubt it ever will), depending on it would seriously reduce
> > the number of potential users for your program. Let alone the fact
> > that there are
> 
> Oh, definitely.  I think depending on it is a bad idea, but taking
> advantage of it isn't.

Agreed.

> A filesystem is just a database, and (personally) I like systems with 
> filesystem based DBs -- like maildir, darcs, etc.

I guess you haven't tried using a maildir with 40,000 mails in it :-)

> Not for everything, but I think that it is a good solution; it is
> transparent, low-maintenance, ubiquitous, and it works.  It should be
> replaced only when you've got a good idea that changing the database
> will yield significant benefits. I'm not a DB guru, so I don't know if
> darcs would, but I sort of doubt that it would improve performance
> much.

Oh. You mean arch ;)

System calls are expensive you know; if the number of syscalls you make
is proportional to the size of the repo (number of patches, number of
version controlled files, whatever) then a pure userspace implementation
(e.g. with mmap()) saves you more cycles as the repo gets larger - not
much for I/O intensive operations but still...
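
To make the syscall argument concrete, here is a rough sketch in Python.
It is not darcs code: the patch names, the packed-file layout and the
in-memory index are all made up for illustration. The per-file reader
needs at least an open/read/close per patch, so its syscall count grows
with the repo; the packed reader opens and maps one file once and then
only slices memory (page faults aside).

```python
import mmap
import os
import tempfile


def write_per_file(directory, patches):
    """Store each patch as its own file (filesystem-as-database layout)."""
    for name, data in patches.items():
        with open(os.path.join(directory, name), "wb") as f:
            f.write(data)


def read_per_file(directory, names):
    """Read patches one file at a time: open/read/close for every patch."""
    result = {}
    for name in names:
        with open(os.path.join(directory, name), "rb") as f:
            result[name] = f.read()
    return result


def write_packed(path, patches):
    """Concatenate all patches into one file.

    Returns a hypothetical index {name: (offset, length)}; a real store
    would have to persist this somewhere as well.
    """
    index = {}
    offset = 0
    with open(path, "wb") as f:
        for name, data in patches.items():
            f.write(data)
            index[name] = (offset, len(data))
            offset += len(data)
    return index


def read_packed(path, index, names):
    """Map the packed file once, then slice the patches out of memory.

    Assumes the packed file is non-empty (an empty file cannot be mapped).
    """
    result = {}
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            for name in names:
                offset, length = index[name]
                result[name] = view[offset:offset + length]
    return result


if __name__ == "__main__":
    # Made-up repo contents: 1000 tiny "patches".
    patches = {f"patch-{i:04d}": f"hunk {i}\n".encode() for i in range(1000)}
    with tempfile.TemporaryDirectory() as tmp:
        patch_dir = os.path.join(tmp, "patches")
        os.mkdir(patch_dir)
        pack_path = os.path.join(tmp, "pack")

        write_per_file(patch_dir, patches)
        index = write_packed(pack_path, patches)

        # Both layouts must return the same data; only the cost differs.
        assert read_per_file(patch_dir, patches) == patches
        assert read_packed(pack_path, index, patches) == patches
```

Running each reader under strace -c (or timing them) on a large enough
set of patches is the kind of measurement that would actually settle
the argument; the sketch itself proves nothing about darcs.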

Also, the (unix) filesystem was made to serve different requirements
than those of darcs; berkeleydb is much closer to what you want.

> Like they say; don't optimize randomly.  Find the bottlenecks first,
> and optimize those.  Do we have any indication that the filesystem is
> a bottleneck for darcs?

I'd say we've solid evidence for the opposite. But this is just a
discussion on future directions, not a Call For Action(tm).

Aggelos



