[darcs-users] "darcs is slooow."
zooko
zooko at zooko.com
Thu Jan 8 00:41:16 UTC 2009
Today no fewer than two people said they were considering contributing
to my open source project -- tahoe -- and both of them were surprised
at how extremely long it takes to run
"darcs get http://allmydata.org/source/tahoe/trunk tahoe". Here is the
e-mail from the second one, who suggests switching to git.
Hopefully darcs-2.3 will address this kind of thing? :-/
Regards,
Zooko
Begin forwarded message:
> From: Shawn Willden <shawn-tahoe at willden.org>
> Date: January 7, 2009 17:07:29 PM MST
> To: tahoe-dev at allmydata.org
> Subject: Re: [tahoe-dev] Thinking about building a P2P backup system
> Reply-To: tahoe-dev at allmydata.org
>
> On Wednesday 07 January 2009 03:09:21 pm zooko wrote:
>> So, if you're planning to contribute patches, bug
>> reports, documentation, etc., then I'm delighted!
>
> Well, assuming I can get my head around the codebase sufficiently in the
> snatches of time I have available, I absolutely want to contribute all of
> the above.
>
>> Have you tried it? It might be just fine for sharing photos. I use
>> Tahoe to share photos, but I use the public test grid instead of a
>> private grid, and so I'm using many servers located in a co-lo plus a
>> handful of random servers operated by Tahoe hackers or curious
>> users. It seems to work fine.
>
> I imagine you have a pretty fast network connection yourself, too. Not
> to put too much emphasis on my particular case, but I shoot with a
> moderately high-end DSLR, so my image files tend to be large, and most
> of my family has low-end DSL connections. At 1 Mbps, it takes at least
> 40 seconds to download a 5 MB image file, which would be painfully slow
> for browsing through pictures -- and that's if the pipe can be filled.
> If the images are coming from a handful of 256 kbps connections where
> Tahoe is bandwidth-capped to use no more than 100 kbps in order to keep
> some bandwidth available for other stuff (does Tahoe have bandwidth
> limiting? If not, it probably needs it), then the aggregate data stream
> may be no more than 400-500 kbps.
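
The arithmetic in the paragraph above can be checked with a short Python
sketch. It assumes decimal units (1 MB = 8,000,000 bits, 1 kbps = 1,000
bits per second), ignores protocol overhead and latency, and reads "a
handful" as four or five servers, which is an assumption inferred from the
400-500 kbps figure. The function names are made up for illustration.

```python
# Back-of-the-envelope transfer times for the figures quoted above.
# Assumptions: decimal units, no protocol overhead, no latency.


def download_seconds(size_mb: float, rate_kbps: float) -> float:
    """Seconds needed to move size_mb megabytes at rate_kbps kilobits/s."""
    bits = size_mb * 8_000_000
    return bits / (rate_kbps * 1_000)


def aggregate_kbps(servers: int, cap_kbps: float) -> float:
    """Combined rate when each server is capped at cap_kbps."""
    return servers * cap_kbps


if __name__ == "__main__":
    # A 5 MB image over a full 1 Mbps pipe.
    print(download_seconds(5, 1000))
    # The same image from 4 or 5 servers (assumed), each capped at 100 kbps.
    for n in (4, 5):
        rate = aggregate_kbps(n, 100)
        print(n, rate, download_seconds(5, rate))
```

Under these assumptions the capped case takes roughly two to two and a half
times as long per image as the full 1 Mbps pipe.
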
>
> And let's not even talk about HD video.
>
>> As far as I know, we are doing adequately well on that goal. A few
>> times people have asked to have the option to turn off the
>> encryption, and in each case I asked them to please measure the
>> performance and tell me if the encryption is causing a performance
>> problem or another kind of usability problem.
>
> I'd be shocked if encryption were a performance problem. Crypto stuff
> has been my day job for over a decade, so I'm well aware of how
> blisteringly fast AES is, and RSA isn't too bad as long as you're not
> doing too much of it (especially if you're doing mostly public key ops,
> not private key). I'd expect a lot bigger performance issue from the
> erasure coding (BTW: ever considered Tornado coding instead of
> Reed-Solomon?).
>
> However, my real concern isn't CPU usage, particularly since the heavy
> lifting happens during storage, not retrieval. I'm thinking about
> bandwidth: both being able to rsync changes -- important because most
> home users' net connections are very asymmetric -- and avoiding hitting
> the network at all in the "Mom browsing my photos" case.
>
> I'm talking from theory here, not measurements, but I think I can
> predict pretty well what the performance of the sort of network I'm
> thinking about would be.
>
>> I want Tahoe to offer the user (human or computer) more control and
>> more knowledge about which shares go to which storage server.
>
> Okay, so here's a possibility. If I can ensure that K shares are stored
> on my mom's machine, and if Tahoe is clever enough to use those shares
> when she's browsing those files (doesn't seem difficult), rather than
> pulling from the network, then perhaps browsing my photos will be fast
> enough. The RS reconstruction and the decryption shouldn't be a big
> deal, and neither should applying a short sequence of forward deltas.
> Some performance testing is in order.
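
The "use local shares first" idea above could look roughly like the
following Python sketch. This is not Tahoe's actual API or share-selection
logic; `pick_shares`, the server names, and the share numbering are all
hypothetical, and k=3 in the usage lines is just an example value. The
point is only that when K distinct shares are already on the local machine,
nothing needs to be fetched over the network.

```python
from typing import Dict, List, Tuple


def pick_shares(
    k: int, local: List[int], remote: Dict[str, List[int]]
) -> Tuple[List[int], List[Tuple[str, int]]]:
    """Choose k distinct share numbers, preferring locally stored ones.

    Returns (local share numbers to read, (server, share) pairs to fetch).
    Hypothetical helper for illustration, not part of Tahoe.
    """
    chosen_local = sorted(set(local))[:k]
    have = set(chosen_local)
    to_fetch: List[Tuple[str, int]] = []
    for server, shares in remote.items():
        for share in shares:
            if len(have) >= k:
                break
            if share not in have:
                have.add(share)
                to_fetch.append((server, share))
    if len(have) < k:
        raise ValueError("fewer than k distinct shares are reachable")
    return chosen_local, to_fetch


if __name__ == "__main__":
    # All k shares are local: nothing is fetched from the network.
    print(pick_shares(3, [0, 1, 2], {"serverA": [3, 4]}))
    # Only one local share: the remaining two come from remote servers.
    print(pick_shares(3, [0], {"serverA": [0, 1], "serverB": [2, 3]}))
```
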
>
>> Yes, that's what it currently does (if you chose to share your "added
>> convergence secret" with all clients on the backup network).
>
> Cool. That's probably good enough that the added optimization of
> avoiding the storage of common files completely isn't worth the effort.
>
>>> To improve this, storage servers could index their local files and
>>> note when a request to store a share for a file they possess
>>> arrives.
>
>> By the way, the GNUnet project offers that feature, so you should
>> check them out.
>
> Thanks, I'll take a look.
>
>>> Next, I want incremental backups and versioning, and I want them to
>>> be done bandwidth-efficiently.
>>
>> Have you seen the duplicity plugin that Francois Deppierraz posted?
>> Maybe that does exactly what you want. :-)
>
> I'll look, but if it works at the tarball level like duplicity, then no,
> it's not what I want.
>
>> I would prefer if you used Tahoe and contributed patches, and if it
>> turns out that there is some behavior that you really want and that
>> seems too troublesome to me to risk including it in my codebase, then
>> I would prefer that you copy the Tahoe darcs repository and develop
>> your own branch.
>
> Okay. I grabbed the darcs repo (dang is that sloowww! Anybody for
> switching to git? ;-)) and I'll start from there.
>
> I haven't had a chance to look through the code much yet. Is there an
> overview document somewhere that covers the structure?
>
> Thanks,
>
> Shawn.