[darcs-devel] SHA hashing

David Roundy droundy at darcs.net
Sat Apr 12 11:34:02 UTC 2008


On Fri, Apr 11, 2008 at 08:24:10PM -0400, Gwern Branwen wrote:
> So while I was playing around with ByteString, I noticed that the FPS
> functions sometimes are being called from SHA1.lhs and Crypt/SHA256.hs
> (the latter of which also needs a Crypt/sha2.h). And I was wondering.
> 
> # Why are cryptographic hashes being used? My understanding, from some
> # half-forgotten haskell-cafe thread, was that they weren't being used
> # for tree hashes and cryptographic guarantees about data integrity and
> # whatnot, like in Git or some other DVCSs. If they aren't being used for
> # their intended purposes, but for unique naming, then why not go with
> # some faster hash like MD5 or something? If they are being used for
> # cryptographic purposes, why not SHA512?

sha1 is used for unimportant hashing purposes, sha256 is used for
potentially cryptographically important purposes.  We don't use SHA512
because that would be wasteful.  Perhaps SHA256 is also wasteful, but
considerably less so.
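One concrete, easily checked cost of a bigger hash is output length, which matters wherever digests are stored or used in names. Whether that is the "waste" meant above is an assumption. The Python sketch below uses only the standard hashlib module, and its function name is illustrative. It just reports digest sizes and says nothing about speed:

```python
import hashlib

def digest_sizes(names=("sha1", "sha256", "sha512")) -> dict:
    """Map each hashlib algorithm name to its digest length in bytes."""
    return {name: hashlib.new(name).digest_size for name in names}

if __name__ == "__main__":
    for name, size in digest_sizes().items():
        # Hex-encoded digests are twice as long as the raw bytes.
        print(f"{name}: {size} bytes, {2 * size} hex characters")
```

Run directly, it reports 20, 32 and 64 bytes (40, 64 and 128 hex characters) for SHA-1, SHA-256 and SHA-512.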

> # Secondly, why are there two different hash functions being used? It
> # seems somewhat complicated and wasteful of LoC. Is it simply supposed
> # to be an optimization? It doesn't seem like much of one to me: testing
> # the shasum Perl program on some large and small files on my computer,
> # SHA256 seems to take maybe 50% longer than SHA1. That's not much of a
> # gain for maintaining two independent and idiosyncratic implementations,
> # and it also contributes to the proliferation of SHAs*.

For historical reasons.  We didn't want to break backwards compatibility,
and zooko convinced me that sha1 wasn't sufficiently secure.

> # Thirdly, if it is necessary to use both SHA1 and SHA256, or even just
> # one, why is this not being farmed out to a library? Darcs is a DVCS
> # project, not a cryptographic project - this isn't something Darcs
> # should be doing, both on moral and practical grounds. (It's quite
> # likely that none of us are crypto experts or even enthusiasts; are
> # those two modules well maintained? Are they particularly efficient? and
> # so on.) Why isn't Darcs using something like Crypto
> # <http://hackage.haskell.org/cgi-bin/hackage-scripts/package/Crypto> for
> # such things?

The problem is that it's hard to find a decent library to use, and it's
easy to find decent code to use.  The Crypto package is (from what I hear)
terribly slow, and there aren't decent libraries providing hash functions.

> Crypto wouldn't be a terrible choice if we were to still use SHAs,
> actually - its SHA1 is technically [Word8] -> [Word160] though
> <http://hackage.haskell.org/packages/archive/Crypto/4.1.0/doc/html/Data-Digest-SHA1.html>,
> so it might need some pack/unpack conversions or whatever.

I believe this would provide a dramatic slowdown for darcs.
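The "packs" mentioned above would be conversions between a packed byte string and a list of bytes. The Python sketch below shows the shape of such an adapter; it is not Crypto's real API. The hypothetical `list_api_sha1` stands in for a list-of-bytes-in, single-integer-out interface and simply calls hashlib internally:

```python
import hashlib

def list_api_sha1(words: list) -> int:
    """Hypothetical list-based interface: a list of byte values (0..255)
    in, the SHA-1 digest as one 160-bit integer out."""
    return int.from_bytes(hashlib.sha1(bytes(words)).digest(), "big")

def sha1_bytes(data: bytes) -> bytes:
    """Adapter from a packed-bytes caller to the list-based interface."""
    words = list(data)                  # "unpack": one list element per input byte
    digest = list_api_sha1(words)       # hash via the list interface
    return digest.to_bytes(20, "big")   # "pack" the 160-bit result into 20 bytes
```

The adapter materializes the whole input as a list before hashing, which hints at why a list-based interface could be costly on large files. This illustrates the conversion only and is not a measurement of the Crypto package.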

> nano-hmac might be even better as I suspect that its unsafeHMAC with sha1
> is really fast
> <http://hackage.haskell.org/packages/archive/nano-hmac/0.2.0/doc/html/Data-Digest-OpenSSL-HMAC.html#t%3ACryptoHashFunction>,
> but HMACs aren't hashes, so some rewriting would be necessary there.

Right, so we wouldn't want that.

> If the hash functions were changed to MD5 or something, there are plenty
> of bindings like nano-md5
> <http://hackage.haskell.org/packages/archive/nano-md5/0.1.1/doc/html/Data-Digest-OpenSSL-MD5.html>
> for that.

But we don't want to do that.

> *I count on my computer alone SHA1.[l]hses from: Conjure, Ginsu, Yi, JHC,
> HAppS, Darcs, *and* Crypto; this does not count the various SHAs you can
> find via Google. This is at least 4 or 5 too many, by my way of
> thinking.

For sha1, we don't care about speed, and do care about giving the same
answer that darcs used to give, since it's all about backwards
compatibility.  So there's no point changing.  We *could* introduce
additional dependencies, but I see no reason to do so to save a couple of
hundred lines of code, and at the same time add additional bloat to darcs
(since there's no way we could link with *only* sha1).
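Because the point of keeping sha1 is to give the same answers darcs has always given, any replacement would at least have to pass a known-answer check. The Python sketch below uses hashlib as the reference and the published SHA-1 test vectors. The function name and sample inputs are illustrative, not part of darcs. A real check would also compare against hashes recorded in existing darcs repositories.

```python
import hashlib
from typing import Callable, Iterable

# Published SHA-1 known-answer vectors (FIPS 180 examples plus the empty string).
KNOWN_SHA1 = {
    b"": "da39a3ee5e6b4b0d3255bfef95601890afd80709",
    b"abc": "a9993e364706816aba3e25717850c26c9cd0d89d",
    b"abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq":
        "84983e441c3bd26ebaae4aa1f95129e5e54670f1",
}

def is_backward_compatible(candidate: Callable[[bytes], str],
                           samples: Iterable[bytes] = ()) -> bool:
    """True if `candidate` (bytes -> lowercase hex digest) reproduces the
    known vectors and agrees with hashlib's SHA-1 on every sample."""
    if any(candidate(msg) != want for msg, want in KNOWN_SHA1.items()):
        return False
    return all(candidate(s) == hashlib.sha1(s).hexdigest() for s in samples)

if __name__ == "__main__":
    def reference(data: bytes) -> str:
        return hashlib.sha1(data).hexdigest()

    print(is_backward_compatible(reference, [b"darcs", b"\xff" * 4096]))
```

A candidate that computes a different hash, such as SHA-256, fails on the first vector.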

For SHA256, speed counts (so Crypto is out), and if you can find a fast
library we can link to and if you're willing to write a configure test for
it, we could do so.  We've even got the microbench which you can use to
easily run timing tests on darcs' own code to see how much you've sped up
our SHA256.
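Darcs' own microbench is the right tool for timing darcs' code, and its interface is not shown here. As a quick, independent sanity check of the raw SHA-1 vs SHA-256 gap measured with shasum above, a Python sketch using hashlib might look like this. The function name and the 16 MiB payload are arbitrary choices, and the numbers reflect hashlib's implementation, not darcs' Haskell code:

```python
import hashlib
import time

def throughput_mb_s(name: str, data: bytes, repeats: int = 5) -> float:
    """Best-of-N throughput in MB/s for hashing `data` with algorithm `name`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.new(name, data).digest()
        best = min(best, time.perf_counter() - start)
    best = max(best, 1e-9)  # guard against a zero reading on tiny inputs
    return len(data) / best / 1e6

if __name__ == "__main__":
    payload = b"\0" * (16 * 1024 * 1024)  # 16 MiB of zero bytes
    for algo in ("sha1", "sha256"):
        print(f"{algo}: {throughput_mb_s(algo, payload):.0f} MB/s")
```

Taking the best of several runs reduces noise from caching and scheduling. The ratio between the two lines is more meaningful than the absolute figures.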
-- 
David Roundy
Department of Physics
Oregon State University