[darcs-devel] SHA hashing

Gwern Branwen gwern0 at gmail.com
Sat Apr 12 00:24:10 UTC 2008


While playing around with ByteString, I noticed that the FPS functions are sometimes called from SHA1.lhs and Crypt/SHA256.hs (the latter of which also needs Crypt/sha2.h). That got me wondering about a few things.

# Why are cryptographic hashes being used? My understanding, from some half-forgotten haskell-cafe thread, was that they aren't being used for tree hashes or for cryptographic guarantees about data integrity, as in Git or some other DVCSs. If they are only being used for unique naming rather than for their intended purpose, then why not go with a faster hash like MD5? And if they are being used for cryptographic purposes, why not SHA512?
# Secondly, why are there two different hash functions being used? It seems somewhat complicated and wasteful of LoC. Is it purely and simply supposed to be an optimization? It doesn't seem like much of one to me: testing the shasum Perl program on my computer on some large and small files, SHA256 seems to take maybe 50% more time than SHA1. That is not much of a saving to set against maintaining two independent and idiosyncratic implementations, and contributing to the proliferation of SHAs*.
# Thirdly, if it is necessary to use both SHA1 and SHA256, or even just one, why is this not being farmed out to a library? Darcs is a DVCS project, not a cryptographic project - this isn't something Darcs should be doing, on both moral and practical grounds. (It's quite likely that none of us are crypto experts or even enthusiasts. Are those two modules well maintained? Are they particularly efficient? And so on.) Why isn't Darcs using something like Crypto <http://hackage.haskell.org/cgi-bin/hackage-scripts/package/Crypto> for such things?

Crypto wouldn't be a terrible choice if we were to still use SHAs, actually - its SHA1 is technically [Word8] -> Word160 though <http://hackage.haskell.org/packages/archive/Crypto/4.1.0/doc/html/Data-Digest-SHA1.html>, so it might need some packs and unpacks to work with ByteStrings.
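
For illustration, here is a rough sketch of the glue that a list-based hash would need in order to be fed a ByteString. Everything in it is hypothetical: hashByteString, hexWords and toyHash are names made up for this example, and toyHash is only a stand-in so the snippet compiles without Crypto installed (it is not a real hash). The real library function would be passed in as the first argument instead.

```haskell
module Main where

import qualified Data.ByteString as B
import Data.Word (Word8, Word32)
import Text.Printf (printf)

-- | Adapt a hash that consumes a list of bytes so that it accepts a
-- strict ByteString. B.unpack builds an intermediate list, so this
-- costs extra allocation compared to hashing the buffer directly.
hashByteString :: ([Word8] -> digest) -> B.ByteString -> digest
hashByteString listHash = listHash . B.unpack

-- | Render a digest given as 32-bit words as lowercase hex,
-- eight digits per word.
hexWords :: [Word32] -> String
hexWords = concatMap (printf "%08x")

-- | Stand-in for a real list-based hash (assumption for this demo only;
-- it just returns the input length and the byte sum).
toyHash :: [Word8] -> [Word32]
toyHash bytes =
  [ fromIntegral (length bytes)
  , fromIntegral (sum (map fromIntegral bytes :: [Int]))
  ]

main :: IO ()
main = putStrLn (hexWords (hashByteString toyHash (B.pack [1, 2, 3])))
```

The unpack step is the cost to watch: for large patches it may well eat whatever is gained by not maintaining our own implementation, so it would need measuring.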

nano-hmac might be even better, as I suspect that its unsafeHMAC with sha1 is really fast <http://hackage.haskell.org/packages/archive/nano-hmac/0.2.0/doc/html/Data-Digest-OpenSSL-HMAC.html#t%3ACryptoHashFunction>, but HMACs aren't plain hashes (they take a key as well as the message), so some rewriting would be necessary there.

If the hash functions were changed to MD5 or something, there are plenty of bindings like nano-md5 <http://hackage.haskell.org/packages/archive/nano-md5/0.1.1/doc/html/Data-Digest-OpenSSL-MD5.html> for that.

*I count on my computer alone SHA1.[l]hses from: Conjure, Ginsu, Yi, JHC, HAppS, Darcs, *and* Crypto; this does not count the various SHAs you can find via Google. This is at least 4 or 5 too many, by my way of thinking.

--
gwern
ISA grom AHPCRC rail Aladdin Verisign BLU- Hmong executive CISU