Make a stdlib for Non-Cryptographic Hash Functions (NCHF)

Yes, I’m aware of this. I’m also very aware that performance is relative, hence asking about a use-case (“it would be nice” isn’t a use-case).

On my system, SHA1 (no longer considered secure, but still exhibiting the avalanche effect etc.) clocks in at nearly 3 GB/s:

$ python3 -m timeit -s 'import hashlib; data = b"a" * 1048576' 'hashlib.sha1(data)'
1000 loops, best of 5: 351 usec per loop

And that’s on a single core; you can multithread this easily (the GIL is released during hashing).
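To illustrate the multithreading point, here's a rough sketch using a plain thread pool. The eight 1 MiB buffers, the worker count, and the `sha1_hex` helper are all made up for the example. How much speedup you actually get depends on your core count and on how hashlib was built.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha1_hex(data: bytes) -> str:
    # hashlib releases the GIL while hashing large buffers,
    # so several of these calls can run on separate cores.
    return hashlib.sha1(data).hexdigest()


# Hypothetical workload: eight independent 1 MiB buffers.
buffers = [bytes([i]) * 1048576 for i in range(8)]

# max_workers=4 is an arbitrary choice for the sketch.
with ThreadPoolExecutor(max_workers=4) as pool:
    digests = list(pool.map(sha1_hex, buffers))

for digest in digests:
    print(digest)
```

No multiprocessing or C extension needed; ordinary threads are enough because the heavy lifting happens outside the GIL.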

(Okay, on my slow disk server, it clocks in at 4.42ms/loop, which is only about a quarter of a GB/s, but still, that’s a fair amount of throughput.)
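For anyone who wants to check the arithmetic, the throughput figures fall straight out of the timeit numbers: bytes hashed per loop divided by seconds per loop. The helper name below is just for illustration.

```python
def throughput_gb_per_s(n_bytes: int, seconds_per_loop: float) -> float:
    # Bytes hashed per loop divided by time per loop, in units of 10**9 bytes/s.
    return n_bytes / seconds_per_loop / 1e9


DATA_SIZE = 1048576  # the 1 MiB buffer used in the timeit run

fast = throughput_gb_per_s(DATA_SIZE, 351e-6)   # 351 usec per loop
slow = throughput_gb_per_s(DATA_SIZE, 4.42e-3)  # 4.42 ms per loop

print(f"fast box: {fast:.2f} GB/s")
print(f"slow box: {slow:.2f} GB/s")
```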

So, what is the use-case where this sort of speed is insufficient, you’re willing to sacrifice true reliability, but you’re NOT willing to sacrifice entire-data hashing (see this thread about file hashing for some examples of how partial-data hashing can still be remarkably effective), AND you are unable/unwilling to either use PyPI or to copy in some open source code to use?
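For context, here's one possible shape of partial-data hashing, as a minimal sketch: hash the file size plus the first and last chunk instead of the whole file. This is my own illustration, not necessarily the scheme discussed in that thread. `partial_fingerprint` and the 64 KiB chunk size are hypothetical choices.

```python
import hashlib
import os
import tempfile


def partial_fingerprint(path: str, chunk_size: int = 65536) -> str:
    """Hash the file size plus the first and last chunk (hypothetical scheme)."""
    size = os.path.getsize(path)
    h = hashlib.sha1()
    h.update(str(size).encode("ascii") + b":")
    with open(path, "rb") as f:
        h.update(f.read(chunk_size))
        if size > chunk_size:
            # Read the tail, without re-reading bytes already covered by the head.
            f.seek(max(size - chunk_size, chunk_size))
            h.update(f.read(chunk_size))
    return h.hexdigest()


# Small demo with two throwaway files that differ only in the final byte.
with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "a.bin")
    path_b = os.path.join(tmp, "b.bin")
    with open(path_a, "wb") as f:
        f.write(b"a" * 1_000_000)
    with open(path_b, "wb") as f:
        f.write(b"a" * 999_999 + b"b")
    fp_a = partial_fingerprint(path_a)
    fp_b = partial_fingerprint(path_b)
    fp_a_again = partial_fingerprint(path_a)

print(fp_a == fp_a_again, fp_a != fp_b)
```

The obvious trade-off is that a change confined to the middle of a file goes unnoticed, so this only suits cases where that risk is acceptable.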

And if you can find one use-case, can you find enough to justify adding a specific group of algorithms to the standard library? What’s correct for YOUR use-case might not be correct for someone else’s, so either the stdlib module has to be enormous (and a nightmare to maintain), or there’s a good chance that the next person still won’t find what they want there, and will have to look elsewhere.

This seems like a pretty narrow requirement to me.
