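import hashlib

PATH = '/home/user/python'
# Hash the UTF-8 encoded path (str.encode already returns bytes).
m = hashlib.sha256()
m.update(PATH.encode('utf-8'))
# Map each of the first 32 hex digits (0-f) to a letter a-p.
EXTID = ''.join(chr(int(c, base=16) + ord('a')) for c in m.hexdigest()[:32])
print(EXTID)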
How to get the original input (‘/home/user/python’) of the resulting SHA-256 hash?
Not possible. Of course, given just the hash, finding an input that corresponds to it is a hard problem; that is what makes it a cryptographic hash function. But even given the m object that builds the hash, you can’t get the input back: it doesn’t store the data passed in via m.update, but consumes it eagerly where possible to reduce the memory footprint.
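A small sketch of what that looks like from the Python side. It does not inspect the internal state (which is not exposed); it only shows that feeding the input in pieces gives the same digest as hashing it in one go, and that the object's public interface has nothing for reading the input back. The split point of 5 bytes is arbitrary.

```python
import hashlib

PATH = '/home/user/python'
data = PATH.encode('utf-8')

# Feed the input in two pieces; each piece is folded into the running state.
m = hashlib.sha256()
m.update(data[:5])
m.update(data[5:])

# Same digest as hashing the whole input at once.
one_shot = hashlib.sha256(data).hexdigest()
print(m.hexdigest() == one_shot)  # True

# Public attributes: digests, update, copy and some metadata, but no input getter.
public_names = [name for name in dir(m) if not name.startswith('_')]
print(public_names)
```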
I’m hacking on Chrome browser extensions and was just wondering whether it is possible to get the original directory (the normalized path string that was hashed) back from the generated hash.
I see. I was perhaps thinking there was a method of hashlib or a combination of builtins that can achieve this, and that somebody had already created such a function.
Additionally, this kind of means that exposing something like this to a Web application is not necessarily “insecure” or exposing any PII re the underlying filesystem, correct?
The hash is sensitive to the input, but it’s not unique, so even if you do find an input that works, you can’t be certain that that was the actual input.
That’s technically true but sha256 collisions are very unlikely and it seems like the set of possible inputs is fairly small and well-structured (i.e. they’re valid paths). If I managed to find a path that generated the right hash I’d be pretty confident I got it right.
There are, by definition, 2**256 possible SHA256 hashes. Even if we assume that paths can contain only alphanumerics plus two other characters (64 options), it takes just 43 characters across all path components to have more possible paths than there are hashes. Granted, a lot of those will be unreadable monstrosities like /wZCyo2bvEQTn/QPsLFe_34bRi/3G17IV0Sqzuu/KyhwGsaNY9jA/kwmxk8KQh-jL but still, you can’t be certain you have the original path, since that is entirely legal.
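The 43-character figure can be checked with a few lines of arithmetic. This is just a sanity check of the counting argument, using the same assumed 64-character alphabet; the function name is made up for illustration.

```python
ALPHABET_SIZE = 64      # assumed: alphanumerics plus two extra characters
HASH_SPACE = 2 ** 256   # number of distinct SHA-256 digests

def min_length_exceeding(alphabet_size: int, space: int) -> int:
    """Smallest n such that alphabet_size ** n > space."""
    n = 0
    while alphabet_size ** n <= space:
        n += 1
    return n

shortest = min_length_exceeding(ALPHABET_SIZE, HASH_SPACE)
print(shortest)  # 43, since 64**43 == 2**258 > 2**256 but 64**42 == 2**252
```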
I was assuming we were looking for a human readable path based on the example. But it’s possible that these paths are already randomly generated strings (judging from the SO comments). So yeah, that opens things up a lot, although there should still be a known prefix to narrow the space down.
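If the space really is narrow (a known prefix and a short list of plausible directory names), the only realistic approach is to hash each guess and compare. A rough sketch, reusing the ID derivation from the original snippet; `ext_id`, `recover_path` and the candidate list are hypothetical names and data made up for illustration. Note that the ID keeps only the first 32 hex digits, i.e. 128 of the 256 bits.

```python
import hashlib
from typing import Iterable, Optional

def ext_id(path: str) -> str:
    """Derive the 32-letter ID the same way as the snippet in the question."""
    digest = hashlib.sha256(path.encode('utf-8')).hexdigest()
    return ''.join(chr(int(c, 16) + ord('a')) for c in digest[:32])

def recover_path(target_id: str, candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate whose ID matches, or None if none does."""
    for path in candidates:
        if ext_id(path) == target_id:
            return path
    return None

# Hypothetical guesses: a known prefix plus a few likely directory names.
prefix = '/home/user/'
guesses = [prefix + name for name in ('ext', 'src', 'python', 'chrome')]

target = ext_id('/home/user/python')
print(recover_path(target, guesses))  # /home/user/python
```

This only works when the real path is in the guess list; a miss tells you nothing about what the path was.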
This is all totally theoretical anyway, since “try them all” wasn’t viable in the first place.
There are a number of online services that might give you a plaintext for a given hash, but they’re usually just rainbow tables of already-known hashes… it’s not like they’re going to spend the cycles cracking it for you.
“Impractical” is an understatement typical of cryptographers. The standard of quality for cryptographic functions (encryption, hashes, digital signatures) is that there is no better answer than “try every possibility”. For a 256-bit hash such as SHA-256, that means you can expect to need about 2^255 tries to find a string that hashes to a given value. Similarly, to find the key that was used to encrypt a given message with AES-128, you can expect to need to try 2^127 keys on average. There aren’t any known shortcuts; or, to put it differently, once a shortcut is found, even one that only nibbles at the edges, the affected cryptographic function tends to be swiftly deprecated. That’s what happened to MD5 some years ago and more recently to SHA-1.
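To put 2^255 tries in perspective, here is a back-of-the-envelope estimate. The guessing rate of one trillion hashes per second is purely an assumption chosen for illustration, not a measured figure for any real hardware.

```python
GUESSES_PER_SECOND = 10 ** 12          # assumption: one trillion hashes per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year

expected_tries = 2 ** 255
years = expected_tries / GUESSES_PER_SECOND / SECONDS_PER_YEAR
print(f"{years:.2e} years")  # on the order of 10**57 years
```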
For some crypto functions, quantum computers offer a theoretical shortcut, subject only to the minor problem that you need one a whole lot larger than what exists today.
That was a joke answer. The real answer is that if there are K possibilities and you try each in turn, on average you will have to guess K/2 times before you happen to find the right answer. It might take K guesses, or it might take as little as 1 guess, but on average it will be K/2.
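The K/2 figure can be checked exactly for a small K. This sketch assumes the right answer is equally likely to sit at any of the K positions and that guesses are tried in order without repeats; the exact mean is (K+1)/2, which is K/2 for all practical purposes once K is large. The function name is made up for illustration.

```python
from fractions import Fraction

def average_guesses(k: int) -> Fraction:
    """Mean number of guesses when the answer is uniformly at position 1..k."""
    total = sum(range(1, k + 1))  # guesses needed, summed over every possible position
    return Fraction(total, k)

print(average_guesses(1000))  # 1001/2
```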