Picking a slice of bytes at a fixed position in a file is “a hash code”, just one that’s entirely trivial to compute. If two files are the same, the byte slices are necessarily equal. That’s the only fundamental property “a hash code” has to satisfy: if objects are equal, their hash codes are equal.
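For concreteness, here's roughly what such a trivial "hash code" could look like as a keyfunc() (a minimal sketch; the name slice_key and its defaults are mine, just for illustration):

```python
def slice_key(path, offset=0, length=4096):
    """Return up to `length` bytes starting at `offset`, a trivial "hash code"."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

Equal files necessarily produce equal slices, so unequal slices prove the files differ.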
The point is that duplicates are usually rare, and it's a hell of a lot cheaper to read, say, a few thousand bytes than a billion when that suffices for "the hash codes" to differ. That holds regardless of which kind of hash function you use: "the hash code" can be the raw bytes themselves or a fancy hash (like SHA).
I used to routinely check a folder with tens of thousands of files for duplicates. They were JPEG files, at worst a few megabytes each (back in the days when disks were much smaller). The job typically found a few dozen duplicates, yet finished after reading less than 1% of the JPEG bytes stored on disk. No "fancy hash" function used.
As noted above, writing a keyfunc() that goes on to apply a "fancy hash" to the raw bytes is certainly possible and principled, and may be helpful in some circumstances I've personally never been in. So could, e.g., a keyfunc() that reads some number of trailing file bytes. Sketches of both follow.
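For illustration, hedged sketches of both variants (the names tail_key and sha_key, and the default sizes, are hypothetical):

```python
import hashlib
import os

def tail_key(path, length=4096):
    """Return up to `length` trailing bytes of the file."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)         # learn the file's size
        size = f.tell()
        f.seek(max(0, size - length))  # back up, but not past the start
        return f.read(length)

def sha_key(path, chunksize=1 << 20):
    """A "fancy hash" (SHA-256) of the whole file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunksize), b""):
            h.update(chunk)
    return h.digest()
```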
The key idea is successive refinement of equivalence classes, using a sequence of increasingly expensive hash functions. The initial os.path.getsize() is a hash function too, and the cheapest by far.
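Putting it together, the refinement loop could look like this (my own sketch of the idea, not the original code): start with all files in one class, split each class by the next keyfunc, and keep only classes that still hold two or more files.

```python
import os
from collections import defaultdict

def refine(groups, keyfunc):
    """Split each group by keyfunc's result; singletons can't be duplicates."""
    result = []
    for group in groups:
        buckets = defaultdict(list)
        for path in group:
            buckets[keyfunc(path)].append(path)
        result.extend(g for g in buckets.values() if len(g) > 1)
    return result

def find_dup_candidates(paths, keyfuncs=()):
    """Successively refine equivalence classes, cheapest hash first."""
    groups = [list(paths)]
    for keyfunc in (os.path.getsize, *keyfuncs):
        groups = refine(groups, keyfunc)
        if not groups:  # no candidate duplicates left
            break
    return groups
```

With, say, find_dup_candidates(paths, (slice_key, sha_key)), most non-duplicates drop out at the size or leading-slice stage, and only the rare survivors pay for a full SHA-256 read.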