Thanks Barney, I really appreciate that.
Unfortunately, @CAM-Gerlach, @MegaIng, I’ve taken you all on a complete wild goose chase. I apologise. In summary, my suggestion could well make performance anywhere from slightly to significantly worse (except on macOS, where performance is even worse regardless of my suggestion).
Cornelius - you were absolutely right, this needed to be measured. I have done so.
Ubuntu 22.04:
Testing reps=20 of listing a directory of: 50000 files
Time using Path.iterdir: 2.142239902
Time using ScanDirPath.iterdir: 2.313271893999996
Time using os.listdir: 0.00019698799999900984
Time using os.scandir: 0.0001934010000042008
Windows Server 2022:
Testing reps=20 of listing a directory of: 50000 files
Time using Path.iterdir: 1.841156299999966
Time using ScanDirPath.iterdir: 2.198301700000002
Time using os.listdir: 0.000457399999959307
Time using os.scandir: 0.00043940000000475266
macOS 13:
Testing reps=20 of listing a directory of: 50000 files
Time using Path.iterdir: 4.364073147000454
Time using ScanDirPath.iterdir: 4.452478466000684
Time using os.listdir: 0.00039130699951783754
Time using os.scandir: 0.00034591399980854476
There’s still the very real possibility I’ve done something silly, in particular something that makes these tests unfair. My choice of 50,000 files and 20 repetitions was influenced as much by my patience in waiting for tests to finish as by my idea of a realistic usage scenario in which any difference could be important. Be ever wary of isolated benchmarks, etc.
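For anyone wanting to try a comparison like this themselves, here is a minimal sketch of how it could be set up. It is not necessarily the script behind the numbers above: the helper names `make_files` and `bench` are made up for illustration, it leaves out ScanDirPath, and it uses a smaller file count to keep the run short. One detail it is careful about is that both `Path.iterdir()` and `os.scandir()` return lazy iterators, so each is fully consumed inside the timed call. `timeit.timeit` reports total seconds for all repetitions.

```python
import os
import tempfile
import timeit
from pathlib import Path


def make_files(directory, count):
    """Create `count` empty files in `directory` (hypothetical helper)."""
    for i in range(count):
        Path(directory, f"file_{i}.txt").touch()


def bench(directory, reps):
    """Return total seconds for `reps` full listings with each approach."""

    def scandir_names():
        # os.scandir is lazy: consume it fully, and use the with-block
        # so the underlying directory handle is closed promptly.
        with os.scandir(directory) as entries:
            return [entry.name for entry in entries]

    candidates = {
        # Path.iterdir is a generator, so wrap it in list() to do the work.
        "Path.iterdir": lambda: list(Path(directory).iterdir()),
        "os.listdir": lambda: os.listdir(directory),
        "os.scandir": scandir_names,
    }
    return {
        name: timeit.timeit(fn, number=reps)
        for name, fn in candidates.items()
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_files(tmp, 1000)  # assumption: far fewer than 50,000, for speed
        for name, seconds in bench(tmp, reps=20).items():
            print(f"Time using {name}: {seconds}")
```

Absolute timings from something like this will vary with the filesystem and OS caching, so only the relative ordering on one machine is worth reading into.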
If these tests are not flawed, then you were also completely right, Cornelius, about using the os module where performance is needed.
If nothing else, I believe I now have a definitive answer to Peter’s original (rephrased) question:
“Is there a solution with pathlib that is as fast as os.scandir()?”
No. Not even one as fast as os.listdir.
Pathlib is superb, but its primary benefit is code readability (and writability), not raw performance on unmanageably extreme numbers of files.
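The two approaches can also be mixed. As a rough illustration (the function name `find_by_suffix` is hypothetical, and I have not benchmarked this variant), the listing and filtering can be done on `os.DirEntry` objects from `os.scandir()`, with a `Path` constructed only for the entries that are actually wanted:

```python
import os
from pathlib import Path


def find_by_suffix(directory, suffix):
    """Yield Path objects for regular files in `directory` ending in `suffix`.

    Hypothetical helper: filtering happens on os.DirEntry objects, and a
    Path is only built for entries that pass the filter.
    """
    with os.scandir(directory) as entries:
        for entry in entries:
            # Check the cheap name test first, then the file-type test.
            if entry.name.endswith(suffix) and entry.is_file():
                yield Path(entry.path)
```

Calling `list(find_by_suffix(some_dir, ".txt"))` then gives ordinary `Path` objects to work with, so the rest of the code keeps pathlib's readability.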