I made numerous tests recently, and
os.scandir() seems to be by far the fastest way to traverse a directory tree. That's why I'd basically want to see it in there.
Actually, I forgot to write the motivation for the whole thing above:
IMO, a typical use case when traversing a directory tree is something like:
rm -rf foo or
In all of these, the usual behaviour is to include the specified directory itself as well.
Now consider my scandirtree() function from above, and suppose it gets a bit more complicated, e.g. not crossing mount points (as with the -xdev option) or already stat()ing some entries and returning those results too.
If this is then used in code such as:
for p in scandirtree(root_path):
    ...  # do something with p
then whatever I do there (both the additional logic in
scandirtree() and what I do with the results in the outer for loop), I have to repeat in full for root_path itself.
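To make the duplication concrete, here is a minimal sketch. The scandirtree() below is only a hypothetical reconstruction (the original function is not shown here), with an assumed xdev flag that yields a mount point but does not descend into it. The also hypothetical collect() shows that the root needs its own os.stat() code path because it is not an os.DirEntry.

```python
import os
from typing import Iterator, List, Optional, Tuple


def scandirtree(path: str, xdev: bool = False,
                _dev: Optional[int] = None) -> Iterator[os.DirEntry]:
    """Hypothetical sketch: recursively yield DirEntry objects *below* path.

    path itself is never yielded. With xdev=True, directories on another
    device are yielded but not descended into (similar in spirit to -xdev).
    """
    if _dev is None:
        _dev = os.stat(path).st_dev
    with os.scandir(path) as it:
        for entry in it:
            yield entry
            if entry.is_dir(follow_symlinks=False):
                # Assumes POSIX: on Windows, DirEntry.stat() may report st_dev as 0.
                if xdev and entry.stat(follow_symlinks=False).st_dev != _dev:
                    continue  # different filesystem: do not descend
                yield from scandirtree(entry.path, xdev, _dev)


def collect(root_path: str) -> List[Tuple[str, os.stat_result]]:
    """Hypothetical consumer gathering (path, stat) pairs, root included."""
    # Duplicated logic: the root is not a DirEntry, so it needs os.stat().
    results = [(root_path, os.stat(root_path, follow_symlinks=False))]
    # Everything below the root goes through DirEntry.stat() instead.
    for p in scandirtree(root_path):
        results.append((p.path, p.stat(follow_symlinks=False)))
    return results
```

Any extra logic (filtering, xdev handling, error handling) has to be kept in sync between the two code paths in collect().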
An alternative approach (though I'm not sure whether it would break other expectations about iterators or PathLike objects) would be the ability to create
os.DirEntry objects manually; then one could write the above as:
for p in os.scandir(path):
    yield from scandirtree(p)
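As far as I know, os.DirEntry cannot be instantiated directly from Python (attempting it raises TypeError). A possible workaround today is to obtain the entry for a path by scanning its parent directory and matching the name. This is a sketch under that assumption. direntry_for(), scandirtree() and scandirtree_with_root() are hypothetical names. The cost is one extra scan of the parent directory, and a filesystem root such as "/" has no parent entry at all.

```python
import os
from typing import Iterator, Optional


def direntry_for(path: str) -> Optional[os.DirEntry]:
    """Hypothetical helper: find the DirEntry for path by scanning its parent.

    Returns None when there is no parent entry (e.g. "/") or no match.
    """
    parent, name = os.path.split(os.path.abspath(path))
    if not name:  # filesystem root: nothing in a parent refers to it
        return None
    with os.scandir(parent) as it:
        for entry in it:
            if entry.name == name:
                return entry
    return None


def scandirtree(path: str) -> Iterator[os.DirEntry]:
    """Hypothetical sketch: recursively yield DirEntry objects below path."""
    with os.scandir(path) as it:
        for entry in it:
            yield entry
            if entry.is_dir(follow_symlinks=False):
                yield from scandirtree(entry.path)


def scandirtree_with_root(path: str) -> Iterator[os.DirEntry]:
    """Yield path itself first, then everything below it, all as DirEntry."""
    root = direntry_for(path)
    if root is not None:
        yield root
        if not root.is_dir(follow_symlinks=False):
            return  # a file or symlink: nothing to descend into
        path = root.path  # absolute, so all yielded paths share one form
    yield from scandirtree(path)
```

With this, the consumer loop needs only one code path, because the root arrives as a DirEntry like everything else.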
Another (unrelated) idea would be to let
os.scandir() accept a non-directory file as
path, i.e. os.scandir("/path/to/regular-file") would simply return an iterator that yields only the file itself.
IMO that would simplify usage further, because right now, when I get pathnames as arguments and want to loop over them (recursing into any directories), I need a manual check: if the path is a directory, I use
os.scandir() or my
scandirtree(); if not, I have to redo, for a single file, everything I already do in
scandirtree() and in the code that consumes it.
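Until something like this exists, the proposed behaviour can be emulated in user code. This is only a sketch, and scandir_or_self() and walk_args() are hypothetical helpers. The isdir() check does not disappear; it is just confined to one place, and a non-directory is turned into a DirEntry by scanning its parent directory.

```python
import os
from typing import Iterable, Iterator


def scandir_or_self(path: str) -> Iterator[os.DirEntry]:
    """Emulate the proposal: a non-directory yields only its own DirEntry."""
    if os.path.isdir(path):  # follows symlinks, as os.scandir() itself would
        with os.scandir(path) as it:
            yield from it
        return
    parent, name = os.path.split(os.path.abspath(path))
    with os.scandir(parent) as it:
        for entry in it:
            if entry.name == name:
                yield entry
                return
    raise FileNotFoundError(path)


def walk_args(paths: Iterable[str]) -> Iterator[os.DirEntry]:
    """Loop over path arguments, recursing into directories, with one code path."""
    for arg in paths:
        for entry in scandir_or_self(arg):
            yield entry
            if entry.is_dir(follow_symlinks=False):
                yield from walk_args([entry.path])
```

Note that, like os.scandir() itself, a directory argument is not yielded, only its contents.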