I ran a number of benchmarks recently, and os.scandir()
seems to be by far the fastest way to traverse a directory tree, which is basically why I'd like to see it in there.
Actually, I forgot to write the motivation for the whole thing above:
IMO, a typical use case when traversing a directory tree is something like:
rm -rf foo
or find foo
The usual behaviour of these tools is to include the specified directory itself in the traversal.
Now suppose one takes my scandirtree()
function from above and makes it a bit more complicated, e.g. by not crossing mountpoints (as with find
's -xdev
option) or by already stat()
ing entries and returning those results as well.
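As a sketch of what such an extended variant could look like (the function body and the cross_mountpoints option are my illustration here, not an existing API; the mountpoint check compares st_dev against the starting directory, as find -xdev does):

```python
import os

def scandirtree(path, cross_mountpoints=True, _dev=None):
    """Recursively yield os.DirEntry objects below *path* (sketch only)."""
    if _dev is None:
        # Remember the device of the starting directory for -xdev-style checks.
        _dev = os.stat(path).st_dev
    for entry in os.scandir(path):
        if entry.is_dir(follow_symlinks=False):
            if not cross_mountpoints and entry.stat(follow_symlinks=False).st_dev != _dev:
                continue  # skip directories on other filesystems
            yield from scandirtree(entry.path, cross_mountpoints, _dev)
        else:
            yield entry
```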
If this is then used in some code like:
for p in scandirtree(root_path):
    ...
then whatever I do there (both the additional logic inside scandirtree()
and whatever I do with the results in the outer for loop) has to be fully repeated for root_path
itself, since that directory is never yielded.
An alternative approach (though I'm not sure whether that would break other expectations about iterators or PathLike objects) would be to allow creating os.DirEntry
objects manually; then one could write the above as:
def scandirtree(path):
    yield os.DirEntry(path)
    for p in os.scandir(path):
        if p.is_dir(follow_symlinks=False):
            yield from scandirtree(p)
        else:
            yield p
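Since os.DirEntry cannot currently be instantiated from Python code, a stopgap today would be a small stand-in class that mimics the DirEntry attributes the traversal actually uses; this is purely an illustrative sketch, not stdlib API:

```python
import os
import stat

class RootEntry:
    """Minimal DirEntry look-alike for the starting path (illustrative only)."""

    def __init__(self, path):
        self.path = os.fspath(path)
        self.name = os.path.basename(self.path) or self.path

    def stat(self, *, follow_symlinks=True):
        return os.stat(self.path, follow_symlinks=follow_symlinks)

    def is_dir(self, *, follow_symlinks=True):
        try:
            return stat.S_ISDIR(self.stat(follow_symlinks=follow_symlinks).st_mode)
        except OSError:
            return False

    def __fspath__(self):
        # Makes the object usable wherever a path-like is accepted,
        # just as a real DirEntry is.
        return self.path
```

With that, `yield os.DirEntry(path)` in the sketch above could be replaced by `yield RootEntry(path)` until (or unless) manual DirEntry creation becomes possible.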
Another (unrelated) idea would be to let os.scandir()
accept a non-directory file as its path
argument, i.e. os.scandir("/path/to/regular-file") would simply return an iterator that yields nothing but the file itself.
IMO that would further simplify usage, because right now, when I get pathnames as arguments and want to loop over them (recursing into any directories), I need a manual check: if it's a directory, use os.scandir()
respectively my scandirtree()
; if not, redo everything that scandirtree()
and its call site already do, just for a single file.
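The proposed behaviour can be approximated today with a thin wrapper (my sketch; the name scandir_any is made up, and it has to yield the bare path for the non-directory case, since a real DirEntry can't be constructed for it):

```python
import os

def scandir_any(path):
    """Yield scandir() entries for a directory, or the path itself otherwise."""
    if os.path.isdir(path):
        yield from os.scandir(path)
    else:
        # No way to build a DirEntry here, so fall back to the path itself.
        yield path
```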