os.scandir should have an option to also yield the path itself

calestyo · August 22, 2022, 3:27am

I think it would be nice if os.scandir had an option (that defaults to the current behaviour) which makes it also yield the root path itself as given by its path argument.

The benefit would be that, if one has code such as:

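import os

def scandirtree(path):
    # Recursively yield every os.DirEntry below path (but not path itself).
    for p in os.scandir(path):
        yield p
        # Descend into real directories only; symlinks are not followed.
        if p.is_dir(follow_symlinks=False):
            yield from scandirtree(p)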

one would also get path itself and could process it as well (in whatever one does with the generator).

Cheers,
Chris.

EpicWink · August 22, 2022, 4:55am

Your example is similar to Recursive directory list with pathlib.Path.iterdir · Issue #80783 · python/cpython · GitHub

calestyo · August 22, 2022, 1:27pm

I ran numerous tests recently, and os.scandir() seems to be by far the fastest way to traverse a tree. That’s why I’d like to see it in there.

Actually, I forgot to write the motivation for the whole thing above:
IMO, a typical use case when traversing a directory tree is something like:
rm -rf foo or find foo

The usual behaviour of all of these is to also include the specified directory itself.

Now consider my scandirtree() function above, perhaps made a bit more complicated, e.g. by not crossing mount points (as with find’s -xdev option) or by already stat()ing some things and returning that, too.

If this is then used in some code, as in:

for p in scandirtree(root_path):
    ...

then whatever I do there (both the additional logic in scandirtree() and what I do with the results in the outer for loop) I need to repeat in full for root_path.
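
With today’s API, one way to avoid that duplication is to obtain a real os.DirEntry for root_path by scanning its parent directory. This is only a minimal sketch: root_entry() is a hypothetical helper name, not part of the os module. It costs a full scan of the parent, and it returns None for the filesystem root or when the name is not found.

```python
import os


def root_entry(path):
    """Return the os.DirEntry for path, found by scanning its parent.

    Hypothetical helper, not a stdlib function. Returns None if path is
    the filesystem root or if no entry with that name exists.
    """
    path = os.path.abspath(path)  # also strips a trailing separator
    parent, name = os.path.split(path)
    if not name:
        return None  # the filesystem root has no parent entry
    with os.scandir(parent) as it:
        for entry in it:
            if entry.name == name:
                return entry
    return None
```

A wrapper generator could then yield root_entry(root_path) first and the results of scandirtree(root_path) afterwards, so that the consumer only ever sees os.DirEntry objects. Note that the returned entry’s path attribute is the absolute path, which may differ from the string that was passed in.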

An alternative approach (though I’m not sure whether it would break any other expectations about iterators or PathLike objects) would be to allow creating os.DirEntry objects manually. Then one could write the above as:

def scandirtree(path):
    # Hypothetical: os.DirEntry cannot currently be instantiated like this.
    yield os.DirEntry(path)
    for p in os.scandir(path):
        if p.is_dir(follow_symlinks=False):
            yield from scandirtree(p)
        else:
            yield p

Another (unrelated) idea would be to let os.scandir() work on non-directory files as path, i.e. os.scandir("/path/to/regular-file") would simply return an iterator that yields nothing but the file itself.

That would IMO further simplify usage. Right now, when I have pathnames as arguments and want to loop over them (recursing into any directories), I need to add a manual check: if it’s a directory, I use os.scandir() or my scandirtree(); if not, I have to redo for that single file everything I already do in scandirtree() and wherever I consume its results.
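
Until something like that exists, the check can at least be folded into the generator itself. The following is a rough sketch, not a drop-in replacement: iter_paths() is a hypothetical helper that yields plain path strings rather than os.DirEntry objects (so the cached type information is lost for the consumer), and it relies on os.scandir() raising NotADirectoryError when given a non-directory.

```python
import os


def iter_paths(path):
    """Yield path itself, then every path below it if it is a directory.

    Hypothetical helper. Symlinks found during the walk are not followed,
    but a symlink passed in as the starting path is.
    """
    yield os.fspath(path)
    try:
        it = os.scandir(path)
    except NotADirectoryError:
        return  # a regular file (or similar): nothing below it
    with it:
        for entry in it:
            if entry.is_dir(follow_symlinks=False):
                # The recursive call yields entry.path first.
                yield from iter_paths(entry.path)
            else:
                yield entry.path
```

With this, a loop such as for p in iter_paths(arg) treats file and directory arguments the same way. A nonexistent path still raises FileNotFoundError once the generator reaches the os.scandir() call.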

steven.daprano · August 22, 2022, 11:53pm

Why can’t we create os.DirEntry objects directly?

eryksun · August 23, 2022, 2:38am

The internal constructors work with low-level OS data (e.g. a WIN32_FIND_DATAW record in Windows). DirEntry caches this data to implement is_dir(), is_file(), is_symlink(), inode() in POSIX, and stat(follow_symlinks=False) in Windows.

When PEP 471 was first approved in 2014, os.DirEntry wasn’t exposed. It was added in 2016, but nothing was done to support a generic constructor that would create an instance from a path and cache the stat() result. There’s an open issue to implement this capability.
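
Until such a constructor exists, a small stand-in class can cover the common cases. What follows is only a sketch under stated assumptions: PathEntry and _walk are hypothetical names, not stdlib API, and the class mirrors just part of the os.DirEntry interface (no inode(), for instance) while caching stat() results per follow_symlinks value.

```python
import os
from stat import S_ISDIR, S_ISLNK, S_ISREG


class PathEntry:
    """Hypothetical DirEntry-like wrapper built from a path (not stdlib)."""

    def __init__(self, path):
        self.path = os.fspath(path)
        self.name = os.path.basename(self.path)  # empty if path ends in a separator
        self._cache = {}  # follow_symlinks flag -> os.stat_result

    def __fspath__(self):
        return self.path

    def stat(self, *, follow_symlinks=True):
        # Hit the filesystem at most once per follow_symlinks value.
        if follow_symlinks not in self._cache:
            result = os.stat(self.path) if follow_symlinks else os.lstat(self.path)
            self._cache[follow_symlinks] = result
        return self._cache[follow_symlinks]

    def _mode_is(self, predicate, follow_symlinks):
        try:
            return predicate(self.stat(follow_symlinks=follow_symlinks).st_mode)
        except FileNotFoundError:
            return False  # os.DirEntry also reports False for a vanished entry

    def is_dir(self, *, follow_symlinks=True):
        return self._mode_is(S_ISDIR, follow_symlinks)

    def is_file(self, *, follow_symlinks=True):
        return self._mode_is(S_ISREG, follow_symlinks)

    def is_symlink(self):
        return self._mode_is(S_ISLNK, False)


def scandirtree(path):
    """Yield an entry for path itself, then the os.DirEntry objects below it."""
    yield PathEntry(path)
    yield from _walk(path)


def _walk(path):
    with os.scandir(path) as it:
        for entry in it:
            yield entry
            if entry.is_dir(follow_symlinks=False):
                yield from _walk(entry)
```

Because PathEntry implements __fspath__ and the same method names, a consumer loop can treat the first item like any other entry, as long as it sticks to the methods sketched here.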

calestyo · November 14, 2022, 2:17am

Just for the record:
Since there was no clear outcome here, I’ve just requested it at os.scandir(): add option to yielt the given path itself. · Issue #99454 · python/cpython · GitHub.

Thanks,
Chris