os.scandir should have an option to also yield the path itself

I think it would be nice if os.scandir had an option (that defaults to the current behaviour) which makes it also yield the root path itself as given by its path argument.

The benefit would be that, given code such as:

import os

def scandirtree(path):
    for p in os.scandir(path):
        if p.is_dir(follow_symlinks=False):
            yield p
            yield from scandirtree(p)
        else:
            yield p

the generator would also yield path itself, so that it could be processed as well (in whatever one does with the generator's results).

Cheers,
Chris.

Your example is similar to "Recursive directory list with pathlib.Path.iterdir" (Issue #80783, python/cpython on GitHub).

I ran numerous tests recently, and os.scandir() seems to be by far the fastest way to traverse a directory tree. That's basically why I'd like to see it in there.

Actually, I forgot to write the motivation for the whole thing above:
IMO, a typical use case when traversing a directory tree is something like:
rm -rf foo or find foo

The usual behaviour for all these is always to also include the specified directory itself.

Now consider my scandirtree() function above, perhaps made a bit more complicated, e.g. by not crossing mount points (as with find's -xdev option) or by already stat()ing entries and returning the results, too.

If this is then used in code such as:

for p in scandirtree(root_path):
   ...

then whatever I do there (both the additional logic in scandirtree() and what I do with the results in the outer for loop) I need to fully repeat for root_path.
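To make the duplication concrete, here is a minimal sketch of an -xdev-like variant that works with today's os.scandir(). The names scandirtree_xdev and _walk are made up for illustration, and yielding (path, stat_result) tuples instead of DirEntry objects is just one possible workaround, not what is proposed above. Note how the root has to be lstat()ed and yielded by hand before the loop starts:

```python
import os
import stat


def scandirtree_xdev(root):
    """Yield (path, lstat_result) for root and everything below it,
    without descending into directories that live on another device."""
    root_stat = os.lstat(root)
    # os.scandir() never yields the root, so it is handled separately here.
    yield os.fspath(root), root_stat
    if stat.S_ISDIR(root_stat.st_mode):
        yield from _walk(root, root_stat.st_dev)


def _walk(path, dev):
    with os.scandir(path) as entries:
        for entry in entries:
            st = entry.stat(follow_symlinks=False)
            yield entry.path, st
            # Descend only into real directories on the same device.
            # Assumes POSIX: on Windows, DirEntry.stat() reports st_dev as 0.
            if entry.is_dir(follow_symlinks=False) and st.st_dev == dev:
                yield from _walk(entry.path, dev)
```

Like find -xdev, this still reports a mount point itself but does not descend into it.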

An alternative approach (though I'm not sure whether it would break any other expectations about iterators or PathLike objects) would be to allow creating os.DirEntry objects manually. Then one could write the above as:

def scandirtree(path):
    # hypothetical: os.DirEntry cannot currently be constructed from a path
    yield os.DirEntry(path)
    for p in os.scandir(path):
        if p.is_dir(follow_symlinks=False):
            yield from scandirtree(p)
        else:
            yield p

Another (unrelated) idea would be to let os.scandir() accept a non-directory file as path, i.e. os.scandir("/path/to/regular-file") would simply return an iterator that yields nothing but the file itself.

That would IMO further simplify usage. Right now, when I have pathnames as arguments and want to loop over them (recursing into any directories), I need a manual check: if it's a directory, I use os.scandir() or my scandirtree(); if not, I have to redo, for the single file, everything I already do in scandirtree() and in the code that consumes it.
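For comparison, a rough sketch of how that manual check can be hidden in a wrapper today. iter_tree is a hypothetical helper name, and it yields plain path strings rather than DirEntry objects, so the stat information cached by os.scandir() is lost and each entry costs extra system calls:

```python
import os


def iter_tree(path):
    """Yield path itself, then (if it is a real directory) everything below it."""
    yield os.fspath(path)
    # A symlink to a directory is reported but not followed.
    if os.path.isdir(path) and not os.path.islink(path):
        with os.scandir(path) as entries:
            for entry in entries:
                yield from iter_tree(entry.path)
```

With this, a loop like `for p in iter_tree(arg): ...` handles a regular file and a directory argument the same way.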

Why can’t we create os.DirEntry objects directly?

The internal constructors work with low-level OS data (e.g. a WIN32_FIND_DATAW record in Windows). DirEntry caches this data to implement is_dir(), is_file(), is_symlink(), inode() in POSIX, and stat(follow_symlinks=False) in Windows.

When PEP 471 was first approved in 2014, os.DirEntry wasn’t exposed. It was added in 2016, but nothing was done to support a generic constructor that would create an instance from a path and cache the stat() result. There’s an open issue to implement this capability.
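Until something like that exists, a stand-in can be written in pure Python. The following is only a sketch: PathEntry is a made-up class, not a standard library API. It mimics a subset of the DirEntry interface and caches the stat() and lstat() results on first use. The scandirtree() shown with it assumes that the root is a directory:

```python
import os
import stat


class PathEntry:
    """Hypothetical stand-in for os.DirEntry, built from a path."""

    def __init__(self, path):
        self.path = os.fspath(path)
        # Note: this is empty if the path ends with a separator.
        self.name = os.path.basename(self.path)
        self._stat = None
        self._lstat = None

    def __fspath__(self):
        return self.path

    def stat(self, *, follow_symlinks=True):
        # Cache both variants separately, each on first use.
        if follow_symlinks:
            if self._stat is None:
                self._stat = os.stat(self.path)
            return self._stat
        if self._lstat is None:
            self._lstat = os.lstat(self.path)
        return self._lstat

    def _mode_is(self, check, follow_symlinks):
        try:
            return check(self.stat(follow_symlinks=follow_symlinks).st_mode)
        except FileNotFoundError:
            return False

    def is_dir(self, *, follow_symlinks=True):
        return self._mode_is(stat.S_ISDIR, follow_symlinks)

    def is_file(self, *, follow_symlinks=True):
        return self._mode_is(stat.S_ISREG, follow_symlinks)

    def is_symlink(self):
        return self._mode_is(stat.S_ISLNK, False)


def scandirtree(path):
    # The root is wrapped by hand; everything below it is a real DirEntry.
    yield PathEntry(path)
    yield from _below(path)


def _below(path):
    with os.scandir(path) as entries:
        for entry in entries:
            yield entry
            if entry.is_dir(follow_symlinks=False):
                yield from _below(entry)
```

Because both PathEntry and os.DirEntry implement `__fspath__`, consumers can pass either one to functions such as open() or os.scandir().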

Just for the record:
Since there was no clear outcome here, I’ve just requested it at os.scandir(): add option to yielt the given path itself. · Issue #99454 · python/cpython · GitHub .

Thanks,
Chris