os.scandir question

Hi all,

I’ve been exploring os.scandir() and came across this issue.

The following code works:

import os

txtFiles = [f.name[:-4] for f in os.scandir() if f.name.lower().endswith('.txt')]
mp4Files = [f.name[:-4] for f in os.scandir() if f.name.lower().endswith('.mp4')]

However the following code does not work:

import os

dircontent = os.scandir()

txtFiles = [f.name[:-4] for f in dircontent  if f.name.lower().endswith('.txt')]
mp4Files = [f.name[:-4] for f in dircontent  if f.name.lower().endswith('.mp4')]

Do you have any idea why not and how to make it work with creating a dircontent object?

Thanks!

What do you mean “does not work”?

Do you get an error? Your computer catches fire? What happens?

Well, what is the nature of your dircontent?

>>> import os
>>> os.scandir()
<posix.ScandirIterator object at 0x7f2e3b329f80>

It is not a list, a tuple, or any of the other data types you might have learnt about. It is a special kind of object whose type is posix.ScandirIterator. The most important property of this object is that it supports the “iteration protocol”. Essentially, this means that you can use the object in a for loop:

for entry in os.scandir():  # each item is an os.DirEntry, not a plain string
    ...

So, lists and tuples support iteration, but many more types support it too. In fact, you can even define your own (search “python define iterator” in any search engine). An iterator has an internal state that it uses to yield successive elements. In this case, the ScandirIterator might be using some OS function to get the files one by one. Most importantly, it has not yet fetched the name of the next file while you are processing the current one.
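To make “internal state” concrete, here is a minimal sketch of a hand-written iterator. The Countdown class is purely hypothetical and has nothing to do with how ScandirIterator is implemented; it only shows the iteration protocol (__iter__ and __next__) and what happens when the same iterator is used twice.

```python
class Countdown:
    """Hypothetical iterator that yields n, n-1, ..., 1 and then stops."""

    def __init__(self, n):
        self.remaining = n  # internal state: how many values are left

    def __iter__(self):
        # An iterator returns itself, so iterating it again does not restart it.
        return self

    def __next__(self):
        if self.remaining <= 0:
            raise StopIteration  # tells the for loop (or list()) that we are done
        value = self.remaining
        self.remaining -= 1
        return value


counter = Countdown(3)
first_pass = list(counter)   # consumes every element: [3, 2, 1]
second_pass = list(counter)  # nothing left to yield: []
print(first_pass, second_pass)
```

The second pass is empty because the state was never reset; the object has no way of knowing you wanted to start over.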

So, at the end of your first comprehension, the dircontent iterator is exhausted: you have iterated over it once, it has given you all of its elements, and it has nothing more to yield. If you need the scandir result several times, you have to call os.scandir() again to get a fresh iterator:

dircontent = os.scandir()
txtfiles = [... for f in dircontent if ...]
dircontent2 = os.scandir()
mp4files = [... for f in dircontent2 if ...]

Alternatively, you can convert the iterator to a list first. The list() builtin exhausts the iterator, exactly like a for loop does, and stores all of the elements in memory, where they remain available for reuse.

dircontent = list(os.scandir())
txtfiles = [...]
mp4files = [...]

By Bart via Discussions on Python.org at 20Mar2022 21:51:

The following code works:

import os

txtFiles = [f.name[:-4] for f in os.scandir() if f.name.lower().endswith('.txt')]
mp4Files = [f.name[:-4] for f in os.scandir() if f.name.lower().endswith('.mp4')]

However the following code does not work:

import os

dircontent = os.scandir()

txtFiles = [f.name[:-4] for f in dircontent  if f.name.lower().endswith('.txt')]
mp4Files = [f.name[:-4] for f in dircontent  if f.name.lower().endswith('.mp4')]

You do not say what, specifically, “does not work” actually means, i.e.
what you expected and what you actually got. But I have a good guess.

This will be because os.scandir() returns an iterator (not a list),
which yields the entries in the directory. So your txtFiles list
comprehension consumes the iterator. The mp4Files list comprehension
is then using an exhausted iterator and finds nothing.

Is this borne out by the contents of txtFiles and mp4Files if you
print them?

The usual approach when you get an iterator you need to use more than
once is to consume it immediately into a list, then use the list from
there on:

dircontent = list(os.scandir())

That isn’t always appropriate (for example very big generators or ones
where consuming the entire generator will be unduly expensive) but it is
the normal situation.
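When a list is not appropriate, one alternative is to walk the iterator
only once and sort each entry into the right bucket as you go. The
following is just a sketch of that idea, not something from the original
post: the function name names_by_extension and its parameters are made
up for illustration.

```python
import os
from collections import defaultdict


def names_by_extension(path=".", extensions=(".txt", ".mp4")):
    """Group file names (extension stripped) by lowercased extension in one pass."""
    groups = defaultdict(list)
    with os.scandir(path) as entries:  # the with block closes the iterator for us
        for entry in entries:
            stem, ext = os.path.splitext(entry.name)
            ext = ext.lower()
            if ext in extensions:
                groups[ext].append(stem)
    return groups


groups = names_by_extension()
txtFiles = groups[".txt"]
mp4Files = groups[".mp4"]
```

Note that os.path.splitext treats a name such as ".txt" (leading dot
only) as having no extension, which differs slightly from the endswith
test in the original code.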

Cheers,
Cameron Simpson cs@cskk.id.au

Yes, I should have been more specific than “does not work”, sorry about that. The lists I was trying to generate with comprehensions turned out empty when there were no txt files in the directory.

Thanks for the explanations about scandir returning an iterator, right on the spot.

This video has some nice info on the subject: Python Tutorial: Iterators and Iterables - What Are They and How Do They Work? - YouTube