Missing variable assignment

What produces the first line of the following code, and where do the values for the filename variable come from?


for root, _, files in os.walk(directory_to_search):
    for filename in files:
        # Check if the file has a JPG or JPEG extension
        if filename.lower().endswith(('.jpg', '.jpeg')):
            file_path = os.path.join(root, filename)

From the os.walk documentation:

Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).

dirpath is a string, the path to the directory. dirnames is a list of the names of the subdirectories in dirpath (including symlinks to directories, and excluding '.' and '..'). filenames is a list of the names of the non-directory files in dirpath.

So the first for loop walks through the directory tree starting at directory_to_search. For each directory, os.walk yields the tuple described in the documentation.
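To see what the outer loop receives, here is a minimal sketch. It assumes a throwaway tree built with tempfile; the file and directory names (a.jpg, notes.txt, sub, b.JPEG) and the helper build_demo_tree are made up for illustration.

```python
import os
import tempfile


def build_demo_tree(top):
    """Create top/a.jpg, top/notes.txt and top/sub/b.JPEG (hypothetical names)."""
    os.makedirs(os.path.join(top, "sub"))
    for relative in ("a.jpg", "notes.txt", os.path.join("sub", "b.JPEG")):
        with open(os.path.join(top, relative), "w"):
            pass  # an empty file is enough for the demonstration


with tempfile.TemporaryDirectory() as directory_to_search:
    build_demo_tree(directory_to_search)
    # Collect the yielded tuples so they can be inspected after the walk.
    # The lists are sorted because os.walk does not promise any ordering.
    walked = [(root, sorted(dirs), sorted(files))
              for root, dirs, files in os.walk(directory_to_search)]

# One 3-tuple per directory: first the top directory, then "sub".
for root, dirs, files in walked:
    print(root, dirs, files)
```

The loop body runs once per directory in the tree, so this tree with one subdirectory produces two tuples.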

There is no variable filenames; I assume you mean filename? It contains the current filename from the list bound to files. The for statement iterates over an iterable, in this case a list, taking items from the list in turn.

See here.

[I]t [os.walk] yields a 3-tuple (dirpath, dirnames, filenames)

So, files is a list of str. The for-loop variable filename takes, in each iteration of the loop, one of the values in the list files.

So if I want to run this loop over a directory where there are no subdirectories and only 3 files, would the tuple look like ("user/Juan/Documents/", , ["file1.txt", "file2.txt", "file3.txt"])? Does dirpath have an absolute or a relative value? The other question is: if the result is one 3-tuple, does the os.walk() method then just update that one tuple, which is provided as the result?

If filename returns files from the list of filenames, is it a variable? Is there any generic name for this type of assignment? Is this handled by the Python core, or by the os module?

OK. Now I can see that filename is not a variable but an attribute. Unfortunately, that doesn’t contribute much to my understanding of this code snippet. I can run it to see what it does, but my question is about the principles: why it does that. The official documentation is not helpful, again.

filename is a variable. It is the “dummy variable” of the for loop. In each iteration, it takes the value of one of the elements of files. So, in your example, it will take the values "file1.txt", "file2.txt", and "file3.txt".
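A small sketch that shows filename behaving like any other variable. The list literal stands in for the files list from the question, and seen is a name added only for this illustration.

```python
files = ["file1.txt", "file2.txt", "file3.txt"]

seen = []
for filename in files:
    # On each pass, filename is bound to the next item of the list.
    seen.append(filename)

# The name is not special to the loop: it still exists afterwards,
# bound to the last item it was given.
print(seen)
print(filename)
```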

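root starts with directory_to_search: os.walk builds each dirpath by joining the top directory with the path below it, so root is relative if directory_to_search is relative and absolute if it is absolute.

So, file_path = os.path.join(root, filename) gives the path to the file that is now in filename, starting from directory_to_search.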
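What root looks like can be checked directly. This is a sketch under some assumptions: it uses a temporary directory, a made-up file name (photo.jpg), and changes the working directory for the duration of the test.

```python
import os
import tempfile

original_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "photo.jpg"), "w").close()
    os.chdir(tmp)
    try:
        # Walk the same directory twice: once by a relative name,
        # once by its absolute path.
        relative_roots = [root for root, _, _ in os.walk(".")]
        absolute_roots = [root for root, _, _ in os.walk(os.path.abspath("."))]
    finally:
        os.chdir(original_cwd)  # step out again before the directory is removed

print(relative_roots)
print(absolute_roots)
```

The first list contains the relative name that was passed in, the second an absolute path, because os.walk reports directories in terms of whatever top it was given.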

Due to if filename.lower().endswith(('.jpg', '.jpeg')): the variable file_path will only take the values that are paths to files with extensions ".jpg" or ".jpeg".
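The extension test can be tried on its own, without touching the file system. The names in this list are invented to cover a few edge cases.

```python
filenames = ["a.jpg", "B.JPEG", "c.Jpg", "notes.txt", "jpg", "d.jpeg.bak"]

# lower() makes the comparison case-insensitive; endswith() accepts a
# tuple and is true if the string ends with any one of its members.
matches = [name for name in filenames
           if name.lower().endswith((".jpg", ".jpeg"))]

print(matches)
```

Only the names whose last characters are ".jpg" or ".jpeg" (in any letter case) pass; "jpg" has no dot and "d.jpeg.bak" ends in something else.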

Wow, you are posting a simple question I thought, but expecting an essay as an answer!

Almost.
("/user/Juan/Documents/", [], ["file1.txt", "file2.txt", "file3.txt"])
The directories entry is an empty list. The path may look different, depending on the OS in use.
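The three-file case can be reproduced as a sketch, assuming a temporary directory stands in for the Documents folder:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    for name in ("file1.txt", "file2.txt", "file3.txt"):
        open(os.path.join(top, name), "w").close()
    # No subdirectories, so the walk yields exactly one tuple.
    results = list(os.walk(top))

dirpath, dirnames, filenames = results[0]
# sorted() because the order of the file list is not guaranteed.
print(len(results), dirpath, dirnames, sorted(filenames))
```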

What did you see when you tried? This is typically something you can easily find out by trying.

That is not how Python “thinks” about names. Read this, it is worthwhile. Actually, it yields a different value every time, referenced by the same name.

As @franklinvp wrote, it is indeed a variable. The value it references is governed by the current value returned by the iterable, in this case files.

Example of a for loop where the iterable is a string:

>>> word = "iterable"
>>> for character in word:
...     print(character)
... 
i
t
e
r
a
b
l
e
>>>

Where iterable is a list:

>>> random_words = ["iterable", "Nice!", "Art"]
>>> for word in random_words:
...     print(word)
... 
iterable
Nice!
Art
>>>

So, in this case, the tuple will be something like (".", [], ["file1.txt", "file2.txt", "file3.txt"])?

I don’t know. It depends on the value that you gave to directory_to_search. If, for example, the value of directory_to_search is ".", there are no subdirectories and the only files are those three, then yes, that tuple is the only one that comes out of os.walk.

What do you mean? Please expand the question. Has the os library been import-ed earlier in the code? Do you know what the os-library is?

Yes! Why are you asking? Surely inserting a print() immediately inside the outer for-loop, would reveal same?

Re-reading @Mholscher’s response:

Again, could the question be answered by inserting a print()?

Do you understand immutability, and that tuples are immutable?

On each iteration of the outer for-loop, os.walk yields a new 3-tuple for the next directory in the stated directory-tree; the earlier tuple is not modified.
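A quick way to check whether one tuple gets "updated" is to keep the yielded tuples and compare them. This is only a sketch, with a temporary directory and a made-up subdirectory name (sub).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "sub"))
    # Two directories in the tree, so two tuples are yielded.
    first, second = list(os.walk(top))

# Each directory gets its own tuple object.
print(first is second)

# A tuple cannot be changed in place; trying to raises TypeError.
try:
    first[0] = "elsewhere"
except TypeError as error:
    print("cannot modify a tuple:", error)
```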

This description is confusing (made worse by editing the OP so that what we read today is different to what was criticised in the first response, back-then). Some of the words used in the documentation are different from the identifiers used in the code-snippet, for the same concept. Thus, if still struggling to understand, some harmonising between the two may assist.

In the code:

the outer for-loop receives a list of file-names from os.walk (files in the code and “filenames” in the docs). The inner for-loop then iterates through that list, binding each item, ie each file-name, to the identifier filename.

Note how Python’s docs refer to “identifier” - although “variable” is also frequently-used.

Do you understand how for-statements work? Python’s idiom is quite different from other languages and their index-based concept!

filename will always be a string (in this code-snippet). Could its value be replaced? Yes. (once again: “immutability”) Within the code-snippet’s inner for-loop it is not changed, but is examined and used as a component to compute the full file-path (file_path).

The magic happens within the for-statement. The assignment of value (“binding”) to a single item from the iterable (list) is the same as any other such, eg x = a_list[ 0 ], which is the same as any other single-value assignment, eg y = 0. The docs don’t seem to use a specific name for for-statement assignments.
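Roughly speaking, a for-statement can be spelled out by hand with iter() and next(). The following is an approximation for illustration, not how the interpreter is literally implemented; the list and the two collecting names are invented.

```python
files = ["file1.txt", "file2.txt", "file3.txt"]

# The usual spelling.
bound_by_for = []
for filename in files:
    bound_by_for.append(filename)

# Approximately the same thing, written out step by step.
bound_by_hand = []
iterator = iter(files)
while True:
    try:
        filename = next(iterator)  # an ordinary assignment to the name
    except StopIteration:
        break  # the iterable is exhausted, so the loop ends
    bound_by_hand.append(filename)

print(bound_by_for == bound_by_hand)
```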

It is part of Python (as above).

In addition to the above, do you understand “unpacking”, specifically the tuple-unpacking which is implicit in the sample code?
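For reference, the unpacking looks like this when written out. The tuples are hand-made and only shaped like what os.walk yields; no file system is involved.

```python
# A made-up tuple in the (dirpath, dirnames, filenames) shape.
entry = ("photos", ["2023"], ["a.jpg", "b.txt"])

# Unpacking binds one name per element; "_" is just a conventional
# name for a value that will not be used.
root, _, files = entry
print(root, files)

# "for root, _, files in ..." performs the same unpacking on every item.
entries = [entry, ("photos/2023", [], ["c.jpeg"])]
unpacked = [(root, files) for root, _, files in entries]
print(unpacked)
```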

Further to:

there are indications that there are many concepts involved here which are beyond your current-level of Python knowledge. Are you a beginning programmer? Have you skills in another language and are ‘converting’ to Python?

Which course or book are you following?

There is a school-of-thought (“school” hah!) which says not to bother with a course, but to pick-up code and learn from that, or to set a project and learn-as-you-go. Unfortunately, whilst more-valid at advanced-levels, such an idea creates “islands of knowledge” amongst learners/Python-Apprentices.

Whereas a well-designed progression (course/curriculum design) guides the learner from what (s)he already knows, one step forward at a time; the problem-approach frequently results in folk attempting ‘challenges’ that are several steps ahead of their current-knowledge - or which require multiple new concepts to be absorbed concurrently - a phenomenon popularly known-as “a steep learning curve” (see how long this message has become!).

YMMV!

Bookmark these links so you have some sources for your own research in future

Read about for loops
For Loops Python Wiki
Python For Loops w3 schools

Read the documentation of os.walk
os.walk description python documentation
Python os.walk() Method w3 schools

I will not copy over the complete documentation of os.walk as it is rather long. Just a few pieces:
os.walk(top, topdown=True, onerror=None, followlinks=False)

Thanks to all. I tested that, read some documentation and I think I understand it now.