Error: argument of type WindowsPath is not iterable

vgwosdz · March 28, 2022, 5:56pm

(me again… )

So, I have another question, although I feel I’m getting better at this programming ;-).

I am mostly done with my assignment, but this error keeps popping up. I understand why it happens and what is wrong, but have been unsuccessful in solving it (or in finding proper documentation to help me).

I have a list with languages that I want to run through and a directory filled with files with a txt-extension (dutch1.txt, dutch2.txt, …).
The plan is that my program takes the first language from the list and runs through the files in my source directory. If the language is in the filename, the file is added to a new file (with name dutch.txt, french.txt, …).
The adding works fine (I tried it while manually adding the reference language), but I get the error message:
argument of type WindowsPath is not iterable
I am fairly certain that is because I try to look for the language in “file” rather than in the filename, but I am not sure how to tell Python to look in the filename.

This is the code:

from pathlib import Path

source_dir = Path('C:/Users/gwovi/PycharmProjects/VickyGwosdz_LanguageDetector')
files = source_dir.glob('*.txt')

languages = ["dutch", "english", "french", "german", "italian", "spanish"]

for language in languages:  # iterate over the elements in the languages list
    for file in files:  # iterate over the txt files in the folder
        #  files of the same language are appended one after another into a single file
        if language in file:
            with open(file.with_suffix('.txt'), 'r', encoding='utf8') as f: language_input = f.read()
            with open('language.txt', 'a+', encoding='utf-8') as f2:
                f2.write(language_input)

Thanks again for pointing me in the right direction (I will pay it forward in the forum…)

CAM-Gerlach · March 28, 2022, 7:22pm

Yup, your intuition is more or less correct. For what it's worth, as @cameron and I mentioned to you previously, this is where naming your variables descriptively based on their contents (e.g. html_filename instead of files) really comes in handy to avoid these sorts of errors and confusion.

In short, you can see that source_dir is a pathlib.Path, which you call .glob on to get all matching text files, which also returns pathlib.Paths, not strings (per its documentation). A pathlib.Path, or just Path for short, is essentially a Python representation of a file path. It's typically more convenient than working with the path as a plain string, since you can do handy things like file.with_suffix(".txt") to change the extension instead of manual and error-prone string munging, source_dir / "dutch1.html" to get the path to a specific file, or file.read_text(encoding="utf-8") to get the contents of the file at that path as text without a whole with block. I encourage you to at least skim the pathlib documentation for an overview of the cool things you can do with it, and refer to it if you get stuck in the future.
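To make that a bit more concrete, here is a small sketch of those Path conveniences. It works in a throwaway temporary folder, and the file name dutch1.txt, its contents, and the helper name demo_pathlib are made up purely for illustration.

```python
import tempfile
from pathlib import Path


def demo_pathlib(folder: Path) -> dict:
    """Create one sample file (hypothetical name) and inspect it via pathlib."""
    sample = folder / "dutch1.txt"  # "/" joins path components
    sample.write_text("hallo", encoding="utf-8")

    found = sorted(folder.glob("*.txt"))  # .glob() yields Path objects, not strings
    first = found[0]
    return {
        "is_path": isinstance(first, Path),
        "name": first.name,      # file name with extension
        "stem": first.stem,      # file name without extension
        "suffix": first.suffix,  # just the extension
        "as_html": first.with_suffix(".html").name,  # swap the extension
        "text": first.read_text(encoding="utf-8"),   # no with block needed
    }


with tempfile.TemporaryDirectory() as tmp:
    result = demo_pathlib(Path(tmp))

print(result)
```

Note that with_suffix only builds a new Path; it does not rename anything on disk.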

So, since file is not a string, you can’t check whether language is included in it directly. You can easily convert it to one if you need it with str(file); however, I don’t recommend that, since it will check to see if language is anywhere in the path; if it happens to have a parent directory that matches any language string, you won’t get the results you expect. Instead, with a Path object, you can use file.stem to get just the filename, without the full path or extension, which is actually what you want to check. I.e.

        if language in file.stem:
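
As a rough illustration of the difference, assuming a made-up path whose parent folder happens to contain a language name (PurePosixPath is used here only so the example behaves the same on any operating system):

```python
from pathlib import PurePosixPath

# Hypothetical path: the parent folder contains "english", the file is Dutch.
file = PurePosixPath("/home/user/english_course/dutch1.txt")

# Membership tests on the Path object itself are not supported.
try:
    "dutch" in file
    raised_type_error = False
except TypeError:
    raised_type_error = True

# str(file) searches the whole path, so the folder name gives a false match.
english_in_full_path = "english" in str(file)

# file.stem is just "dutch1", so only the file name is checked.
english_in_stem = "english" in file.stem
dutch_in_stem = "dutch" in file.stem

print(raised_type_error, english_in_full_path, english_in_stem, dutch_in_stem)
```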

Also, as I noted in your previous questions, you don’t need .with_suffix(".txt") here since the extensions of the files found by .glob("*.txt") must already be .txt. In addition, you don’t need "a+" mode when you append to the file, only "a", since you’re not reading the file, only writing to it (at least as shown above). And since file is a Path, you might consider replacing

with open(file.with_suffix('.txt'), 'r', encoding='utf-8') as f: language_input = f.read()

with just

language_input = file.read_text(encoding="utf-8")
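
Putting those pieces together, here is one possible sketch of the whole loop, not necessarily how your assignment expects it to be structured. A few things in it are my own assumptions rather than something from your code: the function name merge_by_language, writing to a per-language file such as dutch.txt (your description mentions that, while the code shown always writes to 'language.txt'), and putting the merged files in a separate output folder so a later run doesn't pick them up as input. It also stores the .glob() results in a list, because .glob() returns a generator that is used up after the first pass of the outer loop.

```python
import tempfile
from pathlib import Path

LANGUAGES = ("dutch", "english", "french", "german", "italian", "spanish")


def merge_by_language(source_dir: Path, output_dir: Path, languages=LANGUAGES) -> None:
    """Append every matching .txt file to <output_dir>/<language>.txt."""
    # Materialize the matches: a bare .glob() generator would be exhausted
    # after the first language. Sorting keeps the order predictable.
    text_files = sorted(source_dir.glob("*.txt"))
    output_dir.mkdir(parents=True, exist_ok=True)

    for language in languages:
        for text_file in text_files:
            if language in text_file.stem:  # check only the file name
                language_input = text_file.read_text(encoding="utf-8")
                # "a" is enough: we only append, never read.
                with open(output_dir / f"{language}.txt", "a", encoding="utf-8") as merged:
                    merged.write(language_input)


# Usage with throwaway files (names and contents are made up).
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source"
    source.mkdir()
    (source / "dutch1.txt").write_text("een ", encoding="utf-8")
    (source / "dutch2.txt").write_text("twee", encoding="utf-8")
    (source / "french1.txt").write_text("un", encoding="utf-8")

    merged_dir = Path(tmp) / "merged"
    merge_by_language(source, merged_dir)

    dutch_text = (merged_dir / "dutch.txt").read_text(encoding="utf-8")
    french_text = (merged_dir / "french.txt").read_text(encoding="utf-8")
    merged_names = sorted(p.name for p in merged_dir.glob("*.txt"))

print(merged_names, dutch_text, french_text)
```

Since the output files are opened in append mode, running this twice on the same output folder will duplicate the text, so you may want to clear the folder between runs.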