Trying to take user input and search filenames for ones containing the user input

Hi! So, I am a noob to Python, though I have dabbled in C and Java in the past.

A program I started making as I try to pick up Python is one that searches a directory of files and, when it finds a filename with a certain string in it, takes the file, gets rid of the junk text, and copies the file under the fixed filename to a destination folder.

Example. You have a folder, it’s got a file in it named randomsong_abc[gobbledegook.com].mp3

The program runs, the file gets copied to another predefined folder, and it’s now named randomsong_abc.mp3

At this point, I have this working fully, with the junk text to look for in the filename hard-coded in the program.
I decided to make another function in this program doing the same thing, BUT asking for user input and then searching for THAT.

I am having trouble chasing this down.

So far, getting it working was done using sys, pathlib (well, from pathlib import Path) and shutil.

I can’t find out how to pass user input, to pathlib properly…

Here’s the functional module



def remove_junk_in_title(folder_of_stuff_To_Edit,output_Folder):
    print("\n\n")
    for file in folder_of_stuff_To_Edit.glob("*[[]www.gobbledegook.com[]]*"): 

        print(f'{file}, Files filtered')
        

        renamedfile = file.name.replace(" [www.gobbledegook.com]", "")
        

        RenamedFileWithNewPath = output_Folder / renamedfilename
        print(f' Final Filename will be {renamedfilename},\n Final file will be saved as {RenamedFileWithNewPath}')

      
        shutil.copy2(file, RenamedFileWithNewPath)

    print('List of Corrected Files in proper location:')

    for CorrectlyRenamedFile in output_Folder.glob('*.*'):
        print(f'{CorrectlyRenamedFile}')

So that one works. Here’s the function I’m working on now.
I’m just starting out trying to get it to print matching filenames, before I copy over the rest of the code to make the change and put the newly named files in a location.


def remove_specific_junk_in_filename(to_edit_folder, edited_folder):

for unedited_files in folder_of_stuff_To_Edit.glob('*.*'):
    print({unedited_files})    #Listing what we have, to select from

junk_text = input(" Enter string you want to remove from  all filenames containing it")

for file in to_edit_folder.glob(junk_text):  
    print(f'{file}, searched, File filtered. \n')

This doesn’t work-

I’ve been searching for examples and looking at the documentation on passing something to .glob, since i think the issue is at the for file in stuff_To_Edit.glob(junk_text) section. I want the user to be able to search directly- and when i test it right now and try to have it print it right back out, nothing prints out when i type in [www.gobbledegook.com].

I asked some resources ,and got the answer back that i needed to recompile the thing i’m searching for in filenames as regex. I was told to use re.compile, in a way like

pattern = re.compile(junk_text)

for file in stuff_To_Edit.glob(pattern):
    print(f'{file}, searched, File filtered. \n')

but that wasn’t quite it- I got an attribute error. Looked into it as well, but nothing resembled what i’m trying to do

AttributeError: 're.Pattern' object has no attribute 'replace'

I am still running into dead ends trying to look up how to send a variable to .glob
I DID briefly try , a few variations of

 file in stuff_To_Edit.glob(input())

to get it to directly take input, but this didn’t work. I’m back to attempting to find a good example of how to use pathlib to search filenames, but involving the user inputting the search term instead of pre-defined hardcoded text. I haven’t seen anything involving pathlib about re.compile,either, but am still looking…

What am i missing here?

Welcome! You are very close, just missing one thing to get your code to work, but I think this is a great opportunity to walk you through finding it on your own and give you the tools to do so in the future, as well as help you improve your code in the process.

First of all, I want to commend you for including code and output in formatted code blocks; providing a detailed, step by step summary of what you’re trying to do, how you tried to do it, and what didn’t work; and using a lot of the right tools for the job—f-strings, pathlib, globs, etc. It both looks like you have pretty much all the building blocks you need in place, and you’ve provided enough information to make it relatively clear what the issue is each time, and you’re really just one step away from having something working.


One suggestion, though: in general, to those trying to help, it’s not very useful to say things like “This doesn’t work” and “but this didn’t work”.

Describe, like you did the second time, what didn’t work—did you get an error? Did it not find any files? Did it not find the right files? Otherwise, we are forced to guess and try to reverse-engineer what actually happened.

I also did notice a lot of what appeared to be internal inconsistencies in the variable names in the code you posted. For example, in the first function the filename is assigned to the variable renamedfile, but the next line uses the variable renamedfilename (where it would appear it is intended to be the same name). Likewise, the second code block uses folder_of_stuff_To_Edit in the first for loop, but a differently-named path, stuff_to_Edit (which I assume is meant to be the same), in the second.


Additionally, there are a number of instances where the indentation is missing or incorrect (e.g. your second code block doesn’t have any indentation under the function, while others have extra spurious indentation), which is significant in Python. Finally, you have print({unedited_files}), where unedited_files in fact should be unedited_file, as it is only one file per iteration, and the bare braces make print() output a set with one element, the Path object for the current file, rather than the path itself. I assume you meant to do print(f"{unedited_file}"); doing print(str(unedited_file)) avoids making this mistake.
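
To make the difference concrete, here is a small sketch of what each form produces. The path is made up for illustration, and PurePosixPath is used only so the output looks the same on every platform.

```python
from pathlib import PurePosixPath

# Hypothetical path, used only for illustration.
unedited_file = PurePosixPath("music/randomsong_abc.mp3")

as_set = str({unedited_file})    # bare braces build a one-element set
as_fstring = f"{unedited_file}"  # braces inside an f-string format the value
as_str = str(unedited_file)      # explicit conversion, no braces to mistype

print(as_set)      # {PurePosixPath('music/randomsong_abc.mp3')}
print(as_fstring)  # music/randomsong_abc.mp3
print(as_str)      # music/randomsong_abc.mp3
```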

Now, I’m assuming that these errors are not present in the actual code you ran, or you’d either see a hard error on execution or incorrect behavior. However, make sure they are not, and ensure that you provide the actual code you actually executed here (along with its output), otherwise it is impossible for us to isolate the issue with any confidence. In general, following the widely-used standard PEP 8 style guide (in particular, lower_snake_case for variable, function and method names) helps to avoid these inconsistencies.


In any case, the key strategy here is to look at your code and compare exactly what you are doing differently between the case that worked and the case that didn’t. Since computers are (usually) deterministic machines, if one bit of code worked and another didn’t, unless you changed the files in the directory you’re searching, it must be because of something that changed between the first and the second code snippet. It is key to minimize any unnecessary differences between the two code snippets—I note a few above (different variable names, different format strings).

To note, this is why it’s important to break your code down into small functions and avoid the use of global variables: it makes the program easier to reason about and minimizes the number of free variables that could impact program execution at any given point, leaving only the arguments (inputs) of the function rather than any variable in your program. At least from what you’ve shown, you’ve done a pretty good job of that; it could be improved by moving your hardcoded and user-specified match pattern outside the function and passing it via an argument.

Pulling the hardcoded value and input call, respectively, outside the function and passing it as an argument, as well as avoiding other unnecessary differences and fixing the above-mentioned issues in the pasted code, we have the function:

def remove_specific_junk_in_filename(source_dir, match_text):
    for file in source_dir.glob('*.*'):
        print(f"{file}")  # List what we have, to select from

    for file in source_dir.glob(match_text):  
        print(f'{file}, searched, File filtered. \n')

You can then call it on your hardcoded input:

stuff_to_edit = Path(...)  # The directory of files to search
junk_text = "*www.gobbledegook.com*"
remove_specific_junk_in_filename(source_dir=stuff_to_edit, match_text=junk_text)

You should see that it prints the expected file(s).

Now, let’s try on some user input:

stuff_to_edit = Path(...)  # The directory of files to search
junk_text = input("Enter string to match: ")
remove_specific_junk_in_filename(source_dir=stuff_to_edit, match_text=junk_text)

Assuming you entered www.gobbledegook.com, the string you wanted removed, what happened? You should have seen that nothing is matched, and no output is generated, just like in your test above. You might already be able to spot why, if you think about what is different between junk_text between these two examples. If not, here’s where basic debugging comes in: as you’ve isolated everything down to only one thing different between the hardcoded and user-input case—the value of junk_text—you know the difference has to be in what that value is. Here, you can use one of any number of debugging techniques:

  • Use a print() call (run print(junk_text) after the input line)
  • Use a logging call (overkill here, but very useful once you’re writing larger applications)
  • Use a debugger like Python’s pdb, IPython’s ipdb or one built in to your IDE (set a breakpoint on the line after the input, and inspect the value of junk_text there)
  • Use your IDE’s features, e.g. Spyder’s Variable Explorer, to see directly what the value is after executing the code
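
As a rough sketch of the first option, printing both values with repr() puts them side by side. The typed_text value below is a stand-in for whatever input() returned, so the snippet runs without prompting; both variable names are made up for this illustration.

```python
# Value used in the hardcoded call that worked.
hardcoded_text = "*www.gobbledegook.com*"

# Stand-in for the result of input("Enter string to match: ");
# hypothetical, so the example runs without a prompt.
typed_text = "www.gobbledegook.com"

# repr() shows the exact characters, including any stray whitespace.
print(f"hardcoded: {hardcoded_text!r}")
print(f"typed:     {typed_text!r}")
print(f"identical: {hardcoded_text == typed_text}")
```

If you prefer a debugger, putting a breakpoint() call on the line after input() drops you into pdb, where you can inspect the same value interactively.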

Now, you should be able to see one very specific difference between user input and your hardcoded case. What’s missing in the former? To fix it, use e.g. an f-string to add the missing characters, either right after the input call outside the function, or in the glob call within the function.

With that fixed, everything should work, and you can add back the rest of the logic to copy the file to a renamed path.


You could also just do the matching yourself without relying on glob, e.g. with the in operator on the file name, which avoids the issue you discovered above:

def remove_specific_junk_in_filename(source_dir, match_text):
    for file in source_dir.glob('*.*'):
        if match_text in file.name:
            print(f'{file}, searched, File filtered. \n')

You can also match the filename via more complicated regexes, or other methods beyond simple globs.


On the re.compile attempt: you’re trying to pass a compiled regex pattern object to the glob method, which expects a glob pattern string. To quote Scotty, “Ye cannae mix th’ regex an’ th’ glob, the ship’ll explode!” Even if you passed a regex string, it still wouldn’t work, as the glob method expects glob syntax, which is mostly incompatible with regular expression syntax. Also, there’s usually no need to run re.compile on the pattern, just use it directly in re.search, re.sub, etc.
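
Here is a small illustration of how differently the two syntaxes read the same string. It uses the standard library’s fnmatch module as a stand-in for the per-name matching that glob performs (an assumption made so the example runs without touching any files).

```python
import fnmatch
import re

name = "randomsong_abc.mp3"
pattern = "*.mp3"

# Read as a glob, "*" means "any run of characters".
glob_match = fnmatch.fnmatchcase(name, pattern)

# Read as a regex, a leading "*" has nothing to repeat, so compiling fails.
try:
    re.compile(pattern)
    regex_error = None
except re.error as exc:
    regex_error = str(exc)

print(glob_match)   # True
print(regex_error)  # the message explaining why the regex is invalid
```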

If you did want to use regex, you could use a modified version of the above:

def remove_specific_junk_in_filename(source_dir, match_text):
    for file in source_dir.glob('*.*'):
        if re.search(pattern=match_text, string=file.name):
            print(f'{file}, searched, File filtered. \n')

But it’s overkill for your problem; you’d need to escape the input match_text first with re.escape unless you wanted it treated as an arbitrary regex pattern.
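
To see why the escaping matters, here is a minimal sketch with made-up filenames. Without re.escape, the square brackets in the junk text are read as a regex character class rather than as literal brackets.

```python
import re

junk_text = "[www.gobbledegook.com]"
names = ["randomsong_abc [www.gobbledegook.com].mp3", "other.mp3"]

# Unescaped, the brackets form a character class: any ONE of the listed
# characters is enough for a match, so "other.mp3" matches on its "o".
raw_hits = [n for n in names if re.search(junk_text, n)]

# Escaped, every character of the input is matched literally.
literal_hits = [n for n in names if re.search(re.escape(junk_text), n)]

print(raw_hits)      # both names
print(literal_hits)  # only the first name
```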


Calling glob(input()) directly has the same basic flaw as your first attempt: unless you actually include the glob syntax (*) in the input, it won’t be magically inserted for you. However, as described previously, you really want to perform your input call outside this function, and at as high a level as possible.
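
A rough sketch of that layout follows. The names find_matching_files and main are hypothetical, not from the code above: the function only takes arguments and returns results, while main() owns the prompts.

```python
from pathlib import Path


def find_matching_files(source_dir, match_text):
    """Return the files in source_dir whose name contains match_text."""
    return sorted(p for p in source_dir.glob("*.*") if match_text in p.name)


def main():
    # All user interaction lives here, at the top level of the program.
    source_dir = Path(input("Folder to search: "))
    match_text = input("Enter string to match: ")
    for path in find_matching_files(source_dir, match_text):
        print(path)
```

Calling main() from the script’s entry point keeps the prompts in one place, and find_matching_files can be tried out with any folder and string without typing anything.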


First off- you’re a friggin genius, and pretty amazing
Second- I think i’ve got it - all due to you. I had not believed i was that close to it
Third- I feel sorta stupid - but on the other hand, I see why i couldn’t get it, and …I am not surprised i was banging my head against the wall for quite a while on this … ugh

Anyway, first things first- I’ve been copying around these examples, slightly changing variable names, and you caught that, yes, it was all supposed to be the same- your code analysis saw that accurately.

Second, thank you for putting up with my variable errors and bad formatting- i’ll admit i don’t follow pep 8 exactly, but Pycharm does point all of this out, and i do strive to 95% follow the convention, while i’m learning Python. Though my old code habits are still there…

Third- I’m surprised at the idea of moving the user input outside. Also, my understanding is to avoid global variables, so even though when i built this i was thinking that global variables would be useful, my mind said “Nope, dont do it” - As for user input outside of the function, i had thought having the function get it would make it more self contained. When i first made that post , the program is just 2 functions and a main with the menu- so it seems odd to move that to main when the general gist

I do realize now I could have gotten it working fully contained; it’s not technically impossible, which is a bit of a relief to me.
I was worried for a second there, reading your response, that it HAD to be outside the function. Illogical stray thoughts, lol… I was worried there were nuances to Python I didn’t know.

Anyway,
I did a bunch of experimentation with prints indeed regarding solving this , and this was what nailed it

   
def remove_specific_junk_in_filename(folder_of_stuff_To_Edit, save_to_Folder,junk_text):

    print("You're in function 2\n")
    #print(junk_text)
    #print(f"*{junk_text}*")
    #print("*[[]www.gobbledegook.com[]]*")

    for file in folder_of_stuff_To_Edit.glob(f'*{junk_text}*'):
        print(f'{file}, searched, File filtered, search results. \n')

I DID stare at these for a long time

stuff_to_edit = Path(...)  # The directory of files to search
junk_text = "*www.gobbledegook.com*"
remove_specific_junk_in_filename(source_dir=stuff_to_edit, match_text=junk_text)

stuff_to_edit = Path(...)  # The directory of files to search
junk_text = input("Enter string to match: ")
remove_specific_junk_in_filename(source_dir=stuff_to_edit, match_text=junk_text)

It ended up requiring me to experiment a lot still, because i did not know f-strings could be put in glob’s field. I think i sort of got stuck in a rut getting it to work once, and thinking that nothing indicated it’d work with anything else- which made me panic a little

I have not yet seen, in any of the multiple tutorials I’m using to learn Python, an f-string used as an input, or in a loop like that. I’m going to have to mess with using f-strings to build strings in more situations than just printing something out to the user…ooof
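
For anyone else surprised by this: an f-string is just an expression that produces an ordinary str, so it can go anywhere a string is accepted. A small sketch with made-up filenames, using fnmatch.filter from the standard library in place of glob so nothing on disk is needed:

```python
import fnmatch

junk_text = "gobbledegook"

# The f-string is evaluated first and yields a plain str...
pattern = f"*{junk_text}*"

# ...so it can be passed as an argument or have string methods called on it.
names = ["song_gobbledegook.mp3", "clean_song.mp3"]
matches = fnmatch.filter(names, pattern)
shouted = f"{junk_text}!".upper()

print(type(pattern).__name__)  # str
print(matches)                 # ['song_gobbledegook.mp3']
print(shouted)                 # GOBBLEDEGOOK!
```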

This is a golden rule i am going to have to remember -

" In any case, the key strategy here is to look at your code and compare exactly what you are doing differently between the case that worked and the case that didn’t. Since computers are (usually) deterministic machines, if one bit of code worked and another didn’t, unless you changed the files in the directory you’re searching, it must be because of something that changed between the first and the second code snippet."

Also, i’ll have to mess with this too, haven’t tried it yet- but this isn’t something i’ve worked with before

def remove_specific_junk_in_filename(source_dir, match_text):
    for file in source_dir.glob('*.*'):
        if match_text in file.name:
            print(f'{file}, searched, File filtered. \n')

I’ve seen python code snippets with the keyword in, but …didn’t fully understand it or how widely it can be used.
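
A quick sketch of what the in operator does on a few built-in types (the filename and the dictionary are made up for illustration):

```python
filename = "randomsong_abc[gobbledegook.com].mp3"

# On strings, `in` tests for a substring.
has_junk = "gobbledegook" in filename

# On lists, tuples, and sets, it tests for an equal element.
is_audio = filename.rsplit(".", 1)[-1] in ("mp3", "flac", "ogg")

# On dicts, it tests the keys.
known = {"mp3": "audio", "txt": "text"}
has_txt = "txt" in known

# `not in` is the negated form.
is_clean = "[" not in filename

print(has_junk, is_audio, has_txt, is_clean)  # True True True False
```

The in that appears in a for loop (for file in ...) is part of the loop syntax rather than this membership test, even though it reads the same way.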

-I’m ashamed to admit, i also feel bad because i’m …a tiny bit decent at searching in bash using regex(that’s like the limit of my regex knowledge ,anything more is literal gobbledegook to me )
-and that’s how you’d search in bash for a filename like that. I feel like it should have been easier for me to put together 2 and 2 here- but not knowing what python will and won’t allow trying to use stuff like that- is daunting.

That’s funny on the re.compile bit, that code was directly advised to me when i asked for help elsewhere…in any case, thank you for clearing my confusion over why i wasn’t seeing more mixing of regex and variables and glob.

C.A.M. Gerlach , Thank You again. This was an excellent bit of advice. You deserve a medal for going through my massive post and sorting out what i was doing.


Update for anyone reading this in the future-

It turned out I needed a LITTLE more tweaking of the glob pattern, because of the brackets.
The version above isn’t fully correct on the filtering, but the one below is, and on the replace as well.

Some junk text looks like [www.junk.com], and if you don’t account for the brackets, the program copies (but does not alter) everything in the input folder to the output folder, while altering only the files that contained the junk.

    for file in folder_of_stuff_To_Edit.glob(f'*[[]{junk_text}[]]*'):
        print(f'{file.name}, searched, File filtered, search results. ')

        renamedfilename = file.name.replace(f" [{junk_text}]", "")

Those brackets took a bit of work to get, but THIS is how you highlight them

it’s [{junk_text}]
originally i thought i’d have to tell the user to include brackets, but…if you do it that way, it’s no easier. ugh…

So, we’ll go with this, and i’ll have a note that brackets will be removed as well.
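
For future readers, a possible alternative is to let the user type the brackets and escape the input with the standard library’s glob.escape(). This is only a sketch under assumptions: the filenames are made up, and fnmatch.filter stands in for the glob call so the example runs without any files; the same pattern string could be passed to Path.glob.

```python
import fnmatch
import glob

junk_text = "[www.junk.com]"  # typed with brackets by the user
names = ["song [www.junk.com].mp3", "clean song.mp3"]

# Unescaped, "[...]" is a character class, so any name containing one of
# those characters matches. That is how every file ended up being copied.
loose = fnmatch.filter(names, f"*{junk_text}*")

# glob.escape() wraps the special characters so they are matched literally.
strict_pattern = f"*{glob.escape(junk_text)}*"
strict = fnmatch.filter(names, strict_pattern)

# The brackets are already part of junk_text, so replace() needs no extras.
renamed = [n.replace(f" {junk_text}", "") for n in strict]

print(strict_pattern)  # *[[]www.junk.com]*
print(loose)           # both names
print(strict)          # ['song [www.junk.com].mp3']
print(renamed)         # ['song.mp3']
```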