Use of re.sub for renaming files/strings not working

var · October 26, 2022, 6:09am

Hi,

I have a scenario where I need to use an existing function that calls re.sub to rename a particular file. We have a file at a particular location with a .zip extension, such as abc.zip,
and I have a string like bcd. I need to use re.sub to rename the abc part of the file name to bcd.
If I use
re.sub('\\*.$', 'bcd', str(file_to_rename))
it does not rename the file correctly. What am I doing wrong?

vbrozik · October 26, 2022, 7:58am

Let’s rewrite your regex string to a raw string where backslash characters \ do not need to be escaped by doubling them:
r'\*.$'

Your regex matches this sequence of characters - searching from the left:

\* - an asterisk character *
. - any single character
$ - end of the string (or line) - This means that the sequence must be at the end of the string.

In other words, your regex matches the last two characters of any string whose second-to-last character is an asterisk *. Your re.sub() will replace those two characters with bcd.
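A quick check of the original pattern, written as the equivalent raw string, shows that it only ever touches strings ending in an asterisk followed by one character. The file names here are made up for illustration:

```python
import re

# '\\*.$' (escaped) and r'\*.$' (raw) are the same pattern
assert '\\*.$' == r'\*.$'

pattern = r'\*.$'  # a literal '*', then any one character, then end of string

print(re.sub(pattern, 'bcd', 'abc.zip'))  # -> abc.zip (no '*' near the end, unchanged)
print(re.sub(pattern, 'bcd', 'abc*z'))    # -> abcbcd ('*z' replaced by 'bcd')
```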

You probably want something like this:
r'^[^.]+'

^ - match only at the beginning of a string (or line)
[^.]+ - match one or more repetitions of any character except a dot .

This regex does not check the suffix. But you can extend it to do that.
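One possible extension, sketched here for a .zip suffix only (the suffix is taken from the question; any other would work the same way), is a lookahead that requires the name to be followed by .zip at the end of the string:

```python
import re

# ^[^.]+      : the name before the first dot
# (?=\.zip$)  : ...but only if what follows is exactly ".zip" at the end
pattern = r'^[^.]+(?=\.zip$)'

print(re.sub(pattern, 'bcd', 'abc.zip'))  # -> bcd.zip
print(re.sub(pattern, 'bcd', 'abc.txt'))  # -> abc.txt (suffix does not match)
```

Like the pattern above, this assumes the string is a bare file name such as abc.zip, not a full path.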

var · October 26, 2022, 8:30am

Sorry, I should have been clearer. The string I want to replace (string 1) has nothing in common with the string I want it replaced with (string 2). For example:
string_1 = 'xyz'
re.sub(r'^[^.]+', 'bcd', str(file_to_rename))
What regex should I be using in this case?

vbrozik · October 26, 2022, 8:39am

I am getting lost in your strings. Let’s use descriptive identifiers for them. Did you test the code?

>>> import re
>>> replacement = 'bcd'
>>> file_name = 'anything.any_suffix'
>>> re.sub(r'^[^.]+', replacement, file_name)
'bcd.any_suffix'

replacement has nothing in common with file_name.

var · October 26, 2022, 8:16pm

It does not seem to work for me :(. I do not see the file getting renamed. Rather, it ends up getting deleted after the rename function runs. Not sure why. Any ideas?

MRAB · October 26, 2022, 11:40pm

It would help if you posted the smallest complete example code that shows the problem.

var · October 27, 2022, 12:00am

I have a function that basically reads the JSON construct for the files and does the renaming:

import logging
import re

def rename_file_using_json(workdir, rename_infos):
    # workdir is a pathlib.Path; each rename_info comes from the config below
    for rename_info in rename_infos:
        path = workdir / rename_info['loc']
        for file_name_change in path.glob(rename_info['glob']):
            # the regex is applied to the whole path string, not just the file name
            renamed_file = re.sub(rename_info['starting_regexp'], rename_info['ending_regexp'], str(file_name_change))
            logging.debug(f'Renaming the file {file_name_change} to {renamed_file}')
            file_name_change.rename(renamed_file)

And this is a sample JSON:

{
    title: 'File rename',
    type: 'files_rename',
    filename_changes: [
        {
            path: 'd:\src\test',
            glob: '**/*.zip',
            starting_regexp: '^[^.]+',
            ending_regexp: 'ccd',
        },
    ],
}

MRAB · October 27, 2022, 1:40am

Basically, this:

re.sub('^[^.]+', 'ccd', r'd:\src\test')

will replace everything before the first dot with ‘ccd’, provided that there’s at least one character before the dot. Since d:\src\test contains no dot at all, the whole string becomes ccd.
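Applied to a full file path, which is what str(file_name_change) produces in the function above, the same substitution also throws away the drive and directories. The path below is a made-up example:

```python
import re
from pathlib import PureWindowsPath

full_path = r'd:\src\test\abc.zip'  # hypothetical file found by the glob

# ^[^.]+ runs from the start of the string up to the first dot,
# so the drive and every directory name are replaced as well
new_name = re.sub('^[^.]+', 'ccd', full_path)
print(new_name)                                    # -> ccd.zip
print(PureWindowsPath(new_name).is_absolute())     # -> False
```

Because the result is a relative name, Path.rename() resolves it against the current working directory rather than the file's original folder, so the file is most likely moved to wherever the script runs from, which can look as if it had been deleted.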

If you want to replace only the last part after the \, you need:

re.sub(r'[^\\]+$', 'ccd', r'd:\src\test')
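On a complete file path this pattern replaces the whole last component, extension included, so the replacement has to carry the suffix itself. The paths and names are illustrative:

```python
import re

full_path = r'd:\src\test\abc.zip'

# [^\\]+$ : one or more non-backslash characters running to the end
print(re.sub(r'[^\\]+$', 'ccd', r'd:\src\test'))  # -> d:\src\ccd
print(re.sub(r'[^\\]+$', 'ccd', full_path))       # -> d:\src\test\ccd (".zip" is gone)
print(re.sub(r'[^\\]+$', 'ccd.zip', full_path))   # -> d:\src\test\ccd.zip
```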

Incidentally, your sample JSON doesn’t look like JSON; it looks like Python, and if it is Python, then 'd:\src\test' contains a tab character (the \t):

>>> print('d:\src\test')
d:\src  est

You’d need to either escape the escapes ('d:\\src\\test') or use a raw string literal (r'd:\src\test'). In JSON that would be "d:\\src\\test".
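Python's json module can confirm the doubling: decoding the JSON text "d:\\src\\test" yields single backslashes, and encoding the path produces the doubled form again:

```python
import json

# The JSON document {"path": "d:\\src\\test"}, kept verbatim in a raw string
json_text = r'{"path": "d:\\src\\test"}'

config = json.loads(json_text)
print(config['path'])                    # -> d:\src\test
print(config['path'] == r'd:\src\test')  # -> True

# Going the other way, json.dumps doubles each backslash
print(json.dumps(r'd:\src\test'))        # -> "d:\\src\\test"
```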

var · October 27, 2022, 5:20am

If I have a raw string, how can I specify it in a field like starting_regexp? For example, if I want to use r'^[^.]+' as the starting regexp, what should the syntax be? I cannot change the existing rename function that has the re.sub, so I need to find the best way to specify the raw string here in the JSON construct.

{
    path: 'd:\src\test',
    glob: '**/*.zip',
    starting_regexp: '^[^.]+',
    ending_regexp: 'ccd',
}

vbrozik · October 27, 2022, 8:05am

It seems that you do not yet know exactly what the program should do. First you wanted it to replace the part of a file name before its suffix. Now you want it to do something else. You first have to decide what the program you are about to write should do.

Specify what could be the inputs of the program.
Specify what transformations you want to do on the possible inputs. Try to cover all the cases.
Currently you do not know how regular expressions work. If you want to use them you need to practice them from the simplest tasks. Start practicing with simplest regexes and do not continue until you understand them.

What you are showing is not JSON. It looks like a dict literal in Python. If so (if it is part of Python code), then inside a dict literal you can use any expression, including any string literal. I assign a value to the starting_regexp identifier to make it valid Python code. This one is valid:

starting_regexp = 'starting_regexp'

{
    starting_regexp: r'^[^.]+',
}

Note that in this case the content of the string is exactly the same either way. Raw strings make a difference when the string contains backslashes:

>>> '^[^.]+' == r'^[^.]+'
True
>>> '\\' == r'\\'
False

As Matthew already explained, in real JSON (as well as in regular Python strings) you need to escape backslashes by doubling them:

>>> r"\x" == "\\x"
True

The sequence "\\x" is the same string in Python as well as in JSON.
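Checking string lengths makes the difference concrete:

```python
# A raw string keeps every backslash exactly as typed;
# a regular string treats "\\" as one escaped backslash.
print(len('\\'))    # -> 1
print(len(r'\\'))   # -> 2

# r"\x" and "\\x" are both two characters: a backslash and an x
print(len(r'\x'), len('\\x'))  # -> 2 2
```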

Note: Enclose code in your text between single backticks; otherwise the code can get mangled, as happened to your r'^[^.]+' in your last post. It also makes it much easier to recognize what is code.

var · October 27, 2022, 6:48pm

Okay, sure, but that construct is a jsonnet string, not Python or exactly JSON, so if I use r'^[^.]+' it errors out.

MRAB · October 27, 2022, 7:09pm

It’s confusing if you call it JSON when it’s “jsonnet”. Try the other solution, namely, escaping the backslashes, so that a backslash is represented by a pair of them: "\\".

var · October 27, 2022, 7:13pm

I tried this: \\^[^.]+ and it leaves the file name as is without renaming it.
For example, if file_name is aries.zip and I do the rename using the regex '\^[^.]+', I get back the same string instead of the replacement string (let’s say the replacement is cancer.zip).
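A likely reason nothing changes: the string "\\^[^.]+" decodes to the regex \^[^.]+, and \^ matches a literal caret character rather than the start of the string. A name like aries.zip contains no caret, so there is no match and re.sub returns its input unchanged. This sketch uses Python's json module as a stand-in for the jsonnet loader, assuming jsonnet decodes backslash escapes in double-quoted strings the same way JSON does; the names are illustrative:

```python
import json
import re

# The value as it would appear in the JSON/jsonnet config
starting_regexp = json.loads(r'"\\^[^.]+"')
print(starting_regexp)  # -> \^[^.]+

# \^ is an escaped caret: it matches the character '^', not the start of the string
print(re.sub(starting_regexp, 'cancer', 'aries.zip'))   # -> aries.zip (no match)
print(re.sub(starting_regexp, 'cancer', '^aries.zip'))  # -> cancer.zip
```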

MRAB · October 27, 2022, 7:36pm

Why are you using that regex? I think the one you want is the [^\\]+$ from my previous post, which, as a (Python) raw string is r'[^\\]+$' and as a JSON or jsonnet string with escaped backslashes is "[^\\\\]+$"
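The pieces can be checked end to end: decode the config string, then apply it to a Windows-style path. Python's json module again stands in for the jsonnet loader (an assumption), and the file name and replacement are illustrative:

```python
import json
import re

# "[^\\\\]+$" as written in the config file
starting_regexp = json.loads(r'"[^\\\\]+$"')
assert starting_regexp == r'[^\\]+$'  # same regex as the raw Python string

old_path = r'd:\src\test\aries.zip'
print(re.sub(starting_regexp, 'cancer.zip', old_path))  # -> d:\src\test\cancer.zip
```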

var · October 27, 2022, 9:34pm

You guys rock! Thank you so much, "[^\\\\]+$" worked!

var · October 28, 2022, 6:32am

Is there a direct way of converting this Windows regex to Mac? It works on Windows but not on Mac.

vbrozik · October 28, 2022, 11:22am

Regexes are OS agnostic. There is no such thing as Windows/Linux/FreeBSD/MacOS regex.

You probably mean the difference in file paths? On Windows you normally use a backslash \ as the path separator; on Unix-like systems (Linux/FreeBSD/macOS…) you use a slash /. The regex for those systems as a Python/JSON/Jsonnet string would be: "[^/]+$"
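The slash version needs no escaping at all, since it contains no backslashes. A quick check on a made-up macOS-style path:

```python
import re

mac_path = '/Users/me/src/test/aries.zip'  # hypothetical path

# [^/]+$ : everything after the last slash
print(re.sub(r'[^/]+$', 'cancer.zip', mac_path))  # -> /Users/me/src/test/cancer.zip
```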

steven.daprano · October 28, 2022, 11:23am

Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems. – Jamie Zawinski

Think about how many failed attempts you had trying to get the regex right, and even when you got it, it is unreadable line noise.

Here is the right way to rename a file, keeping the path and the file extension unchanged. It is readable and self-explanatory:

import os
filename = 'fee/fi/fo/fum/abc.xyz'
path, name = os.path.split(filename)
name, ext = os.path.splitext(name)
new_filename = os.path.join(path, "bcd" + ext)

And it works regardless of whether you use forward-slashes (Linux, Mac, Unix and Windows) or backslashes (only Windows).

It is a little bit longer to type, but you can see it is correct at a glance. Unlike this regex:

new_filename = re.sub("[^\\\\]+$", 'bcd', filename)

which can only work with backslashes, that is, if it works at all. Who the hell can tell just by reading it??? Not me, that’s for sure.

If you need to do it a lot of times, put it into a function, and call the function.
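Following that advice, a small helper could wrap the os.path steps above. The name replace_stem is made up for this sketch:

```python
import os


def replace_stem(filename, new_stem):
    """Return filename with the part before the extension replaced by new_stem.

    The directory and the extension are kept unchanged.
    """
    directory, name = os.path.split(filename)
    _old_stem, ext = os.path.splitext(name)
    return os.path.join(directory, new_stem + ext)


# On Linux/macOS this prints fee/fi/fo/fum/bcd.xyz
print(replace_stem('fee/fi/fo/fum/abc.xyz', 'bcd'))
```

Note that os.path.splitext only splits off the last suffix, so a name like archive.tar.gz keeps just .gz as its extension.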

Alternatively, you can use pathlib:

import pathlib
filename = pathlib.Path('fee/fi/fo/fum/abc.xyz')
new_filename = filename.with_name('bcd').with_suffix(filename.suffix)
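To tie this back to the rename function earlier in the thread, here is a sketch that performs the actual rename with pathlib while applying the regex to the file name only, so each file stays in its own folder. The helper name rename_matching and the temporary-directory demo are hypothetical:

```python
import re
import tempfile
from pathlib import Path


def rename_matching(folder, glob_pattern, regexp, replacement):
    """Apply regexp to each matching file's name only and rename it in place."""
    renamed = []
    # sorted() collects the matches first, so renamed files are not revisited
    for old_path in sorted(folder.glob(glob_pattern)):
        new_name = re.sub(regexp, replacement, old_path.name)
        # with_name keeps the parent directory, so the file is not moved elsewhere
        new_path = old_path.with_name(new_name)
        old_path.rename(new_path)
        renamed.append(new_path)
    return renamed


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / 'aries.zip').touch()
    result = rename_matching(folder, '*.zip', r'^[^.]+', 'cancer')
    print([p.name for p in result])          # -> ['cancer.zip']
    print((folder / 'cancer.zip').exists())  # -> True
```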

MRAB · October 28, 2022, 1:47pm

Windows will accept slashes in most places. Do you ever get backslashes in filenames on Linux or Mac? I think it’s unlikely, so you could add a slash to the regex:

new_filename = re.sub("[^/\\\\]+$", 'bcd', filename)
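A quick check that the combined pattern handles both separators; the paths are illustrative, and as with the earlier versions the extension is replaced along with the name:

```python
import re

pattern = "[^/\\\\]+$"  # the regex [^/\\]+$ : no slash and no backslash up to the end

for filename in [r'd:\src\test\abc.xyz', 'fee/fi/fo/fum/abc.xyz', 'd:/src/test/abc.xyz']:
    print(re.sub(pattern, 'bcd', filename))
# -> d:\src\test\bcd
# -> fee/fi/fo/fum/bcd
# -> d:/src/test/bcd
```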

steven.daprano · October 29, 2022, 12:07am

Regexes are OS agnostic, but they aren’t language agnostic. Different programming languages have different regex features and syntax.

Windows pathnames can use forward slashes interchangeably with backslashes. If you need to specify a pathname in Python, there is very little good reason to fight with backslashes and raw strings when you can just use forward slashes in a file-system independent way.

I think the only exception is when you call out to an external Windows application running in the Windows shell: there are some places where you can’t use forward slashes because the shell thinks they introduce command-line options.
