Use of re.sub for renaming files/strings not working

Hi,

I have a scenario where I need to use an existing function that calls re.sub to rename a particular file. We have a file at a particular location with a .zip extension, such as abc.zip,
and I have a string like bcd. I need to use re.sub to rename the abc part of the file name to bcd.
If I use
re.sub('\\*.$', 'bcd', str(name_of_file_to_rename))
the file is not renamed correctly. What am I doing wrong?

Let’s rewrite your regex string to a raw string where backslash characters \ do not need to be escaped by doubling them:
r'\*.$'

Your regex matches this sequence of characters - searching from the left:

  • \* - an asterisk character *
  • . - any single character
  • $ - end of the string (or line) - This means that the sequence must be at the end of the string.

In other words, your regex matches the last two characters of any string whose second-to-last character is an asterisk *. Your re.sub() will replace those two characters with bcd.
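A quick check you can paste into a Python session shows this (the sample strings are made up for illustration):

```python
import re

pattern = r'\*.$'  # the regex from the question, written as a raw string

# No asterisk in the name, so nothing matches and the string comes back unchanged
print(re.sub(pattern, 'bcd', 'abc.zip'))  # abc.zip

# Here '*' is second to last, so the last two characters '*c' are replaced
print(re.sub(pattern, 'bcd', 'ab*c'))  # abbcd
```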


You probably want something like this:
r'^[^.]+'

  • ^ - match only at the beginning of the string (or line)
  • [^.]+ - match one or more characters, none of which is a dot .

This regex does not check the suffix. But you can extend it to do that.
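One possible way to extend it, assuming you only want to touch names ending in .zip, is a lookahead. This is a sketch, not the only option: (?=...) must match, but it is not part of the text that gets replaced.

```python
import re

# Hypothetical extension: rename only when the part before the first dot
# is followed directly by ".zip" at the end of the string
pattern = r'^[^.]+(?=\.zip$)'

print(re.sub(pattern, 'bcd', 'abc.zip'))      # bcd.zip
print(re.sub(pattern, 'bcd', 'abc.txt'))      # abc.txt (suffix does not match)
print(re.sub(pattern, 'bcd', 'abc.tar.zip'))  # abc.tar.zip (first dot is not the .zip one)
```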

Sorry, I should have been clearer. The string I want to replace (string 1) has nothing in common with the string I want it replaced with (string 2). For example:
string_1 = 'xyz'
re.sub(r'^[^.]+', 'bcd', str(name_of_file_to_rename))
What regex should I use in this case?

I am getting lost in your strings 🙂 Let’s use descriptive identifiers for them. Did you test the code?

>>> import re
>>> replacement = 'bcd'
>>> file_name = 'anything.any_suffix'
>>> re.sub(r'^[^.]+', replacement, file_name)
'bcd.any_suffix'

replacement has nothing in common with file_name.

It does not seem to work for me :(. I do not see the file getting renamed. Rather, it ends up getting deleted after the rename function runs, and I'm not sure why. Any ideas?

It would help if you posted the smallest complete example code that shows the problem.

I have a function that basically reads the JSON construct and does the renaming:

import logging
import re


def rename_file_using_json(workdir, rename_infos):
    # workdir is expected to be a pathlib.Path, so "/" joins path components
    for rename_info in rename_infos:
        path = workdir / rename_info['loc']
        for file_name_change in path.glob(rename_info['glob']):
            # Note: the regex is applied to the full path string, not only to the file name
            renamed_file = re.sub(rename_info['starting_regexp'], rename_info['ending_regexp'], str(file_name_change))
            logging.debug(f'Renaming the file {str(file_name_change)} to {renamed_file}')
            file_name_change.rename(renamed_file)

And this is a sample JSON:

{
    title: 'File rename',
    type: 'files_rename',
    filename_changes: [
        {
            path: 'd:\src\test',
            glob: '**/*.zip',
            starting_regexp: '^[^.]+',
            ending_regexp: 'ccd',
        },
    ],
}

Basically, this:

re.sub('^[^.]+', 'ccd', r'd:\src\test')

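will replace everything from the start of the string up to (but not including) the first dot with 'ccd', provided there is at least one character before that dot. d:\src\test contains no dot at all, so the whole string is replaced and d:\src\test becomes ccd.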
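The glob in your function yields full paths, not bare file names, so the same thing happens to the directory part. A small check, using a hypothetical file abc.zip in the sample directory (PureWindowsPath is used so it runs on any OS):

```python
import re
from pathlib import PureWindowsPath

# A path like the ones path.glob('**/*.zip') would yield for the sample config
file_path = PureWindowsPath(r'd:\src\test\abc.zip')

renamed = re.sub(r'^[^.]+', 'ccd', str(file_path))
print(renamed)  # ccd.zip  (the directory part is gone too)
```

Because the result is a relative path, Path.rename() resolves it relative to the current working directory (the pathlib docs say so explicitly), so the file was most likely moved there rather than deleted.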

If you want to replace only the last part after the \, you need:

re.sub(r'[^\\]+$', 'ccd', r'd:\src\test')
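Applied to a full file path (abc.zip is again a hypothetical name), note that this replaces the whole last component, suffix included, so the replacement string has to carry the suffix itself:

```python
import re

pattern = r'[^\\]+$'  # one or more non-backslash characters at the end of the string
file_path = r'd:\src\test\abc.zip'

print(re.sub(pattern, 'ccd', file_path))      # d:\src\test\ccd
print(re.sub(pattern, 'ccd.zip', file_path))  # d:\src\test\ccd.zip
```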

Incidentally, your sample JSON doesn’t look like JSON; it looks like Python, and if it is, then 'd:\src\test' contains a tab character (the \t):

>>> print('d:\src\test')
d:\src  est

You’d need to either escape the escapes ('d:\\src\\test') or use a raw string literal (r'd:\src\test'). In JSON that would be "d:\\src\\test".
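A short sketch of the three spellings side by side, using Python's json module to decode the JSON form:

```python
import json

raw = r'd:\src\test'       # raw string literal: backslashes are kept literally
escaped = 'd:\\src\\test'  # regular string literal: each backslash doubled
print(raw == escaped)  # True

# In a regular string literal, \t is one tab character, not backslash + t
print(len('\t'), len(r'\t'))  # 1 2

# The JSON text "d:\\src\\test" decodes to the same string
print(json.loads(r'"d:\\src\\test"') == raw)  # True
```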

If I have a raw string, how can I specify it in a field like starting_regexp? For example, if I want to use r'^[^.]+' in starting_regexp, what should the syntax be? I cannot change the existing rename function that calls re.sub, so I need to find the best way to specify the raw string here in the JSON construct.

{
    path: 'd:\src\test',
    glob: '**/*.zip',
    starting_regexp: '^[^.]+',
    ending_regexp: 'ccd',
}




It seems that you do not yet know exactly what the program should do. First you wanted it to replace the part of a filename before its suffix. Now you want it to do something else. You first have to decide what the program you are about to write should do.

  • Specify what the possible inputs of the program are.
  • Specify what transformations you want to apply to those inputs. Try to cover all the cases.
  • Currently you do not know how regular expressions work. If you want to use them, you need to practice, starting with the simplest tasks. Start with the simplest regexes and do not move on until you understand them.

What you are showing is not JSON. It looks like a dict literal in Python. If so (that is, if it is part of Python code), then inside a dict literal you can of course use any expression, including any string literal. Below I assign a value to the starting_regexp identifier to make it valid Python code. This is valid:

starting_regexp = 'starting_regexp'

{
    starting_regexp: r'^[^.]+',
}

Note that in this case the content of the string is exactly the same either way. Raw strings make a difference only when the string contains backslashes:

>>> '^[^.]+' == r'^[^.]+'
True
>>> '\\' == r'\\'
False

As Matthew already explained, in real JSON (as well as in regular Python strings) you need to escape backslashes by doubling them:

>>> r"\x" == "\\x"
True

The sequence "\\x" is the same string in Python as well as in JSON.


Note: Enclose code in your text in single backticks; otherwise it can get mangled, like your r'^[^.]+' in your last post. It also makes it much easier to see what is code.

Okay, sure, but that construct is a jsonnet string, not Python or exactly JSON, so if I use r'^[^.]+' it errors out.



It’s confusing if you call it JSON when it’s “jsonnet”. Try the other solution, namely, escaping the backslashes, so that a backslash is represented by a pair of them: "\\".

I tried \\^[^.]+ and it leaves the file name as it is, without renaming it.
For example, if I use aries.zip as file_name and do the rename using the regex '\\^[^.]+', I get back the same string instead of the replacement string (let's say the replacement string is cancer.zip).



Why are you using that regex? I think the one you want is the [^\\]+$ from my previous post, which, as a (Python) raw string is r'[^\\]+$' and as a JSON or jsonnet string with escaped backslashes is "[^\\\\]+$"
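To see why the first attempt left the name unchanged, it helps to decode both strings. The sketch below uses Python's json module as a stand-in for jsonnet, assuming the two treat doubled backslashes inside double-quoted strings the same way (they do for these strings); aries.zip and the directory are hypothetical:

```python
import json
import re

# "\\^[^.]+" decodes to \^[^.]+ , and \^ matches a literal caret character
bad_regex = json.loads(r'"\\^[^.]+"')
print(re.sub(bad_regex, 'cancer.zip', 'aries.zip'))  # aries.zip (no caret, no match)

# "[^\\\\]+$" decodes to [^\\]+$ : one or more non-backslash characters at the end
good_regex = json.loads(r'"[^\\\\]+$"')
print(good_regex == r'[^\\]+$')  # True
print(re.sub(good_regex, 'cancer.zip', r'd:\src\test\aries.zip'))  # d:\src\test\cancer.zip
```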

You guys rock! Thank you so much, "[^\\\\]+$" worked! 🙂

Is there a direct way of converting this Windows regex to one that works on Mac? It works on Windows but not on Mac.

Regexes are OS agnostic. There is no such thing as Windows/Linux/FreeBSD/MacOS regex.

Perhaps you mean the difference in file paths? In Windows you normally use a backslash \ as the path separator; in Unix-like systems (Linux/FreeBSD/MacOS…) you use a slash /. The regex for those systems as a Python/JSON/Jsonnet string would be: "[^/]+$"
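A minimal check with a made-up macOS-style path. The string contains no backslashes, so it reads the same in Python, JSON and Jsonnet:

```python
import re

pattern = "[^/]+$"  # one or more non-slash characters at the end of the string
mac_path = '/Users/me/src/test/aries.zip'  # hypothetical path

print(re.sub(pattern, 'cancer.zip', mac_path))  # /Users/me/src/test/cancer.zip
```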


Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems. – Jamie Zawinski

Think about how many failed attempts you had trying to get the regex right, and even when you got it, it is unreadable line noise.

Here is the right way to rename a file, keeping the path and the file extension unchanged. It is readable and self-explanatory:

import os
filename = 'fee/fi/fo/fum/abc.xyz'
path, name = os.path.split(filename)
name, ext = os.path.splitext(name)
new_filename = os.path.join(path, "bcd" + ext)

And it works regardless of whether you use forward-slashes (Linux, Mac, Unix and Windows) or backslashes (only Windows).

It is a little bit longer to type, but you can see it is correct at a glance. Unlike this regex:

new_filename = re.sub("[^\\\\]+$", 'bcd', filename)

which can only work with backslashes, that is, if it works at all. Who the hell can tell just by reading it??? Not me, that’s for sure.

If you need to do it a lot of times, put it into a function, and call the function.
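For example, the os.path version wrapped in a small function. replace_stem is a hypothetical helper name, not something that exists in os.path:

```python
import os


def replace_stem(filename, new_stem):
    """Return filename with its name (minus the extension) replaced by new_stem."""
    path, name = os.path.split(filename)  # directory part, last component
    _, ext = os.path.splitext(name)       # drop the old name, keep the extension
    return os.path.join(path, new_stem + ext)


# Prints fee/fi/fo/fum/bcd.xyz on Linux/macOS (os.path.join inserts "\" on Windows)
print(replace_stem('fee/fi/fo/fum/abc.xyz', 'bcd'))
```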

Alternatively, you can use pathlib:

import pathlib
filename = pathlib.Path('fee/fi/fo/fum/abc.xyz')
new_filename = filename.with_name('bcd').with_suffix(filename.suffix)
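On Python 3.9+ there is also Path.with_stem(), which does the swap in one call. Below is a sketch of using it for the actual renaming over a glob, roughly the job the rename_file_using_json loop does; the temporary directory and aries.zip are stand-ins created just for the demo:

```python
import tempfile
from pathlib import Path, PurePosixPath

# with_stem keeps the directory and the suffix, and swaps only the name
print(PurePosixPath('fee/fi/fo/fum/abc.xyz').with_stem('bcd'))  # fee/fi/fo/fum/bcd.xyz

with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    (workdir / 'aries.zip').touch()  # hypothetical file for the demo

    # list() so the directory is not modified while glob is still iterating over it
    for zip_path in list(workdir.glob('**/*.zip')):
        # The target is still an absolute path in the same directory, so nothing moves
        zip_path.rename(zip_path.with_stem('cancer'))

    names = sorted(p.name for p in workdir.iterdir())
    print(names)  # ['cancer.zip']
```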

Windows will accept slashes in most places. Do you ever get backslashes in filenames on Linux or Mac? I think it’s unlikely, so you could add a slash to the regex:

new_filename = re.sub("[^/\\\\]+$", 'bcd', filename)
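A quick check that the combined pattern handles both separators (the paths are made up):

```python
import re

pattern = "[^/\\\\]+$"  # Python string; the regex itself is [^/\\]+$

paths = [r'd:\src\test\aries.zip', 'd:/src/test/aries.zip', '/Users/me/aries.zip']
for filename in paths:
    print(re.sub(pattern, 'cancer.zip', filename))
# d:\src\test\cancer.zip
# d:/src/test/cancer.zip
# /Users/me/cancer.zip
```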

Regexes are OS agnostic, but they aren’t language agnostic. Different programming languages have different regex features and syntax.

Windows pathnames can use forward slashes interchangeably with backslashes. If you need to specify a pathname in Python, there is very little good reason to fight with backslashes and raw strings when you can just use forward slashes in a file-system independent way.
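A small illustration: PureWindowsPath applies Windows path rules on any OS, and it treats both spellings of a (hypothetical) path as the same path:

```python
from pathlib import PureWindowsPath

forward = PureWindowsPath('d:/src/test/aries.zip')
backward = PureWindowsPath(r'd:\src\test\aries.zip')

print(forward == backward)           # True
print(forward)                       # d:\src\test\aries.zip
print(forward.stem, forward.suffix)  # aries .zip
```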

I think the only exception is if you are calling out to an external Windows application running in the Windows shell: there are some places where you can’t use forward slashes because the shell thinks they specify command-line options.