Make `SyntaxWarning` for invalid escape sequences better reflect their intended deprecation

I would like to propose that the SyntaxWarning that is raised when an invalid escape sequence is used be updated to better reflect the fact that the ability of Python users to employ invalid escape sequences is intended to be removed in a future Python release.

At present, if one runs the code path = 'C:\Windows' in Python, they get this warning: SyntaxWarning: invalid escape sequence '\W'.
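You can reproduce the current warning without saving a file. This minimal sketch compiles the snippet from a string and records the warnings it raises. The helper name `escape_warnings` is made up for this example. The category is `SyntaxWarning` on Python 3.12 and later and `DeprecationWarning` on 3.6 to 3.11. Because the exact wording varies between versions, the sketch only looks for the phrase "invalid escape sequence".

```python
import warnings

# The doubled backslash keeps this file valid; the compiled snippet
# itself contains the single-backslash literal 'C:\Windows'.
SOURCE = "path = 'C:\\Windows'\n"


def escape_warnings(source):
    """Compile `source` and return the warnings about invalid escape sequences."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" so a warning seen earlier is not suppressed
        warnings.simplefilter("always")
        compile(source, "<example>", "exec")
    return [w for w in caught if "invalid escape sequence" in str(w.message)]


for w in escape_warnings(SOURCE):
    # SyntaxWarning on 3.12+, DeprecationWarning on 3.6-3.11
    print(w.category.__name__, w.message)
```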

I argue that that warning is not as semantically meaningful as it could be. It does not immediately convey to the untrained and/or uninitiated that path = 'C:\Windows' will in fact break in a future Python release.

What is a better way of communicating that? How about: SyntaxWarning: '\W' is currently an invalid escape sequence. In the future, invalid escape sequences will raise a SyntaxError. Did you mean '\\W'?

That message is a little bit longer but it immediately tells the user, without the need for heading to Stack Overflow or Python documentation, that:

  1. Although the code runs today, it won’t run soon!
  2. You can fix the code easily, just add an extra backslash.
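You can preview point 1 today by promoting warnings to errors, either with `python -W error` or, as in this sketch, with `warnings.simplefilter("error")`. CPython then reports the invalid escape as a `SyntaxError`. The sketch also checks that the doubled-backslash fix from point 2 compiles cleanly. `compiles_strictly` is a hypothetical helper name.

```python
import warnings

BROKEN = "path = 'C:\\Windows'\n"   # snippet contains 'C:\Windows'
FIXED = "path = 'C:\\\\Windows'\n"  # snippet contains 'C:\\Windows'


def compiles_strictly(source):
    """Return True if `source` compiles with every warning treated as an error."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        try:
            compile(source, "<example>", "exec")
        except SyntaxError:
            # CPython re-raises a promoted invalid-escape warning as SyntaxError
            return False
    return True


print(compiles_strictly(BROKEN))  # False
print(compiles_strictly(FIXED))   # True

# The fixed literal evaluates to the intended single-backslash path
namespace = {}
exec(FIXED, namespace)
print(namespace["path"] == r"C:\Windows")  # True
```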

Whereas all that SyntaxWarning: invalid escape sequence '\W' tells me is, at best, hey, there’s something wrong here, but you’ve gotta figure out what that is. Someone could easily read that message and think maybe Python is trying to be helpful and make me double check that I didn’t actually mean to type a valid escape sequence like \f or \n.

A message like SyntaxWarning: '\W' is currently an invalid escape sequence. In the future, invalid escape sequences will raise a SyntaxError. Did you mean '\\W'? makes it much more difficult to come away with the message that the Python developers might not like what I’ve just done but it works and it’ll keep working forever.

Asides

  • If you have an issue with the wording of the proposed message, please suggest an alteration instead of rejecting the overall need to make the message more semantically meaningful.
  • This suggestion originates in my thread ‘Please don't break invalid escape sequences’.

Simply open a new issue. It is a minor change which does not require wide discussion. You only need to find a core developer which accepts your change (it’s not me, I have no strong opinion about the wording).

Cheers, will do it tomorrow :slight_smile:

I do agree we can improve our communication on why this deprecation is happening.

Is there already a good page on docs.python.org explaining this? If not, we could perhaps add something and hope that it gets picked up when users Google “python invalid escape sequence” or similar.

Don’t know if there is a consensus about a recommended fix for invalid escape sequences. Is it doubling the backslashes or a raw string? Or “it depends …”?

The verbose warning should suggest the recommended way (if it exists) or both options.

It really depends. If it’s in a regex it’s probably “use a raw string to be precise and readable”. If it’s a one-off in a piece of prose it might be easier to use \\. And in some cases the answer is to remove the \ entirely–sometimes people like to sprinkle backslashes around because they aren’t sure what’s going wrong.
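Here is a small sketch of the regex case (the sample text is made up). A raw string and a doubled backslash produce the identical pattern, so for regexes the choice comes down to readability. The unescaped `'\d+'` only works today because Python keeps unknown escapes verbatim, and it emits a warning when it does.

```python
import re
import warnings

PATTERN_RAW = r"\d+"      # raw string: backslash kept as-is
PATTERN_DOUBLED = "\\d+"  # doubled backslash: same two characters, '\' and 'd'

text = "room 101, floor 3"
print(PATTERN_RAW == PATTERN_DOUBLED)  # True
print(re.findall(PATTERN_RAW, text))   # ['101', '3']

# The unescaped spelling evaluates to the same string today, but only
# because unknown escapes are left in place, and it triggers a warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    unescaped = eval(compile('"\\d+"', "<example>", "eval"))
print(unescaped == PATTERN_RAW)  # True
print(len(caught) >= 1)          # True
```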

I don’t think the warning needs to be that verbose, but a docs page could get into it.

If the string contains valid escape sequences (\n, \x..., \u..., \ followed by a newline, etc.) that should be interpreted specially rather than literally, you cannot use a raw string literal (except when it is a regular expression, where most escape sequences are interpreted in the same way). In most other cases raw strings are preferable. So it depends.
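This sketch illustrates the distinction. Outside a regex, `r"\n"` is two characters. The `re` module decodes most escapes itself, so raw and non-raw patterns usually match the same text. `\b` is the notable exception: to `re` it means "word boundary", but in an ordinary string literal it is a backspace character.

```python
import re

# Outside a regex, a raw string keeps escapes literally.
print(len("\n"), len(r"\n"))  # 1 2  (newline vs backslash + 'n')

# Inside a regex, re decodes these escapes itself, so the raw pattern
# matches the same character the non-raw literal would contain.
PAIRS = [(r"\n", "\n"), (r"\t", "\t"), (r"\x41", "A")]
for pattern, target in PAIRS:
    print(repr(pattern), bool(re.fullmatch(pattern, target)))  # True each time

# \b: word boundary in a regex, backspace (\x08) in a normal literal.
print(re.search(r"\bcat\b", "a cat sat") is not None)  # True
print(re.search("\bcat\b", "a cat sat") is None)       # True
```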

I think a double backslash is preferred because it is guaranteed to work. If someone runs message = 'Hello\n World and\or fizz buzz' and you suggest they use message = r'Hello\n World and\or fizz buzz', you’ve now broken the \n.
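Here is the example above, worked through. Doubling only the invalid `\o` keeps the intended newline. Making the whole literal raw quietly turns `\n` into a backslash followed by `n`.

```python
# Only the invalid escape is doubled: '\n' is still a newline
doubled = "Hello\n World and\\or fizz buzz"
# The whole literal is raw: '\n' is now a backslash followed by 'n'
raw = r"Hello\n World and\or fizz buzz"

print(doubled.count("\n"), raw.count("\n"))  # 1 0
print(doubled.splitlines())  # ['Hello', ' World and\\or fizz buzz']
print(raw == "Hello\\n World and\\or fizz buzz")  # True
```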

In practice, a raw string is typically best, but categorically advising the use of raw strings could have unintended consequences, whereas a double backslash is much less likely to have that effect, though it’s a little more verbose.

I have created the GitHub issue here: Make `SyntaxWarning` for invalid escape sequences better reflect their intended deprecation · Issue #128016 · python/cpython · GitHub

… but I would suggest they do message = "Hello\n World and/or fizz buzz" because it’s more grammatically correct. :slight_smile:

path = 'C:\Windows\new_folder'

It’s challenging to communicate all of the following succinctly:

  • That the user has tried to use \W which is an invalid escape sequence
  • That invalid escape sequence may\will raise SyntaxError in the future
  • The user’s resolution options are to either escape the backslash (replace \ with \\) or mark the entire string as a raw string literal.

Here’s the proposal from the OP:

SyntaxWarning: '\W' is currently an invalid escape sequence. In the future, invalid escape sequences will raise a SyntaxError. Did you mean '\\W'?

As has been pointed out, I think the “Did you mean” is too strong a suggestion towards escaping the backslash, when marking the string literal as raw may be an equally good or better solution.

Here’s another (unfortunately even more verbose) proposal for discussion/pruning.

SyntaxWarning: '\W' is an invalid escape sequence. In future Python versions this may raise a SyntaxError. Consider escaping the backslash ('\\W') or marking the string literal as raw.

I don’t think the warning itself needs to have all the verbosity, so long as it has enough hints that someone who searches for information will find it. It already has invalid escape sequence in it, so that part is covered. Everything else is a tradeoff between giving suggestions and having too many words.

Here’s a less verbose version:

file:line:SyntaxWarning: '\W' is an invalid escape sequence, and will ultimately be an error. Did you mean '\\W'?

It has one suggestion - the one that’s guaranteed to work without having unwanted side effects. It’s short enough that people will hopefully actually read it. And it conveys the important information that this is to become an error.

It was intended as an example.

Did you mean '\\W'? is concise and also consistent with the SyntaxWarning you get for trying to run 1 is 1: "is" with 'int' literal. Did you mean "=="?. Arguably, you could make that message more verbose by also suggesting isinstance() as an option (as the warning will also get raised with 1 is int and if you’re trying to run code like that, my strong suspicion is that you probably want to use isinstance() and not ==).
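For comparison, the existing `is`-with-literal warning can be reproduced the same way. The wording has varied between Python versions, so this sketch only checks the `Did you mean "=="?` suffix. `syntax_warnings` is a made-up helper name.

```python
import warnings


def syntax_warnings(source):
    """Compile `source` and return the text of any SyntaxWarnings it triggers."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<example>", "exec")
    return [str(w.message) for w in caught if issubclass(w.category, SyntaxWarning)]


for snippet in ("1 is 1", "1 is int"):
    print(snippet, "->", syntax_warnings(snippet))

# What someone writing `1 is int` most likely wants instead:
print(isinstance(1, int))  # True
```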

In this case, however, there is no such ambiguity with the recommendation of '\\W'. It will work where a raw string might not. A raw string is nicer for certain uses, agreed, but it’s better to keep the message simple and give a quick, immediate fix that is guaranteed to work. I agree with @Rosuav on this.

I appreciate your efforts to condense the message. However, the reason I proposed '\W' is currently an invalid escape sequence. In the future, invalid escape sequences will raise a SyntaxError. is that, if we say categorically, for example, \e is an invalid escape sequence and will break in the future, that’s not necessarily correct. As you highlighted in our other thread, \e is intended to be added to Python after removing invalid escape sequences. Therefore, there is a possibility where we jump from \e raising a warning to \e working perfectly.

The segmentation of the message into two statements: '\e' is currently an invalid escape sequence and In the future, invalid escape sequences will raise a SyntaxError is intended to convey that, right now \e belongs to the category of invalid escape sequences, and, in the future, the category of invalid escape sequences will not work. Which is not the same as saying, in the future, \e will not work.
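To illustrate the "currently invalid" point: `'\e'` is still two characters today, a backslash and an `e`, and compiling it produces the invalid-escape warning. At present the ESC character itself has to be written another way, such as `'\x1b'` or `'\033'`. This is a sketch of today's behaviour only.

```python
import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Compile the source text "\e", a literal with an invalid escape
    value = eval(compile('"\\e"', "<example>", "eval"))

print(list(value))       # ['\\', 'e']: the backslash is kept verbatim
print(len(caught) >= 1)  # True: \e is currently an invalid escape

# Spellings of the ESC character (U+001B) that are valid today
print("\x1b" == "\033" == chr(27))  # True
```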

Otherwise, I do like how you have simplified my message.

I think we can revise it to:
"\W" is an invalid escape sequence. Such sequences will not work in the future. Did you mean "\\W"?

This is only slightly longer than your revision but does not guarantee to the user that \W will not work in the future, only that invalid escape sequences will not work.

I also switched to double quotes; I had used single quotes out of force of habit from Australian English.

Yeah, that’s fair.

Yep, I like that, that’s better than mine. The choice of single or double quotes doesn’t bother me much either way, I just used single since the existing warning does. I doubt anyone will be bothered either way.

@hugovk commented on my GitHub issue:

We can remove “Such sequences”:

"\W" is an invalid escape sequence and will not work in the future. Did you mean "\\W"?

Let’s also mention raw strings, as in many cases raw strings are a cleaner and easier solution.

In the thread you say:

In practice, a raw string is typically best, but categorically advising the use of raw strings could have unintended consequences, whereas a double backslash is much less likely to have that effect, though it’s a little more verbose.

This is a suggestion, it doesn’t have to cover all the edge cases. We can add more detail to the docs.

My response was:

We can remove “Such sequences”:

@hugovk The reason I included ‘Such sequences’ is that this warning could raise for \e and then read as "\e" is an invalid escape sequence and will not work in the future which may not be true as I believe the intention is in fact to add \e as a new escape. Thus, it is not that \e won’t work in the future, it is that, specifically, invalid escape sequences (which \e currently belongs to but may not always) will not. As I mentioned in the thread, “there is a possibility where we jump from \e raising a warning to \e working perfectly.”

It’s a bit pedantic, but the current construction is slightly less likely to ingrain into developers that \e or \W or whatever else is always invalid.

With all that said, if you still feel that it’s better to run with is an invalid escape sequence and will not work in the future, I’m happy to go with that :slight_smile:

Let’s also mention raw strings, as in many cases raw strings are a cleaner and easier solution.

Hmm… We could do "\W" is an invalid escape sequence. Such sequences will not work in the future. Did you mean "\\W"? A raw string is also an option.?

That way the user gets an immediate suggested fix that is guaranteed to work (in line with the broader reasoning for why Did you mean "=="? was added, despite the fact that if you run 1 is int you probably want to be using isinstance()), and we also let them know raw strings are an option. Anyone familiar with raw strings will then immediately know which they really want to be using, and anyone not familiar with raw strings will realise they’re worth learning more about.
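To see how the final proposal would read for a given literal, here is a rough pure-Python sketch. `find_invalid_escapes` and `proposed_message` are hypothetical helpers, not CPython's implementation; the real check happens inside the C parser. The scanner deliberately ignores details such as out-of-range octal escapes and the validity of `\N{...}` names. Note that it does not flag the `\n` in a path like 'C:\Windows\new_folder', because `\n` is a valid escape.

```python
# Characters that may follow a backslash in a (non-raw) str literal today:
# newline (line continuation), backslash, quotes, a b f n r t v,
# octal digits, x, N, u and U.
VALID_ESCAPE_CHARS = frozenset("\n\\'\"abfnrtv01234567xNuU")


def find_invalid_escapes(body):
    """Return the invalid two-character escapes (e.g. '\\W') in a str literal body."""
    found = []
    i = 0
    while i < len(body) - 1:
        if body[i] == "\\":
            seq = body[i:i + 2]
            if seq[1] not in VALID_ESCAPE_CHARS:
                found.append(seq)
            i += 2  # skip the escaped character so an escaped backslash is not rescanned
        else:
            i += 1
    return found


def proposed_message(seq):
    """Format the wording proposed in this thread for one invalid escape."""
    return (
        f'"{seq}" is an invalid escape sequence. Such sequences will not work '
        f'in the future. Did you mean "\\{seq}"? A raw string is also an option.'
    )


# Body of the literal 'C:\Windows\new_folder' as it appears in source code
body = "C:\\Windows\\new_folder"
for seq in find_invalid_escapes(body):
    print(proposed_message(seq))
```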

I’m crossposting here for visibility.

I created the pull request: gh-128016: Improved invalid escape sequence warning message by umarbutler · Pull Request #128020 · python/cpython · GitHub

If someone wants the code to not be broken, they should probably use pathlib instead.

path = 'C:\Windows\new_folder'

If anything, this example shows why the error message should not attempt to guess or provide suggestions. \n is a valid escape sequence, but the code is still broken.
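Here is a quick sketch confirming the point. The compiled literal contains a real newline, and the only warning raised is about `\W`. No warning mentions `\n`.

```python
import warnings

# Source text of the literal from the post (doubled here so this file is valid)
SOURCE = "path = 'C:\\Windows\\new_folder'\n"

namespace = {}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    exec(compile(SOURCE, "<example>", "exec"), namespace)

path = namespace["path"]
print("\n" in path)                      # True: '\n' became a real newline
print(path.split("\n"))                  # ['C:\\Windows', 'ew_folder']
print([str(w.message) for w in caught])  # only mentions \W
```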

That’s definitely an option, although it’s more of a change.

import pathlib
path = 'C:\Windows\new_folder' # broken
path = pathlib.Path("C:/") / "Windows" / "new_folder" # working but now quite different
path = r'C:\Windows\new_folder' # working
path = 'C:/Windows/new_folder' # working
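This sketch compares the variants using `pathlib.PureWindowsPath`, so it runs on any OS. The raw and forward-slash strings differ as strings but name the same Windows path, while the broken literal contains a newline. Note that the drive needs a trailing separator (`"C:/"`). Without it, `PureWindowsPath("C:") / "Windows"` gives the drive-relative `C:Windows`.

```python
from pathlib import PureWindowsPath

broken = "C:\\Windows\new_folder"  # what 'C:\Windows\new_folder' evaluates to
raw = r"C:\Windows\new_folder"
forward = "C:/Windows/new_folder"

print("\n" in broken)                                    # True: \n became a newline
print(raw == forward)                                    # False: different strings...
print(PureWindowsPath(raw) == PureWindowsPath(forward))  # True: ...same Windows path
print(PureWindowsPath("C:/") / "Windows" / "new_folder" == PureWindowsPath(raw))  # True
```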

Suggestions are just that - suggestions. It’s absolutely fine for them to be imperfect or incomplete; but also, it is completely okay for them to change in subsequent Python versions. Perhaps a heuristic “this looks like a Windows path name” (eg a string literal beginning with a letter, then a colon, then a backslash) could recommend switching to a raw string, where others might recommend switching to doubled backslashes. But we don’t have to get this perfect immediately; changes to those sorts of messages don’t break backward compatibility, and they’re a useful enhancement for later.
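Here is a minimal, purely hypothetical sketch of the kind of heuristic described above; `WINDOWS_PATH_START` and `suggest_fix` are invented names, not anything in CPython. If the literal starts with a letter, a colon and a backslash, it suggests a raw string. Otherwise it suggests doubling the backslash. A real implementation would also need to check that the literal has no valid escapes such as `\n` that the user relies on, since a raw string would change them.

```python
import re

# Hypothetical heuristic: letter, colon, backslash at the start of the literal
WINDOWS_PATH_START = re.compile(r"[A-Za-z]:\\")


def suggest_fix(body, seq):
    """Return a hint for the invalid escape `seq` found in the str literal `body`."""
    if WINDOWS_PATH_START.match(body):
        return "Did you mean to use a raw string?"
    return f'Did you mean "\\{seq}"?'


print(suggest_fix("C:\\Windows", "\\W"))   # Did you mean to use a raw string?
print(suggest_fix("and\\or fizz", "\\o"))  # Did you mean "\\o"?
```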

As @Rosuav has stated, a suggestion is just that, a suggestion. But in the example you provided, the warning message would actually alert the user that they need to start using \\ or raw strings for their paths.

By the way, the new warning message in the PR explicitly mentions raw strings, and a raw string would solve the problem you just raised.

Indeed, I switched to using pathlib for all my system path needs long ago to sidestep this issue entirely because, at the time, I didn’t understand escape characters and raw vs. non-raw strings. I’ve been recommending pathlib to my colleagues for years because of this.

Sadly, even with pathlib I have to remember a little bit about Python backslash rules because of how drive letters work on Windows:

Path("C:", "Users", ...)  # Doesn't do the right thing, sadly (to me)
Path("C:\", "Users", ...) # Syntax Error
Path(r"C:\", "Users", ...) # Syntax Error
Path("C:\\", "Users", ...) # Works
Path(r"C:\\", "Users", ...) # Works

Path("C:/", "Users", ...) # Works
Path("C://", "Users", ...) # Works
Path(r"C:/", "Users", ...) # Works
Path(r"C://", "Users", ...) # Works

So I have to remember

(1) that I must put a forward or backslash at the end of the drive letter or pathlib doesn’t do what I want
(2) but I can’t end a string in a single backslash (and this is the one case where raw strings DON’T save me)
(3) using double slash will save me
(4) switching to forward slashes will save me
(5) mercifully the combination of (3) and (4) does NOT mess things up.
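You can check the cases above with `pathlib.PureWindowsPath`, which applies Windows rules on any OS. This sketch assumes the `...` in the examples stands for further path components and uses only `"Users"`. The two `SyntaxError` cases are left out because they cannot be written at all.

```python
from pathlib import PureWindowsPath

# Drive without a root: drive-relative, i.e. "Users" relative to the
# current directory on drive C:, which is rarely what is wanted.
relative = PureWindowsPath("C:", "Users")
print(relative, relative.is_absolute())  # C:Users False

# Each of these spellings gives the absolute path C:\Users.
# r"C:\\" really contains two backslashes; pathlib collapses them.
for drive in ("C:\\", r"C:\\", "C:/", "C://"):
    path = PureWindowsPath(drive, "Users")
    print(repr(drive), path, path.is_absolute())
```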