Please don't break invalid escape sequences

Rosuav · December 14, 2024, 10:50pm

That’s actually a really bad rule to follow. If you stick a backslash in front of an n you don’t get an escaped n, you get a newline character. Using backslashes just because you aren’t sure is a good way to make code that won’t work reliably.

It is a huge gain in that future versions of Python can introduce new escape sequences (I mentioned \e which is glaringly missing from Python, and others would be possible) without causing working code to change in meaning. By making invalid escape sequences into errors, we give ourselves the option to improve the language.

Really? I’ve used grep on non-ASCII files without issues. I’ve no idea what file viewers you’re using, but it’s 2024, they really should be Unicode-aware.

Wombat · December 15, 2024, 2:46am

This response seems somewhat dismissive. It is common to generate docs directly from docstrings using Sphinx. This naturally leads to putting reStructuredText directly in docstrings. And since Sphinx has a LaTeX builder, you can expect LaTeX markup in the docstrings when needed. That may not be how you do it, but that doesn’t invalidate other people’s choices.

Regardless of the particular choice of tooling, there are almost certainly people legitimately using LaTeX in docstrings. In Jupyter notebooks, it is common to use LaTeX and it wouldn’t be surprising to see that spill into docstrings even if there is no downstream rendering engine (once you learn to write LaTeX formulas, it seems like a reasonable way to express simple formulas in plain text when there are no other options).

Until now there has been no restriction on what can be done with docstrings. For example, David Beazley’s wonderful PLY tool uses docstrings for BNF grammar rules and dynamic dispatch.

Rosuav · December 15, 2024, 2:49am

Yeah, I don’t think it’s inherently wrong to put LaTeX into a docstring - but I do think that, if you do, it should probably be a raw string literal.
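A minimal sketch of the raw-docstring approach suggested here. The function name `gaussian` and its formula are made up for illustration and do not come from anyone's code in this thread. With the `r` prefix every backslash reaches Sphinx untouched; without it, `\frac` would begin with a form feed, and `\s`, `\e`, `\l`, `\p` and `\m` would each be invalid escapes.

```python
import math


def gaussian(x, mu=0.0, sigma=1.0):
    r"""Return the normal density at ``x``.

    .. math::

        f(x) = \frac{1}{\sigma\sqrt{2\pi}}
               \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
    """
    # The r prefix above only affects how the docstring is parsed;
    # the function body is ordinary Python.
    scale = sigma * math.sqrt(2 * math.pi)
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / scale


# The LaTeX markup survives verbatim in the docstring.
print(r"\frac" in gaussian.__doc__)
```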

mikeshardmind · December 15, 2024, 3:06am

The reality is that LaTeX’s syntax is not very friendly for inclusion in something parsed in another language’s syntax, including that of python. I argue that it is doing it wrong because there’s only two good outcomes here.

1. You carefully ensure it’s valid python and LaTeX simultaneously, and then have an equation in your code that someone has to visualize or go render anyhow.
2. You don’t put it in the python code, but still make it visible to documentation tools. You no longer have to think about the clash, and it’s still rendered in your documentation.

If you find that option 1 works for you without issue, then use option one and feel free to disregard the part where I think LaTeX in docstrings isn’t useful to whoever is reading the docstring because it isn’t rendered there.

However, if you were using option 1, this change won’t break you, and I don’t find “you put it in uncarefully, such that this change breaks you” a thing worth consideration, because this mindset doesn’t just prevent this change, but also prevents adding new escape sequences ever, without it breaking someone.

bavalpey · December 15, 2024, 4:14am

Wow this thread is quite ridiculous.

Let’s all take a step back and think about a few things.

Making invalid escape sequences a syntax error will break existing code

Regardless of your opinions on the quality of the code that is to be broken, the fact is that breakage will occur.

This is the crux of the issue. Everything else in this thread is a distraction from this.

We need to evaluate the benefit that such a breaking change would introduce. If a substantial benefit cannot be stated, then this change cannot be argued to be worth the breakage.

So let’s examine the benefits that the change has. Some posts in this thread have identified outcomes, but their benefits have mostly been left unstated.

Some arguments I’ve collected:

This would allow python to add new escape sequences, such as \e

Okay. This is an outcome. What’s the benefit? Saving a bit of time over writing out the raw characters to achieve the same thing? This does not justify breaking code.
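For reference, a small sketch of what "writing out the raw characters" looks like today. The ESC character (code point 27) can already be spelled with the existing hex or octal escapes; the helper name `red` is hypothetical, and the ANSI colour codes are only there to give the character something to do.

```python
# ESC has no dedicated escape in Python, but existing escapes cover it.
ESC = "\x1b"

# All three spellings produce the same single character.
print(ESC == "\033" == chr(27))


def red(text):
    """Wrap text in the ANSI 'red foreground' and 'reset' sequences."""
    return f"{ESC}[31m{text}{ESC}[0m"


print(repr(red("warning")))
```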

Consistency - strings like “C:\Users” don’t translate the same way as something like “C:\Documents and Settings”

I am sympathetic to arguments for pedagogy and simplicity, but these benefits can never outweigh breaking existing code.

In short, I see no compelling argument as to why this change that breaks existing code is warranted. I side with Umar here.

bavalpey · December 15, 2024, 4:23am

The kind of breakage you describe here is on an entirely different level than causing the entire code to not work.

A docstring being misprinted is benign, a file name not translating to what was probably intended is worth looking at, but if the code still worked then it doesn’t matter.

Causing a SyntaxError? This halts the entirety of the program, and no part of it works.

These are not the same kind of “broken”

Also, your argument is misleading to begin with. Code with invalid escape sequences is not “broken”. What is broken is code containing valid escape sequences where no escape was intended. That code would be unaffected by this change, so it is off topic.

Rosuav · December 15, 2024, 5:12am

Do you realise that the ship has sailed?

You can argue all you like, but this change already happened last year. The exact version when this becomes a full-on error hasn’t been chosen yet, but the decision to do this happened quite some time ago now.

I’m sorry that you don’t see the benefit of it, but the language really is improved by this. It’s time to fix code that’s been broken, and thanks to the warning, you get several versions’ notice [1] before things actually stop compiling.

The alternative to doing it this way is either “Python must never create any new escape sequences, NO MATTER WHAT”, which is a pretty serious restriction on the language; or “Adding a new escape sequence creates a situation where valid code in successive versions of the language has subtly different meaning”. It is much much better to deal with the matter up-front.

[1] advance [Syntax]Warning, you might say

Wombat · December 15, 2024, 5:14am

Well said.

This would allow python to add new escape sequences, such as \e

Even if a new escape sequence were added, it would break less code than the impending break-everything SyntaxError.

And why does the language need a new escape character? Languages almost never do this. And even if the road were paved for making such a change, it would probably still be a bad idea.

Consistency - strings like “C:\Users” don’t translate the same way as something like “C:\Documents and Settings”

And there is no working around surprises like “\textfiles\notes.txt”. The unexpected tabulation will still have to be taught.
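The path quoted above can be checked directly. Both `\t` and `\n` are valid escapes, so this string never triggers the warning under discussion; it just silently contains a tab and a newline. A short sketch comparing it with the raw-string spelling:

```python
plain = "\textfiles\notes.txt"   # \t and \n are interpreted
raw = r"\textfiles\notes.txt"    # backslashes kept literally

print(repr(plain))
print(len(plain), len(raw))      # 18 20
print("\t" in plain, "\n" in plain)
print("\\" in plain)             # no backslash survives in the plain version
```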

The way most people are “taught” is to try to see what works. Everyone who did that will soon regret it because code that works now will stop working.

In short, I see no compelling argument as to why this change that breaks existing code is warranted. I side with Umar here.

Me too.

Rosuav · December 15, 2024, 5:14am

Oh, and to further emphasize how long ago this decision was made: What I just linked to wasn’t the decision to deprecate invalid backslash escapes, it was the decision to change it from a DeprecationWarning to a SyntaxWarning. The original deprecation happened way back in 3.6:

bavalpey · December 15, 2024, 5:44am

At the risk of the quality of discussion devolving, please answer a few things.

Am I to take your word for it, or are you going to give concrete and compelling examples?

As I pointed out, the code hasn’t been broken.

These are not the only alternatives, and such an argument that stipulates that they are is lazy at best.

What does python need to add new escape sequences for? Has it done so yet? Are there concrete plans to?
And what are the advantages of adding these new escape sequences?

This isn’t relevant. At all. Bad decisions are abundant, and just because something has been established is no argument that it is right.

A warning makes sense for this issue. An error does not.

umarbutler · December 15, 2024, 5:59am

The code isn’t broken though. It now raises a warning. But it’s not broken. Ignoring the warning, the code runs exactly as it did before.
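The claim above can be observed with a rough sketch like the following, which compiles a one-line script from text and records what the compiler reports. The helper name `run_and_collect` is hypothetical. This assumes an interpreter where the invalid escape is still only a warning (DeprecationWarning on 3.6 to 3.11, SyntaxWarning from 3.12); on a future version that makes it an error, `compile` would raise SyntaxError instead.

```python
import warnings

# Source text as it would appear in a script: single backslashes, no r prefix.
SOURCE = r'path = "C:\Windows\System32"'


def run_and_collect(source):
    """Compile and run source; return (namespace, warning categories)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record everything, filter nothing
        code = compile(source, "<demo>", "exec")
    namespace = {}
    exec(code, namespace)
    return namespace, [w.category for w in caught]


namespace, categories = run_and_collect(SOURCE)
print(categories)
# The backslashes are preserved, so the value matches the raw-string spelling.
print(namespace["path"] == r"C:\Windows\System32")
```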

umarbutler · December 15, 2024, 6:32am

I know there’s been a pretty big focus on LaTeX over the past few replies and indeed some of the examples I provided did show that some Python users have already inadvertently created effectively broken code by doing stuff like writing \textbf without escapes such that it will end up rendering as \ extbf instead of \textbf. Fair enough.

But let’s return to the example of Windows paths. There’s still at least 11.7k Python scripts (see here and here) on GitHub that contain strings beginning with C:\Windows and, actually, a ton of those paths are in fact completely valid.

Take the string "C:\Windows\System32". This contains no escape characters. That exact string occurs in at least 590 Python scripts on GitHub.

The change being proposed would break all of those scripts (assuming they’re not contained in comments, which most aren’t).

C:\Windows, with 11.7k matching Python scripts, is but one example.

What about C:\Program? There are 34.1k matching scripts… (see here and here). All of the examples on the first page of results do not use raw strings.

That is some serious breakage… I think any future Python release breaking all that code will end up with a bunch of people on these forums asking why their code suddenly stopped working.

The benefits should justify the breakage.

So far, the benefits proposed seem to be:

We can introduce \e (or any other escape code) later on without any additional breakage.
- My counter is, why not just add a SyntaxWarning in Python 3.14 (or another future release) for uses of \e and then break the code in Python 3.15? Why is it necessary to break C:\Windows, C:\Program, C:\Users, C:\Recovery, etc…?
New users will be forced to learn that they need to escape all backslashes so that their code doesn’t silently break in the future due to the addition of new escapes.
- My counter is, does that still really outweigh all this breakage? There’s a lot of other tricky stuff with Python that new users still need to learn painfully (eg, doing stuff like x = [1]; y = [x] * 2; y[0][0] = 2; y == [[2], [2]] eventually teaches you about mutability).
- Also why not just permanently raise SyntaxWarnings without ever breaking code in future releases to steer user behaviour but ultimately allow people to write C:\Windows if that’s what they want to do and also not break code unnecessarily.
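The mutability aside in the list above, written out so it can be run. The last three lines are an addition for contrast (independent copies), not part of the original one-liner.

```python
x = [1]
y = [x] * 2        # two references to the same inner list
y[0][0] = 2

print(y)           # [[2], [2]]
print(y[0] is y[1])

# For contrast: build independent copies instead of repeated references.
z = [list(x) for _ in range(2)]
z[0][0] = 3
print(z)           # [[3], [2]]
```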

Surely, there are more options for steering behaviour and/or paving the way for new escape codes that don’t involve breaking so much code.

The great thing about Python is that it’s really easy to iterate, it’s really easy to write quick and dirty scripts, and there’s so many quick and dirty scripts you can find online to adapt yourself. Adding new breakage makes it a little less easy for Windows users and risks making it a lot harder to find old unbroken quick and dirty scripts. It also adds a bunch of work for the Python community to go back and rewrite all those C:\Windows and C:\Program Files hardcodes.

And to reiterate, no, it’s not too late to go back. The code is not broken yet. And it is quite easy to stop it from being broken before it’s too late.

Rosuav · December 15, 2024, 6:41am

They have already been given. You have dismissed them as insignificant. At this point, there’s no arguing with you; clearly you place more value on the code continuing to behave as it previously has, than on consistency and dependability. That’s fine! Keep using older versions of Python and you won’t have an issue. This is ONLY going to affect future versions of Python. Nobody is stopping you from using something ancient, even Python 2.7, as is evidenced by some of the code presented above.

Wombat · December 15, 2024, 7:33am

The ship has not sailed. No working code has been broken yet.

The initial decision was made without input from affected users like Umar. It can still be changed.

The current release only emits a warning. The warning is likely a user’s first chance to realize how much they would be affected. Once the community of users notifies you of impending harm, then don’t make the damage real by escalating the warning to an error. Instead, accept the user feedback, remove the warning, and restore the prior state of affairs.

Nineteendo · December 15, 2024, 8:11am

I wonder if we could add a new prefix for Windows paths that allows a trailing backslash: p"C:\Users\".
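Some context for why a new prefix is being floated: an existing raw string literal cannot end in a single backslash, because the backslash still prevents the quote from closing the literal. A small sketch, using a hypothetical helper `compiles` to test source text without crashing the script, plus one common workaround:

```python
def compiles(source):
    """Return True if source parses as a Python expression."""
    try:
        compile(source, "<demo>", "eval")
    except SyntaxError:
        return False
    return True


BROKEN = 'r"C:\\Users\\"'   # the source text: r"C:\Users\"
FINE = 'r"C:\\Users"'       # the source text: r"C:\Users"

print(compiles(BROKEN))     # the trailing backslash swallows the closing quote
print(compiles(FINE))

# Workaround: append the separator as an ordinary escaped backslash.
with_sep = r"C:\Users" + "\\"
print(with_sep)
```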

Wombat · December 15, 2024, 8:34am

Maybe it will help to show a specific example that will soon be broken. This is an excerpt of an algorithmic diagram in my code:

Values:    A  B  C  D  E  F  G  H  I  J
Position:  0  1  2  3  4  5  6  7  8  9
Indices:         lo          hi
                 \--- valid --/

The string used to work fine but now triggers this output:

SyntaxWarning: invalid escape sequence '\-'

Rosuav · December 15, 2024, 8:41am

Values:    A  B  C  D  E  F  G  H  I  J
Position:  0  1  2  3  4  5  6  7  8  9
Indices:         lo          hi
                 |--- valid --|

umarbutler · December 15, 2024, 8:51am

Is there any counter to the two far less destructive solutions I’ve proposed here? Why break 100k+ scripts when you could break far less by adding whatever it is you want added, ie, why break every single invalid escape sequence permanently, when you could simply break \e or whatever else only?

I agree with Benjamin:

It’s not a couple dozen scripts that will be affected. It’s hundreds of thousands or more (a reasonable assumption when there are 34.1k hits for C:\Program alone, most of which don’t use raw strings, and that’s limited to GitHub, not private code bases, of which there are far more than public ones (and which can often be more ‘messy’)).

And as Benjamin asks, what is the actual benefit? We can already add \e or any other escape codes without breaking invalid escape codes.

malemburg · December 15, 2024, 12:17pm

I’m not sure what all this excitement is about.

Python 3.12 raises a SyntaxWarning for cases which need to be fixed, which is good, since not using raw literal strings for Windows paths is clearly a programming error which should be fixed. The previous (mostly invisible) DeprecationWarning has not resulted in getting the needed changes implemented, so the (visible) SyntaxWarning is a clear win.

Victor mentioned that the intent is to eventually make this a SyntaxError, but there has been no decision on when to make this breaking change. Given the nature of the change and the still widespread use of regular strings for Windows paths, this will likely take quite a while and possibly also need SC approval before it can go in.

To give this topic a positive spin, I’d suggest proactively contacting projects which are using regular literal strings for Windows paths and opening tickets or PRs. The needed changes are minimal, so easy to implement.
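As a sketch of how small such a change can be, here are three spellings of the same path that produce no warning. Which one a project prefers is a style choice, and the `PureWindowsPath` variant is just one option from the standard library, not something proposed earlier in this thread.

```python
from pathlib import PureWindowsPath

raw = r"C:\Windows\System32"          # raw string literal
doubled = "C:\\Windows\\System32"     # every backslash escaped
joined = str(PureWindowsPath("C:/", "Windows", "System32"))  # built from parts

# All three are the same 19-character string.
print(raw == doubled == joined)
print(joined)
```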

Thank you for your analysis of the problem and the useful Github queries you showed.

Wombat · December 15, 2024, 4:09pm

You don’t seem to get that this was legitimate code that worked before that will soon be broken for basically zero benefit.

Yes, we can all change our code to accommodate the impending SyntaxError, but we shouldn’t have to. Normally, this forum isn’t so anti-user.