I am strongly against this, because I don’t see why this needs to be an error everywhere. Backslashes in non-raw strings can reduce verbosity, for example in regexes. And there are cases where raw strings make things more verbose.
This is a scripting language, and it has allowed this way of writing strings for a long time. It should not break something so simple while it is still version 3. It should not deviate from this expectation.
To mirror Chris’s attitude, if you want this change so desperately, you can always write your own version of Python.
Add a new escape sequence, fine. But labeling code that has a backslash before a character that isn’t an escape as “broken” is wrong. And I want to stress this. It’s wrong. Code that has this works as intended. It functions. The code will operate. What will make it broken is causing code with absolutely no issues to error.
Does it indicate a potential issue? Yes. This is what linters are for. It is not the job of the language to hold the hands of its developers, preventing them from writing code how they see fit.
I would have been against the initial deprecation way back in 3.6, had I been privy to its discussion. And I’m against it now.
This is going to get repetitive, but I very much dislike the characterization of backslashes in non-raw strings as broken. Does it work as a literal backslash except when in front of the 8 (9 if you count escaped quotes) characters? Yes? Okay, so if you don’t need a backslash in front of one of those and so you use it verbatim, then you wrote broken code?
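For concreteness, here is a small sketch of the behavior I mean. It evaluates literals from source text so the compile-time warning can be captured; `evaluate_literal` is just a helper name I made up for illustration. As far as I know, the category is DeprecationWarning on 3.6 through 3.11 and SyntaxWarning from 3.12 on.

```python
import warnings


def evaluate_literal(source):
    """Evaluate a string literal given as source text.

    Returns the resulting value and the names of any warning
    categories raised while compiling it. (Hypothetical helper.)
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        value = eval(source)
    return value, [w.category.__name__ for w in caught]


# A backslash before a character that is not a recognized escape:
# the backslash stays in the string, and a warning is emitted.
value, categories = evaluate_literal(r'"\d"')
print(repr(value), categories)

# A recognized escape: the two source characters become one newline,
# and nothing is warned about.
recognized, recognized_categories = evaluate_literal(r'"\n"')
print(repr(recognized), recognized_categories)
```

The first literal comes out as two characters, a backslash followed by `d`, which is exactly what the author typed.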
What exactly is the problem? It’s adjacent to error-inducing behavior? At what point did Python become a language that took away agency from its users under the guise of protecting them?
I had wanted my initial post to be one and done, but admittedly I engaged in the back and forth myself.
Let’s all go back to it so we can remind ourselves of the only thing that matters:
Making invalid escapes into a syntax error will break existing code.
Any time there is code breakage, there needs to be a significant demonstration of its advantage.
Unless there are serious security ramifications, it is not on the defenders of the status quo to show that the breakage is significant.
It is on those advocating for the breaking change to prove, with a compelling case, that the change provides significant advantages.
And the advantages posed, at least in this thread, are not significant enough for the large impact this will have. This already high bar is rightly set even higher by the fact that this breakage will occur in string literals that people treat as comments, such as docstrings. Let that sink in. Your scientific codebase may no longer work because, three dependencies deep, there’s a docstring that is now considered a syntax error.
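A rough sketch of that scenario, under some assumptions: `SOURCE` stands in for a hypothetical dependency whose docstring contains a LaTeX-style `\sum`, and turning warnings into errors is only my approximation of what the proposed change would do, not the change itself.

```python
import warnings

# Hypothetical dependency source. The outer literal is raw so that the
# inner docstring keeps its backslash exactly as a package author typed it.
SOURCE = r'''
def mean(xs):
    """Return the mean: (1/n) \sum x_i."""
    return sum(xs) / len(xs)
'''


def compiles(source, warnings_as_errors):
    """Return True if the source compiles under the given warning policy."""
    with warnings.catch_warnings():
        warnings.simplefilter("error" if warnings_as_errors else "ignore")
        try:
            compile(source, "<dependency>", "exec")
        except SyntaxError:
            return False
    return True


# Status quo: the module compiles and the function works.
compiles_today = compiles(SOURCE, warnings_as_errors=False)
# Approximation of the proposal: the same module no longer compiles.
compiles_strict = compiles(SOURCE, warnings_as_errors=True)
print(compiles_today, compiles_strict)
```

Nothing about the function body changed between the two calls; only the docstring decides whether the module loads.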
So let’s visit the advantages, and then we as a community can determine whether they are significant enough to warrant making completely valid code break.
- Allows Python to add new escape sequences.
This is an outcome, not an advantage. Can the same exact sequences not already be spelled out? If so, then what’s the advantage? That it’s easier to write? I personally don’t think convenience warrants breakage.
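To show what I mean by "spelled out", suppose the wanted sequence were something like `\e` for the escape character. That choice is my assumption, used only for illustration. The character can already be written several ways today:

```python
# Existing spellings of the escape character (code point 27).
ESC_HEX = "\x1b"
ESC_OCTAL = "\033"
ESC_UNICODE = "\u001b"
ESC_CHR = chr(27)

# All four produce the same one-character string.
assert ESC_HEX == ESC_OCTAL == ESC_UNICODE == ESC_CHR

# Example use: an ANSI "bold" sequence built from the hex spelling.
bold = ESC_HEX + "[1m" + "bold" + ESC_HEX + "[0m"
print(repr(bold))
```

A shorter spelling would be nicer to type, but it would not let anyone express something they cannot express now.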
If you really need the new sequence, then just introduce it. This is fine, you have already done your due diligence by having invalid escapes marked as deprecated for 6 minor versions, and marked as a warning for 2 (or more depending on when the new escape is added).
No one here is arguing that new escape sequences should be forbidden.
- The existing status quo is confusing for new users.
Quite frankly, no one here is qualified to speak definitively on whether this is easier for novices one way or the other. We are not novices. Actual studies need to be conducted and examined for this line of argument to hold any water.
Anecdotally, I have firsthand experience teaching intro computer science students. This is not something they struggle with in my experience. Having to learn to recognize 8 escape sequences is a really low bar. Even if it were 12, or 15, it’s nothing. They already know about not being able to use certain variable names because they are keywords.
I’ve never had anyone express confusion over \n being special while \d, for example, is fine.
- Code with invalid escapes is more likely to inadvertently contain real escape characters.
This is true, but the key word here is likely. It is not guaranteed. And that is why it is very appropriate for this to remain a warning. It should be a warning, and it should be visible. But this is only “more likely”. The likelihood of invalid escapes left in code is already small.
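A minimal sketch of the "likely, not guaranteed" point, using two made-up Windows-style paths. The literals are evaluated from source text with warnings silenced so the resulting values can be compared; `evaluate_quietly` is a hypothetical helper.

```python
import warnings


def evaluate_quietly(source):
    """Evaluate a string literal from source text with warnings silenced."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return eval(source)


# Only unrecognized escapes here: both backslashes survive as typed.
harmless = evaluate_quietly(r'"C:\data\docs"')

# Same style of path, but \t is a real escape, so a tab sneaks in.
surprising = evaluate_quietly(r'"C:\temp\docs"')

print(repr(harmless))
print(repr(surprising))
```

The first path is exactly what the author intended. The second is the case a visible warning is there to flag, since it uses the same habit but silently produces a tab.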