Please don't break invalid escape sequences

bwoodsend · December 19, 2024, 1:26am

Even if your invalid escape sequences are confined to docstrings and you don’t care whether those docstrings get misinterpreted, your code can still break if a new escape sequence requires specific characters to follow it.

As an example, imagine that \U didn’t exist before and someone wrote:

def foo():
    """C:\Users"""

Introducing \U at that point would cause a SyntaxError, because \U must be followed by exactly 8 hexadecimal digits. Even comment strings (string literals that are neither assigned to a variable nor used as a docstring, and which the compiler strips as an optimisation) aren’t a safe place for invalid escapes.
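The claim is easy to check with the built-in `compile()`. This is a minimal sketch; the helper `compiles` and the sample sources are made up for illustration. Since `\U` already exists in today's Python, the docstring from the post fails to compile, and so does an unassigned string statement, while a raw string is fine.

```python
def compiles(source: str) -> bool:
    """Return True if `source` compiles as a module, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# The docstring from the post: "\U" must be followed by 8 hex digits,
# and "sers" is not hex, so the whole module is rejected.
docstring_src = 'def foo():\n    """C:\\Users"""\n'

# A bare string statement that is never assigned fares no better.
bare_string_src = 'x = 1\n"C:\\Users"\n'

# A raw string keeps the backslash literally and compiles fine.
raw_src = 'def foo():\n    r"""C:\\Users"""\n'

for name, src in [("docstring", docstring_src),
                  ("bare string", bare_string_src),
                  ("raw docstring", raw_src)]:
    print(f"{name}: {'compiles' if compiles(src) else 'SyntaxError'}")

# Expected output:
# docstring: SyntaxError
# bare string: SyntaxError
# raw docstring: compiles
```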

Rosuav · December 19, 2024, 1:34am

Those words were spilled as a result of the mistaken idea that string literals are comments. They are not. They never have been. Some people take advantage of the fact that they get optimized out, but they always have been and always will be strings.
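One way to see that an unassigned string literal is a string and not a comment is to look at the parse tree. A small sketch using the standard `ast` module, assuming Python 3.8+ (where string literals appear as `ast.Constant`); the sample source is made up.

```python
import ast

source = '''
def foo():
    """I am a docstring."""

"I am a bare string statement, not a comment."
# I am a real comment.
'''

tree = ast.parse(source)

# The docstring is the first statement in foo's body (an Expr wrapping a str).
func = tree.body[0]
print(ast.get_docstring(func))

# The bare string is a module-level Expr node holding a str Constant.
bare = tree.body[1]
print(type(bare).__name__, repr(bare.value.value))

# The comment leaves no trace in the tree: only two statements remain.
print(len(tree.body))

# At runtime the docstring is an ordinary attribute
# (unless running under python -OO, which strips docstrings).
namespace = {}
exec(compile(source, "<example>", "exec"), namespace)
print(namespace["foo"].__doc__)
```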

sirosen · December 19, 2024, 2:09am

Yep!

In fact, it’s a good thing that nobody has written code in which docstrings are parsed and used to drive runtime behavior! I’ve definitely never done that! Not me!
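For anyone who hasn't met the pattern sirosen is joking about: some libraries (PLY is a well-known example) read docstrings at runtime and treat them as data. The sketch below is a hypothetical mini-lexer in that style, not PLY's actual API; each rule function's docstring is its regular expression. Written as a raw string, the pattern is safe. Written as a plain `"\d+"`, it only works today because Python keeps unrecognized escapes verbatim, and it would silently change meaning if `\d` ever became an escape.

```python
import re

# Hypothetical rule functions: each docstring *is* the token's regex.
# Note: running under python -OO strips docstrings, so this code would fail.
def t_NUMBER(text):
    r"\d+"
    return int(text)

def t_NAME(text):
    r"[A-Za-z_]\w*"
    return text

RULES = [t_NUMBER, t_NAME]

def tokenize(source: str):
    """Yield (rule name, value) pairs, matching rules by their docstrings."""
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        for rule in RULES:
            match = re.compile(rule.__doc__).match(source, pos)
            if match:
                # "t_NUMBER" -> "NUMBER", and the rule converts the text.
                yield rule.__name__[2:], rule(match.group())
                pos = match.end()
                break
        else:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")

print(list(tokenize("x 42 y7")))
# [('NAME', 'x'), ('NUMBER', 42), ('NAME', 'y7')]
```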

bavalpey · December 19, 2024, 3:19am

I know and agree. String literals are not comments. It is highly unfortunate that there is a substantial amount of code that is improperly treating them as such, though. This is the reality.

I want to point out a few issues with the current plan.

Suppose a new escape sequence is added, say \e, and assume it does not behave like \U (that is, it doesn’t demand particular characters after it). Code that already contains \e will then run into this issue.

Here we have a problem of continuity: the same code will go from warning → error → no error across three different versions.

Specifically, the current plan will work like so:

Python 3.A: SyntaxWarning. The code still functions, in that it can be run.

Python 3.B: Invalid escapes result in a SyntaxError. The code will not run, no matter how little the escape matters to the code’s core functionality.

Python 3.C: \e is added as an escape. The code runs as before, except that \e is now interpreted differently. This may or may not be benign (e.g. a file path containing \e would silently turn into a different path).
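The 3.A and 3.C stages can be illustrated concretely. In this sketch the path `C:\engine` and the helper `evaluate` are made up, and the meaning given to `\e` (ESC, U+001B, as in the GCC extension and Bash) is purely hypothetical for Python.

```python
import warnings

def evaluate(src: str):
    """Evaluate a string-literal expression, recording compile-time warnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        value = eval(compile(src, "<example>", "eval"))
    return value, [w.category.__name__ for w in caught]

# The literal "C:\engine" as it would appear in someone's source file.
value, categories = evaluate('"C:\\engine"')

# Stage 3.A (today): the unknown escape is kept verbatim, plus a warning
# (DeprecationWarning on 3.6-3.11, SyntaxWarning on 3.12+).
print(value == "C:\\engine", categories)

# Stage 3.C (hypothetical): if \e meant ESC, the same source would silently
# produce a different string, with no warning and no error.
hypothetical_future_value = "C:\x1bngine"
print(value == hypothetical_future_value)
```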

I personally don’t think this is meaningfully better than warning → no error. It is especially not worth breaking these codebases between Python 3.B and 3.C.

That is, the warning should be sufficient! You, as Python developers, have done your due diligence by using the SyntaxWarning to tell users that their code may not work as expected in the future. This never needs to escalate to an error.

You are not silently breaking code. You have made a lot of noise with the warnings issued ever since Python 3.6.

Also, I know firsthand that people in these communities don’t upgrade Python with every new release. E.g., one project I am aware of began on 3.6, went to 3.8, and now uses 3.11.
A project like that could skip 3.B entirely, so upgrading the warning to an error will not even catch every case.

Rosuav · December 19, 2024, 6:12am

In 3.6 the warning was there, but as it turns out, most people don’t run with warnings enabled (it was a DeprecationWarning, which is hidden by default). Not everyone sees SyntaxWarning either, but more people do.
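For anyone who wants to find these escapes rather than wait for them to break, one option is to compile the source yourself with warnings recorded. A minimal sketch, assuming you can read the source files; `find_invalid_escapes` is a hypothetical helper, not a standard-library function.

```python
import warnings

def find_invalid_escapes(source: str, filename: str = "<string>"):
    """Compile `source` and return (line, message) for each invalid escape.

    Nothing is executed; we only collect the compile-time warnings, which are
    DeprecationWarning on Python 3.6-3.11 and SyntaxWarning on 3.12+.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, filename, "exec")
    return [
        (w.lineno, str(w.message))
        for w in caught
        if issubclass(w.category, (DeprecationWarning, SyntaxWarning))
        and "invalid escape sequence" in str(w.message)
    ]

example = (
    "import re\n"
    "\n"
    "PATTERN = re.compile('\\d+')\n"  # the source text contains '\d+'
)
for line, message in find_invalid_escapes(example, "example.py"):
    print(f"example.py:{line}: {message}")
```

To scan a whole tree, call it with `path.read_text(encoding="utf-8")` and `str(path)` for each `path` in `pathlib.Path(".").rglob("*.py")`. Note that these warnings are raised only when the source is actually compiled, not when Python loads an already-cached `.pyc`, which is one more reason they are easy to miss. On 3.12+, running with `-W error::SyntaxWarning` turns them into hard errors.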

bavalpey · December 19, 2024, 6:14am

Sorry, I should clarify. The project I mentioned is an academic project akin to those described by Umar. It does not suffer from the invalid escape issue.

gerardw · December 21, 2024, 1:26pm

This has been a long and contentious thread because there are no good answers and, it seems, a lack of concern for the wider Python user community.

I literally have a job because maintaining old code is such a PITA.

Python was designed to be user-friendly, and a reasonably intelligent scientific person can write a Python script that works well today running locally. Writing code that keeps working in the future, installed in a read-only directory, and that copes with things like network outages and vanishing websites and APIs requires more of a professional programmer’s skill set. The current incentive structure in science (publish!) is such that it’s rare for a lab to put its resources into paying for that. A lot of work is done by graduate students and postdocs who move on to other labs. Third-party updating of code is problematic, since reproducibility is a key component of the FAIR principles, non-programmer script writers typically don’t write regression test suites, and suitable input data is often hard to locate.

“Just use the older Python interpreter” isn’t a long-term solution for many folks. Eventually, old Python versions age out of sources like deadsnakes. I just had to track down the source for 2.7 because it’s not on Ubuntu 24, and there are no resources to port old but still usable and scientifically valuable programs to Python 3.

“We provided warnings” isn’t as useful as some in this thread seem to think. I get warnings from third-party packages. They are often rather vague (something somewhere is wrong), and I end every day disappointing someone because I didn’t get done what they needed. I’m not likely to track down and fix a warning unless it’s obvious and it’s in code I have access to. (E.g. maybe I’ll check whether there’s a new version on PyPI, but I’m not going to fork a GitHub repo and recompile just to clear a warning.)

To the issue at hand – I fully expect that either the escape sequence change will break something at some point or I’ll retire first and won’t care, because, yeah, we can’t freeze the language as is.

But we can definitely tone down the rhetoric on Python Ideas.

nedbat · December 21, 2024, 4:24pm

Your first and last sentences don’t seem to match. Did I misunderstand something?