PEP 701 – Syntactic formalization of f-strings

It feels oddly asymmetrical that the opening quote is present, but the closing quote is implicit. Does it really have to be that way?

2 Likes

No, but it makes the current implementation easier, and the asymmetry is not visible (at least from the C tokenizer).

In any case, the tokens in the PEP are there just as a reference to help understand how it works, but they are an implementation detail. For the tokenize module, we can obviously think of something that makes more sense as a public API.

1 Like

I think we need to be careful about relying on syntax highlighting as a cure for confusing syntax. I use and rely on syntax highlighting, but what about visually impaired programmers, or other folks for whom highlighting might not help or be available?

I do think it gets worse with nested f-strings, as in this PEP example:

f"{f"{f"infinite"}"}" + " " + f"{f"nesting!!!"}"

That’s incredibly difficult for a human to parse IMHO. If this is the price we have to pay for consistent implementation, and some arguably useful features [1], then we should ensure that the tools, from the Python parser, to editors, to linters, are aligned about how to help the user understand these complex f-strings.


  1. although TBH I’ve very rarely been hit by such f-string limitations and they are usually quite easy to work around ↩︎

Really? How about i + 1 in Python compared to C, where i is an int in both languages?

In case the gotcha isn’t obvious, what if i = 2147483647?
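
In case it helps, here's a quick illustration of the difference (my own example, not from the original post; C's fixed-width int overflows where Python's arbitrary-precision int does not):

i = 2147483647   # INT_MAX for a 32-bit C int
print(i + 1)     # Python prints 2147483648; in C, signed overflow is undefined behavior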

The current restrictions on f-strings are not what I would call a gotcha. If anything, they are (minor?) annoyances – things that cause an explicit error when you try them, not things which silently do the wrong thing.

I disagree.

I think that Python absolutely made the right choice here. If a language provides only one of early-bound and late-bound default values for functions, I'd argue that Python's choice of early binding is by far the better one. But that's an argument for another discussion :slight_smile:

Bringing the discussion back on topic:

Can you give an example of code which silently does the wrong thing instead of raising an exception because of the lack of re-use of quotes?

If it really is a gotcha, causing code to silently misbehave, that may push me from “oppose re-use of quotes” to “support re-use”. (Depending on just how contrived or realistic the gotcha is.)

1 Like

Then don’t engage with me. I certainly have no desire to argue my case with someone who thinks so little of me.

I disagree. Mutable defaults serve a valuable purpose.

How? Misusing mutable defaults can be hard to track down, but trying to misuse f-strings by reusing the quotes is hardly difficult to find and fix.
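
For readers who haven't hit it, the classic mutable-default gotcha alluded to here looks like this (a standard illustration, not code from this thread):

def append_to(item, bucket=[]):
    # the default list is created once, at function definition time
    bucket.append(item)
    return bucket

print(append_to(1))  # [1]
print(append_to(2))  # [1, 2] -- the same list is silently reused across calls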

1 Like

Personally, I found the prohibition on reusing the same quote mildly annoying.
f"You have the following in your basket: {", ".join(items)}." seems perfectly fine to me.

But I think I would find the restriction much more annoying if I knew that it was unnecessary, and that extra work had been put in just to stop me.

17 Likes

Couldn’t resist…

def bart(foo):
    f"""{(

    foo := foo + 1,
    print('Finally, curly brackets in Python ;-)'),
    [
        print('Teacher: You should not use full Python in f-strings.')
        for i in range(foo)
    ],
    print(f'But you can... {foo}-)')

)}"""

bart(7)

Finally, curly brackets in Python ;-)
Teacher: You should not use full Python in f-strings.
Teacher: You should not use full Python in f-strings.
Teacher: You should not use full Python in f-strings.
Teacher: You should not use full Python in f-strings.
Teacher: You should not use full Python in f-strings.
Teacher: You should not use full Python in f-strings.
Teacher: You should not use full Python in f-strings.
Teacher: You should not use full Python in f-strings.
But you can... 8-)

This is already possible in 3.10, but I strongly doubt that it was ever intended. Should we really head even further in the same direction?

Just to be clear: I like the PEP and support it, since it simplifies the implementation and removes some annoying bits (e.g. the backslash limitation), but I don’t like the removal of the string literal quoting rules and the support for arbitrary nesting of f-strings. By keeping the quote restriction, arbitrary nesting can be prevented, since with only four quote forms available the nesting depth is naturally bounded.

8 Likes

I guess the problem arises when scanning code swiftly, diagonally jumping from one quote/bracket/colon to another in a hot-potato style, reading everything else with peripheral vision.

Meanwhile, reading the code sequentially, line by line, left to right, causes absolutely no problem.

So I guess that quote reuse punishes some reading styles that worked perfectly before.

Having read through everything here, I still believe that the mere possibility of deeply nested f-strings (and the reuse of quotes) is not a good enough reason to disallow it at the parser level.

Ultimately (in my opinion), such restrictions belong in linters and code formatters rather than in the language grammar itself.

14 Likes

This is a bit of a silly excuse. To take an adversarial example, C++ is so complicated to parse that the only reasonable way to parse it reliably is to reuse an existing compiler frontend. Which is a concrete impediment to the development of an ecosystem of third-party source-level analysis (and/or transformation) tools.

Wearing my PyPy hat: Overall I like the PEP from a parsing implementation point of view, because it removes a lot of complexity in the current f-string implementation, and likely reduces bugs around position information of the expressions within f-strings as well. Well done @pablogsal, @isidentical and @lys.nikolaou!

My comment is about the changes to tokenizing (I’m staying out of the language design questions). Given that some of the complexity of implementing f-strings moves there, I think this part of the PEP:

How the lexer emits this token is not specified as this will heavily depend on every implementation

is taking the easy way out. I agree that it doesn’t make sense to prescribe details about the tokens (like whether the starting/ending quotes should be part of them). But I think the PEP should add some detail about how this can be implemented in a tokenizer, because the new approach to tokenizing f-strings is another Python feature that requires leaving classical regular-language-based tokenization.

One way to do this is to add some detail about how it works in CPython’s real tokenizer. Another option would be to add some Python code to the PEP that shows how it could be done in the pure-Python tokenization implementation in the tokenize module.
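
To illustrate the kind of detail I mean, here is a deliberately simplified sketch of mode-switching tokenization (my own toy code, not the PEP's algorithm: one quote style, no escapes, no format specs, no nested braces):

def tokenize_fstring(src):
    # Emits the PEP's FSTRING_START / FSTRING_MIDDLE / FSTRING_END tokens.
    assert src.startswith('f"')
    tokens = [("FSTRING_START", 'f"')]
    i, buf = 2, ""
    while i < len(src):
        ch = src[i]
        if ch == '"':  # closing quote: leave f-string mode
            if buf:
                tokens.append(("FSTRING_MIDDLE", buf))
            tokens.append(("FSTRING_END", '"'))
            return tokens
        if ch == "{":  # switch to normal expression tokenization
            if buf:
                tokens.append(("FSTRING_MIDDLE", buf))
                buf = ""
            end = src.index("}", i)  # a real lexer would track brace depth here
            tokens += [("OP", "{"), ("EXPR", src[i + 1:end]), ("OP", "}")]
            i = end + 1
            continue
        buf += ch
        i += 1
    raise SyntaxError("unterminated f-string")

print(tokenize_fstring('f"hello {name}!"'))

The point is that the lexer has to maintain a stack of modes rather than a single regular-language state, which is exactly the part the PEP currently leaves unspecified.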

Speaking of the pure-Python tokenize stdlib module. From what @pablogsal wrote above (and the following comments) it sounded like the plan is to decide what to do about tokenize independently of this PEP. If that’s the plan, the PEP should mention it.

1 Like

Will the change to PEG parsing improve the speed of f-strings? I notice that in Python 3.11/3.12 there is a speed difference between simple %-formatting and f-strings. As an example, here are some timeit outputs for several Pythons:

$ for v in 37 38 39 310 311 312; do echo -n "python$v %: "; python$v -mtimeit -s 'a,b,c=1,2.1,"ABCDEF"' '"%s %s %s"%(a,b,c)'; echo -n "python$v f: "; python$v -mtimeit -s 'a,b,c=1,2.1,"ABCDEF"' 'f"{a} {b} {c}"'; done
python37 %: 500000 loops, best of 5: 446 nsec per loop
python37 f: 500000 loops, best of 5: 454 nsec per loop
python38 %: 500000 loops, best of 5: 411 nsec per loop
python38 f: 500000 loops, best of 5: 427 nsec per loop
python39 %: 1000000 loops, best of 5: 396 nsec per loop
python39 f: 500000 loops, best of 5: 410 nsec per loop
python310 %: 500000 loops, best of 5: 407 nsec per loop
python310 f: 500000 loops, best of 5: 408 nsec per loop
python311 %: 1000000 loops, best of 5: 302 nsec per loop
python311 f: 1000000 loops, best of 5: 368 nsec per loop
python312 %: 1000000 loops, best of 5: 307 nsec per loop
python312 f: 1000000 loops, best of 5: 363 nsec per loop

So it seems that %-formatting has sped up recently, but f-string formatting hasn’t as much.

That’s a fair point.

Perhaps it would be enough to rephrase the PEP so that it does not encourage such use and instead refers to it as an implementation detail. At the moment, it reads in a way that promotes reusing the same quotes inside f-string expressions as a feature.

4 Likes

Thanks, everyone, for your feedback and for raising your concerns. Your thoughts are fundamental to us, and we want to ensure they are correctly reflected in the document and considered when decisions are made. I have prepared a first PR to incorporate some of the important points raised so far. Apologies if I have missed something; I will make another pass at a later date. Also, feel free to open PRs yourselves if you have enhancements to the document in mind, and we can discuss them in the PR.

Thanks a lot for your help and your insights!

1 Like

Make sure you compare '%s' to '{x!s}', or '%r' to '{x!r}'. In 3.11–3.12, the dis module shows that these typically compile to identical bytecode, so they should have exactly the same performance. As far as I know, this PEP will result in no bytecode changes and thus have no runtime performance effect on existing code, though it might be worth spelling that out explicitly.
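
That's easy to verify yourself; for example (my own quick check; the exact bytecode varies by CPython version):

import dis

# Compare the compiled code for equivalent %-formatting and f-string conversion:
dis.dis(compile('"%s" % (a,)', '<str>', 'eval'))
dis.dis(compile('f"{a!s}"', '<str>', 'eval'))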

2 Likes

Will do :+1:

Personally, I dislike the idea of nesting f-strings. Preventing it should be easy: just check that you don’t see another FSTRING_START token before seeing an FSTRING_END token. Otherwise we might end up with code like this:

print(f"({print(f"[{print("print")}]")})")

But I do like the idea of making the parser simpler. And the code golf people might like it. So maybe it will have to be one of those “Just because you can, doesn’t mean you should!” kind of things?
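
For what it's worth, that check is easy to sketch on top of the tokenize module, which on 3.12+ emits the PEP's tokens (a hypothetical linter-style helper, not something proposed in the PEP):

import io
import tokenize

def has_nested_fstring(source):
    # Track f-string nesting depth via FSTRING_START / FSTRING_END
    # (these token types only exist on Python 3.12+).
    depth = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.FSTRING_START:
            depth += 1
            if depth > 1:
                return True
        elif tok.type == tokenize.FSTRING_END:
            depth -= 1
    return False

print(has_nested_fstring('x = f"{f"inner"}"\n'))  # True
print(has_nested_fstring('x = f"plain {y}"\n'))   # False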

1 Like

This PEP looks great! I would be happy to see it implemented :smiley:

However, I think that the ability to reuse quotes is problematic. I’ve seen people here mention potential security, IDE, and maintainability issues. But I think that just from the simple readability aspect, this can be quite confusing and unintuitive (which I believe to be the most important aspect to consider).

The {} syntax in f-strings just doesn’t “feel” like it should be able to do this. For example, if the original syntax for f-strings had used the backtick character instead of braces, I would say otherwise, since the backtick “feels” quote-like; it feels like we’re using a different kind of quote, and therefore reuse should be possible. But the brace character does not have this intuitive property (at least for me).

Preventing nesting entirely wouldn’t be backwards compatible: f-strings can already be nested if you use different quote styles for each level. I wonder if a bit of a compromise might be possible, though? Infinite nesting could be allowed, but with the restriction that an f-string can’t reuse the immediately surrounding quote type. The alternating quotes might make it easier to match each pair up:

>>> f"{f'{f"{f'{f"{f'{1+1}'}"}'}"}'}"
1 Like

Good idea; the timings seem much closer for the later Pythons when we explicitly specify the conversion (e.g. !s).