PEP 501: (reopen) General purpose string template literals

nhumrich · March 9, 2023, 4:10am

In collaboration with Nick Coghlan (the original author of PEP 501), we would like to propose reopening PEP 501 after 7 years of being deferred. In those seven years, f-strings have become common, idiomatic Python, and people are familiar and comfortable with them. Also, since then, PEP 701 has been accepted, which adds even more power and a reference implementation for PEP 501.

As such, a PR has been created in favor of reconsidering PEP 501, with additions and improvements based on things learned: Re-open PEP 501 in consideration of PEP 701 by nhumrich · Pull Request #3047 · python/peps · GitHub

On top of learning from other PEPs, a major change to PEP 501 is renaming “String Interpolation” to “template literals”, in line with popular naming around a similar feature in TypeScript/JavaScript.

A built version of the new PEP is currently located at: PEP 501 – General purpose string template literals | peps.python.org

ncoghlan · March 11, 2023, 12:25pm

Thanks once again for working on this @nhumrich!

For folks reading the thread, note that we’ve received some excellent editorial comments from @CAM-Gerlach, so it will take some time to work through those and get the update PR to a point where we republish the PEP itself. Fortunately, the PEP PRs have rendered previews (as linked in the initial post) these days.

encukou · March 22, 2023, 12:31pm

Thanks! I’m looking forward to the PEP.

Here are some comments from reading the current draft.

The !a, !r and !s conversion specifiers supported by str.format [are] replaced in the parsed template with the corresponding builtin calls, in order to ensure that field_expr always contains a valid Python expression

This means that, for most purposes, the difference between the use of conversion specifiers and calling the corresponding builtins in the original template literal will be transparent to custom renderers. The difference will only be apparent if reparsing the raw template, or attempting to reconstruct the original template from the parsed template.

I’d assume that if you need to re-parse/re-evaluate the template, you’ll usually want the difference between repr(foo) and foo!r preserved. The former depends on the current value of repr – a user can easily shadow it with a local, or monkeypatch builtins.
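The difference is already observable with today's f-strings, so a minimal sketch can show it without any template literal support. The class `Foo` and the function `render_both` are made up for illustration.

```python
class Foo:
    def __repr__(self):
        return "<Foo>"


def render_both():
    foo = Foo()

    def repr(obj):  # local name that shadows the builtin repr
        return "shadowed"

    # !r always uses the object's real repr; repr(foo) looks the name up at run time.
    return f"{foo!r}", f"{repr(foo)}"


conversion, call = render_both()
print(conversion)  # <Foo>
print(call)        # shadowed
```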

It looks like the proposed subprocess behaviour would be unexpected/unsafe on Windows. There’s no way to pass the posix argument through to shlex.split. Should subprocess set it based on what kind of shell it calls? (Can it even know that?)

You’ll want to look at the current PEP template for new recommended sections. “How to teach this” would be relevant here.

nhumrich · March 24, 2023, 6:29pm

you’ll usually want the difference between repr(foo) and foo!r preserved

They would still be preserved in the raw_template attribute of the TemplateLiteral. The raw_template matches exactly what the code shows (what the user typed).

Should subprocess set it based on what kind of shell it calls? (Can it even know that?)

It can know that. Subprocess already has two implementations: one for POSIX and one for Windows. So we could potentially only support template literals for POSIX shells? I’m wondering if anyone has thoughts on this.

GalaxySnail · March 25, 2023, 3:53am

It should be safe with shell=False, because arguments are quoted by shlex.quote and parsed by shlex.split. It’s useful syntactic sugar for shell-like scripts. I agree that it should be clearly documented that subprocess.Popen with t-strings uses POSIX shell syntax even on Windows.

On the other hand, it is indeed unsafe with shell=True on Windows. shlex is designed for POSIX-compliant shells, while cmd.exe has its own ~~chaotic~~ syntax. Here is some discussion about it: Why is subprocess.list2cmdline not documented. IMO making shlex support cmd.exe or PowerShell is definitely out of scope of this PEP.

Should we add a runtime check such as:

if os.name == "nt" and shell and isinstance(args, TemplateLiteral):
    raise TypeError("t-string is not supported with `shell=True` on Windows")

or just warn about it in the documentation and leave it to linters? It may be a footgun on Windows, so I personally prefer the former.

pf_moore · March 25, 2023, 9:00am

I’m a strong -1 on the proposed subprocess behaviour. It’s not even reliable on all POSIX systems - it wouldn’t work if you used PowerShell as your POSIX shell, for example.

The motivating example of os.system(f"echo {message}") seems wrong to me - in my experience no-one uses os.system any more, as its insecurities are well known. subprocess.run(["echo", message]) is the standard idiom these days in non-trivial code (and even in most trivial scripts).

I’m happy with the idea that the main use of template strings will be in application specific cases, where usage is tightly controlled, or in 3rd party libraries where limited use cases can be supported, but IMO stdlib implementations should be held to the same levels of portability and robustness as every other part of the stdlib (there’s a reason shlex is a relatively under-used stdlib module…)

GalaxySnail · March 25, 2023, 9:20am

FHS suggests that /bin/sh should be a POSIX compatible shell (or a hard or symbolic link to one). There are still some programs using system() on Linux. There are still many shell scripts with a #!/bin/sh shebang which are written in POSIX shell syntax. Anyway, it wouldn’t be safe to use PowerShell (or fish, or some other POSIX incompatible shell) as /bin/sh.

pf_moore · March 25, 2023, 10:50am

This doesn’t alter my view that we shouldn’t be using os.system as a motivating example (we should be recommending subprocess.run) and we shouldn’t be adding functionality to the stdlib that appears to make shell=True as safe as using an argv list, but without actually doing so cross-platform (which is not practical).

I guess I’m OK with having a shlex.sh renderer, as that is clearly subject to the same limitations as the shlex module as a whole. What I’m not comfortable with is having subprocess silently use that shlex renderer. The usage subprocess.run(shlex.sh(t"echo {something}"), shell=True) is explicit and makes the non-portability clear.
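For readers unfamiliar with the idea, here is a rough sketch of what such a renderer does. It is not the PEP's implementation: `shlex.sh` does not exist in the standard library, so the hypothetical `sh_render` below takes a `str.format`-style template as a stand-in for a t-string and quotes every substituted field with `shlex.quote`.

```python
import shlex
from string import Formatter


def sh_render(template: str, **values: object) -> str:
    """Hypothetical stand-in for the proposed shlex.sh renderer."""
    parts = []
    for literal, field, _spec, _conversion in Formatter().parse(template):
        parts.append(literal)  # literal text is passed through untouched
        if field is not None:
            # every substituted field is quoted for a POSIX shell
            parts.append(shlex.quote(str(values[field])))
    return "".join(parts)


command = sh_render("echo {something}", something="hello; rm -rf ~")
print(command)  # echo 'hello; rm -rf ~'
```

The rendered string only makes sense for a POSIX shell, which is the limitation being discussed here.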

And this part of the PEP:

Alternatively, when subprocess.Popen is run without shell=True, it could still provide subprocess with a more ergonomic syntax. For example:

subprocess.run(t'cat {myfile} --flag {value}')

would be equivalent to:

subprocess.run(['cat', myfile, '--flag', value])

is just flat-out wrong on Windows. And yet, people will assume that it’s correct, and introduce bugs into their scripts.

or, more accurately:

subprocess.run(shlex.split(f'cat {shlex.quote(myfile)} --flag {shlex.quote(value)}'))

This qualification is correct, but the difference between what this does, and what the statement it’s clarifying seems to do (to a Windows user) is IMO a major bug magnet.

Let’s just omit the change to subprocess. That’s the simplest approach, and requiring people to explicitly use shlex.sh shouldn’t be too big a burden.

GalaxySnail · March 25, 2023, 12:48pm

I agree. subprocess.run(..., shell=True) is better than os.system as a motivating example.

That makes sense. A subprocess function depending on a shlex function is something implicit. I think good documentation can make it not that bad.

It’s correct on Windows. In fact, it is platform-independent. The following 4 examples are exactly equivalent:

subprocess.run(t'cat {myfile} --flag {value}')
subprocess.run(shlex.split(shlex.sh(t'cat {myfile} --flag {value}')))
subprocess.run(shlex.split(f'cat {shlex.quote(myfile)} --flag {shlex.quote(value)}'))
subprocess.run(['cat', myfile, '--flag', value])
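The t-string forms cannot be run on a Python without template literals, but the equivalence of the last two forms can be checked today. A small sketch, with deliberately awkward sample values chosen for illustration:

```python
import shlex

# Sample values chosen to contain spaces, quotes and shell metacharacters.
myfile = "my file; rm -rf ~.txt"
value = "it's $HOME"

# Quote each field, build one POSIX-style command line, then split it again.
quoted = f"cat {shlex.quote(myfile)} --flag {shlex.quote(value)}"
argv = shlex.split(quoted)

# The round trip recovers the original values as single arguments.
assert argv == ["cat", myfile, "--flag", value]
```

Because the result is an argument list and no shell is involved, this part does not depend on which shell the platform provides.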

A better example:

subprocess.run(t'grep -F "search pattern" {myfile} --color={value}')

It’s equivalent to:

subprocess.run(['grep', '-F', 'search pattern', myfile, f'--color={value}'])

It is useful for creating cross-platform scripts. For example:

subprocess.run(t'git -6 -q -b dev -c user.name=bot -c user.email={email} '
               t'--single-branch --filter=blob:none --sparse --no-checkout {url}')

It is much simpler and more clear than this one:

subprocess.run(['git', '-6', '-q', '-b', 'dev', '-c', 'user.name=bot',
                '-c', f'user.email={email}', '--single-branch', '--filter=blob:none',
                '--sparse', '--no-checkout', url])

It’s fine with me, but I would be happy if this functionality could be accepted as well.

pf_moore · March 25, 2023, 1:03pm

Untrue.

>>> import shlex
>>> from subprocess import run
>>> cmd = 'py -c "import sys; print(sys.version)"'
>>> run(shlex.quote(f"{cmd} 1 2 3"), shell=True)
''py' is not recognized as an internal or external command,
operable program or batch file.
CompletedProcess(args='\'py -c "import sys; print(sys.version)" 1 2 3\'', returncode=1)

Note the single quotes - single quote is not a valid quote character for the Windows shell, so this confuses the heck out of the interpreter (it confuses the heck out of me!).

I don’t think we want the support burden of people asking “why didn’t my program work, it’s just like the docs describe” in cases like this…

GalaxySnail · March 25, 2023, 1:09pm

Paul Moore:

Untrue.
>>> cmd = 'py -c "import sys; print(sys.version)"'
>>> run(shlex.quote(f"{cmd} 1 2 3"), shell=True)
''py' is not recognized as an internal or external command,
operable program or batch file.
CompletedProcess(args='\'py -c "import sys; print(sys.version)" 1 2 3\'', returncode=1)
Note the single quotes - single quote is not a valid quote character for the Windows shell, so this confuses the heck out of the interpreter (it confuses the heck out of me!).

What I was talking about is shell=False (which is the default value). As said above, I agree that shell=True is unsafe and definitely doesn’t work on Windows.

GalaxySnail:

Should we add a runtime check such as:
if os.name == "nt" and shell and isinstance(args, TemplateLiteral):
    raise TypeError("t-string is not supported with `shell=True` on Windows")
or just warn about it in the documentation and leave it to linters? It may be a footgun on Windows, so I personally prefer the former.

eryksun · March 25, 2023, 8:47pm

If I understand this proposal correctly, the shell=False case is okay on Windows because the work to quote() fields gets reversed by shlex.split(). Then a Windows command line gets created by subprocess.list2cmdline(). This would allow POSIX developers to write a template command line according to POSIX shell rules and have it automatically translated into a Windows command line. However, it’s more work than directly translating a template using a Windows-specific quote() function as the field renderer.
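The last step of that pipeline can be inspected directly. This sketch compares the POSIX command line produced by `shlex.join` with the Windows one produced by the undocumented `subprocess.list2cmdline` helper for the same argument list; the sample arguments are illustrative.

```python
import shlex
import subprocess

argv = ["cat", "my file.txt", "--flag", "value"]

posix_line = shlex.join(argv)                 # POSIX shell quoting rules
windows_line = subprocess.list2cmdline(argv)  # helper Popen uses on Windows

print(posix_line)    # cat 'my file.txt' --flag value
print(windows_line)  # cat "my file.txt" --flag value
```

Both helpers are importable on every platform, so the comparison runs anywhere, but only the second string follows the quoting convention Windows programs expect.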

steve.dower · March 26, 2023, 8:01pm

The greatest benefit of this proposal, particularly in the subprocess case but also the SQL case, is to not require, offer or promote any “convert this to a properly quoted string” functions at all.

Based on integrating pathlib, it’s best to make the conversion entirely transparent (at least for people who aren’t implementing the APIs accepting them). That way, subprocess.run can convert a t"" string to a list of arguments, rather than necessarily going through a quoting step followed by an unquoting step (and a user who explicitly chooses to convert to a quoted string rather than passing the original object is just like someone who passes str(my_pathlib_Path_object) - slow, and potentially incorrect).

This is going to imply non-string-like semantics for the template literals, but that’s what makes it useful. If it were just a slightly more complex f"" string, it wouldn’t be worth it. It’s being able to say in the subprocess docs “when passed a t"" string, each substitution is treated as an entire argument and may be quoted, along with any directly adjacent or quoted text”. So now t"py my_{value}.py" and t'py "my {value}.py"' [1] work as you’d expect, and the recipient can handle the quoting exactly as needed (including for shell=True cases on Windows).
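A very rough sketch of the "no quoting step at all" idea follows. It is an assumption-laden illustration, not a proposed design: the hypothetical `to_argv` takes a `str.format`-style template in place of a t-string, splits only the literal text on whitespace, glues each substituted value onto any directly adjacent text, and ignores quote characters in the literal text entirely.

```python
from string import Formatter


def to_argv(template: str, **values: object) -> list[str]:
    """Hypothetical: build an argument list without quoting or unquoting."""
    args = []
    current = None  # None means no argument is in progress
    for literal, field, _spec, _conversion in Formatter().parse(template):
        for ch in literal:
            if ch.isspace():
                # whitespace in literal text ends the current argument
                if current is not None:
                    args.append(current)
                    current = None
            else:
                current = (current or "") + ch
        if field is not None:
            # a substituted value is never split, whatever it contains
            current = (current or "") + str(values[field])
    if current is not None:
        args.append(current)
    return args


argv = to_argv("py my_{value}.py", value='a "quoted value" with spaces')
print(argv)  # ['py', 'my_a "quoted value" with spaces.py']
```

Handling quoted literal text such as t'py "my {value}.py"' would need real parsing rules on top of this, which the sketch deliberately leaves out.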

But it relies on actively discouraging using shlex.sh unless you are implementing the API. Regular API users should just pass the literal directly and let the receiver handle it (or raise).

And it also relies on us implementing the APIs. One of the biggest mistakes made with pathlib was in not making its objects accepted everywhere as soon as we could. So people had 2 whole releases to add str() calls everywhere. We should get that right this time.

[1] e.g. with value='a "quoted value" with spaces', which is notoriously hard to quote

encukou · March 27, 2023, 8:31am

If that’s where we end up, it would be great! But it’s important, and far from trivial, to do this correctly. I’m suggesting that it’ll need to be its own PEP, rather than a section in the present one.
It would be sad if this was another API that “runs the shell” or whose arguments “can be quoted”. What should run(t'''py 'my_{value}.py' "my_{value}.dat"\t--{value}''') do? With, say, value='haha"\0"'? IMO, the needed research is PEP-sized, even if the solution is crisp, simple and transparent to users.

pf_moore · March 27, 2023, 9:38am

That seems reasonable. But it emphasises my point that we shouldn’t introduce any behaviour to subprocess in this PEP, so that we remain free to add the better behaviour later. And that includes, in my view, adding a shlex.sh wrapper that’s intended to be used with subprocess, as that will introduce churn if people add it and then need to remove it later to get the improved behaviour (as @steve.dower said, like the mistake we made with pathlib).

nhumrich · April 15, 2023, 1:47pm

Splitting the subprocess and shlex behavior out into its own PEP makes sense. I will work on removing that area of the proposal from this current version of the PEP.
Other than the subprocess changes, how do we feel about the rest of the PEP?

jimbaker · April 24, 2023, 8:56pm

Hi, jumping in here because of a conversation we are having now at PyCon sprints with @CAM-Gerlach , @pauleveritt , and @guido -

Please take a look at work we are doing on https://jimbaker/tagstr. In particular, with respect to the older version of PEP 501 (need to get current with the reopened version, where as I understand it, at least i → t), there’s this issue where we compare this work - Comparison to PEP 501 · Issue #7 · jimbaker/tagstr · GitHub along with an implementation of an i tag linked to that issue (tagstr/interpolation_template.py at main · jimbaker/tagstr · GitHub).

Note that we are currently updating the proof of concept in the work in Update proof of concept with respect to PEP 701 changes · Issue #22 · jimbaker/tagstr · GitHub

rmorshea · April 24, 2023, 9:04pm

Please take a look at work we are doing on https://jimbaker/tagstr

The link to the repo is actually: GitHub - jimbaker/tagstr: This repo contains an issue tracker, examples, and early work related to PEP 999: Tag Strings

pauleveritt · April 26, 2023, 11:46am

I’m pitching in a little on PEP writing with Jim and Guido. @nhumrich I will go through your effort first and understand the PEP and its structure.

kknechtel · April 29, 2023, 12:18am

The architecture of this seems really neat.

For example, since template literal expressions are arbitrary Python expressions, string literals could be used to indicate cases where evaluation itself is being deferred, not just rendering: logging.debug(t"Logger: {'record.name'}; Event: {event}; Details: {data}")

It would be nice to be able to leverage some of this architecture to create t-strings with deferred evaluation at runtime: i.e., a way to convert a string (not necessarily literal!) like "Logger: {record.name}; Event: {event}; Details: {data}" explicitly at runtime into a TemplateLiteral instance equivalent to t"Logger: {'record.name'}; Event: {'event'}; Details: {'data'}". That would be huge for i18n purposes, for example: we could read the template string from an l10n resource bundle.

Yes; right now that can be done more or less by just using .format on an ordinary string. However, this would be more flexible (since it opens up the context-sensitive formatting machinery), potentially more powerful (as f-strings are), and potentially simpler to use (perhaps there could be some simple way to provide the formatter with the namespace to use for deferred evaluation, rather than passing each variable explicitly as a separate keyword argument or having to write weird stuff like **locals() only to miss out on outer scopes anyway).
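For comparison, this is roughly what the existing approach looks like when the namespace is passed as a mapping rather than as keyword arguments. The helper name `render_deferred` and the sample data are made up for illustration; note that `str.format_map` only supports attribute and index lookups in fields, not arbitrary expressions, and none of the context-sensitive rendering described above.

```python
from collections import ChainMap
from types import SimpleNamespace


def render_deferred(template: str, *namespaces: dict) -> str:
    """Fill a runtime-loaded template from several namespaces, first match wins."""
    return template.format_map(ChainMap(*namespaces))


# The template could come from a resource bundle instead of a literal.
template = "Logger: {record.name}; Event: {event}; Details: {data}"

inner_scope = {"event": "connect", "data": {"retries": 3}}
outer_scope = {"record": SimpleNamespace(name="app.db")}

message = render_deferred(template, inner_scope, outer_scope)
print(message)  # Logger: app.db; Event: connect; Details: {'retries': 3}
```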