PEP 787: Safer subprocess usage using t-strings

With the acceptance of PEP 750, I would like to propose PEP 787: adding t-string support to the subprocess module.

Support for t-strings inside the subprocess module was a part of PEP 501. PEP 501 was closed, giving preference to PEP 750, but PEP 750 did not have this part of it.

I think this would serve as a good consumer reference implementation in the standard library for how to implement t-strings, to show the advantages, and capabilities, of t-strings. We are aiming for python 3.14, to pair well with PEP 750, but also understand that would be a very aggressive timeline, and are willing to put to python 3.15 if the timeline for python 3.14 does not work out.

13 Likes

Thanks for doing this, I think it’s a very valuable use of t-strings.

My first major comment on this PEP is that I think it glosses over the fact that quoting isn’t consistent across platforms. There are enough qualifications that the PEP isn’t wrong, but it should probably mention in the abstract and the first description of shlex.sh that they are for POSIX-compatible quoting only (at least until we one day decide how to implement platform-native quoting, or more likely, shell-specific quoting), while subprocess and system can (will?) be platform-specific.


With that in mind, I like this definition of the behaviour:

When subprocess.Popen is called without shell=True, t-string support would still provide subprocess with a more ergonomic syntax. For example:

subprocess.run(t"cat {myfile} --flag {value}")

would be equivalent to:

subprocess.run(["cat", myfile, "--flag", value])

But I don’t like this bit:

or, more accurately:

subprocess.run(shlex.split(f"cat {shlex.quote(myfile)} --flag {shlex.quote(value)}"))

I don’t have a problem with showing an example of how it’ll be calculated, but please frame it as an example implementation and not part of the specification (which is really just a case of changing “more accurately” to “hypothetically” or “potentially”). Or just leave the last example out completely if you prefer.

It’s been a while since I looked at how well shlex.split and quote round trip, but it’s possible there could be a lot of work to make this implementation reliable. And I’m pretty sure we’d have to diverge in a number of cases anyway (e.g. backslashes don’t get treated the same in all cases). But provided we don’t make the PEP reliant on those functions explicitly, we can sort out details in implementation without breaking any contract.

12 Likes

True, we don’t want to lock in that behaviour as the exact implementation, just make it clear that the problem isn’t as simple as chucking the individual template fragments in a list.

Perhaps we should include a pair of the more complicated examples that drive the need for a “join and split again” approach like:

t"cat '{dirpath}/{fname}'"
t"cat -v {filepath}'"

Then we would give the quote/split solution as the reference algorithm, but explicitly call it out as an implementation detail rather than as part of the spec (so we’re free to adjust it if edge cases require us to do so).

5 Likes

I usually write subprocess.run("run command".split()) which is fairly compact, easy to write, and safe IIUC. But I didn’t see that spelling mentioned in the PEP, maybe it should be discussed somewhere?

Very nice use of a t-string indeed.

By the same token we can make sqlite3.Cursor.execute support a t-string for named parameters that are both safe and DRY as well (in a separate proposal).

It’s generally safer to simply leave off the .split and pass the command as a single string. Your version doesn’t handle quoting, but will add quotes to each element (if required) as they are processed, which is only going to be safe/correct if quotes are never required. In this case, subprocess is just going to " ".join(your_list) and use that to launch the process.

Safely adding user-provided arguments into the command is where a t-string (or proper quoting algorithms) become relevant. The challenge being that we need to first split up t"run command {argument}" correctly so that the argument can be quoted/escaped.

1 Like

I was thinking there was something I’d missed, thanks!

This is the example that makes it all worth it, because you don’t want to end up with "cat ''spam'/'eggs''" at the end of it. So at the very least, the algorithm needs to split first, substitute (possibly multiple values in a single argument), and then quote before joining.

But we don’t need to put all the details in the PEP, just enough to indicate that we intend to handle cases like this properly. Then we can handle all the edge cases the implementation, as we do with os.path and friends. All that’s required now is to not specify ourselves into a particular implementation.

1 Like

On windows maybe, but on posix Popen works with a list.

3 Likes

Ah, you’re right that it’s different. A nuance I’d never noticed - passing a str to Popen on POSIX is equivalent to passing a single argument, not the entire command line:

        def _execute_child(self, args, executable, ...):
            """Execute program (POSIX version)"""

            if isinstance(args, (str, bytes)):
                args = [args]

That’s unfortunate, I prefer to have our APIs such that users can write the same code regardless of platform. I’ve never heard of this causing a cross-platform issue before though, so I guess everyone just uses lists?

Either way, my point about the t-string behaviour being more complicated than originally proposed still stands.

5 Likes

I’m a Linux system admin who frequently uses subprocess run executing as root.

In my opinion a developer who users shell=True or the string version of subprocess.run is making a poor choice. The current subprocess documentation correctly states:

args is required for all calls and should be a string, or a sequence of program arguments. Providing a sequence of arguments is generally preferred, as it allows the module to take care of any required escaping and quoting of arguments (e.g. to permit spaces in file names).

(emphasis mine).

Putting a shell in the way of Python and what you’re trying to execute adds a layer of indirection that often has no benefit.

I don’t think this is good first use of t-strings.

9 Likes

I see people trip over that one quite often. Normally it’s a Windows user who writes somethng that works for them then wonders why they get nonsense when they go cross platform. Honestly, I had assumed that subprocess.run("string", shell=False) not being an error was an oversight. I’m surprised that this PEP is marking it as a feature.

3 Likes

(replying to Geradwx)

I think you misunderstand: this does not propose to add shell=True to subprocess calls, but to let people write one string (convenient) and get a list of arguments properly separated and escaped (safe).

4 Likes

As far as I can tell, it’s not; can you point me to where it does?

However, I agree that run(str, shell=False) is more of an attractive nuisance than a useful feature. I would be in favor of the PEP deprecating str/bytes as the args argument to Popen et al (clarifying edit: for the shell=False case), with the eventual goal of only accepting a list or Template to avoid doing the wrong thing when one forgets the t prefix on their args template. To me, without something like that, this is really just a logical feature that just needs maintainer approval rather than something that needs a whole PEP.

1 Like

Sorry, got my wires crossed.

1 Like

Do we really want the subprocess.run(t"echo {message_from_user}", shell=True) variant? It’s only even theoretically safe insofar as a compliant shell is used (and shlex.quote() has zero bugs in it). It’s more exposed privilege escalation via things like shell aliases. And the safety is easily stripped by a careless shellism. e.g. (somewhat fabricated I know)

subprocess.run(t"echo `echo {untrusted_input}`", shell=True)

If nothing else, wouldn’t it be nice to avoid having 5 overloads of subprocess.run()?

2 Likes

It shouldn’t be any worse than run(["echo", message_from_user], shell=True).

The t-string here is really just a way to reliably split the string (and it may get recombined later). Since f"echo {message_from_user}".split() (or any other variant on .split) isn’t safe, because the insertion has already happened. But a t-string passed to a function that knows to treat each substitution as a single argument can safely split into ["echo", message_from_user] and then let subprocess.run quote it correctly.

This kind of example is why I say we need to be able to split safely in the first place and then substitute, and also why the PEP shouldn’t try to cover all these cases exhaustively.

1 Like

Honestly, I’d probably never use this feature, and I’d continue to require code to use the list form of args (with shell=False).

The only realistic use I could see for this feature would be the case where you’re using shell=True and you’re explicitly using shell-specific features like pipes. So I’d very much expect it to appropriately quote interpolated arguments in the manner appropriate for the shell I’m using. Of course, that’s hard (if not impossible), given that the subprocess module doesn’t document what shell it will use (Presubably the user’s default shell? Or the shell that invoked the Python process?) and shells have different quoting rules.

IMO, it would be very bad if subprocess.run(t"some command {with} {interpolations}", shell=True) failed to correctly quote the command for my shell. Which I think is what the PEP as it stands does :slightly_frowning_face:

And while the case of subprocess.run(t"some command {with} {interpolations}") is clearly defined by the PEP, relying on the shlex (POSIX) quoting and splitting rules seems like it would be a huge footgun for people (Windows users, for example) who aren’t familiar with POSIX. For example, shlex defaults to using single quotes, which aren’t recognised by the Windows cmd.exe shell. And shlex.split("a\\c") apparently returns ['ac'] - which is completely wrong on Windows (and to be honest, I can’t make sense of it as POSIX…).

Hmm. In the process of getting some examples for this post, I’ve now convinced myself that I would never use this feature, and would argue strongly against anyone else ever using it. So I guess I’m -1 on the PEP as it stands. Which is a shame, because I expected this to be a good use case for t-strings :slightly_frowning_face:

13 Likes

It’s no worse than subprocess.run(["some", "command", with, interpolations], shell=True) failing to quote the command for your shell (which it often fails at…).

The c is escaped by the backslash, so it does makes sense. shlex.split actually parses the entire string and then returns each item minimally escaped - since you don’t need to escape c, the backslash is omitted.

And I think it’s blindingly obvious that subprocess processing should be platform-specific, even if shlex processing isn’t.

Explicit reliance on shlex is a major flaw with the current PEP text. It’s hardly worth continuing any discussion until that’s corrected. Adding features to shlex is fine, but subprocess is a cross-platform module, so it isn’t and cannot be defined in terms of the shell-specific shlex module.

7 Likes

Am I missing something, or why can’t the implementation just be subprocess.list2cmdline([p.value if isinstance(p, Interpolation) else p for p in template]), using the exact same semantics as a list of strings when shell=False?