TL;DR
- The legacy
os.spawn*
functions don’t handle spaces or quotes in arguments on Windows, as they are just naively concatenated without quoting/escaping - This is unsafe, leads to many subtle bugs (in CPython’s own test suite, even) and inconsistent with the POSIX behavior
- Presently, they are little-used, considered a security risk and most uses have some form of these bugs
- At minimum, fix the tests and warn about this in the docs, as well as at least one of the following:
- Deprecate these little-used, unsafe legacy functions in favor of
subprocess
(more work up front, but best long-term) - Fix the issue with the Windows arg handling (most straightforward, but a compat-breaking behavior change for users with existing workarounds)
- Expose an existing manual workaround (
subprocess.list2cmdline
) that users can use in both 3.12 and earlier versions
- Deprecate these little-used, unsafe legacy functions in favor of
The Problem
Right now, as @storchaka originally documented on python/cpython#75729 and as I confirmed just now, the legacy os.spawn* functions do not correctly handle spaces (or inner quotes) within arguments on Windows, and instead just naively concatenate the argument list, due to the underlying spawnv implementation. This results in arguments being split on spaces (unless quoted), and any quotes parsed as argument groupers instead of retained.
For example:
>>> import os, pathlib, sys
>>> script = pathlib.Path('test_spawnv_issue.py')
>>> script.write_text('import sys; print(sys.argv)')
>>> os.spawnv(os.P_WAIT, sys.executable, [sys.executable, script, 'test arg'])
['test_spawnv_issue.py', 'test', 'arg']
>>> os.spawnv(os.P_WAIT, sys.executable, [sys.executable, script, 'literal "quotes"'])
['test_spawnv_issue.py', 'literal', 'quotes']
>>> script = pathlib.Path("test spawnv issue.py")
>>> script.write_text('print("Hello world!")')
>>> os.spawnv(os.P_WAIT, sys.executable, [sys.executable, script])
python.exe: cant open file '.\\test': [Errno 2] No such file or directory
This is not only inconsistent with its behavior on POSIX platforms (which preserve the arguments as entered), but is quite unsafe and leads to a variety of bugs in various cases, including the unit tests for these functions in Python’s own test suite, which fail when there are any spaces in the working dir path (e.g. when the username contains a space).
The primary current workaround is to manually quote any arguments that contain spaces; however, neither the behavior nor the workaround is currently documented in the spawn* docs. Furthermore, the right quote handling algorithm, as used by [subprocess.list2cmdline
](subprocess.list2cmdline (which is not documented, but used internally by subprocess
to avoid this, is quite non-trivial and of the third-party users I surveyed, none actually got it right themselves.
Therefore, it seems something further should be done to address this; however, each possible approach has some caveats, which is why @storchaka suggested discussing this more broadly.
Survey of current usage
FWIW, there are 438 hits for os\.spawn[lv]
on grep.app and 2 hits for from os import spawn[lv]
(matching any of these functions), which is a very small number (there are 30 000 for os.system
(), 13 000 for subprocess.run
, 32 000 for subprocess.Popen
and 23 000 for subrocess.call
, and also less than most of the libraries removed by PEP 594. Checking each of the first 50:
- 17 (34%) are from security tools/demos warning that using this function is unsafe
- 16 (32%) are just vendored copies of Python/its stdlib
- 3 (6%) are platform-specific code/libraries for non-Windows platforms
- 6 (12%) are Scons and vendored copies of it, which uses naive quoting
- 2 (4%) are vendored copies of Pydev, which uses a less naive but still not correct quoting
- 3 (6%) are standalone scripts and packages that are formally retired or very old/unmaintained for 5-10+ years
- 1 (2%) is part of a modern, well used library (opencv), in a utility submodule, and does not use quoting
As a sidenote, there were 443 hits for subprocess.list2cmdline
and 55 hits for from subprocess import list2cmdline
, more than all of the os.spawn*
functions combined, especially when considering actual genuine unique uses by popular, maintained projects, despite it being not documented and not in __all__
.
Key Takeways
- These functions see little use in still-used public Python code, with perhaps ≈20-40 total unique instances across GitHub
- Most genuine unique uses are unquoted, and none actually get the quoting correct
- Over half of non-stdlib hits were actually in security tools/demos warning how it could be exploited
The Solutions
First off, we should at minimum do the following, which are non-controversial and can be backported to earlier versions:
- Fix the existing tests so they work reliably with the current behavior (opened as PR python/cpython#99150)
- Add a warning in the docs explaining the problem and the workaround
Then, we have three options for how to handle this in 3.12 and above:
- Fix the issue with the Windows arg handling (at the cost of a compat-breaking behavior change)
- Deprecate these little-used, unsafe legacy functions in favor of subprocess
- Expose a workaround that users can manually use in both 3.12 and earlier versions
Option 1: Deprecate
We could just deprecate these legacy functions in favor of subprocess
, and remove them in two or more releases from now. Theoretically, just the Windows versions could be dropped, but IMO it should be done for the functions as a whole if they are no longer useful, since they have other problems (security, etc) and it would make existing and future code using them non-portable, and is also potentially more challenging to communicate to users. Conversely, if they are still useful, it would be both less work on our and users’ end to just implement the above fix.
These functions are relatively little-used in user code, and are used only two other places in the stdlib, once in a test of os.waitpid
and once in a test of tempfile
; both already use and require subprocess
, so could presumably be migrated to that.
-
+
Removes little-used, insecure/unsafe, redundant legacy functionality that has several other issues in favor of the much safer, more powerful and easier to usesubprocess
module -
+
Users get deprecation warnings and a deprecation period instead of an immediate behavior change -
+
User code can immediately switch tosubprocess
alternatives, instead of having to potentially handle both the <3.12 and >=3.12 behavior -
-
Requires somewhat more change to existing user code -
-
More work to implement up front (in exchange for less long-term maintenance burden) -
-
The existing functions go away
Option 2: Fix
In Python 3.12, the undocumented subprocess.list2cmdline, which implements the correct quoting logic for subprocess
, could be imported (inside a function, just like other stuff around it does), and then the Python wrappers for the relevant spawn*
functions on Windows could be added/tweaked (some already have them, some don’t) to call it on each arg before passing them to the inner functions. The one issue with this option is that user code that already quotes args with spaces (generally incorrectly) would need to remove that logic for Python 3.12+
-
+
Simplest to implement in CPython -
+
Retains the existing functionality -
+
Requires least (or no) change to existing user code -
-
Breaks, potentially non-obviously, existing user workarounds (without adding more complexity) -
-
No reliable way to first deprecate existing behavior (without adding more complexity) -
-
Legacyspawn*
functions still have other problems vs.subprocess
Option 3: Work around
There is a third alternative—document subprocess.list2cmdline
publicly and describe in the above docs, with a prominent warning about the problem how to use it to work around it, and don’t change os.spawn*
at all. As a docs-only change, this could be backported to bugfix versions, as an easier to use and more correct workaround than trying to manually describe the algorithm (or undocumented functionality).
-
+
Least amount of work for us up front -
+
Doesn’t break working (if unreliably) user code -
+
Could be backported to existing bugfix versions -
-
Requires that all users work manually read the docs and correctly work around the problem themselves -
-
Legacy, unsafe and redundantos.spawn*
still have other issues -
-
We need to makelist2cmdline
public, which there were previous concerns about
Recommendation
Given the very low usage per the survey, the other problems with these functions, and their redundancy with subprocess
that avoids all of the same, as well as consideration to the backward-compat policy and ensuring a smooth transition and reducing long-term techdebt and user errors, I would recommend the options in the order listed above.
If we deprecate these functions, users will have clear warning of any change at least several versions in advance, per our deprecation policy, and will be able to immediately move to a long-established and much improved alternative, rather than having to add condition code to cope with an immediate behavior change. This ends up being simpler for us and users longer-term, and we can help the very small number of active users migrate.
Alternatively, if it doesn’t make sense to remove these functions, the fix for Windows seems relatively straightforward to implement with the help of subprocess.list2cmdline
, and we can help the small number of downstream uses this would adversely affect (those already naively quoting their args) either adapt to the change or (probably simpler) just migrate to subprocess
…though, in that case, we may as well have just deprecated them for removal.