The CPython issue “Deprecate and remove code execution in pth files” (python/cpython#78125) has been stalled for years due to the significant backwards compatibility challenges of changing how startup code execution works. It’s getting some attention again now due to the role *.pth files can play (and have recently played) in turning a temporary compromise of an updated dependency into a compromise of all code execution in a given Python installation or virtual environment.
There are benign uses for *.pth files (for example, editable installs rely on them, and setuptools uses one to implicitly upgrade legacy distutils installation scripts to adhere to modern packaging standards), and the sitecustomize and usercustomize modules also have legitimate use cases.
However, those legitimate use cases don’t generally include existing packages suddenly deciding to add such files to a new release with zero forewarning.
We’ve historically thought of this as an interpreter-level problem to solve, but perhaps it would make sense to instead treat it more like the shift from all package installs requiring arbitrary code execution to most package installs being able to run from auditable binary packages, and encourage package installers to default to disallowing (or at least noisily warning about) installation of *.pth files or top-level sitecustomize or usercustomize modules unless the packages are explicitly approved to do so? (Approvals could come either from the installer tool developers, in the case of widely known instances like the setuptools → distutils modernisation hack, or from end users in the general case.)
Edit: requiring PyPI admin approval for the distribution of implicit startup files was also suggested in the CPython issue. That seems reasonable to me, since legitimate use cases for distributing implicit startup files via PyPI are genuinely rare, but I’d see it as complementary to this idea (it would protect every PyPI client, regardless of the installer they use, but would still leave the escalation vector open in scenarios involving other indexes without a corresponding filtering mechanism).
This very much aligns with what I’ve been thinking about a lot recently[1]. *.pth files aren’t the only attack vector[2], but they are a significant one. All you have to do is get a single dependency package to install a *.pth file, and the environment is compromised.
From a quick look into my “package testing” environment (which has lots of random packages installed), I only have two *.pth files: one from setuptools, the other from coverage. So perhaps an allowlist approach is a feasible solution.
Though my conclusion was that this is an inherent weakness of an open package distribution system such as PyPI. ↩︎
For example, installers currently permit any package to overwrite any file on the system, including overwriting files installed by other packages. So a compromised minor dependency that you don’t even end up importing can easily overwrite a commonly used module. ↩︎
While .pth files are a surprising attack vector for most users, in practice are they actually more significant than hijacking a popular import name?
For example, if a malicious package hijacked sys, let all calls pass through to the real sys, but also executed its own code, would the impact have been measurably different from a .pth file?
I’m not against discouraging .pth files, I just think we need to be clear that this does not make running a Python environment you’ve installed 3rd party code into really any safer.
The case I have in mind is publishing an sdist-only hijacked package with a malicious build backend.
If pth files are just too easy to exploit, we can pursue locking them down, but we should remain clear that doing so only addresses one of many ways that a compromised package is dangerous.
Regarding the ideas themselves here, my concern is alert fatigue and UX for users. If installers start warning users “Hey, pre-commit-uv uses a pth file! Are you sure you want to proceed?” we’re probably teaching those users to ignore the warning. So I don’t think that solution puts the added friction around this feature in the right place. Package authors and PyPI admins should be the ones to bear the extra burdens, since they’re the people best qualified to understand these features.
I know PyPI has quarantine capabilities. Could that be a way to address this? “Sudden introduction of pth files” would be treated as an indicator of compromise and trigger quarantine. And set up some pathway for pre-approval of legitimate use cases? This makes the mechanism a bit more generic, so that we’re setting up a solution which can cover other known attack vectors. @miketheman can probably tell me why this won’t work.
Another example of hijacking the environment without .pth files (albeit UNIX-only, and requiring the user to run an entry point). This one takes advantage of dirname(sys.argv[0]) -> sys.prefix/bin coming before the standard library in sys.path:
```toml
# pyproject.toml
[project]
name = "spam"
version = "1.2.3"

[project.scripts]
"pathlib.py" = "spam:malware"
```
```
> pip install .
> pip --help
Traceback (most recent call last):
  File "/tmp/16155/env/bin/pip", line 5, in <module>
    from pip._internal.cli.main import main
  File "/tmp/16155/env/lib/python3.12/site-packages/pip/_internal/cli/main.py", line 10, in <module>
    from pip._internal.cli.autocompletion import autocomplete
  File "/tmp/16155/env/lib/python3.12/site-packages/pip/_internal/cli/autocompletion.py", line 10, in <module>
    from pip._internal.cli.main_parser import create_main_parser
  File "/tmp/16155/env/lib/python3.12/site-packages/pip/_internal/cli/main_parser.py", line 9, in <module>
    from pip._internal.build_env import get_runnable_pip
  File "/tmp/16155/env/lib/python3.12/site-packages/pip/_internal/build_env.py", line 6, in <module>
    import pathlib
  File "/tmp/16155/env/bin/pathlib.py", line 5, in <module>
    from spam import malware
ModuleNotFoundError: No module named 'spam'
```
(Of course a real package would actually provide spam.malware, which could possibly even do something clever to load the original pathlib and forward its contents so that the user doesn’t notice anything is wrong.)
I imagine that malware distributors use .pth files just because they’re the most obvious option. Take it away and they’ll just move onto the next one.
Sure, and there are some such vectors (like naming conflicts between packages) that we’ll likely never close completely, because it’s too hard to tell the difference between legitimate usage and suspect usage, so the false positive rate would be too high for a mandatory check at the index server level to be feasible. A feature to error on RECORD conflicts at installation time, instead of following the current “last installed wins” approach, might be feasible, though (legitimate use of multiple packages distributing the same files should still result in only one owner of each file in any given environment; they just might have different owning packages in different environments).
Script names that shadow stdlib modules, or data files that target stdlib paths are even sketchier in distributed packages than the implicit startup files are, though, so they’re more an argument for a generalised suspicious file filtering feature than they are an argument for continuing to presume that all distributed implicit startup files are benign just because we know some of them are.
I think the only really viable way to do this is to add the hypothetical pip audit command and have it include certain checks like this (which is likely even more valuable than checking CVE data, tbh). If we can progressively migrate that to on-by-default and breaks-in-CI, then I think we’re in an actually good spot.
Relatively easy checks we could do post-install:[1]

- any .pth files with arbitrary code
- any RECORD files with mismatched hashes
- any RECORD files referencing “unusual” locations (such as inside other top-level package names)
- bundled native executables
- non-Python shebangs in scripts
Obviously anyone can opt into doing these checks already, but that’s not the problem we face. The problem we have to deal with is the default behaviour doesn’t check, and so people using the defaults get caught. So the most important aspect of this whole discussion must be that we are changing the default (with migration period), not merely adding new non-default ways to protect yourself.
At severe risk of turning this thread into an argument about each of these. Don’t do that. If you’re mad about these suggestions, go start a new thread. Mods, if people want to argue about these suggestions, please just split them into a new thread. These are here for context and to support my argument that a post-install audit command could do more than just CVE data (the usual proposal) or look for .pth files (the current topic), not to start 100 side arguments about the specifics. ↩︎
If we’re auditing for malicious behavior, and not just unintentional bad behavior, then the problem is that pip is typically part of the Python environment, so a pip audit command could itself be hijacked by a malicious install.
Such an audit would either need to run from a separate Python environment, or as part of the install itself (with the guarantee that pip does not import any modules once install starts, which is not currently guaranteed [I have an open PR towards this]).
Given these constraints, I would strongly suggest this be a separate tool that is designed to be installed independently of the Python environment it is auditing.
I agree this will take a multi-pronged approach and that “closing the pth hole” will still only partially address the overall issue. I just want to add a couple of points[1].
A solution that relies solely on packaging tools will only ever be partial. Not everyone uses pip, and there are definitely other indexes than PyPI out there.
It’s likely useful to separate out the two functions of pth files: extending sys.path and executing arbitrary code. The latter is IIRC kind of an accident of history; pth processing in site.py executes lines that start with ‘import’, which I think was added to support some Zope thing ages ago. Then people remembered that Python supports ; for separating statements. There have been some pretty nasty hacks around that particular quirk.
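That quirk is easy to demonstrate with site.addpackage(), the routine site.py uses to process an individual .pth file (shown here against a throwaway directory rather than a real site-packages):

```python
# Demonstration of the quirk described above: site.addpackage() is what
# site.py calls for each .pth file, and it exec()s any line beginning
# with "import" - including whatever follows a ";" on that line.
import os
import site
import tempfile
from pathlib import Path

sitedir = tempfile.mkdtemp()
# One .pth line: a legitimate-looking import, then arbitrary code.
Path(sitedir, "demo.pth").write_text(
    'import os; os.environ["PTH_DEMO"] = "code ran at startup"\n'
)
site.addpackage(sitedir, "demo.pth", set())
print(os.environ["PTH_DEMO"])  # the ";" payload executed
```

A plain path line in the same file would just be appended to sys.path, which is the only behaviour most users expect from .pth files.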
The Python interpreter should be part of the solution, since it’s literally the thing that can be affected. Maybe that’s some -X switches to control how site.py handles pth files, rather than the big -S hammer.
The problem is we need it to be the default. Anything that requires the user to take more responsibility means the user could also do 100 other things to protect themselves (including using a different package index that doesn’t end up with blatantly malicious packages on it).
So for this forum, that means fixing PyPI, pip, and/or CPython, since those are the defaults under our control. If we can’t come up with a fix for any of these, then we just can’t fix it and need to advertise the risks and recommend users switch to a tool that can fix it.
Really hope we don’t go the route of having installers interactively prompt for confirmation. IMO installers should be able to work non-interactively. Think of all the provisioning flows out there in CI instances, Docker, etc. Ones that do prompt (Chocolatey is one) end up having to provide a CLI option to pre-answer, and that just becomes part of a user’s muscle memory (I always type choco install -y …) and so defeats the purpose.
I don’t think anyone was suggesting – I definitely wasn’t – breaking non-interactive installs. My point was only that I don’t think that such prompting or warning achieves the goal of “a more secure packaging ecosystem”.
A couple of scenarios I’d like people to keep in mind, partly because they’re common developer workflows, are pre-commit/prek and uvx/pipx run. Those tools pull a repo or package and then install it as a package, including resolution of dependencies.
Both of these patterns involve users installing packages without locking their dependencies, or even having an obvious UX to show them that they aren’t locking.
pre-commit could be updated to support lockfiles (I expect this to happen someday), and I wouldn’t be surprised if uvx does already. But these are just the first indicators of a variety of workflows which have been built without any notion of locking baked in. That’s why I suggest looking for a solution on the publishing/index side, upstream of naive installation patterns.
I’ve started to think about what uploading pylock.toml files to PyPI and other indexes might look like. I’ve got a PEP coming, probably next week, on the prebuilt binaries idea, though, so no one should wait for me if they are motivated to work on this.
If we stay focused on defaults rather than knobs, and on .pth files specifically, I don’t think CPython is the place to fix this, because of editable installs.
PyPI could do the quarantine step or require admin opt-in for a specific project (and in case people don’t know, coverage.py has a .pth file it ships with, so there are legitimate use cases).
For pip, there could be a required opt-in for what can get installed from a wheel (although how sdists get treated is another question; --only-binary would be a good default someday). We could even go as far as having a PyPA project that applies checks/audits to wheel files, so the solution isn’t tied to a single installer (addressing Barry’s point) and any installer could use it. But there would need to be a way for tools to skip auditing per-package so projects like coverage.py could be installed (which could be “install from a lock file” with the assumption you’ve audited the lock file, via a CLI option, etc.).
But honestly, a bigger win is an idea from @sethmlarson: having a default cooldown for uploaded files in installers. So I think this is also asking the bigger question of where to focus our energy.
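A minimal sketch of what such a cooldown might look like inside a resolver, assuming the installer already has upload timestamps from the index (the data shape here is made up for illustration):

```python
# Sketch of a default "cooldown" filter: ignore candidate versions whose
# files were uploaded within the last N days. A real installer would get
# upload times from the index API; the dict shape here is an assumption.
from datetime import datetime, timedelta, timezone

def apply_cooldown(candidates, days=14, now=None):
    """Filter {version: upload_datetime} down to versions old enough
    to have survived the cooldown window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    return {v: t for v, t in candidates.items() if t <= cutoff}
```

The trade-off discussed below follows directly from this: anything inside the window, including a security fix, is invisible to the resolver until the cooldown elapses.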
My vote is one of the PyPI solutions if we are talking about .pth files specifically. And then we can continue on trying to improve our general security defaults.
Honestly, I think that’s just editable install tooling being lazy. It could find another way to do the same thing without too much trouble: there’s no actual need to do that work at import/site time, and it’s already running an install, which is allowed to modify the environment. I’m far more sympathetic to the coverage.py scenario (and have used it myself in a few tools).
I think I’d be okay with PyPI (or pip) quarantining/rejecting .pth files with an admin override. Or maybe tools like coverage would be happy enough with a “run python -m coverage install to enable it by default” instruction, and then don’t bother with the override. I don’t think it needs to necessarily be universal, provided users who aren’t specifically thinking about this attack vector get reasonable out-of-the-box protection.
Cooldowns are not a silver bullet; they create their own security problems. For example, you could miss the fix for a zero-day security vulnerability in your web stack because you have a default cooldown enabled.
See the latest discussion in pip#13674: by using cooldowns you are effectively signing up to review (or have some tool review) your entire dependency stack for vulnerabilities and update it independently of the cooldown window. I am not comfortable, as an installer maintainer, opting users into that without their knowledge.
Further, I think enabling cooldowns by default violates the principle of least surprise, especially on private indexes that support uploads: the user flow will be upload package → try to install → don’t see the version → ???
I don’t think more warnings here for users, or more opt-ins are likely to help with the specific issues that have caused this to need to be discussed. I believe this is going to be a point of fatigue and possibly be more detrimental than helpful for this specific goal.
I wrote a short blog post about this a while ago, but it’s quite trivial to create a malicious wheel that does the expected behavior and a small random portion of the time also does malicious behavior. Over time, in widespread use, the malicious behavior is an eventuality, but it avoids initial widespread notice, while being hidden in compiled code where it is also less visible. It continues doing what the original code was meant to in all cases.
The fact that it hasn’t happened yet is only because easier ways to compromise users are already available, and because those compromising existing packages have not yet had to take steps that would be more difficult to notice; it is not because methods that such warnings could never cover are infeasible.
Each vector we go after by focusing on the specific method and prompting the user contributes to fatigue and weakens the warnings, so we really should use such warnings sparingly, and only when they are most likely to be helpful.
Trust on first use is unlikely to help with the current issues around hijacking existing packages.
In truth, wheels with bundled native code are the least auditable option for the average user, and I wonder if we’re soon to reach a point where auditing source distributions and building them yourself goes from something only large companies end up doing to becoming standard guidance. It feels like that’s the only guidance that actually solves the root problem, though it limits who would be able to follow it.