Prevent open(2)ing of `.pth` files during startup

Most developers using Python have never heard of .pth files. They have never written, seen, or sought to install these files. Given how obscure this feature is to most developers, its exploitation in the recent LiteLLM attack resembles the Windows SMB vulnerability.

Since the GitHub issue asked me to write “a concrete proposal”, here is what it would look like on Linux:

  1. Acquire a seccomp(2) notification fd and start a thread to handle it
  2. Install a seccomp(2) BPF filter in a C static constructor in the parent, so that the filter will
  3. Intercept all openat(2) calls, and send them to the USER_NOTIF fd, which will
  4. Inspect the filename to return ENOENT if it ends with .pth else allow it, and
  5. Allow everything once it sees a Python open() call from a traceback.extract_stack()[-2] that comes from outside /usr/lib/python3 or inside /usr/lib/python3/dist-packages, i.e. allow reading when normal developers’ code tries to open it
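The decision logic of steps 4 and 5 can be sketched on its own, separate from the seccomp plumbing. This is only a minimal illustration in Python of the policy described above, not a working supervisor: a real implementation would live in the C notification thread and would have to obtain the caller's file by some other means. The class name `StartupOpenPolicy` and the way the caller's file is passed in are assumptions made for the sketch.

```python
import errno

# Prefixes taken from step 5. The first is a plain string prefix, so it also
# matches versioned directories such as /usr/lib/python3.13.
STDLIB_PREFIX = "/usr/lib/python3"
DIST_PACKAGES_PREFIX = "/usr/lib/python3/dist-packages/"


def is_user_code(caller_file):
    """True if the calling frame is outside the stdlib or inside dist-packages."""
    if caller_file.startswith(DIST_PACKAGES_PREFIX):
        return True
    return not caller_file.startswith(STDLIB_PREFIX)


class StartupOpenPolicy:
    """Hypothetical policy object: block *.pth opens until user code is seen."""

    def __init__(self):
        self.unlocked = False

    def decide(self, path, caller_file):
        """Return 0 to allow the open, or errno.ENOENT to fake a missing file."""
        if not self.unlocked and is_user_code(caller_file):
            # Step 5: once normal code opens a file, stop filtering entirely.
            self.unlocked = True
        if self.unlocked:
            return 0
        # Step 4: during startup, pretend .pth files do not exist.
        return errno.ENOENT if path.endswith(".pth") else 0
```

Under this sketch, an open of a .pth file issued from the standard library during startup is answered with ENOENT, while the first open issued from application code switches the filter off for the rest of the process.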

I and most Python developers don’t know where the code that loads .pth files is, and it’ll take us days to find it, so seccomp is what I could come up with. Interposing libc could be simpler but I don’t know how to do it. In response to “we can’t remove .pth file support”, I would like to reiterate that normal Python programmers don’t want to see .pth files or have them continue to function. We usually work on .py files, get slightly annoyed when .pyc files pollute our Git until we exclude them, then only see .pyi in PyCharm and .egg and .whl in pip’s stdout. Looking at Stack Overflow, I’d say that the proliferation of unpronounceable Python file extensions needs to stop. At least, they should start with py. Yes, alongside .pth, I object to .pxi and .pxd, which normal people think are related to pixel art and image editing programs.

Some said “There would need to be a lot of backwards-incompatible changes”. That’s just a hand-wavy excuse. Python 2to3 was difficult because it affected normal people’s scripts. Normal people don’t use .pth, so it’s not a problem. It’d only break advanced libraries, where Python year-after-year doesn’t have backwards compatibility. I can’t update system Python by 0.1 without breaking Ubuntu. If Python cared about it, they wouldn’t remove the time-honored audioop from stdlib that existed since Python 1.4. PyTorch and 50% of my ML frameworks break on every Python update. People legitimately using .pth are advanced users like them. If we release this .pth-breaking change, they’ll find a workaround like all the other advanced users. If it’d be too hard on the .py side, they can always use embedding or LD_PRELOAD to do what they want.

pwilkin asked for data:

I wouldn’t know how to get the statistics honestly, but I’d start by actually doing concrete steps on what sethmlarson already mentioned above:

  • what specific needs are being currently served by .pth files and what are the functionalities used

  • which packages / how many packages use that specific functionality

It seems to me like a lot of people here have floated a lot of various concepts around, but having a structured table with the exact functionalities and an idea on how to get the list of affected packages would be a good start, no?

So I ran cd /usr/lib/python3 && fd -uuue pth && cd ~/CLionProjects/pytorch/venv && fd -uuue pth. The things I found can all be migrated:

Path: {/usr/lib/python3/dist-packages,venv/lib/python3.13/site-packages}/distutils-precedence.pth
Description: This mostly prints a setuptools warning and unloads distutils. It’ll be gone in the future if setuptools just replaces the entire deprecated distutils folder. In the present, normal programs don’t need to load setuptools or have it remain installed in their venv. The logic should be moved into venv/bin/pip or /usr/bin/pip

Path: /usr/lib/python3/dist-packages/coloredlogs.pth
Description: I didn’t know that installing ocrmypdf caused additional (albeit stub) code I didn’t approve of to be injected into all of my Python programs. Any users should import the package manually, instead of having all programs check COLOREDLOGS_AUTO_INSTALL

Path: venv/lib/python3.13/site-packages/__editable__.*-*+git*.pth
Description: ncoghlan talked about “virtual environment chaining”. This logic should be moved into venv/bin/activate or into venv/bin/python3 if it’s turned into a wrapper script

Therefore .pth is an unnecessary feature that only caters to packages that should have properly respected encapsulation. I don’t want to hear from any package I installed until I import it. People have said there are other Python startup files that a virus can infect. The other global one I’ve found is /usr/lib/python3.13/sitecustomize.py, which, being plaintext, is easier to infect than a binary. On Ubuntu, it only loads apport_python_hook, so I think we should also remove support for it, and ask Ubuntu to move this startup logic into a new file in debian/patches/ to be alongside the 42 other patches they applied. To ensure reduced complexity/attack-surface and increased maintainability, the replacement should not continue to smell like a global variable, and should, as I proposed, be local to each tool that had used .pth.

Finally, to expand to Windows, macOS, and other Unix, the .pth-blocking code should be moved into where the .pth-loading code lives, once someone manages to find it. To limit the suddenness, .pth loading should be reenableable by an environment variable in 3.15 before being removed in 3.16.

urllib3-future uses a .pth file to override urllib3 with urllib3-future.

urllib3-future is a fork of urllib3 and acts as a drop-in replacement for it. However, since the current Python packaging ecosystem does not support such drop-in replacement packages, it forcibly overrides urllib3 using .pth files.

urllib3-future is a fork of urllib3

See Pillow, which is a fork of PIL

urllib3-future uses a .pth file to override urllib3 with urllib3-future.

With Pillow, I was able to do python3 -m venv venv ; . venv/bin/activate ; pip install pillow ; python3 -c 'import PIL' ; fd -uuue pth , showing that it doesn’t use .pth. python3 -c 'import pillow' doesn’t work, showing that Pillow simply creates the venv/lib/python3.13/site-packages/PIL/ folder instead of messing with .pth.

urllib3-future should use this method that Pillow uses.

“Therefore”? Three examples don’t constitute a proof.

As an example of a use case you haven’t covered, editable installs are currently implemented using a .pth file. Please demonstrate how that would be replaced. Editable installs are a very popular feature, and absolutely cannot simply be dropped.

The .pth loading code lives in Python’s site module. If you don’t know this, it seems unlikely that you’ve done sufficient research on what .pth files are and how they work for your proposal to be credible.
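To make the mechanism concrete, here is a small sketch of what the site module does with a .pth file. It builds a throwaway directory and hands it to `site.addsitedir()`, the documented entry point that processes the .pth files in a directory. The file name `demo.pth`, the directory `extra_src`, and the environment variable `DEMO_PTH_RAN` are made up for the demonstration.

```python
import os
import site
import sys
import tempfile


def demo_pth_processing():
    """Let site process one throwaway .pth file and report what it did."""
    marker = "DEMO_PTH_RAN"  # hypothetical variable, used only by this demo
    os.environ.pop(marker, None)
    before = list(sys.path)
    with tempfile.TemporaryDirectory() as tmp:
        extra = os.path.join(tmp, "extra_src")
        os.mkdir(extra)  # path lines are only honoured if the directory exists
        with open(os.path.join(tmp, "demo.pth"), "w", encoding="utf-8") as f:
            # An ordinary line: a directory to append to sys.path.
            f.write(extra + "\n")
            # A line starting with "import" is executed as code.
            f.write(f"import os; os.environ[{marker!r}] = '1'\n")
        site.addsitedir(tmp)
        added = [p for p in sys.path if p not in before]
        executed = os.environ.pop(marker, None) == "1"
    sys.path[:] = before  # undo the side effects of the demo
    return added, executed


if __name__ == "__main__":
    added, executed = demo_pth_processing()
    print("paths added:", added)
    print("import line executed:", executed)
```

The second line written to the file is the part relevant to this thread: any line beginning with `import` runs at interpreter startup, before any user code has imported the package that shipped it.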

You can use -S to skip running .pth files, though you then cannot import anything from site-packages.
The only way to prevent 100% of malicious packages is to forbid uploading any packages.
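A quick way to see the effect of -S is to ask a child interpreter whether site initialisation was skipped. The sketch below only uses `sys.flags.no_site`; the helper name `no_site_flag` is made up for the example.

```python
import subprocess
import sys


def no_site_flag(*interpreter_options):
    """Start a child interpreter and return its sys.flags.no_site as a string."""
    result = subprocess.run(
        [sys.executable, *interpreter_options,
         "-c", "import sys; print(sys.flags.no_site)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print("default:", no_site_flag())
    print("with -S:", no_site_flag("-S"))
```

With -S the site module is not imported at startup, so no .pth file is read, and the site-packages directory is not added to sys.path either. That is why third-party packages stop being importable.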

I agree. My ~/CLionProjects/pytorch/venv example is an editable install, so I also use that feature.

That would easily be replaced by venv setting PYTHONPATH:

#!/bin/bash
rm -rf create-your-own-python-pip-package
git clone git@github.com:KarmaComputing/create-your-own-python-pip-package.git
cd create-your-own-python-pip-package
python3 -m venv venv
cat >> venv/bin/activate <<\EOF
# PowerShell version will be similar
pip3() { pip "$@"; }
# Adjust version if necessary
pip3.13() { pip "$@"; }
>> "$VIRTUAL_ENV"/editablepath
PYTHONPATH="$(<"$VIRTUAL_ENV"/editablepath)$PYTHONPATH"
export PYTHONPATH
pip() {
  command pip "$@"
  venv_new_editable_pth="$(find "$VIRTUAL_ENV"/lib/python3.*/site-packages -name '*.pth' -exec cat {} + | tr \\\n :)"
  echo "$venv_new_editable_pth" >> "$VIRTUAL_ENV"/editablepath
  find "$VIRTUAL_ENV"/lib/python3.*/site-packages -name '*.pth' -delete
  # Add deduplication and uninstall (directory removal) detection later
  PYTHONPATH="$venv_new_editable_pth$PYTHONPATH"
  unset venv_new_editable_pth
}
EOF
. venv/bin/activate
pip install -e .
python3 -c 'from example_python_package import *;print(add_one(0))'
sed 's/1/100/' -i src/example_python_package/example.py
python3 -c 'from example_python_package import *;print(add_one(0))'
find -name '*.pth' -print

Had most Python developers known this, I would not have proposed its removal. On GitHub, people who oppose .pth removal frequently have [Member] next to their names, and those who support removal don’t. I bet that if a survey of average Python developers was done, they’d also be shocked that this feature exists.

I have a computer but modern Python developers increasingly don’t. They open their phone’s browser to a Jupyter notebook inside Google Colab or alternatives. They will be confused if, after they %pip install something, behavior changes without them importing anything.

You are quoting me specifically here. Please note that this was not said in reference to removing support for .pth files; it was referring to “completely preventing interpreter hijacking at startup”.

I am gonna try to clarify my position, as the discussion on the issue tracker has become quite convoluted.

I am not against removing support for .pth files. What I am against is removing support for .pth files without tackling the other attack vectors that allow a malicious package to hijack the interpreter, or at the very least, without a plan to do so.

My reasoning here is that removing .pth files will cause a lot of breakage downstream, but it doesn’t move the needle whatsoever in terms of security without also tackling the other attack vectors. To justify putting users through the .pth file deprecation, I simply need a real commitment that we will also be tackling the other issues, as the work and churn that we will be placing on downstream users will otherwise be meaningless.


Okay, now that it’s clear where I stand regarding the .pth deprecation, let’s look at the changes that would be needed to prevent malicious packages from hijacking the interpreter. Remember, without these changes, removing support for .pth files does absolutely nothing.

A proper security audit would be needed to define the needed changes, but here’s a rough list for the interpreter startup:

  • Remove support for .pth files
    • Will break editable install, custom codecs, etc.
  • Remove support for sitecustomize
  • Remove support for usercustomize
  • Remove support for ._pth files in non-embedding use-cases
    • Users that embed Python into other applications/programs often use ._pth files, but since we don’t want regular Python installations to support them, embedding users would have to manually enable support if they need this functionality
  • Hardcode installation prefixes
    • This is the change that would cause the most breakage: it completely changes how Python is distributed, and would require distributors to adapt the way they distribute and provision Python installations to reflect the way prefixes are now computed
  • Remove support for the home key in pyvenv.cfg
    • Per the point above, the base prefixes would be hardcoded

Additionally, we would need to figure out what to do in order to mitigate the following attack vectors in the import system:

  • Hijacking imports of common modules by taking advantage of the importer order (e.g. shipping a .zip file or a native extension, which take precedence over pure modules)
  • Getting code execution by defining common entrypoints

The interpreter startup changes are doable; they will just break a lot of downstream code. But I am totally blank when it comes to the import system mitigations.


A different approach…

After reading the section above, it may be clear how difficult it would be to tackle these security issues in the interpreter runtime. I think perhaps it would make more sense to tackle them in packaging tooling.

Package installers are in a position where it is fairly easy to prevent all attack vectors covered above:

  • Disallow installing a sitecustomize module
  • Disallow installing a usercustomize module
  • Disallow installing a pyvenv.cfg file to the scripts directory
  • Disallow installing a ._pth file to the scripts directory
  • Disallow installing two packages with import conflicts (prevents import hijacking)
  • Query the user whether .pth files should be installed (could be disabled by default if not installing from source)
  • Query the user whether the package entrypoints should be installed

This approach doesn’t require any changes to the Python interpreter, and would also cover old Python versions.
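As a rough illustration of how cheap such checks could be, here is a sketch that classifies the member names of an archive such as a wheel. It is not taken from pip or any other installer. The function names, the verdict strings, and the simplification of looking only at the top-level path component for startup modules are all assumptions, and the import-conflict and entrypoint checks from the list are not covered.

```python
import zipfile

STARTUP_MODULES = {"sitecustomize", "usercustomize"}


def audit_archive_names(names):
    """Return (name, verdict) pairs for members an installer might question."""
    findings = []
    for name in names:
        parts = name.strip("/").split("/")
        base = parts[-1]
        # "sitecustomize.py", "sitecustomize/__init__.py" and
        # "sitecustomize.cpython-313-x86_64-linux-gnu.so" share this stem.
        top_stem = parts[0].split(".")[0]
        if top_stem in STARTUP_MODULES:
            findings.append((name, "reject: startup module"))
        elif base == "pyvenv.cfg" or base.endswith("._pth"):
            findings.append((name, "reject: interpreter configuration file"))
        elif len(parts) == 1 and base.endswith(".pth"):
            # Only .pth files placed directly in the site directory are read.
            findings.append((name, "ask: .pth file"))
    return findings


def audit_wheel(path):
    """Apply the same checks to the member list of a wheel (a zip archive)."""
    with zipfile.ZipFile(path) as archive:
        return audit_archive_names(archive.namelist())
```

A real installer would also have to account for files routed through a wheel's `.data` directory and for source distributions, which this sketch ignores.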

While others might’ve focused on .pth, my first post in this forum included a proposal to also remove sitecustomize and let distros add a patch to Python like they already do.

Seeing your list makes me sad about the state of things. PEP 20 import this says that Python should have one obvious and explicit way. Instead, we have six ways.

Here’s what each should be replaced with:

  • Remove support for .pth files
    • Editable install will be supported by my modified venv/bin/activate above
    • Replacement packages like urllib3-future should use whatever method Pillow uses to rename itself to PIL
  • Remove support for sitecustomize
    • Distros can patch Python itself like they already do
  • Remove support for usercustomize
    • Users should edit or wrap each Python tool they use. Forgotten code in usercustomize that causes an incompatibility risks wasting maintainer time
  • Remove support for ._pth files in non-embedding use-cases
    • Yes. But also remove ._pth for embedding use-cases. They can use PyRun_SimpleString before running other code
  • Hardcode installation prefixes
    • Which? PYTHONHOME is used by venv, and is an envvar. Envvars already contain secret API tokens in CI so PYTHONHOME needn’t be removed, and being in the parent shell, are unmodifiable by malware until restarting venv
  • Remove support for the home key in pyvenv.cfg
    • Where is the home key read? I inserted random letters into it, and both pip and python still work without showing any errors

I don’t think this should be in the package installer layer. That would increase complexity, not reduce it. There’s a risk of such a component growing into a large antivirus or vulnerability scanner, while the Trivy scanner problem was exactly what caused the incident.

I’m not emphasizing the anti-hijacking rationale of removing .pth. Whenever I pip install something, I usually import it 10 seconds later, so there’s not much difference between malware in __init__.py and *.pth. What removal would help prevent is stealth and persistence. Most Python developers know how to set a breakpoint before the first import, but wouldn’t look for *.pth.

Even though PYTHONHOME and PYTHONPATH can ultimately control code execution, I think we should keep, and centralize on, them. I’m surprised venv wasn’t already using PYTHONPATH. It’s a PEP-20-like “one obvious way to do it”, and one that people might find on Stack Overflow, while the 5 other ways are too implicit and non-obvious.

If it really requires a modified activate, then that’s a show stopper for me. I’ve never “activated” a venv, instead I just use the full path to python. I’m not the only one who does this.

And if you’re not using a virtual environment then you’re stuffed?

After some coding, I have moved away from Bash-specific venv stuff.

My current pull request is linked in the other topic. Basically, I replaced .pth files with regular .py files in site-packages. Those then load the source of the editable project only when I ask for them to be imported. It can be adjusted for inclusion in pip.
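The pull request itself is not reproduced here, so the following is only one possible reading of the idea: a stub .py file in site-packages that, when imported, loads the real module from the project's source tree. Everything in it (the stub template, `write_stub`, the module name `editable_stub_demo`) is hypothetical, and it handles a single-file module rather than a package.

```python
import importlib
import importlib.util
import os
import sys
import tempfile

# Hypothetical stub body; {source!r} is replaced by the path of the real module.
STUB_TEMPLATE = (
    "import importlib.util, sys\n"
    "_spec = importlib.util.spec_from_file_location(__name__, {source!r})\n"
    "_module = importlib.util.module_from_spec(_spec)\n"
    "sys.modules[__name__] = _module  # swap the stub for the real module\n"
    "_spec.loader.exec_module(_module)\n"
)


def write_stub(site_dir, module_name, source_path):
    """Write site_dir/<module_name>.py that redirects to source_path on import."""
    stub_path = os.path.join(site_dir, module_name + ".py")
    with open(stub_path, "w", encoding="utf-8") as f:
        f.write(STUB_TEMPLATE.format(source=os.path.abspath(source_path)))
    return stub_path


def demo():
    """Build a fake project and a fake site directory, then import via the stub."""
    name = "editable_stub_demo"  # hypothetical module name
    with tempfile.TemporaryDirectory() as tmp:
        project = os.path.join(tmp, "project")
        site_dir = os.path.join(tmp, "site-packages")
        os.mkdir(project)
        os.mkdir(site_dir)
        source = os.path.join(project, name + ".py")
        with open(source, "w", encoding="utf-8") as f:
            f.write("def add_one(x):\n    return x + 1\n")
        write_stub(site_dir, name, source)
        sys.path.insert(0, site_dir)
        importlib.invalidate_caches()
        try:
            module = importlib.import_module(name)
            loaded_from_project = module.__file__ == os.path.abspath(source)
            return module.add_one(0), loaded_from_project
        finally:
            sys.path.remove(site_dir)
            sys.modules.pop(name, None)
```

Unlike a .pth file, nothing in the stub runs at interpreter startup. The redirect executes only when something imports that module name, which matches the "only when I ask" property described above.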

It is difficult nowadays for urllib3-future to use the same method as Pillow.

urllib3-future aims to be a drop-in replacement for urllib3 while maintaining the ecosystem of urllib3. That is, it needs to work alongside libraries that depend on urllib3, such as requests and requests-cache.

Since requests and requests-cache depend on urllib3, installing urllib3-future should resolve this dependency. However, the current Python packaging system does not support such drop-in replacements.

Support for drop-in replacements is one of the features that the Python packaging system must provide before removing code execution from pth files.
