Most developers using Python have never heard of .pth files. They have never written, seen, or sought to install one. Given how invisible this feature is to most developers, its exploitation in the recent LiteLLM attack resembles the Windows SMB vulnerability.
Since the GitHub issue asked me to write “a concrete proposal”, here is what it would look like on Linux:

- Acquire a `seccomp(2)` notification fd and start a thread to handle it
- Install a `seccomp(2)` BPF filter in a C static constructor in the parent, so that the filter will
  - Intercept all `openat(2)` calls, and send them to the USER_NOTIF fd, which will
    - Inspect the `filename` to return `ENOENT` if it ends with `.pth`, else allow it, and
    - Allow everything once it sees a Python `open()` call from a `traceback.extract_stack()[-2]` that comes from outside `/usr/lib/python3` or inside `/usr/lib/python3/dist-packages`, i.e. allow reading when normal developers’ code tries to open it

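
The decision the notification thread makes can be sketched separately from the seccomp plumbing. This is only a minimal illustration of the policy described in the list above, not working interception code: the class name `PthGate` and its methods are hypothetical, "outside `/usr/lib/python3`" is interpreted as a simple string-prefix test, and how the handler learns about a Python-level `open()` call is left out entirely.

```python
import errno
from dataclasses import dataclass


@dataclass
class PthGate:
    """Hypothetical policy object for the USER_NOTIF handler thread."""

    unlocked: bool = False  # becomes True once ordinary user code calls open()

    def note_python_open(self, caller_file: str) -> None:
        """Record a Python open() call, given the caller's filename
        (e.g. traceback.extract_stack()[-2].filename)."""
        # Assumption: prefix matching, so /usr/lib/python3.13/... counts as "inside".
        outside_stdlib = not caller_file.startswith("/usr/lib/python3")
        in_dist_packages = caller_file.startswith("/usr/lib/python3/dist-packages")
        if outside_stdlib or in_dist_packages:
            self.unlocked = True

    def decide(self, filename: str) -> int:
        """Return 0 to let openat(2) proceed, or an errno value to fail it with."""
        if self.unlocked or not filename.endswith(".pth"):
            return 0
        return errno.ENOENT


gate = PthGate()
print(gate.decide("/usr/lib/python3/dist-packages/coloredlogs.pth") == errno.ENOENT)
gate.note_python_open("/home/me/app.py")  # a normal developer's script opens a file
print(gate.decide("/home/me/notes.pth") == 0)
```

Keeping the policy in one small object would make it easy to move later into wherever the .pth-loading code lives.
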
I and most Python developers don’t know where the code that loads .pth files is, and it’ll take us days to find it, so seccomp is what I could come up with. Interposing libc could be simpler, but I don’t know how to do that.
In response to “we can’t remove .pth file support”, I would like to reiterate that normal Python programmers don’t want to see .pth files or have them continue to function. We usually work on .py files, get slightly annoyed when .pyc files pollute our Git until we exclude them, and otherwise only see .pyi in PyCharm and .egg and .whl in pip’s stdout. Looking at Stack Overflow, I’d say that the proliferation of unpronounceable Python file extensions needs to stop. At the least, they should start with py. Yes, alongside .pth, I object to .pxi and .pxd, which normal people think are related to pixel art and image editing programs.
Some said “There would need to be a lot of backwards-incompatible changes”. That’s just a hand-wavy excuse. The Python 2-to-3 transition was difficult because it affected normal people’s scripts. Normal people don’t use .pth, so it’s not a problem. It’d only break advanced libraries, for which Python already fails to keep backwards compatibility year after year. I can’t update system Python by 0.1 without breaking Ubuntu. If Python cared about it, they wouldn’t have removed the time-honored audioop from the stdlib, where it had existed since Python 1.4. PyTorch and 50% of my ML frameworks break on every Python update. People legitimately using .pth are advanced users like them. If we release this .pth-breaking change, they’ll find a workaround like all the other advanced users. If it’d be too hard on the .py side, they can always use embedding or LD_PRELOAD to do what they want.
pwilkin asked for data:

> I wouldn’t know how to get the statistics honestly, but I’d start by actually doing concrete steps on what sethmlarson already mentioned above:
>
> - what specific needs are being currently served by .pth files and what are the functionalities used
> - which packages / how many packages use that specific functionality
>
> It seems to me like a lot of people here have floated a lot of various concepts around, but having a structured table with the exact functionalities and an idea on how to get the list of affected packages would be a good start, no?

So I ran `cd /usr/lib/python3 && fd -uuue pth && cd ~/CLionProjects/pytorch/venv && fd -uuue pth`. The things I found can all be migrated:

| Path | Description |
|---|---|
| `{/usr/lib/python3/dist-packages,venv/lib/python3.13/site-packages}/distutils-precedence.pth` | This mostly prints a setuptools warning and unloads distutils. It’ll be gone in the future if setuptools just replaces the entire deprecated distutils folder. In the present, normal programs don’t need to load setuptools or have it remain installed in their venv. The logic should be moved into `venv/bin/pip` or `/usr/bin/pip`. |
| `/usr/lib/python3/dist-packages/coloredlogs.pth` | I didn’t know that installing ocrmypdf caused additional (albeit stub) code I didn’t approve of to be injected into all of my Python programs. Any users should import the package manually, instead of having all programs check `COLOREDLOGS_AUTO_INSTALL`. |
| `venv/lib/python3.13/site-packages/__editable__.*-*+git*.pth` | ncoghlan talked about “virtual environment chaining”. This logic should be moved into `venv/bin/activate`, or into `venv/bin/python3` if it’s turned into a wrapper script. |

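
To get closer to the structured table pwilkin asked for on more than one machine, a small scanner could separate the two things a .pth file can contain: plain path entries, and lines starting with `import`, which the `site` module executes as code at startup. The following is a rough sketch under that assumption; the function names `classify_pth` and `scan` are made up for illustration, and the line handling only approximates what `site` does.

```python
from pathlib import Path


def classify_pth(path: Path) -> dict:
    """Split a .pth file into executable import lines and plain path entries."""
    code_lines, path_entries = [], []
    text = path.read_text(encoding="utf-8", errors="replace")
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments do nothing
        if line.startswith(("import ", "import\t")):
            code_lines.append(line)  # these lines run as code at interpreter startup
        else:
            path_entries.append(stripped)  # these are added to sys.path if they exist
    return {"file": str(path), "code_lines": code_lines, "path_entries": path_entries}


def scan(root: str) -> list[dict]:
    """Classify every .pth file found under root, in sorted order."""
    return [classify_pth(p) for p in sorted(Path(root).rglob("*.pth"))]


if __name__ == "__main__":
    for row in scan("/usr/lib/python3"):
        kind = "code" if row["code_lines"] else "paths only"
        print(f"{row['file']}: {kind}")
```

Run over a directory of unpacked wheels, the same two functions would give a first count of how many packages ship .pth files that execute code versus ones that only extend the path.
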
Therefore .pth is an unnecessary feature that only caters to packages that should have properly respected encapsulation. I don’t want to hear from any package I installed until I import it. People have said there are other Python startup files that a virus can infect. The other global one I’ve found is /usr/lib/python3.13/sitecustomize.py, which, being plaintext, is easier to infect than a binary. On Ubuntu, it only loads apport_python_hook, so I think we should also remove support for it, and ask Ubuntu to move this startup logic into a new file in debian/patches/, alongside the 42 other patches they applied. To reduce complexity and attack surface and to improve maintainability, the replacement should not continue to smell like a global variable, and should, as I proposed, be local to each tool that had used .pth.
Finally, to expand to Windows, macOS, and other Unix systems, the .pth-blocking code should be moved into where the .pth-loading code lives, once someone manages to find it. To limit the suddenness, it should be possible to re-enable .pth loading with an environment variable in 3.15 before support is removed in 3.16.
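
The phase-out schedule could be expressed as a single check at the point where .pth files are loaded. This is only a sketch of the schedule described above: the variable name `PYTHON_ENABLE_PTH` is invented for illustration (no name has been proposed), and the function is hypothetical rather than anything that exists in CPython.

```python
import os

# Hypothetical opt-in variable; no name has been proposed for it.
ENV_VAR = "PYTHON_ENABLE_PTH"


def pth_loading_enabled(version: tuple[int, int], environ=os.environ) -> bool:
    """Decide whether .pth files are processed under the proposed schedule."""
    if version >= (3, 16):
        return False  # support removed outright
    if version >= (3, 15):
        return environ.get(ENV_VAR) == "1"  # off by default, opt back in
    return True  # 3.14 and earlier: current behaviour


print(pth_loading_enabled((3, 15), {}))
print(pth_loading_enabled((3, 15), {ENV_VAR: "1"}))
```

Passing the environment in as a parameter keeps the check easy to test without touching the real process environment.
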