Let me take over (with the caveat that I think my views on lazy imports are slightly different than Damian’s, so I’ll try to stick to factual aspects of how pip is vulnerable).
The key points here are:
- Users have a reasonable expectation that running `pip install <some_wheel>` will not execute arbitrary code. That’s a deliberate design feature of the wheel format.
- Nobody is disputing that if you then run the wheel you’ve installed, the code in that wheel will be executed. The key distinction here is that you choose to run the installed code, whereas simply installing it is not considered a choice to run it.
- Pip’s code runs in the same environment as the wheel is being installed into. That could be considered a bad design, but it’s very deeply embedded into how pip works - `ensurepip` normalises that approach, for example. Even the pip zipapp works that way - it re-executes the pip code inside the target environment. There’s an open issue exploring options for not doing this, but it’s not really made much progress.
- The way Python’s import system works means that if you modify files that are exposed on `sys.path` while a process is running, you will get the new code if you import the module after the file has been modified, but the old code if you imported before then.
- Any form of lazy loading (whether PEP 810, or the old “manual” approach of importing inside a function) exposes code to the risk that if site-packages is changed while an application is running, the “wrong” code will be imported - this is a particularly significant risk when the purpose of the application in question is to modify site-packages.
- As a result, any lazy loading in pip’s code (either our own, or vendored libraries) is at risk.
- Pip had a specific issue where this was flagged (in the code to check if pip is up to date). We fixed it by removing the specific instance of lazy loading, but we did not address the general problem of lazy loading allowing newly-installed code to affect the running instance of pip.
- PEP 810 lazy loading doesn’t affect any of the above, but it does make it more plausible that pip’s vendored dependencies could start lazily importing code, to improve startup times. That results in a larger chance that this could turn into a potential route for malicious wheels to inject code into the running pip instance.
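
To make the import-timing point concrete, here is a minimal sketch (not pip’s actual code). It assumes a temporary directory standing in for site-packages, and the module names `demo_eager_helper` and `demo_lazy_helper` are made up for illustration. The eagerly imported module keeps its old code after the files are overwritten, while the “manual” lazy import picks up whatever is on disk when it finally runs.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in for site-packages: a directory that is on sys.path while the
# "installer" process is running. All names here are hypothetical.
site_dir = Path(tempfile.mkdtemp())
sys.path.insert(0, str(site_dir))


def write_module(name, who):
    """Write a one-line module, as unpacking a wheel would."""
    (site_dir / f"{name}.py").write_text(f"WHO = {who!r}\n")
    # Path finders cache directory listings; invalidating keeps the demo
    # deterministic regardless of filesystem timestamp granularity.
    importlib.invalidate_caches()


# State of the environment when the installer starts.
write_module("demo_eager_helper", "original")
write_module("demo_lazy_helper", "original")

import demo_eager_helper  # eager: loaded before anything is installed


def lazy_who():
    # "Manual" lazy import: resolved only when the function is called.
    import demo_lazy_helper
    return demo_lazy_helper.WHO


# The "install" step overwrites both files with code from the wheel.
write_module("demo_eager_helper", "from the wheel")
write_module("demo_lazy_helper", "from the wheel")

eager_result = demo_eager_helper.WHO  # still the code loaded at startup
lazy_result = lazy_who()              # the code that was just installed

print(eager_result)
print(lazy_result)
```

The only difference between the two helpers is when the import statement executes relative to the write, which is exactly the window an installer has to worry about.
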
So to summarise, pip is unable (for various reasons mostly related to historical design choices that are extremely difficult to change) to guarantee that installing an unverified wheel is safe. We attempt to ensure that, and we may well be able to make that guarantee in the future using some of the approaches suggested here. The “globally disable lazy imports” option appeared, superficially, to provide such a solution, but it doesn’t help with “old style” manual lazy imports, so we probably won’t take that option even if it does remain available.
The key attack vector here is where an attacker crafts a malicious wheel and persuades the victim to just install it (presumably using the argument that “if you don’t run it, you’re safe”). The attacker needs no access to the victim’s environment. As far as I am aware, the import system is the only means an attacker has to inject code into the running installer process, and the only fix available to installers is to not use the Python import system[1] after any code has been installed.
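
As a rough illustration of what “not using the import system after code has been installed” could look like if it were enforced, here is a hypothetical guard. It is not something pip does, and `NoNewImports` and `try_import` are names invented for this sketch. It relies only on the documented behaviour that `sys.meta_path` finders are consulted solely for modules that are not already in `sys.modules`.

```python
import importlib.abc
import json  # everything the installer needs is imported up front
import sys


class NoNewImports(importlib.abc.MetaPathFinder):
    """Hypothetical guard: refuse any import that needs a fresh lookup."""

    def find_spec(self, fullname, path=None, target=None):
        raise ImportError(f"{fullname!r} requested after the install started")


def try_import(name):
    """Report whether an import succeeds while the guard is active."""
    try:
        importlib.import_module(name)
        return "imported"
    except ImportError as exc:
        return f"blocked: {exc}"


# Make sure the demo module really is "not yet imported" in this process.
sys.modules.pop("colorsys", None)

guard = NoNewImports()
sys.meta_path.insert(0, guard)  # from here on, the environment is "dirty"
try:
    already_loaded = try_import("json")      # served from sys.modules
    not_yet_loaded = try_import("colorsys")  # needs a lookup, so refused
finally:
    sys.meta_path.remove(guard)

print(already_loaded)
print(not_yet_loaded)
```

A guard like this turns a late lazy import into a loud failure rather than a silent load of newly installed code. It does nothing to remove the lazy imports themselves, so the auditing burden described here would remain.
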
PEP 810 lazy imports don’t alter any of this directly. They do, however, make it easier for libraries to use lazy imports[2], hence increasing the burden on installer maintainers to ensure that those lazy imports don’t happen after the environment has been modified.
The -X lazy_imports=all option goes further: it takes control out of the application code altogether and puts it into the user’s hands. That makes the problem of auditing and hardening installer code significantly harder. To exploit that extra risk, though, the attacker needs to persuade the victim to run pip with a non-standard invocation, so I don’t think it’s a practical issue, even if it is a theoretical one.
@notatallshaw - I hope none of the above undermines what you were trying to say. I don’t intend to. The security issue for pip is real, and lazy imports will be something we need to look at. I just don’t think pip is an argument that lazy_imports=none needs to stay, or that the stdlib cannot use lazy imports.