The meaningful difference is that this is one step to code execution in a situation where you didn’t expect any code to be executed at all, rather than two steps to code execution where you did expect code to run.
Users could have a workflow where they first only install wheels and then review the environment, manually or using automated tools, to make sure it matches their expectations.
Of course, once users expect to execute third-party code, nothing can be 100% guaranteed. But there is no such expectation when installing wheels.
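The automated half of that review step might look something like the sketch below. It assumes the user keeps an expected {name: version} list for the environment; `audit` and its report format are hypothetical, not a pip feature, and only the standard `importlib.metadata` API is used.

```python
from __future__ import annotations

import re
from importlib import metadata


def normalize(name: str) -> str:
    # PEP 503 normalisation, so "Foo_Bar" and "foo-bar" compare equal.
    return re.sub(r"[-_.]+", "-", name).lower()


def installed_distributions() -> dict[str, str]:
    """Map normalised distribution name -> version for the running environment."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with missing or broken metadata
            found[normalize(name)] = dist.version
    return found


def audit(expected: dict[str, str], installed: dict[str, str] | None = None) -> dict[str, list[str]]:
    """Compare an environment with an expected {name: version} list."""
    if installed is None:
        installed = installed_distributions()
    want = {normalize(n): v for n, v in expected.items()}
    have = {normalize(n): v for n, v in installed.items()}
    return {
        "unexpected": sorted(have.keys() - want.keys()),
        "missing": sorted(want.keys() - have.keys()),
        "wrong_version": sorted(n for n in want.keys() & have.keys() if want[n] != have[n]),
    }
```

A check like this is only meaningful if the installation step itself ran no code, which is exactly the property under discussion.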
It’s getting pretty tiring to keep justifying that a real ACE (arbitrary code execution) issue pip had due to non-eager imports has real security implications for a PEP that allows users to force pip’s imports to be non-eager. I’m going to disengage for a while.
Let me take over (with the caveat that I think my views on lazy imports are slightly different than Damian’s, so I’ll try to stick to factual aspects of how pip is vulnerable).
The key points here are:
Users have a reasonable expectation that running pip install <some_wheel> will not execute arbitrary code. That’s a deliberate design feature of the wheel format.
Nobody is disputing that if you then run the wheel you’ve installed, the code in that wheel will be executed. The key distinction here is that you choose to run the installed code, whereas simply installing it is not considered a choice to run it.
Pip’s code runs in the same environment the wheel is being installed into. That could be considered a bad design, but it’s very deeply embedded in how pip works; ensurepip normalises that approach, for example. Even the pip zipapp works that way: it re-executes the pip code inside the target environment. There’s an open issue exploring options for not doing this, but it hasn’t made much progress.
The way Python’s import system works means that if you modify files that are exposed on sys.path while a process is running, you will get the new code if you import the module after the file has been modified, but the old code if you imported it before then.
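A small, self-contained demonstration of that behaviour, using a temporary directory as a stand-in for site-packages and made-up module names. The module imported before the files change keeps its old code; the one imported afterwards runs the new file. `importlib.invalidate_caches()` is called because the standard path finders cache directory listings, and the importlib docs recommend it when modules are created after the interpreter starts.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Don't write .pyc files: a rewrite within the same second could otherwise
# leave a stale bytecode cache that hides the change.
sys.dont_write_bytecode = True

site_dir = Path(tempfile.mkdtemp())  # stands in for a directory on sys.path
sys.path.insert(0, str(site_dir))

for name in ("cache_demo_early", "cache_demo_late"):
    (site_dir / f"{name}.py").write_text("VERSION = 'old'\n")
importlib.invalidate_caches()

import cache_demo_early  # imported before the files change

# Something else (e.g. an installer) overwrites both files.
for name in ("cache_demo_early", "cache_demo_late"):
    (site_dir / f"{name}.py").write_text("VERSION = 'new'\n")
importlib.invalidate_caches()

import cache_demo_early  # already in sys.modules, so the file is not read again
import cache_demo_late   # first import happens now, so the new file runs

print(cache_demo_early.VERSION, cache_demo_late.VERSION)  # old new
```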
Any form of lazy loading (whether PEP 810, or the old “manual” approach of importing inside a function) exposes code to the risk that if site-packages is changed while an application is running, the “wrong” code will be imported - this is a particularly significant risk when the purpose of the application in question is to modify site-packages.
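The same mechanism, seen through an old-style function-level import. In this sketch, `check_for_updates` and `lazy_demo_helper` are hypothetical names (not pip’s actual code), and a temporary directory stands in for site-packages. The deferred import executes whatever file is on sys.path when the function first runs, including top-level code the original author never wrote.

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True
site_dir = Path(tempfile.mkdtemp())  # stands in for site-packages
sys.path.insert(0, str(site_dir))

# The legitimate module is present when the application starts...
(site_dir / "lazy_demo_helper.py").write_text("def run():\n    return 'legitimate'\n")
importlib.invalidate_caches()


def check_for_updates():
    # "Manual" lazy import: the module is only located and executed on first call.
    import lazy_demo_helper
    return lazy_demo_helper.run()


# ...but before it is ever imported, an install step replaces it.
(site_dir / "lazy_demo_helper.py").write_text(
    "INJECTED = True  # any top-level code here runs inside the importing process\n"
    "def run():\n"
    "    return 'replaced'\n"
)
importlib.invalidate_caches()

print(check_for_updates())  # replaced
```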
As a result, any lazy loading in pip’s code (either our own, or vendored libraries) is at risk.
Pip had a specific issue where this was flagged (in the code to check if pip is up to date). We fixed it by removing the specific instance of lazy loading, but we did not address the general problem of lazy loading allowing newly-installed code to affect the running instance of pip.
PEP 810 lazy loading doesn’t affect any of the above, but it does make it more plausible that pip’s vendored dependencies could start lazily importing code, to improve startup times. That results in a larger chance that this could turn into a potential route for malicious wheels to inject code into the running pip instance.
So to summarise, pip is unable (for various reasons, mostly related to historical design choices that are extremely difficult to change) to guarantee that installing an unverified wheel is safe. We attempt to ensure that, and we may well be able to make that guarantee in the future using some of the approaches suggested here. The “globally disable lazy imports” option appeared, superficially, to provide such a solution, but it doesn’t help with “old style” manual lazy imports, so we probably won’t take that option even if it does remain available.
The key attack vector here is where an attacker crafts a malicious wheel and persuades the victim to just install it (presumably using the argument that “if you don’t run it, you’re safe”). The attacker needs no access to the victim’s environment. As far as I am aware, the import system is the only means an attacker has to inject code into the running installer process, and the only fix available to installers is to not use the Python import system[1] after any code has been installed.
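Taking “don’t use the import system after any code has been installed” literally, an installer could at least detect violations. A hypothetical hardening sketch (not pip’s code; `LateImportGuard` is a made-up name) built on `sys.addaudithook`: while enabled, any attempt to load a module that isn’t already in `sys.modules` raises `ImportError`. Audit hooks are a detection aid rather than a sandbox (PEP 578 says explicitly that they are not one), and whether PEP 810 reification raises the same `import` event is an assumption worth checking.

```python
import sys


class LateImportGuard:
    """Refuse to load new modules while the environment is being modified.

    Hypothetical hardening sketch: import everything you need first, then
    enable the guard around the step that writes to site-packages.
    """

    def __init__(self):
        self.active = False
        self.blocked = []
        # Audit hooks cannot be removed once added, so install one hook and
        # switch it on and off with the `active` flag instead.
        sys.addaudithook(self._hook)

    def _hook(self, event, args):
        # The "import" event is raised by the import statement when a module has
        # to be located and loaded; names already in sys.modules don't trigger it.
        if self.active and event == "import":
            self.blocked.append(args[0])
            raise ImportError(f"refusing to load {args[0]!r} after installation started")

    def __enter__(self):
        self.active = True
        return self

    def __exit__(self, *exc_info):
        self.active = False
        return False  # never swallow exceptions
```

Usage: import everything the installer needs, create the guard, then run the step that writes to site-packages inside `with guard:`. Any import still deferred at that point fails loudly, and its name is recorded in `guard.blocked`.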
PEP 810 lazy imports don’t alter any of this directly. They do, however, make it easier for libraries to use lazy imports[2], hence increasing the burden on installer maintainers to ensure that those lazy imports don’t happen after the environment has been modified.
The -X lazy_imports=all option goes further: it takes control out of the application code altogether and puts it into the user’s hands. That makes the problem of auditing and hardening installer code significantly harder. However, to exploit that extra risk, the attacker now needs to persuade the victim to run pip with a non-standard invocation, so I don’t think it’s a practical issue, even if it is a theoretical one.
@notatallshaw - I hope none of the above undermines what you were trying to say. I don’t intend to. The security issue for pip is real, and lazy imports will be something we need to look at. I just don’t think pip is an argument that lazy_imports=none needs to stay, or that the stdlib cannot use lazy imports.
[1] Or only use it in extremely tightly controlled ways, which is probably more difficult than simply not using it at all.
Just one minor point that I don’t think is clear from this post, or from the thread in general: the security implication from the PEP comes from the -X lazy_imports=all mode, not from the introduction of lazy imports themselves, because pip has no control over whether a user has enabled that mode.
Re-reviewing the PEP, I believe it’s also generally sufficient for pip to set the lazy imports mode to normal (or at least to unset all if it is set), rather than to none. Whether explicitly declared lazy imports affect pip’s security will go through the normal review process.
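A sketch of what that startup check could look like. It assumes the PEP 810 spellings `-X lazy_imports=<mode>` and `PYTHON_LAZY_IMPORTS=<mode>`, assumes `-X` takes precedence over the environment variable, and assumes a `sys.set_lazy_imports()` setter; the setter is looked up with `getattr`, so the code is a no-op on interpreters without it. It would have to run before the rest of the application is imported, since anything imported earlier has already been handled under "all".

```python
import os
import sys


def requested_lazy_mode(xoptions=None, environ=None):
    """Return the lazy-imports mode the user asked for ("all", "none", "normal") or None.

    Assumes the PEP 810 spellings: -X lazy_imports=<mode> and the
    PYTHON_LAZY_IMPORTS environment variable, with -X taking precedence.
    """
    xoptions = getattr(sys, "_xoptions", {}) if xoptions is None else xoptions
    environ = os.environ if environ is None else environ
    mode = xoptions.get("lazy_imports")
    if mode is None:
        mode = environ.get("PYTHON_LAZY_IMPORTS")
    return mode


def undo_forced_lazy_imports():
    """Reset "all" back to "normal" at startup; returns True only if it did so."""
    if requested_lazy_mode() != "all":
        return False
    setter = getattr(sys, "set_lazy_imports", None)  # assumed PEP 810 API
    if setter is None:
        return False
    setter("normal")
    return True
```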
Good point. I think the PEP is pretty clear that users forcing all imports to be lazy are responsible for any breakage caused by code that’s not prepared for lazy imports. While that’s typically modules that do initialisation on import, I think it also applies to cases like pip, where we’d say “we haven’t audited pip’s behaviour if all imports are forced to be lazy, so don’t do that in a security sensitive context”.
So while I agree that lazy_imports=all is a problem for pip, I don’t think it’s one we need to be overly concerned about. And as you say, we can force it off in our initialisation if we want to.
FWIW I wasn’t questioning the mechanism; I was questioning whether it’s actually a meaningful security issue. Personally, I am (and always have been) less interested in academic concerns than in practical ones. That’s why I was trying to figure out a threat model in which an attacker could exploit this that didn’t also apply to the fact that the installed code is now on the user’s system and will be used at the next invocation.
Which I believe that Damian did provide as an answer to me! In my head the user was always responsible for auditing a wheel prior to installation, and it honestly never occurred to me that a user may be trying to audit post installation. That’s not something I’d recommend, as I don’t think we’ve ever guaranteed that wheels would not allow arbitrary code execution on install [1], but it’s definitely something that exists today that someone could reasonably be doing and makes a coherent threat model that applies to lazy imports but doesn’t apply to the ability to just install anything at all.
The attack vector ends up roughly being that, when lazy imports are used, a malicious wheel could bypass a user’s ability to exclude ACE using --only-binary=:all: and post-install auditing.
[1] In fact there’s a long-standing issue asking for post-install hooks for wheels, which effectively concluded with “someone would have to write a PEP for this, and I expect it to be a contentious PEP, but it’s not inherently something that cannot happen”.
This has been on my mind as I’ve been reading this thread and trying to think if the conversation needs a nudge away from looking at pip.
Imports can still be lazy even with the flag set.
So. Does the global flag to disable lazy imports serve any useful purpose? (Other than making new-style lazy imports unusable in the stdlib and in any libraries that want to support the flag, I mean…)
Addendum: I’m aware that latency sensitive applications might use it if it were available. Maybe a better phrasing is “does this flag need to exist for any important use cases?”
No, because its behavior can trivially be implemented with other mechanisms in the PEP.
The only thing removing the flag would do is signal that it’s OK and expected for libraries to rely strictly on lazy imports, e.g. for optional dependencies or circular dependencies. That is a significant change from the conditions under which the PEP was originally accepted, and it directly contradicts its text and the expressed intentions of the authors.
The PEP explicitly says it can be used for circular imports.
Emphasis mine:
Additional documentation will be added to the Python documentation, including guidance, a dedicated how-to guide, and updates to the import system documentation covering: identifying slow-loading modules with profiling tools (such as -X importtime), migration strategies for existing codebases, best practices for avoiding common pitfalls with import-time side effects, and patterns for using lazy imports effectively with type annotations and circular imports.
It does note, that you still have to avoid using the references during module initialization if you want them to work:
Lazy imports don’t automatically solve circular import problems. If two modules have a circular dependency, making the imports lazy might help only if the circular reference isn’t accessed during module initialization.
And also:
The best practice is still to avoid circular imports in your code design.
But it still indicates that, yes, these can be used for circular imports.
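To make the initialisation caveat concrete, here is a small sketch that uses today’s function-level imports as a stand-in for PEP 810 syntax (the module names are made up and written to a temporary directory). The deferred reference works because it is only resolved at call time; the variant that reads a name from the other module during initialisation fails.

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True
root = Path(tempfile.mkdtemp())
sys.path.insert(0, str(root))

A = "import {b}\nNAME = 'a'\n"

files = {
    # b defers its use of a until call time, like a lazy import not touched during init.
    "circ_ok_a.py": A.format(b="circ_ok_b"),
    "circ_ok_b.py": (
        "def greet():\n"
        "    from circ_ok_a import NAME  # resolved on first call, after both modules loaded\n"
        "    return 'b sees ' + NAME\n"
    ),
    # b needs a value from a while a is still half-initialised.
    "circ_bad_a.py": A.format(b="circ_bad_b"),
    "circ_bad_b.py": "from circ_bad_a import NAME  # runs during b's initialisation\n",
}
for filename, source in files.items():
    (root / filename).write_text(source)
importlib.invalidate_caches()

import circ_ok_a, circ_ok_b
print(circ_ok_b.greet())  # b sees a

try:
    import circ_bad_a
except ImportError as exc:
    print("circular import failed:", exc)
```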
It may be that this means that in time, as more and more packages embrace both typing and lazy imports, the global disable becomes mostly unused and unusable.
I think what wasn’t anticipated is how much implementing lazy imports in the stdlib would break the flag and make it immediately unusable, rather than gradually unusable.
In some workflows, pip runs with more capabilities than the Python environment it is building, and is never reinvoked (some build systems for container images, for example).
In that example case, getting execution during the initial invocation of pip gives the attacker more capabilities, and potentially a foothold in a build system rather than in the resulting container image.
At my day job, we don’t fully rely on this, as we have the resources to review more before anything is used, but it may be a property people rely on as:
I don’t think so. I think all of the uses that need it can be solved with runtime reification as needed, possibly with a helper function in importlib if it’s deemed tricky for end users to get right. That version should also work properly with circular imports, as the initial module declarations have already happened.
I think it’s arguable that if -X lazy_imports=none doesn’t exist, then over time, as the language-provided way of doing this becomes the “one way”, pip (and other apps with a reason to reify) will find it easier to ensure they do so. The version provided by the language should be what people use here long term, and it is visible and can be interacted with in a uniform way.
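One way to approximate up-front reification today is to load everything an installer needs before it touches site-packages. A sketch with a hypothetical `preload` helper (no such function exists in importlib); it relies on the normal import machinery checking `sys.modules` first, and assumes PEP 810 reification resolves names the same way.

```python
import importlib
import sys


def preload(module_names):
    """Import every named module now, before the environment is modified.

    Hypothetical helper: once a module is in sys.modules, later imports of
    it, deferred or not, are served from that cache rather than looked up
    on sys.path again.
    """
    loaded = []
    for name in module_names:
        importlib.import_module(name)
        loaded.append(name)
    return loaded


def missing_from_cache(module_names):
    """Names that would still trigger a fresh lookup on sys.path."""
    return [name for name in module_names if name not in sys.modules]
```

This does not reify lazy names already bound in other modules’ namespaces; it only ensures that when they are resolved, the module they point to is the one loaded before installation.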
I agree that there will be a benefit in the kind of uniformity of behavior we’ll get when 3.14 or 3.15 goes EOL. But, just to say it aloud, lazy imports will never be implemented only one way.
I think focusing on pip is very much misleading.
The big questions I have:
Does the stdlib ever get to use the new syntax for performance (but not correctness)?
Does stdlib testing need to be run with the flag both on and off?
I originally saw the flag as a shorthand for a kind of filter you could write yourself. But it also makes a promise that such a filter will work with the whole core language. Maybe that’s a bad promise to make.
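As an illustration of that “filter you could write yourself”, here is a sketch that keeps imports lazy only for an allowlist of top-level packages; an empty allowlist approximates a global “none”. The filter protocol (importing module, imported module, fromlist, returning True to stay lazy) and the `sys.set_lazy_imports_filter` name are assumptions based on the PEP 810 text, so the setter is looked up defensively.

```python
import sys


def make_lazy_filter(allowed_lazy):
    """Return a filter that keeps imports lazy only for the listed top-level packages.

    Assumes the PEP 810 filter protocol: the callable receives the importing
    module's name, the imported module's name and the fromlist, and returns
    True to keep the import lazy or False to load it eagerly.
    """
    allowed = frozenset(allowed_lazy)

    def lazy_filter(importing_module, imported_module, fromlist):
        top_level = imported_module.partition(".")[0]
        return top_level in allowed

    return lazy_filter


# Roughly what a global "none" asks for: never let an import stay lazy.
deny_all = make_lazy_filter(())


def install_filter(lazy_filter):
    """Install the filter if this interpreter supports it; report whether it did."""
    setter = getattr(sys, "set_lazy_imports_filter", None)  # assumed PEP 810 API
    if setter is None:
        return False
    setter(lazy_filter)
    return True
```

Unlike the flag, a filter is chosen by the application, so packages that rely on a lazy import for correctness can be kept in the allowlist.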
I think earlier versions of the text did not. They did however have the surviving claim that:
Lazy imports eliminate the common need for TYPE_CHECKING guards.
I pointed out that this was in contradiction with lazy_imports=none because the TYPE_CHECKING blocks are commonly used in libraries to avoid circular imports for typing-only imports. At the time some text was added to the PEP to the effect of “don’t expect lazy_imports=none to work with any libraries” but I don’t see where that text is now. It seemed obvious to me that lazy_imports=none would become unusable as soon as any libraries adopted lazy imports.
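For reference, a typical TYPE_CHECKING guard looks like the sketch below, with `decimal` standing in for a module that would otherwise form an import cycle. Under PEP 810 the guarded import would become a `lazy from ... import ...` statement, which is exactly the kind of import that lazy_imports=none turns back into an eager one.

```python
from __future__ import annotations  # annotations stay unevaluated strings at runtime

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only static type checkers execute this; at runtime the import never
    # happens, so it cannot take part in an import cycle.
    from decimal import Decimal


def describe(price: Decimal) -> str:
    # Works without Decimal in scope: the annotation is never evaluated here.
    return f"{price:.2f}"
```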
As long as -X lazy_imports=none exists, I’m reluctant to port existing forms of deferred imports to lazy import knowing that I would be slowing down or breaking anyone who uses -X lazy_imports=none.
Quick use case: when moving over to lazy imports (via __lazy_modules__), it’s really handy to set -X lazy_imports=none to disable lazy imports and check whether they are working (for example, by comparing with -X importtime). Without some way to disable all lazy imports, it’s harder to tell if it’s working. (In 3.15.0a7, none doesn’t affect __lazy_modules__, so I’m working around this by deleting all the __lazy_modules__ declarations, which is easy to do because I’m in git and this is the thing I’m adding.)
Honestly, I’d be fine with the opposite: none (or some new term) disabling __lazy_modules__ and not the explicit syntax. You can use the explicit syntax to make a circular import safe, but you can’t use __lazy_modules__ to do that, and the standard library can use the lazy syntax, keeping it lazy. And I expect “slowly break” is much more applicable for __lazy_modules__, since most libraries will be using that for 3-5 years.
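For the -X importtime comparison in the __lazy_modules__ use case, a small helper can run an import in a fresh interpreter and collect the cumulative time per module. A sketch: `measure` and `parse_importtime` are hypothetical names, and extra interpreter flags such as `-X lazy_imports=none` are simply passed through.

```python
import re
import subprocess
import sys

# Lines look like: "import time:       512 |       1340 |   json.decoder"
_LINE = re.compile(r"^import time:\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(.+?)\s*$")


def parse_importtime(stderr_text):
    """Map module name -> cumulative import time in microseconds."""
    result = {}
    for line in stderr_text.splitlines():
        match = _LINE.match(line)
        if match:  # the header line has no numbers, so it is skipped
            result[match.group(3)] = int(match.group(2))
    return result


def measure(module, *extra_args):
    """Import `module` in a fresh interpreter under -X importtime."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", *extra_args, "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_importtime(proc.stderr)
```

Usage: compare `measure("yourpackage", "-X", "lazy_imports=none")` with `measure("yourpackage")`; modules that disappear from the second result were deferred.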