I first noticed this two days ago, when Pablo opened a pull request to move “manual” lazy imports (import in a function body or similar) to the new lazy import syntax. In that PR, @DavidCEllis pointed out that this breaks a very large majority of the standard library when -X lazy_imports=none is enabled, which resulted in the PR being closed.
I questioned this on the #internals-and-peps channel of the public Python discord server. For more context, you can read the discussion there. In short, @ichard26 made the point that pip needs -X lazy_imports=none for security reasons, because a malicious package might replace one of pip’s modules with itself, allowing it to execute arbitrary code. See this issue on the pip repo for more information.
I’m not convinced that -X lazy_imports=none adequately solves this problem, because pip doesn’t allow imports in function bodies either. Realistically, I see two paths for the new flag:
Most libraries ignore the existence of -X lazy_imports=none, so disabling lazy imports results in circular import errors, making the flag effectively unusable anywhere outside the standard library. This seems most likely to me.
Or, people see this as too frustrating to reason about, so they continue to use the old system of using an eager import to mimic lazy imports, preventing widespread adoption of the new syntax.
Either way, I think it would be very unfortunate if the standard library couldn’t use the new syntax because of this flag.
I wasn’t involved in the creation or discussion of PEP 810, so I may be missing context, but it seems that an audit hook would have been sufficient to prevent imports where necessary. In fact, we do have an import audit event already, but I don’t know if reification triggers that. If it does, then I think pip can just add an audit hook for import that raises an exception once a package has been installed.
Small correction, this isn’t true, but moving imports into functions has to have measurable impact and additional review due to the potential security issues.
Yes, in fact pip shouldn’t import anything post install, as a module could use a stdlib name.
This thread did make me think, rather than disabling lazy importing, I wonder if we can actively enforce no importing. Directly after the installation step break the ability for imports to work. This would solve potential security issue in any context then, not just lazy vs non lazy situations.
I’ll look at raising an issue and do some preliminary work when I have a moment.
Yeah, my suggestion is to just install an audit hook for an import event that raises an exception. I think it should be as simple as:
import sys
def hook(name, args):
if name == "import":
module_name = args[0].split(".")[0]
if module_name not in sys.builtin_module_names:
raise RuntimeError("For security reasons, pip doesn't allow imports at this point")
sys.addaudithook(hook)
That should cover both function-body imports and PEP 810 lazy imports without the need for a global filter.
Ah right, sorry I read your original post on the train and I clearly didn’t fully comprehend all the points.
Yes that looks good, it would need to be applied to all modules, not exclude standard library names though, third party modules can use a standard library names.
Do you know when the lazy import triggers this event? When lazy import is called or when the module is about to be materialized? I’m not sure how much difference it makes but it would be good to understand in terms of design.
Looking at the code for _PyImport_LoadLazyImportTstate, it simply calls __import__ upon reification, which should emit the audit event. But, it’d be pretty easy to add a new audit event for the lazy import statement itself if that’s better.
I think, much like the PR moving manual lazy imports everywhere you need to be careful where you set such a hook. I’m not sure what the pip staging looks like and how much work there is still to do after the installation step.
I wonder if it would be possible to have some kind of function that reifies all ‘pending’ lazy imports. Something like that could probably make such a hook easier to work with. Reify any pending imports first, then insert the hook to prevent any new ones.
sys.lazy_modules doesn’t seem to actually have enough information to potentially do that, and also somewhat surprisingly given the PEP - isn’t a set?
It’s kind of unfortunate that pip (usually) works from the same environment that is installing into. I actually think there are some performance gains in pip that can be had from using lazy imports more broadly but on attempting to test to see if it’s actually worth investigating I ran into a lazy import bug.
[edit: I initially said ‘enabling’ lazy imports everywhere when I meant ‘moving’ them as in the PR mentioned in the initial post]
It’s not quite as simple as a call to sys.removeaudithook (as that doesn’t exist), but an audit hook won’t stop malicious code, only nominally raise the cost for it:
In particular, malicious code can trivially disable or bypass hooks added using this function.
But I can’t tell if that will also reify any pending lazy import statements.
That would require the package to be able to execute arbitrary Python code already. By preventing the import, a malicious package can’t do that. The audit hook is merely a guard rail.
Apologies if this is also naive or ignorant, but could pip insert a simple Stdlib importer (Finder/Loader pair) into sys.meta_path, before the normal Path importer?
Pip doesn’t require shadowing of stdlib modules to be a supported feature in Python, does it?
Or just clear out sys.path. Once pip is imported, you should be able to get to all submodules without anything on sys.path (via pip.__path__), so if you know you don’t need anything else from site-packages, then removing everything but the stdlib from sys.path should be fine.
I agree this is most likely, and would be a shame. But equally, it may just result in bugs being filed by those who have a need for it. (I added “hypothetically” to the quote because that’s how I read it - if most libraries are already ignoring the possibility that their lazy imports may not be lazy, that would be very good to know.)
The other problem is that performing an import on one module can create new lazy imports.
I think something that might work would be:
import sys
import types
def resolve_all_lazy_imports():
new_modules = sys.modules.copy()
seen_modules = set()
while new_modules:
for module_name, module in new_modules.items():
for name, obj in module.__dict__.items():
if isinstance(obj, types.LazyImportType):
module.__dict__[name] = obj.resolve()
seen_modules.add(module_name)
# New modules may now have lazy imports
new_modules = {
k: v for k, v in sys.modules.items() if k not in seen_modules
}
This would require pip to eagerly import all submodules which is expensive for a lot of commands, like pip -- help. Otherwise pip modules themselves would be targeted by arbitrary execution, which was the original security report and the use case I outlined in the security implications in the PEP.
Further this doesn’t help redistributors who debundle vendored libraries and expect to be able to access them via the sys.path.
import A.B first tries to import A (looking in sys.modules first) and then uses A.__path__ to locate B.py. So it only relies on sys.path if you haven’t imported A yet. You can clear sys.path in between and it won’t matter unlessB.py needs something that can only be located by going back to the global search path.
It’s not particularly obvious, but it’s how the import system works.
(And in case it’s not obvious, you can leave the stdlib on sys.path and _internal/__init__.py will be able to import logging and typing itself. I just didn’t want to write that whole calculation here.)
I think the issue here is that the problem pip has is that pip install can replace part of pip. The initial report involved replacing something in pip/_internal.
So the fact that import pip._internal still works is actually a problem, rather than a positive?
Sorry @steve.dower , I made a mistake which might have created confusion. The concern is pip’s own modules being hijacked, not so much the stdlib (I’m nto sure why though, assuming that’s also viable).
I’m suggesting adding an un-hijackable PipInternalModuleImporter (a Finder and Loader) to sys.metapath, before the nomal Path based finder.