PEP 690: Lazy Imports

Thanks for all the great discussion! Consolidating replies to a few different things that have come up:

Several people (@methane, @pf_moore, @notatallshaw, @layday) have observed that neither -L nor an env var is a usable way for a distributor of a Python application to turn this on (particularly an application distributed via Python packaging). I agree this is a problem.

If there were a reliable cross-platform way for pip/setuptools to support creating script entry points with arbitrary Python CLI flags (as @jack1142 suggested) that would be really neat. I don’t know how feasible that is.

The other way we can fix this (as @methane suggested) is to provide a programmatic API (e.g. importlib.set_lazy_imports(True)) to globally enable lazy imports from that point forward. There’s no technical obstacle preventing this. What I don’t love is that it allows libraries to try to assert control over this setting, which I think belongs in the hands of the application developer / integrator. But it wouldn’t be Python’s first global config that’s programmatically settable and that libraries probably shouldn’t touch (hello there, entire import subsystem). I think we should add this API to the PEP.
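For illustration, an application could guard its use of such an API defensively, since `importlib.set_lazy_imports` is a proposed name that does not exist in any released Python (this is a sketch, not the PEP's final API):

```python
# Hypothetical: enable lazy imports programmatically at the top of an
# application's __main__ module, before any other imports run.
import importlib


def enable_lazy_imports() -> bool:
    """Call the proposed set_lazy_imports API if this interpreter has it."""
    set_lazy = getattr(importlib, "set_lazy_imports", None)
    if set_lazy is None:
        return False  # plain CPython today: the proposed API does not exist
    set_lazy(True)
    return True


print(enable_lazy_imports())
```

On a current CPython the helper simply returns False; on an interpreter implementing the PEP it would flip the global setting and return True.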

Re suggestions from @brandtbucher and @itamaro about clearing the lazy lookup function from module dicts: this seems worth experimenting with once we have an implementation ported to 3.12. I’ve added it to the implementation todo list.

Re testing that your codebase remains import-cycle-free when your main test suite runs with lazy imports enabled:

I don’t think there’s any need for eager_imports here; it would just look like one or a few smoke tests of your library, run without -L.

I agree with @brettcannon that while a library certainly could add -L as a full new axis to its test matrix, doubling the size of the matrix, in practice it will probably be fine to just run tests with -L (if you care to support it) and have just one or a few smoke tests verifying your library imports without -L.
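Such a smoke test can be tiny; here is a sketch using `json` as a stand-in for your library’s import name:

```python
# Run a fresh interpreter without -L and verify the package imports eagerly
# without raising; the rest of the suite can then run under -L.
import subprocess
import sys


def test_imports_eagerly() -> None:
    result = subprocess.run(
        [sys.executable, "-c", "import json"],  # substitute your package here
        capture_output=True,
        text=True,
    )
    assert result.returncode == 0, result.stderr


test_imports_eagerly()
print("eager import smoke test passed")
```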

There is an important question here also about what PEP 690 would mean for the CPython buildbots. I think since the effect of lazy imports is very much not platform-tied, one buildbot running the test suite under lazy imports should suffice.

This would be lovely, but I’m not sure how it could work. You can’t import both eagerly and lazily, and if you’re importing lazily it doesn’t seem possible to me to reliably detect that eager imports would have caused a cycle. I can imagine some best-effort approaches, but an imprecise approach seems worse in practice than just trying to run the code under eager imports.

I think you still might be overstating the real impact. In a library like pydantic or FastAPI, there may be some side effects of importing a module that defines some model classes, but you very likely also use those model classes directly in your code. As long as that’s true, they will be imported (maybe just a bit later than they would have been otherwise) and everything will likely work fine. Import side effects are not necessarily a problem for lazy imports; the real problem is modules that are imported only for their side effects, and nothing from them is otherwise used. This is not as common, since it also causes problems with e.g. linters complaining about unused imports. (The change in timing of the import can also be a problem in some cases, but not usually.)
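The timing shift is easy to see with the stdlib’s existing opt-in mechanism, importlib.util.LazyLoader, whose deferral is similar in spirit to (though mechanically different from) PEP 690; `sideeffect_mod` is a made-up module written to a temp directory:

```python
import importlib.util
import pathlib
import sys
import tempfile

# Create a hypothetical module whose top-level code has a visible side effect.
tmp = tempfile.mkdtemp()
pathlib.Path(tmp, "sideeffect_mod.py").write_text(
    "REGISTERED = []\nREGISTERED.append('widget')\nVALUE = 42\n"
)
sys.path.insert(0, tmp)

# Standard LazyLoader recipe from the importlib docs.
spec = importlib.util.find_spec("sideeffect_mod")
loader = importlib.util.LazyLoader(spec.loader)
spec.loader = loader
mod = importlib.util.module_from_spec(spec)
sys.modules["sideeffect_mod"] = mod
loader.exec_module(mod)

print("Lazy" in type(mod).__name__)  # True: the module body has not run yet
print(mod.VALUE)                     # first attribute access runs the body
```

If something in the program touches `mod.VALUE` (or any other attribute), the side effect happens, just later; only a module nobody ever touches stays unexecuted.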

In any case, if you play with it we’ll welcome your reports of what breaks and how hard it was to track down! That’s valuable data. Our experience so far (using this with a number of large CLIs with lots of dependencies) is that breakage, when it occurs, is not hard to fix with a very limited set of module opt-outs that still preserves almost all the benefit.

(I’ll admit that I’ve had to track down enough ugly problems caused by the inherently unpredictable ordering of import side effects, even sans lazy imports, that I won’t be sad if one effect of PEP 690 is to apply some additional design pressure over time discouraging Python libraries from relying on them.)


What if the programmatic API was more a way to tell the import mechanism “all directly imported modules are compatible with lazy imports” rather than “enable lazy imports for the whole world”? Let’s call it importlib.set_lazy_import_compatible(True).
It’s a bit odd that the parent module has to declare that its dependencies are lazy-compatible, but it’s the only way to know this without importing the dependencies. The speed boost would be gained for each module declaring that its dependencies work lazily. The dependencies themselves don’t necessarily have to declare that they are compatible, so you get some benefits immediately.
What I’m not sure about is whether the flag should be recursive. Module A could declare that it works when module B is imported lazily as a whole, but once the import of B is actually triggered, should B import its own submodules lazily or not? Does A have to guarantee that the whole import chain can be lazily imported? If module B does not import its dependencies lazily it would be more backward-compatible, but give less of a speed boost. Note that you would still get a boost on startup, since B itself wouldn’t be imported until later.

You could also set the flag after importing incompatible submodules, to get a mix of compatible and incompatible submodules.

I’m not sure that what I wrote is clear enough, here is a made-up example of what I propose:

### main.py
import non_lazy_module  # This is a standard import; lazy imports are not enabled yet

import importlib
importlib.set_lazy_import_compatible(True)
# From this point on, lazy imports are enabled
import lazy_module  # This is lazily imported

import time
time.sleep(10)
lazy_module.func()
### non_lazy_module.py
import something  # eager import, since non_lazy_module.py was imported eagerly and didn't call set_lazy_import_compatible
print("No Ma, I'm not lazy")
do_something_weird()
### lazy_module.py
import module_A  # is this one lazily imported because main.py said so?

import importlib
importlib.set_lazy_import_compatible(True)

import module_B  # This is lazily imported in any case, since the flag is set

def func():
    print("I'm sooo tired")

Since this isn’t widespread enough yet, I don’t think there is an established practice. My key point is that the people using lazy loading now are not running their entire test suites twice for this.

Yep, it would be.

But to be clear, I put that discouragement in the importlib docs as I didn’t want people misusing lazy loading and having to be the only person to explain why their code didn’t act the way it used to. If this becomes a feature of Python itself then explaining the ramifications becomes way easier (i.e. not just me :wink:).

It’s a possibility, but there’s zero chance of it unless this goes in first and the community builds up a ton of experience with the semantics.

There’s a shim that’s used on Windows, and I don’t know how that is controlled, which is the tricky one. But that’s a question for Packaging - Discussions on Python.org.

This also runs into the issue everyone is concerned about: some package being used which isn’t structured to work with lazy imports. It would effectively need to be a context manager in order to avoid side-effects (heh) from flipping the setting on in some random bit of code.

That’s what I would assume we would do, just like our e.g. ASAN buildbot; just another flavour of how we run the test suite.

I’m not even sure how that would work, since execution of that module already began before you hit that line. Having such a state change in the middle of execution would be very difficult to support, since you’re in the middle of exec() at that point.


There’s currently no way to add a command line option to the Python invocation in the shim. One could be added, obviously, but I’ve no idea how much work that would be. And it doesn’t help when people run the code using python -m app, as I mentioned before.


The line is hit because this module is already being imported or executed. In my example it would be main.py. The lazy import concerns the modules imported inside this module (lazy_module in my example).
The state change would affect the import mechanism from that point on, for the rest of the module’s execution.

I’m really not sure I should even be saying this, but I have experience in other domains where this problem occurs, and in those domains the APIs of this type are ‘set once only’. They cannot be changed after being set. In Python you could go even further and restrict the API to only being callable from __main__, with an exception thrown if it gets called from anywhere else.

I freely admit these are inelegant ways to tackle the problem, and are definite layering violations, but they are options :slight_smile:


The intended usage of PEP 690 is that the application developer flips on lazy imports globally for their application and then may set some per-module opt-outs if some module doesn’t work well with it. I’m not aiming here to address any complaints about that model or provide any kind of more complex or granular opt-in. I’m just trying to provide a third way to flip it on globally (just as you would with -L or PYTHONLAZYIMPORTS=1) that’s usable for applications distributed with pip, since they don’t have any way to provide CLI flags or env vars for their own execution. This programmatic option would be intended to appear once in an application’s main module and really nowhere else. Much like, say, installing a custom import loader.

If the effect of this localized opt-in is not transitive, then I think it becomes too difficult to get value out of lazy imports in practice; see this section of the PEP.

If it is transitive, then I think this adds unnecessary complexity to the mental model and makes it quite hard to reason locally about which imports will be lazy and which will be eager. See this section of the PEP. It keeps things much simpler if there is just a single global “lazy imports are on” and then the per-module opt-outs that the PEP already describes.

Yes, you’ve perfectly described the intended use case for this programmatic API (an application sets it once in __main__.) But I’m not sure we need to enforce that restriction; consenting adults and all that.


I have used lazy imports in the past and they work great as long as you know what you’re doing, i.e. you explicitly lazily import a module or package.

However, given that Python is not a side-effect free language, there are too many ways this can break in larger code bases when used globally.

Whether a code base is safe with respect to lazy imports is hard to test, and what makes things worse is that import exceptions can suddenly pop up in code which was not written to handle ImportErrors (or other errors happening as a result of the import).
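To illustrate the point about errors surfacing far from the import statement, here is a sketch using the stdlib’s importlib.util.LazyLoader as a stand-in for lazy-import semantics; `flaky_mod` and its missing dependency are made up:

```python
import importlib.util
import pathlib
import sys
import tempfile

# A hypothetical module whose body fails (e.g. a missing optional dependency).
tmp = tempfile.mkdtemp()
pathlib.Path(tmp, "flaky_mod.py").write_text(
    "import package_that_is_not_installed\n"
)
sys.path.insert(0, tmp)

spec = importlib.util.find_spec("flaky_mod")
loader = importlib.util.LazyLoader(spec.loader)
spec.loader = loader
mod = importlib.util.module_from_spec(spec)
sys.modules["flaky_mod"] = mod
loader.exec_module(mod)  # succeeds: the failing module body has not run

failed_name = None
try:
    mod.anything  # the body finally runs here, and raises
except ModuleNotFoundError as exc:
    failed_name = exc.name
    print("import failed at use site:", failed_name)
```

The import statement itself succeeds; the failure appears wherever the first attribute access happens, which may be deep inside code with no ImportError handling.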

Instead of making this a global option, I think making it easy to use lazy imports on a case by case basis would be better, something like:

lazy import re

E.g. let’s say one of the functions in your module uses the re module, but all others don’t. In this case, a lazy import of the re module at the top of the module would make sense. Alternatively, you can import the re module just in that function, but that hides away a module dependency, which is not always good style.

By putting the lazy import at the top of the module, everyone reading it will immediately know that the module will be used lazily in that module and can take appropriate precautions when writing code in that module.

You know: explicit is better than implicit



I have one last question about entry point scripts. A motivation for this PEP is to give CLI tools faster start-up; currently, if I look at an example like rich-cli, you call it like rich --help.

Would there be a way for such tools to force lazy importing, or will it always require the user to set environment variables or call the module without using the entry point?

A pattern in our company code is to define a base class for all events. Our RabbitMQ messaging service looks at the set of all subclasses of this base class to figure out which events can be received, and sets up deserialization for them. It essentially maintains a mapping from class names to classes, and finds the right class after receiving a message using the value of its message type property.

If I understand the proposal correctly, enabling lazy loaded modules would break this completely. At the point where the messaging service starts, not all event classes (if any) would be imported.

This goes for any kind of registry that needs to do something with its contents before the program starts, whether it’s based on subclassing or decorators.

I don’t mind as long as lazy imports are optional, but I wonder if the extra complexity is worth it.

# ===============================================================
# model.py

import messaging

class WidgetCreated(messaging.Event):
    pass

# ===============================================================
# processing.py

import messaging
import model

def dispatch(event: messaging.Event):
    match event:
        case model.WidgetCreated():
            print("Widget created")

if __name__ == "__main__":
    # Here the model.WidgetCreated class has not been accessed,
    # so the module hasn't been loaded, and therefore messaging
    # has no idea that this class of event exists.
    messaging.consume("widget_queue", on_message=dispatch)

Potential solutions have been mentioned above.

Maybe I missed one of the ideas but you should be able to find them if you look through this thread.


I think this misses an important point, and it’s a smaller version of the same oversight that made the Python 2 → 3 transition so painful: library authors. All those of us who write code to be imported and run by other people, and therefore don’t get to control the environment in which it’s running.

Realistically, we won’t get to tell everyone that if they want to use our library they can’t use this new lazy import thing that Python just added. Especially as it’s meant to make startup faster, and performance tricks always get cargo-culted to people who don’t want to think about what they mean (one weird trick to make your Python scripts start 70% faster!). Within a year or so of releasing a version of Python with this option, we’ll probably have to ensure our libraries and examples work with and without it. I’m sure we’d manage, but please remember that opt-in features for application developers aren’t really optional for library developers.

Features like __init_subclass__ make it really easy for code to have non-obvious side-effects, like a parent class keeping track of subclasses that have been defined. And ‘import this module to add something in it to a registry’ is a moderately familiar pattern, e.g. hdf5plugin is an example that springs to my mind.
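A minimal sketch of that pattern: merely executing the module that defines the subclass is what populates the registry, so a deferred import means a silently empty registry:

```python
# Class-definition-time side effect: the registry is filled as a byproduct
# of the subclass's module being executed, with no explicit registration call.
class PluginBase:
    registry: dict = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        PluginBase.registry[cls.__name__] = cls


# In a real codebase this subclass would live in another module; under lazy
# imports it is only registered once that module actually executes.
class CsvPlugin(PluginBase):
    pass


print(sorted(PluginBase.registry))  # ['CsvPlugin']
```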

I know this is already in the ‘rejected ideas’ section, but I’d much rather see something like this be opt-in at a module level - some way of declaring that a module has no import-time side effects (or perhaps none that matter) and so it’s safe to lazily load it, and maybe also load it in parallel with other stuff. The main argument against that is that application developers can’t get a big speed up straight away - but I think there would still be a significant impact in the longer term as modules opted in to this.


I had read those, but I had not internally reconciled that the PEP’s motivating example is about CLIs, yet in Python’s current state CLIs would not be able to take advantage of this feature without upfront work from the users of the CLI, and the PEP does not mention this. I guess I was just being a bit slow, and others are already aware of this.

I definitely think it’s a good point, it’s just that your original question made me think that you didn’t see that lack of such a feature has already been brought up. Sorry about that.

I don’t find this useful. Code already has a way to do this today: import re in the code right next to where it is used instead of as a top level import. Exact same effect.

The startup time benefit of lazy imports only comes from enabling them broadly, not by requiring all code in your application’s entire transitive dependencies to be modified to declare most of their imports as lazy. We do not want people to need to learn yet another import syntax to replace what we already have.
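For comparison, the existing spelling of a deferred import needs no new syntax; this hypothetical helper only loads `re` on its first call:

```python
def is_identifier_like(s: str) -> bool:
    # Function-local import: `re` is loaded (and cached in sys.modules)
    # the first time this function runs, not at module import time.
    import re
    return re.fullmatch(r"[a-z_][a-z0-9_]*", s) is not None


print(is_identifier_like("lazy_imports"))  # True
print(is_identifier_like("690-pep"))       # False
```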


Somewhat agreed. I think per-module opt-in being in Rejected Ideas might need a rethink. I realize per-file opt-in is not a feature that most would use initially, as it requires syntax incompatible with earlier Python versions (though we could devise a syntax that avoids this problem). But the ability to tag a source file as “always lazily imported” and have the interpreter treat it as such upon import (the pyc is opened and a lazy tag is found before anything is even unmarshalled; work stops there) seems potentially useful for modules an application may have a lot of, like generated code (hello, protobufs). It also allows for benefits even in applications that don’t themselves aim for full laziness.

A practical problem with per-module opt-in, however, is transitivity. If a module declares itself lazy, it would in effect be declaring all of its transitive imports lazy-ready as well. Yet it cannot know this and must not speak for others. They won’t be executed unless something else non-lazy imports them. This is where it gets weird. We could create half-lazy semantics for this scenario, where the laziness applies to everything but the top-level import statements in a module: something like the pyc is opened and a lazy tag is observed, coupled with a list of transitive imports to execute immediately. Filesystem operations to resolve all these imports and check their own laziness (recursively down the tree) happen, but latency-inducing work stops there.


I recognize this (providing a new semantics for imports that all Python authors now have to account for as a possibility) as a significant drawback of the PEP that has to be weighed against its benefits. I think that only permitting module-by-module opt-in destroys too much of the benefits, though. So we have to weigh the costs against the benefits.

If a library author knows that certain imports within their library must always be eager for the library to work, there are workarounds described in the PEP, some of which would be usable on all Python versions (even those from before the PEP implementation is merged). Placing imports within a try or with block fully protects them from being made lazy. Adding this type of “protection” against import laziness where needed is admittedly still some burden on library authors, but it’s not like library authors are left without any reasonable recourse.
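The try-block workaround might look like this, with `json` standing in for a hypothetical module whose import-time side effects must run eagerly:

```python
# Per the PEP, imports written inside a try (or with) block are excluded
# from laziness, so this pattern keeps the import eager under lazy imports
# while remaining valid on every existing Python version.
try:
    import json  # stand-in for a module that must execute at import time
except ImportError:
    json = None  # optional-dependency fallback, itself a common pattern

print(json is not None)
```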

I think we should not go down this route. This style of half-laziness (where we must go through all the module-finder code and open the pyc file before we discover that this import should be lazy – or even worse go through all transitive imports and open all their pycs!) adds a lot of new complexity to the implementation and significantly reduces the speed benefits.

Lazy imports is a change in semantics and it can break things. If it doesn’t work for your program, opt modules out as needed until it does, or don’t use it. I don’t think it is a good tradeoff to make lazy imports less useful in order to provide “protection” that still won’t be able to prevent the possibility of breakage.

If we need a per-module opt-in in addition to a per-module opt-out (I’m not convinced there’s a strong case for it), I’d prefer to do them both in the same style as the currently proposed opt-out API in the PEP (set_eager_imports) which allows a callback to decide based solely on module name. This still gives control to the application developer, allowing them to opt-in libraries that haven’t opted themselves in – I think that’s a good thing. And it doesn’t require new syntax or opening pyc files.
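A name-based callback in that style might look like the following sketch; importlib.set_eager_imports is the PEP’s proposed (unreleased) API, and the package names are made up:

```python
import importlib

# Hypothetical packages the application developer wants to keep eager.
EAGER_ROOTS = ("legacy_plugins", "proto_registry")


def force_eager(name: str) -> bool:
    """Decide eagerness from the dotted module name alone."""
    return name in EAGER_ROOTS or name.startswith(
        tuple(root + "." for root in EAGER_ROOTS)
    )


# Only call the proposed opt-out API if this interpreter provides it.
if hasattr(importlib, "set_eager_imports"):
    importlib.set_eager_imports(force_eager)

print(force_eager("legacy_plugins.models"))  # True
print(force_eager("json"))                   # False
```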

Correct. You would either have to use the options described in the PEP to make sure the imports of your subclasses are eager, or choose not to use lazy imports with this codebase.

This type of system is already fragile, because many Python linters will suggest removal of “unused imports,” and removing an apparently-unused import can already easily break this setup.

The existence of some patterns that don’t work with lazy imports doesn’t negate the significant value it can provide for many other codebases where it works fine.


I think an environment variable is harmful for per-app opt-in.

If we provide PYTHONLAZYIMPORTS envvar for opt-in, users might write export PYTHONLAZYIMPORTS=1 in their .bashrc or .envrc.

Then it would affect many tools written in Python.
Users may see random errors and unexpected behaviors (e.g. subcommand disappeared).
They might report issues to tools without enough information about how to reproduce.

This is not good for the Python ecosystem.
Command-line option + Python API + C API would be enough for per-app opt-in.


I think the argument that people could shoot other programs in the foot by setting an environment variable is overblown. People can already shoot themselves in the foot by setting our existing environment variables: PYTHONPATH, PYTHONSITECUSTOMIZE, etc.

This is what the -E flag to the interpreter is for.


As a CLI tool writer, I’d like it if this feature could be enabled via a __future__ flag, not just an environment variable. And I’d expect libraries to adopt it only if they truly function in a lazy way and have no import side effects. Wide adoption might take some time, but I feel this would make it a more stable and usable feature than an environment variable, which is a global force flag that’s not guaranteed to work.
