PEP 690: Lazy Imports

But PYTHONPATH and PYTHONSITECUSTOMIZE have per-site/per-environment use cases.
Some tools sharing one Python environment would need them.

On the other hand, I think this feature should be opt-in per tool, not per site/environment.

2 Likes

I think this is really the only way to make this work, short and long term. Short term, enabling it programmatically makes it much easier to start small and slowly build up, and it also lets you optimize your own module without end users needing to do anything. Long term, I don’t think the current use of it is representative of how it would behave in the wild. Right now only Meta’s team knows that it is happening, so library authors aren’t dealing with it. It is not hard to envision that, without a cooperative model, library authors would opt to just force eager imports to avoid breakage.

These really highlight that there realistically needs to be a way to globally override behavior and force the system to auto-lazy-import everything, in at least a downward fashion. Long term, without turning into version-pinning hell, it’s not hard to imagine situations where an incompatibility arises that requires monkey patching eager imports to fix. The user can always import those packages themselves and short-circuit the behavior, so allowing them to disable eager imports globally would allow this to work correctly.

It may be because I spend a lot of time coding with type hints, but I feel like random import exceptions are already a problematic issue. Just to deal with circular reference issues you end up inlining so many import statements that stuff blows up at seemingly random places. If lazy importing can really fix that problem, I think it’s a no-brainer for any of the code bases that use type hints for programmatic purposes.
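
To make this concrete, here is a minimal sketch of the kind of cycle I mean (both module names are made up):

# models.py (hypothetical)
from __future__ import annotations
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # the usual workaround today: visible only to type checkers, no runtime import
    from services import Service

class Model:
    def attach(self, service: Service) -> None:
        self.service = service

# services.py (hypothetical)
from models import Model  # if models.py imported services eagerly instead of
                          # using TYPE_CHECKING, this pair would fail at import time

class Service:
    def handle(self, model: Model) -> None:
        self.model = model

With lazy imports, both files could keep plain top-level imports that simply don’t execute until first use, and the cycle would never bite at import time.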

With those items in mind, I believe we would need something more complex than an on/off switch. We would need to be able to declare the level of laziness the interpreter is in. Setting aside any technical implementation issues, something along the lines of the following:

with import.shallow_lazy:
    # toplevel is lazy but module controls its own behavior once imported
    import boto3

with import.deep_lazy:
    # toplevel is lazy and module is also lazy
    import requests

with import.eager:
    # still imports eagerly even if called in deep_lazy context
    import numpy as np

with import.force_lazy:
    # ignores all context handlers and everything is lazy
    import torch

Shallow and deep should behave the same for packages that have been updated to support lazy contexts. For those that have not, shallow allows us to still gain advantages without having to toss out lazy imports entirely. Lazy importing a single module should be, for the most part, identical to non-top-level imports, so I imagine it’s almost always safe to do. Deep would behave the way the spec does now: lazy loading any non-context-managed imports, and working together with eager to prevent breakage.

Force lazy would be the overriding option that will definitely break stuff but allows the calling program to control behavior. This would allow for monkey patching as well as any other things that may arise.

This would remove the need for us to signal in advance whether a package is safe for lazy loading. It also creates a nice path for libraries to upgrade themselves to lazy loading without forcing the issue. Further, it provides a solution to the adoption issue and to legacy code: the top-level program can behave the way Meta is currently using it, and packages can upgrade themselves, without either breaking.

If it does fix the circular reference issues, that would provide a natural point for teaching this and would really help adoption, as it’s a common enough issue. It would seem easy enough to provide something similar to pdb that can be run from the command line and creates a context around the script to trigger this behavior, for legacy code situations.

As long as each context is a non-strict superset of the previous one, it shouldn’t ever cause an issue where a package should have been lazy loaded but was eager loaded. The opposite just requires moving between categories, or doing manual eager imports, to prevent. Breaking under force_lazy would be expected and an assumed side effect of using it.

1 Like

And they already do, and it would be great to avoid making that worse.

Also, as @methane points out, most of our other variables are reasonable to set once per session (which is basically the only option on Windows). The proposed variable is very much per-invocation. [1]


  1. Another example that has tripped me up is the PYLAUNCHER_DEBUG variable, which increases the output from py.exe but also interferes with the CPython build when it uses a different instance of that tool. ↩

1 Like

I’m -1 on this right now.

It seems we either need it to be so transparent that libraries don’t have to design around it, or so explicit that both apps and libraries can use it without impact.

The latter case is adequately covered by function-level imports, which is the status quo, and so for this to be worthwhile I think it needs to be near-magical - certainly enough so that it can be on by default. It doesn’t sound like it is, and I personally don’t see a path to get there.[1]


  1. I’m assuming that any per-module opt-out mechanism would require reading the file on import, which is going to reduce the benefit so much that it isn’t worthwhile. And it’s unlikely to be easier to figure out than function-level imports. ↩

6 Likes

The proposed feature seems very much per application installation, where I assume “application installation” implies a fixed, controlled set of code that was tested as good with the lazy import feature enabled. That suggests not having an environment variable trigger at all, given that environments tend to exist way beyond any given application.

The context manager import suggestions are interesting. Though I wonder if Zachary’s proposal of four different behaviors is over-engineering it (do we really want to support all of those?).

When you think of this as something to turn on for a specific controlled application, you already assume control of the main entry point .py file, which is where such a declaration of enabling the feature belongs (be it a #! line flag, an import context manager, or some other mechanism - the point being that it lives in this .py file).

3 Likes

I think the question of what this would mean for libraries has been a bit vague in this thread, and would benefit from more concrete examples. Specifically, I don’t think the idea of libraries “opting themselves in” to lazy imports, which has been mentioned a few times, is actually relevant or helpful to the real use cases I’m aware of.

If we consider a library like click or pydantic, the typical case is that the library provides a registration decorator or a base class, and application developers will apply this decorator or subclass from this base class in their own code. This means that whether the library opts in to its own modules being lazily imported is beside the point. The library provides a utility which has the potential to add import side effects to a few of the application developer’s own modules, which may require explicit eager importing for the side effects to occur in time. So the application developer, who is choosing to explicitly enable lazy imports for their application to make it start up faster, may have to also eagerly import one or two of their own modules; all of this is in their control.
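
To make that concrete, here’s a hypothetical sketch of the pattern (the module layout is invented; the decorators are click’s real API):

# cli.py -- the application's entry point
import click

@click.group()
def cli():
    pass

# commands/export.py -- one of the application developer's own modules
from cli import cli

@cli.command()  # registration happens as a side effect of importing this module
def export():
    click.echo("exporting...")

If cli.py imports commands.export lazily and never touches an attribute of it, the export command silently never registers. The fix is entirely in the application developer’s hands: import that one module eagerly.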

So I don’t think we need the complicated set of controls that @zrothberg proposed. I don’t see practical problems solved by that complexity that aren’t already addressable with a global opt-in and a per-module opt-out controlled by the application developer, which is what the PEP already proposes.

If anyone is aware of a library that relies on import side effects within its own code such that lazy import of the library’s own modules would break the library, I’d be interested to take a look at that case!

For a little context on prevalence in practice, the Instagram Server codebase is multiple million lines of code, uses lazy imports applied globally, and has precisely five modules opted out.

2 Likes

I’m sympathetic to the argument that PYTHONLAZYIMPORTS would be worse here, as there’s a more obvious motivation for people to try to apply it globally to their shell (“make my Python CLIs run faster!”) than for the other existing env vars you mention. And I don’t see a clear-cut need for the env var for the intended use case of application developer opt-in. I wouldn’t personally have a problem removing the env var in favor of just -L and programmatic opt-in.

1 Like

The example that comes to mind for opting-in is this one (dataclasses → re → enum), but I’m inclined to agree that library opt-in isn’t an important piece here. If it matters enough, it’s easy for the library to import explicitly in a function (and re should be good enough at caching that pre-compiling expressions is rarely a huge benefit, so it can just be imported when needed).
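
Something along these lines (a hypothetical helper, not dataclasses’ actual code):

_IDENT = None

def _ident_pattern():
    global _IDENT
    if _IDENT is None:
        import re  # deferred: re (and enum behind it) only loads on first use
        _IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
    return _IDENT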

Opting-out is my bigger concern. Who is supposed to do the opt-out that triggers mimetypes.init() or protects against sys.path modifications when those are buried 2-3 levels deep in someone else’s library? The application developer obviously owns the final responsibility, but realistically their only available mechanism isn’t going to be very fine-grained (other than filing bugs against library devs, which is what we want to minimise).

@carljm Do you have a sense of how deep the lazy imports need to be to get the benefits you’ve seen? Or at least most of them? My own experience optimising CLIs has suggested that it’s quite diminishing returns after your “top level” imports - the ones that are used directly by command implementations, which seems to be what is implied by some references to __main__ above.

Would requiring CLI developers to use an equivalent of with import.shallow_lazy around just the imports they control offer all/most/some/none of the benefits? [1]

I would have guessed “most”, and that the rest could be helped with a few updates to libraries, rather than implicitly switching all imports to this behaviour, but would buy a few counterexamples of libraries with many transitive imports but non-overlapping sets of functionality (such that you can use the library for something useful while only needing some of those imports).[2]


  1. And could it be spelled with importlib.util.lazy(): and be implemented today? ↩

  2. One such example is the inspect module, which doesn’t seem to need both enum and importlib.machinery for the same scenarios. Is this common? Should it be encouraged at the cost of less predictable side-effects, or should we instead be encouraging separation of distinct functionality as a performance (and arguably, readability and maintainability) improvement? ↩

1 Like

My concern is more that, as a library developer, I have no intention of even thinking about whether my code is “lazy import safe”. I just write “normal” Python code, and if my test suite passes, I’m done. I don’t particularly want to run my test suite twice (with and without lazy imports), and even if I did, what am I supposed to do if something fails under lazy imports? The fact that it works under “normal” imports means it’s correct Python, so why should I make my life harder by avoiding constructs just to satisfy the entirely theoretical possibility that someone might want to import my code lazily?

How would an application developer know if it’s safe to enable lazy imports, without asking library developers to confirm whether their libraries are “lazy import safe”? And what happens when library authors don’t know?

Having said this, I do agree that it’s likely going to be rare that libraries won’t be lazy import safe. So maybe just assuming the best will be sufficient.

3 Likes

What happened to “explicit is better than implicit” ?

Python should not assume that libraries are lazy import safe. The libraries should tell Python that they are and have been tested in that kind of setup.

There are many libraries out there which register e.g. import hooks, codecs, atexit functions, apply monkey-patching, etc. All those libraries have import side effects by design and need to be imported early to make sure their side effects are in place before other modules are imported or functions are called.
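
For example, a module like this hypothetical one (codecs.register and atexit.register are the real stdlib APIs):

# my_plugin.py -- a module whose whole point is its import side effects
import atexit
import codecs

def _find_codec(name):
    # a real implementation would return a codecs.CodecInfo for its codec
    return None

codecs.register(_find_codec)       # side effect: codec lookup extended
atexit.register(print, "cleanup")  # side effect: exit hook installed

If that import is deferred and no attribute of the module is ever touched, neither registration happens.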

A general lazy import would also break all those libraries which provide optional C acceleration modules or test the environment for other possibly installed libraries, since the try-import would no longer raise an ImportError.

I don’t think the fact that it works with one application (the Instagram Server) can be used as proof that it works in every situation.

That said, and also because we need to maintain backwards compatibility, working with lazy imports should be the special case that is marked explicitly - not the fact that a library is incompatible with lazy imports.

5 Likes

Hi Steve,

It’s the responsibility of the application developer to opt out modules as needed to make their application work with lazy imports. The PEP proposes an importlib.set_eager_imports(...) API for them to do this, which can take a list of module names whose imports should be eager, or a callback that can decide based on the dotted module name. This allows the application developer to granularly opt-out any modules as needed, their own modules or library modules.
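
Sketching that API as the PEP describes it (it isn’t in any released CPython, so treat the details as provisional):

import importlib

# List form: imports in these modules are made eager
importlib.set_eager_imports(["myapp.plugins", "thirdparty.registry"])

# Callback form: decide based on the dotted module name
def eager_filter(name):
    return name.startswith("legacy_")

importlib.set_eager_imports(eager_filter)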

This is also discussed a bit in the PEP text. It’s generally true that if you are hand-optimizing a CLI to get the fastest possible --help, you can do that by moving imports around in your __main__. The problem in practice for an actively developed application is that this shallow hand optimization of imports is extremely fragile. One mis-placed import by a naive contributor that isn’t caught in code review can easily destroy the optimization entirely. And sometimes this isn’t even a mistake: it’s an import that is genuinely needed “early,” so now you have to go carefully sift through all of that import’s dependencies and make them lazy by hand: a simple change to the code quickly becomes not-simple.

The attraction of PEP 690 is that you no longer need to do this careful hand optimization at all, nor build a bunch of CI or code review infra around making sure that nobody ever places an import wrongly: instead you just always pay for exactly the imports you use, and no more.

Hi Marc-André,

This is in the PEP. Imports within try/except or with statements are never made lazy, precisely for this reason (errors triggered by the import should be catchable.)
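
So the usual fallback patterns keep working unchanged, for example:

try:
    import _speedups  # hypothetical optional C accelerator: stays eager,
except ImportError:   # so the except clause still fires at the try site
    _speedups = None

try:
    import lxml  # optional-dependency probing works the same way
    HAVE_LXML = True
except ImportError:
    HAVE_LXML = False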

2 Likes

Such libraries have no way of ensuring that today.

I think specific examples will be more useful, because the details matter.

For instance, a common case is “import side effects that must occur before any functions or classes from this module are used.” That case has no problems with PEP 690, in fact if the import side effect is expensive, PEP 690 is excellent for that case.

2 Likes

That sounds like a very fragile rule.

The failing import could in fact be a lazy top-level import somewhere deep inside a package that you import with try-except, so the eager import you are forcing at the higher level doesn’t protect the lazy import done a few levels deep, and the ImportError would not bubble up the stack at the point of the try-except.

Lazy imports are a good way to achieve better startup time, but this feature should not be a global setting of the interpreter.

Modules need to be made compatible with lazy imports and marked as providing that functionality. We cannot just have the user hope that everything works, only to find that e.g. a codec is missing because the import got deferred.

5 Likes

I think that’s entirely reasonable, and a fine position for a library developer to take.

Totally recognize also that a library developer taking that position might get user complaints about it, and this is a significant cost of the PEP.

2 Likes

In theory, sure. In practice I’m not sure I’ve ever seen a try/except around an import that wasn’t intending to catch the non-presence of the immediately imported module. Do you have a real-world counter-example?

I think a key question is whether the robust startup time and memory benefits of PEP 690 are realistically ever achievable without allowing the application developer to control it globally for their application. There is a collective action problem here. There is very little incentive for any given library to bother marking themselves as “lazy import safe,” especially the many libraries which receive only bare-bones maintenance at all. And there’s less value for an application developer in lazy imports if none of their libraries can be lazy imported.

Also, as I already discussed above, the idea of libraries marking themselves “lazy import safe” is itself overly simplistic and not sufficient to guarantee to an application developer that nothing will break with lazy imports.

I don’t think there is any way to provide lazy imports in Python that can ever be 100% transparently guaranteed to not break anything when adopted, so if that’s the bar for an opt-in feature, we may as well just say that we will never provide it. (Which would be odd, considering that the standard library already provides global opt-in lazy imports via LazyLoader, with fewer benefits but all the same potential problems that have been discussed in this thread. Should it be removed? Would PEP 690 be more acceptable if it were simply an improvement to the already-existing LazyLoader to reduce its overhead and allow it to make a few more styles of imports lazy, but installing LazyLoader was the only way to opt into it?)
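
For reference, using LazyLoader today looks roughly like the recipe in the importlib documentation (this is the per-module form; a global opt-in wraps loaders in your finders the same way):

import importlib.util
import sys

def lazy_import(name):
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module

json = lazy_import("json")  # no module code has run yet
json.dumps({"a": 1})        # first attribute access triggers execution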

Or we can choose the PEP 690 approach, which is to provide maximum potential benefits immediately to any application developer, with a clear caveat that it is a change to semantics and the application developer adopting it is responsible to ensure their application still works correctly with it. This seems very much in the “consenting adults” spirit.

1 Like

I think we should focus on Python startup time, not application startup time. The latter is not something we control as core devs. The former is.

If we make (and mark) several of the commonly used stdlib modules lazy import compatible, this will improve startup time where it is needed the most: simple scripts which are run often.

Applications tend to be longer running and can easily be designed to defer imports to the time the functionality is actually needed (e.g. via plugin mechanisms or import hooks). If you control the application, you can do the same and mark your modules as lazy import compatible to benefit from the automatic logic.

The key problem kicks in when you don’t control the code, but it gets added as a dependency. In those cases, a global switch puts you at risk of causing failures in the most odd places, or what’s worse: cause silent data corruption because e.g. a global decimal context setting wasn’t applied as intended by a library author.
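
To illustrate the decimal case with a hypothetical module:

# money_setup.py -- importing this is supposed to configure decimal
import decimal

decimal.getcontext().rounding = decimal.ROUND_HALF_UP  # import side effect

# In the application:
#   import money_setup  # if deferred and never touched, the side effect never runs
#   decimal.Decimal("2.5").quantize(decimal.Decimal("1"))
#   # silently rounds half-even to 2 instead of the intended 3

No exception, no warning: just numbers that are quietly wrong.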

Of course there is: The programmer in control explicitly marks his/her code as lazy import compatible. Python picks up the marker and defaults to lazy import of that module.

Aside: I could also imagine async imports and having an await in the top-level module code to defer further setup until a module attribute is requested, but that may prove to be problematic with threaded code or async code using custom event loops.

1 Like

I don’t think this approach provides transparent safe adoption either. If module foo has import side effects that really must be eager, and module bar imports foo, and baz imports bar (continue as deep as you like), is it the responsibility of every module author in that chain to realize that because they import something not-lazy-safe, they have to explicitly mark themselves as not-lazy-safe also, even though their module may be very simple? If they don’t, then the import of foo may be delayed simply because import of baz is delayed.

As mentioned above, I think the frequency of “import side effect that must occur even if nothing else from the module is referenced anywhere” is likely overstated in this thread, but if we are concerned about that case, per-module explicit opt-in does not make this safe.

That’s why it’s so important to have the code owner mark the modules lazy import compatible. The stdlib is a system of modules which is under our control and so we can make sure that things work out fine. This will result in better startup time for Python, helping everyone.

As an application designer you can also group eager imports in a way that makes sure they get imported before the lazy imports. For packages where you don’t know whether they are lazy import compatible, it’s better to opt for the eager import. For ones which you control, move all the foo module imports to the top level in your application, so that they do get (eagerly) imported, even when your application baz module is imported in a lazy way.

By making these things explicit, we have the application designers make conscious decisions, which is a lot better than having an application user simply flip the switch, not knowing what the implications might be.

I agree that import side effects are not as common as their mentions in this topic may imply, but simply ignoring them and using a global switch will also not result in a stable system.

How is the application developer supposed to know they need to opt out? In smaller organisations, or open source projects, it’s unlikely that projects will have the resources to test the various interactions thoroughly. Which brings us back to pressure on library developers to state that their code is import safe.

More so than a blanket assumption that it’s OK to lazy import everything? This feels like an attempt to provide an automatic “go faster” switch. Even if you say that’s not the intention, that’s what people are going to see, and how they are going to assume it should be treated.

3 Likes