PEP 690: Lazy Imports

I’m not exactly clear on how that opt-out proposal would work in
practice. If I have to release a new version of my library to
indicate it’s not lazy-import-safe, then there’s no way to indicate
that all previous versions of my library are also not
lazy-import-safe.

This is similar to the argument against upper bounds for interpreter
version support: if you add a cap in a new release, how do you
indicate that older releases also didn’t support the new version? By
extension, you can’t really indicate that your old releases don’t
work with the version of the interpreter which introduced the lazy
import feature, as a work-around.

Time flows in one direction, which makes going back and updating
history to comply with recent standards rather challenging.

I would expect explicitly opting out to be more of a hint than a hard requirement. Presumably older versions would also fail but the error would be less clear.

As an analogy, a function that knows it only works with lists might have an explicit if not isinstance(arg, list): raise TypeError("requires a list") at the top – most of the time you won’t need to state this, but occasionally it’s helpful to save confused callers a bit of debugging time.

2 Likes

Personally I don’t see how that’s any different from just not supporting Python version 3.x (where 3.x would be whatever version this PEP would supposedly be accepted into) because you have not yet added support for it. Library users can’t just expect your library to always work with newer versions without you working out support for them, since Python can and does make breaking changes in new 3.x versions which may potentially affect you. The library just working with a newer version may be likely, but it should never be taken for granted.

Development time. Now you are adding an entire layer of behavior that only exists at the top layer. It would be a nightmare to design tests around it from inside your library. It would just create fragile libraries. It would be much easier for each library to be able to define how to use it correctly. Context managers add minimal code, are easily comprehensible, and don’t require any kind of packaging inspection.

I do agree it feels like too many options, but I don’t really see a way around it without missing a key requirement. Since each one is just a special rule of the base case, I don’t think it would create a big technical implementation issue. Shallow_lazy is equivalent to deep_lazy(eager(module)). You realistically need the force ability, otherwise it will be a long time until performance benefits are realized.

The biggest issue I see with a smaller implementation (purely all or nothing) is that it will fail to provide library-level optimization in the vein of SPEC 1. My version would have all of the behavior of SPEC 1 and allow for the desired behavior of this PEP.

I don’t see it as being very different than, say, log levels, which are easy to understand. I think anything less and you are going to be making serious tradeoffs in either how quickly this is adopted or how difficult that adoption is.

Would it feel less over engineered if it was one function that took an enum value?

with import.mode("lazy"):

Last I heard from the team, they forked importlib.util.LazyLoader (I actually wrote it partially for Mercurial to ease their Python 3 transition).
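
For reference, importlib.util.LazyLoader can already be used today, roughly following the recipe in the importlib documentation (nothing here is PEP 690 API; the module body only executes on first attribute access):

```python
import importlib.util
import sys

def lazy_import(name):
    # Wrap the real loader in LazyLoader so the module body is deferred
    # until the first attribute access on the returned module object.
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # sets up deferral; nothing runs yet
    return module

json = lazy_import("json")    # module body has not run at this point
print(json.dumps({"a": 1}))   # first attribute access triggers the real import
```

This is also roughly the semantics that demandimport and similar tools provide, which is why globally-enabled lazy imports are already possible today.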

3 Likes

No reasonable person expects libraries to guarantee that they will work with future versions of the interpreter using unimagined features into the indefinite future. That is asking for the impossible.

And if some unreasonably demanding user insists that you support Python 3.12 lazy import mode in the old version of your library written for Python 3.10, you can just say no and close the bug report as “Will Not Fix”.

Yes, agreed that future-compatibility can never be guaranteed. I
guess the opt-out idea is that if your library depends on
import-time side effects and is breaking for applications which
attempt to lazy import everything, you can tell users to upgrade to
a newer version which declares itself explicitly lazy-import-unsafe
so that your lib will get traditionally imported instead?

And then the application can declare a requirement on that version
of your library or later… but if that’s the case, then the
application could also just not lazy-import the library too (and
thus continue working with older versions).

So I guess the opt-out is mainly intended for situations where the
users of the application are complaining to the library maintainers,
or the application maintainers are otherwise uncooperative and
insist on lazy-importing libs which lack support for it?

I tried to get at this point once above already, but I’m not sure it came through clearly, so let me try again.

What I think the discussions of “library opt-out” are missing is that “safe for lazy imports” is fundamentally not even a meaningful or coherent property of a single module or library in isolation. It is only meaningful in the context of an actual application codebase. This is because no single module or library can ever control the ordering of imports or how the import-time code path flows: it is an emergent property of the interaction of all modules in the codebase and their imports.

Every single Python module that subclasses a class from another module has visible import side effects: it changes the value of __subclasses__() on the parent class. So is it lazy import safe? Well, if you are being 100% conservative, no! We can try to be 100% conservative, and the result is that you just can’t use lazy imports at all (which won’t happen: whether PEP 690 is accepted or not, people will and already do use globally-enabled lazy imports via demandimport, LazyLoader, etc., and Python already supports them.) But in reality, it depends: is anyone looking at the value of __subclasses__() on that base class at a time when it isn’t populated yet? 99+% of the time, the answer is no, and the module is perfectly safe to use with lazy imports. But it’s a question that has no useful answer for the module in isolation: there is only a meaningful answer for an entire application in context.
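
As a tiny illustration of the __subclasses__() point, runnable with plain Python today:

```python
class Base:
    pass

before = list(Base.__subclasses__())   # empty so far

# Merely executing a subclass definition (which is all that importing a
# module containing one does) mutates the parent class:
class Plugin(Base):
    pass

after = list(Base.__subclasses__())
print(Plugin in before, Plugin in after)  # False True
```

Any registry that walks Base.__subclasses__() before the module defining Plugin has actually executed will simply not see it; whether that matters depends entirely on when the application reads the registry.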

Same is true (as I already outlined above) for libraries like click or pydantic. Their own modules are perfectly fine for lazy import. But they provide decorators / base classes that could be used in an application module in a way that makes an application module potentially need eager importing. So what does it even mean for a library like that to define itself or its modules as “lazy import safe” or “not lazy import safe?” How do the proposed APIs help this case?

I think the nature of the opt-out in PEP 690 is not well understood. It is not an exercise in categorizing modules into neatly-defined objective categories of “safe for lazy import” and “not safe for lazy import.” (If it were, the only possible answer would be that no module is ever fully lazy import safe.) Rather, it is a way for an application developer to say “in the context of my specific entire application and how it actually works, I need to force these particular imports to be eager in order for the effects I actually need to happen in time.”

This is why I don’t see much value in providing APIs to allow people at different points in the chain to nest both deep and shallow opt-outs and opt-ins. These APIs are mostly harmful, because they serve to take away control from the only person who can provide meaningful answers to questions of what actually must be imported eagerly, and that is the application developer who can test how their application actually behaves.

Take even the most obvious case you could imagine of “not lazy import safe” in a library: imagine some library that has some module that is imported only for its side effects, and must be imported before other things in the system happen (that don’t involve referencing names in that module.) (Set aside for the moment the fact that such a library design is already highly fragile even with eager imports: anyone can import your library lazily today by inlining the import and libraries already get no say over this.) You would think that this “definitely not lazy import safe” library is the perfect case for an opt-out. But the opt-out doesn’t even help this “obvious” case! A module earlier in the import chain leading to this module could still be lazily imported despite the library’s opt-out, and (in the context of the whole system) the side effects will still be lazy. (Exactly as can happen today with manually inlined imports.) Again this is an illustration of the same basic point: the overall application is the only context in which questions of import ordering and import side effects can be usefully resolved.

The one case in which it makes sense for a library module to explicitly opt out of a lazy import is if the library author knows that their own module B imports module A and the rest of the code of B directly depends on side effects from A (without referencing any imported names from A). Again, this is an already-fragile and I think rare case. But PEP 690 already provides all the tools a library author needs in order to handle that case (if they want to): put the import of module A, in module B, inside any try/except or a with statement.

@zrothberg I think the only way we can reasonably evaluate the need for the full matrix of context manager APIs you’ve suggested is to look at specific real world examples. Can you propose a real-world library that you think would benefit from these capabilities? Then we can look concretely at how that library could handle its needs with the existing APIs proposed in the PEP or with the full matrix of context managers.

8 Likes

Hi Eric, this is a great set of detailed notes, thanks! Lots of good notes in there on stuff we should address more clearly in the PEP, we’ll take those into account for our next draft. Just a couple things I want to comment on:

Library opt out

I think it would be unfortunate if we encouraged libraries to raise an error if lazy imports is enabled, since this makes it hard or impossible for an application using that library to try lazy imports. It’s a little better if the opt-out error doesn’t fire if that module has been top-level opted out by the application developer; then at least there’s a path for the app developer to still use lazy imports in the rest of the app. I’m guessing this is the behavior you intended?

I still think that if a library author is going to do anything at all, it would be just as easy and more productive to wrap a known problematic import in with importlib.eager_imports(): rather than throw an error. So I’d prefer for the PEP to encourage that instead.

Dynamic Paths

That’s an interesting idea in principle that seems to have a reasonable cost. One challenge is that I think only dicts have versions, not lists (AFAIK)? And I think the two things that would potentially want that monitoring (sys.path and sys.meta_path) are both lists. Probably introducing list versions is not worth it just for this.

I think if import errors from deferred imports have clear contextual information that the import was deferred, and where the original import statement was, that should really help in identifying the problem when it occurs.

Deferred Exceptions

I think we already do mention in the PEP that we will add “original import location” context to exceptions raised during a deferred import. Definitely seems important for debuggability.

1 Like

Hi Marc-André,

If you agree that a CLI application developer should have the ability to try lazy imports throughout their codebase, including for libraries whose code they don’t control, then I think we are most of the way to agreement. (I think everyone is already agreed that application end users should not be opting in, and thus the env var is not a good idea and will be removed from the PEP.)

I’m not sure if you feel that application developers should always be required to opt in module by module, rather than having a way to globally opt in and then opt out specific imports? If so, we do disagree there; I think that just makes it more difficult for application developers to use, without significant benefits, because (as I described above) modules don’t neatly fit into “safe for lazy imports” and “unsafe for lazy imports” categories; it depends on the context of the specific application. In practice the easiest way to figure out where the problems are is to just try your application (to be clear, by “your application” I mean one you are the author/developer of; you are trying lazy imports as a change to your own codebase) and see what doesn’t work.

I’m not totally opposed to the idea of providing a per-module opt-in API, if some people prefer that approach to adoption, as long as global opt-in and per-module opt-out is also available.

2 Likes

If the lib fails to import like I suggested, the application would have to use the PEP’s context manager around the import of the library. That’s it. The library is effectively forcing the application to acknowledge the incompatibility, with a simple, explicit escape hatch. IMHO, it’s a better user experience without much fuss.

1 Like

I haven’t read through every word in this thread, so apologies if this is a duplicate.

Could opt-in/opt-out somehow live in the importee rather than the importer? Presumably, only a particular module knows whether it will have relevant side effects that need to happen at import time. I’d imagine something like

my_module.py:

__lazy__ = 1

from big_module import do_some_computations
do_some_computations()

def function():
    ...

Standard REPL:

>>> import my_module # nothing happens
>>> my_module.function # imports big_module and runs do_some_computations() 

Hi Dennis,

Modules in general don’t get to make decisions about when they are imported; the person writing the import statement does. The importer decides if they want to put the import inline or at the top of the module, or at the end of the module, or whatever. I don’t think that PEP 690 should invert that control.

If the import side effect is some initialization that just needs to run before stuff from the module is used, this can be silent and transparent and already works great with PEP 690 by default; it will just transparently delay that work until it is actually needed, as in your example. Pure win.

If the import side effect is a more global change that could affect the order in which the importer wants this import to occur relative to other imports and code in their system, then (with or without PEP 690) there is no substitute for this side effect being a part of the module’s documented contract, and it still should be up to the importer to control when that side effect occurs. PEP 690 just gives the importer some new options (to make the import lazy or not) in addition to deciding where in the code to place the import.

More practically, the difficulty is that if you want modules themselves to control whether they are lazy imported, then by definition imports can only be half lazy at best; there’s a non-trivial amount of filesystem work you have to do to find the module pyc file, read it, etc. If you always have to do that part of the work eagerly for all imports, over a large codebase it reduces the benefits of lazy importing quite a bit.

1 Like

This is the only thing I think we are actually disagreeing on. While yes, the application developer needs to be able to have final control (hence the full override part), I believe a library probably has a MUCH better idea than I do of what parts of its system are lazy-safe or not. I think it is harmful to force everyone to have to manage that themselves when using this, especially for larger libraries where it’s unlikely the vast majority of end users know anything about how it functions or anything about its submodules. I think one of the main differences is that if a library has been updated to support lazy loading with context managers, it would be obvious upon reading it. If it has been updated to support lazy loading without context managers, it’s only obvious if the code is doing eager imports because of a clash.

I do not think there are ever issues at the first level of importing a single module. I am under the impression that, outside of some silly stuff, this is usually fine. Really, a module’s import statements are the concern. I also think this design gets rid of the main drawback of libraries having an opt-out, namely that it does not require examining the content, as the behavior is declared by the caller.

I will follow up over the weekend with some libraries that we can use to evaluate against. Off the top of my head, some of the AWS Lambda stuff would really benefit from this, as right now you would just be shoving in tons of very fragile inline imports. I do not think inline imports are a good thing, and if there were an API to enable lazy importing, being able to rid the language of the practice should be part of its value.

1 Like

I agree with this, I just think the PEP already provides sufficient API for library developers who want to help their lazy-importing users to do so: I think the ability to shallowly force a particular import or set of imports to be eager is all that’s actually needed. I’m particularly skeptical of any “deep effect” context manager, for reasons already described in the rejected ideas section of the PEP.

I’ll look forward to examples demonstrating why that’s not adequate :wink: (If it’s easier to invent a real-ish simplified example motivated by a real use case, that’s fine too.)

1 Like

I have a real library in mind, but it’s an internal system, so I can’t use it as an example without sanitizing it.

I am surprised by the obviously unbounded optimism in this thread :slight_smile:

Let’s take a step back.

Please read to the end before replying. This is a longer text.

Lazy Imports a great thing

I think that most people in the discussion regard lazy imports as a good thing, including myself.

Where we have different opinions is around the method of enabling such lazy imports and in valuing the possible problems which may arise from the lazy aspects of imports (exceptions, registrations, cache setups, etc. getting deferred as well).

The latter is important to get a good grasp on for deciding which approach to take for the former.

Import and lazy binding side effects

First, I think it’s important to agree on the fact that in Python, programmers expect a module to be readily usable after import. This is very much unlike e.g. C, where you’d always expect to
have to call some kind of init function to have the library set up some global state.

With this in mind, it is clear that module import time side effects are in fact more common than not having them. The side effects could be loading DLLs and setting those up, reading config files or importing config files (in case the configuration uses Python as config language), registering codecs, import handlers, atexit functions, setting up loggers, event loops, creating large mapping dictionaries or long lists of constants / enums, etc. etc.

With the current eager default logic, all of these side effects happen deterministically at import time. You know exactly when and where in your code you have to catch possible exceptions. And you (usually) know how to deal with that at that point in time and that point in your code.

With lazy imports, the import itself will happen when the module is first used in other code. This could be for an attribute, a function or a class, among other things.

Example

Let’s take an example and assume the lazy import is for a config module, which loads a float value from the OS environment, say config.py:

import os
factor = float(os.environ.get('FACTOR', 1.0))

If the OS env var is set to a value which cannot be read as a float, a ValueError is raised by the config module.

With the standard import, this ValueError would happen when config is imported.

With lazy imports, this could well happen in other code, using the config value, e.g.

import config

def parse_float(value):
    try:
        return float(value) * config.factor
    except ValueError:
        return math.nan

The example shows that the import is no longer deterministic, it’s in fact data driven, since the first import of config would happen when the first float needs to be parsed by the code.

Now, if config.factor causes a ValueError as result of reading the OS env var, the error would bubble up the stack and cause the above function to return math.nan regardless of whether
value could be parsed or not.

The result is silent data corruption – pretty much the worst thing which can happen in any data processing workflow.
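
This failure mode is easy to reproduce without any lazy-import machinery by standing in for the deferred config module with an object whose attribute access raises (LazyConfig is a hypothetical stand-in, not a real API):

```python
import math

class LazyConfig:
    """Stand-in for a lazily imported config module: the ValueError from
    the deferred module body surfaces on first attribute access."""
    def __getattr__(self, name):
        # simulates float(os.environ['FACTOR']) failing inside config.py
        raise ValueError("could not convert string to float: 'oops'")

config = LazyConfig()

def parse_float(value):
    try:
        return float(value) * config.factor
    except ValueError:
        return math.nan

print(parse_float("1.5"))  # nan, even though "1.5" parses fine
```

The except clause written for bad input data also swallows the deferred import error, which is exactly the silent corruption described above.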

Depending on how the lazy import logic is coded, the failed import could have various effects, e.g.

  • We have a half loaded config module in sys.modules and the next use of config.factor would trigger an AttributeError or NameError, looking odd to the programmer, since the function reads perfectly fine.

  • The lazy loader does not add the config module to sys.modules and makes another attempt at importing the module, again raising a ValueError.

  • The lazy loader marks the module as failing and raises e.g. an ImportError. This could then cause other weird effects further up the stack.

Mitigating problems

Now, if we know that config can fail in this way, we could mark this module as not-safe-for-lazy-imports (NSFLI) by adding the module name to the unsafe set, e.g.

sys.lazy_import_unsafe.add('config')
# only the module "config" matches

before importing the module.

That is, if we control the code with the parser.

Let’s assume this parser is part of a 3rd party package called fast_csv.

The package author could set the marker in the package’s __init__.py:

sys.lazy_import_unsafe.add('fast_csv.config')
# only the module "fast_csv.config" matches

or the author could mark all modules in the package as unsafe, because she is also playing other tricks which are not compatible with lazy imports.

She could then let the lazy loader know, by adding the whole package:

sys.lazy_import_unsafe.add('fast_csv.')
# the trailing dot indicates: any module with this prefix matches

Now, let’s assume that the package author doesn’t care about lazy imports, but your CLI has to use the package and you still want to benefit from lazy imports.

PEP 690 only provides an all-or-nothing switch, so the CLI would not be able to use the fast_csv package.

However, with the more flexible approach shown here, you could get around this by putting the marker inside your CLI code after analysis of the fast_csv package:

sys.lazy_import_unsafe.add('fast_csv.')

Now, what if you don’t want to bother with all this and can accept longer loading times, e.g. say you are working on a long running server.

Then you’d simply switch off lazy loading for everything:

sys.lazy_import_unsafe.add('')
# the empty string matches all modules; alternatively, "." could be made
# to have this meaning

What if I want to use lazy imports for a specific module or package ? Well, we could have a second set, marking safe modules/packages:

sys.lazy_import_safe.add('utils')
# matches just the "utils" module
# unsafe matching overrides safe matching to be on the safe side :-)

Aside: The reason for using a set of strings and not regular expressions is to avoid the overhead of loading the re module for this purpose.
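
The matching rule described above could be sketched like this (the sys.lazy_import_unsafe set is hypothetical; whether a trailing-dot entry should also match the bare package name is a design choice, included here):

```python
def matches_unsafe(module_name, unsafe):
    # Matching rule from this post: exact module name, a 'pkg.' prefix
    # (trailing dot), or the empty string to match every module.
    for entry in unsafe:
        if entry == "":
            return True                      # empty string: all modules
        if entry.endswith("."):
            # prefix match; also match the bare package name itself
            if module_name.startswith(entry) or module_name == entry[:-1]:
                return True
        elif module_name == entry:
            return True
    return False

unsafe = {"fast_csv."}
print(matches_unsafe("fast_csv.config", unsafe))  # True
print(matches_unsafe("utils", unsafe))            # False
```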

What am I trying to say with all this ?

The global switch solution proposed by PEP 690 is not adequate to solve real-life issues with lazy imports. The PEP should provide a better way to approach those real-life problems, or at least hint at one.

  • There are cases where you have to use packages which are known to not play well with lazy imports.

    And it’s well possible that these are not marked by the maintainers as unsafe.

  • There are also cases where you may only want to have lazy load apply to specific modules where you know a lot happens at import time, e.g. a parser module compiling lots of regular expressions, a module loading a fairly large DLL which you don’t always need, a plugin which pulls in other large packages, but is only needed when enabled, etc.

    Like in the above case, it’s possible that the code in question (e.g. the plugin) does not include a marker for being safe for lazy import.

What else can we do to make this safer ?

The problem mentioned above with unexpected exceptions causing changes to the flow of execution can be mitigated by having the lazy loader catch all such exceptions during the actual import and wrap them in a new exception, LazyImportError, which inherits directly from BaseException.

That way, false interpretation could not happen that easily and debugging would be simplified.

3 Likes

In your example, what if the exception was made to be raised at the original import statement by the lazy-import system (by modifying the stack)?

1 Like

This is worth reiterating, and I think you’ve nailed it right on the head @carljm

Library authors can’t declare their modules safe for lazy import because they have no idea how their libraries are consumed or in what order they will be imported. They can declare their modules unsafe for lazy import, but I don’t think that’s actually helpful. As you point out, it leads to essentially every module being unsafe for lazy import in some circumstance or other, and besides that, as a library author myself, I don’t want to modify my code to declare anything about the laziness-friendliness of my library.

As an application author though, I know everything I need to know about what modules I consume, how they are imported, and whether they are safe or not. At least theoretically. Nobody is in a more advantageous position to understand the behavior of my application, and to make declarations about what modules can and cannot be safely lazily imported. And nobody else is in a position to actually test that assumption.

To me, the PEP gives the application author end-consumer the tools they need to build a lazy-friendly application.

5 Likes

I was just playing with adding a __getattr__-based lazy importer for one of my projects at $WORK.

One thing I see after toying with that: it is very optimistic to think that library authors will not need to do additional testing if they want to support lazy import semantics. I found that with getattr-driven laziness, there were import cycles which did not exist with the normal importer. (I don’t have a MWE yet; I can work on one later if someone asks.) That suggests to me that there are many differences between lazy and imperative import semantics beyond the obvious things like __init_subclass__-based registries.

Let me point out a case in which it is not the same: __init__.py for a package.

If I have a few modules in a package, I would love to be able to write

from .foo lazy import Foo, FooMeta
from .bar lazy import Bar

__all__ = ("Foo", "FooMeta", "Bar")

Today, to write this in a way which type-checks, I need

import typing

if typing.TYPE_CHECKING:
    from .foo import Foo, FooMeta
    from .bar import Bar
else:
    def __getattr__(name): ...

__all__ = ("Foo", "FooMeta", "Bar")

I believe that for library authors, this is a common case. I’d like to write a bunch of modules with normal eager imports, and then export them in __init__.py lazily. There’s a mechanism to do this today, but

  • it’s highly manual and error-prone
  • it comes into conflict with type-checking due to the dynamic nature of the lazy imports

Additionally, I disagree that deferred imports by putting the import into a function is equivalent or even nearly equivalent. Within teams, it is hard to communicate why an import is deferred, and hard to have any testing which would catch performance regressions. Let’s say I’ve been profiling my application and see that configparser takes a long time to import. So I isolate the functionality which uses it thusly:

def _get_parser(path):
    from configparser import ConfigParser
    parser = ConfigParser()
    parser.read(path)
    return parser

Now, someone refactors the code and adds type annotations:

from configparser import ConfigParser
import pathlib

def _get_parser(path: str | pathlib.Path) -> ConfigParser:
    parser = ConfigParser()
    parser.read(path)
    return parser

With annotations, the desire to place imports at the top of a file is stronger than without – it’s often needed in order to annotate signature lines. We can guard with if typing.TYPE_CHECKING, but this is more manual and complicated. With deferred imports today, the same imports are often listed twice.

So what ends up happening in practice is

def _get_parser(path):
    # NOTE: long explanatory comment
    from configparser import ConfigParser
    ...

Instead of

lazy import configparser
lazy import pathlib

...

I think the impact of allowing opt-in laziness on a case-by-case basis for type-annotated code alone would be significant.

2 Likes