PEP 690: Lazy Imports Again

Kronuz · October 3, 2022, 4:09pm

After a lot of engagement in our previous discussion topic about Lazy Imports (with over two hundred comments!), I’m presenting an updated proposal of PEP 690 - Lazy Imports.

We have (hopefully) considered and addressed every suggestion from the previous discussion thread, either by providing rejection reasons or by improving the API and implementation. Some examples of what we’ve changed: a per-module opt-in, specifically designed to address some use cases in Scientific Python libraries (SPEC-1); a way to enable Lazy Imports directly from the API (i.e. importlib.set_lazy_imports()); rejecting the idea of wrapping deferred exceptions in LazyImportError; and an improved verbose mode (to see in real time which modules are being lazy loaded), among many other things.

Additionally, I present an updated reference implementation on top of the CPython Main branch in the form of a Pull Request in my own CPython fork, that has all the bells and whistles of the PEP, so you can all start trying and playing with Lazy Imports right away!

I’m glad we have had so much engagement from the community, and I thank you for all your comments and suggestions. This is an exciting ride and I can’t wait to see more feedback, thoughts and comments from all of you. Please join me in seeing what comes next in this amazing journey!

effigies · October 10, 2022, 6:29pm

Thanks for this. I have a question that’s more about the side effects of lazy loading; I tried to find it in the PEP, so apologies if I missed something, but what happens if I lazily import a missing module? Does the ImportError occur at import time or when I first try to access it?

The reason I ask is that the Scientific Python lazy_loader defers the error until an attribute is accessed. When combined with the current eager semantics, this means I can do something like the following:

import lazy_loader as lazy

import required_dependency
optional_dependency = lazy.load('optional_dependency')

Here I immediately get an ImportError if required_dependency is missing, while I get a deferred error if optional_dependency is missing. This is a nice semantics, at least in the context where my library is eagerly loaded, as you find out right away if your environment is broken and only when you try to access optional functionality if the optional dependencies are missing. I haven’t thought through whether it’s still desirable to have different semantics when everything is lazily loaded, but it will help to know what the expected behavior will be.
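To make the two behaviors concrete, here is a minimal sketch of a loader that defers the missing-module error until first attribute access. It is not the lazy_loader implementation: load_deferred and _MissingModule are hypothetical names, and the lazy path uses the standard library's importlib.util.LazyLoader recipe, assuming a regular top-level module.

```python
import importlib.util
import sys
import types


class _MissingModule(types.ModuleType):
    """Placeholder whose attribute access raises the deferred error."""

    def __getattr__(self, attr):
        # Only reached for attributes not already set by ModuleType.__init__.
        raise ModuleNotFoundError(f"No module named {self.__name__!r}")


def load_deferred(name):
    """Hypothetical helper: defer both loading and the missing-module error."""
    if name in sys.modules:
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    if spec is None:
        # Module is missing: hand back a placeholder instead of raising now.
        return _MissingModule(name)
    # Standard-library recipe: LazyLoader postpones executing the module
    # until the first attribute access.
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module


optional = load_deferred("optional_dependency_that_is_missing")  # no error yet
try:
    optional.some_function
except ModuleNotFoundError as exc:
    print("deferred error:", exc)
```

A plain `import required_dependency` next to this would still fail immediately under eager semantics, which is the contrast described above.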

Kronuz · October 11, 2022, 11:35pm

Hi @effigies, the behavior with Lazy Imports is that we get the ModuleNotFoundError when the name is accessed. If required_dependency doesn’t exist, you will not get an error until you use it.

Kronuz · October 11, 2022, 11:52pm

I ran a full suite of pyperformance benchmarks to see how badly Lazy Imports affects CPython performance.

In the experiments I ran, I contemplated three scenarios (all on top of CPython 3.12, Main branch):

Vanilla vs. Lazy Imports Implementation Disabled - To see if Lazy Imports (if it is disabled and not used) brings any penalties to CPython performance.
Vanilla vs. Lazy Imports Implementation Enabled - To see if it brings penalties to those who want to use Lazy Imports but are concerned about giving up any Vanilla CPython performance.
Lazy Imports Implementation Disabled vs. Lazy Imports Implementation Enabled - To see what would later be the penalty to someone not yet using Lazy Imports who wants to start using it.

I used a dedicated Amazon EC2 Bare Metal machine with 72 logical cores. The results are in the following link in my CPython fork. For those interested, you can also find the .json files produced by pyperformance in the /lazy_imports-pyperformance directory in the lazy_imports-pyperformance branch.

Please, share any feedback, comments or concerns. Thank you!

Edit: I re-ran the experiments and updated the results in the link. I’m terribly sorry, I did mess up the previous run. Thank you for spotting the possibility, @markshannon!

github.com

Kronuz/cpython/blob/lazy_imports-pyperformance/lazy_imports-pyperformance/README.md

# Benchmark Lazy Imports on top of CPython Main

These experiments ran on top of CPython Main branch with Lazy Imports implementation on a dedicated AWS benchmarking machine.

+ GitHub Fork: [https://github.com/Kronuz/cpython.git](https://github.com/Kronuz/cpython.git)
+ CPython Main: [553d3c10172254b190078c50eb9f8e60522c8f41](https://github.com/Kronuz/cpython/tree/553d3c10172254b190078c50eb9f8e60522c8f41)
+ Lazy Imports: [77213f781e876869018cce5aefe5828b3b7b6b49](https://github.com/Kronuz/cpython/tree/77213f781e876869018cce5aefe5828b3b7b6b49)

---

## Benchmark

Configured each version to compare (`cpython-main`, `cpython-lazy_imports-disabled` and
`cpython-lazy_imports-enabled`) with all optimizations enabled, using LTO and `pyperf system tune`:

```
$ ./configure --enable-optimizations --with-lto

$ make -j

```

This file has been truncated.

markshannon · October 12, 2022, 1:38pm

Are you sure about those numbers?

It seems odd that the differences are so consistently small.
We typically see a few percent variation across benchmarks even for tiny unrelated changes due to variation in code layout.

The consistently small changes are especially surprising because the branch includes significant changes to frames and dicts, two of the hottest data structures in the interpreter.

Kronuz · October 12, 2022, 4:02pm

@markshannon, now you’re making me think I could have messed up!

Let me re-run all the experiments and I’ll get back.

Kronuz · October 12, 2022, 7:40pm

@markshannon, sorry, I did mess up the previous run. Thank you for noticing the possibility!
The updated link shows the real numbers as they currently are. I thoroughly verified it this time.

guido · October 12, 2022, 7:52pm

Can you summarize in one line what the outcome is?

Kronuz · October 12, 2022, 8:59pm

@guido, the geometric mean is currently 1.01x slower when comparing vanilla CPython with CPython with the Lazy Imports implementation disabled, and again 1.01x slower when comparing CPython with the Lazy Imports implementation disabled vs. enabled (1.02x slower from vanilla CPython to fully enabled Lazy Imports).
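As a rough illustration of how these figures relate, the sketch below computes a geometric mean over per-benchmark ratios and shows that two 1.01x steps compose to about 1.02x. The list of ratios is made up for the example and is not taken from the linked results.

```python
import statistics

# Hypothetical per-benchmark slowdown ratios (new time / baseline time);
# these are NOT the numbers from the linked results.
ratios = [1.00, 1.03, 0.99, 1.02]
print(f"geometric mean: {statistics.geometric_mean(ratios):.2f}x")

# Slowdown ratios compose multiplicatively across the two steps
# (vanilla -> disabled, then disabled -> enabled).
vanilla_to_disabled = 1.01
disabled_to_enabled = 1.01
vanilla_to_enabled = vanilla_to_disabled * disabled_to_enabled
print(f"vanilla -> enabled: {vanilla_to_enabled:.2f}x slower")
```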

brettcannon · October 14, 2022, 6:45pm

One piece of personal feedback is I think there are now too many ways to control whether lazy loading is on/off:

1. -L
2. importlib.set_lazy_imports()
3. importlib.enable_lazy_imports_in_module()
4. Import in a try
5. Import in a with

Do we need so many knobs? For instance, you can do option 3 by inverting the logic and using option 5 with a do-nothing context manager, e.g.:

with force_eager_importing():
   ...

Since that function only has module-level effects, that suggests you already have control over the code in that module. As such, you can choose how to handle that situation however you want.
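For illustration, such a do-nothing context manager could be as small as the sketch below. The name force_eager_importing comes from the example above; the implementation is an assumption, and it relies on the proposed rule that imports inside a with block are always eager (on a stock interpreter every import is eager anyway).

```python
import contextlib


@contextlib.contextmanager
def force_eager_importing():
    """Do nothing; the with block itself is what marks the imports as eager."""
    yield


with force_eager_importing():
    # Under the proposal this import would run immediately, even with -L.
    import json

print(json.dumps({"eager": True}))
```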

For turning lazy importing on later, since you can control what’s eager I’m assuming this is for indirect imports that have side-effects (e.g. A → B w/ side-effects)? But you could forcibly do that by importing B before you import A, and do both eagerly in a try/with. I see this potentially falling apart if you can’t import B before A because of some side-effect requirement on A. But in that case, you could still import A and then B in the same eager block for the same effect.

Personally, the way I would suggest doing this is:

Emit IMPORT bytecode for anything at the module scope not in a block that can be eager or lazy based on a flag in sys (or just within try)
Emit an EAGER_IMPORT bytecode for everything else
Have importlib be called differently based on which bytecode is used so it knows which module object to construct (by default, otherwise I haven’t looked at how your proposed functions communicate with the deep innards of importlib)
If you want people to control what things do, then let the flag in sys be writable

After that, it’s up to people to come up with their own APIs to control lazy/eager importing by flipping stuff around in sys (e.g. their own context managers since people should not be doing imports in threads to begin with). Otherwise I’m afraid you’re going to be constantly chasing everyone’s desires of that one API they want for their thinking of how this should work and it will vary as much as how people do imports (which is a lot).
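A user-level API of that kind might look like the following sketch. No writable lazy-imports flag exists in sys today, so a stand-in namespace (import_flags, a hypothetical name) plays that role here; the example only shows the save-and-restore pattern such a context manager would need.

```python
import contextlib
import types

# Stand-in for the writable flag suggested above; this attribute is
# hypothetical and does not exist on sys.
import_flags = types.SimpleNamespace(lazy_imports=False)


@contextlib.contextmanager
def lazy_imports(enabled=True):
    """Flip the stand-in flag and restore the previous value on exit."""
    previous = import_flags.lazy_imports
    import_flags.lazy_imports = enabled
    try:
        yield
    finally:
        import_flags.lazy_imports = previous


with lazy_imports():
    print("inside:", import_flags.lazy_imports)
print("after:", import_flags.lazy_imports)
```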

encukou · October 17, 2022, 1:18pm

With what you’re proposing, how would you make the imports within a single module lazy, regardless of -L? AFAIK that’s SciPy’s use case.

A library that would set that flag would usually want to reset it back to the previous state – but it couldn’t use try or with to ensure that.

Kronuz · October 17, 2022, 7:30pm

I did a few optimizations and re-ran the experiments on the same machine. The geometric mean is now:

1.00x slower — CPython vs. Lazy Imports disabled.
1.01x slower — CPython vs. Lazy Imports enabled.
1.01x slower — Lazy Imports disabled vs. Lazy Imports enabled.

You can check the detailed results in the same link: cpython/README.md at lazy_imports-pyperformance · Kronuz/cpython · GitHub

cc: @guido, @markshannon

Kronuz · October 17, 2022, 7:55pm

@brettcannon, thank you for your comments. I agree we have what looks like too many ways of controlling lazy imports, but these are the result of real needs. For example, we need the try / except / finally for compatibility with a lot of code which uses the idiom:

try:
    import foo
except ImportError:
    foo = None

Imports being eager inside a with block was a byproduct of that idiom (since in the original implementation I checked f_iblock == 0). We are now producing IMPORT at module level only (outside blocks, as you are suggesting) and EAGER_IMPORT on all other imports, so we are no longer relying on the now defunct f_iblock. We could maybe consider leaving out with, but it made sense to have that do-nothing context manager to make things clearer (instead of using try / finally).
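To spell out why the idiom depends on eager behavior, here is a small runnable version of it. The module name is deliberately one that should not exist, and feature_available is a hypothetical helper. If the import were deferred, the error would surface at first use, outside the try, and the fallback would never run.

```python
try:
    # Eager import: a missing module raises right here, inside the try.
    import surely_not_installed_module_xyz as foo
except ImportError:
    foo = None


def feature_available():
    """Report whether the optional dependency was importable."""
    return foo is not None


print("optional feature available:", feature_available())
```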

I like the idea of making sys.flags writable, but that doesn’t address having a container of module names that are to load things eagerly (the excluding argument to set_lazy_imports()); and this is needed for incompatible libraries.

enable_lazy_imports_in_module() is for SciPy use cases, where it’s desirable to have all imports in a module be lazy, without affecting how it currently works anywhere else. We could of course explore other ways of doing it; what would you propose as a cleaner way for this use case?

brettcannon · October 18, 2022, 12:15am

I totally agree with that and the with block idioms.

Not to have that use case. I think per-module flipping like this via a function call is too magical. If you need that level of control then you can manually change things as appropriate, as you obviously control the code that will be affected (i.e. the module you are changing import semantics for). What you’re effectively doing is what a __future__ import typically does, so making a module have a side-effect due to importing a module (the exact thing you’re trying to get people not to do) seems weird.

barry · October 18, 2022, 2:32am

I’m not sure if it’s better or not, but if the objection is that there are too many importlib functions being added, rather than the actual functionality that enable_lazy_imports_in_module() provides, you could add another keyword argument including which could take a typing.Container[str] | None just like the excluding keyword. Then you could write something like:

# I am module foo.bar.baz
from importlib import set_lazy_imports
set_lazy_imports(including=['foo.bar.baz'])

It’s not beautiful, but it could work I think.

Kronuz · October 18, 2022, 3:25pm

@brettcannon, other than @barry’s suggestion of using including (which I agree could be cleaner than having two functions), there are a couple of other options.

If we knew that someday Lazy Imports would be the default import mode (not saying they will ever be), a future flag could solve the issue, i.e. from __future__ import lazy_imports at the top of the module would work. But because we don’t know that, and since a future implies something will become the default at some point, we could have another similar import for these kinds of things, e.g. from __options__ import lazy_imports. (We’d just have to choose a name: __options__, __opts__, __behavior__, etc.)

Another option could be adding a specific keyword to be used in lazy import statements, e.g. lazy import foo.bar and from foo.bar lazy import Bar (or require foo.bar and from foo.bar require Bar). The drawback of this is that we’d need to introduce syntactic changes to the language, and that brings a whole new set of problems.

Thoughts?

brettcannon · October 18, 2022, 7:59pm

That’s up to you if you want to propose it, but I’m personally fine with the syntax idea. It’s the slower path, but it would allow for making it the default with a __future__ at some point. You could still have a flag to override the default for those that know they can handle that scenario so you don’t have to wait for lazy imports to spread throughout the community once the syntax is available on all supported versions that some code is designed for.

encukou · November 14, 2022, 4:56pm

Re-reading the PEP, I see one point without much rationale. Sorry about pointing it out so late – it’s a big PEP, and it’s easy to miss the forest for the trees.

“Transparent” means that besides the delayed import (and necessarily observable effects of that, such as delayed import side effects and changes to sys.modules), there is no other observable change in behavior: the imported object is present in the module namespace as normal and is transparently loaded whenever first used: its status as a “lazy imported object” is not directly observable from Python or from C extension code.

Is this a good constraint?
Lazy imports aren’t fully transparent, as they have “necessarily observable effects”. With that in mind, is preventing any other observable changes worth the implementation complexity?

The complexity worries me. I understand that it can be easily added to current dicts, but it’ll be a burden for any future optimizations and implementations.

To make things clearer, consider semantics like the following. (I don’t see anything similar in Rejected ideas, hopefully it wasn’t floated earlier):

import foo creates a global variable __lazy:foo (specially named, but otherwise normal), and sets it to a lazy object. (Another possibility is using a global dict: __lazy__['foo'].)
LOAD_GLOBAL for potentially lazy objects (which are known at compile time) becomes LOAD_LAZY_GLOBAL, which:
- tries loading foo, and if it doesn’t succeed:
  - loads __lazy:foo from globals (not builtins), resolves it and stores the result as foo
  - deletes __lazy:foo
- replaces itself with LOAD_GLOBAL, if the specializing machinery allows that
module __getattr__ tries resolving lazy objects the same way
is_lazy_import, eager_imports, set_lazy_imports would work as in the PEP
importlib.resolve_lazy_imports(mod_or_dict) or a globals(resolve_lazy_imports=True)
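The lookup steps above can be simulated in plain Python over an ordinary dict. This is only a sketch of the described semantics, not bytecode: LazyImport, lazy_import and load_lazy_global are hypothetical names, the builtins fallback of a real LOAD_GLOBAL is omitted, and the self-replacement of the instruction is not modeled.

```python
import importlib

_MISSING = object()


class LazyImport:
    """Stand-in for the lazy object stored under the special name."""

    def __init__(self, module_name):
        self.module_name = module_name

    def resolve(self):
        return importlib.import_module(self.module_name)


def lazy_import(namespace, name):
    """Model `import name`: store a lazy object as __lazy:name."""
    namespace[f"__lazy:{name}"] = LazyImport(name)


def load_lazy_global(namespace, name):
    """Model LOAD_LAZY_GLOBAL: try name, else resolve __lazy:name."""
    value = namespace.get(name, _MISSING)
    if value is not _MISSING:
        return value
    key = f"__lazy:{name}"
    if key not in namespace:
        raise NameError(f"name {name!r} is not defined")
    resolved = namespace[key].resolve()
    namespace[name] = resolved  # store the result under the real name
    del namespace[key]          # drop the lazy placeholder
    return resolved


module_globals = {}
lazy_import(module_globals, "json")
print(sorted(module_globals))   # only the placeholder key so far
json_module = load_lazy_global(module_globals, "json")
print(sorted(module_globals))   # now only the resolved name
```

Code that inspects the dict directly between the two steps sees only the specially named key, which is the kind of breakage discussed in this post.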

That would break more modules than the transparent way of modifying dict, but how serious would that be? ISTM that it would break modules that inspect module __dict__ or globals() directly.

Would it be better than changing __dict__? I have no way of knowing. Even if I implement it, I can’t quite test it without access to huge real-world codebases, and mechanisms to patch third-party deps.
And that’s the main thing that makes me uneasy. Testing an implementation is a huge undertaking, since it involves adapting third-party code. I don’t think it can realistically be done outside Meta. If the proposed semantics are just a local optimum, or a Meta-specific one, we might get stuck in it.

I don’t know what to do about this, though. It doesn’t sound fair to ask Cinder folks to implement and test half-baked ideas.
So, my concrete question is: how important is the “transparency”?
(Apologies if this was discussed before – but if it was, it should be mentioned in the PEP.)

[edit]: This is a personal view, I don’t represent the SC here.

csm10495 · November 14, 2022, 6:19pm

One thing that confuses me about this is:

How does a user know that libraries they depend on would be ok with lazily-loaded imports?

Like if we have a flag where someone can force all imports to be lazy (-L), why would someone who claims to want performance not set that irrespective of potential issues? Similar issue with importlib.set_lazy_imports() … I mean it’s tempting to just put that at the top of an entrypoint in hopes of better performance… which could lead to issues and confusion when weird behavior happens.

Part of me thinks this type of feature may be a better fit for a ‘4.0’ release.

barry · November 14, 2022, 7:24pm

They have to test it with their leaf applications. Only the authors of such an application can possibly know whether lazy imports are safe, and they also control the -L flag (although IIUC, there’s no possible way to specify shebang flags in an entrypoints definition).