PEP 690: Lazy Imports

Yes, it does. Other than that +1 on your comments.

3 Likes

Thanks @encukou for raising the Black example. I do think that “application developer” is the right term for who should be enabling lazy imports, but (as @tiran said) the Black example is more confusing because it is a bit of an application/library hybrid in practice. The types of larger deployed applications (probably mostly internal?) that are IMO most likely to benefit from lazy imports don’t muddy these lines.

That’s a reasonable concern! Like you say, I think the only practical way to address it is to make the feature available opt-in so more people can try it (without having to compile or patch a custom Python, which for many potential experimenters would be a non-starter.) Would it help if we explicitly labeled it as provisional and subject to possible removal if more real-world experience suggests the problems outweigh the benefits?

One thing I would note, if it helps, is that at Meta we don’t have just “one specific deployment” of lazy imports, we actually now have quite a few deployments that are rather different from each other. Instagram Server is a very large Django-based web server monolith, but lazy imports are also in use by some other non-Instagram services and by a number of CLI tools of varying sizes. We are currently working on broader adoption of Cinder within Meta (outside of just IG), and currently lazy imports seems to be by far the most widely attractive feature to new potential users, more so than our JIT or other performance-oriented changes. Our data science users are also now rapidly adopting Cinder-based Jupyter notebook kernels, motivated by the large improvement in notebook startup time and memory usage from lazy imports.

3 Likes

Thanks @steve.dower . I’m totally in agreement with all of your comments about social responsibilities, and will be happy to draft a PR to the PEP text (or update my currently outstanding PR) to try to address this more clearly in the “How to teach this” section.

3 Likes

One other possible change to the PEP that I’m still mulling over is something to address the case @ronaldoussoren raised, where a library A (that does care to support lazy imports) might be using another library B in a way where the library A author knows that a certain module within B needs to be opted out of lazy imports for A to work correctly with lazy imports enabled. Currently the PEP doesn’t really offer a tool for the library A author, other than documenting for their users “if you enable lazy imports, please opt out module b.foo.” This isn’t ideal.

One option is that we could provide a PyPI lazy_imports_helper package that provides a few pre-canned opt outs that are commonly useful (initially based on our experience at Meta), and recommend this package in the lazy imports documentation. This is sort of a typeshed-style approach that places both less responsibility and less control with the library A author (they can always make a PR to lazy_imports_helper for the case they notice.)

To give more control to the library A author, we would need a more complex set of opt out controls, with precedence and overrides. This could be as simple as just exposing the opted-out-modules set as a global attribute of sys or importlib, as @malemburg showed in some code snippets. Then the library A author can add b.foo to this set in time for the import they care about, and if the application developer really wants to override that, they have to first import library A, so its change takes effect, and then modify the list again after this.

I think this could be OK. It just encourages libraries to mutate global process state as a side effect of being imported, which is not very nice as a rule.

Interested in thoughts on these two options (or others I didn’t think of.)

1 Like

Would it make sense for those to be in the stdlib? I’m not sure you’d need to provide a backport package on PyPI, since the stdlib APIs also wouldn’t be available until lazy imports are (unless perhaps they were no-ops on older Pythons).

1 Like

The advantages of maintaining it as a standalone PyPI package are:

  1. Provide a shim layer that’s a no-op for older Python versions, so application developers can use the helpers without checking whether they are running under a version that has lazy imports.
  2. Provide frequent updates, decoupled from CPython release cycle. Reduces friction to receiving community contributions and releasing frequently.
3 Likes

A -X option would be a good way to signal “provisional” or “specific to CPython 3.12”.
Unfortunately, making features provisional usually didn’t work too well in the past.

There’s another benefit of having such a repo on GitHub: it would be a place to design and discuss lazy-import fixes before (politely) contributing them to the libraries :)

3 Likes

The specification section of PEP 690 doesn’t include a specification of what a lazy import does.
It just says that it is deferred. How does that occur? How does the import change the VM state?

Suppose we have:
[lazy] import foomod as foo
Then what happens?
Is "foo" in globals() true?
What happens to globals() after globals().copy()["foo"]?
Does
g1 = globals().copy(); g2 = g1.copy(); g1["foo"]; g2["foo"]
cause foomod to be imported twice?

What is the model of execution?
After import foomod as foo, is the key “foo” in globals() a thunk, or is globals()["foo"] a thunk, or is there some other model?

The PEP needs to provide a specification, such that I can answer these questions.
I should be able to write conformance tests for an implementation, and currently I cannot.

4 Likes

Hi @markshannon,

Yes, I agree the implementation details could be better specified in the PEP. There are many edge cases, but something can most likely be written up to get us to a point where conformance tests for the most common cases can be derived just from reading the PEP.

To answer some of your other questions:

Yes, the key "foo" would be in globals(), and the value for it is an instance of a deferred object. The value will be resolved when it is first used, and the new non-deferred object will, in most cases, immediately replace the old value in the dictionary.

Neither globals().copy() nor g1.copy() triggers the import: the complete dictionary is copied without resolving any of the deferred values, and both copies still contain lazy objects. "foo" is resolved when it’s first accessed, via g1["foo"]; the second access (g2["foo"]), when it tries to resolve its deferred object, simply finds the module already in sys.modules, so it’s not imported twice. It’s just as if you had called import foo twice in this case.
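To make the “not imported twice” point concrete, here is a small runnable sketch using today’s eager imports (the module name side_effect_mod is hypothetical, created on the fly just for the demo): binding a module under two names still executes its body exactly once, because the second import is served from sys.modules.

```python
import sys
import tempfile
import textwrap
from pathlib import Path

# Create a throwaway module whose body counts its own executions by
# stashing a counter on sys (a demo trick, not a recommended pattern).
mod_dir = Path(tempfile.mkdtemp())
(mod_dir / "side_effect_mod.py").write_text(textwrap.dedent("""\
    import sys
    sys._side_effect_count = getattr(sys, "_side_effect_count", 0) + 1
"""))
sys.path.insert(0, str(mod_dir))

import side_effect_mod            # first import: the module body runs
import side_effect_mod as again   # second import: served from sys.modules

executed_times = sys._side_effect_count   # 1: the body ran only once
same_object = again is side_effect_mod    # both names bind the same module
```

The same sys.modules caching is what makes the second lazy access above cheap: resolving the deferred object finds the module already imported.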

1 Like

Using lazy imports, if I run this code:

from math import pi

is pi imported lazily? I think it is, but the PEP doesn’t explicitly say that from imports are lazy (apart from star imports, which are never lazy).

If I do a lazy import:

import mymodule

I know that referring to the name mymodule will load it. And I think that even looking up the name in globals, globals()['mymodule'], will load it. Correct?

But what about inspecting globals?

print(globals())

I think it would be good to have an API that can inspect a global name without causing it to load. I don’t have an opinion on what that API should be (something in the inspect module?) but there should be a way to grab a reference to the lazy proxy object without triggering the load.

Yep, the first PEP draft didn’t include sufficient specification details. There is a PR of a much-expanded second draft up at PEP 690: Updated draft from discussion and feedback by carljm · Pull Request #2613 · python/peps · GitHub that incorporates a lot of the feedback from this thread, plus a much more detailed implementation section. Let us know if that PR doesn’t address your questions adequately! Sometime after it is merged we’ll formally post the updated PEP draft for a new round of discussion.

1 Like

Yes, this import will also be lazy. The name pi will be added to the module namespace as a lazy object, and the first reference to pi will trigger the actual import resolution. The next update of the PEP text will make this clear. Thanks for the feedback!

Correct.

This will also cause the import to resolve. Not due to calling globals(), but because you’re printing it, calling its __repr__, which iterates its values – accessing the value is what will trigger the import.
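As a point of comparison (not part of PEP 690), the stdlib already ships an object-level lazy import helper, importlib.util.LazyLoader, which shows the same “first attribute access triggers the real import” behavior. Unlike the PEP’s dict-level approach, though, its lazy module object is visible to Python code; the lazy_import helper below is just a sketch wrapping the documented LazyLoader recipe:

```python
import importlib.util
import sys

def lazy_import(name):
    """Return a module whose execution is deferred until first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # registers lazy state; does not run the body yet
    return module

json = lazy_import("json")
# Touching any attribute (including __repr__ via print) forces the real import.
result = json.dumps({"a": 1})
```

This is exactly why printing globals() resolves PEP 690’s deferred values too: repr has to access each value, and access is the trigger.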

Based on suggestions earlier in this thread, we’ve added an API importlib.is_lazy_import() to the PEP. This API has to be passed a dictionary and a key name, and returns a boolean. E.g.:

import importlib

from foo import bar

importlib.is_lazy_import(globals(), "bar")  # True

bar

importlib.is_lazy_import(globals(), "bar")  # False

At the level of Python semantics, there is no such thing as a “lazy proxy object” – these are an internal C implementation detail that is never visible to Python code. The key innovation of this PEP relative to previous lazy import solutions is the integration into the dictionary implementation: this makes lazy imports very low overhead and transparent (by which we mean precisely that Python code can never see a lazy proxy object.)

This is why the proposed is_lazy_import API just gives you a boolean result.

I guess it would be possible in principle to keep lazy imports transparent and avoid a lazy proxy object ever escaping accidentally, but still make it possible to get a reference to one only via a dedicated API. This should avoid adding new compatibility concerns. But it also adds a lot of complexity to the implementation, and the use case is not clear. What useful thing can you do with a lazy import object if you get hold of one?

5 Likes

Looking at this from the library perspective I share @takluyver 's concerns that this will eventually become an expected feature and very much agree with @steve.dower 's point in PEP 690: Lazy Imports - #167 by steve.dower about how this needs to be messaged.

I do not think that the concerns of what can go wrong are being overblown (if anything everyone is giving them too little weight). In Matplotlib we have made the backend selection and import progressively lazier and at every step discovered a subtle way that a user was relying on a side-effect (I can track down the bug reports if anyone wants). I expect that this change is going to produce a stream of really interesting bugs across the whole ecosystem :slight_smile: (but given the amount of use that people are reporting from cinder, I would be very happy to be wrong!).

That said, I’m still enthusiastic about this proposal.

Matplotlib (and I think many of the scipy tools) have relatively long import times and want to reduce them. A major contributor to our startup time is that for both internal and external reasons we have a lot of re-importing between modules, so if you pull in one module, you end up pulling in a majority of the library. Because these imports are done in public modules we cannot drop the imports (or bury them in function calls), because we may have users accessing those names from the (re-imported) namespace. I have started to advocate that Matplotlib adopt SPEC-1 (or some other scheme based on module properties), but would much rather have this done at the language level.
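The module-properties approach mentioned above can be sketched today with PEP 562 module-level __getattr__. Everything named demopkg/heavy below is hypothetical, generated on the fly just so the example runs; the pattern itself is the __init__.py body:

```python
import sys
import tempfile
import textwrap
from pathlib import Path

# Build a hypothetical package whose __init__ defers submodule imports
# to first attribute access, via PEP 562 module __getattr__.
pkg_dir = Path(tempfile.mkdtemp()) / "demopkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text(textwrap.dedent("""\
    import importlib

    _submodules = {"heavy"}

    def __getattr__(name):
        # Import the submodule only when someone actually touches it.
        if name in _submodules:
            return importlib.import_module("." + name, __name__)
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
"""))
(pkg_dir / "heavy.py").write_text("ANSWER = 42\n")
sys.path.insert(0, str(pkg_dir.parent))

import demopkg                                    # cheap: heavy not imported yet
was_loaded_early = "demopkg.heavy" in sys.modules  # False before first access
value = demopkg.heavy.ANSWER                       # first access triggers import
```

The downside, compared to doing this at the language level, is that every library has to hand-maintain its own __getattr__ plumbing.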

I am only one data point, but I think the carrot for library maintainers is both that we do care about our users problems (or we would not be maintainers!) and that in many cases we are users of our own tools (e.g. at my $work we actually have tests that Matplotlib is not imported unless you pull in a very specific sub-module in our libraries) so we are also feeling this pain.

Another application of this that I have not seen discussed is that this makes building curated namespaces for a domain (that may pull from one or more libraries) much more practical. One of the criticisms we get from people moving to Python from something like MATLAB or Mathematica is that the ecosystem is too scattered (you want feature X, go to library A; you want feature X’, go to library B) and new users get lost. Being able to hand users a conglomerated and curated namespace across numpy, scipy, scikit-image, pandas, networkx, Matplotlib, … (that does not take forever to import) would be a pretty big step forward (leaving aside the process of how to sort out what should be in the namespace :wink: ).


I see the practical reason for the ENV (you have some script that you cannot fix that is invoking Python and you want it to be lazy), but I think it opens up too much action-at-a-distance. I am also swayed by the argument that by the time any Python is running, you in general have no idea what has or has not already been imported (thinking like a library developer, not an application developer, here), so it is too late to ask to be lazy. On the other hand, the CLI flag is localized to one process and you can be sure it is handled before any Python code gets a chance to run.

8 Likes

A common pattern to import an optional dependency is

try:
    import module
except ImportError:
    have_module = False
else:
    have_module = True

However, this always has to be an eager import. Thus, I’m wondering how to perform an optional import lazily? Would this work, perhaps?

from importlib.util import find_spec
have_module = find_spec("module")
if have_module:
    import module
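(For reference, find_spec can be exercised today: it locates a module without executing its body, so the availability check itself stays cheap. The missing module name below is made up.)

```python
from importlib.util import find_spec

# find_spec returns a ModuleSpec (truthy) if the module can be located,
# or None for a missing top-level module -- without running its body.
have_json = find_spec("json") is not None
have_missing = find_spec("no_such_module_xyz") is not None
```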
1 Like

@mara004, the pattern still works with the lazy imports feature. Any imports inside a try / except / finally block are eager imports.

Are the imports within those imports also eager? If I try to import something that is optionally installed, and it works, presumably you still want the imports in that library to be lazy.

I expect it will too, particularly while people are enabling the feature to test things out and figure out whether they work with lazy imports out of the box.

This is exactly what lazy imports is supposed to fix, and I’d expect these libraries to become major users of the feature.

This is indeed a very cool thing for some libraries to have and something lazy imports can certainly help with.

No, they’re not. It’s correct that it will not work out of the box if the imported library is importing some other module internally and expecting it to throw an error if it doesn’t work (or is missing). In this case, we’d have to add the module to the list of modules in the excluding argument of set_lazy_imports(), or ask the author to add lazy imports compatibility. If it’s not a third-party library, we could add a try / finally there as well, so the inner import becomes eager too.

@Kronuz Yes, I’m aware of this. My question merely was how to best replace the first pattern so as to do a lazy, optional import.

3 Likes

Sorry, I’m not clear what this means. If I try to import a module, are that module’s imports lazy, or not? It seems not ideal to try import LargeLibraryWithExpensiveImports and suddenly lose all benefits of lazy imports.

1 Like