PEP 690: Lazy Imports

Hi Marc-André,

Thanks for providing a clear example case! I agree that delayed errors can be a problem. I am a bit confused as to exactly where your disagreement with PEP 690 lies, though, since I think the PEP already provides the tools needed to handle this case: what the PEP proposes is in fact very similar to your sys.lazy_import_unsafe.

You say:

But this is not true! Quoting directly from the PEP:

So the PEP spells it importlib.set_eager_imports(['fast_csv']) instead of sys.lazy_import_unsafe.add('fast_csv.config'); otherwise it seems quite similar to what you propose.

In fact the callback option allows your “opt in just a few modules” case, too: define a callback that implements an allow-list instead of a block-list.
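For instance, a minimal sketch using the callback form of set_eager_imports (the same form used later in this thread); the module names here are made up:

import importlib

# Hypothetical allow-list: only imports inside these modules become lazy.
LAZY_MODULES = {"myapp.cli", "myapp.reports"}

def eager_imports(modname):
    # Return True for every module whose imports should stay eager,
    # i.e. everything not on the allow-list.
    return modname not in LAZY_MODULES

importlib.set_eager_imports(eager_imports)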

One difference is that the PEP proposes to name modules within which imports are eager, rather than modules whose import will always be eager. I don’t think this is a critical difference either way; as I described above, I don’t really think describing fast_csv.config as “lazy import unsafe” is accurate. Rather, I would say that the usage of fast_csv.config in the specific context of fast_csv (whatever module defines parse_float) is not safe to be lazy. (Although if we make the LazyImportError change, then I think even that alone makes it OK.)

The other case the PEP does not currently support is the library author marking fast_csv.config as lazy import unsafe. But if the library author is willing to bother accounting for lazy imports in the first place, they can just as easily “opt out” for a potentially problematic import by doing this:

with importlib.eager_imports():
    import config

def parse_float(value):
    ...

So in sum: I agree with your concern, and your example case, and I think the PEP already provides all the tools required to handle it in a way that is not very different from what you propose; it seems more like API bikeshedding than a real difference in capability.

I still think the concern about delayed errors from imports biting someone is very real, and I love your idea for that:

I am inclined to think the PEP should include this. That way lazy errors will not silently pass as some other error in the way shown in your example.

I do think inheriting BaseException is a step too far: except Exception: should still catch LazyImportError. If someone is catching all exceptions, they don’t want errors bubbling through and they are already accepting the risk that they might catch any random thing they don’t expect. I don’t think LazyImportError is parallel to MemoryError or KeyboardInterrupt and deserves to be treated so differently; being a distinct exception type is sufficient to handle your example case.

2 Likes

Hi Stephen,

I would be interested in seeing an example of this! I do think it is possible with a getattr-based lazy importer, since these are effectively not lazy when you have from ... import .... Import cycles can also be sensitive to exactly where you enter the cycle: entering at one point can cause the cycle to error, whereas entering at a different point will work fine, depending on which module accesses what at module scope.
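For example, a made-up two-module cycle:

# a.py
import b
A_VALUE = 1

# b.py
import a
B_VALUE = a.A_VALUE

Entering the cycle via import b works: a’s body runs to completion (setting A_VALUE) before b’s last line executes. Entering via import a fails with AttributeError, because b re-enters a while a is still only partially initialized and A_VALUE is not yet set.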

I am not sure that this would be possible with PEP 690 laziness, but I’d love to look at an example and see.

With PEP 690 as it is today, if this package is named pkg, you could spell this (in your main module) as:

def eager_imports(modname):
    return modname != "pkg"

importlib.set_eager_imports(eager_imports)

And turn on lazy imports. Only imports directly within pkg will be lazy.

I agree. I think syntax for opt-in lazy imports would be really nice for typing-import purposes in codebases that don’t need the startup time / memory benefits of PEP 690 and so don’t want to deal with opting in to lazy imports generally. But I’m inclined to think this should be a separate PEP that can build on the infrastructure of PEP 690, because the rationale and motivating use case are quite different, and that motivates different capabilities (including new syntax, which is a big change that PEP 690 doesn’t need). I think PEP 690 stands on its own merits, and when that is true I generally think smaller PEPs are better than bigger ones.

That said, if the Steering Council were to say “we like PEP 690 but we’d like it more if it included syntax for per-import opt-in too in the same PEP,” I would totally support adding that :slight_smile:

3 Likes

As I’ve thought about the three backward incompatibilities, as well as the library opt-* discussion, one possible solution has come to mind: tooling.


We’ve leaned on external tooling (linters, type checkers, etc.) in the past to solve similar situations. This certainly isn’t the first time we’ve faced tricky cases in an otherwise desirable feature, where solving it in the compiler or runtime would have too large an impact on performance in the common case. There is plenty of precedent of tools filling that gap effectively. We should weigh it as a possible solution here.

While this case is similar to those past examples, and it may be a good solution, there are some additional wrinkles:

  • (IIRC) most/all linters only analyze a single file at a time, rather than holding state between files
  • analysis would have to reach into dependencies for some of the checks (requiring they be installed)
  • it might be tricky to identify side effects generally

The necessary checks would probably mean the tools would have to analyze across multiple files (possibly including dependencies). I’m not aware of precedent for tooling like that (type checkers?), but I would not be surprised if there were mainstream examples.

Basically, it would require at least a partial incarnation of whole-program static analysis. That doesn’t concern me much, since tooling for whole-program static analysis would have a number of benefits regardless of the application here. (FWIW, it’s been on my backlog for years, currently in the top 3 once I finish with per-interpreter GIL.)

Regarding identifying side effects, the tools don’t have to be perfect, as long as they err on the side of false positives. Those can be handled like normal: with directive comments in code.


Anyway, solving this with tooling may not be the right fit but it is worth considering. At the least we should identify how hard it would be for existing tools to meet this need. I did not notice any mention of tooling as a solution so I figured I’d bring it up.

1 Like

Possibly you could use the trick we are using for exception groups, where we have two exceptions, one a BaseException and the other an Exception. See PEP 654.
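A minimal sketch of that trick applied here, with hypothetical names (PEP 654 does the same thing with BaseExceptionGroup and ExceptionGroup):

class BaseLazyImportError(BaseException):
    # Escapes a bare `except Exception:` handler.
    pass

class LazyImportError(BaseLazyImportError, Exception):
    # Caught by `except Exception:` like any ordinary error.
    pass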

3 Likes

I’ve often wished for something that would create a single file “bundle” of Python packages and libraries. zipimport is close I guess but the tooling and startup options don’t make it too convenient. I don’t want to mess with venv, PYTHONPATH, etc. Just start my program with something like:

py -bundle myapp.pyb

The .pyb file would contain all of the pyc data, similar to what zipimport does. Maybe we could put the lazy import annotations (i.e. safe or not) into that file too.

2 Likes

Something like shiv?

FWIW, we at LinkedIn (mostly my colleague Loren Carvalho) developed shiv as a pex alternative, to modernize the tool chain, fully embrace Python 3, and provide additional features and such. It gets us pretty far, but really there are (at least) two downsides to any zipapp approach:

  • You have to already have a compatible Python executable installed. It would really be nice to be able to bundle everything into a single executable with little to no external dependencies.
  • You still can only portably import extension modules from the file system (because dlopen() does not work from a memory offset). That means any zipapp with shared-library extension modules has to unpack them to the file system in order to import them. We’ve experimented both with unpacking the whole zipapp up front and with unpacking just the .so files on demand, and found there really isn’t a performance gain from the on-demand approach, while unpacking everything up front has warm-startup benefits.

I played with PyOxidizer several years ago, and it was mostly a successful experiment, but it would have taken significant effort to make it work seamlessly, and integrate it with our tool chain. It’s probably worth looking at again now.

The PEP says:

Existing import-hook-based solutions … are limited in that only certain styles of import can be made truly lazy (imports such as from foo import a, b will still eagerly import the module foo)

Does this proposal avoid that limitation? How?

Right, PEP 690 does not have this limitation.

Lazy loaders based on import hooks and module __getattr__ create lazy module objects that reify themselves on the first attribute access. But a from ... import ... is just an import followed immediately by one or more attribute accesses on the newly imported module, to get the specific names from it that should go into the importing module’s namespace. So the immediate attribute access effectively makes the import eager.
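You can see this with the stdlib’s own importlib.util.LazyLoader; the recipe below follows the importlib docs:

import importlib.util
import sys

def lazy_import(name):
    # Standard LazyLoader recipe: the returned module is a placeholder
    # that executes the real module on first attribute access.
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module

json = lazy_import("json")  # "json" has not actually been executed yet
json.dumps({})              # first attribute access loads it for real

A from json import dumps, by contrast, performs that attribute access immediately as part of the import statement itself, so a loader like this cannot keep it lazy.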

PEP 690 allows having an independent lazy placeholder for each value in a module namespace dictionary, and it modifies the behavior of the import opcodes themselves to create these lazy placeholder objects instead of actually performing the import. So with PEP 690 a from foo import bar, baz just places lazy placeholders into the importing module’s namespace under the names bar and baz, which track enough metadata to know that when referenced they should import foo and get the right name off it.
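The bytecode makes this concrete: a from-import compiles to one IMPORT_NAME plus an IMPORT_FROM per requested name, and these are the opcodes whose behavior PEP 690 changes:

import dis

dis.dis(compile("from foo import bar, baz", "<example>", "exec"))
# Shows, roughly (exact output varies by CPython version):
#   IMPORT_NAME  foo
#   IMPORT_FROM  bar
#   STORE_NAME   bar
#   IMPORT_FROM  baz
#   STORE_NAME   baz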

Carl and German, I have a question about the semantics. It may have been answered in this thread but I didn’t see it and Discourse claims 57 minutes reading time. I don’t think it is answered by the PEP.

If I have from foo import bar is that import lazy or not? The PEP’s Motivation section suggests that this is something that other lazy importers don’t support, but the specification doesn’t have a single example of this form. I’m guessing yes, but it would be nice to know for sure.

Relatedly, I’d love a sketch of how you are implementing this that is not quite the Cinder code, yet more than what the PEP currently has, and also more than “we added a new lookup function to the dict implementation”.

Somehow reasoning about the semantic implications of the PEP would seem easier to me if I had a better understanding of the implementation. For example, does a dummy object appear in sys.modules? If not, is there some other (hidden, internal) place where such dummy objects are stored? And it would help me envision things like how to think of the state of a lazily-imported module that failed to execute (MAL’s example).

Assuming from A import B as C is supported, there must potentially be multiple dummies for objects imported from the same module (to store at least A and B above – C is always the dict key), so now I’m curious what happens to such dummies once the import is executed for one of the dummies.

Etc., etc.

1 Like

Hi @guido, short answer: yes, that’s a lazy import! To summarize what’s a lazy import and what’s not:

Any import xxx at the top level of a module (including from xxx import yyy) is lazy, except for any import * and any import inside a block (the implementation currently makes imports lazy only when f_iblock == 0). So imports inside try/except/finally blocks, with blocks, or even class blocks are always eager.
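In code, those rules look like this (fast_parser is a hypothetical module):

import json                    # lazy: top-level import
from typing import Mapping     # lazy: top-level from-import
import contextlib              # lazy, but resolved as soon as it is used below

from os.path import *          # eager: star imports are always eager

try:
    import fast_parser         # eager: inside a try block
except ImportError:
    fast_parser = None

with contextlib.suppress(ImportError):
    import tomllib             # eager: inside a with block

class Config:
    import sysconfig           # eager: inside a class body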

I agree that the PEP could do a better job of explaining how the implementation works, and I think it’s a good time to try adding more information about it.

In the meantime, what the implementation does is the following (a rough pure-Python sketch follows the list):

  • When anything is lazily imported (at module level only), it adds a (completely new) internal “deferred” object to the module’s global dictionary. Any imported modules or names are converted to this type of deferred object, and nothing at all is added to sys.modules (no dummy objects there).
  • These deferred objects (which can represent either a complete module or a name inside a module) are kept as such inside the module’s dictionary for as long as possible, and they hold every bit of information needed to do a “deferred object resolution”: the globals, locals, fromlist, and level, exactly as they were at the time the import statement was executed.
  • All that information kept inside the deferred object is used, when the time comes, to resolve (load and execute) the related module or name, the same way it would be used during an actual import or import from in vanilla CPython.
  • At resolution time, whenever possible, the value in the dictionary is updated to point from the deferred object to the actual resolved object, so later accesses to the dictionary key don’t attempt to resolve an already-resolved deferred object again.
  • As an optimization, and to account for the cases where the dictionary can’t be immediately updated (for whatever reason), resolved deferred objects also maintain a pointer to the resolved object they represent.

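As a rough pure-Python sketch of that deferred object (the real thing is a C-level object that the dictionary resolves transparently on lookup; this toy version has to be resolved explicitly, and it simplifies fromlist to a single name):

import builtins

class Deferred:
    def __init__(self, name, globals, locals, fromlist, level):
        # Keep everything the import statement had at the time it ran.
        self.name = name
        self.globals = globals
        self.locals = locals
        self.fromlist = fromlist
        self.level = level
        self.value = None       # pointer to the resolved object, once known
        self.resolved = False

    def resolve(self):
        if not self.resolved:
            # Perform the real import exactly as vanilla CPython would have.
            module = builtins.__import__(
                self.name, self.globals, self.locals,
                self.fromlist or (), self.level,
            )
            # A from-import deferred resolves to the name, not the module.
            if self.fromlist:
                self.value = getattr(module, self.fromlist[0])
            else:
                self.value = module
            self.resolved = True
        return self.value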
I think that may help to somewhat clarify the current implementation, but if I left anything unclear, please let me know.

During PyCon I ported a reference implementation on top of CPython 3.8 (with nothing related to Cinder); it’s available at my GitHub fork in the lazy_imports_3_8 branch, and the relevant diff is available here: Lazy Imports · Kronuz/cpython@dde0ab9 · GitHub.

1 Like

Thanks, that’s helpful (and a relief!).

So now about this scenario. Suppose I have two modules, A and B, and each has

import X

So each has a different dummy object for X in its globals dict, which the user can never get hold of, because trying to get it out of the dict always resolves the import. Great.

Now when A later does X.foo, it loads module X and A’s global X is replaced with the real X, while B’s global X is still the dummy.

  • When B reads its global X for the first time, what exactly happens?
  • Same question when loading X raised an error. (Note that if an eager import raises an error, the X key is removed from sys.modules, so there’s no trace of X there!)
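(That last note is easy to demonstrate with a hypothetical module broken.py whose body raises, say, 1 / 0:)

import sys

try:
    import broken              # hypothetical module whose body raises
except ZeroDivisionError:
    pass

assert "broken" not in sys.modules  # the failed import left no trace,
                                    # so a later import will retry it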

Wait, does that mean that an import inside an if is also lazy?

That will have to change for 3.12, since f_iblock (and f_blockstack) have been removed. I can’t seem to link directly to the section, so you’ll have to scroll down in the What’s New in Python 3.11 document.

I don’t think that makes imports inside if blocks eager; only with, try/except/finally, and class blocks do, I believe.

Yep, it seems like it. f_iblock is gone now! I’m not sure what we’ll hook into instead, but we need something similar (for with and try/except/finally, specifically).

Why not add new opcodes for optionally-lazy imports? Then the parser can decide when lazy import is appropriate based on the AST structure. (If lazy import is not enabled the lazy opcode can fall back to the regular opcode.)

2 Likes

Yes, if by dummy you’re referring to the deferred object: if something in B tries to read its global X, it will try to load it too, because the X in B's globals dictionary was neither replaced by the resolved object nor does its deferred "X" hold a pointer to the real resolved X. However, at that point sys.modules already has the loaded X module, so it will be used from there.

If X raises an error while it’s being loaded/executed in module A, it’ll set sys.modules["X"] = None, which will also raise an error when X is accessed from module B.

This doesn’t change the way it all currently works: we’ll keep the same errors and the same behaviors; only the timing of when the errors bubble up will be different, just as if we had moved the import to the line right before the imported names are used.

This is definitely a good idea! We’ve thought about adding opcodes for lazy imports, and I think it’d be a great improvement too: this way we won’t have to check at runtime whether the import needs to be lazy; we’d have a specific opcode for it. The only drawback is that the lazy import opcode would still need to check whether lazy imports are currently enabled, but I think that’s better (and simpler) than checking on every import whether laziness is needed.

2 Likes

That’s new – currently a failing module gets deleted from sys.modules so another attempt at importing will try to find it again (in case you have edited the source since the first attempt).

No, putting None in sys.modules['X'] is different from the current behavior when the module errors out during its execution (in 3.8 or in main).

I can’t remember off the top of my head, but I think Python may set sys.modules["foo"] = None only when the module is not found, so it doesn’t try to find it again (ModuleNotFoundError), and it removes the entry on other errors. Lazy Imports, however, uses exactly the same behavior; that’s how it currently works. I know because I’m not doing anything particularly special in the error cases: we just let Python handle the import error as usual, the same way it would if the import had been done right there, just before the point where the name is being used.