PEP 690: Lazy Imports

PEP 690 is posted! It proposes an opt-in feature to transparently defer the execution of imported modules until the moment when an imported object is used. Since Python programs commonly import many more modules than a single invocation of the program is likely to use in practice, lazy imports can greatly reduce the overall number of modules loaded, improving startup time and memory usage.

At Meta, we have implemented this feature in Cinder, and it has already demonstrated startup time improvements of up to 70% and memory-use reductions of up to 40% on real-world Python CLIs, while also eliminating the risk of most import cycles.

Feel free to comment or share thoughts and feedback.

17 Likes

I like this feature and it seems good for most cases, but I doubt such an obviously breaking change will ever be accepted. Why not allow it conditionally with a __future__ import?

1 Like

@bentheiii, this is an opt-in feature that is enabled globally by passing the -L flag or by setting the PYTHONLAZYIMPORTS environment variable.

1 Like

I understand, but I can only enable this feature in an existing codebase if I know that none of it uses import side-effects (and, that none of my dependency libraries use import side-effects). Allowing for per-module opt-in instead of interpreter-wide opt-in would be a lot more manageable for people using this feature in existing codebases.

Like it or not, a lot of ubiquitous libraries nowadays use import side-effects in their internal logic; making this feature interpreter-wide means that I can’t use it if I rely on those libraries.

At the very least, I’d expect the PEP to include an explanation as to why you went with an interpreter-wide flag that changes the way Python behaves (AFAICT, this is unprecedented except for -O), instead of a __future__ import, which I’d expect for something like this.

5 Likes

Hi Ben,

Thanks, you’re absolutely right that the PEP should address this question. I’ve pushed an update to the PEP to do that. Let us know if it doesn’t make sense!

3 Likes

If you import submodules (such as foo.qux and foo.fred), with lazy imports enabled, when you access the parent module’s name (foo in this case), that will trigger loading all of the sibling submodules of the parent module (foo.bar, foo.qux and foo.fred), not only the one being accessed, because the parent module foo is the actual deferred object name.

Is it possible to have foo.bar etc have two layers of deferred import, one for foo and another for bar?


It would be nice to have a function which returns whether a module is import-deferred (i.e. it checks whether the passed object’s __dict__'s item-lookup function is lookdict_unicode_lazy). This could be useful for lazy import debugging, among other use-cases.

1 Like

Hi @EpicWink, having two layers of deferred import would actually be a really good thing to have to avoid this behavior, and is something that may be possible! We’re looking into adding this into the implementation and will make the changes to the PEP when we have it ready. Thank you for bringing it up!

2 Likes

It would be nice to have a function which returns whether a module is import-deferred (i.e. it checks whether the passed object’s __dict__'s item-lookup function is lookdict_unicode_lazy). This could be useful for lazy import debugging, among other use-cases.

This wouldn’t be hard to provide, e.g. as importlib.has_lazy_imports() or similar. One slightly awkward edge case would be that we don’t ever unset the lazy lookup func on a dict (it would require checking every value in the dict on every read that resolves a lazy object), so has_lazy_imports(mod) would continue to be True even after all the lazy imports in mod have been resolved. The meaning would be more “ever had any lazy imports.” Would this still meet the debugging or other use cases you’re envisioning?
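To make the "ever had any lazy imports" semantics concrete, here is a hypothetical pure-Python model. The real implementation is a swapped C-level dict lookup function; LazyValue, LazyDict, and has_lazy_imports are illustrative names, not APIs from the PEP.

```python
class LazyValue:
    """Placeholder standing in for a not-yet-executed import."""
    def __init__(self, loader):
        self.loader = loader

class LazyDict(dict):
    """Dict that resolves lazy values on lookup and remembers it ever held one."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Sticky flag: once set, never cleared (mirrors the lookup-func swap,
        # which is never undone because checking every value would be costly).
        self.ever_had_lazy = any(isinstance(v, LazyValue) for v in self.values())

    def __setitem__(self, key, value):
        if isinstance(value, LazyValue):
            self.ever_had_lazy = True
        super().__setitem__(key, value)

    def __getitem__(self, key):
        value = super().__getitem__(key)
        if isinstance(value, LazyValue):
            value = value.loader()           # resolve on first access
            super().__setitem__(key, value)  # cache the real object
        return value

def has_lazy_imports(ns):
    """Answers 'did this namespace EVER hold lazy imports?'"""
    return getattr(ns, "ever_had_lazy", False)

ns = LazyDict()
ns["json"] = LazyValue(lambda: __import__("json"))
assert has_lazy_imports(ns)   # holds a lazy value
ns["json"].dumps              # touching it resolves the import
assert has_lazy_imports(ns)   # still True: the flag is sticky
```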

I agree that it would be nice to have some API that provides insight into the state of the lazy import system, even if sys.modules isn’t updated for lazy modules.

2 Likes

I hadn’t thought of that, but no, that’s not what I was thinking. More like (in lazy import mode):

import importlib, foo
importlib.is_loaded(foo)  # False
foo.bar
importlib.is_loaded(foo)  # True

(where the name importlib.is_loaded is to be bike-shedded)


Edit: as explained below, I had a misunderstanding about when the import was triggered: I thought it was on first use (attribute access), not first reference, but I was wrong

This would be much harder to provide. In a sense the debuggability and transparency goals conflict here. In this code sample, the mere reference to the name foo (and placing its value on the stack in preparation to call importlib.is_loaded the first time) would be sufficient to force its resolution. So by the time importlib.is_loaded gets called, the answer would always necessarily be True. Short of compiler special-casing of importlib.is_loaded, I don’t see a way around this that wouldn’t sacrifice the reliable prevention of lazy objects escaping into the wild.

What would be possible would be something like importlib.is_loaded(globals(), "foo"), where we are really asking “is the value in this dict at this key a lazy object currently?” We have to pass in globals() explicitly in this case, or else do sys._getframe() shenanigans to get our hands on it.
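A hypothetical sketch of that dict-and-key form, in pure Python. Lazy and is_loaded are illustrative names only; the point is that peeking at the raw dict value checks for a lazy placeholder without the ordinary lookup that would force resolution.

```python
class Lazy:
    """Stand-in for an unresolved lazy import placeholder."""
    def __init__(self, loader):
        self.loader = loader

def is_loaded(ns, name):
    # Peek at the raw value: is it still a lazy placeholder?
    # (A real version would inspect the dict slot the same way,
    # deliberately bypassing the resolving lookup path.)
    return not isinstance(ns.get(name), Lazy)

ns = {"foo": Lazy(lambda: __import__("math"))}
assert not is_loaded(ns, "foo")   # still deferred
ns["foo"] = ns["foo"].loader()    # simulate the first real reference
assert is_loaded(ns, "foo")       # now resolved
```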

2 Likes

I like the idea, but I am +0 until I see the implementation.
I am worried about how much it complicates dict.

It would be better to have an API to enable transparent lazy imports from within a script.
Imagine a command-line tool like hg (Mercurial):

  • Setting PYTHONLAZYIMPORTS only for hg doesn’t make sense.
  • It is difficult to use -L in a shebang.

IIUC the implementation requires the key to be a string. What if some joker does globals()[42] = "hi" ?

2 Likes

Can we have a dirt-simple example of the expected behaviour?

For example, suppose I have:

# module spam.py
# Simulate some work:
import time
time.sleep(10)
print("spam loaded")

# module eggs.py
import spam
print("imports done")

Now I know that this example goes against the recommendation to avoid side-effects during module import, but I think that showing side-effects may be the easiest way to see

  1. what this feature does;
  2. and why you make that recommendation.

What normally happens is that when I run python eggs.py, it takes 10 seconds to do the work before I see any output:

spam loaded
imports done

But with this new feature, python -L eggs.py, it will load almost instantly, the work will never get done, and the only output will be:

imports done

Is this correct?

But if I change eggs.py to this instead:

# module eggs.py
import spam
print("imports done")
spam  # Any reference to the module is enough to load it.

I get this, with comments interspersed:

imports done  # appears immediately
# ten seconds while the work is done
spam loaded

Is my understanding correct?

I think the PEP could do with something like this demonstrating the behaviour. Especially if I have got it wrong! If I got it right, feel free to steal my example if you like it :slight_smile:

I also think the implementation section should have at least a high-level overview of how this magical behaviour is implemented, not just “read the source of Cinder” :slight_smile:

3 Likes

I don’t want to be a “broken record” and I don’t want to derail discussion of the PEP (I think it looks quite nice). However, I still wonder if my modules as global namespace idea could help here. I think it is a problem that we are trying to do too much with dict. If LOAD_GLOBAL can do __getattr__ on the module, rather than __getitem__ on module.__dict__, it gives us a place to hook in this lazy loading behavior without affecting every dict object in the system. The lookdict_unicode_lazy hack is clever but not exactly elegant.

2 Likes

It’s a reasonable concern. We can revisit and discuss once the ported-to-main implementation is ready.

This should be possible, if people would prefer to have this option as well.

Can you say more about why it is difficult to use -L in a shebang? In my tests, using one or more single-letter options in the shebang works fine, either with a direct reference to the Python binary or with /usr/bin/env python -....

1 Like

I think this only works here if modules can totally encapsulate their dictionary such that it is inaccessible from any other code, even C extension code. I don’t think the initial draft of your proposal suggested that level of hiding, and it would be quite hard to do in a backward-compatible way.

Without that encapsulation, it is too easy for existing code that is unaware of lazy imports/objects to go poking directly at the module’s dictionary and pull things out of it, expecting them to be normal usable PyObjects, and then go put them somewhere else. If it is possible for naive existing code to allow lazy objects to “escape” unresolved, the lazy imports idea just doesn’t work in practice; it becomes a constant game of whack-a-mole. (We know this because we played this game of whack-a-mole in earlier draft implementations, until hitting on the dictionary plan.)
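A pure-Python illustration (with invented names) of the "escape" problem described here: code unaware of lazy objects pulls a raw placeholder out of a module’s __dict__ and only fails later, far from the import.

```python
class LazyObject:
    """Stand-in for an unresolved lazy import placeholder."""
    def __init__(self, loader):
        self._loader = loader
    def resolve(self):
        return self._loader()

# A fake module namespace holding a deferred object (here, the builtin len).
fake_module_dict = {"helper": LazyObject(lambda: len)}

# Naive code that copies values straight out of the namespace: a plain
# dict lookup has no resolution hook, so the placeholder escapes as-is.
escaped = fake_module_dict["helper"]

try:
    escaped([1, 2, 3])        # fails far from the import site
except TypeError:
    escaped = escaped.resolve()   # the whack-a-mole fix at the point of use

assert escaped([1, 2, 3]) == 3
```

Making the dict itself resolve on lookup removes this failure mode, because there is no raw-value path for naive code to take.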

It’s a fair point :slight_smile: Dictionaries triggering imports is not exactly the layered abstraction you’d ideally draw up. But it’s the only way we could find to make this both fast and reliable. So the question the community has to answer is, do the benefits (which can be sizable) outweigh the inelegance? (You could also say that there’s a certain elegance to the relative simplicity and comprehensiveness of handling it all in the dictionary, even if the layering is odd.)

1 Like

Good question :slight_smile: I guess we hand-waved over this in the PEP, but in fact the implementation has both lookdict_unicode_lazy and lookdict_lazy. Usually only the former is needed, but the latter is there as a fallback.

1 Like

Entirely so! Will look at adding a simple example like this to the PEP – thanks for providing a clear one.

Hmm, I thought that this part of the PEP met that description:

Given the possibility that Python (or C extension) code may pull objects directly out of a module __dict__, the only way to reliably prevent accidental leakage of lazy objects is to have the dictionary itself be responsible to ensure resolution of lazy objects on lookup.

To avoid a performance penalty on the vast majority of dictionaries which never contain any lazy objects, we install a specialized lookup function (lookdict_unicode_lazy) for module namespace dictionaries when they first gain a lazy-object value. When this lookup function finds that the key references a lazy object, it resolves the lazy object immediately before returning it.

Some operations on dictionaries (e.g. iterating all values) don’t go through the lookup function; in these cases we have to add a check if the lookup function is lookdict_unicode_lazy and, if so, resolve all lazy values first.

Perhaps it’s confusing that this text is under the “Rationale” section; perhaps we should add a separate “Implementation” section.

1 Like

Linux doesn’t support more than one argument:

$ cat myscript.py
#!/usr/bin/env python3 -Xutf8
import sys
print(sys.argv)

# On macOS
$ ./myscript.py
['./myscript.py']

# On Linux
$ ./myscript.py
/usr/bin/env: ‘python3 -Xutf8’: No such file or directory
/usr/bin/env: use -[v]S to pass options in shebang lines

As the above message says, you need to use the -S option:

$ cat myscript.py
#!/usr/bin/env -S python3 -Xutf8
import sys
print(sys.argv)

$ ./myscript.py
['./myscript.py']

pip writes scripts with a rewritten shebang. I don’t know whether pip supports multi-argument shebangs.

3 Likes