Delayed module execution (again)

Following up on Async imports to reduce startup times.

This would be a very different approach, and I’m not sure how much sense it’d make, but I think we could add the following:

  • A way to explicitly state that deferred execution is supported by the module
  • A delayed execution flag in bytecode
  • A special function that is always explicitly called at import time (e.g. __run__), even if the module supports delayed execution

Since delaying execution is mainly an optimization thing, not doing it shouldn’t be a big deal for modules, but doing it without the module explicitly supporting it can be problematic, as the module might rely on code executing on import. For this reason, we opt out by default and only use it when we know it’s supported.

When a module is imported in delayed execution mode, all its attributes become delayed references — the code that defines them is only executed on access. Code that needs to be executed at import time goes into __run__, which can include triggering the execution of certain attributes.
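To make the model concrete, here is a rough, purely illustrative sketch of those semantics in today’s Python: attributes are bound to zero-argument thunks that only execute on first access, while a __run__-style hook runs eagerly. The make_delayed_module helper and its names are hypothetical; a real implementation would live in the import system and bytecode, not in a wrapper like this.

```python
# Hypothetical sketch: simulating "delayed references" for a module's
# attributes. Each attribute is backed by a thunk that only executes on
# first access; the *run* callback plays the role of the proposed
# __run__ hook and executes eagerly at "import" time.
import types

def make_delayed_module(name, thunks, run=None):
    mod = types.ModuleType(name)

    def __getattr__(attr):
        try:
            thunk = thunks[attr]
        except KeyError:
            raise AttributeError(attr) from None
        value = thunk()            # execute the defining code on first access
        setattr(mod, attr, value)  # cache so later lookups are plain reads
        return value

    # PEP 562: a __getattr__ in the module's namespace handles misses
    mod.__getattr__ = __getattr__
    if run is not None:
        run(mod)  # import-time side effects still run eagerly
    return mod

executed = []

demo = make_delayed_module(
    "demo",
    thunks={"ANSWER": lambda: executed.append("ANSWER") or 42},
    run=lambda mod: executed.append("__run__"),
)

print(executed)     # only the __run__ hook has executed so far
print(demo.ANSWER)  # first access triggers the thunk
print(executed)
```

The caching `setattr` means the defining code runs at most once, matching the "executed on access" semantics described above.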

As far as adoption goes, having to flag every single module as supporting delayed execution would be pretty annoying, and that’s where the bytecode flag comes in. Most code is shipped as a package, so we could add some metadata flagging delayed execution support for all of its pure-Python code, which installers could check and generate the bytecode accordingly. We could also have build backends or installers add the flag directly themselves, depending on which exact mechanism we choose for that.

This provides a pretty easy adoption path for projects, leaving only unpackaged code needing the manual flag. On top of that, it’s backwards compatible and doesn’t require any syntax changes.

The main drawback AFAICT is that delayed execution might not be guaranteed to module authors, but I think that’s okay, as I don’t see a big need for that, and it can already be accomplished in other ways (e.g. module __getattr__).

Any thoughts? Is there any technical detail I am missing regarding the viability of this?


I’d like to see __run__ reserved for a potential alternative to the trailing if __name__ == "__main__": idiom, but aside from naming quibbles the general idea sounds plausible to me.

The presence of the new name as a top level attribute could serve as the flag:

  • __at_import__ = None: opt-in to lazy loading support with no import time side effects
  • def __at_import__(): ...: opt-in to lazy loading support with import time side effects
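As a sketch of how that convention could be detected (assuming the __at_import__ spelling above, which is not an existing attribute), a loader or installer could statically scan a module’s top level before deciding whether delayed execution is safe:

```python
# Hypothetical sketch: statically detect the proposed __at_import__
# opt-in flag at a module's top level. The attribute name is the one
# suggested above, not an existing Python feature.
import ast

MODULE_A = """
__at_import__ = None   # opt-in, no import-time side effects
X = 1
"""

MODULE_B = """
def __at_import__():   # opt-in, with import-time side effects
    print("setting things up")
X = 1
"""

MODULE_C = """
X = 1                  # no flag: execute eagerly, as today
"""

def supports_delayed_execution(source):
    """Return True if the module defines __at_import__ at top level."""
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            if any(isinstance(t, ast.Name) and t.id == "__at_import__"
                   for t in node.targets):
                return True
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if node.name == "__at_import__":
                return True
    return False

print(supports_delayed_execution(MODULE_A))  # True
print(supports_delayed_execution(MODULE_B))  # True
print(supports_delayed_execution(MODULE_C))  # False
```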

Would there be a clever way to get objects out of __run__()'s local variables or would we just use lots of global statements?

def __run__():
    global SOME_CONSTANT
    SOME_CONSTANT = 3

That’s assuming that delayed_module.SOME_CONSTANT wouldn’t trigger full module loading anyway?

So code can be selectively executed based on which module attribute is accessed at runtime? But how do you determine which code defines an attribute? A variable in a module can be defined conditionally and can be defined multiple times. There may not be a one-to-one relationship between an attribute and its defining code.

Also, a piece of attribute-defining code may have dependencies of its own. Do you build a dependency graph from static analysis, or do you access the dependency to trigger its defining code, which may itself be buried in conditions?

Take the following module for example:

a = random.randrange(2)
b = random.randrange(2)
c = 2
if a > 0:
    c = 1
d = b > 0 or c

If one accesses the attribute d, can you explain the mechanism with which the interpreter can deduce backwards that c = 1 may need to be executed?
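To illustrate the difficulty, here is a rough static-analysis sketch over that module: it maps each top-level binding to the names its right-hand side reads. The `definitions` helper is hypothetical analysis code, not anything proposed; note that `c` ends up with two candidate definitions (one unconditional, one guarded by `a > 0`), so there is no one-to-one attribute-to-code mapping for an implementation to follow.

```python
# Hypothetical analysis sketch: for each top-level assignment, record
# which names its right-hand side reads. Conditional reassignment (the
# guarded "c = 1") shows up as multiple candidate definitions.
import ast

SOURCE = """
a = random.randrange(2)
b = random.randrange(2)
c = 2
if a > 0:
    c = 1
d = b > 0 or c
"""

def definitions(source):
    """Return {name: [set of names read, one set per definition]}."""
    deps = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            stmts = [node]
        else:
            # descend into compound statements (if/for/...) for assignments
            stmts = [s for s in ast.walk(node) if isinstance(s, ast.Assign)]
        for stmt in stmts:
            reads = {n.id for n in ast.walk(stmt.value)
                     if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)}
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    deps.setdefault(target.id, []).append(reads)
    return deps

deps = definitions(SOURCE)
print(deps["c"])  # two candidate definitions
print(deps["d"])  # reads both b and c
```

Even this toy graph shows that answering "what must run before `d` is valid?" requires tracking control flow, not just name references.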

Given that the primary use case is imports, and it’s possible to handle lazy imports via __getattr__ already, could this be a sugar for exactly that? Non-import lazy execution wouldn’t be supported.

The advantage being that you write normal import statements and static analysis can read them, but the actual module object you get puts all of those imports into __getattr__ and produces a reasonable __dir__ and populates an __all__ tuple.

e.g.,

__at_import__ = None

import requests 
from typing import TypedDict

A module like this is easy to understand.

Putting any non-import statements into such a module could be flat-out unsupported (runtime errors). After all, if you need to execute something at import time, we’d be offering __at_import__ as the hook for that as well.


This version of the idea basically makes the feature only useful in __init__.py files, but the limited scope would mean we don’t need to discuss arbitrary deferred execution. So that trade-off seems worth it to me.

Something to consider:

  • some people are getting overwhelmed with Python bytecode changes; I would look into LOAD_DEREF and cell variables
  • debugging cyclic imports will be hell
  • is it actually going to be an async app or lazy importing? is there going to be an executor loop? when is it run and what will be responsible for shutting it down?
  • you could always consider refining package structure such that you do imports when you need them naturally

I’m a big fan of delayed importing/execution, but there is an edge case we came across in pip by lazily importing: Lazy import allows wheel to execute code on install. · Issue #13079 · pypa/pip · GitHub

So whatever the mechanism it’s important that libraries like pip can specify which modules must not be delayed regardless of any user flags. This would ideally include overriding modules which do specify they are safe to be delayed, so that vendored libraries don’t end up needing significant patching.

Technically you can already do something that looks like this - although not currently in a concurrent-safe way that I’m aware of[1]. I have something like this in ducktools-lazyimporter.

from ducktools.lazyimporter import LazyImporter
from ducktools.lazyimporter.capture import capture_imports

_laz = LazyImporter()

with capture_imports(_laz):
    import numpy as np

This ‘captures’ the import onto the _laz object so it’s accessible as _laz.np within the module, but it also automatically writes __getattr__ and __dir__ to make it available from the module directly. Static tools will believe the import has occurred, so if you re-import from this module they should work - however, if you try to use the names within the module itself, they don’t exist.

I’d thought about proposing something like the regular dynamic version of this[2] after seeing all of the different ways people are doing lazy imports in-line in the stdlib currently.


  1. If someone does know of a way to safely temporarily replace __import__ other than by doing so globally in builtins that would probably work. Import hooks are too late in the process and would also interfere with each other in a similar way. ↩︎

  2. Where you create classes and use strings instead of hijacking the import statement ↩︎