PEP 810: Explicit lazy imports

I’ve had a careful look over the PEP and here are my thoughts:

First of all, I should say that the demand for this feature is clearly there, and this PEP is a clear step up from PEP 690, but there is still room for improvement.

In my mind there are two issues with the PEP:

  1. The mechanism for reification focuses on object replacement, rather than object transformation, resulting in too many corner cases and difficult-to-resolve issues, such as with the C API.
  2. The “lazy imports filter” feels bolted on and undermines much of the good design of the rest of the PEP. It almost feels as if the PEP authors didn’t really believe their own claims that being “explicit”, “controlled” and “granular” were good things.

Rather than write one giant post, I’ll address point 1 in the next post and point 2 later, to leave space to discuss point 1 first.

2 Likes

What do I mean by “object transformation” rather than “object replacement”?

Consider this code:

class PendingModule:
    ...

class FinishedModule:
    ...

a = PendingModule()
b = a

Here both a and b refer to the same object, an instance of PendingModule.

Suppose we want to replace the PendingModule with a FinishedModule.
To do this we need to find all the references to the PendingModule and update them.
We could attempt to do this eagerly, but that would be very difficult, if not impossible.
So we need to do this lazily, resulting in confusion as some variables refer to
the old PendingModule and some to the new FinishedModule. Specifically in this case,
if we update a, then the relationship a is b no longer holds.
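
To make the divergence concrete, here is a minimal sketch using the classes above, with the “replacement” done by hand to stand in for whatever the VM would do:

pending = PendingModule()
a = pending
b = a

a = FinishedModule()  # we could only find and update one reference

print(a is b)                        # False: the aliases have diverged
print(isinstance(b, PendingModule))  # True: b still sees the old object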

If, however, we add a transform method to PendingModule that transforms an instance into a FinishedModule, then we don’t have to worry about things getting out of sync.

import threading

some_lock = threading.Lock()  # guards the one-time transformation

class PendingModule:

    def transform(self):
        with some_lock:
            if self.__class__ is FinishedModule:
                return  # another thread already transformed us
            self.__class__ = FinishedModule
            # Do the rest of the transformation

In terms of the PEP, this means that reification should transform the LazyImport object into a module, not attempt to replace it with a module.

This removes the need to special-case LOAD_GLOBAL for var in lazy import var, although var in lazy from foo import var will still need special casing.
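
For context on the opcode involved: a module-level name used inside a function compiles to LOAD_GLOBAL, which is where the replacement approach hooks reification. A minimal illustration:

import dis

def f():
    return json  # a global lookup, compiled to LOAD_GLOBAL

dis.dis(f)  # the disassembly shows LOAD_GLOBAL for json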

How this would impact importlib:

Importing a module happens in three phases:

  1. Find the loader
  2. Create a module from the loader
  3. Initialize the module

With transformation, the second step would need to change:

  1. Find the loader
  2. The module already exists, so any loader whose create_module() returns non-None would fail here for lazy imports (see the sketch after this list)
  3. Initialize the module
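
For reference, a hedged sketch of the loader hooks involved (the loader names here are hypothetical):

import importlib.abc
import importlib.util

class DefaultCreatingLoader(importlib.abc.Loader):
    def create_module(self, spec):
        # Returning None asks the import machinery to create a
        # standard module object itself; under transformation, the
        # lazy object would already be standing in for that module.
        return None

    def exec_module(self, module):
        # Phase 3: run the module's initialization code.
        module.answer = 42

class CustomCreatingLoader(importlib.abc.Loader):
    def create_module(self, spec):
        # Returning non-None supplies a custom module object; this is
        # the case that would have to fail for lazy imports, since the
        # (lazy) module object already exists.
        return importlib.util.module_from_spec(spec)

    def exec_module(self, module):
        module.answer = 42
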
2 Likes

This is impossible. The final object that we have to transform into can be anything: a direct instance of ModuleType (which can be handled), a custom subclass of ModuleType (which can probably be handled), or any other arbitrary object. We either:

  • need to allocate enough memory for the final object when creating the LazyImport - this is impossible, since we don’t know how much memory it will take.
  • need to grow the allocation large enough to contain the new object, no matter what it is - this is impossible without changing its location in memory (see the sketch below).
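
To illustrate the size problem, a quick sketch (exact numbers vary by build; the point is only that they differ):

import collections
import sys
import types

# The object a lazy name eventually resolves to can be anything,
# and those objects have different in-memory sizes:
print(sys.getsizeof(types.ModuleType("m")))    # a plain module
print(sys.getsizeof(collections.OrderedDict))  # a class
print(sys.getsizeof(len))                      # a builtin function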

Your proposal might be able to handle all simple imports of the form lazy import a (but only if a doesn’t insert a custom object into sys.modules['a']). But lazy from a import Bclass, Cvariable couldn’t possibly be handled. IMO this is so limited in scope as to be useless.

Later: I now see that your suggestion is to treat lazy import and lazy from ... import ... differently - that’s IMO not good either (and still can’t work, because of sys.modules modification); it complicates the mental model unnecessarily. Notably, it would mean that some modules could be accessed via lazy from, but not via lazy import.


assert a is b inside of the module would still hold - b gets reified the moment the code looks at it. What wouldn’t hold is globals()['a'] is globals()['b'], and neither would globals()['a'] is a (though a is globals()['a'] would hold, since the left operand is evaluated first and loading the name reifies the binding). IMO this is similar to id(object()) == id(object()) being unspecified. Is it a surprising wart? Yes, but it’s a rare edge case most people will never see.
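
A sketch of that ordering, assuming PEP 810 semantics and a fresh module for each snippet:

# Snippet 1, in a fresh module:
lazy import json as a
print(globals()['a'] is a)  # False: the dict lookup grabs the lazy proxy,
                            # then loading the name reifies into the module

# Snippet 2, in a fresh module:
lazy import json as a
print(a is globals()['a'])  # True: loading the name reifies and rebinds
                            # first, so the dict lookup sees the module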

3 Likes

Thank you so much for taking the time to provide this detailed feedback. I really appreciate you thinking through these edge cases carefully. Let me address your points:

I have to respectfully but firmly push back on this characterization. This feels quite handwavy to me. You’re asserting there are “too many corner cases” but I don’t actually see evidence of problematic corner cases in practice. Most of what you describe doesn’t interfere with how people write Python, and any “side effect” will only appear if you go searching for it. Let me show you what I mean:

I think you are misunderstanding something because this is what happens here:

lazy import json
a = json
b = a

print(f"a is b: {a is b}")           # True
print(f"a is json: {a is json}")     # True  
print(f"b is json: {b is json}")     # True

The relationship a is b holds perfectly. All three names refer to the same module object. This works because accessing any of these names triggers reification, and they all get rebound to the same underlying module through the normal name lookup mechanism.

Similar variations behave the same way:

>>> lazy import json
>>> lazy import json as bar
>>> json is bar
True

and

>>> lazy import collections
>>> lazy_obj = globals()['collections']
>>> other_lazy_obj = globals()['collections']
>>> lazy_obj is other_lazy_obj
True

The only “theoretical” edge cases I can think of are when explicitly accessing globals() before reification:

lazy import json
lazy import json as bar

globals()['json'] is globals()['bar']  # False (different proxy objects)

x = globals()['json']
y = globals()['bar']  
x is y  # True (both resolve to same module)

and

lazy import json
lazy import json as bar

def foo():
    x = globals()['json']
    y = globals()['bar']
    print(x is y)

foo()  # False

# But at global scope:

x = globals()['json']
y = globals()['bar']
print(x is y)  # True

These require deliberately bypassing normal name lookup by directly accessing the globals() dictionary. This is not how anyone actually writes Python code. In every realistic usage pattern, identity is preserved correctly.

All this is expected and on point. Also, this is extremely rare in practice (I would be surprised if anyone is doing this in the wild with imported modules) and it still resolves correctly after assignment. Given that lazy imports are opt-in with different semantics, I believe this is more than acceptable.

We believe this is a much worse option. The transformation approach has fundamental problems that make it unworkable. We’ve already considered and rejected this approach for good reasons:

It’s fundamentally asymmetric. It would work differently for lazy import module versus lazy from module import name, requiring two completely different mechanisms:

lazy import json              # LazyImport → module (transformation might work)
lazy from json import dumps   # LazyImport → function (transformation cannot work)
lazy from collections import defaultdict  # LazyImport → class (transformation cannot work)

The transformation approach immediately fails for from imports, which are a core part of this PEP.

It’s actually more complicated and creates more edge cases. The transformation approach would need complex logic to handle identity preservation, thread safety during transformation, and different code paths for different import types. The replacement approach is simpler and uniform.

It’s already been rejected. We covered this thoroughly in the PEP where we discuss the dict subclass approach and other alternatives. The replacement approach is the design we’ve chosen after careful consideration of all the alternatives.

There is nothing wrong with special casing LOAD_GLOBAL for lazy imports. You’re framing this as if it’s a problem that needs to be avoided, but it’s not. Special casing LOAD_GLOBAL is a perfectly reasonable implementation strategy for this feature.

Moreover, your transformation approach doesn’t even eliminate special casing. By your own admission, from foo import var would still need special casing. So you’d end up with two different approaches.

I appreciate you sharing your view, but I have to strongly disagree with this statement. The lazy imports filter is not “bolted on” at all, and I find this critique a bit unfair (or at least the way it’s expressed). We can believe that being “explicit”, “controlled” and “granular” are good things, and also believe that there are important use cases not covered by the main mechanism, while still maintaining that the vast majority of users (and library developers) should use the keyword.

We’ve covered the rationale for the filter extensively in the PEP and throughout the discussion thread. We not only believe this is an important use case but, as we have already said a number of times, many people in the discussion (here and elsewhere) have been consistently asking for more aggressive versions of lazy imports, including making them the default behavior. Since making lazy imports the default isn’t possible (as we learned from the previous PEP and the compatibility issues it would create), the filter function gives application developers the tools to achieve similar results in their own codebases when they need it.

I’d respectfully prefer not to reopen this debate, as we’ve addressed it thoroughly already, and I would appreciate it if you respect that (although you are of course totally free to do as you think best). You’re absolutely welcome to express your concerns about this design choice to the Steering Council, as is your right, and I’m confident they’ll take all feedback into consideration when evaluating the PEP. But from our perspective, we’ve explained the rationale thoroughly and don’t have new arguments to add beyond what’s already been discussed.

17 Likes

I think you are misunderstanding something because this is what happens here

Yes, I did.
Because name bindings changing when accessed is new and alien, so it is hard to reason about. Whereas objects changing their nature when accessed is unusual, but not without precedent.

Changing bindings and module dictionaries when observed is problematic. It makes introspection much harder.
How is a debugger supposed to observe the state of a module without changing it?
For example, I can already observe weird effects in pyrepl when autocompleting on a module that lazily imports typing declarations that only exist in stub form: the autocomplete fails without raising, causing modules to be partially imported as a side effect.

It’s already been rejected. We covered this thoroughly in the PEP where we discuss the dict subclass approach and other alternatives.

I don’t see where in the PEP this was rejected, nor what this has to do with dict subclasses.

If you considered it and rejected it, could you add it to the rejected ideas section explaining why?

You say that the transformation approach has fundamental problems that make it unworkable, but I really don’t see what those are.

The transformation only applies to modules, which is why I said that from imports still need special casing.
In your example of lazy from json import dumps, accessing dumps would trigger reification of json, the same as you propose. The difference is the mechanism of reifying json, which would happen in place. The binding dumps would still need to be modified.
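
To spell out the mechanism I have in mind, a rough sketch (pending_module_for() and transform() are hypothetical stand-ins for VM internals):

def reify_from_import(namespace, binding_name, module_name, attr_name):
    # In-place transformation: every existing reference to the
    # pending module stays valid, so identity is preserved.
    pending = pending_module_for(module_name)  # hypothetical lookup
    pending.transform()
    # The from-import binding itself still has to be rebound; this is
    # the special casing I conceded from imports need.
    namespace[binding_name] = getattr(pending, attr_name)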

It’s actually more complicated and creates more edge cases

Let’s leave “more complicated” for now, as that is both subjective and not a problem for the user, just us.

How does it create more edge cases?
Identity preservation is automatic, since the object is transformed, not replaced.

What wouldn’t hold is globals()['a'] is globals()['b']

Yes, thanks for the clarification

That is a good point about inserting objects into sys.modules.
I’m not claiming that transformation has no edge cases, but I still think they are more obvious and easier to reason about than the replacement approach presented in the PEP.

The PEP says nothing about lazy imports in for or while statements, but the reference implementation rejects them. That should be clarified in the PEP.
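
For concreteness, a sketch of the constructs in question at module scope (behavior as observed in the reference implementation, not taken from the PEP text):

for attempt in range(3):
    lazy import json  # rejected by the reference implementation

while True:
    lazy import csv   # likewise rejected
    break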

1 Like

How would this work when the same name is lazy-imported by two different modules, and only later reified?

Because there is only one object. Both imports refer to the same (lazy) object. When reified, the references are unchanged, still referring to the now-reified module object.

I can promise you that these aliens come in peace :wink:

I know this is subjective, but an object changing its entire __class__ when accessed is an order of magnitude more “alien” and hard to reason about than the proposed semantics. But I understand that we can disagree on this, and that’s fine. As you will surely understand and respect, we will go with whatever we think is the best option.

Via mod.__dict__. Note the reference implementation is not up to date with the PEP yet, but the PEP says that calling globals() or accessing a module’s __dict__ does not trigger reification – they return the module’s dictionary, and accessing lazy objects through that dictionary still returns lazy proxy objects that need to be manually reified upon use. This is something we changed based on feedback, and it’s the main method of external introspection.

This is a consequence of the reference implementation not being up to date with the PEP. When __dict__ doesn’t reify, as the PEP specifies, this works with no problems:

# x.py

print("IMPORTED x")

# y.py

print("IMPORTED y")

# lol.py

lazy import x
lazy import y

import json

def foo():
    ...

# REPL session

>>> import lol
>>>
>>> # Tab completing:
>>> lol.
lol.foo()  lol.json   lol.x      lol.y

# Direct access:
>>> lol.x
IMPORTED x
<module 'x' from '/Users/pgalindo3/github/python/lazy/x.py'>
>>> lol.y
IMPORTED y
<module 'y' from '/Users/pgalindo3/github/python/lazy/y.py'>

I was thinking of this, but you are right that this is different enough that it requires its own section. We will discuss it and update the PEP. Thanks for pointing it out.

I think this is a bug in the reference implementation, but let me consult with the rest of the team as I may be missing something.

Here is one I can think of:

>>> class PendingModule: pass
...
>>> class FinishedModule: __slots__ = ('x',)
...
>>> a = PendingModule()
>>> a.__class__ = FinishedModule
Traceback (most recent call last):
  File "<python-input-4>", line 1, in <module>
    a.__class__ = FinishedModule
    ^^^^^^^^^^^
TypeError: __class__ assignment: 'FinishedModule' object layout differs from 'PendingModule'
3 Likes

I know this is subjective, but an object changing its entire __class__ when accessed is an order of magnitude more “alien” and hard to reason about than the proposed semantics

I suspect that’s only true for someone who studied quantum mechanics :slight_smile:
Entangled global variables seem weird to me.

accessing a module’s __dict__ does not trigger reification

Ah. I’ve been playing with the reference implementation. That is an improvement. Any plans to update the reference implementation?

The PEP says nothing about lazy imports in for or while
I think this is a bug in the reference implementation

I think you should follow the implementation here. I can break your assumptions about the specialization of LOAD_GLOBAL if you allow it in a loop.

2 Likes
>>> class PendingModule: pass
...
>>> class FinishedModule: __slots__ = ('x',)
...
>>> a = PendingModule()
>>> a.__class__ = FinishedModule
Traceback (most recent call last):
    ...

Yes, you’ll need to make sure that PyLazyImportObject is the same size as PyModuleObject.
The transformation approach already fails for loaders that override create_module(), so we can ensure that VM-created modules are the same size as lazy objects.

I’m sorry, but this is the complexity we rejected early on, and it’s so vague I’m not even sure what to add to rejected ideas, to the extent it isn’t already covered in the PEP.

If you want to work on this and turn it into something real, feel free. I’m sure the SC will welcome discussion of your new solution even if this PEP has been accepted. We have time before beta 1, after all.

What is so vague?

You can do this for lazy import x, where you know x will “always” be of one type. (Well, not even that: executing x could end up replacing the object at sys.modules["x"] with some object of a different size.) But I don’t see how you could get this approach to work for lazy from x import y, because then y can be an object of any size. That means this approach doesn’t reduce complexity: now you need both your __class__-changing magic and the PEP’s reification magic.

I’m not interested in reducing complexity, although I don’t see any significant change in complexity of implementation either way. What I want to reduce is the number of corner cases and rough edges for Python developers and C extension authors.

If accessing a module’s __dict__ does not trigger reification of lazy imports therein, that means the PyLazyImportObject could become visible to C extensions, as C extensions often (usually for perceived performance reasons) access attributes via dictionaries, not through PyObject_GetAttr().

What then happens if an extension calls PyObject_GetAttr() on the PyLazyImportObject, expecting it to be a module?
According to the reference implementation, an error will be raised. It is not specified in the PEP AFAICT.

Only if you do weird things, because normally you hand the object to the C extension through ordinary name lookup, which will reify it anyway:

>>> from c_extension lazy import extension_func

>>> lazy import mymod
>>> lazy from othermod import foo

>>> extension_func(mymod.myobj) # mymod is reified before extension_func receives it

>>> extension_func(foo) # foo is reified before extension_func receives it

For this to be a problem, an extension module would have to either receive mod.__dict__ or be handed a raw lazy object fetched straight out of a namespace dictionary:

# bar.py
>>> from my_extension_module import weird_function
>>> lazy import foo

>>> weird_function(globals()['foo'])

# baz.py
>>> from my_extension_module import weird_function
>>> import foo

>>> weird_function(foo.__dict__)

This is so rare that it’s fine if people need to special-case receiving a lazy import. We already discussed some version of this early in the discussion, and the consensus is that mod.__dict__ should not reify. In fact, this is so rare that, as far as I understand, none of the lazy import implementations that have been deployed at scale have run into this being a problem.

1 Like

Trying out the reference implementation and wanting to inspect the lazy module objects, I did find this behaviour a bit surprising even if I understand why it’s happening[1].

lazy import tomllib

def demo():
    local_ref = globals()["tomllib"]
    print(f"{local_ref = }")

demo()

ref = globals()["tomllib"]
print(f"{ref = }")
local_ref = <lazy_import 'tomllib'>
ref = <module 'tomllib' from '/home/ducksual/src/cpython_lazy/Lib/tomllib/__init__.py'>

I’m not sure if anything can reasonably be done about this with the current implementation plan reifying any LazyImportType objects on global variable access.


from c_extension lazy import extension_func

Not sure that’s the right syntax :slight_smile:


  1. Yes, this is because I tried to look at them in the REPL ↩︎

Ah, a man can dream though… A man can dream :slight_smile: [1]


  1. https://www.youtube.com/watch?v=vAUOPHqx5Gs ↩︎

5 Likes