Why does the standard lib LazyModule check if the module was substituted during lazy load?

Hello,

This is my first post on this forum, so I might not know all the customs, sorry for that.

My question is about the _LazyModule implementation in 3.13. It has a check that the current module hasn’t been substituted in sys.modules in the process of loading [1]; however, standard, eager imports don’t have such a check [2]. From reading the lazy loader and module implementations, this invariant doesn’t seem critical to lazy imports themselves. So is this just a generic check to prevent multiple module objects existing for the same module name?

[1] cpython/Lib/importlib/util.py at 3.13 · python/cpython · GitHub
[2] I infer that the standard import system doesn’t have such a check because cases like cryptography/src/cryptography/utils.py at main · pyca/cryptography · GitHub exist and seem to be working.

It looks like this check was added by @brettcannon a long time ago. Do you happen to remember why?

See:

After completing the module load, the lazy loader sets the module’s __class__ attribute back to the original class so that _LazyModule is not triggered again. If sys.modules[original_name] is not self, then updating the __class__ of self would not affect the object in sys.modules.
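
To make that concrete, here is a heavily simplified sketch of what a lazy module’s __getattr__ conceptually does (toy class name; not the actual CPython code, which also deals with locking, deleted attributes, and restoring the original class):

import sys
import types

class _LazyModuleSketch(types.ModuleType):
    def __getattr__(self, attr):
        original_name = self.__spec__.name
        # Finish the real import; this populates self.__dict__.
        self.__spec__.loader.exec_module(self)
        # The guard in question: if something replaced
        # sys.modules[original_name] while we were loading, flipping
        # self.__class__ below would only "fix" self, while later
        # imports of the name would keep returning the other object.
        if sys.modules.get(original_name) is not self:
            raise ValueError(
                f"module object for {original_name!r} substituted "
                "in sys.modules during a lazy load"
            )
        # De-lazify in place so every existing reference, including
        # sys.modules[original_name], now behaves as a plain module.
        self.__class__ = types.ModuleType
        return getattr(self, attr)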

If a module is lazily loaded, then it is probably bound to a name in some namespace, e.g.,

numpy = lazily_load('numpy')

When I call numpy.array, I not only want to finish loading numpy and access the array function; I also want every future access of numpy to go directly to the fully loaded module rather than attempt to finish loading it each time. Therefore, the object itself needs to be updated in place; we can’t just do sys.modules['numpy'] = __class__(self).

Further, I want numpy, in every namespace it happens to be imported into, lazily or eagerly, always to be the exact same object.
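
To be concrete, lazily_load above is just a stand-in name; a helper like it can be built from the LazyLoader recipe in the importlib documentation, and the important detail is that the lazy module object itself goes into sys.modules:

import importlib.util
import sys

def lazily_load(name):
    spec = importlib.util.find_spec(name)
    spec.loader = importlib.util.LazyLoader(spec.loader)
    module = importlib.util.module_from_spec(spec)
    # The lazy module object itself is what lands in sys.modules, so a
    # later eager import of the same name resolves to this exact object.
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module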

If lazily_load('numpy') does not set sys.modules['numpy'] to the _LazyModule, you could imagine ending up in a situation where:

np1 = lazily_load('numpy')
import numpy as np2
np1 is not np2
assert isinstance(np1.array([]), np2.ndarray)  # Boom

A more realistic scenario would be:

import libA  # <- Lazily loads numpy
import libB  # <- Eagerly loads numpy

libB.performOp(libA.makeArray())  # isinstance(arr, np.ndarray) might not be true

I’m not sure how easy it is to actually produce this situation; there may be reasons other than this guard why it doesn’t happen in practice.


Thank you for the thorough explanation; that makes sense!

I see two separate issues here: resetting the __class__ attribute, and the isinstance one. The former is clearly specific to this lazy import implementation, but the latter can happen with standard eager imports too, right?

I think the __class__ trick is probably necessary because the module objects themselves cannot simply be exchanged. I suppose you could try to come up with another implementation, but you would need to achieve the same effect of modifying the module object in place.
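
For example, reassigning __class__ on a module instance mutates the one object that every namespace (and sys.modules) already references, which is what makes the in-place approach work. A quick illustration with toy names:

import types

mod = types.ModuleType("demo")

class ShoutyModule(types.ModuleType):
    def __getattr__(self, name):
        raise AttributeError(f"{name!r} is not set on {self.__name__}")

alias = mod                   # a second reference, like another namespace
mod.__class__ = ShoutyModule  # in-place class swap
assert type(alias) is ShoutyModule  # the other reference sees the change too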

And yes, isinstance can fail with eager imports too if the module’s code ends up being executed more than once (importlib.reload is the easiest way to show it). Consider a module example_lib.py:

class A: pass

Then, in an interactive session:

>>> import importlib
>>> import example_lib
>>> a = example_lib.A()
>>> importlib.reload(example_lib)
<module 'example_lib' from '/var/home/chris/Sandbox/example_lib.py'>
>>> isinstance(a, example_lib.A)
False