Why do relative from-imports also add the submodule itself to the namespace?

Hi,

Here is a common idiom found in many packages. For example, in asyncio/__init__.py:

# This relies on each of the submodules having an __all__ variable.
from .base_events import *
from .coroutines import *
...

__all__ = (base_events.__all__ +
           coroutines.__all__ +
           ...)

When using this idiom, I have always wondered why the submodules themselves also end up in the namespace. Absolute imports do not work this way. For example, if we added from re import * and then tried to append re.__all__, we would get a NameError because re is not defined.

Despite a fair effort, I was not able to find any documentation for this behavior.

I’m simply curious. :slight_smile:


This is a quirk of the way that packages work. Consider this from an external perspective:

import asyncio
import asyncio.base_events
from asyncio.base_events import BaseEventLoop

After the first statement, one would expect the name asyncio to exist in the current namespace. Well and good.

After the second, one would expect that (a) asyncio has not been reassigned; (b) asyncio.base_events is now valid; and (c) any future import of asyncio.base_events will use the same module (it won’t be reimported more than once).

The third statement should give you the class out of the same module that the second line imported. If you switch lines 2 and 3 around, it’s clear that the from-import MUST cache the imported module, as otherwise you’d run into problems where a class doesn’t correctly match (because it’s not the same class, it’s another thing with the same name).

So you should be 100% confident that this code will work:

import asyncio
original = id(asyncio)
from asyncio.base_events import BaseEventLoop
assert id(asyncio) == original
assert asyncio.base_events.BaseEventLoop is BaseEventLoop

From which it should be clear that asyncio.base_events is indeed guaranteed to be set after the from-import.

The one small piece left in the puzzle is: Inside __init__.py, you’re running in the package’s namespace. That means that your globals ARE the attributes of the asyncio object, and adding asyncio.base_events must also populate your globals with that - because they’re the same thing.

So it’s a little bit weird when you see it, but it’s a consequence of otherwise-logical design decisions, and it can’t really be any other way without breaking something :slight_smile:
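To see this without touching asyncio, here is a minimal sketch that builds a throwaway package on disk and imports it. The package name demo_pkg_globals, the submodule helpers and the function greet are all made up for the demonstration; only the import machinery is real.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package layout, written to a temporary directory:
#   demo_pkg_globals/__init__.py
#   demo_pkg_globals/helpers.py
INIT_SOURCE = (
    "import sys\n"
    "from .helpers import *\n"
    "# The star-import above also left the name 'helpers' in our globals.\n"
    "HAS_SUBMODULE_NAME = 'helpers' in globals()\n"
    "# Our globals are the very dict that backs the package object.\n"
    "SAME_DICT = globals() is sys.modules[__name__].__dict__\n"
    "__all__ = helpers.__all__\n"
)
HELPERS_SOURCE = (
    "__all__ = ['greet']\n"
    "def greet():\n"
    "    return 'hi'\n"
)

with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "demo_pkg_globals"
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text(INIT_SOURCE)
    (pkg_dir / "helpers.py").write_text(HELPERS_SOURCE)

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        demo = importlib.import_module("demo_pkg_globals")
    finally:
        sys.path.remove(tmp)

print(demo.HAS_SUBMODULE_NAME)  # True
print(demo.SAME_DICT)           # True
print(demo.helpers.greet is demo.greet)  # True
```

The __init__.py never writes "import helpers" or "from . import helpers", yet the name is there, because setting the attribute on the package and adding a global to __init__.py are the same operation.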


Thanks for the quick reaction. And sure thing about modules being imported only once in Python.

But from math import * (not that I recommend writing such code) also does not add the name math to any namespace. For sure the math module is imported, but it’s cached somewhere “invisible”.

It is not clear to me what design decisions lead to from .base_events import * having to make the base_events name appear inside the asyncio top level namespace.

Look back at my example that ends with a pair of assertions. Do you agree with each step in it? For instance, if you’ve already imported asyncio and then you do other imports, is it correct that the asyncio object should remain the same object?

Since the asyncio package remains the same object, and the base_events module has to become accessible from it (since any import of asyncio.base_events MUST return the same module), it has to be true that the module is available from the package. That’s where the surprising part happens; you are in the package when you’re seeing this. Running “from math import *” isn’t adding a module to a package, thus it doesn’t have to behave this way. It’s a phenomenon that only occurs when importing a module that’s found in a package, and must therefore be cached as part of that package.
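A small sketch of the difference, run from outside any package. It uses exec with a fresh dict as a stand-in for "some other module's namespace" (that is just a convenience for the demo), and http.client as an example of a submodule.

```python
import sys

# A from-import of a top-level module binds only the requested name.
plain_ns = {}
exec("from math import cos", plain_ns)
print("cos" in plain_ns)      # True
print("math" in plain_ns)     # False: no 'math' binding in the importer
print("math" in sys.modules)  # True: the module is cached all the same

# A from-import of a submodule also binds only the requested name in the
# importer, but the submodule is attached to its parent package as well.
pkg_ns = {}
exec("from http.client import HTTPConnection", pkg_ns)
print("HTTPConnection" in pkg_ns)              # True
print("http" in pkg_ns)                        # False: importer's namespace is untouched
print(hasattr(sys.modules["http"], "client"))  # True: the parent package was mutated
```

So the importer's namespace is treated the same way in both cases. The extra binding lands on the parent package, and it only looks like "my namespace" when the importer is the package's own __init__.py.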

Hopefully that makes it a bit clearer.

Thanks for your patience.

In your example, I agree that asyncio should remain the same object. But I believe that this would be the case even if from asyncio.base_events import BaseEventLoop would not create the asyncio.base_events binding - just like from math import cos does not create the math binding.

(Actually, I find the way things work very slightly problematic. It seems justified to see asyncio.base_events as a mere implementation detail of asyncio: it is imported by default together with asyncio and its entire public API is exposed through asyncio. As far as the public API is concerned, it could be renamed. There seems to be no technical reason to expose this implementation detail.)

In an alternative reality (I’m not suggesting any changes, just trying to understand), creating the base_events binding under the asyncio namespace could well require an explicit import asyncio.base_events (or inside asyncio/__init__.py: from . import base_events). The asyncio module object would still be the same before and after import asyncio.base_events. Just like scipy is the same before and after import scipy.optimize (that submodule is not imported by default together with top level SciPy).

Here is a related phenomenon. Consider the following session:

>>> import math
>>> del math
>>> import math
>>> math
<module 'math' (built-in)>
>>> 
>>> import asyncio.base_events
>>> del asyncio.base_events
>>> import asyncio.base_events
>>> asyncio.base_events
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'asyncio' has no attribute 'base_events'

Why does the second import math recreate the math binding (for sure the module is not reimported), but the second import asyncio.base_events does not? What principles of the language would be broken if it did?

Yep. In our current reality, that happens automatically, which is incredibly convenient (it allows some modules to be loaded automatically and others to be loaded only when they’re needed, with end users not needing to know the difference); but you’re right, that basically is what happens.

The module isn’t re-executed, but every import statement fetches the module and gives you a local reference to it. That’s why, if you import math in one module and import math in another module, you’re looking at the exact same math module in both of them.

In effect, import math means:

  1. Is there a sys.modules["math"]? If so, skip to step 5.
  2. Search for a matching module, using sys.path, sys.meta_path, etc.
  3. Create a module called "math" and put it into sys.modules
  4. Run the code of math in the context of that module
  5. Set math = sys.modules["math"]

(This is a massive MASSIVE simplification, but broadly right.)
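Those five steps can be sketched with the public importlib.util helpers. The function name simplified_import is made up, and this is only a rough illustration for top-level modules: it ignores import locks, packages and parent attributes, and namespace packages.

```python
import importlib.util
import sys


def simplified_import(name):
    """Rough sketch of 'import name' for a top-level module (not the real algorithm)."""
    # Step 1: a cache hit skips straight to the final lookup.
    if name not in sys.modules:
        # Step 2: ask the finders (sys.meta_path, sys.path, ...) for a spec.
        spec = importlib.util.find_spec(name)
        if spec is None:
            raise ModuleNotFoundError(f"No module named {name!r}", name=name)
        # Step 3: create the module object and cache it before running any code.
        module = importlib.util.module_from_spec(spec)
        sys.modules[name] = module
        try:
            # Step 4: run the module's code in the context of that module.
            spec.loader.exec_module(module)
        except BaseException:
            # A failed import must not leave a half-initialised module cached.
            sys.modules.pop(name, None)
            raise
    # Step 5: whatever is in the cache is what the caller gets to bind.
    return sys.modules[name]


# The caller does the binding, which is why it happens on every import.
math = simplified_import("math")
colorsys = simplified_import("colorsys")
```

Note that the return value is read back from sys.modules rather than taken from the local variable, which is the same reason the trick with a non-module object in sys.modules works.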

It doesn’t matter whether the module was previously imported here or not; the last step will always happen. In fact, you can do some fun stuff:

>>> import sys
>>> sys.modules["wut"] = 42
>>> import wut
>>> wut
42
>>> from wut import as_integer_ratio
>>> as_integer_ratio()
(42, 1)

Duck typing: never mind about whether it quacks properly, if it’s in sys.modules, it must be a module! :slight_smile:

The rules are very slightly different with packages, though. (I’m going to assume a directory package here; others work similarly.) Since http is actually a directory that contains __init__.py, you can do this:

>>> import http
>>> http.client
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'http' has no attribute 'client'
>>> import http.client
>>> http.client
<module 'http.client' from '/usr/local/lib/python3.12/http/client.py'>

Note that the error I’m getting here is AttributeError. That means that, in between the two attempts to look up http.client, something has to add an attribute to the http object called “client”. Since the http object’s namespace IS the global namespace for the http module, that means that client has to have appeared in the global namespace.

The other example you gave, though, is mutating the asyncio package object. For the same reason that adding a module to the package has to mutate it, deleting a module from the package also has to mutate it. The second import succeeds because sys.modules["asyncio.base_events"] still exists, so nothing is loaded and nothing re-attaches the attribute; you’ve broken the normal patterns, so the attribute remains deleted. Definitely not something you’ll often see, though.
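Here is the same situation as a script, using http.client instead of asyncio.base_events, with the attribute put back by hand at the end so the package is left in its normal state. The variable names are just for the demo.

```python
import sys
import http.client

saved = sys.modules["http.client"]  # the cached submodule object
del http.client                     # removes the attribute from the package only

import http.client                  # cache hit: nothing is loaded, nothing is re-attached
still_cached = sys.modules["http.client"] is saved
attribute_came_back = hasattr(http, "client")

print(still_cached)         # True
print(attribute_came_back)  # False

# Re-attach the submodule manually; this is what a first-time load does for you.
http.client = saved
```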

I am aware of this: I mentioned scipy.optimize which behaves in the same way.

I understand that while import math mutates the local namespace only, if import asyncio.base_events were to behave in the same way and always recreate the asyncio.base_events binding, it would have to mutate a non-local namespace that is visible in other modules.

But then import http.client or import scipy.optimize also mutate the global namespace of the relevant modules and this is not seen as a problem.

But OK, I think I begin to understand: Imagine I choose to monkey-patch scipy by first importing scipy.optimize and then binding this name to something else, e.g. scipy.optimize = my_optimize_module. Then, if some unrelated third party module that I happen to import executes import scipy.optimize, it would undo my monkey-patching if the semantics of submodule imports were to behave in the alternative way that I sketched above.
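A sketch of that scenario, using http.client as a stand-in for scipy.optimize so that it runs with the standard library alone. The replacement object and its marker attribute are hypothetical.

```python
import types
import http.client

real_client = http.client

# Monkey-patch the package attribute with a stand-in object.
my_client_module = types.SimpleNamespace(marker="patched")
http.client = my_client_module

try:
    # Pretend this line lives in some unrelated third-party module.
    import http.client

    # The submodule was already in sys.modules, so the attribute was not reset.
    patch_survived = http.client is my_client_module
finally:
    # Undo the patch so nothing else in the process is affected.
    http.client = real_client

print(patch_survived)  # True
```

Under the alternative semantics, where every import of a submodule rebinds the parent attribute, patch_survived would be False.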

This is an obscure way of doing things, but I can see how it could be a problem, so point taken. Can it also be a problem in less obscure ways?

The connection to my original question is perhaps as follows: in order to avoid the above problem, the decision was taken to create the binding of a submodule when and only when it is imported for the first time. There is no differentiation by what statement the original import is triggered. (Indeed, from . import submodule as foo introduces both submodule and foo bindings!) And so, as a consequence, from .submodule import something also adds the submodule binding.
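The double binding from from . import submodule as foo can be checked with another throwaway package. The names demo_pkg_alias, submodule and foo are placeholders; the sketch simply records which non-dunder globals exist in __init__.py right after the import.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package whose __init__.py imports its submodule under an alias.
INIT_SOURCE = (
    "from . import submodule as foo\n"
    "NAMES = sorted(n for n in globals() if not n.startswith('__'))\n"
    "SAME_OBJECT = foo is submodule\n"
)

with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "demo_pkg_alias"
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text(INIT_SOURCE)
    (pkg_dir / "submodule.py").write_text("VALUE = 1\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        alias_demo = importlib.import_module("demo_pkg_alias")
    finally:
        sys.path.remove(tmp)

print(alias_demo.NAMES)        # ['foo', 'submodule']
print(alias_demo.SAME_OBJECT)  # True
```

The foo binding comes from the import statement itself; the submodule binding comes from the import machinery attaching the freshly loaded submodule to its parent package.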

Well, thanks again! I think I learned… something.

That’s what’s important, right? Learn… something. Doesn’t really matter what, or how useful, just learn anything whatsoever :smiley:

But is this behavior documented somewhere? I bet that I’m not the only one to be surprised that from . import submodule as foo also introduces a submodule binding.

Not sure, but it’s a consequence of other documented behaviour.