Python3.11 - importlib no longer exposes .util

Hi.

With the release of Python 3.11 I noticed that importlib.util, which was previously exposed/available after import importlib, no longer is. An explicit import importlib.util is now needed.

Is this the desired behavior?
I could not find any mention of it in the Changelog (Python 3.11.3 documentation).

Nothing urgent or important, easy fix and it’s slated for removal in 3.12 anyway. But tripped me up a bit.

In all of 3.10.11+, 3.11.3+, and 3.12.0a7+ (locally compiled), in a newly started REPL, import importlib; importlib.util fails with AttributeError, so the failure is not new in 3.11. The string ‘util’ does not appear in any code in importlib/__init__.py, including the definition of __all__. So there is no reason to expect ‘.util’ to work.

In a freshly started IDLE Shell, the same statement succeeds. This is because IDLE imports importlib.util and leaves the name ‘util’ in importlib when starting the usercode execution subprocess. I consider this a bug to be fixed as it makes user code run during development in IDLE even though it will fail when run directly in python, without IDLE.

Perhaps your memory of previous behavior was from a similar situation of importlib.util having been inadvertently imported by something that ran earlier. Currently, importlib.machinery is in sys.modules at python startup even though it does not appear in importlib.__init__ and likely was imported by something else. Perhaps importlib.util was once similarly present at python startup. The initial content of sys.modules varies from version to version and possibly even with bugfix releases.
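A rough way to see what a given installation preloads is to ask a fresh interpreter which importlib submodules are already in sys.modules. This is only a sketch: the helper name preloaded_importlib_submodules is made up for illustration, and the result depends on the Python version and on what is installed.

```python
import subprocess
import sys


def preloaded_importlib_submodules(*interpreter_flags):
    """Return the importlib submodules already in sys.modules in a fresh interpreter."""
    probe = (
        "import sys; "
        "print('\\n'.join(sorted(n for n in sys.modules if n.startswith('importlib.'))))"
    )
    result = subprocess.run(
        [sys.executable, *interpreter_flags, "-c", probe],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


if __name__ == "__main__":
    # Normal startup, including site and any .pth files it processes.
    print("with site:   ", preloaded_importlib_submodules())
    # -S skips the site module, so its side effects disappear.
    print("without site:", preloaded_importlib_submodules("-S"))
```

Comparing the two lists on the same machine hints at whether site, or something it triggers, is what brings importlib.util in.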

I did not import it knowingly.
Taking archinstall.__main__.py as an example:

import importlib
import sys
import pathlib

# Load .git version before the builtin version
if pathlib.Path('./archinstall/__init__.py').absolute().exists():
	spec = importlib.util.spec_from_file_location("archinstall", "./archinstall/__init__.py")
	archinstall = importlib.util.module_from_spec(spec)
	sys.modules["archinstall"] = archinstall
	spec.loader.exec_module(sys.modules["archinstall"])
else:
	import archinstall

if __name__ == '__main__':
	archinstall.run_as_a_module()

This is the entry point for python -m archinstall and has been working well for a year or two at least.
Again it’s a small change to implement and fix, but since there’s nothing mentioned in the release notes this threw a curveball into the machinery testing Python 3.11.

(I’ll just mention it so the discussion doesn’t go in that direction: yes, this is probably not the best way of importing things via relative paths, but it has treated me well over the years I’ve used it, so it’s a personal preference and nothing else.)

The proper import statement for the above is import importlib.util. In what exact Python version did the buggy code work?

It worked in 3.9.2:

Python 3.9.2 (default, Feb 28 2021, 17:03:44) 
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import importlib
>>> importlib.util
<module 'importlib.util' from '/usr/lib/python3.9/importlib/util.py'>

importlib.util can sometimes get brought in by having other modules installed. Not sure if this is the case here but for example if you have sphinx installed in the environment this will work.

Python 3.11.2 (main, Feb 20 2023, 09:32:55) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import importlib
>>> importlib.util
<module 'importlib.util' (frozen)>

(I think this may be related to the sphinxcontrib-* namespace packages but I don’t know if it’s a general thing with having namespace packages installed or specific to sphinx.)

This is true in the REPL as rlcompleter is imported and depends on inspect which depends on importlib.machinery. However it’s not generally true that it’s imported on startup.

$ python -c "import importlib; importlib.machinery"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: module 'importlib' has no attribute 'machinery'

It’s coming from site, and has for at least a couple minor versions:

$ python
Python 3.8.10 (default, Nov 14 2022, 12:59:47) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import importlib
>>> importlib.util
<module 'importlib.util' from '/usr/lib/python3.8/importlib/util.py'>
>>> quit()
$ python -S
Python 3.8.10 (default, Nov 14 2022, 12:59:47) 
[GCC 9.4.0] on linux
>>> import importlib
>>> importlib.util
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'importlib' has no attribute 'util'

It seems to come from some dynamic import setup used for the site-packages.

It does come from site but it does also depend on other installed packages. A clean build of python will not have imported importlib.util.

Python 3.8.10 (default, Apr 11 2023, 13:29:54) 
[GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import importlib
>>> importlib.util
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'importlib' has no attribute 'util'

System python might though (mine does):

Python 3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import importlib
>>> importlib.util
<module 'importlib.util' from '/usr/lib/python3.10/importlib/util.py'>

The only behavior supported in Python is to explicitly import all submodules you depend on. If you want to use os.path, you should import os.path, not import os and rely on the fact that os currently imports it. (Or at least it does on POSIX; for all I know os doesn’t import path on Windows or OS X.)

Some modules may explicitly import all their submodules for you as a documented API guarantee. AFAIK importlib never made that guarantee, in fact quite the opposite: the documentation for it has several examples showing code explicitly importing importlib.util.
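For illustration, here is a minimal sketch of loading a module from a file path with the submodule imported explicitly. The helper load_module_from_path and the throwaway demo_mod.py file are invented for this example and are not taken from archinstall or from the docs.

```python
import importlib.util  # explicit: do not rely on `import importlib` alone
import sys
import tempfile
from pathlib import Path


def load_module_from_path(name, path):
    """Load the file at `path` as module `name` and register it in sys.modules."""
    spec = importlib.util.spec_from_file_location(name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot build an import spec for {path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module


# Demo with a temporary file standing in for a real source tree.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "demo_mod.py"
    source.write_text("ANSWER = 42\n")
    demo = load_module_from_path("demo_mod", source)

print(demo.ANSWER)
```

Because importlib.util is imported by name, this does not depend on whatever else happened to import it earlier.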

On testing the sample namespace packages it seems having a pkg_resources style namespace package installed is a potential cause for importlib.util being imported on startup. pkgutil and native namespace packages don’t appear to have this side effect.
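As far as I understand it, pkg_resources-style namespace packages are usually installed together with a *-nspkg.pth file, and site executes any .pth line that starts with import followed by a space or tab. The sketch below lists such lines when they mention importlib. The helper name executable_pth_lines and the demo file contents are made up for illustration and are not copied from a real package.

```python
import tempfile
from pathlib import Path


def executable_pth_lines(directory, needle="importlib"):
    """Return (file name, line) pairs for executable .pth lines mentioning `needle`."""
    hits = []
    for pth in sorted(Path(directory).glob("*.pth")):
        for line in pth.read_text(errors="replace").splitlines():
            # site only executes lines starting with "import" plus space or tab;
            # every other line is treated as a path entry.
            if line.startswith(("import ", "import\t")) and needle in line:
                hits.append((pth.name, line))
    return hits


# Demo against a temporary directory with hypothetical .pth files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "demo-nspkg.pth").write_text("import sys; __import__('importlib.util')\n")
    Path(tmp, "plain.pth").write_text("some/extra/path\n")
    hits = executable_pth_lines(tmp)

for name, line in hits:
    print(f"{name}: {line}")
```

To inspect a real environment, the same function could be pointed at the directories returned by site.getsitepackages().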

os seems a bit ambiguous. On the one hand, the os module doc does not explicitly say that os.path exists on import of os. On the other hand, the one code example with os.path.x depends on the assumption that it does. os — Miscellaneous operating system interfaces — Python 3.12.1 documentation has, simplified,

import os
os.path.join(...)

In addition, ‘path’ is in os.__all__ and os.__doc__ lists ‘path’ as one of the names exported, and is either posixpath or ntpath. Maybe the doc should explicitly say at the top that importing os imports path.
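These observations about os can be checked directly. This is a small sketch that only inspects the running interpreter, so the module name printed depends on the platform.

```python
import os
import sys

# 'path' is one of the names os declares as public.
print("path" in os.__all__)

# os.path is an alias for the platform-specific implementation module.
print(os.path.__name__)

# os also registers that module in the import cache under the name 'os.path'.
print(os.path is sys.modules["os.path"])
```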

I would like to clarify that what you describe, Terry, matches my experience.
I was under the impression that you did not have to do explicit imports, and that this is a 50/50 mixed jungle of when to do explicit imports and when to import the root library and use submodules/functions loosely coupled.

Until recent years I never understood why explicit imports are preferred, which obviously depends on what initialization logic the library chooses to run. Either way, in my 10+ years of Python I haven’t experienced the language encouraging explicit imports, neither in the docs nor in the examples.

I therefore personally don’t resonate with:

Nothing changed “in recent years”. Maybe you’ve noticed things like Numpy, which tend to dump everything up front, gaining in popularity. Maybe you’ve also noticed that import numpy in a fresh interpreter session is noticeably slower than most other imports.

It’s right there in the Zen of Python: “Explicit is better than implicit”.
The built-in behaviour of Python is that sub-package imports do not happen automatically; if a library is importing its own sub-package, either that is an implementation detail or something that should be explained in the documentation. The point is that there is more control this way over how much is imported.

The “when” is clear: if you need both foo and foo.bar, then import them both unless the foo documentation explicitly tells you not to bother. It is the same as how, if you need both foo and bar directly, you would import them both, even if it turned out that foo imports bar as an implementation detail (and that you could thus do bar = foo.bar instead).

Python’s module imports are cached, so it costs almost nothing to have “unnecessary” module imports, and it’s both clearer and more foolproof.
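A small sketch of the caching point, using json purely as an arbitrary standard-library module: a repeated import hands back the object already stored in sys.modules rather than loading anything again.

```python
import importlib
import sys

import json  # first import: the module is loaded and stored in sys.modules

cached = sys.modules["json"]

# A repeated import is only a cache lookup and returns the same module object.
again = importlib.import_module("json")
print(again is cached)
```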

Anton, please delete your partial post, completed in this one. To me, your (mis)impression says that a) the docs could perhaps be clearer, and b) what one learns about importing submodules depends on which ones one works with. I knew over a decade ago that os imports path from experience and reading the code. Tkinter, on the other hand, does not import any of its submodules. Since IDLE, turtle, and turtledemo are the only stdlib modules that import tkinter, its namespace does not routinely get augmented, unlike importlib and a few others. And several years ago, I stopped IDLE from exposing its augmentations of tkinter to user code. I can hardly think of any other submodules that I have used. Others’ experience tends to be quite different.

I’m not familiar enough with this forum’s mechanics to understand what you mean by partial post and what’s completed in what?

Since the thread became more of a “defend why” - take for instance import datetime which exposes the entire submodules functionality. If that changes in the future I’d expect it to be documented or announced ahead of time.

And I rarely, if ever, use non-stdlib resources. So my experience over the years is purely based on the standard library. Either way, this is me digressing.

I’d like to stay on topic and avoid tangents of personal preferences and recommendations, it takes away from the focus of the changes in the library and the potential issues that arise.

If there’s no issue and this is purely down to which libraries are installed (how that works was a bit of news to me, however: I did not know that having sphinx installed could trigger imports without my importing sphinx myself), then that’s fine, I guess. If it is an issue, I wanted to ask about it here before submitting a potential bug report that just takes up space in people’s working schedules.

I see: The 4th and 5th posts above by you start by quoting me saying “import os”. The first ends prematurely with ‘examples.’ and can be deleted. The second continues with ‘I therefore…’ and ends with a quote from Larry and should be kept.

Thanks, didn’t spot that! I’ve deleted that specific post :slight_smile:

Yes, this was surprising to me, but having certain types of namespace package installed can change the imports that happen at launch. Sphinx just happens to have some of these packages (some of the sphinxcontrib-* packages, but not all of them). Sphinx itself doesn’t get imported, but the existence of those namespace packages causes some additional code to run, which brings in importlib.util.

The side effect of this is that when you import importlib, importlib.util has already been imported and so is available. Using pkgutil as a convenient module that imports importlib.util, it’s essentially this kind of behaviour:

>>> import importlib
>>> importlib.util
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'importlib' has no attribute 'util'
>>> import pkgutil
>>> importlib.util
<module 'importlib.util' (frozen)>

So it’s not that importlib intentionally exposed importlib.util on import, but that importlib.util being imported somewhere else made it available when you imported importlib.

My guess would be your Python 3.11 install either doesn’t have whichever module was causing this or that module has been updated to use a newer form of namespace package which no longer has this side effect.

Since people might otherwise get confused: pedantically, importlib itself has also already been imported. It just hasn’t yet been bound to a name in the current context. In order to import importlib.util, Python found the source for the importlib package, loaded that, stored it in the module cache (accessible via sys.modules); then also found, loaded and cached the importlib.util module, and finally actually assigned that module as the util attribute of importlib.

That’s how it can already be accessible from the importlib module object after import importlib: both were already loaded and the attachment was made; the import statement is just grabbing it from the cache.

In other words, this example still works if we import importlib after import pkgutil:

>>> import pkgutil
>>> importlib.util
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'importlib' is not defined
>>> import importlib # but if we define it, via the import statement...
>>> importlib.util # then it does have the attribute:
<module 'importlib.util' from '/usr/lib/python3.8/importlib/util.py'>
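The same mechanism can be reproduced independently of importlib with a throwaway package. In this sketch, demo_pkg and its sub module are invented for the demonstration, and the import_module call stands in for whatever unrelated code happens to import the submodule first.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "demo_pkg"
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("")  # the package does not import .sub
    (pkg_dir / "sub.py").write_text("VALUE = 1\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        import demo_pkg

        # Nothing has imported the submodule yet, so the attribute is missing.
        before = hasattr(demo_pkg, "sub")

        # Stand-in for "something else" importing the submodule.
        importlib.import_module("demo_pkg.sub")

        # The import system bound the submodule onto the parent package object.
        after = hasattr(demo_pkg, "sub")
    finally:
        sys.path.remove(tmp)

print(before, after)  # False True
```

Code that relies on demo_pkg.sub without importing it works only as long as that other import keeps happening, which is the situation described in this thread.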