Reliable way to check if a C module is a package

While working on attribute auto-completion in PyREPL (ready for review, btw!), I noticed that the math module is not considered a package despite its brand-new math.integer submodule:

>>> import pkgutil
>>> next(m for m in pkgutil.iter_modules() if m.name == "math")
ModuleInfo(module_finder=..., name="math", ispkg=False)

and thus from math.<TAB> doesn’t suggest math.integer :pensive_face:

That’s also the case for os and sys, the two other builtin modules with documented submodules, and I guess it’s expected per the glossary definition:

Technically, a package is a Python module with a __path__ attribute.


os.path got special-cased[1], but before adding math.integer to the list, I wondered if anyone more versed in the importlib / C modules business knew a reliable way to detect such cases and to extract the submodule names?

That would reduce the risk of forgetting to add a new submodule to this list – especially given the current push for more namespacing! – and potentially benefit third-party extension modules too.

(I wrote a test that scans the stdlib and detects these cases[2], but that’s not the cleanest thing :sweat_smile:)


  1. alongside a few other modules, but for other reasons ↩︎

  2. (by recursively importing all modules, inspecting their attributes for submodules and comparing that with pkgutil-discovered modules) ↩︎


Without importing the top-level module, no, there is no way to detect this and it needs to be hardcoded.

Once the module has been imported, it’s possible by looking in sys.modules for names with the prefix "math.".

os and math are not packages. Being a package is fully independent of having importable “submodules”.
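
A minimal sketch of both points, using os because os.path exists on every supported version. The helper name loaded_submodules is made up for this illustration and is not part of any library; it only sees submodules that are already in sys.modules, so the parent has to be imported first.

```python
import os  # importing the parent is what registers "os.path" in sys.modules
import sys


def loaded_submodules(parent_name):
    """Return the names in sys.modules that live under parent_name."""
    prefix = parent_name + "."
    # Snapshot the keys: sys.modules may change while we iterate.
    return sorted(name for name in list(sys.modules) if name.startswith(prefix))


# Per the glossary, a package is a module with a __path__ attribute.
os_is_package = hasattr(os, "__path__")
os_submodules = loaded_submodules("os")

print(os_is_package)
print(os_submodules)
```

This reports os as a non-package while still listing os.path among its loaded submodules, which is the independence described above.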


Thanks!

So to be clear, it is not possible for a C module to have submodules that don’t get imported alongside their parent? That could simplify the test I was talking about quite a bit!

But then, how does the import machinery even know if a C submodule is legal to import? I suppose it does not just check for an attribute with that name on the parent module, otherwise import os.abc would work :thinking:

Indeed:

>>> from math.integers import comb  # note the s at integers!
Traceback (most recent call last):
  File "<python-input-11>", line 1, in <module>
    from math.integers import comb
ModuleNotFoundError: No module named 'math.integers'; 'math' is not a package

That error message is not really helpful, but I’m not sure we can really do better :confused:

What you are probably thinking of as the “import machinery” is being bypassed: as a first step, import os.path looks up os.path in sys.modules. This succeeds, so nothing else matters.[1]

Only if that check fails does the actual machinery get involved: it asks the loader of os whether os has a submodule named path. Since os is not a package, this fails. How exactly this works is a bit complicated, but yes, it definitely doesn’t just check for an attribute being present. (Although after a successful import, the default mechanism does guarantee that the submodule exists as an attribute of its parent.)
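
The two steps can be sketched with hand-made modules. The names demo_parent, demo_parent.child and demo_parent.other are hypothetical and registered manually just for this illustration; this is not how os or math set themselves up internally.

```python
import importlib
import sys
import types

# Hypothetical modules, registered by hand purely for illustration.
parent = types.ModuleType("demo_parent")       # no __path__, so not a package
child = types.ModuleType("demo_parent.child")
sys.modules["demo_parent"] = parent
sys.modules["demo_parent.child"] = child

try:
    # Step 1: the dotted name is found in sys.modules and returned directly,
    # even though the parent never received a "child" attribute.
    cached = importlib.import_module("demo_parent.child")
    parent_has_attribute = hasattr(parent, "child")

    # Step 2: an uncached dotted name reaches the real lookup, which gives up
    # because the parent has no __path__.
    try:
        importlib.import_module("demo_parent.other")
    except ModuleNotFoundError as exc:
        failure = exc
    else:
        failure = None
finally:
    # Leave sys.modules as we found it.
    sys.modules.pop("demo_parent.child", None)
    sys.modules.pop("demo_parent", None)

print(cached is child, parent_has_attribute)
print(failure)
```

The second import fails with the same kind of “is not a package” ModuleNotFoundError as the math.integers traceback earlier in the thread.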

Note that os.py is not a C module; it sounds like you weren’t quite aware of that. This entire machinery is also not dependent on that.

I am also not sure that it’s true that a C module can’t be a package: none in the stdlib are, but I think it would be possible to construct one (by adding a valid __path__ attribute and maybe doing some other stuff). But at that point you are significantly outside the normal import machinery, and not supporting these edge cases is OK. I have never heard of an extension module doing that (except maybe Cython? Not sure how those work).

Long term it might be worth investigating if packaging metadata could be interrogated to support some edge cases - I don’t have an overview of what is currently being stored while the package is installed.


  1. technically, os doesn’t even have to have path as an attribute, although that would be confusing ↩︎


Ah of course, because it is explicitly added by the parent module when imported!

But then I suppose pkgutil would be able to detect submodules statically, so for my use case it’s not an issue :grin: I really should have said “import submodules from a non-package” instead of “from a C module”.

And here is the PR for math.integer: gh-69605: Add math.integer to PyREPL module completer hardcoded list by loic-simon · Pull Request #144811 · python/cpython · GitHub

Not unless the module is already imported: The C module would make itself a package during execution.

But a module, and maybe even more so a C module, can do a lot of “interesting” stuff during initialization that just cannot be captured by static analysis, which is what you are trying to do here. We can do our best to support the stdlib special cases, and if it turns out there are common patterns in third-party code that aren’t supported we can look at those - but we will never capture everything without importing the top-level module, and even that might not be enough.
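
To illustrate the gap between the static and the runtime view, here is a rough sketch that uses a pure-Python file as a stand-in for a C module. The module name selfpkg_demo is invented for this example, and assigning an empty __path__ is only the simplest way to satisfy the glossary definition; a real extension module would presumably do more.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # A single-file module that turns itself into a "package" when executed.
    Path(tmp, "selfpkg_demo.py").write_text("__path__ = []\n")

    # Static view: pkgutil only sees a plain file, so ispkg is False.
    static_view = {m.name: m.ispkg for m in pkgutil.iter_modules([tmp])}

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        module = importlib.import_module("selfpkg_demo")
        # Runtime view: after execution the module has a __path__.
        runtime_is_package = hasattr(module, "__path__")
    finally:
        # Undo every change made to interpreter-wide state.
        sys.path.remove(tmp)
        sys.modules.pop("selfpkg_demo", None)
        sys.path_importer_cache.pop(tmp, None)

print(static_view)
print(runtime_is_package)
```

The two views disagree, and nothing short of executing the module reveals that.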

E.g. see the Python-Java bridges that bring the full Java namespace into Python as a dynamic, importable package tree. Those are quite interesting for auto-complete because names can be very long and difficult to remember/easy to mistype, but I don’t think we will be able to provide proper support without some cooperation from those modules. But I would wait for user requests before considering this.
