How exactly does init.py influence module search order?

dragondive · March 12, 2023, 7:05am

The documentation on Packages says this:

The __init__.py files are required to make Python treat directories containing the file as packages. This prevents directories with a common name, such as string, unintentionally hiding valid modules that occur later on the module search path.

The documentation on Module Search Path says this:

When a module named spam is imported, the interpreter first searches for a built-in module with that name.

To properly understand this, I tried a few experiments:

Create a local directory string in the current directory c:\WORK\dragondive\heavens-arena\python\modules (from where python is called). Then do:
```
import string

print(type(string))
print(string.__file__)
```
Output:
```
<class 'module'>
C:\Users\aravi\AppData\Local\Programs\Python\Python311\Lib\string.py
```
In the above local directory, create an empty __init__.py. Run the same code again:

Output
```
<class 'module'>
c:\WORK\dragondive\heavens-arena\python\modules\string\__init__.py
```

My sys.path looks like this:

PS C:\WORK\dragondive\heavens-arena\python\modules> python main.py
[‘C:\WORK\dragondive\heavens-arena\python\modules’, ‘C:\WORK\python’, ‘C:\Users\aravi\AppData\Local\Programs\Python\Python311\python311.zip’, ‘C:\Users\aravi\AppData\Local\Programs\Python\Python311\Lib’, ‘C:\Users\aravi\AppData\Local\Programs\Python\Python311\DLLs’, ‘C:\Users\aravi\AppData\Local\Programs\Python\Python311’, ‘C:\Users\aravi\AppData\Local\Programs\Python\Python311\Lib\site-packages’]

Questions

How exactly does the presence of an __init__.py cause the Python interpreter to “prefer” the local string over the built-in string?

The module search path documentation says built-in modules are searched first, but the packages documentation says __init__.py prevents hiding valid modules that occur later on the module search path.

Does this imply that regular packages are loaded before modules?

This Stack Overflow answer (from 2011) suggests so, but there’s also some disagreement in the comments.

Is the expected behaviour documented somewhere explicitly?

komoto48g · March 12, 2023, 4:10pm

I am not sure I can answer your question correctly.

But pkgutil, for example, does not recognize a directory without __init__.py as a package.
My guess is that the directory containing __init__.py takes precedence.

By the way, I am interested in your situation and I experimented in the following environment:

~/workspace/
    string/
        __init__.py
        myscript.py

I got:

C:\temp\workspace>py
Python 3.10.8 (tags/v3.10.8:aaaf517, Oct 11 2022, 16:50:30) [MSC v.1933 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
  File "C:\usr\home\.py", line 12, in <module>
    from PIL import Image
  File "C:\Python310\lib\site-packages\PIL\Image.py", line 30, in <module>
    import logging
  File "C:\Python310\lib\logging\__init__.py", line 28, in <module>
    from string import Template
ImportError: cannot import name 'Template' from 'string' (C:\temp\workspace\string\__init__.py)
>>>

(EDIT: The line from PIL import Image is written in the PYTHONSTARTUP file.)
So, logging and other libraries can conflict with the local string package.
If you plan to share your library with others, I strongly recommend that you do not use package names that conflict with the standard library.

dragondive · March 12, 2023, 5:53pm

@komoto48g Thank you for your response.

Indeed, I will not use this code in a production environment. Nonetheless, it is important for me to understand why. This is also frequently helpful to debug problems when reusing code from multiple “internal” projects, which can get messy quickly. Especially with a language like Python which makes changes across versions, that are not always fully backward compatible.

The documentation (and the information available online, for example, on Stack Exchange sites and Reddit forums) isn’t satisfactory in my view. “Just create an __init__.py, it makes the directory a package and prevents conflict later” (as the documentation suggests) isn’t sufficiently reliable explanation.

I hope someone who has worked on the “behind the scenes” implementation and/or documentation can shed more light on this.

dragondive · March 13, 2023, 6:08am

I found the most clear explanation on this topic here from Victor Skvortsov: Python behind the scenes #11: how the Python import system works (tenthousandmeters.com). In particular:

How does it work? When Python traverses path entries in the path (sys.path or parent’s __path__) during the module search, it remembers the directories without __init__.py that match the module’s name. If after traversing all the entries, it couldn’t find a regular package, a Python file or a C extension, it creates a module object whose __path__ contains the memorized directories.

The initial idea of requiring __init__.py was to prevent directories named like string or site from shadowing standard modules. Namespace package do not shadow other modules because they have lower precedence during the module search.

This explanation resolves my queries. Although it’s not clear if this behaviour is officially documented. I wrote to Victor thanking him for the explanation, and he pointed me to PEP 420 – Implicit Namespace Packages | peps.python.org for further reading. I will take my time to understand the PEP and update here what I learn.

kknechtel · March 13, 2023, 7:55pm

First observation: Python treats packages as a kind of module. They’re cached in sys.modules and represented with the module type (not even a subtype). Whereas the attributes of an “ordinary” module (created by importing a .py file, for example) are the global names in the top-level code, the attributes of a package are its contained modules - as well as anything defined in __init__.py, as a special case.

Second observation: Python doesn’t actually create packages (rather, module objects representing packages) “from” the folder. It creates them according to the rules of the import system. With the filesystem-based importer, there are two ways this happens: explicitly by executing code from a .py file (or cached .pyc bytecode, etc.), or implicitly, from nothing at all (a namespace package, the need for which is deduced from the import statement syntax).

There is no contradiction.

Built-in modules (which does not mean “standard library modules”, but only the ones listed in sys.builtin_module_names - ones which the reference implementation implements in C) are checked before any attempt to look for source code in the places mentioned in sys.path.

The requirement to use __init__.py to make Python treat the directories as packages, is so that e.g. the standard library string module (which is not built-in) won’t be unavoidably hidden unintentionally by a local folder called string.

Because of the requirement, the string folder doesn’t prevent the standard library string from being imported, all by itself. But if an __init__.py is added, the folder will be recognized as a package; and since its containing folder appears (by default) earlier on sys.path than the standard library folders (deliberately placed at the end, so that this option exists), import string will import that package.

“Explicit is better than implicit”, as they say. (Something the documentation writer could take to heart; it’s not very clear what “this” refers to in the first quotation.)

I tried many ways to write this explanation, because there are a lot of ways to customize the Python import system. Ultimately, though, I think it’s better here to focus on the important parts of the one specific situation you tested.

The string folder can’t create a module at all. If there is a __init__.py file, Python will find that, and create a module from it, before it looks at the standard library. Python already “preferred” this path.

Without __init__.py, after making every possible attempt to identify string as an ordinary module, Python concludes that string must be the name of a package instead. During the search for source code, Python will have found and remembered the string folder, and because it found such a folder, it will create a string “namespace package”.

The details of this process are described in PEP 420 as you already linked. I think this constitutes explicit documentation. You may also be interested in the non-tutorial official documentation for the import system.

How exactly does __init__.py influence module search order?

How exactly does init.py influence module search order?