Is this desirable that a module is loaded twice

zale · August 7, 2023, 3:57pm

I met with such a situation that a singleton object implemented by module is instantiated twice when imported in two different other module files. I suspected that it was caused by the uses of two kinds of import statement and therefore making the Python interpreter loaded the module twice, hence two instances of a singleton object.

Here is a simplified version of environment / structure to reproduce:

.
├── main.py
└── package
    ├── __init__.py
    ├── modA.py
    ├── modB.py
    ├── modC.py

main.py

import package.modB
import package.modC

if __name__ == "__main__":
    print('Hello world')

modA.py

a_list = [1, 2, 3]

__init__.py, in order to directly import modA

import sys
import os
sys.path.insert(1, os.path.dirname(__file__))

modB.py

import modA

print(hex(id(modA.a_list)))

modC.py

from . import modA

print(hex(id(modA.a_list)))

Invoking main.py in project directory would give such output

0x7fc579bd5c80
0x7fc579bd5a40
Hello world

You can see that the two memory addresses of a_list are different.

So, modA is imported using absolute as well as relative importing statements. If you may ask why I was doing that, it’s because I learnt and wanted to try relative importing in the middle of development while the project started off using only absolute importing.

In modC.py, by printing sys.modules I can confirm that package.modA and modA co-exist, showing that the same module was loaded twice.

Is this a bug or something else?

kpfleming · August 7, 2023, 4:12pm

It’s “something else”. By injecting the package directory into the path, you’ve made the source files available both as standalone modules and as package modules. Your two import statements are acting differently because of that.

The import syntax in modB is incorrect for a module that is in a package and is trying to import another module from the same package, and it will probably start failing when you remove the path injection from __init__.py.

kknechtel · August 7, 2023, 10:55pm

Modules are cached in sys.modules when loaded, but the cache keys are according to the fully qualified name, as you found.

When modB is imported, because it used absolute import for modA, it has no way to know that modA should be part of the same package. Dots in import paths are symbolic; they represent package structure, not folder hierarchy. So when modB’s absolute import is found in the hacked sys.path, it creates an ordinary modA module from the modA.py source code, not inside any package - because the import statement doesn’t indicate that it should be in a package.

Then, when modC is imported, it will use relative import for modA. This means that the existing package module (packages are also modules, that get imported and cached in sys.modules) will be directly asked for the path to find the modA.py source code, without checking sys.path. The import system knows that this is inside a package, with a fully qualified name of package.modA, again because of the import statement itself. So it can’t use the cached module, because it has the wrong name (modA instead of package.modA), will reload the module, and cache it with the correct name.

This is not much different from the issue where people have a circular (possibly deferred) import of the __main__ module and get two copies: one named __main__ (usually this does not end up in sys.modules) and one named according to the source file name.

Please consistently use relative imports within the package hierarchy. Use absolute imports to access stuff from the project’s dependencies (including the standard library). The structure that you have, where the driver script makes a single absolute import to something within the package, and everything else is relative from there, is a standard practice. It is almost never necessary to hack sys.path this way. (Many popular third-party Python packages do not use it at all, or just once for some special case, even for hundreds of thousands of lines of code.) In development, you just start Python in the default way, from the driver’s folder, and it puts that folder on the path, so the package is visible for the absolute import. When installed (including in a virtual environment for testing), the package is put in the appropriate site-packages folder, which is already on the path (due to the startup work done by site.py etc.).

zale · August 8, 2023, 3:33am

Yes. The first run was with an empty __init__.py and it failed, so I modified sys.path in the file.

So what is the proper way to import things inside package? Is it by using relative importing, like what modC.py did?

zale · August 8, 2023, 4:57am

Thanks. This is a thorough explanation.

Karl Knechtel:

Please consistently use relative imports within the package hierarchy. Use absolute imports to access stuff from the project’s dependencies (including the standard library). The structure that you have, where the driver script makes a single absolute import to something within the package, and everything else is relative from there, is a standard practice. It is almost never necessary to hack sys.path this way. (Many popular third-party Python packages do not use it at all, or just once for some special case, even for hundreds of thousands of lines of code.) In development, you just start Python in the default way, from the driver’s folder, and it puts that folder on the path, so the package is visible for the absolute import. When installed (including in a virtual environment for testing), the package is put in the appropriate site-packages folder, which is already on the path (due to the startup work done by site.py etc.).

This is especially helpful.

I found another file structure which was seen in stack overflow but I can’t find it again.

.
├── app
│   ├── run.py
│   └── some_file.py
└── package
    ├── modA.py
    └── modB.py

I’m not sure if this is a practical case for arrangement, but I can imagine that one may want to collect all related code in one place while other dependencies in another place, even though those dependencies are developed by themselves, for clarity purpose maybe. In this case, how to make importing system work well?

With that said, I found a solution in Structuring Your Project — The Hitchhiker's Guide to Python, where it shows how to handle tests:

test/context.py

import os
import sys
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))

import sample

The hack of inserting sys.path is used here. But test is probably the special case where such hacking is acceptable. As for my original case app, we can also use the same approach, or should we re-construct the architecture?

kknechtel · August 8, 2023, 5:51am

There are many other ways around the problem:

Run the driver script from the root directory (so that when the current path is added automatically to sys.path, it’s the correct one), specifying it as app/run.py. (This means that run will not be able to import some_file either way: it’s not on the path for an absolute import, and the app folder isn’t a package for a relative import.)
Use the PYTHONPATH environment variable to have paths added to sys.path at startup, rather than doing it through your code. This way, you don’t have to worry about it in production (in which case the package is installed, so the extra path entry isn’t desirable). Starting in 3.11, you can also use the -P command-line option for Python to prevent the default path from being added to sys.path. (This is in case you want to avoid import some_file accidentally working.)
Create a virtual environment for each project, such that the package is always “installed” even as you are developing it. (You will likely want to use the -e option for Pip when installing to the virtual environment.) This is my preference, but it’s not the easiest thing for people who aren’t used to “being developers”.
Create a driver that’s part of the package, by putting an if __name__ == '__main__': block inside an appropriate module of the package (let’s say it’s modB.py) and running it (still from the root directory) as a package: python -m package.modB (notice we give a fully qualified module name, not a filename/path). This uses the standard library runpy module behind the scenes. The driver code will not run when modB is imported normally (that’s the point of the if statement). You can also make the entire package runnable this way (python -m package) by giving it a __main__.py (this does not need an if statement, because it will always be true ).

kpfleming · August 8, 2023, 10:43am

The simple way is import .modA.

from . import modA is also valid, although not the preferred way to express the concept

kknechtel · August 8, 2023, 11:37am

$ touch modA.py
$ python3.8 -c "import .modA"
  File "<string>", line 1
    import .modA
           ^
SyntaxError: invalid syntax
$ python3.11 -c "import .modA"
  File "<string>", line 1
    import .modA
           ^
SyntaxError: invalid syntax

Absolute imports cannot start with a .; relative imports have to use the from syntax.

kpfleming · August 8, 2023, 12:02pm

Of course you are right… my fault for replying before breakfast…

kknechtel · August 8, 2023, 12:13pm

To be fair, I definitely have typed it myself before and been annoyed when it didn’t work.

Rosuav · August 8, 2023, 12:20pm

In general, after typing “import X”, you can access the module as X:

>>> import math
>>> math
<module 'math' from '/usr/local/lib/python3.12/lib-dynload/math.cpython-312-x86_64-linux-gnu.so'>
>>> import os.path
>>> os.path
<module 'posixpath' (frozen)>
>>> import importlib.resources.abc
>>> importlib.resources.abc
<module 'importlib.resources.abc' from '/usr/local/lib/python3.12/importlib/resources/abc.py'>

That wouldn’t make sense with a relative import:

import .modA
.modA # SyntaxError

Hence the need to fetch a particular thing from it. I suppose import .modA as spam should theoretically be possible, but most commonly, you’d want import .modA as modA which can be written from . import modA anyway.

zale · August 8, 2023, 2:26pm

I think there maybe something wrong. Say, we are now in the project root, command python3 app/run.py will automatically add the directory in which run.py resides, a.k.a $project/app to sys.path. In this way, run.py does be able to import some_file.

$ ls app
run.py  some_file.py
$ python3 app/run.py 
print from some_file.py

zale · August 8, 2023, 2:41pm

Speaking of import .modA, this was my first attempt . Like what @Rosuav said, the expression after import should be syntax valid, which is also mentioned in 5. The import system — Python 3.12.0 documentation

Absolute imports may use either the import <> or from <> import <> syntax, but relative imports may only use the second form; the reason for this is that:
import XXX.YYY.ZZZ
should expose XXX.YYY.ZZZ as a usable expression, but .moduleY is not a valid expression.

kpfleming · August 8, 2023, 2:47pm

There’s nothing wrong with that, but it doesn’t match the example you had in the first post of this topic.

zale · August 8, 2023, 3:01pm

I made a second question about the situation when drive script is located in a directory that is in the same level to package directory.

.
├── app
│   ├── run.py
│   └── some_file.py
└── package
    ├── modA.py
    └── modB.py

As for my first example, I believe I’ve understood well with all the help from you guys

kpfleming · August 8, 2023, 3:05pm

In a configuration like that, the best way to ensure that the program can find the package is to actually install everything, as opposed to relying on relative directory paths. An editable install using pip install -e . in the top level directory, with a suitable pyproject.toml (or setup.cfg/setup.py), would ensure that modA can be imported using import package.modA, regardless of the location of the code executing the import statement.

kknechtel · August 8, 2023, 8:41pm

Sorry, I misremembered it.

The rule for how sys.path is constructed, is given in the documentation.