Issue with Module Import and Role of __init__.py in Python 3.9

Dear Python Community,

I’m currently facing an issue with module import in my Python project, and I’m hoping someone can help me understand the underlying concepts better.

Environment details:

  • Python version: 3.9.18 with Anaconda
  • Operating System: Linux

I created a test project with the root directory named test_import to reproduce the problem. When I ran the following code to print the sys.path list:

import sys
for path in sys.path:
    print(path)

The output was:

path/test_import/src
/root/anaconda3/envs/pytorch/lib/python39.zip
/root/anaconda3/envs/pytorch/lib/python3.9
/root/anaconda3/envs/pytorch/lib/python3.9/lib-dynload
/root/anaconda3/envs/pytorch/lib/python3.9/site-packages

Here’s the structure of my project:

test_import
└── src
    ├── datasets
    │   └── file1.py
    └── main.py

In the main.py file, I have the following import statement:

import datasets.file1

However, when I run python main.py directly, I get the error ModuleNotFoundError: No module named 'datasets.file1'.

After investigating, I found that there is a datasets file in the /root/anaconda3/envs/pytorch/lib/python3.9/site-packages directory.
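One way to confirm which datasets the import system actually selects is to ask importlib for the module spec. The helper below is only a sketch: the name describe_package is made up for illustration, and it treats a spec with search locations but no origin as a namespace package, which is an assumption about how current CPython versions report them.

```python
import importlib.util


def describe_package(name):
    """Report what the import system would pick for a top-level name."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # nothing importable under that name
    if spec.submodule_search_locations is None:
        kind = "module"  # a plain module, not a package
    elif spec.origin in (None, "namespace"):
        kind = "namespace"  # directory (or directories) without __init__.py
    else:
        kind = "regular"  # origin points at an __init__.py
    return {
        "kind": kind,
        "origin": spec.origin,
        "locations": list(spec.submodule_search_locations or []),
    }


if __name__ == "__main__":
    # In the environment described above this would be expected to point
    # into site-packages rather than into src/.
    print(describe_package("datasets"))
```

If the reported locations are under site-packages, the local src/datasets directory is not the package being imported.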

I managed to solve the problem in two ways:

  1. By adding an __init__.py file in the datasets directory of my project.
  2. By renaming the datasets directory in my project to avoid the conflict.

Now, I have a few questions:

  1. In Python 3, the __init__.py file is not considered necessary. So, what is the specific role of an empty __init__.py file in Python 3?
  2. The path/test_import/src path in the sys.path list comes before the /root/anaconda3/envs/pytorch/lib/python3.9/site-packages path. Why does the file in the latter path interfere with the normal import from the former path?

Moreover, I conducted an additional test. I added the statement sys.path = sys.path[:-1] in the main.py file of my original test project to remove the /root/anaconda3/envs/pytorch/lib/python3.9/site-packages path from the sys.path list. After doing this, I was able to import datasets.file1 without any issues.

I would greatly appreciate any insights, explanations, or suggestions you can provide regarding these questions.

Thank you!

Best regards.

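You are encountering namespace packages; see PEP 420.

Specifically, they have surprising priority interactions when both a regular package (a directory with __init__.py) and a namespace package (a directory without one) of the same name are on the path: the regular package wins even if the namespace directory comes earlier in sys.path. This is explained in the PEP.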
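A minimal sketch of that priority rule, using throwaway directories instead of the real project. The package name demo_pkg_pep420 and the directory names first and second are invented for illustration; first plays the role of src/ and second plays the role of site-packages.

```python
import importlib
import sys
import tempfile
from pathlib import Path

PKG = "demo_pkg_pep420"  # hypothetical name, chosen to avoid clashes


def regular_package_wins():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # "first": a directory without __init__.py (a namespace portion).
        (root / "first" / PKG).mkdir(parents=True)
        (root / "first" / PKG / "file1.py").write_text("VALUE = 1\n")
        # "second": a regular package with the same name.
        (root / "second" / PKG).mkdir(parents=True)
        (root / "second" / PKG / "__init__.py").write_text("")

        saved_path = sys.path[:]
        # "first" is placed ahead of "second" on sys.path.
        sys.path[:0] = [str(root / "first"), str(root / "second")]
        try:
            importlib.invalidate_caches()
            package = importlib.import_module(PKG)
            found_in = Path(package.__path__[0]).parent.name
            try:
                importlib.import_module(PKG + ".file1")
                submodule_found = True
            except ModuleNotFoundError:
                submodule_found = False
        finally:
            # Undo the changes so the demo leaves no trace.
            sys.path[:] = saved_path
            for name in [n for n in sys.modules if n.split(".")[0] == PKG]:
                del sys.modules[name]
    return found_in, submodule_found


if __name__ == "__main__":
    print(regular_package_wins())
```

The import finds the package in second even though first is searched earlier, and file1 is then looked up only inside that regular package, which is why it is reported as missing.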

Hello,

I recreated your project layout. I did not need any __init__.py files, nor did I get a ModuleNotFoundError. Maybe something else went astray in your code.

An __init__.py file used to be required to designate a directory as a package directory for package imports. As you stated, this is no longer a requirement.

I don’t think that you want to remove site-packages from sys.path, as that directory may or may not contain modules that you need for your project (say, if you update it or make changes to it at a later date). It is not a good idea to remove packages that come with your Python installation.

Here is a good video that explains both absolute and relative imports that can help in the understanding of the subject matter:

Thank you very much for your suggestion. Following it, I ran some new tests and made some new discoveries.

This test example uses the following directory structure:

.
├── dir1
│   └── dir
│       └── file1.py
├── dir2
│   └── dir
│       └── file2.py
└── main.py

In the main.py file, I first clear sys.path to make sure that no other paths interfere. Then I add the ./dir1/ and ./dir2/ directories to sys.path.

import sys

sys.path = []
sys.path.append('./dir1/')
sys.path.append('./dir2/')

import dir.file1
import dir.file2

In this case, as described in PEP 420, no errors occurred.
However, when I simply added an __init__.py file to the ./dir2/dir directory without changing anything else, I got the error ModuleNotFoundError: No module named 'dir.file1'.

.
├── dir1
│   └── dir
│       └── file1.py
├── dir2
│   └── dir
│       ├── file2.py
│       └── __init__.py
└── main.py
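For contrast with the failing layout above, the following sketch rebuilds the earlier layout (no __init__.py in either directory) in a temporary folder and inspects the package's __path__. The package name ns_demo_pkg is invented for illustration; only the directory names dir1 and dir2 come from the test above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

PKG = "ns_demo_pkg"  # hypothetical stand-in for the "dir" package


def merged_portions():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Two directories with the same name and no __init__.py in either.
        for parent, filename in (("dir1", "file1.py"), ("dir2", "file2.py")):
            portion = root / parent / PKG
            portion.mkdir(parents=True)
            (portion / filename).write_text("")

        saved_path = sys.path[:]
        sys.path[:0] = [str(root / "dir1"), str(root / "dir2")]
        try:
            importlib.invalidate_caches()
            importlib.import_module(PKG + ".file1")
            importlib.import_module(PKG + ".file2")
            package = sys.modules[PKG]
            # A namespace package's __path__ lists every portion found.
            portions = [Path(p).parent.name for p in package.__path__]
        finally:
            sys.path[:] = saved_path
            for name in [n for n in sys.modules if n.split(".")[0] == PKG]:
                del sys.modules[name]
    return portions


if __name__ == "__main__":
    print(merged_portions())
```

Both submodules import because the namespace package spans both directories. Once one of the directories gains an __init__.py, it becomes a regular package and the other directory is no longer part of it.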

Can you give me some more insights, explanations, or suggestions?

Hello, I sincerely appreciate your attention to my problem and your efforts in attempting to reproduce the bug. I have furnished more implementation details in my subsequent responses to Cornelius Krupp. If it is not too much trouble for you, would you kindly make another attempt to reproduce the issue based on this new information? Your feedback is of great significance as it will be instrumental in my endeavor to identify and resolve this problem.
Thank you once again for your invaluable assistance.

I am not really sure what open question you still have. You found a small isolated example that exactly reproduces the behavior, and the PEP I linked explains the mechanism behind it. To resolve your original issue, you now need to eliminate one of the factors: either add an __init__.py file to your package directory, rename it, or make sure the one in site-packages doesn’t exist or isn’t visible. I don’t know enough about your use case to know what the best option is.
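Before choosing among those options, it can help to list every place on the path that provides the name in question. The function below is a rough sketch and not part of any library: find_candidates is a made-up name, and it only checks for plain directories and .py files, ignoring zip archives, extension modules, and custom import hooks.

```python
import sys
from pathlib import Path


def find_candidates(name, search_path=None):
    """List path entries that provide `name`, in search order."""
    entries = sys.path if search_path is None else search_path
    results = []
    for entry in entries:
        base = Path(entry or ".")  # an empty entry means the current directory
        package_dir = base / name
        if (package_dir / "__init__.py").is_file():
            results.append((str(base), "regular package"))
        elif (base / (name + ".py")).is_file():
            results.append((str(base), "module"))
        elif package_dir.is_dir():
            results.append((str(base), "namespace portion"))
    return results


if __name__ == "__main__":
    for location, kind in find_candidates("datasets"):
        print(kind, "in", location)
```

If the output shows a namespace portion in the project directory and a regular package or module elsewhere, the local directory is being shadowed, and any of the three fixes above removes the conflict.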

Thank you very much for your reply. I have thoroughly read the content in the PEP 420 document. However, I still remain confused about the two questions I raised above.

Due to my limited programming proficiency, I might not have extracted the effective answers from this document. I am earnestly looking forward to receiving more assistance from you.

Hi,

this is what I have:

Desktop
|__ test_import
     |___src
         |__ datasets
         |      |__ file1.py
         |__ main.py

In main.py, I have:

import datasets.file1 as ds

ds.some_func()

and in file1.py, I have:

print(r'I am in test_import\src\datasets\file1.py')

def some_func():
    print('Test calling the function in file1!')

As you can see, I don’t have any __init__.py files in my project, as there is no need for them. When I run main.py, it works as expected and I get the following output:

>>> I am in test_import\src\datasets\file1.py
    Test calling the function in file1!

I am using Python v3.13 (Windows 11 Pro). My site-packages directory does not contain anything named datasets. Can you try upgrading to Python v3.13 and see if that helps? (v3.9 was released in 2020, by the way.)

Note that there is no need to rename the datasets directory in my example.

an fyi …

If you really want to learn about importing packages, the book Learning Python by Mark Lutz devotes four chapters to it. Both absolute and relative imports are explained in detail.