Packages in the standard library

I am trying to get my head around packages, subpackages and modules and I started reading the Python Packaging User Guide.

I think I understand a package is a directory that contains an __init__.py file, and any subdirectory in the package that contains an __init__.py is a subpackage. Every .py file is a module.

Assuming my understanding is correct, I was trying to understand what I am using when I am using the standard library. For example, is os a module or a package? Is os.path a subpackage of os? It turns out that os seems to be a module, since under Lib there is an os.py file and Lib is not a package (not having an __init__.py).

So the questions are:

  1. Is my understanding around packages/subpackages and modules correct?
  2. What is os?
  3. What is os.path?
  4. Is there a logic around the layout of Lib?

Thanks!

__init__.py files have not been required since Python 3.3 (a directory without one can still be imported, as a namespace package), but including one has consequences that are usually considered beneficial. Aside from that you have it right, with one thing missing:

What you are missing is that packages are modules. Python represents packages as objects just as it does modules (which is why they can be named in your program), and their object type is the same as for modules (it’s not even a subclass).
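A quick way to check this for yourself, using os (a single os.py file) and json (a folder with an __init__.py) as the two cases. This is only a sketch; the variable names are made up for illustration.

```python
import json
import os
import types

# Both a plain module (os) and a package (json) are instances of the
# very same type; there is no separate "package" class.
os_is_module = type(os) is types.ModuleType
json_is_module = type(json) is types.ModuleType
same_type = type(os) is type(json)

print(os_is_module, json_is_module, same_type)
```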

Generally, if you import x.y, Python first locates x and creates a module object for it. If it finds a folder named x, it looks for __init__.py and uses it to populate the module; otherwise it’s perfectly happy to create an “empty” module. A module created from a folder (including in the __init__.py case) gets a __path__ attribute that records where the folder is, so the package contents can be found without looking up x again. Modules also carry a __package__ attribute naming the package they belong to, which is what allows relative imports to work.

Then Python locates x.y and gives it back to you: if x.y is already registered in sys.modules, you get that. Otherwise, it tries to find the y.py source inside the x folder (using the stored __path__). (The full picture is more complicated than this, but importing from source code is the common case.)
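The pieces involved in that lookup can be inspected directly. The sketch below uses json.decoder as the x.y example; the variable names are hypothetical, and it reads the parent package name from __spec__.parent, which holds the same value the import system uses for relative imports.

```python
import sys
import json.decoder

# A package records the directories that are searched for its submodules.
search_path = list(json.__path__)

# The submodule knows which package it belongs to.
parent_name = json.decoder.__spec__.parent

# Once imported, the submodule is cached under its dotted name, so a
# later "import json.decoder" is just a dictionary lookup.
cached = sys.modules["json.decoder"]

print(search_path)
print(parent_name)
print(cached is json.decoder)
```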

os is implemented as a module, by the os.py file. When you import it, it checks your operating system and imports a corresponding implementation module (posixpath or ntpath), which it names path internally. Since the code inside os.py has a global variable named path, that becomes the path attribute of the os module - that’s how modules work. os.py also registers that module in sys.modules under the name os.path, which is why import os.path works even though os is not a package.
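You can see which implementation module you got, and that os itself is not a package, with something like the following (a sketch; the variable names are only for illustration).

```python
import os
import sys

# os.path is really another top-level module: posixpath or ntpath,
# depending on the operating system.
impl_name = os.path.__name__
same_as_impl = sys.modules[impl_name] is os.path

# The same object is also reachable under the dotted name "os.path".
registered = sys.modules["os.path"] is os.path

# os has no __path__, so it is a plain module rather than a package.
os_is_package = hasattr(os, "__path__")

print(impl_name, same_as_impl, registered, os_is_package)
```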

As for the layout of Lib, there is nothing particularly special about it. Things are organized according to the individual needs of each module (including the packages).


Do note that os.path is a weird outlier - most modules don’t do things like this to pretend to be a package. A better example of a real package might be json - that’s Lib/json/__init__.py, which has a bunch of submodules: encoder, decoder, scanner and tool.
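To list what a package like json contains on your own installation, something along these lines should work. It is a sketch using pkgutil from the standard library; the exact list can vary between Python versions, so no particular output is assumed.

```python
import json
import pkgutil

# Walk the directories in json.__path__ and collect the names of the
# submodules found there.
submodules = sorted(info.name for info in pkgutil.iter_modules(json.__path__))

print(submodules)
```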
