Is there a specification of what path-based finders are included by default and how they operate?

anselmschueler · June 26, 2024, 9:06pm

I’ve looked through these pages in the reference:

The import system
Command line and environment
The initialization of the sys.path module search path
sys — System-specific parameters and functions
site — Site-specific configuration hook

I couldn’t find any specification of how the default path-based finders and their loaders operate (e.g. what gets considered a namespace package).

kknechtel · June 27, 2024, 6:37am

You wouldn’t find it in documentation of “The initialization of the sys.path module search path”, because the finders are on sys.meta_path A “path-based finder” is, after all, the thing that gives meaning to the sys.path strings. In case you skipped over that entry in the sys documentation, here’s an anchor:

We can easily see the defaults:

 python
Python 3.12.4 (main, Jun 24 2024, 03:28:13) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.meta_path
[<class '_frozen_importlib.BuiltinImporter'>, <class '_frozen_importlib.FrozenImporter'>, <class '_frozen_importlib_external.PathFinder'>]

The last one is the one that actually handles sys.path. _frozen_importlib_external is really importlib._bootstrap_external, and we can find the PathFinder class there. Basically, it handles iterating over sys.path, and for each entry it either uses a cached loader for that folder or creates and caches one.

I’m not sure what you’re hoping to find. “namespace package” isn’t actually a Python-level concept: packages and modules (using the latter in the documentation sense) are represented by the same module type, and that type doesn’t directly encode anything like an __is_namespace__ attribute. Instead, namespace packages may have more than one element in __path__ (the loading of a regular package implies that only a single path could be present - but namespace packages are why __path__ is a list of strings instead of just a string), and don’t have a __file__ (because there’s no __init__.py for it to refer to).

If you’re looking for the details of how namespace packages are loaded (by the same loading process as regular packages!), that’s in the PEP that introduced the feature.

If there are more things you want to know besides that “e.g.”, I can’t guess what they are.

anselmschueler · June 27, 2024, 8:27am

You misunderstand. I am talking about the path-based finders used by the path-based meta-finder. And I know I can list them with sys.path_hooks, but that doesn’t describe how they work. And namespace packages are a real thing according to the documentation: A path-based finder requests the creation of a namespace package by setting the spec’s submodule_search_locations.

You say that namespace packages are just packages, which I understand. The question is when namespace packages get created.

For instance, based on this documentation, it is entirely possible that the logic is as follows:

Check if the path contains a file called rot13.py
If so, import rot13 from that file and call it on the path to be searched.
Search the result (which must be a path) for a file called MontyPython.toml
If that file is not valid TOML, and there is a file called __del__.py, rename that to the module name.
If that operation overwrites any existing files, throw ModuleNotFoundError.
Create a virtual disk drive device and store a symlink to the moved file there.
Call curl with the first line of the file as its argument.
If curl exits successfully, terminate the interpreter.
Finally, execute the file whose name is the module name followed by (1).

MegaIng · June 27, 2024, 8:59am

Well, as @kknechtel pointed out, read the PEP, like the documentation for namespace packages is telling you.

anselmschueler · June 27, 2024, 9:45am

Ok. I was hoping there was some single document that detailed the import system.
I am also interested in how non-namespace packages get created. I find this is also not clearly stated and in this case I don’t think there’s a clear reference to a document that might.

MegaIng · June 27, 2024, 9:51am

I mean, 5. The import system — Python 3.12.4 documentation is pretty damn expansive. What information in detail are you missing? Sure, maybe PEP 420 could be inlined into it, but I am not sure if that makes the information any more discoverable.

How non-namespace packages get created is also covered in PEP 420, in fact, it describes the complete search through path without going into detail on what the path entry finders do in detail.

anselmschueler · June 27, 2024, 10:14am

Yes, it is comprehensive, but it mostly describes mechanism and not policy (which finders and loaders exist). In my opinion, all details from that PEP and the site docs and the docs for the default path should be on that page, or there should be a separate page that details all the policy with as little of the mechanism as possible.

I will read the PEP, thanks.

MegaIng · June 27, 2024, 10:23am

Python’s default sys.meta_path has three meta path finders, one that knows how to import built-in modules, one that knows how to import frozen modules, and one that knows how to import modules from an import path (i.e. the path based finder).

The default set of path entry finders implement all the semantics for finding modules on the file system, handling special file types such as Python source code (.py files), Python byte code (.pyc files) and shared libraries (e.g. .so files). When supported by the zipimport module in the standard library, the default path entry finders also handle loading all of these file types (other than shared libraries) from zipfiles.

There is no strict “policy” beyond this because it’s at least partially implementation defined.

anselmschueler · June 27, 2024, 10:25am

Is there a CPython-specific document that details the policy for CPython only?

anselmschueler · June 27, 2024, 11:13am

I am confused by this. It’s unclear to me how the path-based finders can accumulate paths to be put into a module. As I understand, the path-based meta finder iterates through the path hooks and calls them, which will then return a spec. The docs say

To indicate to the import machinery that the spec represents a namespace portion, the path entry finder sets submodule_search_locations to a list containing the portion.

But then what loader does it set? Does the path-based meta finder not return that spec in these cases? This is not made clear.

It said also that

This spec will always have “loader” set (with one exception).

Is this another way to communicate a namespace package? Do both have to be set? Will the path-based meta finder interpret this to mean it should not return this spec to the import system? This statement is in direct contradiction to another page which says

loader
(__loader__)
The loader used to load the module. The finder should always set this attribute.

Regardless of how it is communicated, if a namespace package is identified, is the path-based meta finder now responsible for setting the path? Does it return a special loader that implements create_module to set the path? Because otherwise the import machinery would just create an empty module…

Does the path get populated with the rest of the submodule search locations immediately? Or is the path set to an iterable that searches for more portions only when a submodule is requested and only until it is found? If not, why is the path a special iterable at all?

I think I understand what is meant, but it is not communicated well or in a way that makes me confident I understood it correctly.

anselmschueler · June 27, 2024, 12:43pm

It’s also unclear where in this process the current directory is searched.

This passage talks about it

The current working directory – denoted by an empty string – is handled slightly differently from other entries on sys.path. First, if the current working directory is found to not exist, no value is stored in sys.path_importer_cache. Second, the value for the current working directory is looked up fresh for each module lookup. Third, the path used for sys.path_importer_cache and returned by importlib.machinery.PathFinder.find_spec() will be the actual current working directory and not the empty string.

but I do not find an empty string in sys.path_hooks, yet the current directory is searched.

MegaIng · June 27, 2024, 12:53pm

The current directory is not always searched. If you execute a script, the directory of the script gets added to sys.path instead of the current directory. You can observe '' being in the path if you run python -c "import sys; print(sys.path)". Not sure if/where this is documented.

anselmschueler · June 27, 2024, 2:27pm

I see, thanks. That makes sense.

kknechtel · June 27, 2024, 4:16pm

That part is in “The initialization of the sys.path module search path” which OP claims to have read - for reference:

The first entry in the module search path is the directory that contains the input script, if there is one. Otherwise, the first entry is the current directory, which is the case when executing the interactive shell, a -c command, or -m module.

Although, this really could be clearer. The value '' is used as a hack (I’m pretty sure it could equally well be '.') to make sys.path keep including the current working directory, even if it’s changed via os.chdir(). In the -m case, a fixed value is used. That’s also described in the documentation specifically for sys.path:

By default, as initialized upon program startup, a potentially unsafe path is prepended to sys.path (before the entries inserted as a result of PYTHONPATH):

python -m module command line: prepend the current working directory.

python script.py command line: prepend the script’s directory. If it’s a symbolic link, resolve symbolic links.

python -c code and python (REPL) command lines: prepend an empty string, which means the current working directory.

anselmschueler · June 27, 2024, 4:52pm

Thanks, sorry for either missing that or forgetting it.

The misunderstanding was caused by me thinking if I ran a script there’d be a "" entry, and I didn’t find one, so I thought I smelt foul play somewhere. Sorry.