Would it be appropriate/acceptable to insert sys.path entries such as PYTHONPATH and/or '' (when that is added) at a lower priority than the standard library, instead of always at top priority?
Overriding Python’s standard library is harder than it used to be (long ago, in the before-pip times, eggs were regularly added to the front of sys.path instead of at site-packages priority), and I think that’s great! “Installed” packages are no longer added to the front of sys.path, but PYTHONPATH and “script dir” imports still are, which gives them priority over the standard library, unlike other third-party installs. That makes it easy for a file called code.py or a directory called code (or anything else named after a stdlib module) to cause real problems, as has been discussed numerous times over the years.
In IPython, we mitigate this issue in our own “script path” implementation by inserting the CWD immediately ahead of site-packages instead of ahead of the standard library, and this has helped us a lot! Would a similar change to the default behavior be appropriate? Is there a strong intentional reason why overriding the standard library is preferred by default? I’ve only ever used it myself to deliberately break things.
Such a change only protects the standard library, not anything in site-packages, but that seems reasonable to me: users are more likely to be aware that they have, and need, an importable numpy than an importable code module, for example.
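A minimal sketch of the kind of insertion described above: an entry is placed immediately ahead of site-packages instead of ahead of the stdlib. This is not IPython’s actual code. The function name is hypothetical, and identifying site-packages via `site.getsitepackages()` and `site.getusersitepackages()` is an assumption that may not hold for every layout.

```python
import os
import site
import sys


def _norm(path):
    return os.path.normcase(os.path.abspath(path))


def insert_before_site_packages(path_list, entry, site_dirs=None):
    """Insert `entry` into `path_list` just before the first site-packages dir.

    Hypothetical helper. If no site-packages directory is found, the entry
    is appended, which still keeps it behind the standard library.
    Returns the index at which the entry was placed.
    """
    if site_dirs is None:
        site_dirs = list(site.getsitepackages())
        user_site = site.getusersitepackages()
        if user_site:
            site_dirs.append(user_site)
    targets = {_norm(d) for d in site_dirs}
    for index, existing in enumerate(path_list):
        if _norm(existing) in targets:
            path_list.insert(index, entry)
            return index
    path_list.append(entry)
    return len(path_list) - 1


if __name__ == "__main__":
    # Example: make the current directory importable, but behind the stdlib.
    position = insert_before_site_packages(sys.path, os.getcwd())
    print(f"inserted at sys.path[{position}]")
```

Passing `site_dirs` explicitly makes the helper easy to exercise on a synthetic path list without touching the real sys.path.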
Another (much bigger!) related option to protect the standard library would be to add a stdlib namespace to allow from stdlib import re so that there’s only a single possible name for conflicts. That would be a years-long migration, but I think it would be nice if stdlib modules could be clearly and explicitly imported from the stdlib instead of sharing a single namespace with third-party packages.
Since we have a flat namespace for the stdlib, we need to make sure that
any new modules which end up in the stdlib don’t break existing code
out there.
By inserting the script dir or the CWD (via ‘’) at the start of sys.path and
before the stdlib locations, you make sure that the stdlib’s new module
doesn’t accidentally override an application module of the same name,
e.g. it’s very common to have a “test” package in an application,
but the stdlib has such a package as well, which applications
typically don’t need.
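The precedence argument can be checked with `importlib.machinery.PathFinder.find_spec`, which searches an explicit list of directories without consulting sys.modules. The sketch below creates a throwaway code.py and shows that whichever directory comes first wins. It assumes the stdlib lives as plain files under `sysconfig.get_paths()["stdlib"]`, which is typical but not guaranteed (e.g. zipped or embedded installs).

```python
import importlib.machinery
import os
import sysconfig
import tempfile


def resolve(name, search_path):
    """Return the file a top-level import of `name` would load from search_path."""
    spec = importlib.machinery.PathFinder.find_spec(name, search_path)
    return spec.origin if spec is not None else None


stdlib_dir = sysconfig.get_paths()["stdlib"]

with tempfile.TemporaryDirectory() as app_dir:
    # An application module that happens to share a stdlib name.
    with open(os.path.join(app_dir, "code.py"), "w") as f:
        f.write("# application module\n")

    app_first = resolve("code", [app_dir, stdlib_dir])     # application copy wins
    stdlib_first = resolve("code", [stdlib_dir, app_dir])  # stdlib copy wins

    print("app dir first:   ", app_first)
    print("stdlib dir first:", stdlib_first)
```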
That said, sys.path is fully flexible and you can change this in
many different ways to your liking.
Moving the stdlib to its own package has been discussed many times
in the past, but it never materialized due to the huge impact this
would have on the existing code base out there and the fact that
e.g. pickles and other class referencing storage mechanisms
include the full module path of objects.
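To see why moving stdlib modules would affect stored data, note that pickle refers to classes by module name and qualified name. A minimal check, using `fractions.Fraction` purely as an example of a stdlib class:

```python
import fractions
import pickle
import pickletools

data = pickle.dumps(fractions.Fraction(1, 3))

# The pickle names the class by its import path, so loading it later
# requires a module importable as "fractions" that defines "Fraction".
print(b"fractions" in data, b"Fraction" in data)
pickletools.dis(data)  # shows the module and class names being pushed
```

If the module were only importable as something like stdlib.fractions, existing pickles would need an alias or a custom `Unpickler.find_class` to keep loading.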
That’s interesting! I hadn’t thought of this case as a forward-compatibility issue, thank you. However, this is only true in the narrow case of application-directory modules and not traditionally installed application modules in site-packages, which would indeed be broken by adding a stdlib module of the same name. So there’s already a breakage issue, and it’s just a small subset of imports that are allowed to take priority over the standard library. It doesn’t seem desirable to me for the two to behave differently.
That’s true, but it doesn’t address my point: the problem is the default behavior, and I think it’s only a problem because it’s the default. I think the intersection between folks for whom this causes a problem and folks for whom changing sys.path is appropriate is very small indeed.
I think PYTHONPATH is designed to be used in the application context, so it makes sense that it only serves that well and has issues when used elsewhere. The problem, from what I understand from the above, is that there is no way to specify “append after stdlib but before site-packages” without diving into sys.path (which isn’t trivial, since the paths themselves don’t say why they ended up in there, so a bit of guesswork is required). Maybe the solution is to provide a mechanism for that scenario, instead of trying to change PYTHONPATH to do what it wasn’t designed to do.
The Interpreter independent isolated/virtual environments thread a while ago raised a similar need (the virtual environment’s site-packages directory is appended after the stdlib; the “before site-packages” part is relevant for system-site-packages = true), so there is definitely a valid use case for this.
To clarify, in most cases the script directory is inserted at the start of sys.path, not the current working directory. The working directory is used when there’s no script per se, such as the interactive REPL or -c and -m commands.
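A quick way to observe this is to ask a child interpreter for sys.path[0]. The sketch assumes the safe-path option (-P / PYTHONSAFEPATH, Python 3.11+) is not in effect, since it suppresses the prepended entry, so it removes PYTHONSAFEPATH from the child’s environment. `first_path_entry` is a hypothetical helper.

```python
import os
import subprocess
import sys
import tempfile


def first_path_entry(args, cwd):
    """Run a child interpreter with `args` in `cwd` and return its sys.path[0]."""
    env = dict(os.environ)
    env.pop("PYTHONSAFEPATH", None)  # safe-path mode would drop the entry
    result = subprocess.run(
        [sys.executable, *args],
        cwd=cwd,
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as work_dir, tempfile.TemporaryDirectory() as script_dir:
    script = os.path.join(script_dir, "show_path.py")
    with open(script, "w") as f:
        f.write("import sys\nprint(sys.path[0])\n")

    # -c: an empty string, meaning "the current working directory".
    from_c = first_path_entry(["-c", "import sys; print(sys.path[0])"], work_dir)
    # Script: the directory containing the script, regardless of the cwd.
    from_script = first_path_entry([script], work_dir)
    same_dir = os.path.samefile(from_script, script_dir)

    print(repr(from_c))
    print(from_script, same_dir)
```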
sys.path building is somewhat complicated. Here’s an attempt at a summary of the basic logic:
1. script dir (when using python3 script.py), or current dir (when using python3 -m module, python3 -c “commands”, or running Python in interactive mode)
2. additions from PYTHONPATH
3. Python lib ZIP file
4. Python lib dir (for Python stdlib modules)
5. Python lib-dynload dir (for C stdlib modules)
6. User’s site-packages (for local 3rd party packages)
7. Python site-packages (for system wide 3rd party packages)
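As a rough companion to the list above, this sketch labels each sys.path entry using `sysconfig.get_paths()` and the site module. The label names are my own and the matching is a heuristic: as noted earlier in the thread, entries don’t record why they were added, and venvs, .pth files, platform layouts (Windows uses a DLLs directory rather than lib-dynload) and customizations can all blur the picture.

```python
import os
import site
import sys
import sysconfig


def _norm(path):
    return os.path.normcase(os.path.abspath(path))


def classify_sys_path(paths=None):
    """Return (entry, label) pairs for sys.path; labels are heuristic guesses."""
    if paths is None:
        paths = sys.path
    config = sysconfig.get_paths()
    stdlib = {_norm(config["stdlib"]), _norm(config["platstdlib"])}
    user_site = site.getusersitepackages()
    user = {_norm(user_site)} if user_site else set()
    system = {_norm(p) for p in site.getsitepackages()}
    system |= {_norm(config["purelib"]), _norm(config["platlib"])}

    labels = []
    for entry in paths:
        norm = _norm(entry)
        if entry.endswith(".zip"):
            label = "zip archive"
        elif norm in stdlib:
            label = "stdlib"
        elif os.path.basename(norm) == "lib-dynload" and os.path.dirname(norm) in stdlib:
            label = "stdlib extensions"
        elif norm in user:
            label = "user site-packages"
        elif norm in system:
            label = "site-packages"
        else:
            label = "other (script dir, cwd, PYTHONPATH, .pth, ...)"
        labels.append((entry, label))
    return labels


if __name__ == "__main__":
    for index, (entry, label) in enumerate(classify_sys_path()):
        print(f"{index:2d}  {label:48s}  {entry!r}")
```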
The complete logic also includes .pth file processing and can be further customized using usercustomize.py, sitecustomize.py, PYTHONHOME and venv setups. See site.py for most of the details and https://github.com/python/cpython/blob/master/Modules/getpath.c for additional details around building sys.path.
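For the .pth part, `site.addsitedir()` shows the effect in isolation: it appends the site directory, plus any existing directories listed in its .pth files, to the end of sys.path, i.e. behind the standard library. The directory and file names below are made up for the demo, and sys.path is restored afterwards.

```python
import os
import site
import sys
import tempfile

original_path = list(sys.path)
try:
    with tempfile.TemporaryDirectory() as site_dir, tempfile.TemporaryDirectory() as extra_dir:
        # A .pth file lists one directory per line; only existing ones are added.
        with open(os.path.join(site_dir, "demo.pth"), "w") as f:
            f.write(extra_dir + "\n")
            f.write(os.path.join(site_dir, "missing") + "\n")

        site.addsitedir(site_dir)
        added = sys.path[len(original_path):]  # new entries land at the end
        print(added)
finally:
    sys.path[:] = original_path
```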
@malemburg: Maybe you know: why is the order of paths different in Python 3.11? The user site-packages directory is not second from the end, but third. At least on Windows.