Behaviour change in Python 3.11. Compatibility break or genuine fix?

rustinpeace · June 1, 2024, 10:34am

Hi There,

We create embeddable Python distributions for our tools and, so far, have been successfully using the off-the-shelf Windows version and an equivalent for macOS that we build ourselves.

The only meaningful customization we make to the Windows distribution is removing the default _PTH file so that relative imports work in the current directory. The presence of the _PTH file seems to break that, even if . is specified by default.

We also provide our own activation scripts, much like venv, that add the python executable dir to the head of PATH and additionally set PYTHONNOUSERSITE=1. We used to also set PYTHONHOME to the path of the distribution dir but found that it had no effect.

With this setup, sys.path always pointed to the embedded Python’s dirs only, regardless of whether a system Python was installed.

However, after switching to Python 3.11 where Python 3.11 was also installed (via the official installer) we noticed that AppData\Local\Programs\Python\Python311 dirs (DLLs, Lib) started appearing in sys.path. Even if Python 3.11 was uninstalled, the paths stayed. I noted that when uninstalling Python, Python leaves Windows registry keys behind, which is probably a bug anyway.
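For anyone who wants to check for this kind of leak, here is a minimal sketch that lists sys.path entries lying outside the distribution directory. The helper name foreign_path_entries and its allowed parameter are made up for illustration; they are not part of any Python API.

```python
import os
import sys


def foreign_path_entries(dist_dir, entries=None, allowed=()):
    """Return path entries that are outside dist_dir and every allowed dir."""
    if entries is None:
        entries = sys.path

    def norm(path):
        # Normalise case and separators so comparisons also work on Windows.
        return os.path.normcase(os.path.abspath(path))

    roots = [norm(dist_dir)] + [norm(a) for a in allowed]
    foreign = []
    for entry in entries:
        if not entry:
            continue  # '' stands for the working directory, not a real dir
        e = norm(entry)
        if not any(e == r or e.startswith(r + os.sep) for r in roots):
            foreign.append(entry)
    return foreign


if __name__ == "__main__":
    # Assumption: the distribution lives next to the running executable.
    dist = os.path.dirname(os.path.abspath(sys.executable))
    for entry in foreign_path_entries(dist):
        print("outside distribution:", entry)
```

The script directory is expected to show up as "outside" when running a script, so it can be passed in allowed if that is not of interest.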

It appears that either 3.11 and above finally aligns with the documentation, where registry lookup is used as a fallback, or this is a regression and should be fixed. Even if it were the former, I would say that for a piece of software with such a large install base, leaked bugs effectively become default behaviour, and fixing them constitutes a backwards-incompatible break.

So what choices do we have now?

We could alias python in the shell to be python -E. Specifying -E has the advantage of isolating the environment and, unlike with the _PTH file, relative module imports work without problems. The main issue is that while Python has an environment variable override for most command-line parameters, there’s none for -E. So, for example, if we have a script that forks Python to run something, it’ll have to remember to also pass -E.
We could set PYTHONHOME again. This seems to work almost like -E and is more resilient to subprocess forking (assuming those calls inherit the parent process’s env and don’t supply their own). The only thing I seem to remember is that when we set PYTHONHOME, some (not all) globally installed tools like Black ran into issues because of a path mismatch (they are hard-wired to use the system Python).
Something else? E.g. raise bugs on 3.11 and above, and also raise a feature request to introduce a PYTHONISOLATED env var for -E.

Thoughts welcome.

eryksun · June 1, 2024, 12:37pm

The presence of a ._pth file, such as “python311._pth”, enables isolated mode, which ignores environment settings and should also ignore the user site packages directory. It explicitly overrides the default calculation of sys.path. A “.” entry corresponds to the directory of the ._pth file itself. The directive import site enables importing the site module, and it’s the only import that’s supported.

It seems you uncovered a bug in “Modules/getpath.py”. It’s not supposed to add the default module search path from the “PythonPath” registry key if the standard library was found already. However, the code is only checking the variables home and stdlib_dir, which aren’t set if the standard library was found in a zip file such as “python311.zip”. A variable needs to be defined when the standard library was found as a zip file, which can be included in the check that determines whether to fetch the default search path from the registry.
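To see what such a registry key would contribute, something like the following rough sketch could be used. It assumes the conventional key layout Software\Python\PythonCore\<version>\PythonPath under HKEY_CURRENT_USER and HKEY_LOCAL_MACHINE; the helper name registry_python_paths is made up, and on non-Windows systems it simply returns an empty list.

```python
try:
    import winreg  # only available on Windows
except ImportError:
    winreg = None


def registry_python_paths(version="3.11"):
    """Return the entries stored in the PythonPath registry key, if any."""
    if winreg is None:
        return []
    # Assumed key layout; adjust if the installation registers itself differently.
    sub_key = rf"Software\Python\PythonCore\{version}\PythonPath"
    found = []
    for hive in (winreg.HKEY_CURRENT_USER, winreg.HKEY_LOCAL_MACHINE):
        try:
            # The default (unnamed) value holds a ';'-separated path list.
            value = winreg.QueryValue(hive, sub_key)
        except OSError:
            continue  # key not present in this hive
        found.extend(p for p in value.split(";") if p)
    return found


if __name__ == "__main__":
    for path in registry_python_paths("3.11"):
        print(path)
```

If this prints directories after the system Python has been uninstalled, the leftover key is a likely source of the extra sys.path entries.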

For now, I think you should switch back to using “python311._pth”. It’s possible to get imports in the current directory while using a ._pth file if it includes the line import site. But first, note that it seems there’s another bug in “Modules/getpath.py”. Including import site inadvertently enables the user site packages directory because the configuration with a ._pth file doesn’t explicitly disable it. I think it should because I can’t imagine why an embedding application would ever want that. You’ll have to set PYTHONNOUSERSITE to work around this.

Anyway, by enabling the site module, regular .pth files are supported. Thus you can create something like mystartup.pth that contains the statement import mystartup. If you want the module search path to include the current directory, include a statement such as sys.path.insert(0, '') in the startup script.
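As a rough illustration of the mechanism, the sketch below builds a throwaway directory containing mystartup.py and mystartup.pth, then lets site.addsitedir() process it the way the site module processes .pth files at startup. The helper name demo_pth_startup and the temporary directory are only for demonstration; in a real distribution the two files would live in a directory that site already scans.

```python
import os
import site
import sys
import tempfile


def demo_pth_startup():
    """Show that an 'import' line in a .pth file runs the startup module."""
    saved_path = list(sys.path)
    with tempfile.TemporaryDirectory() as site_dir:
        # The startup module: prepend '' (the dynamic working directory).
        with open(os.path.join(site_dir, "mystartup.py"), "w", encoding="utf-8") as f:
            f.write("import sys\nsys.path.insert(0, '')\n")
        # The .pth file: lines beginning with 'import ' are executed by site.
        with open(os.path.join(site_dir, "mystartup.pth"), "w", encoding="utf-8") as f:
            f.write("import mystartup\n")
        try:
            site.addsitedir(site_dir)
            return sys.path[0] == "" and "mystartup" in sys.modules
        finally:
            # Undo the demo's side effects on this interpreter.
            sys.path[:] = saved_path
            sys.modules.pop("mystartup", None)


if __name__ == "__main__":
    print("startup module ran:", demo_pth_startup())
```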

rustinpeace · June 1, 2024, 6:53pm

Thanks a lot for the detailed response. I’ll file the bugs as you noted.

Great suggestion on using a .PTH file. I’ve tried it and it almost gets me to what I want but not quite. Allow me to explain. Suppose I have the following directory layout:

c:\example
|
– test.py
– rest.py

test.py has a single line:
import rest

And I have python311._pth and mystartup.pth and mystartup.py with sys.path.insert(0, '').

Fails:
python example\test.py

ModuleNotFoundError: No module named 'rest'

Works:
cd c:\example
python test.py

Both of these use cases work on Python 3.10 or below, including running Python with the -E or -Es flag.

Any suggestions for a workaround are most welcome, and thanks again!

P.S. Forgot to mention that we already use PYTHONNOUSERSITE in our activation script.

JamesParrott · June 1, 2024, 11:53pm

alias python to PYTHONHOME=... & python (or with a set if your customers use cmd in windows)?

I’ll support your case to the hilt for cleaning up Windows registry keys (I have written a tool to safely purge them from rogue apps that I’ll publish if anyone else feels similarly).

But overall I feel like this is your company’s problem, not the Python community’s problem.

eryksun · June 2, 2024, 12:05am

You said “current directory”, but I guess you actually meant that you want the script directory added to the module search path. Or maybe you want both. Isolated mode excludes adding the script directory to the module search path, so your startup script will have to add it manually. Here’s a quick attempt at reverting some of the effects of isolated mode on sys.path:

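import os
import sys

# Startup script: re-add the sys.path[0] entry that isolated mode omits.
if sys.argv:
    argv0 = sys.argv[0]
    if argv0 == '-c':
        # -c command: empty string, i.e. the dynamic working directory.
        argv0 = ''
    elif argv0 == '-m':
        # -m module: the working directory at startup.
        argv0 = os.getcwd()
    elif argv0:
        # Script: the directory that contains the script.
        argv0 = os.path.dirname(os.path.abspath(argv0))
    # REPL: argv0 is already '', the dynamic working directory.
    sys.path.insert(0, argv0)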

kknechtel · June 2, 2024, 12:06am

This is a bit worrying in itself. Could you show a proper MRE of the sort of breakage you observe?

rustinpeace · June 2, 2024, 8:17am

Hi,

Here are the detailed steps. Hope this helps.

Steps to reproduce. Assume extraction dir is C:\54729

Download and install Python 3.10.11 (x64)
Download and extract embedded Python 3.10.11 (x64)
Download and install Python 3.11.3 (x64)
Download and extract embedded Python 3.11.3 (x64)
Rename C:\54729\python-3.10.11-embed-amd64\python310._pth to C:\54729\python-3.10.11-embed-amd64\python310._pth.old
Rename C:\54729\python-3.11.3-embed-amd64\python311._pth to C:\54729\python-3.11.3-embed-amd64\python311._pth.old
Create C:\54729\example\test.py

import sys
print("\n".join(sys.path))

import rest

Create C:\54729\example\rest.py

print("rest")

Open a PowerShell terminal (or CMD) and run test.py

$ cd C:\54729

$ .\python-3.10.11-embed-amd64\python .\example\test.py
C:\54729\example
C:\54729\python-3.10.11-embed-amd64\python310.zip
C:\54729\python-3.10.11-embed-amd64\DLLs
C:\54729\python-3.10.11-embed-amd64\lib
C:\54729\python-3.10.11-embed-amd64
rest

$ .\python-3.11.3-embed-amd64\python .\example\test.py
C:\54729\example
C:\54729\python-3.11.3-embed-amd64\python311.zip
C:\Users\T\AppData\Local\Programs\Python\Python311\Lib
C:\Users\T\AppData\Local\Programs\Python\Python311\DLLs
C:\54729\python-3.11.3-embed-amd64
C:\54729\python-3.11.3-embed-amd64\Lib
rest

Repeat the same but with -Esu

$ .\python-3.10.11-embed-amd64\python -Esu .\example\test.py
C:\54729\example
C:\54729\python-3.10.11-embed-amd64\python310.zip
C:\54729\python-3.10.11-embed-amd64\DLLs
C:\54729\python-3.10.11-embed-amd64\lib
C:\54729\python-3.10.11-embed-amd64
rest

$ .\python-3.11.3-embed-amd64\python -Esu .\example\test.py
C:\54729\example
C:\54729\python-3.11.3-embed-amd64\python311.zip
C:\54729\python-3.11.3-embed-amd64
C:\54729\python-3.11.3-embed-amd64\Lib
rest

Repeat the same but with PYTHONNOUSERSITE and PYTHONHOME set

$ $env:PYTHONNOUSERSITE=1
$ $env:PYTHONHOME="C:\54729\python-3.10.11-embed-amd64"
$ .\python-3.10.11-embed-amd64\python .\example\test.py
C:\54729\example
C:\54729\python-3.10.11-embed-amd64\python310.zip
C:\54729\python-3.10.11-embed-amd64\DLLs
C:\54729\python-3.10.11-embed-amd64\lib
C:\54729\python-3.10.11-embed-amd64
rest

$ $env:PYTHONHOME="C:\54729\python-3.11.3-embed-amd64"
$ .\python-3.11.3-embed-amd64\python .\example\test.py
C:\54729\example
C:\54729\python-3.11.3-embed-amd64\python311.zip
C:\54729\python-3.11.3-embed-amd64\DLLs
C:\54729\python-3.11.3-embed-amd64\Lib
C:\54729\python-3.11.3-embed-amd64
rest

Clearly -E seems to provide the desired outcome across versions - unfortunately there’s no env variable to set -E so there’s a slight edge case if a script called with -E forks a Python sub-process without -E.
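One way to narrow that edge case, at least for scripts we control, might be a small wrapper that forwards the parent's isolation flags to any child interpreter. This is only a sketch; the helper names isolated_child_cmd and run_child are made up for illustration.

```python
import subprocess
import sys


def isolated_child_cmd(*args):
    """Build a command line for a child Python that inherits isolation flags."""
    flags = []
    if sys.flags.isolated:
        flags.append("-I")  # -I already implies -E and -s
    else:
        if sys.flags.ignore_environment:
            flags.append("-E")
        if sys.flags.no_user_site:
            flags.append("-s")
    return [sys.executable, *flags, *args]


def run_child(*args, **kwargs):
    """Run a child interpreter with the parent's isolation flags applied."""
    return subprocess.run(isolated_child_cmd(*args), **kwargs)


if __name__ == "__main__":
    result = run_child("-c", "import sys; print(sys.flags.ignore_environment)",
                       capture_output=True, text=True)
    print(result.stdout.strip())
```

This does nothing for third-party code that spawns Python on its own, so it is a mitigation rather than a fix.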

rustinpeace · June 2, 2024, 8:23am

Thank you. I appreciate the response. We’re privileged to have open source software and contributors such as yourself who work thanklessly and tirelessly.

I politely disagree on this. I don’t think this is a company issue; it’s clear that behaviour changed between Python 3.10 and 3.11, and it’s important that this is acknowledged and fixes/mitigations applied.

rustinpeace · June 2, 2024, 8:24am

Correct. Both.

eryksun · June 2, 2024, 8:49am

The startup script I posted attempts to undo the effect of isolated mode on the module search path. I think it should satisfy your needs.

When running a script, adding the process current working directory to the module search path is not the normal behavior. Instead, the script directory is prepended to the module search path.
When running the REPL or a -c command, the dynamic working directory is added to the module search path by prepending an empty string. This evaluates to whatever the working directory is at the time of an import, which can change over the lifetime of the process.
When running a module with the -m command, the working directory at process startup is prepended to the module search path. This allows importing a module from the working directory at startup.
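The three cases above can be written as a pure function, which makes them easy to check without starting a new interpreter. This is a sketch of the rules as described here, not the interpreter's actual implementation; the function name first_path_entry is made up.

```python
import os


def first_path_entry(argv0, cwd):
    """Return the sys.path[0] entry for a given sys.argv[0] at startup."""
    if argv0 in ("", "-c"):
        # REPL or -c command: '' means "the working directory at import time".
        return ""
    if argv0 == "-m":
        # -m module: the working directory at process startup.
        return cwd
    # Script: the directory containing the script, made absolute against cwd.
    return os.path.dirname(os.path.normpath(os.path.join(cwd, argv0)))


if __name__ == "__main__":
    cwd = os.getcwd()
    for argv0 in ("", "-c", "-m", os.path.join("example", "test.py")):
        print(repr(argv0), "->", repr(first_path_entry(argv0, cwd)))
```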

I’m doubtful about relying on environment variables or command-line arguments to control this behavior. It’s fragile if scripts execute child Python processes. Environment variables can be unset, or a child process might be intentionally launched with a custom environment. I prefer for the desired behavior to be configured within the Python distribution.

rustinpeace · June 2, 2024, 9:53am

I agree and thanks for the suggestion as it works.

The use of env is a fallback and, to be honest, a common pattern in lots of different tools/runtimes, so people generally understand the system and its shortcomings.

eryksun · June 2, 2024, 10:22am

The bug with user site packages should be a priority to fix. Normally, [I]solated mode disables user site packages, but not the site module. For example:

$ python -Ic 'import sys; print(sys.flags.no_user_site)'
1
$ python -Ic 'import sys; print(sys.flags.no_site)'
0

Using a ._pth file disables the site module by default, but not the user site packages. Thus enabling site, which is a perfectly reasonable option for an embedded distribution, has the unwanted side effect of enabling user site packages. This violates the isolation of the distribution. A workaround is to set the PYTHONNOUSERSITE environment variable, but that’s fragile.
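To check these flags for a particular distribution, a small script along these lines could be run with that distribution's interpreter. It is a sketch only; the helper name child_flags is made up, and it reads the same two sys.flags fields as the commands above.

```python
import subprocess
import sys


def child_flags(*interp_flags):
    """Return (no_user_site, no_site) as seen by a child interpreter."""
    code = "import sys; print(sys.flags.no_user_site, sys.flags.no_site)"
    out = subprocess.run(
        [sys.executable, *interp_flags, "-c", code],
        capture_output=True, text=True, check=True,
    ).stdout
    no_user_site, no_site = (int(value) for value in out.split())
    return no_user_site, no_site


if __name__ == "__main__":
    print("default:", child_flags())
    print("with -I:", child_flags("-I"))
```

Running it with the embedded interpreter, with a ._pth file that includes import site, should show whether user site packages are still enabled there.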

JamesParrott · June 2, 2024, 10:42am

Fair enough. I’m just not sure all options have been explored. There is an odd change from 3.10 to 3.11 I agree, however it just feels a lot like a typical isolation problem of avoiding the system Python, that is best solved by a venv, or shipping a sitecustomize.py that sets things up however is required.

Otherwise, have you looked into compiling your own build of embedded Python 3.11, that doesn’t pull in entries from the registry into sys.path?

rustinpeace · June 2, 2024, 11:45am

@eryksun has provided a good workaround for now.

By the way this is not a Python 3.11 issue only. It also affects Python 3.12

have you looked into compiling your own build of embedded Python 3.11, that doesn’t pull in entries from the registry into sys.path?

That’s always an option, but the last option. We do it for macOS distributions as no official embedded distribution exists for macOS.