Add a command line parameter to add folders to sys.path (needed for Embedded Python)

Embedded Python ignores most environment variables like PYTHONPATH and seemingly PYTHONSTARTUP (while still accepting PYTHONUSERBASE and perhaps others).

But installed packages need access to the Lib\site-packages folder, which is ignored even when it sits right inside the Embedded Python folder.

This leads forums and trackers around the web to advise users to modify or even completely delete the pythonXYZ._pth file in order to affect sys.path directly. That not only undermines the whole concept of “embedded” (at least the deletion does), but also breaks the process every time there’s a new version (because of the XYZ part).

So would you consider adding a simple command line parameter like -libs=, e.g. python -libs=libs (to look for a libs folder relative to the current folder), python -libs=c:\portable-stuff\libs, etc.?

I hope to get consensus so I can put it as a GitHub feature request issue.

If you’re embedding Python, why not just run some custom code when you start the interpreter? It sounds to me more like you’re trying to use the embedded distribution as a portable version of the standalone interpreter. The nuget distribution may be more useful for that scenario.
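To make “custom code” concrete: it can be as small as a launcher script that fixes up sys.path before importing your application. A minimal sketch, assuming a libs folder sits next to the interpreter and the entry module is called myapp (both names are made up):

# launcher.py - hypothetical startup script shipped next to the embedded interpreter.
# It puts a "libs" folder (assumed layout) on sys.path, then hands off to the app.
import os
import sys

libs = os.path.join(os.path.dirname(sys.executable), "libs")
if os.path.isdir(libs) and libs not in sys.path:
    sys.path.insert(0, libs)

import myapp  # hypothetical application entry point
myapp.main()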

Side note - a lot of people seem to use the embeddable distribution in situations where the nuget distribution might be better. Maybe that indicates that distributing the nuget version differently (for example, without requiring the nuget tool) would make it more discoverable? @steve.dower what are your thoughts on that?


It would be nice to get more documentation / explanation about when each should be used, how they differ, the intention of the ._pth isolation mode, why some things are missing from them, etc.

Maybe it already exists? 4. Using Python on Windows — Python 3.12.1 documentation says the nuget package is for CI only, while the embeddable package is for distribution with applications written in Python (4.4.1) or applications written in native code (4.4.2). I just noticed that further down, section 4.9 contains some more information, although it’s still a bit sparse.

The nuget tool is not strictly needed apparently:

import io
import urllib.request
import zipfile

# A .nupkg is just a zip archive, so it can be downloaded and extracted directly.
url = "https://www.nuget.org/api/v2/package/python/3.11.4"
data = urllib.request.urlopen(url).read()
zipfile.ZipFile(io.BytesIO(data)).extractall()

:rocket:

Is that “supported” / preferred?

I was talking more about discoverability. Yes, you don’t need the nuget tool, but the existing docs give the impression you do. And the way of manually downloading and extracting the nuget distro isn’t recorded in the docs and definitely doesn’t feel “official” or “supported” :slightly_frowning_face:

Also, the nuget distribution isn’t visible on the Python download page for Windows, whereas the embedded distribution is, so users looking for an “unzip and run” interpreter will naturally think that the embeddable distribution is the right thing for them. And users following that route are extremely unlikely to read the documentation.


Thanks for the quick answers, all of you!!!

Because custom code won’t help you run a basic command like python -m [module name] (e.g. python -m pip). Such a command will only work if you edit the aforementioned file, which needs updating after each release and might break isolation.

Not just that, but it’s also not very straightforward. Instead of a simple “download Python” link, it’s “Download package”; it’s located under the general About panel instead of under the specific release, has a non-standard file extension, requires deciding which folders to unzip, etc. You can’t quite compare it to the official Python download page, which immediately gives you clear plug-and-play download links.

In any case, the nuget version has no pythonXYZ._pth file, so doesn’t that break its isolation?
If so, does one have to choose between isolation and portability?

It also produces all kinds of errors unless you also export at the very least the Lib folder from the nuget package:

  • Could not find platform independent libraries
  • Fatal Python error: init_fs_encoding: failed to get the Python codec of the filesystem encoding
  • ModuleNotFoundError: No module named ‘encodings’

And without the DLLs folder:

  • All kinds of ‘unicodedata’ errors

That’s not the intended use of the embeddable package, though. I’m not saying your use case isn’t valid, I’m saying that you found the wrong solution to your problem (and I’m trying to explore how we direct users to the correct distribution when they have this use case, because I see a lot of people making this mistake).

I don’t quite know what you mean by “export from the nuget package”. If I use nuget to download the package, or I use the script @petersuter suggested, it works fine for me. Without knowing how you’re extracting the nuget package, I can’t say much more. But if you are using nuget, which is the supported approach, and are getting errors like this, you should probably raise the issues on the Python tracker, as I don’t imagine that’s expected behaviour.

I’ve just downloaded the nuget package, opened it as an archive file and exported its main files, then also the Lib and DLLs folders to stop the errors.

OK, so that’s not the documented or supported method; it’s not surprising it didn’t work…

Well, it worked once I exported the main files plus the Lib and DLLs folders.
I left out the Include, libs and Tools folders and no harm seems to have been done.

What is your opinion on this?

The Nuget package may behave weirdly if you have a regular install of Python, because it sometimes prioritises environment variables or registry keys, and always prioritises the current working directory. So provided you know the environment you’re using it in (i.e. there isn’t an existing Python install), it doesn’t need isolation.

Exactly the same issues would apply to an “unzip and run” distro. On top of that, you’re incredibly likely to get mixed up versions unless you’re very good about updates (and the 90-99% of users who are not very good at it will be a regular source of “bugs” for us to deal with, often without anywhere near enough information to know what they’ve done wrong). By not making it prominent, we more reliably know that people have run the installer, which in most cases will warn them if they have another install, or will properly update it rather than scattering files everywhere.

The installer can also handle file associations and shortcuts in a reliable, forward-compatible way. We are not going to write instructions on how to do that for every version of Windows (and the UI changes regularly), so that’s another big way that a lot of users would be let down by an unzip and run package.[1]

The embeddable package is intended for environments that you don’t control, so you need the isolation. If you are adding a site-packages folder that you want imported, just add it to the ._pth file and it will be included in sys.path. You don’t need to induce the interpreter to do it; it’s already reading the paths from a file that you can easily modify.
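For illustration, a sketch of what an edited python311._pth might look like, based on the default file shipped in the embeddable package (the 311 suffix is just an example; each non-comment line is a path, relative to the file, that gets added to sys.path):

python311.zip
.
# Added: relative path to the installed packages, appended to sys.path as-is
Lib\site-packages
# Uncommented from the default file so that site.main() runs as well
import site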

If you want to add a path at runtime without knowing where it will be until then, then you’ll have to add a script that uses your own environment variable. This is a major security risk in certain contexts, which is why the embeddable distro does not do it by default. You can add the security vulnerability to your own app if you want, but we aren’t publishing CVEs “against” CPython because you did it.
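A minimal sketch of that pattern, assuming an app-specific variable named MYAPP_LIBS (a made-up name) read by a startup script you control:

# Runs in your own launcher/startup script - not something CPython does for you.
import os
import sys

# MYAPP_LIBS is an assumed, app-specific variable; reusing a generic name
# like PYTHONPATH would reintroduce exactly the risk the isolation avoids.
for path in os.environ.get("MYAPP_LIBS", "").split(os.pathsep):
    if path and path not in sys.path:
        sys.path.append(path)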


  1. I leave out environment changes because even with a Windows MSI installer, it’s impossible to do anything reliably. When not even the OS can manage its own PATH variable properly, most mere Python developers have no hope of persuading the OS to do it. ↩︎


That’s a really nice explanation of the roles of the two distributions, although I’m not completely sure it matches what the docs currently say. Would it be worth a docs change to reflect this? In particular, you imply here that it is reasonable to use the embedded distribution if you want an isolated Python interpreter to run scripts with, but the docs say “It is intended for acting as part of another application, rather than being directly accessed by end-users”. I’d try to write something myself, but I’m reluctant to do so as it feels very much like you are the authority on how these additional distributions should be used, and I don’t want to make commitments you’re not happy with.

One thing that I don’t think could be documented as it stands, and which worries me, is the point about the nuget package behaving weirdly if you have a regular install of Python.

That says to me that if you have any regular Python install on your PC, you shouldn’t be using the nuget distribution (even to test against a version other than the one you have installed). Is that right? It seems like a rather severe limitation.


Yeah, probably. It seems most people who ask the question do find the docs at some point, so it would no doubt be more immediately helpful (though I do enjoy hearing how people are using Python like this).

It’s been a while since I’ve gotten into the headspace for writing docs. Would gladly review a contribution based on my various forum posts if someone wants to do that research (or even if they just want to make stuff up and I can try to correct it - that’s often how I break my writers block on stuff like this).

Kind of. It’s been a bit better since 3.5 when I straightened out a few overlapping registry keys, and better still since I think 3.9 when I made {dirname(sys.executable)}\Lib\os.py higher priority than the registry, but because our getpath.py behaviour is so intricate, there are plenty of ways to get into trouble (e.g. dev builds can also get confused by system/user installs, and unactivated venvs run an even higher risk, and venvs made from dev builds are real trouble…).

Adding ._pth was partly to allow a way to completely bypass all of getpath.py. Any further changes to that logic are going to cause pain and anguish (even my attempts to not change it still changed it and upset people), so I’m holding off in case something like PyBI comes about and we’ve got a big enough change of model that changing a few specific edge cases is easily acceptable.

Why would a simple command line parameter bring pain?

Alternatively, you could allow a secondary generic (i.e. version-less) ._pth file that could remain in place across version upgrades.

Both options only involve the local folder, so I don’t understand the harm.

Two separate topics:

  • a command line parameter breaks the security guarantee of having a ._pth file (which is that the only way to add more paths to sys.path is to modify that file). It only causes pain when you get hacked.
  • changing how paths are calculated (in getpath.py) will bring pain because carefully constructed environments suddenly break (not a hypothetical - it happened 3-4 times by accident with 3.11 when we didn’t even try to change how paths were calculated)

The two valid names for ._pth files are the name of the executable being launched (typically python._pth) or the name of the DLL being loaded (typically python311._pth). The latter is more secure, but the former is available.
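For concreteness, an embeddable folder using both names might look roughly like this (version numbers are only an example; when both files exist, the DLL-named one overrides the executable-named one):

python.exe
python311.dll
python311.zip
python._pth       <- matched against the executable name
python311._pth    <- matched against the DLL name; wins if both exist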

Does that provide what you need?

No, because of “The file based on the shared library name overrides the one based on the executable, which allows paths to be restricted for any program loading the runtime if desired.”

Don’t you control both of these files though? You can just not use the library one.

If you don’t control these files, then this isn’t your Python install (it’s embedded in someone else’s app) and you should install your own. That’s entirely by design.

I do control both files, but the shared library name one will be automatically replaced every time a new version is used, while the executable one will remain as-is when new versions arrive.