Hey there, I am new to this forum and hoping for constructive answers. My requirement is quite specific, so please read carefully.
What I want to achieve is a C++ program that has a bundled Python interpreter, so I can run Python code at runtime from C++. I already successfully use pybind11 to embed the interpreter and run Python code from C++, though my requirement seems not to be a pybind issue, but a general Python one. (pybind11 is more or less just a header-only library which wraps the official Python C-API.)
My problem is that Python is not yet embedded INTO the C++ executable, which means that when distributing, the user’s PC still needs Python installed, or at least the entire Python installation shipped with the program: namely, python311.dll and the standard library files. I have no interest in PyInstaller and similar tools; they do the opposite.
For ease of installation, no possibility of unexpected failures due to missing files, and improved security, I want to embed the entire Python interpreter directly into the C++ executable (python311.dll at least), so that there is a single, self-contained C++ executable that just always works.
How would I go about that, in a cross-platform way? (Windows and Linux)
Note that on Windows you’ll also need to rebuild any extension modules you want to support, either statically linked or directed to look into your executable for Python APIs rather than python311.dll. I don’t believe this is necessary on Linux, but I also don’t know exactly how to make it work.
You might have an easier time bundling the embeddable distro with your app on Windows, as this is the scenario it’s designed for. It’s stripped down and uses some features to avoid loading other copies of Python from the user’s machine, so it’s about as simple and secure as we can make it. You would probably omit the python.exe from that package in your case, as well as any extension modules you don’t want to support.
I see, thank you for your reply. I assumed that i need to build it statically. But can someone please explain to me why Python is available statically on Linux, but not on Windows? I know there are many gotchas when doing so, but they must be on Linux too, so what would be the difference? Usually Windows is the one that statically links and Linux is the one to forbid that…
But thank you for the link! I want to support both, shipping the embedded distro, as well as 100% self-contained executables.
I just read that the standard library is shipped as a zip file of precompiled python files, that’s more or less what i was afraid of. Isn’t there the possibility that someone could modify the zip file and modify the standard library, and thereby use introspection to get access to all private application data? This might be security related…
Or, is it possible to only ship the python dll, but embed the standard library such that it cannot be messed with?
It’s absolutely possible (by writing a suitable import hook) but it’s not something that exists as standard, because there’s been insufficient need for it until now. (Or anyone who has needed it has done the work for themselves and not tried to contribute it upstream).
The standard library is the bit that’s much harder to embed, as is Python code in general.
If you really want, you could concatenate it to your executable and you should be able to add the executable path to sys.path (or the equivalent when initialising) to have it read from the zip file. That won’t really prevent more modification, but at some point you have to accept that users who have a copy of your app have the power to modify things.
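To illustrate why appending the zip to the executable can work at all: ZIP readers locate the archive by searching backwards from the end of the file for the "end of central directory" record (signature `PK\x05\x06`, at least 22 bytes long), so any bytes in front of the archive are skipped. Below is a minimal sketch of that lookup. The function name `find_zip_eocd` is hypothetical, and it ignores the corner case of a false signature inside a trailing archive comment. Whether Python accepts your particular executable this way is something to test, not something this sketch proves.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the offset of the last ZIP
// "end of central directory" record in `bytes`, or std::string::npos
// if no complete record is present.
std::size_t find_zip_eocd(const std::string& bytes) {
    // Record signature; built with an explicit length of 4 bytes.
    static const std::string signature("PK\x05\x06", 4);
    // A record without an archive comment is exactly 22 bytes.
    const std::size_t min_record_size = 22;
    if (bytes.size() < min_record_size) {
        return std::string::npos;
    }
    // Search backwards, starting at the last offset where a complete
    // record could still fit before the end of the data.
    return bytes.rfind(signature, bytes.size() - min_record_size);
}
```

For an executable image followed by a zip, the returned offset lies in the appended part, which is why the leading executable bytes do not confuse the reader.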
There’s a slightly more complete (yet highly experimental) approach in my pymsbuild tool called DLL packing, which will do what Paul suggests and can work for the stdlib, though that’s obviously not the intent. The main advantage is that, because it’s all executable code, if you sign the DLL and run in an environment that requires valid signatures, you can be sure that all the contents are unmodified.
The main reason Windows doesn’t support static linking is that extension modules need to know the name of the DLL to load APIs from. There are tricks to change the name at compile time, but only really dirty hacks to do it at runtime, and frankly everyone is better off if you just keep the DLL separate. A few scenarios do benefit from it, though; we just haven’t merged all the actual fixes needed to make it buildable yet.
Technically the module, but in the Windows API sense, not the Python sense, so I’ll stick with DLL even though an EXE is also sufficient.
So, my use case would be to have a C++ Framework and embedded Python more or less as a scripting engine for UI rendering, where the structures are written in C++ and then exposed to the Python script via pybind11. So as far as I understand this would not involve any runtime modules as everything is compiled in one go and the python script is written afterwards and not extended any more. If it is to extend, you would modify the original code and not load a module at runtime.
Does this sound correct, would this work completely without module extensions, when exposing it directly from the main C++ app? In such a case I would completely disable module extensions as it is not of any use
Yeah, the main thing that you lose by omitting extension modules is networking (socket, select and ssl) and foreign-function interfaces (ctypes/libffi). If you can live without these, and without any third-party stuff, you shouldn’t have a problem.
(I’d expect in your circumstances you’d want to provide your own networking anyway, so that it integrates properly with your UI loop.)
On Linux these can be statically linked if you want them. Windows is a bit more difficult because the third-party libs we depend on don’t statically link easily (and it only gets harder trying to do two levels of static linking…)
Yes, perfect, i see why that is. But the missing networking can actually be considered a good thing in my case as it shouldn’t be done in the UI loop, instead in the C++ base. Thus, it would prevent users from using the python script for anything more than it was designed for.
Thank you for your help. I will keep the possibility in the back of my mind, but I think I will just ship the embedded distro and have Python UI scripting and self-contained executables exclude each other. The security gap is closed by noting that anything security related should be done in the C++ backend and not exposed to the Python UI loop anyway.
But I can’t really get the embedded distro to work. I basically want to compile the C++ application once using the Python C headers, link the Python library, and then run the executable on another machine while providing the extracted python embeddable distro, containing python311.dll, python311.zip and many .pyc files.
But how am I supposed to compile something for the embedded distro when there are no headers included? I really searched but can’t find good instructions on the web. The application should use python311.dll from the embedded package, but there are no headers or .lib files to compile with. And if I compile with the headers and libraries from the normal Python installation, but then provide the embedded distro when running, the program simply crashes. (Versions are the same.)
How am I supposed to use the embedded package when there are no headers? And when I use the normally installed python files, how do I get an embeddable distro, that fits together with the headers it was compiled with? (instead of downloading headers and libraries from unrelated download links and hoping they will work)? Somehow the headers are missing in the embedded distro, or the embedded distro is missing in the normal installation, what am I supposed to do?
The crash will be related to something else, the binaries are identical. They’re all laid out as part of the same release process. (Any Python 3.11 install should be fine, they’re compatible enough that the precise version doesn’t have to match.)
So you did the right thing here, but will need to diagnose the crash independently.
You didn’t extract the zip with all the .pyc files did you? There’s no need, Python will read them directly from the zip file. But if you move them around, you’ll need to update the python311._pth file to point at the new search path.
Well, i extracted the downloaded embed zip file, and now i have a folder with python311.dll, python311.zip and many other files and some more dlls.
I have compiled a small test program that includes and links against the normal global python install. It runs perfectly on the build machine.
For simplicity, I just copy the executable into the extracted embed folder on the other machine, so that it sits right beside python311.dll and python311.zip. However, when I execute it I simply get a Windows error message “The application could not be started correctly (0xc000007b)”. That could be anything; it usually means something is incompatible.
When I instead only copy the python311.zip file from the embedded distro and python311.dll from the global python installation from the other machine next to the executable, then it also runs on the other machine. Until i try to import something like sockets, then it also crashes again. So something must be wrong with the dlls i think.
EDIT: It also works for simple packages when i extract the embed zip but replace the dll from the zip with the globally installed dll, but it still only works for simple packages, not for sockets. Thus, the dll that is globally installed behaves differently than the one that is part of the embedded distro
You will either need to copy python311._pth or specify the module search paths as part of initialization. Without one of these, it will default to trying the usual search process, which is bound to fail. Specifying the paths during initialization is more secure, but way more complex than using the ._pth file.
If the DLLs from the two sources are different, it’s because you have mismatched packages. The binaries are identical - we only build and sign it once, and then package it up multiple times.
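As a rough illustration of the `._pth` route mentioned above: the file is plain text with one search path per line, `#` starts a comment, and an `import site` line turns on `site` processing. Here is a small sketch that generates such contents, for example to write `python311._pth` next to your executable as a build step. The function name `make_pth_contents` is made up for this example, and the two paths in the usage note assume the default layout of the embeddable package.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: builds the text of a ._pth file.
// Each entry of `search_paths` becomes one line; relative paths are
// resolved against the directory containing the ._pth file.
std::string make_pth_contents(const std::vector<std::string>& search_paths,
                              bool import_site) {
    std::string out;
    for (const std::string& path : search_paths) {
        out += path;
        out += '\n';
    }
    // Keep site disabled unless explicitly requested.
    out += "\n# Uncomment to run site.main() automatically\n";
    out += import_site ? "import site\n" : "#import site\n";
    return out;
}
```

Calling `make_pth_contents({"python311.zip", "."}, false)` produces a file that searches the zipped standard library and the application directory only.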
# Uncomment to run site.main() automatically
That seems to be correct, as python311.dll, python311.zip, python.exe and my compiled executable are all in the same directory. I want to note that python.exe (the interactive shell) works perfectly fine when double clicking it (inside of the embedded distro).
My App still works with the global installation, but when i remove the global installation from the path and put the app into the embedded distro folder on the same machine, it crashes. It must be the combination of my app being compiled with the global installation, but then using the embedded version. The global installation works in itself, and the python.exe interpreter in the embedded distro also works. Just the combination of my app and the embedded dll does not want to work.
Is there maybe something that must be set in the application to differentiate between global and embedded installations?
, while specifying include directory and library path manually. Both yield the exact same result that it works when the global instance is on the path, but crash when they are next to the embedded distro’s files…
Alright, I think I need to get out the bigger guns. But just to confirm, is it the correct approach to compile with the headers from a global install, and then run it with any embedded distro of the same 3.x version (here 3.11, with the micro version being irrelevant)?
Holy cow, i finally found the mistake, while writing a 500 point list of every single action i take. I didn’t notice the bit-ness.
I always downloaded and used the 32-bit embedded distro, because of greater system compatibility, and then never questioned it anymore. However, because I installed Python globally with the big download button, and neither the big download button nor the recommended download link on the Windows page says it is 64-bit, I never thought about it. I now downloaded the 64-bit embedded distro and everything works beautifully. Thank you for taking a look at this, although it was my fault.
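For anyone hitting the same 0xc000007b error: a mismatch like this can be checked by reading the machine type out of the PE header of the executable and of python311.dll. Here is a minimal sketch, assuming the whole file has already been read into a `std::string`; the names `PeMachine` and `pe_machine` are made up for this example. The offsets and constants come from the PE format: the 4-byte little-endian value at 0x3C points to the `PE\0\0` signature, which is followed by a 2-byte machine field.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

enum class PeMachine { Unknown, X86, X64, Arm64 };

// Hypothetical helper: inspects the raw bytes of an EXE or DLL and
// reports which CPU architecture it was built for.
PeMachine pe_machine(const std::string& image) {
    auto byte_at = [&image](std::size_t i) {
        return static_cast<std::uint32_t>(static_cast<unsigned char>(image[i]));
    };
    // DOS header: starts with "MZ" and is at least 0x40 bytes long.
    if (image.size() < 0x40 || image[0] != 'M' || image[1] != 'Z') {
        return PeMachine::Unknown;
    }
    // Little-endian offset of the PE signature, stored at 0x3C.
    const std::size_t pe = byte_at(0x3C) | (byte_at(0x3D) << 8) |
                           (byte_at(0x3E) << 16) | (byte_at(0x3F) << 24);
    // Need 4 signature bytes plus the 2-byte machine field.
    if (pe > image.size() || image.size() - pe < 6) {
        return PeMachine::Unknown;
    }
    if (image.compare(pe, 4, std::string("PE\0\0", 4)) != 0) {
        return PeMachine::Unknown;
    }
    const std::uint32_t machine = byte_at(pe + 4) | (byte_at(pe + 5) << 8);
    switch (machine) {
        case 0x014c: return PeMachine::X86;    // IMAGE_FILE_MACHINE_I386
        case 0x8664: return PeMachine::X64;    // IMAGE_FILE_MACHINE_AMD64
        case 0xAA64: return PeMachine::Arm64;  // IMAGE_FILE_MACHINE_ARM64
        default:     return PeMachine::Unknown;
    }
}
```

If the executable and the DLL next to it report different values, the loader cannot map one into the other.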
But i have one more question, now that i can successfully use it:
How can i guarantee that anyone building my project has a fitting embedded distro to the version they have installed? I assume it would be quite cumbersome to detect the version they have installed and then download the right package, besides the fact that this makes the build non-reproducible.
I would much rather have both python packages downloaded at CMake configure time, so that the project is always built with the same python version.
Question: Is there any way that I can automatically download something like a zip file that contains the Windows installation (including the headers and the compiled library to link against, python311.lib), that is not an installer, but a zip file I can extract locally? I need the headers and the import library to link against.
That way CMake can specify the python version wanted, and the compilation is completely independent of what is installed in the developer’s system
Grab the package from NuGet. There’s a direct download URL, which you can pretty easily calculate, and a .nupkg is just a ZIP file with more metadata.
Alternatively, you could make your build calculate the embeddable distro URL from its own version and download that one.
Unfortunately in both cases, you’ll need to know the micro version (unless you use nuget.exe to do the download) - you can’t just ask for “the latest 3.11” for example.
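To make "calculate the URL" concrete, here is a small sketch of how both URLs could be derived from a full version number. The URL patterns are assumptions based on how python.org and the NuGet v2 download endpoint are laid out today, so verify them against the real sites before relying on them. The names `PythonVersion`, `embeddable_url` and `nuget_url` are made up for this example, and 3.11.4 in the usage note is only an illustrative version.

```cpp
#include <string>

// Full version, including the micro part that cannot be omitted.
struct PythonVersion {
    int major;
    int minor;
    int micro;
};

std::string version_string(const PythonVersion& v) {
    return std::to_string(v.major) + "." + std::to_string(v.minor) + "." +
           std::to_string(v.micro);
}

// Assumed pattern for the embeddable package on python.org.
// `arch` is expected to be "amd64", "win32" or "arm64".
std::string embeddable_url(const PythonVersion& v, const std::string& arch) {
    const std::string ver = version_string(v);
    return "https://www.python.org/ftp/python/" + ver + "/python-" + ver +
           "-embed-" + arch + ".zip";
}

// Assumed pattern for the NuGet v2 direct download endpoint.
// "python" is the 64-bit package id; the 32-bit one is "pythonx86".
std::string nuget_url(const PythonVersion& v,
                      const std::string& package_id = "python") {
    return "https://www.nuget.org/api/v2/package/" + package_id + "/" +
           version_string(v);
}
```

With `PythonVersion v{3, 11, 4};`, `embeddable_url(v, "amd64")` gives the runtime files to ship and `nuget_url(v)` gives the package with the headers and python311.lib, so a build script can pin both to the same version and bit-ness.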
I can think of a few ways to streamline our set of installers that I’d kinda like to do, which would affect this. But I don’t really want to go breaking everyone until it’s all lined up in a way that isn’t going to be surprising, and also doesn’t leave us maintaining all the old packages forever. Probably when the next big “thing” happens in packaging I’ll revisit the distributions we provide on python.org, but we’re still waiting to see what that will be.