Standalone app deployment story

Have you looked into the official CPython Embedded Distribution for Windows? What is the difference between it and python-build-standalone (considering only Windows)? I’ve been wondering how it would be possible to have a similar distribution for at least macOS, and I believe Russell Keith-Magee (who’s behind Briefcase) also had a similar thought. I am unfortunately not familiar enough with CPython’s build process to actually work on it :frowning:

The PyOxidizer/python-build-standalone work is super cool.

The idea of taking an arbitrary Python application stack and building it into a single static binary is really appealing. But it also involves a ton of nasty problems that are probably impossible to solve reliably, in general. The __file__ issue is pretty bad, and extension modules are deadly.

Maybe a good pragmatic short-term goal would be to package just the interpreter + stdlib this way, while continuing to use a traditional file-based layout for third-party packages? That seems like something that could be implemented in a very general and reliable way to handle arbitrary Python applications.
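
A rough sketch of what that hybrid could look like at startup, assuming a hypothetical “packages” directory shipped next to the frozen executable (the directory name and startup hook are invented for illustration, not an existing convention):

```python
# Startup shim for a frozen interpreter: the binary carries the
# interpreter + stdlib, while third-party packages stay on disk in a
# plain directory next to the executable ("packages" is an assumption).
import os
import sys

def add_bundled_packages():
    exe_dir = os.path.dirname(os.path.abspath(sys.executable))
    pkg_dir = os.path.join(exe_dir, "packages")
    if os.path.isdir(pkg_dir) and pkg_dir not in sys.path:
        sys.path.insert(0, pkg_dir)

add_bundled_packages()
```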

Yes, I am aware of the embedded distribution. The main difference is that python-build-standalone can be collapsed into a single-file executable: it ships the object files and static libraries required to link a binary embedding Python. And PyOxidizer contains magic that allows embedding the standard library in the executable/binary and importing Python modules from memory using zero-copy, with no explicit filesystem I/O.
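
(PyOxidizer’s importer is implemented in Rust; purely as an illustration of the concept, a minimal pure-Python in-memory importer might look like this, with the module name and source invented:)

```python
# Minimal sketch of in-memory importing (illustrative only -- PyOxidizer's
# real importer is written in Rust and avoids copies entirely).
import importlib.abc
import importlib.util
import sys

BUNDLED = {"hello": "def greet():\n    return 'hi from memory'\n"}

class MemoryImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    def find_spec(self, name, path=None, target=None):
        if name in BUNDLED:
            return importlib.util.spec_from_loader(name, self)
        return None

    def exec_module(self, module):
        # Compile and execute source that never touched the filesystem.
        exec(compile(BUNDLED[module.__name__], "<memory>", "exec"),
             module.__dict__)

sys.meta_path.insert(0, MemoryImporter())

import hello
print(hello.greet())  # -> 'hi from memory'
```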

Coercing CPython’s build system to produce artifacts that can be embedded in a single file executable was… not trivial, especially on Windows. I’m contemplating sending some of my Visual Studio project file patches upstream, because others may find it useful to have a static build target. But I’m not sure they would be accepted…

I have python-build-standalone and PyOxidizer working on Windows, Linux, and macOS, and am pretty confident the approach of distributing intermediate build artifacts for linking works generically. There are some edge cases, such as the LLVM version dependency for LTO builds. (LTO object files are LLVM bitcode, which is LLVM-version-dependent, rather than e.g. ELF.) But it mostly just works.

I agree with the assessment that it will likely be impossible to get 100% of applications to work as a single static binary. There are just too many corner cases, especially with extension modules and the dependency/build problems that arise with those. Although I’m convinced that package maintainers can work around many of them with sufficient setup.py hacks. It may not be pretty though…

I don’t quite have it implemented, but PyOxidizer will support bundling Python modules in a more traditional manner. I wanted to solve the in-memory importing problem first because it was new and novel and was critical to achieving my vision of a truly self-contained binary embedding Python with minimal run-time overhead.

I ran into this issue many times while using and supporting other people using cx_Freeze.

My conclusion was that that ship has sailed. Modules assume they have a filesystem path and can locate adjacent data files using __file__. I frequently use this pattern myself. I don’t think there’s any realistic way to move the ecosystem to some fancier way to access data files inside the package.
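
For concreteness, the ubiquitous pattern looks something like this (file names invented for illustration):

```python
# Locate a data file relative to the module's own location on disk.
import os

_HERE = os.path.dirname(os.path.abspath(__file__))

def load_template():
    with open(os.path.join(_HERE, "templates", "page.html")) as f:
        return f.read()
```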

Where I think there is potential is caching: for performance, a module could be loaded from some kind of pre-assembled bundle, but it could still have __file__ pointing to a real copy of the module on disk. So it can still look up data files by constructing a path from __file__, even if the module’s own file is never really read. Obviously doing this generally gives you problems with cache invalidation, but for an application with bundled dependencies, it should be feasible.
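
A rough sketch of that caching idea, with the bundle contents, cache path, and (deliberately naive) invalidation all invented for illustration:

```python
# Execute module code from an in-memory bundle, but point __file__ at a
# real on-disk copy so path-relative data lookups keep working.
import importlib.abc
import importlib.util
import os
import sys
import tempfile

BUNDLE = {"mymod": b"import os\nHERE = os.path.dirname(__file__)\n"}
CACHE = os.path.join(tempfile.gettempdir(), "app-module-cache")

class CachingImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    def find_spec(self, name, path=None, target=None):
        if name not in BUNDLE:
            return None
        os.makedirs(CACHE, exist_ok=True)
        disk_copy = os.path.join(CACHE, name + ".py")
        if not os.path.exists(disk_copy):  # naive cache invalidation
            with open(disk_copy, "wb") as f:
                f.write(BUNDLE[name])
        # The on-disk location becomes __file__, but is never re-read.
        return importlib.util.spec_from_file_location(name, disk_copy,
                                                      loader=self)

    def exec_module(self, module):
        code = compile(BUNDLE[module.__name__], module.__spec__.origin, "exec")
        exec(code, module.__dict__)

sys.meta_path.insert(0, CachingImporter())
```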

I would certainly love to consider them, so feel free to create an issue on bugs.python.org (or make sure I’m nosied on an existing one - steve.dower) and go ahead and submit a pull request.

One problem with a statically linked build is that most built extension modules expect to find python3#.dll in memory, which it won’t be in this case. Recompiling the module is the only (“good”) option, but I’m not thrilled about exposing all the C API from an executable file. It’s not technically wrong, but it loses a lot of the “niceness” of static linking.

(Edit: this is on Windows; if someone can confirm whether it’s also a problem on other platforms, I’m happy to believe it is there too.)

I think on every platform, getting a statically linked extension module requires somehow hacking the module’s build system. And we’re moving in the direction of having more variation in build systems, not less.

Another problem is namespace management. When you’re defining a symbol in a Python extension, there are effectively two levels of visibility: private to the .c file (using C’s static keyword), or shared across C files but still private to the extension module. But if you link all your extensions into the same static binary, then suddenly all their internal symbols are thrown into the same namespace and can collide with each other.

NumPy used to support being statically linked into the interpreter, because in the old days it was common that supercomputers would use exotic OSes that didn’t have dynamic linking support. This involved serious trickery to avoid polluting the global namespace, like concatenating all the .c files together into a single file before compiling them, and preprocessor trickery to add static keywords everywhere. It was pretty gross and fragile.

Or, as is my preference, stop using __file__ and start using importlib.resources.
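
For example, instead of joining paths onto __file__ (package and file names hypothetical):

```python
# Read package data without assuming the package lives on the filesystem.
# Python 3.9+ API:
from importlib.resources import files

template = (files("mypackage") / "templates" / "page.html").read_text()

# The 3.7/3.8 equivalent:
#   from importlib.resources import read_text
#   template = read_text("mypackage.templates", "page.html")
```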

Presumably, only pure Python imports work this way. There’s still no official support for importing (i.e. dlopen()ing) shared libraries from memory.

With the metadata consumable using importlib_metadata.
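
For example, querying an installed distribution (“pip” here is just a stand-in name):

```python
# importlib_metadata is the backport of Python 3.8's importlib.metadata;
# it reads the installed distribution's dist-info metadata for you.
from importlib_metadata import version, requires

print(version("pip"))   # the installed version string
print(requires("pip"))  # declared dependencies, as requirement strings
```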

Somewhere I wrote an import hook that would only extract shared libraries to a file system cache at import time. I don’t remember the details, other than @brettcannon helped a lot with the implementation. It wasn’t much code, and worked fairly transparently, so that’s another option.
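
Something along these lines, perhaps (a reconstruction, not the original code; the in-memory bundle and cache path are invented, and the .so suffix assumes Linux):

```python
# Extract bundled shared libraries to a filesystem cache the first time
# they are imported, then defer to the normal extension-module loader.
import importlib.util
import os
import sys
import tempfile
from importlib.abc import MetaPathFinder
from importlib.machinery import ExtensionFileLoader

SHARED_LIBS = {}  # hypothetical: {"_speedups": b"<ELF bytes>", ...}
CACHE = os.path.join(tempfile.gettempdir(), "extension-cache")

class ExtractingFinder(MetaPathFinder):
    def find_spec(self, name, path=None, target=None):
        if name not in SHARED_LIBS:
            return None
        os.makedirs(CACHE, exist_ok=True)
        lib_path = os.path.join(CACHE, name + ".so")
        if not os.path.exists(lib_path):
            with open(lib_path, "wb") as f:
                f.write(SHARED_LIBS[name])
        loader = ExtensionFileLoader(name, lib_path)
        return importlib.util.spec_from_file_location(name, lib_path,
                                                      loader=loader)

sys.meta_path.insert(0, ExtractingFinder())
```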

I appreciate that this is an older thread, but just wanted to chime in as the new lead of rst2pdf who is also looking for a solution for standalone command line Python apps distributed to users who have no clue about Python.

I’m new to Python, as I only got involved in rst2pdf as a user who doesn’t want it to die. I hope that I’m rapidly learning (there’s so much to learn!), but we certainly have dependency issues. We regularly get issues where the user has a different version of ReportLab from the one we tested with, which then behaves unexpectedly.

Hence, I’m looking for a solution that I can document for users that will allow them to type rst2pdf from the command line (in any directory) and that I can be confident that they are running our tool with a set of dependencies that I know work.

Currently pipx or shiv appear to be the solution, but with talk of them being workarounds, I’m looking forward to an approved solution in the future.

I’ll plug my project - https://github.com/indygreg/PyOxidizer. I think it should be stable enough to use in production now. Please try it out and file GitHub issues for the parts that are confusing, don’t work, or aren’t implemented. (I’ll probably officially release it via a blog post, etc in the next few days.)

But in the case of rst2pdf, you may be out of luck because PyOxidizer only supports Python 3.7 and rst2pdf seems to require Python 2.

Thanks, I will take a look at PyOxidizer. Fortunately, rst2pdf is very nearly Python 3 compatible, as I’m well aware that Python 2 is not long for this world!

Would it be possible to use memfs to get over the __file__ issue? Wherever we have issues, e.g. with numpy, would it be possible to load the offending modules in a memory-mounted folder (on Linux)?

I am new to this area, and sorry if this suggestion does not sit well.