Creating a standalone CPython distribution

Some may have heard this from me before (and this is a bit off-topic here), but a lot of the packaging problems in Python are not actually packaging problems, but a consequence of the combination of

  1. Python packages need an interpreter,
  2. A CPython interpreter is relatively difficult to distribute (on non-Windows), and
  3. Python packaging doesn’t cover the interpreter itself

And the distribution form, be it pipx or zipapp or something else, is always limited by the distribution of the interpreter itself.

There are already some pretty well-adopted approaches to distributing a full language toolchain instead of a single language implementation (Rust has rustup, for example), but CPython is comparatively difficult to distribute because a CPython installation is not relocatable. I’m sure this is technically doable without changing CPython code (the Windows build of CPython has been relocatable for a long time, and IIRC Anaconda achieve relocatability on POSIX by patching rpath or something on installation), but it’d be much easier for (say) an equivalent of rustup for Python if this work could be done inside the interpreter’s build toolchain, or at least if there were a CPython document on exactly what to patch to enable this.
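To make "not relocatable" a bit more concrete, here is a minimal sketch (not anything CPython itself ships) that prints the locations a running interpreter believes it was installed to. The helper name `install_layout` is made up for illustration; the `sys` and `sysconfig` calls are standard library. If the installation directory is moved and any of these values still point at the old location, that is the kind of baked-in path being discussed.

```python
import sys
import sysconfig


def install_layout() -> dict:
    """Collect the paths the running interpreter reports for its own install."""
    return {
        "executable": str(sys.executable),
        "prefix": sys.prefix,
        "base_prefix": sys.base_prefix,
        "stdlib": sysconfig.get_path("stdlib"),
        # Recorded at build time on Makefile-based (POSIX) builds;
        # may be None on platforms without such a build.
        "LIBDIR": sysconfig.get_config_var("LIBDIR"),
    }


if __name__ == "__main__":
    for name, value in install_layout().items():
        print(f"{name:12} {value}")
```

Running this before and after moving an installation gives a rough picture of which values follow the move and which do not.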

What would be a good way to facilitate collaboration on this? We probably need a few (CPython) core devs with build toolchain knowledge to be actively involved in this, and I know there are not too many of them to begin with, and the POSIX build tool in general needs a lot more love than it gets right now, even without this feature request…


Probably just to start doing it, and to keep doing it until it’s done. I know that’s not a great answer, but it’s the way these things start moving - most of the people needed for this are surprisingly resistant to persuasion and management (myself included :wink: ).

The work done by @indygreg in https://github.com/indygreg/python-build-standalone is definitely the right direction, but it needs to be paired with a project that has a clearly defined target audience (or else it’ll just be “yet another” fragmentation of the single-executable packaging tools). Or perhaps PyOxy is exactly the project you’re thinking of?

FWIW, Windows is relocatable because the OS is designed that way. If you look at many of the quirks in python-build-standalone, they’re due to the OS expecting various absolute system paths to be embedded in executables, which was almost never a thing on Windows (and the ones that were have literally never changed).

Anaconda achieve relocatability by patching RPATH to point at their own private copies of “system” libraries. So it’s not patching up for the current OS, it’s near total isolation from the current OS. The core CPython distros are never going to take responsibility for redistributing these libraries (OpenSSL is bad enough!), so it’ll end up remaining with people who either believe they can care for those libraries, or those who believe they aren’t obliged to.
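As a rough illustration of the RPATH approach, the sketch below builds (but does not run) a `patchelf --set-rpath` command that points a binary at a library directory relative to itself via `$ORIGIN`, which the Linux dynamic loader expands to the directory containing the binary. This is an assumption about how one might do it, not a description of Anaconda's actual tooling; the function names and the example paths are hypothetical.

```python
import posixpath


def origin_rpath(binary: str, lib_dir: str) -> str:
    """Express lib_dir relative to the binary's directory using $ORIGIN."""
    rel = posixpath.relpath(lib_dir, start=posixpath.dirname(binary))
    return "$ORIGIN" if rel == "." else f"$ORIGIN/{rel}"


def patchelf_command(binary: str, lib_dir: str) -> list:
    """Return the argv for patchelf; the caller decides whether to execute it."""
    return ["patchelf", "--set-rpath", origin_rpath(binary, lib_dir), binary]


if __name__ == "__main__":
    # Hypothetical install location, used only for illustration.
    print(patchelf_command("/opt/py/bin/python3", "/opt/py/lib"))
```

Because the resulting RPATH is relative to the binary, the private libraries keep resolving after the whole tree is moved, which is the isolation described above.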


I have spoken with the release managers about this idea of releasing a self-contained build of CPython for various platforms. For them what they would need is basically a Makefile target that they can run as part of a release, i.e. it’s just another checkbox as part of the release.

It would also have to be messaged as being a simple distribution, but not a complete one, e.g. this wouldn’t work for Tcl/Tk on macOS due to requiring a framework build, it wouldn’t be optimized for an OS the way the builds Linux distributions provide are, etc.

So whatever collaboration happens for this, those are the parameters to aim for.

You also have to acknowledge that distributing a single binary means only wheels will work since there won’t be a Python.h. I realize this isn’t necessarily what the zipapp story cares about, but there’s also the education angle: having a single thing to download would greatly benefit anywhere that teaches Python (and selfishly I want this for VS Code and the Python Launcher to make installations dead-simple). Otherwise we now have to define the scope: do you want the single binary, or a single zip/tarball that you can unpack like a virtual environment and that has all the appropriate files to build an sdist?

But to directly answer the “how to coordinate/collaborate” question, probably a repo that holds a fork of CPython along with a very clear goal. That way we can focus the folks with the appropriate knowledge to try and solve this versus some folks working in silos.

There are unfortunately very few of those. I will happily handle championing this sort of work with python-dev, though.

Don’t we already have to watch out for this for Windows releases since that gets baked in?

But I will admit that whenever I bring up the idea of doing a static build of CPython, OpenSSL is immediately brought up. Usually, though, that’s the only library of concern. But if I’m right about Windows then it’s a bit of a red herring since we already have to care.


This doesn’t need to be a single binary though. From my POV, it’s sufficient to have a relocatable installation, shipped as a tarball containing compiled assets, which are guaranteed to work on the relevant platform.

https://nodejs.org/dist/v16.17.1/

There’s also some work along those lines in…

Author of python-build-standalone (indygreg/python-build-standalone on GitHub: produce redistributable builds of Python) here. (Thanks for the mention, Steve!)

Having somewhat solved this problem, I’ve thought a lot about it.

I have too much context lingering around in my head to capture in a single comment. But I’ll start with conveying some key points.

Foremost, there are both build system / binary portability issues as well as run-time issues. A lot of people (myself included) get transfixed over the obvious problem of how to build a Python distribution that can even run on other machines. This is indeed a hard problem (CPython’s build system doesn’t make this easy and cross-platform binary portability on operating systems like Linux makes this harder than it deserves to be).

But even if you solve the build system problems to produce a standalone distribution (python-build-standalone is an existence proof this is possible), there are still a host of run-time problems that need to be tackled. OpenSSL needs to be pointed at trusted CA certificates. Terminfo needs to reference a database of terminal definitions, otherwise the REPL (and anything else using readline) is completely broken. Extension module building (or anything else using sysconfig) is likely broken because the build-time settings don’t match the run-time environment. Tcl files aren’t found, so tkinter is broken (CPython has Windows-only logic assuming the layout from the Windows MSI installer). Even resolving the path to the stdlib can be finicky due to how some paths are baked into libpython.
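A minimal sketch of the kind of supplemental logic this implies: given the root of an unpacked distribution, work out environment variables that point OpenSSL, terminfo, and Tcl at files shipped inside it. `SSL_CERT_FILE`, `TERMINFO_DIRS`, and `TCL_LIBRARY` are environment variables honored by OpenSSL, ncurses, and Tcl respectively; the directory layout in `ASSUMED_LAYOUT` and the helper name `runtime_env` are assumptions made up for this example, not what python-build-standalone or pyembed actually do.

```python
from pathlib import Path

# Hypothetical locations inside the unpacked distribution.
ASSUMED_LAYOUT = {
    "SSL_CERT_FILE": "etc/ssl/cert.pem",  # CA bundle for OpenSSL
    "TERMINFO_DIRS": "share/terminfo",    # terminal database for ncurses
    "TCL_LIBRARY": "lib/tcl8.6",          # Tcl script library for tkinter
}


def runtime_env(prefix: str) -> dict:
    """Return env vars for the bundled resources that exist under prefix."""
    root = Path(prefix)
    return {
        var: str(root / rel)
        for var, rel in ASSUMED_LAYOUT.items()
        if (root / rel).exists()
    }


if __name__ == "__main__":
    import sys

    # Prints whichever of the assumed resources exist under this interpreter.
    print(runtime_env(sys.prefix))
```

A launcher could merge the result into `os.environ` before starting the interpreter; variables the user has already set should probably take precedence, which this sketch leaves to the caller.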

So even if you produce binaries that are capable of running on other systems, there’s a lot that can go wrong when you invoke python. Today, CPython largely buries its head in the sand about these problems outside of Windows. For people like me existing outside CPython, that means you have to supplement the CPython runtime with additional logic to cajole OpenSSL, terminfo, tkinter, etc into working. (I wrote the pyembed Rust crate to handle this, and much more. And PyOxy is essentially the marriage of python-build-standalone + pyembed - via PyOxidizer - to produce a somewhat usable distribution. But there are still run-time corner cases this supplemental code doesn’t cover.)

If there were to be official standalone CPython distributions, you need to solve these run-time quirks to some degree. That requires maintaining a new pile of code somewhere. IMO a compelling case can be made for this logic existing in CPython + its stdlib itself. But that may be a contentious topic, as I’m sure people like Linux distro package maintainers may take issue with certain approaches. I’m not convinced it is possible to solve all the potential problems related to sysconfig on Linux: there’s just too much variance between machines. You may need to distribute your own compiler toolchain with your CPython distribution and point sysconfig at it to enable extension module compiling. (I always intended to do this with python-build-standalone but never found time to do it.)
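One quick way to see the sysconfig mismatch on a given machine is to check whether the compiler recorded at build time is even present. The sketch below is illustrative only; `recorded_compiler` is a hypothetical helper, while `sysconfig.get_config_var("CC")` and `shutil.which` are standard library calls. On builds that do not record `CC` (for example on Windows) it simply reports that nothing was recorded.

```python
import shlex
import shutil
import sysconfig


def recorded_compiler():
    """Return (program, found_on_path) for the build-time C compiler."""
    cc = sysconfig.get_config_var("CC")
    parts = shlex.split(cc) if cc else []
    if not parts:
        return None, False
    # CC may carry flags (e.g. "gcc -pthread"); only the program is looked up.
    program = parts[0]
    return program, shutil.which(program) is not None


if __name__ == "__main__":
    program, found = recorded_compiler()
    print(f"build-time compiler: {program!r}, available here: {found}")
```

A standalone distribution built on one machine and run on another will often report a compiler that is missing, which is one reason shipping a toolchain alongside the distribution comes up.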

I started python-build-standalone to support PyOxidizer. And my initial goal of PyOxidizer was to try to achieve single file Python applications, without any temporary files at run-time. I was therefore transfixed with having a single file executable for both the distribution and the Python application built on top. This is still a noble goal to have. But the reality is that while simple and convenient, single file distributions/applications probably aren’t as important as I initially thought. On Windows and macOS, the concept of installers is normalized among users. MSIs or exe installers on Windows. DMGs or pkg on macOS. And on Linux, BSDs, etc it is common to install apps from a zip or tarball when bypassing your distro’s package manager. So in all cases your user base is likely familiar with some kind of install step to materialize a multi-file app. So having a single file executable only buys you so much.

I learned that single file Python distributions/applications break a lot of assumptions within CPython, the libraries it uses, and especially among Python packages in the wild. e.g. static OpenSSL libraries on Windows blow up in weird ways at run-time. And of course there are __file__ assumptions everywhere. As much as I desperately want to make it possible to have a single file Python distribution and application, they are a lot of pain. So my advice to others is to not get hung up on statically linking everything into a single binary. By all means make it possible to do this in the build system. But think long and hard before you recommend this as the shipping configuration, otherwise you are signing yourself up for a lot of funky bug reports.
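For package authors, the usual way to avoid the `__file__` assumption is to load bundled data through `importlib.resources` (the `files()` API, Python 3.9+), which asks the package's loader for the resource instead of assuming a directory on disk. A minimal sketch follows; the helper name and the package and file names in the usage note are hypothetical, and whether a given single-file loader supports this depends on that loader.

```python
from importlib import resources


def load_package_data(package: str, name: str) -> bytes:
    """Read a data file shipped inside a package without touching __file__."""
    return resources.files(package).joinpath(name).read_bytes()
```

For example, `load_package_data("mypkg", "schema.json")` would replace something like `open(os.path.join(os.path.dirname(__file__), "schema.json"), "rb").read()`, which breaks as soon as the package is not a plain directory on the filesystem.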

I keep telling myself I’d like to upstream more and more of python-build-standalone into CPython. Ideally delete the python-build-standalone project completely, as official CPython distributions of the future would fulfill its use case. So I’m very much aligned with helping CPython produce official standalone distributions. Let me know how I can help further.
