Creating a standalone CPython distribution

Some may have heard this from me before (and this is a bit off-topic here), but a lot of the packaging problems in Python are not actually packaging problems, but a consequence of the combination of

  1. Python packages need an interpreter,
  2. A CPython interpreter is relatively difficult to distribute (on non-Windows), and
  3. Python packaging doesn’t cover the interpreter itself

And the distribution form, be it pipx or zipapp or something else, is always limited by the distribution of the interpreter itself.

There are already some pretty well-adopted approaches to distributing a full language toolchain instead of a single language implementation (Rust has rustup, for example), but CPython is comparatively difficult to distribute because a CPython installation is not relocatable. I’m sure this is technically doable without changing CPython code (the Windows build of CPython has been relocatable for a long time, and IIRC Anaconda achieves relocatability on POSIX by patching rpath or something on installation), but it’d be much easier for (say) an equivalent of rustup for Python if this work could be done inside the interpreter’s build toolchain, or if there were at least a CPython document on exactly what to patch to enable this.
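
To get a rough feel for what "not relocatable" means in practice, here is a minimal sketch that lists the `sysconfig` variables mentioning the interpreter's install prefix. The helper name `baked_in_paths` is made up for illustration, and the output varies a lot between platforms and builds.

```python
import sys
import sysconfig


def baked_in_paths(prefix: str) -> dict[str, str]:
    """Return the sysconfig variables whose string value mentions `prefix`."""
    found = {}
    for name, value in sysconfig.get_config_vars().items():
        if isinstance(value, str) and prefix in value:
            found[name] = value
    return found


if __name__ == "__main__":
    # sys.base_prefix is where the running interpreter believes it is installed.
    hits = baked_in_paths(sys.base_prefix)
    print(f"{len(hits)} config variables mention {sys.base_prefix}")
    for name in sorted(hits)[:10]:
        print(f"  {name} = {hits[name]}")
```

Every entry it prints is a value that can go stale when the installation directory is moved.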

What would be a good way to facilitate collaboration on this? We probably need a few (CPython) core devs with build toolchain knowledge to be actively involved in this, and I know there are not too many of them to begin with, and the POSIX build tool in general needs a lot more love than it gets right now, even without this feature request…

Probably just to start doing it, and to keep doing it until it’s done. I know that’s not a great answer, but it’s the way these things start moving - most of the people needed for this are surprisingly resistant to persuasion and management (myself included :wink: ).

The work done by @indygreg in https://github.com/indygreg/python-build-standalone is definitely the right direction, but it needs to be paired with a project that has a clearly defined target audience (or else it’ll just be “yet another” fragmentation of the single-executable packaging tools). Or perhaps PyOxy is exactly the project you’re thinking of?

FWIW, Windows is relocatable because the OS is designed that way. If you look at many of the quirks in python-build-standalone, they’re due to the OS expecting various absolute system paths to be embedded in executables, which was almost never a thing on Windows (and the ones that were have literally never changed).

Anaconda achieve relocatability by patching RPATH to point at their own private copies of “system” libraries. So it’s not patching up for the current OS, it’s near total isolation from the current OS. The core CPython distros are never going to take responsibility for redistributing these libraries (OpenSSL is bad enough!), so it’ll end up remaining with people who either believe they can care for those libraries, or those who believe they aren’t obliged to.
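
As an illustration of the RPATH idea (not Anaconda's actual tooling), here is a small sketch that computes an `$ORIGIN`-relative RPATH for an ELF binary and builds the matching `patchelf --set-rpath` command without running it. The function names and the example paths are assumptions made up for this sketch.

```python
import posixpath


def origin_relative_rpath(binary: str, lib_dir: str) -> str:
    """Express lib_dir relative to the directory holding binary, via $ORIGIN."""
    rel = posixpath.relpath(lib_dir, start=posixpath.dirname(binary))
    return "$ORIGIN" if rel == "." else f"$ORIGIN/{rel}"


def patchelf_command(binary: str, lib_dir: str) -> list[str]:
    """Build (but do not run) a patchelf invocation that rewrites the RPATH."""
    return ["patchelf", "--set-rpath", origin_relative_rpath(binary, lib_dir), binary]


if __name__ == "__main__":
    # Hypothetical install layout: <root>/bin/python3 next to <root>/lib.
    print(patchelf_command("/opt/py/bin/python3", "/opt/py/lib"))
```

Because `$ORIGIN` is resolved by the dynamic loader at run time, the tree can be moved as a whole. It only helps for libraries that actually ship inside that tree, which is exactly the redistribution burden described above.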

I have spoken with the release managers about this idea of releasing a self-contained build of CPython for various platforms. For them what they would need is basically a Makefile target that they can run as part of a release, i.e. it’s just another checkbox as part of the release.

It would also have to be messaged as being a simple distribution, but not a complete one, e.g. this wouldn’t work for Tcl/Tk on macOS due to requiring a framework build, it wouldn’t be optimized for an OS the way Linux distributions’ builds are, etc.

So whatever collaboration happens for this, those are the parameters to aim for.

You also have to acknowledge that distributing a single binary means only wheels will work since there won’t be a Python.h. I realize this isn’t necessarily what the zipapp story cares about, but there’s also the education angle, which would greatly benefit from having a single thing to download anywhere (and selfishly I want this for VS Code and the Python Launcher to make installations dead-simple). Otherwise we have to define the scope: do you want the single binary, or a single zip/tarball that you can unpack like a virtual environment and that has all the appropriate files to build an sdist?
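
A quick way to check the "no Python.h" situation on a given interpreter is to look in the include directory that `sysconfig` reports. This is only a sketch; `has_python_header` is a made-up helper, and a present header does not guarantee that a working compiler is available.

```python
import os
import sysconfig
from typing import Optional


def has_python_header(include_dir: Optional[str] = None) -> bool:
    """Report whether Python.h exists in the interpreter's include directory."""
    if include_dir is None:
        include_dir = sysconfig.get_path("include")
    if not include_dir:
        return False
    return os.path.isfile(os.path.join(include_dir, "Python.h"))


if __name__ == "__main__":
    if has_python_header():
        print("Python.h found: building C extensions from an sdist is at least possible")
    else:
        print("Python.h missing: only wheels (or pure-Python sdists) will install")
```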

But to directly answer the “how to coordinate/collaborate” question, probably a repo that holds a fork of CPython along with a very clear goal. That way we can focus the folks with the appropriate knowledge to try and solve this versus some folks working in silos.

There are unfortunately very few of those. I will happily handle championing this sort of work with python-dev, though.

Don’t we already have to watch out for this for Windows releases since that gets baked in?

But I will admit that whenever I bring up the idea of doing a static build of CPython, OpenSSL is immediately brought up. Usually, though, that’s the only library of concern. But if I’m right about Windows then it’s a bit of a red herring since we already have to care.

This doesn’t need to be a single binary though. From my POV, it’s sufficient to have a relocatable installation, shipped as a tarball containing compiled assets, which are guaranteed to work on the relevant platform.

https://nodejs.org/dist/v16.17.1/

There’s also some work along those lines in…

Author of python-build-standalone (https://github.com/indygreg/python-build-standalone) here. (Thanks for the mention, Steve!)

Having somewhat solved this problem, I’ve thought a lot about it.

I have too much context lingering around in my head to capture in a single comment. But I’ll start with conveying some key points.

Foremost, there are both build system / binary portability issues as well as run-time issues. A lot of people (myself included) get transfixed over the obvious problem of how to build a Python distribution that can even run on other machines. This is indeed a hard problem (CPython’s build system doesn’t make this easy and cross-platform binary portability on operating systems like Linux makes this harder than it deserves to be).

But even if you solve the build system problems to produce a standalone distribution (python-build-standalone is an existence proof this is possible), there are still a host of run-time problems that need to be tackled. OpenSSL needs to be pointed at trusted CA certificates. Terminfo needs to reference a database of terminal definitions, otherwise the REPL (and anything else using readline) is completely broken. Extension module building (or anything else using sysconfig) is likely broken because the build-time settings don’t match the run-time environment. TCL files aren’t found, so tkinter is broken (CPython has Windows-only logic assuming the layout from the Windows MSI installer). Even resolving the path to the stdlib can be finicky due to how some paths are baked into libpython.

So even if you produce binaries that are capable of running on other systems, there’s a lot that can go wrong when you invoke python. Today, CPython largely buries its head in the sand about these problems outside of Windows. For people like me existing outside CPython, that means you have to supplement the CPython runtime with additional logic to cajole OpenSSL, terminfo, tkinter, etc. into working. (I wrote the pyembed Rust crate to handle this, and much more. And PyOxy is essentially the marriage of python-build-standalone + pyembed - via PyOxidizer - to produce a somewhat usable distribution. But there are still run-time corner cases this supplemental code doesn’t cover.)
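
To make the "supplemental logic" idea concrete, here is a minimal Python sketch (not what pyembed does) that points OpenSSL, ncurses and Tcl at files shipped inside the distribution using the `SSL_CERT_FILE`, `TERMINFO_DIRS` and `TCL_LIBRARY` environment variables. The directory layout and the function names are assumptions invented for this example.

```python
import os
from pathlib import Path


def runtime_environment(root: Path) -> dict[str, str]:
    """Map environment variables to bundled resources that actually exist."""
    # Hypothetical layout of a standalone distribution rooted at `root`.
    candidates = {
        "SSL_CERT_FILE": root / "share" / "certs" / "cacert.pem",
        "TERMINFO_DIRS": root / "share" / "terminfo",
        "TCL_LIBRARY": root / "lib" / "tcl8.6",
    }
    return {name: str(path) for name, path in candidates.items() if path.exists()}


def apply_runtime_environment(root: Path) -> None:
    """Set the variables without overriding anything the user already set."""
    for name, value in runtime_environment(root).items():
        os.environ.setdefault(name, value)
```

This has to run before the ssl, readline/curses or tkinter modules initialize their libraries, which is part of why doing it from outside the interpreter is awkward.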

If there were to be official standalone CPython distributions, you need to solve these run-time quirks to some degree. That requires maintaining a new pile of code somewhere. IMO a compelling case can be made for this logic existing in CPython + its stdlib itself. But that may be a contentious topic, as I’m sure people like Linux distro package maintainers may take issue with certain approaches. I’m not convinced it is possible to solve all the potential problems related to sysconfig on Linux: there’s just too much variance between machines. You may need to distribute your own compiler toolchain with your CPython distribution and point sysconfig at it to enable extension module compiling. (I always intended to do this with python-build-standalone but never found time to do it.)
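
One small, checkable symptom of the sysconfig mismatch: the compiler recorded at build time may simply not exist on the machine running the distribution. The sketch below only tests for that one case; `compiler_status` is a made-up helper, and a compiler being present says nothing about flags or libraries matching.

```python
import shlex
import shutil
import sysconfig
from typing import Optional, Tuple


def compiler_status(cc: Optional[str] = None) -> Tuple[Optional[str], bool]:
    """Return (compiler executable, whether it is on PATH) for a CC setting."""
    if cc is None:
        # CC may be None (e.g. on Windows) or something like "gcc -pthread".
        cc = sysconfig.get_config_var("CC")
    if not cc:
        return None, False
    executable = shlex.split(cc)[0]
    return executable, shutil.which(executable) is not None


if __name__ == "__main__":
    name, available = compiler_status()
    print(f"build-time compiler: {name!r}, available here: {available}")
```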

I started python-build-standalone to support PyOxidizer. And my initial goal of PyOxidizer was to try to achieve single file Python applications, without any temporary files at run-time. I was therefore transfixed with having a single file executable for both the distribution and the Python application built on top. This is still a noble goal to have. But the reality is that while simple and convenient, single file distributions/applications probably aren’t as important as I initially thought. On Windows and macOS, the concept of installers is normalized among users. MSIs or exe installers on Windows. DMGs or pkg on macOS. And on Linux, BSDs, etc. it is common to install apps from a zip or tarball when bypassing your distro’s package manager. So in all cases your user base is likely familiar with some kind of install step to materialize a multi-file app. So having a single file executable only buys you so much.

I learned that single file Python distributions/applications break a lot of assumptions within CPython, the libraries it uses, and especially among Python packages in the wild. e.g. static OpenSSL libraries on Windows blow up in weird ways at run-time. And of course there are __file__ assumptions everywhere. As much as I desperately want to make it possible to have a single file Python distribution and application, they are a lot of pain. So my advice to others is to not get hung up on statically linking everything into a single binary. By all means make it possible to do this in the build system. But think long and hard before you recommend this as the shipping configuration, otherwise you are signing yourself up for a lot of funky bug reports.
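
The `__file__` problem is easy to reproduce with nothing but the standard library. This sketch builds a throwaway zip containing a hypothetical package `demo_pkg`, imports it from the zip, and compares a `__file__`-based lookup with `importlib.resources` (Python 3.9+).

```python
import importlib
import importlib.resources
import sys
import tempfile
import zipfile
from pathlib import Path


def build_demo_zip(directory: Path) -> Path:
    """Create a zip holding a tiny package `demo_pkg` with one data file."""
    archive = directory / "bundle.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("demo_pkg/__init__.py", "")
        zf.writestr("demo_pkg/greeting.txt", "hello from inside the zip")
    return archive


def demo() -> tuple[bool, str]:
    """Return (does the __file__-based path exist?, text read via resources)."""
    archive = build_demo_zip(Path(tempfile.mkdtemp()))
    sys.path.insert(0, str(archive))
    importlib.invalidate_caches()
    pkg = importlib.import_module("demo_pkg")
    # Common idiom in the wild: assumes the package lives in a real directory.
    file_based = Path(pkg.__file__).parent / "greeting.txt"
    # importlib.resources asks the loader, so it also works inside a zip.
    resource = importlib.resources.files("demo_pkg").joinpath("greeting.txt")
    return file_based.exists(), resource.read_text(encoding="utf-8")


if __name__ == "__main__":
    print(demo())
```

The first element comes back false because the path points inside the archive rather than at a file on disk. Packages written with the first idiom are the ones that break under single-file packaging.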

I keep telling myself I’d like to upstream more and more of python-build-standalone into CPython. Ideally delete the python-build-standalone project completely, as official CPython distributions of the future would fulfill its use case. So I’m very much aligned with helping CPython produce official standalone distributions. Let me know how I can help further.

I was only bringing up OpenSSL because we already do it. If we were to link in everything CPython depends on, we’d be signing up for glibc and probably some kind of threading library and math and probably a whole lot of stuff I don’t even know about. The experience with OpenSSL suggests we don’t want to do that (assuming you need past experience to convince you :wink: )

For relocatable CPython builds on Linux, we could treat manylinux as a base definition for what things could be dynamically linked to.

That scopes out glibc and friends at that level.
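
As a sketch of what "manylinux as the base definition" could mean at run time, the check below compares the host's glibc version against a required minimum, using glibc 2.17 (the `manylinux_2_17` / manylinux2014 baseline) as an example default. The function names are made up, and a real compatibility check would need to look at more than the glibc version.

```python
import platform


def parse_version(text: str) -> tuple[int, int]:
    """Turn a 'major.minor[.patch]' string into a (major, minor) tuple."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def satisfies_manylinux(libc: str, version: str,
                        required: tuple[int, int] = (2, 17)) -> bool:
    """True if the C library is glibc and at least as new as `required`."""
    return libc == "glibc" and parse_version(version) >= required


if __name__ == "__main__":
    # platform.libc_ver() returns ('', '') when it cannot identify glibc.
    libc, version = platform.libc_ver()
    print(f"libc={libc!r} version={version!r} ok={satisfies_manylinux(libc, version)}")
```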

We have an interesting use case at work, where we have lots of CLIs written in Python, and a complex system to distribute these CLIs to desktops, laptops, and production systems (read: lots of machines). We currently use zipapps which refer to an also-preinstalled version of Python (well, versions as we support several at any one time). It works, but there are occasionally places where that preinstalled Python doesn’t exist and then the CLIs are broken. Single file executables would perhaps solve this (and I looked at PyOxidizer maybe two-three years ago for this purpose). Our environment is much more constrained to one or two versions of a particular Linux distro, and one or two modern versions of macOS, with some dabbling with Windows. That’s not to say that the complexities you outline aren’t real for us too, but they are hopefully more manageable than what you’d find out in the wild. It also means that running an installer isn’t ideal.
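
For readers who haven't used them, this is roughly how a zipapp ends up depending on a preinstalled interpreter: the `interpreter` argument of `zipapp.create_archive` is written into the archive as a shebang line. The sketch builds a trivial app; `build_cli`, the file names and the default interpreter string are illustrative choices, not our actual setup.

```python
import tempfile
import zipapp
from pathlib import Path


def build_cli(target_dir: Path, interpreter: str = "/usr/bin/env python3") -> Path:
    """Package a one-file CLI as hello.pyz with the given shebang interpreter."""
    source = target_dir / "src"
    source.mkdir()
    (source / "__main__.py").write_text('print("hello from the zipapp")\n',
                                        encoding="utf-8")
    target = target_dir / "hello.pyz"
    zipapp.create_archive(source, target=target, interpreter=interpreter)
    return target


if __name__ == "__main__":
    archive = build_cli(Path(tempfile.mkdtemp()))
    # The shebang is the only link to an interpreter; if it is absent on the
    # target machine, the CLI cannot start.
    print(archive, "->", zipapp.get_interpreter(archive))
```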

I tend to think[1] that people are relatively comfortable with installers for “full applications” - although it’s nearly always a good idea to also provide a “portable” (unzip and go, but not necessarily single-file) installer as well, as many people find that fits their workflow/environment better.

However, for smaller programs, such as local CLI tools like you describe, or utilities being shared between team members, installers are generally not what is needed. Zipapps are a relatively workable solution for this case, but they do have their rough edges, and something better (but less than a “full installer”) would be good.


  1. My perspective is Windows; Linux/Mac may be different.

Flatpak provides instructions for creating Python applications in the Python section of the Flatpak documentation. The GNOME runtime also provides PyGObject (see Available Runtimes in the Flatpak documentation), so that presumably comes with a Python install as well. It might make sense to point people at that for “full application” packaging.

Very true and I would be quite happy with that, but every time I have tried to motivate folks around the “relocatable, self-contained virtual environment” idea (usually as a stepping stone towards a self-contained binary) I get “great idea, let me know how it goes!” and then no help. :sweat_smile: So I was hoping the self-contained binary might get more folks excited to help out.

Node is the exact example I have brought up at work when people complain about installing CPython. People seem to just want a single install story on Linux at least, if not a consistent story across Unix (including macOS). Tack on Windows so that installation is literally downloading a file(s) to drop anywhere on disk and you will solve a major pain point I see learners and data scientists bump up against constantly.

Yep, but they haven’t been pushed over the finish line such that they are being discussed upstream. Plus @njs’s goals are a bit bigger than just a tarball.

That’s what I have always assumed we would do. Nathaniel’s experiment suggests this is at least feasible.

That I definitely don’t care about (especially since you need a framework build on macOS to make tkinter work). :grin:

I would rather get something working that’s a good enough solution to slowly improve upon than strive for perfection out of the gate. If you need everything then use the installer or your OS to get CPython.

Yes, but I think if this is explained to be a simple solution instead of the optimal solution then the Linux distros will be okay.

I’m personally okay saying that you need to use wheels with such a distribution of CPython and you must use the full installer or OS distribution of CPython to be able to build sdists.

I’m also okay if installation instructions were:

  1. Download
  2. Unpack
  3. Run this command to use the Python binary to run an included Python script to patch stuff for your machine (could be a no-op or not exist on Windows)

That’s still better IMO than pyenv vs Linux distro vs some other set of instructions that vary based on distro and CPython version. Plus I can automate that in the Python Launcher or VS Code.
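
The unpack and patch steps above could be automated along these lines. This is only a sketch under stated assumptions: the tarball layout (`bin/python3`) and the `postinstall.py` script name are hypothetical, and the download step is left out.

```python
import subprocess
import tarfile
from pathlib import Path


def unpack(archive: Path, dest: Path) -> Path:
    """Extract the distribution tarball into dest and return dest."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:
        if hasattr(tarfile, "data_filter"):
            # Safer extraction where the interpreter supports extraction filters.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest


def post_install(root: Path) -> bool:
    """Run the bundled patch script with the bundled interpreter, if present."""
    python = root / "bin" / "python3"   # hypothetical layout
    script = root / "postinstall.py"    # hypothetical script name
    if not (python.exists() and script.exists()):
        return False                    # nothing to patch (e.g. on Windows)
    subprocess.run([str(python), str(script)], check=True)
    return True
```

Keeping the patch step optional matches the "could be a no-op or not exist on Windows" case.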

python-build-standalone does not use a framework build of tcl/tk and it seems to work? This does require some runtime logic to define the path to the TCL support files. But that’s trivial. Or are you talking about something else breaking?

python-build-standalone doesn’t use frameworks at all and people don’t seem to be complaining…

Reusing the manylinux definitions seems very reasonable to me. While I don’t claim compatibility with manylinux, python-build-standalone does dynamically link against glibc and the binaries are heavily portable. If you have a distro without glibc, you can run the builds that are fully statically linked against musl libc. But then you forfeit extension module loading support.

Relocatable macOS framework builds have been solved for ages, anyway: https://github.com/gregneagle/relocatable-python

:person_shrugging: I was told that this idea wouldn’t work for tkinter once, but that opinion may have stemmed from not wanting runtime logic.

OK, so how serious are people about trying to solve this in some form? If people can show intent and what an MVP would look like I can talk to folks at the core dev sprints next week about the topic.

Thanks to @uranusjr for mentioning this post on Permission issues with observability tools and the official MacOS installer. Here I describe perhaps another reason why it might be worth trying to solve this problem.

TL;DR: “Non-relocatability” causes permission issues on macOS that prevent some observability tools from working.
