Making Python relocatable on Linux

@njs:

The use case I had in mind when I started the thread, and that I think we really don’t serve as well as we could, is when you want to quickly and automatically install a local “scratch” copy of a specific version of Python. For example, think of tox trying to run tests against 5 different Python versions, or pipenv trying to bootstrap an environment with a specific Python version, or a CI environment where I want to grab a nightly build for $CURRENTPLATFORM and run tests.

Have you looked at Spack? It’s designed not just to install arbitrary versions of things, but arbitrary configurations. Given a spack.yaml file in a directory, like so:

spack:
    specs:
        - python@3.4.10
        - python@3.5.7
        - python@3.7.6

The workflow is:

$ git clone https://github.com/spack/spack
$ . spack/share/spack/setup-env.sh
$ spack install

The recipe for Python is portable and works cross-platform.

You can also test more complicated matrices, e.g., 3 python versions built with gcc and icc, with different flags, for different architecture targets:

spack:
    specs:
        matrix:
            - [python@3.4.10, python@3.5.7, python@3.7.6]
            - ["%gcc", "%intel"]
            - ["cflags=-O3", "cflags=-O2"]
            - [target=broadwell, target=skylake]

That’ll give you 24 different python builds (it’s a Cartesian product).
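
The expansion is easy to sanity-check outside Spack; here is a quick sketch of the same Cartesian product in plain Python (the spec strings just mirror the spack.yaml above):

```python
import itertools

# The four matrix axes from the spack.yaml above.
versions  = ["python@3.4.10", "python@3.5.7", "python@3.7.6"]
compilers = ["%gcc", "%intel"]
cflags    = ["cflags=-O3", "cflags=-O2"]
targets   = ["target=broadwell", "target=skylake"]

# Spack expands the matrix as the Cartesian product of the axes.
specs = [" ".join(parts) for parts in
         itertools.product(versions, compilers, cflags, targets)]

print(len(specs))  # 3 * 2 * 2 * 2 = 24
print(specs[0])    # python@3.4.10 %gcc cflags=-O3 target=broadwell
```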

Caveats:

  1. No Windows support (yet).
  2. Spack builds from source by default, as we don’t (yet) have a public binary mirror for commonly built configurations. That’s something we’re working on. You can make a binary cache, which will produce relocatable binaries so you don’t have to keep rebuilding the same matrix over and over.

Anyway, the caveats are probably a roadblock at the moment, but the tool is designed to build and package many different versions of “whatever”, which seems very close to your use case.

OK, so first step: how do we get relocatable interpreters for Linux? I’m starting there because:

  1. It could work nicely in containers
  2. It somewhat covers Windows through WSL
  3. Maybe it’s generic-UNIX enough that macOS would work?
  4. Steve wouldn’t be on the hook :wink:
  5. Need to start somewhere

Now if there’s some post-download patching that’s easy to do but required, that gets into the rustup-like solution that @uranusjr suggested. But first we would need to know how close we can get to the venv-in-a-zip-file experience and what it would take to get there. I personally don’t know, so it’s a legitimate exercise to list out what’s required, gauge the work involved, and then see if people are motivated to do it.

I mentioned this above, but Spack will build relocatable binaries for macOS and Linux. You could just make a Spack binary package, install it, and zip up the result.

By default it’ll build its own dependencies but you can point them at system installs in containers.

Since the recipe is designed to be cross-(unix)-platform, would this be a good starting point?

I’m pretty sure CPython on Linux is relocatable by default – just ./configure --prefix=tmpdir; make; make install and then zip up tmpdir. And up-thread it sounded like @David-OConnor confirmed that. On top of that, you need to build on an old distro (like CentOS 6/7, same as manylinux), and then you need to bundle the required libraries like OpenSSL (the same thing auditwheel does for wheels; it could use the same code), but AFAIK those are the only required steps. I can’t think of anything that would need a post-install step.

@steve.dower How are the nuget packages different from a zip file that gets extracted somewhere? Is there a post-install script?

That is basically what the “portable PyPy builds” do. To make sure tkinter works you also need to ship the Tcl and Tk runtimes, since the extension module is tied to the version of libtcl you bundle. PyPy has had pretty good success with these; you can use the tarball anywhere the manylinux2010 version of glibc is supported. The only post-install step (and truthfully it could be done as part of the bundling process) is to run pypy -m ensurepip, and maybe copy/symlink the bin/pypy executable to various aliases in bin/.

Possibly! If this became a thing then long term it might be tricky simply because we would have to take ownership of those recipes or something to make sure they stayed functional.

Great! I guess the next step is someone tries this out to make the “pretty sure” a “definitely sure” with a zip containing everything compiled into the binary and links to the file here so people can poke and prod it to help verify it works as expected.

If that works out then I think we can consider how serious we are about this idea, making this work on other platforms, deciding where to host the files, a pipeline to automate this as part of nightly and releases, etc. to make this just another supported way to get Python.

Cool! The recipes themselves are written in Python so I do not think that would be too much of an issue for the broader community. It’s possible to have recipes in a separate package repo, completely forked from the mainline, though. So there’s that. What are the obstacles here?

We also are working on setting up a cloud build farm for Spack, and I’m somewhat interested in whether Spack can be used as a portable way to build wheels with native dependencies. Matthew Brett had suggested this to me at one point, and this would be a decent POC.

I was surprised by this – I would’ve thought the goal was to build something that used native platform dependencies (at least for OpenSSL). The other libraries I could see bundling, and that’s actually easier (for Spack) than using installed ones. Is bundling OpenSSL (and other libs that benefit from OS updates) really what you want?

That’s one of my concerns :slight_smile: That would mean that people who do expect those things to work would be installing it. And “work” means IDEs know about it, means the IT department (or security-conscious team member) approves of it, and many more things that we can reasonably take care of with a system package.

You’re the only person on this thread who can actually look at what I did :wink:

Basically, the pkg-config files need hacking, and you need to specify LD_LIBRARY_PATH to run it if you --enable-shared (which I do, but I guess that’s not required). Anything that embeds the prefix path needs updating (but I think this is just ensurepip and possibly pydoc_data). I think sysconfig ignores all the path info in the bundled Makefile, so that should be fine.
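
The pkg-config part of that is mechanical: rewrite the old build prefix in each .pc file under lib/pkgconfig/ to point at wherever the tree was unpacked. A minimal sketch of the idea (the function name and the assumption that you know the original --prefix are mine, not from Steve’s actual build):

```python
from pathlib import Path

def repoint_pc_files(install_root: str, old_prefix: str) -> None:
    """Rewrite the build-time prefix embedded in pkg-config files.

    install_root: directory where the relocatable tree was unpacked.
    old_prefix:   the --prefix the tree was originally built with.
    """
    root = Path(install_root)
    for pc in root.glob("lib/pkgconfig/*.pc"):
        text = pc.read_text()
        # Naive textual replacement; enough for the prefix=/libdir=/includedir=
        # lines that CPython's python-3.x.pc contains.
        pc.write_text(text.replace(old_prefix, str(root)))
```

The same approach would cover other files with an embedded prefix (e.g. the pip/ensurepip scripts Steve mentions), though binary files would need a length-preserving patch instead of plain text replacement.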

And of course you need all the system binaries to match perfectly. :man_shrugging:

No it’s not… It’s a terrible idea from the POV of working on a particular platform and obtaining security patches. But because of the strict ABI dependency, there isn’t really a choice.

Either we bundle, or the user builds on their target platform (or downloads a platform-specific build from someone who built on the target platform).

You can embed relative RPATHs with $ORIGIN to avoid the LD_LIBRARY_PATH requirement – is there a reason not to use that?
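
For anyone unfamiliar: $ORIGIN expands at load time to the directory containing the binary itself, so the rpath can be expressed relative to the executable instead of as an absolute path. Computing the right value is just a relative-path calculation; a small illustrative sketch (you would then apply the result at link time with -Wl,-rpath,'…' or after the fact with patchelf):

```python
import os

def origin_rpath(binary: str, libdir: str) -> str:
    """Return an $ORIGIN-relative rpath so `binary` can find libs in `libdir`.

    E.g. for bin/python3 and lib/ in the same tree -> "$ORIGIN/../lib".
    """
    rel = os.path.relpath(libdir, os.path.dirname(binary))
    return "$ORIGIN" if rel == "." else os.path.join("$ORIGIN", rel)

print(origin_rpath("python3.8/bin/python3", "python3.8/lib"))
# -> $ORIGIN/../lib
```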

I’d be inclined to bundle all but OpenSSL, just to avoid the user having to install things as much as possible. Have you seen ABI breakage for any of the OpenSSL calls Python makes? If we are tying the binaries to distro major versions, I wouldn’t anticipate major issues here, but I have not been tracking it.

Right, that’s what auditwheel does. (Or maybe it uses RUNPATH, I forget, but that’s basically the same.)

We don’t tie binaries to distro major versions; we use a specific oldest-supported-distro environment to build binaries that will work across all currently-supported distros. And openssl has a history of doing stuff like breaking ABI compatibility in security hotfixes. And the distros we’re targeting have a ton of diversity, covering multiple openssl major versions. I’m not sure we can even assume that openssl is available.

There’s been a ton of work on this already for distributing wheels (there are also lots of wheels that depend on openssl, so they have the exact same issue). When I say “let’s reuse auditwheel” I’m saying we should re-use that work instead of reinventing it :slight_smile:

Also note that Windows and macOS binaries have to bundle openssl regardless, so bundling it on Linux as well doesn’t necessarily create any extra work for maintainers.

I think that works though because Python itself is using up-to-date dependencies. So even though wheels build/validate against an old/broken version of zlib (at least in manylinux1), once they get onto an actual machine they use what’s there.

Essentially, this will be a scale issue. The first hundred users might seem to be fine, but as the total number increases, the <0.1% issues will start to show up in increasingly large numbers.

(And I’d love to stop shipping OpenSSL on Windows and macOS, as those platforms have their own perfectly sufficient APIs for most of what it gets used for. But Python made OpenSSL into public API, rather than an implementation detail. When faced with that kind of choice in the future, avoid it :wink: )

There seems to be a couple of sub-threads here, but I just wanted to give a hearty +1 on this.

In addition to the uses already mentioned, this would help a lot with folks developing buildpacks for Python. For example, the Heroku buildpack currently has to compile Python from source, where this could just be a download & extract if we had relocatable interpreters for Linux.

My gut reaction is that I would expect these to be listed/hosted on https://www.python.org/downloads/, similar to https://nodejs.org/dist/latest-v10.x/ for example. Any reason why not? (I’m not sure if I agree that they should be hosted on PyPI.)

+1, there is a ton of overlap here.

Not off the top of my head, but I don’t think any of us have thought about it deeply yet.

I’m personally still waiting for someone to post a zip file of a relocatable build of CPython 3.8 for Linux and the instructions on how to reproduce it to demonstrate this is all feasible before worrying about distribution. :wink:

Not sure how useful it would be, but I did some research for my PythonUp tools a while ago. This is the closest thing I found:

I did some very basic testing and it seemed to work, but I have no idea if it still does or how well.

If my memory serves, I also asked @freakboy3742 back then about it, and he was also looking for something similar without much success :pensive:

Because of this discussion I had a look at statically linking Python 3, and successfully made an interpreter using musl. With Nixpkgs master it is already possible to build a working statically linked interpreter; however, at this point the extension modules are not yet included. I’ll have a look at https://wiki.python.org/moin/BuildStatically and Setup.dist later. Including tkinter is going to require more effort (read: fixing recipes).

Nix recipe as well as artifacts can be found at https://github.com/FRidh/static-python

Hello,

this is not exactly a zip file, but close :stuck_out_tongue:

In order to distribute Python runtimes to my colleagues I relocated the manylinux Pythons to AppImages. Those are available from the python-appimage GitHub releases. One can extract the content to a folder and run Python from it. For example as:

wget https://github.com/niess/python-appimage/releases/download/\
python3.8/python3.8.3-cp38-cp38-manylinux1_x86_64.AppImage
chmod +x python3.8.3-cp38-cp38-manylinux1_x86_64.AppImage
./python3.8.3-cp38-cp38-manylinux1_x86_64.AppImage --appimage-extract
mv squashfs-root python3.8.3
./python3.8.3/AppRun

The resulting python3.8.3 folder can be moved around. One can also pip install into it, for example:

./python3.8.3/AppRun -m pip install numpy

Alternatively one can directly run the AppImage. But it is read-only, and so it doesn’t support “system”-like pip installs.

I haven’t tested it much outside of my personal use cases. There are probably cases that would still fail.

Currently the python binary and C extension modules have their RPATH set to relative using $ORIGIN. Most shared libs that they depend on are also bundled with a relative RPATH. In addition, I am patching the shebangs of scripts installed by pip in order to use relative paths. This is done with a dedicated sitecustomize.py that registers an atexit task triggered if pip is found in the loaded packages.
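
The shebang patching described above could look roughly like this; this is a sketch of the idea, not the actual python-appimage code, and the flat bin-directory layout is an assumption:

```python
from pathlib import Path

def relocate_shebangs(bin_dir: str) -> None:
    """Replace absolute-interpreter shebangs in pip-installed scripts with
    an env-based one, so the tree keeps working after being moved."""
    for script in Path(bin_dir).iterdir():
        if not script.is_file():
            continue
        try:
            lines = script.read_text().splitlines(keepends=True)
        except UnicodeDecodeError:
            continue  # skip compiled entry points, keep only text scripts
        # Rewrite only shebangs that hard-code a python interpreter path.
        if lines and lines[0].startswith("#!") and "/python" in lines[0]:
            lines[0] = "#!/usr/bin/env python3\n"
            script.write_text("".join(lines))
```

Hooking this up via an atexit handler registered in sitecustomize.py, as described above, means the rewrite runs right after any `pip install` into the relocated tree.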
