That’s a question for @steve.dower
Check out the PC/layout script. That’s what we use for setting up our packages (it’s the moral equivalent of “make install” but with more options).
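For anyone who hasn’t looked at it: from a CPython source checkout it’s a command-line tool, and I’d start with --help, since the presets and include/exclude options vary between versions (the exact invocation below is my assumption of the common case, not official guidance):

```
# Hedged sketch: run the layout tool from the root of a CPython checkout on
# Windows, after building with PCbuild/build.bat. Check --help for the
# presets and options available in your version.
python PC/layout --help
```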
BTW, Windows is the trivial case here. If you search on nuget.org then you’ll find official packages containing Python that can simply be extracted. They’re on Nuget because it integrates well with Visual Studio based projects. Adding a PyPI package with just Python for Windows could be done in minutes.
The other platforms are more difficult. You would prove the feasibility of this idea by making a suitable Linux build. Windows is done, and will prove nothing.
Nice! Your builds on nuget.org for 3.5, 3.6, and 3.7 all worked for me.
I’m able to get Python 3.4.x - 3.7.x builds made on Ubuntu 18.04 to work on Ubuntu 19.04 and Debian (unsure which version; whichever comes with WSL). Note that this is not true for builds made on Ubuntu 19.04: those fail to work on each of the other two. I’m unable to build with optimizations for 3.6.9 and 3.7.4 for some reason; the former terminates with an error, and the latter takes (projected) on the order of days.
Perhaps it’s worth creating an official 90% solution, where python.org hosts Steve Dower’s binaries for Windows, plus Debian/Ubuntu binaries built on whichever OS version supports the widest range, and builds for Red Hat and Mint as well. I suppose the crux of this is “what is the use case?”. I have a specific one in mind, but perhaps this isn’t worth it if the demand isn’t there.
@David-OConnor We already maintain a whole pile of infrastructure for creating distributable Linux binaries that work on a wide range of distros: the manylinux specs (PEP 513, PEP 571, PEP 599, PEP 600), the auditwheel tool that can inspect packages to check how portable they are and rewrite them to make them self-contained, and the manylinux build images (source, images) that tie it all together.
So there’s no need to invent a new way to build Linux binaries here; we can re-use all this stuff. The challenges are:
- figure out how to build Python using the manylinux2010_x86_64 build image – this requires fetching or building some dependencies, but it should be pretty straightforward, because the docker image itself already has to do something similar to create itself, so you can peek at its source to see how it does it (a rough sketch follows this list)
- figure out how to make the resulting Python relocatable – this might be trivial, I just haven’t tried it so I don’t know
- figure out how to bundle up the interpreter and its required libraries into a single distributable package – auditwheel already has the code to do 95% of the work, but it probably needs some tweaking to run on a regular .zip instead of a wheel .zip
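For the first bullet, here’s a minimal sketch of what the build step might look like, assuming the stock quay.io/pypa/manylinux2010_x86_64 image and a plain autotools build; the prefix, version, and output path are illustrative only:

```
# Hedged sketch only: build CPython inside the manylinux2010 build image.
# Everything except the image name and the python.org source URL is an assumption.
docker run --rm -v "$PWD:/io" quay.io/pypa/manylinux2010_x86_64 bash -c '
    cd /tmp &&
    curl -sSLO https://www.python.org/ftp/python/3.8.1/Python-3.8.1.tgz &&
    tar xzf Python-3.8.1.tgz && cd Python-3.8.1 &&
    ./configure --prefix=/opt/cpython-standalone &&   # arbitrary self-contained prefix
    make -j"$(nproc)" && make altinstall &&
    tar czf /io/cpython-3.8.1-linux-x86_64.tar.gz -C /opt cpython-standalone
'
```

The result still isn’t relocatable or self-contained; that’s where the second and third bullets (relocatability plus auditwheel-style library vendoring) come in.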
I’ll take a stab at it. The Ubuntu 18.04 builds I made also work on Kali, but not on CentOS. Was able to build on CentOS with no trouble though, using the same steps.
Yes, this is how it will be. CPython is optimized to be easy to build, because it comes from a time when people only really moved source code around and not binaries. The level of customisation to worry about on most platforms is pretty high, and so precompiling something that will work is a challenge.
You probably want to start from the manylinux definitions and create a CPython build image that uses the same versions of all the libraries. That way your build should be as portable as the wheels that people publish under that name. You’ll have to tweak the makefile to put the output into a single location, though you shouldn’t have to statically link anything unless it’s not in the build image.
Any progress on this? Can I help?
Perhaps surprisingly, I’ve made something similar happen at work for “enough” platforms, so I can probably write up the extra steps needed to make things realistically portable/relocatable.
But PyPI is still the wrong place. We really need a Python-independent repository for this, and a package format that can do some pre-install checks and post-install commands to make sure it’s going to work.
Distributing a plain zip file with documentation (as we’ve done for Windows for some years) just leads to the people who don’t read documentation getting frustrated when it doesn’t work.
Turns out, apt/yum/etc. meet those requirements. So perhaps there are issues with those that should be resolved instead of trying to reimplement them?
I would be most happy if the solution is a cross-platform one that works the same everywhere, like Rust’s rustup. It would be a tremendous benefit for teachers, instructors, and documentation writers. No more “do this if you’re on Windows, or if you’re on Linux and a Debian-derivative (student: what’s a derivative?) unless you use Nix blah blah blah.” Relying on a system package manager sounds like introducing even more fragmentation problems than Python already has.
Ah! I think that’s a fairly reasonable ask – a uniform installation experience across different platforms. I don’t think PyPI needs to factor into this approach though. The relevant binaries (and other stuff) can still be on files.pythonhosted.org for all I care, but they’d not be pip installable (much like how you can’t cargo install rust==1.0).
I think this would be a really useful discussion to have on a broader forum (like python-ideas) since, as @zooba noted, the Python Packaging tooling isn’t sufficient for this use case.
Conda works the same everywhere. What can’t it do that you need?
Answering for myself - interoperate cleanly with other tools I’m familiar with (virtualenv, pip, etc) and that most documentation talks in terms of. It may be that things have improved, but that was definitely the case the last time I tried to work with conda.
I don’t have a problem with conda, but if it were the solution for everyone, we wouldn’t be distributing Python installers via python.org at all. There’s no reason we can’t improve the python.org installer experience just because conda exists.
That’s true, but the reason we do distribute installers (including “make install”) is for native-like platform integration. Which means a lack of consistency across platforms. But if you want consistency, you have to abandon native integration, which breaks the assumptions made by those other tools - even venv and distutils don’t handle non-standard installs well. I even had to contribute a few improvements to pip to make it work with the Windows Store package (as well as bundle a pip.ini with new default settings).
The logical progression here will be to create a new tool to install Python, which will necessitate wrapping/patching those other tools to work with it. But it’s not clear whether we’re looking to replace the platform installers or Docker/conda/(not venv)-style localised environments.
In fact, it’s not really clear to me that anything needs to be done here. I get that there will always be a few people with special requirements (like me), but we don’t have to shift the official releases or influence those we don’t directly control for them. We provide source code and endorse rebuilding for these people. But if a large number of users are encountering issues, we ought to look at those issues rather than start by assuming we need a reset.
The use case I had in mind when I started the thread, and that I think we really don’t serve as well as we could, is when you want to quickly and automatically install a local “scratch” copy of a specific version of Python. For example, think of tox trying to run tests against 5 different Python versions, or pipenv trying to bootstrap an environment with a specific Python version, or a CI environment where I want to grab a nightly build for $CURRENTPLATFORM and run tests. Right now these all require complicated manual intervention and expertise by humans, but there’s no inherent reason why they have to.
For these use cases, you don’t care about native-like platform integration; in fact, it’s actively undesirable. You want the Python to be self-contained and isolated, not integrated with the system. The target users are programs, not humans, so it’s more important to be consistent across platforms than to provide a familiar native experience.
Docker doesn’t solve this, because it’s (effectively) Linux-only, and requires root. (Though if we had builds like this, then they’d probably get used by folks building Docker images.) Conda is much closer, but bootstrapping a conda environment is much more complicated than just downloading a zip file and unpacking it, and since conda is a whole self-contained distribution then it’s quite complicated to do things like ship Python nightly builds.
I don’t think this would be a replacement for the current python.org installers, or for conda or distros or the Windows Store, but it would fill a gap that none of those handle well.
The closest thing to this currently is pyenv, but it’s a bunch of shell scripts so it doesn’t work on Windows, often needs to build stuff from scratch, etc. (Though again, if we had standard standalone builds, then pyenv would certainly start using them where applicable. I think the popularity of pyenv demonstrates the demand for this kind of tool.)
Actually, the closest thing is probably the nuget distributions, but these are Windows only.
The closest closest thing are the builds I have running at work that get pulled down into other builds, extracted and used (or redistributed embedded into another app). But that’s a constrained enough environment it can work. The general case is much harder.
Also, when I say conda I’m implicitly assuming just conda - more akin to the Miniconda distro than the big one. But I know that’s not a universal assumption, sorry for not clarifying.
This would be possible to integrate with conda - e.g. every night, the state of cpython is packaged into a conda package. A conda create -n test_nightly python=<version>_<hash> goes very quickly (doing the same with python=3.8 on my Windows machine takes <10 seconds**).
The knock-on problem would be that the conda index wouldn’t have any packages for the next python version yet, but that could broadly be considered in the wider scope of CFEP-5. It would need a so-called “migration” to add the nightly build to repos (perhaps as non-binding CI), so it’s a bit more resource (and volunteer-)intensive.
But barring that, maybe the golden middle would be to install a nightly python build packaged by conda, and then pip-install the remaining (or missing) packages in that environment.
** installing the following packages: ca-certificates-2019.11.28, certifi-2019.11.28, openssl-1.1.1e, pip-20.0.2, python-3.8.2, python_abi-3.8, setuptools-46.1.3, sqlite-3.30.1, vc-14.1, vs2015_runtime-14.16.27012, wheel-0.34.2, wincertstore-0.2
(note: python_abi is a recently added selector to allow conda to install either cpython or pypy, which seems to fit this subject as well…)
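To make the golden-middle idea concrete, here’s a hedged sketch of what that flow could look like; the cpython-nightly channel and the pinned version string are hypothetical placeholders, not existing packages:

```
# Hypothetical flow: nightly CPython from a conda channel, everything else via pip.
# The channel name and version string below don't exist; they only illustrate the shape.
conda create -y -n test_nightly -c cpython-nightly "python=3.9.0a4_gabc1234"
conda activate test_nightly
python -m pip install -r requirements.txt
```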
Yeah, really there are a lot of things that are almost this: pyenv, the nuget packages, the official Windows zip files on python.org, PyPy’s portable builds and nightly builds, miniconda, etc. I have actually tried using all of these at various points to solve these problems (and am currently using several of them in an ad hoc way to cover different parts of the problem). But my experience is that none of these are nearly as smooth as they could be for this use case.
For example: let’s say pipenv wanted to use miniconda to automatically fetch any python version listed in a Pipenv file. To do this, someone would need to write code to:
- figure out which miniconda installer URL is appropriate for the current platform
- download the miniconda installer
- run the miniconda installer. By default this is interactive and wants to modify your user environment, so you have to pass in special arguments to disable this (they’re different on different platforms) and to redirect the install to a temporary directory; a rough sketch follows this list. (Also, from those docs I’m not actually sure whether the Windows installer has a mode that doesn’t modify account-wide configuration?)
- now that you have a full python+miniconda install, you can invoke it to create the actual python install you wanted in the first place… if it’s actually available in the conda repos (the default repos don’t even include builds for all python point releases, never mind betas, nightlies, etc.)
- and eventually when you clean up the environment, you have to remember to delete the whole miniconda install.
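For concreteness, this is roughly the dance on Linux; the installer URL and the -b/-p batch-mode flags are the documented Miniconda ones, but treat the whole thing as an illustration of the overhead rather than recommended tooling:

```
# Roughly the sequence a tool would have to automate today (Linux case only;
# Windows and macOS need different installers and flags).
tmp="$(mktemp -d)"
curl -sSL -o "$tmp/miniconda.sh" \
    https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash "$tmp/miniconda.sh" -b -p "$tmp/miniconda"      # -b: batch mode, don't touch ~/.bashrc
"$tmp/miniconda/bin/conda" create -y -p "$tmp/py37" python=3.7
"$tmp/py37/bin/python" --version                     # the install you actually wanted
rm -rf "$tmp"                                        # remember to delete the whole miniconda install
```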
Nothing against miniconda, it’s great for its target use cases, but this is dramatically more complicated than it has to be for this use case.
Here’s a straw-man alternative:
- An “axle” is a file named like cpython-3.8.1-manylinux2010_x86_64.axle, or more generally, ${INTERPRETER}-${PEP 440 version}-${wheel-style platform tag}.axle. It’s a zip file that, when unzipped, contains a standalone python environment whose interpreter can be invoked as bin/python.
- We allow axles to be uploaded to pypi. We use the same namespace as we use for regular packages, and the cpython name is of course owned by the cpython core team, the pypy name is owned by the pypy team, etc., re-using all our current administrative systems. Basically just teaching pypi/twine/etc. that these are valid files to upload.
- When pipenv wants to bootstrap a python environment from scratch, it uses the standard version/platform resolution code it already has for wheels to find an appropriate axle, downloads it, unzips it, and then downloads all the wheels and installs them into the resulting environment.
(An axle isn’t a wheel; it’s a thing you install wheels onto.)
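And, to sketch how a tool would consume one (everything here, including the URL and the assumption that pip is bundled inside the axle, is part of the straw-man, not anything that exists today):

```
# Entirely hypothetical: bootstrap a scratch environment from an "axle".
curl -sSLO https://example.invalid/cpython-3.8.1-manylinux2010_x86_64.axle
unzip -q cpython-3.8.1-manylinux2010_x86_64.axle -d ./scratch-py38
./scratch-py38/bin/python -m pip install -r requirements.txt   # assumes pip ships in the axle
./scratch-py38/bin/python -m pytest                            # run tests with the scratch interpreter
rm -rf ./scratch-py38                                          # cleanup is just deleting a directory
```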
It’s great in theory, but once you start expanding the list of things that won’t work, you may reach a different opinion. Certainly when users start complaining that their one case fails because (e.g.) it’s not in the registry, or it can’t run GUIs, or it doesn’t upgrade properly.
I have no issue with pyenv or conda or anyone else doing this for themselves. I just don’t want to be on the hook to support it.