pip install python
Back in 2018 when Kushal first proposed PEP 582, several of us at the core dev sprint sat down and tried to brainstorm requirements for Python workflow tooling that “grows with” users from novice to expert, and how well PEP 582 could meet them. (I posted notes.) One of Kushal’s main requirements, based on his experience teaching beginners, is that you should only have to install one thing to get started. So that leaves two possibilities: either installing Python has to give you the workflow tool, or else installing the workflow tool has to give you Python.
So that got me thinking. Historically, our tools have started with the assumption that you already have a Python, and now you want to manage it. That means every tool needs to be prepared to cope with every possible way of installing/managing Python. It means a beginner-friendly workflow tool has to be part of the interpreter (the main motivation for PEP 582), even with all the limitations that imposes (consider that we’ve spent the last few years working on getting distutils out of the interpreter!). If you want to test your code on multiple Python versions a la tox/nox, then you’re on your own for figuring out how to get all those interpreters installed, and then tox has to figure out how to find them. It really cramps your options.
But what if we went the other way, and uploaded CPython to PyPI, so you could pip install python? Well, OK, you couldn’t actually pip install it, because pip is written in Python, but pretend we had a tool that could do this. Then Kushal’s beginners could install this one tool, and it could bootstrap Python + the packages they needed. The UI could be simple and high-level, because it wouldn’t need users to tell it which Python to use or fiddle with venvs or any of that; users could just say <tool> run myscript.py and let it worry about the details. You could ship your project to a friend on a different OS, and when they ran the tool, it would figure out how to get them the same version of Python + packages that you were using, so you could collaborate. For more advanced users, like if you’re maintaining a F/OSS library, you could run tests against whatever version of Python you liked, and new contributors could automatically get the right dev setup so they can run tests, formatters, linters, etc. A beautiful dream! Too bad none of it exists.
So, well… anyway, I wrote a spec for packing a Python interpreter into a wheel-like package called a “pybi” (PYthon BInary) suitable for uploading to PyPI.
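To give a flavor: pybi names and tags follow the same conventions as wheels (distribution, version, platform tag), so a Linux build might be named something like this (an illustrative filename, not a specific real artifact):

cpython_unofficial-3.10.8-manylinux_2_17_x86_64.pybi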
But I wasn’t sure if it was possible to build them, so then I built them for all major platforms and all recent Python releases and put them up on a CDN so everyone can download them if they want.
But then there’s the pip install python problem. So I wrote a Python installer and package manager in Rust.
Current status
A lot of you have probably seen bits and pieces of this; I’ve been poking away at it slowly for a few years. But last night I finished support for installing packages from sdists (PEP 517)[1], and that was the last major core packaging feature, so I think it’s time to make it more public and get more feedback. Also, this gives me way more confidence that the pybi format is workable, since it’s now demonstrably working.
If you want to try it out for yourself, I currently have a simple demo program; running it should drop you into a Python REPL with several packages installed. (In theory this should work on Linux/macOS/Windows, but I’ve mostly tested on Linux. It requires a Rust install, but doesn’t require Python.)
$ git clone git@github.com:njsmith/posy
$ cd posy
$ cargo run
Try import numpy, subprocess.run(["black", "--version"]), or print(sys.path) to peek behind the curtain.
More interesting is the code that does it; here’s the main demo code:
let db = package_db::PackageDB::new(
    &vec![
        Url::parse("https://pybi.vorpus.org")?,
        Url::parse("https://pypi.org/simple/")?,
    ],
    PROJECT_DIRS.cache_dir(),
    // PackageDB needs a place to install packages, in case it has to build some
    // sdists. Using a shared env_forest is efficient, because it means different
    // builds can share the same package installs.
    &env_forest,
    // This is the temporary directory we use for sdist builds. It's also a
    // content-addressed store, so if we want to build the same package twice (e.g.
    // first to get metadata, and then to get a wheel), we can re-use the same build
    // directory.
    &build_store,
)?;
// We can resolve and install for arbitrary platforms. But for this demo we'll just
// use the platform of the machine we're running on. Or platforms, in case it
// supports several (e.g. macOS arm64+x86_64, Windows 32bit+64bit, Linux
// manylinux+musllinux, etc.).
let platforms = PybiPlatform::native_platforms()?;
// A "brief" is a user-level description of a desired environment.
// https://en.wikipedia.org/wiki/Brief_(architecture)
let brief = Brief {
    // "cpython_unofficial" is the package name I used for my test pybis at
    // pybi.vorpus.org. We restrict to 3.10 or earlier because peewee upstream is
    // broken on 3.11 (it attempts to use the now-private longintrepr.h)
    python: "cpython_unofficial >= 3, < 3.11".try_into().unwrap(),
    requirements: vec![
        // Simple pure-Python package with some dependencies
        "trio".try_into().unwrap(),
        // Package with binary wheels
        "numpy".try_into().unwrap(),
        // Package with entrypoint scripts
        "black".try_into().unwrap(),
        // Package with no wheels, only sdist
        "peewee".try_into().unwrap(),
    ],
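    // i.e., don't allow pre-releases, except for packages explicitly named in
    // this set (none here).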
    allow_pre: AllowPre::Some(HashSet::new()),
};
// A "blueprint" is a set of fully-resolved package pins describing an environment,
// like a lock-file.
let blueprint = brief.resolve(&db, &platforms, None, &[])?;
// And an "env" of course is an installed environment.
let env = env_forest.get_env(&db, &blueprint, &platforms, &[])?;
let mut cmd = std::process::Command::new("python");
// env.env_vars() gives us the magic environment variables needed to run a command
// in our new environment.
cmd.envs(env.env_vars()?);
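The one step not shown above is actually launching the process; a minimal sketch of that last step, using the standard std::process API (not necessarily exactly what the demo binary does):

// The env vars above make the bare name "python" resolve to the interpreter
// in our freshly-constructed environment.
let status = cmd.status()?;
std::process::exit(status.code().unwrap_or(1));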
The code is new and unpolished and unoptimized; it certainly has bugs and missing features. It needs better reporting, the resolver needs better heuristics and better explanations for failures, and of course there’s no end-user CLI here. And there are nowhere near enough tests. But! It does have from-scratch, pure-Rust implementations of ~all the packaging PEPs: the PyPI simple API, fetching, caching, and unpacking wheels and sdists, parsers for METADATA and entry_points.txt, requirements, environment markers, invoking arbitrary build backends, platform tags, a full-fledged resolver (using pubgrub), etc. etc. The demo’s not much, but everything it does is real – no cheating.
One unconventional choice is that it doesn’t use traditional venvs at all, and there’s no way to modify an environment in-place. In a framework where every tool starts by assuming an ambient Python environment, of course you need to virtualize those environments. But here everything starts from an invocation of posy, so instead we can work with totally declarative environments: you describe the environment you want, and then the tool constructs it on demand. If you want to change something, describe a new environment and we’ll construct that instead.
The implementation currently uses the EnvForest type you saw in the code snippet above, which is a content-addressed store full of unpacked pybis and wheels. To run a command in an environment, we check if our content-addressed store has everything we need, fill in any gaps, and then construct some magic environment variables to pull them all together. So everything is stateless/immutable – you automatically get sharing between different environments when possible, the EnvForest is just a cache so you can garbage-collect it, and if you accidentally delete something that’s still useful then no worries, it’ll be automatically reconstituted on demand. It’s quite pleasant IMO.
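To make that concrete, here’s a toy sketch of the content-addressed-store idea – this is not posy’s actual code, the names (Store, get_or_unpack) are made up, and a real store would use a cryptographic hash rather than DefaultHasher:

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};

struct Store {
    root: PathBuf,
}

impl Store {
    /// Return the unpacked directory for an artifact, creating it on demand.
    /// Because the path is a pure function of the artifact's contents, this is
    /// idempotent: delete the directory and the next call re-creates it.
    fn get_or_unpack(&self, artifact_bytes: &[u8]) -> std::io::Result<PathBuf> {
        let mut hasher = DefaultHasher::new();
        artifact_bytes.hash(&mut hasher);
        let dir = self.root.join(format!("{:016x}", hasher.finish()));
        if !dir.exists() {
            std::fs::create_dir_all(&dir)?;
            // ... unpack the pybi/wheel into `dir` here ...
        }
        Ok(dir)
    }
}

/// The "magic environment variables": point the child process at the unpacked
/// directories instead of mutating any shared environment. (Unix-style ':'
/// separators for illustration; the real variables posy sets aren't shown here.)
fn env_vars(pybi_dir: &Path, wheel_dirs: &[PathBuf]) -> Vec<(String, String)> {
    let pythonpath = wheel_dirs
        .iter()
        .map(|p| p.display().to_string())
        .collect::<Vec<_>>()
        .join(":");
    vec![
        ("PATH".to_string(), format!("{}/bin", pybi_dir.display())),
        ("PYTHONPATH".to_string(), pythonpath),
    ]
}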
(There is one gross workaround where we have to run a regex replacement on site.py, because of #99312, but otherwise AFAICT everything else Just Works even without venvs.)
What’s next?
I have a pretty clear vision of what kind of tool I want to use, so I’m going to keep plugging away towards that. It’s a big project though, so if folks want to help, please do :-). Also, I think a lot of parts are useful even if you don’t share the full vision, e.g. having official CPython builds on PyPI would be fantastic for all kinds of use cases. Anyway, what I’m imagining is:
Target audience: anyone writing Python code, from first-day beginners to seasoned open-source contributors, and the continuum between.
Scope: the “bootstrap” process, of getting you from a project directory to a running Python environment, and invoking tools in that environment. So this includes stuff like pinning, managing environment descriptions (posy add somepkg), and mapping shorthands like posy test → spinning up a designated environment and running a designated command in it. But it will never include a build backend, a code formatter, linters, etc. – there are tons of great options for these, and posy’s job is to help you run them, not replace them. “Posy is the UI you use to invoke Python.”
So the basic model is that we define a project as “a directory containing pyproject.toml”, and pyproject.toml contains some environment descriptions + command aliases to run in those environments. (Same basic idea as Hatch’s environments.) The simplest project just has a single default environment and no command aliases, so beginners can get started without even knowing what an environment is, e.g.:
$ posy new homework1
$ cd homework1
$ posy add jupyter numpy
$ posy run jupyter notebook
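After those commands, the project’s pyproject.toml might contain something like this – purely hypothetical, the actual table names and schema aren’t settled:

# hypothetical sketch, not a final schema
[tool.posy.envs.default]
python = "cpython_unofficial >= 3"
requirements = ["jupyter", "numpy"]

[tool.posy.aliases]
# e.g. `posy test` → run this command in this environment
test = { env = "default", cmd = "pytest" }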
But of course we maintain lock files for environments, so as our beginner gets more advanced they can commit those to VCS, their friend can check out their project on a different OS and do posy run to get the same Python version (!) and packages, and later they can add more environments, linters, a build backend if they’re making a redistributable library, etc., so it grows with you.
And… that’s basically it? Maybe some utilities for working with environments, e.g. a posy export command that takes an environment and renders it as a standalone directory that you can drop into a debian:slim docker container or whatever; that’d be pretty trivial to do. And initial versions will only support pybi Pythons, which in practice also means they’ll be restricted to Windows/macOS/Linux; later on maybe we could add the ability to “adopt” a system Python or interoperate with conda or whatever, who knows. But we don’t need that to get something useful.
So yeah. What do you think?
[1] It worked first try?!? Rust really is magic.