[announce] Pybi and Posy

pip install python

Back in 2018 when Kushal first proposed PEP 582, several of us at the core devs sprint sat down and tried to brainstorm requirements for python workflow tooling that “grows with” users from novice to expert, and how well PEP 582 could meet them. (I posted notes.) One of Kushal’s main requirements, based on his experience teaching beginners, is that you should only have to install one thing to get started. So, that leaves two possibilities: either installing Python had to give you the workflow tool, or else installing the workflow tool had to give you Python.

So that got me thinking. Historically, our tools have started with the assumption that you already have a Python, and now you want to manage it. That means every tool needs to be prepared to cope with every possible way of installing/managing Python. It means a beginner-friendly workflow tool has to be part of the interpreter (the main motivation for PEP 582), even with all the limitations that imposes (consider that we’ve spent the last few years working on getting distutils out of the interpreter!). If you want to test your code on multiple Python versions a la tox/nox, then you’re on your own for figuring out how to get all those interpreters installed and then tox has to figure out how to find them. It really cramps your options.

But what if we went the other way, and uploaded CPython to PyPI, so you could pip install python? Well, OK, you couldn’t actually pip install it because pip is written in Python, but pretend we had a tool that could do this. Then Kushal’s beginners could install this one tool, and it could bootstrap Python + the packages they needed. The UI could be simple and high-level, because it wouldn’t need users to tell it which Python to use or fiddle with venvs or any of that; users could just say <tool> run myscript.py and let it worry about the details. You could ship your project to a friend on a different OS and when they ran the tool, it would figure out how to get them the same version of Python + packages that you were using, so you could collaborate. For more advanced users, like if you’re maintaining an F/OSS library, you could run tests against whatever version of Python you liked, and new contributors could automatically get the right dev setup so they can run tests, formatters, linters, etc. A beautiful dream! Too bad none of it exists.

So, well… anyway, I wrote a spec for packing a Python interpreter into a wheel-like package called a “pybi” (PYthon BInary) suitable for uploading to PyPI.

But I wasn’t sure if it was possible to build them, so then I built them for all major platforms and all recent Python releases and put them up on a CDN so everyone can download them if they want.

But then there’s the pip install python problem. So I wrote a Python installer and package manager in Rust.

Current status

A lot of you have probably seen bits and pieces of this; I’ve been poking away at it slowly for a few years. But last night I finished support for installing packages from sdists (PEP 517)[1], and that was the last major core packaging feature, so I think it’s time to make it more public and get more feedback. Also this gives me way more confidence that the pybi format is workable, since it’s working.

If you want to try it out for yourself, I currently have a simple demo program, and it should drop you into a Python REPL with several packages installed. (In theory this should work on Linux/macOS/Windows, but I’ve mostly tested on Linux. It requires a Rust install, but doesn’t require Python.)

$ git clone git@github.com:njsmith/posy
$ cd posy
$ cargo run

Try import numpy, subprocess.run(["black", "--version"]), or print(sys.path) to peek behind the curtain.

More interesting is the code that does it; here’s the main demo code:

    let db = package_db::PackageDB::new(
        &vec![
            Url::parse("https://pybi.vorpus.org")?,
            Url::parse("https://pypi.org/simple/")?,
        ],
        PROJECT_DIRS.cache_dir(),
        // PackageDB needs a place to install packages, in case it has to build some
        // sdists. Using a shared env_forest is efficient, because it means different
        // builds can share the same package installs.
        &env_forest,
        // This is the temporary directory we use for sdist builds. It's also a
        // content-addressed store, so if we want to build the same package twice (e.g.
        // first to get metadata, and then to get a wheel), we can re-use the same build
        // directory.
        &build_store,
    )?;
    // We can resolve and install for arbitrary platforms. But for this demo we'll just
    // use the platform of the machine we're running on. Or platforms, in case it
    // supports several (e.g. macOS arm64+x86_64, Windows 32bit+64bit, Linux
    // manylinux+musllinux, etc.).
    let platforms = PybiPlatform::native_platforms()?;

    // A "brief" is a user-level description of a desired environment.
    //   https://en.wikipedia.org/wiki/Brief_(architecture)
    let brief = Brief {
        // "cpython_unofficial" is the package name I used for my test pybis at
        // pybi.vorpus.org. We restrict to 3.10 or earlier because peewee upstream is
        // broken on 3.11 (it attempts to use the now-private longintrepr.h)
        python: "cpython_unofficial >= 3, < 3.11".try_into().unwrap(),
        requirements: vec![
            // Simple pure-Python package with some dependencies
            "trio".try_into().unwrap(),
            // Package with binary wheels
            "numpy".try_into().unwrap(),
            // Package with entrypoint scripts
            "black".try_into().unwrap(),
            // Package with no wheels, only sdist
            "peewee".try_into().unwrap(),
        ],
        allow_pre: AllowPre::Some(HashSet::new()),
    };
    // A "blueprint" is a set of fully-resolved package pins describing an environment,
    // like a lock-file.
    let blueprint = brief.resolve(&db, &platforms, None, &[])?;

    // And an "env" of course is an installed environment.
    let env = env_forest.get_env(&db, &blueprint, &platforms, &[])?;

    let mut cmd = std::process::Command::new("python");
    // env.env_vars() gives us the magic environment variables needed to run a command
    // in our new environment.
    cmd.envs(env.env_vars()?);

The code is new and unpolished and unoptimized; it certainly has bugs and missing features. It needs better reporting, the resolver needs better heuristics and better explanations for failures, and of course there’s no end-user CLI here. And there are nowhere near enough tests. But! It does have from-scratch, pure-Rust implementations of ~all the packaging PEPs: the PyPI simple API, fetching, caching, and unpacking wheels and sdists, parsers for METADATA and entry_points.txt, requirements, environment markers, invoking arbitrary build backends, platform tags, a full-fledged resolver (using pubgrub), etc. etc. The demo’s not much, but everything it does is real – no cheating.

One unconventional choice is that it doesn’t use traditional venvs at all, and there’s no way to modify an environment in place. In a framework where every tool starts by assuming an ambient Python environment, of course you need to virtualize those environments. But here everything starts from an invocation of posy, so instead we can work with totally declarative environments: you describe the environment you want, and then the tool constructs it on demand. If you want to change something, describe a new environment and we’ll construct that instead.

The implementation currently uses the EnvForest type you saw in the code snippet above, which is a content-addressed store full of unpacked pybis and wheels. To run a command in an environment, we check if our content-addressed store has everything we need, fill in any gaps, and then construct some magic environment variables to pull them all together. So everything is stateless/immutable – you automatically get sharing between different environments when possible, the EnvForest is just a cache so you can garbage-collect it, and if you accidentally delete something that’s still useful then no worries, it’ll be automatically reconstituted on demand. It’s quite pleasant IMO.
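To make the stateless model concrete, here's a rough Python sketch of the idea. All names here (store_key, env_vars, the bin/site-packages layout) are invented for illustration; posy's real implementation is in Rust and differs in detail.

```python
import hashlib
import os

# Hypothetical sketch of a content-addressed store of unpacked artifacts:
# each unpacked pybi/wheel lives under a directory named by its hash, and
# an "environment" is just a set of those directories glued together with
# environment variables, never mutated in place.

def store_key(artifact_bytes: bytes) -> str:
    """Key a downloaded artifact by the hash of its bytes."""
    return hashlib.sha256(artifact_bytes).hexdigest()

def env_vars(store_root: str, keys: list[str]) -> dict[str, str]:
    """Compose the "magic environment variables" for a set of store entries."""
    dirs = [os.path.join(store_root, k) for k in keys]
    return {
        # Entry-point scripts from every entry go on PATH...
        "PATH": os.pathsep.join(os.path.join(d, "bin") for d in dirs),
        # ...and package dirs go where the interpreter can find them.
        "PYTHONPATH": os.pathsep.join(
            os.path.join(d, "site-packages") for d in dirs
        ),
    }
```

Because the store is just a cache keyed by hash, deleting an entry is always safe in this model: the next composition notices the gap and re-downloads or rebuilds the artifact.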

(There is one gross workaround where we have to run a regex replacement on site.py, because of #99312, but otherwise AFAICT everything else Just Works even without venvs.)

What’s next?

I have a pretty clear vision of what kind of tool I want to use, so I’m going to keep plugging away towards that. It’s a big project though, so if folks want to help please do :-). Also I think a lot of parts are useful even if you don’t share the full vision, e.g. having official CPython builds on PyPI would be fantastic for all kinds of use cases. Anyway, what I’m imagining is:

Target audience: anyone writing Python code, from first-day beginners to seasoned open-source contributors, and the whole continuum in between.

Scope: the “bootstrap” process, of getting you from a project directory to a running Python environment, and invoking tools in that environment. So this includes stuff like pinning, managing environment descriptions (posy add somepkg), and mapping shorthands like posy test → spinning up a designated environment and running a designated command in it. But it will never include a build backend, a code formatter, linters, etc. – there are tons of great options for these and posy’s job is to help you run them, not replace them. “Posy is the UI you use to invoke Python.”
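The shorthand-mapping idea could be sketched like so; the table layout and key names below are invented for illustration, since posy hasn't specified any of this yet:

```python
# Hypothetical sketch of "command aliases": a project's config maps a short
# name to (environment name, argv), and `posy <alias>` runs that argv inside
# the named environment. None of these names come from a real posy spec.

ALIASES = {
    "test": {"env": "default", "cmd": ["pytest", "-q"]},
    "lint": {"env": "default", "cmd": ["ruff", "check", "."]},
}

def resolve_alias(name: str) -> tuple[str, list[str]]:
    """Look up which environment to spin up and which command to run in it."""
    entry = ALIASES[name]
    return entry["env"], entry["cmd"]
```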

So the basic model is that we define a project as “directory containing pyproject.toml”, and pyproject.toml contains some environment descriptions + command aliases to run in those environments. (Same basic idea as Hatch’s environments.) The simplest project just has a single default environment and no command aliases, so beginners can get started without even knowing what an environment is, e.g.:

$ posy new homework1
$ cd homework1
$ posy add jupyter numpy
$ posy run jupyter notebook

But of course we maintain lock files for environments, so as our beginner gets more advanced they can commit those to VCS, their friend can check out their project on a different OS and do posy run to get the same Python version (!) and packages, and later they can add more environments, linters, a build backend if they’re making a redistributable library, etc., so it grows with you.

And… that’s basically it? Maybe some utilities for working with environments, like a posy export command that takes an environment and renders it as a standalone directory that you can drop in a debian:slim docker container or whatever; that’d be pretty trivial to do. And initial versions will only support pybi Pythons, which in practice also means they’ll be restricted to Windows/macOS/Linux; later on maybe we could add the ability to “adopt” a system Python or interoperate with conda or whatever, who knows. But we don’t need that to get something useful.

So yeah. What do you think?


  1. it worked first try?!? Rust really is magic. ↩︎

47 Likes

This is beyond awesome. I hadn’t realised you were actively working on this. I shall be taking a look at it as soon as I can!

I’d love to help out, too. I’m a relative beginner with Rust, so it may take me a while to get up to speed, but I’m sure I can generate “dumb newbie misunderstanding” issues in the meantime :slightly_smiling_face:

5 Likes

Really nice to see this! A path per package is really the direction I think we should be going. Virtualenvs are a waste of space, especially nowadays with a lot of data-related packages that are massive.

From what I gather this is very similar to what we do in Nixpkgs, where we have each package in a different store path, and then compose environments using environment variables/wrappers/symlinks, of course also declaratively.

I’m curious to know more about how you compose the environments. I found it tricky to compose environments while maintaining an FHS layout (and thus also no top-level pyvenv.cfg), while keeping the Python environment variables (such as PYTHONPATH and PYTHONHOME) available to users, and while letting them use the composed environments to create virtualenvs not managed by Nix.

Also, correct me if I am wrong, but non-PEP 420 namespace packages can’t be composed using PYTHONPATH. Unfortunately there are still many of those around. We typically run a hook to delete the __init__.py files we think need to be deleted.

You mention everything is content addressed. Python bytecode is not entirely reproducible, hence package rebuilds can result in different outputs. How do you intend to handle this when doing a composition? For example, with Nix, we’re actually using inputs to compute the output hash. Thus, if a build is not reproducible, you can still compose because this hash won’t change, whereas the content hash would change.

2 Likes

Don’t worry, it’s my first Rust code too :slight_smile:

Since I’m unpacking and “own” the Python install, I inject a sitecustomize.py that calls site.addsitedir on all the package dirs that are included in the environment. This gives them the same treatment as site-packages, so I think .pth files should be processed and namespace packages should work. I haven’t actually checked though :slight_smile:
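As a rough illustration of that mechanism, an injected sitecustomize.py could look something like the following. The POSY_PACKAGE_DIRS variable name is made up for this sketch; the real point is that site.addsitedir() gives each directory the same treatment as site-packages, including .pth processing, which plain PYTHONPATH entries don't get.

```python
import os
import site

# Hypothetical sketch of an injected sitecustomize.py. POSY_PACKAGE_DIRS
# is an invented name, not posy's actual mechanism.

def add_env_dirs(var: str = "POSY_PACKAGE_DIRS") -> None:
    """Register every package dir in the environment via site.addsitedir()."""
    for d in os.environ.get(var, "").split(os.pathsep):
        if d:
            site.addsitedir(d)

add_env_dirs()
```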

PYTHONHOME doesn’t really make sense with these environments; maybe we should even clear it on entry? But PYTHONPATH should work fine.

For pybis and wheels downloaded from a package index, I use the same hash the index provides. For wheels built from sdists, they actually get indexed as {sdist hash}/{wheel tag}, since the same sdist can produce multiple wheels when built on different arches, and when composing an environment we need to be able to list the existing wheels to check if any of them are compatible or whether we need to build a new one.
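In Python pseudocode, that indexing scheme might look something like this (function names and the dict-as-cache are made up for illustration):

```python
import hashlib

# Hypothetical sketch of the build-cache keying described above: wheels
# built from an sdist are indexed under {sdist hash}/{wheel tag}, so one
# sdist can map to several built wheels, one per platform.

def built_wheel_key(sdist_bytes: bytes, wheel_tag: str) -> str:
    sdist_hash = hashlib.sha256(sdist_bytes).hexdigest()
    return f"{sdist_hash}/{wheel_tag}"

def find_compatible(cache: dict, sdist_bytes: bytes, supported_tags: list[str]):
    """Return a cached wheel for this sdist matching any supported tag, else None."""
    for tag in supported_tags:
        wheel = cache.get(built_wheel_key(sdist_bytes, tag))
        if wheel is not None:
            return wheel
    return None
```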

I don’t have support yet for installing direct from URLs or a local filesystem directory. tbh I’m not 100% sure how to make it work within the whole stateless/declarative framework – do you go re-read the directory every time you enter the environment to check if it’s changed? For git URLs it’s ok because you can resolve to an exact revision hash when generating the pins, and for direct http URLs maybe we can require the user to provide a hash, or mayyyyybe use HTTP caching semantics to query the server to ask if it’s changed? Anyway, yeah, I’m sure we could do more here, but it’s a can of worms and I think most people are just fetching packages from PyPI, so I punted on it for now.

3 Likes

Well, that certainly blew my mind, count me in to explore how we could make this work for conda, which needs Python as a runtime environment and could profit from a declarative environment to curtail the dreaded conda base environment. I’d be interested in particular how easy it would be to create pybi files in addition to standard conda files (which are similar in structure).

Regarding the topic of what to include in pybis as discussed in the spec, perhaps it would make sense to look into @dholth 's nonstdlib project again as part of the story?

2 Likes

Yes, I agree, that should work I think.

Users might want to create virtualenv environments using PYTHONHOME. This could be done e.g. as part of a test suite they are working on. Hence clearing it could break downstream use cases.

Okay, so the hashes are actually also input-based and not content-addressed. That is, they are based on artifacts that are consumed, instead of the artifacts that are output/generated (an installed/unpacked package). Here one needs to be careful when building with extension modules though.

Yes, this is a very difficult part. You would indeed take the whole directory and hash it. Or, if it is a VCS repo, you could clone it, but you also need to check whether it is dirty or not. Then with VCS there are, for example, tags that are not necessarily stable, which is another issue. For Nix Flakes (that’s for managing Nix expressions/recipes like we manage builds already) there is an open issue on that and a proposed solution.
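Hashing "the whole directory" could be as simple as walking it deterministically and feeding relative paths plus file contents into one digest. A sketch (a real implementation would also need to handle symlinks, permissions, and ignore rules like .git):

```python
import hashlib
import os

# Illustrative-only directory hashing: sorted traversal makes the result
# deterministic, and mixing in relative paths means renames change the hash
# just like content edits do.

def hash_tree(root: str) -> str:
    h = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place fixes os.walk's traversal order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            h.update(os.path.relpath(path, root).encode())
            with open(path, "rb") as f:
                h.update(f.read())
    return h.hexdigest()
```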

1 Like

This checks many of the boxes of what I have in mind as exposed here, so I am really happy to see this. I hope I manage to give it a try soon-ish, but in the meantime:

Why? I do not understand this position. It seems to me like you have all the pieces to provide a UX such as posy install 'https://github.com/httpie/httpie' so that I can use httpie without even having to think about things like how to install a Python interpreter and which one, do I need to create a virtual environment, do I need to use git or pip. httpie is a bit of a silly example because it is probably available in all package managers and installers (apt, homebrew, winget, etc.) but for something that is more niche and hard to install that could be really helpful.

But we also have the following, so I guess it is not completely out of scope:

  • Python interpreters in wheels (or wheel-ish artifacts) is fantastic!
    • (My wish for the future is that we have non-language-specific distribution formats so that we can npm install python or posy install nodejs or pip install gcc or anything like that)
  • “sharing between different environments”, if it is what I think it is, then it is awesome as well
  • I think I’d prefer if we had 2 tools as exposed in my post one tool to execute code for the end-user persona; and one tool to write code for the developer/redistributor/packager/etc. personas
  • I think I’d prefer if this had more Python code, I see the point of using compiled code for the bootstrapping story, but maybe it could delegate to a Python interpreter (and real venvs?) as soon as possible
  • Collaborate with Brett Cannon’s python-launcher?
  • We need a lock file format
  • Does this bring us closer to a single file distribution format for Python applications? (not installer)

Anyway, this is exciting!

3 Likes

It seems the pybi spec already contains the right skeleton for this to be extended in such a fashion (see Pybi-Paths:), but then it really just becomes a way of distributing relocatable binary files (ReBi?), and would have as little to do with Python as GCC (which is still a lot, actually…). I happen to think that this is a good direction, but for now the problem seems to be intentionally much more constrained, and that’s likely a very good thing to get anything done.

Personally I think it’s great that it doesn’t! It would make posy’s job all the harder if it needed the very Python interpreter it’s installing, and we get amazing benefits like using pubgrub by staying away from Python for something like package resolution (a huge pain point in current UX for all major installers).

Still have to look in more details (I’ve been following the repo for a while without really diving into it), but these are certainly interesting times in python packaging land!

Great job @njs!

2 Likes

These are the sorts of situations that make pip’s codebase way more complicated than you’d hope it would be - people want to “just install from the source that they are working on”, but that “just” hides a whole load of subtle but important differences between a source tree and a distribution. Packaging standards basically haven’t really tackled this yet, leaving installation from source trees as something for individual frontends (i.e., pip!) to handle.

I think that a tool which only installed from formal distributions (sdist and wheel) would still be of significant benefit, and while you may get a lot of “but what about…” comments, I think that punting on this is the right decision until the basic framework is solid. At that point, having essentially two full-featured “installer frontends”, one of which is not even written in Python, will give us a much better incentive to work on standardising whatever makes sense, and having coherent[1] opinions for the rest. (And I note from the comments @FRidh made that we should probably consider Nix as a third data point in this context, even if Nix isn’t technically a Python package installer in the sense that packaging PEPs mean it).


  1. I.e., not just “well, pip works like this so let’s assume it’s probably good enough” :slightly_smiling_face: ↩︎

2 Likes

Moving to a space in which the Python interpreter itself can change … that’s a big deal, and I think one of the main selling points of this approach. My background is mainly Physics research (though I’ve always been a Python programmer), and I’ve often found myself loading my interpreter from Conda, and then going all-in on the Python packaging ecosystem. Having posy be able to provide that interpreter better aligns with what general purpose package managers like Conda are doing, and would in my case mean I didn’t need to use Conda in most circumstances[1]


  1. Conda is a great tool. It doesn’t play well if say you want to use the Python project management tools. So, if I can avoid using it, it means I get to use hatch or pdm. ↩︎

1 Like

I’m happy to see this Posy get publicly announced![1]


This might just be my mental mindset at the moment, but I can’t shake the feeling that this is the xkcd “Standards” comic, but across two dimensions: a new alternative to the workflow tooling we have today as well as to how Python is distributed.

Like, pyflow (GitHub: David-OConnor/pyflow, “An installation and dependency system for Python”) is also a written-in-Rust-and-manages-Python-install tool which makes a bunch of different design choices which made it less portable/reusable AFAICT than this would eventually be (pyflow uses PEP 582 as its virtual-environment alternative, while Posy seems to be inventing its own scheme based on paths in an env var; pyflow uses some form of dependency cache while Posy does a proper resolve; etc.)[2].

The PyBI-based model for managing Python installations functionally proposes that we should either (a) completely change how Python is distributed by core devs, or (b) add yet-another-way to get Python that is, at least initially, workflow-tool specific. The former is a huge community-wide initiative, that we’d want to get buy-in on from CPython core. The latter is definitionally yet-another-way unless we do something to avoid that issue.

If we take away PyBI and Python management for a moment (eg: like enabling it to work with any pre-existing Python installation while being a single non-Python binary would), I can’t help but view it as an alternative to all the Python-based workflow tools we have today (Poetry’s auto-managed venv, PDM’s “PEP 582” management, similarity to Hatch’s environments is explicitly mentioned, etc). There’s different design tradeoffs to this model compared to those but, as it stands, it is fundamentally an alternative.

Am I missing something that alleviates this?


FWIW, please don’t conflate my caution with opposition or as an attempt to tone down others’ enthusiasm – if the idea is that we all want to lean into this, I’m on board.[3] I’m mainly wary of a one-more-choice situation and that is coming from a more broad view that isn’t specific to this announcement/tool. Besides, there are lots of things that I like about this model.[4]


  1. I knew about this effort prior to this announcement. :stuck_out_tongue: ↩︎

  2. I might be wrong – this is based on a very surface level understanding of both tools/models. ↩︎

  3. I’m very uncertain about how my words/actions will be interpreted; given Should PEP 704 be a PEP? - #7 by pradyunsg ↩︎

  4. I trust that @njs knows this. :slight_smile: ↩︎

7 Likes

Well… I recently spent a lot of digital ink to discuss that (a) our heavy focus on standards and (b) having multiple choices is leading to a bad UX for end users. :slight_smile:

5 Likes

This is very interesting stuff and even if this ends up being another tool in the crowded toolbox, I am very glad to finally see a new approach and vision. Existing tools work within the current constraints of Python and virtual environments, I think it’s this sort of higher-level vision that we need to really push Python’s user experience in a better direction.

I don’t think we’ll ever make the “one tool to rule them all” without shattering our assumptions and pre-conceived notions of how packaging could and should work.

11 Likes

I think this is an important feature, and the lack of it creates confusion for users, because they have to make multiple choices (where do I get Python, then where do I get some environment manager, then what do I do if I want to change the Python version). Moving to a model where the “top level installed thing” is not Python itself but an environment manager which can manage multiple Python versions seems to me like a good step forward (and in fact by coincidence I just mentioned this as a desideratum on another packaging thread).

That said, there is already at least one tool that does this, namely conda. Can you comment on how posy (either now or in a future where it gets more fleshed out) would compare to conda in terms of functionality?

3 Likes

Yeah. And I’m still digesting it - but I agree with most of what you said.

And I can see why “yet another approach” is just making the problem worse. But what I like here is that it’s specifically trying to solve for the whole project lifecycle model that @njs linked to above (and I’ve referenced many times in the past). Most tools and approaches I’ve seen either frame themselves as “beginner friendly” (stage 1 and maybe 2), or as aimed at stage 3 (deployable webapp/reusable library/standalone app) and later. And both groups assume that stages 1 and 2 - “simple scripts” and “sharing with others” are beginner workflows, not needed by more advanced users[1]. Or at least, that’s how the documentation, examples and discussions feel to me.

I’ve no idea whether this project will succeed in unifying the full lifecycle described in that document. I don’t know if it’ll make our existing problems worse. I’m concerned about the fact that it’s inventing new mechanisms for things like isolation that may or may not work. I suspect that a model based around heavy manipulation of sys.path will cause huge problems for the static typing community, for example. But I’m pleased that someone is looking at a problem which I feel like I struggled to express well enough to get the existing tools to pay attention to[2], and I’m glad that we’re still innovating, and not just fighting to consolidate what we have and deal with legacy issues.


  1. Look at scientific users struggling with “one venv per project” or “projects need to be built to be used” models to see what I mean. ↩︎

  2. Not that I want to try to claim to be some “lone voice in the wilderness” who’s the only one who sees the real issue here. ↩︎

3 Likes

I also spotted this parallel, and given my general reservations with conda for my own needs, I wondered what the difference was. For me, it’s the fact that posy will consume wheels from PyPI, and not require a separate, parallel set of builds of “everything”. I can’t speak for @njs, but that’s the key difference for me.

3 Likes

Awesome! I’m not sure if this is enough to get me actively contributing here again, but its very cool and a lovely approach to the problem space.

2 Likes

Interesting. To me this illustrates, though, that the “too many ways” packaging problem has many aspects that are not technical. The “advantage” you describe is 100% a matter of messaging, endorsement, and implicit patterns of behavior. It is only because people perceive PyPI as “the default”, because pip comes with Python, and so on, that PyPI is perceived as the “normal” repository and build system and conda is a “separate” one. What you are essentially saying is that there is no technical difference between conda and this new posy; it’s just that posy integrates with a system that has a certain social status (PyPI).

This is not to diminish the innovative work Nathaniel has done here in terms of the implementation, but from my perspective the way to build on this is not to get too attached to this implementation, or conda’s implementation, or any other implementation, but just say: “Yes, it would be a good idea if the default way that people think about Python is that you do not install Python; instead you install a program that manages Python (along with managing libraries used by it). Posy does that. Conda does that. But what is the best way to do that and to integrate that feature into a coherently designed Python-packaging utopian vision?”

1 Like

This is the first I’ve seen of this (I’m not a big user of packaging bits, barely knowing how to use build, twine and pip), so I am almost certainly way off-base here, but… Doesn’t this substitute one bootstrap problem (having a runnable Python environment) for another (having a runnable Rust environment)?

What am I missing?

1 Like

Presumably binaries of the tool will eventually be available for download, rather than needing to compile it locally.

4 Likes