PEP 711: PyBI: a standard format for distributing Python Binaries

I want to voice my strong support for this type of proposal. While I’m not sure about its particular details, I am very much looking forward to something of this nature.

I have been using the PyOxidizer builds of Python (GitHub - indygreg/python-build-standalone: Produce redistributable builds of Python) as the primary Python builds on my development machine for a while now, via my experimental rye tool. While those builds are not perfect for development (for instance, they use libedit instead of readline for GPL licensing reasons), they are still so incredibly comfortable because I don’t have to compile anything on my own and can switch effortlessly between Python versions.

5 Likes

I feel that, in its current form, this proposal completely ignores the last couple of decades of prior art. There is not a single mention of how distributing Python works on downstream distributions or how this would be an improvement.

Mind you, I’m not saying that this is not an improvement; I’m pointing out that, if it is, there is no mention of why this would be an improvement in this PEP because there is no mention of the status quo at all.


From the Motivation section above:

It becomes quick and easy to try Python prereleases, pin Python versions in CI, make a temporary environment to reproduce a bug report that only happens on a specific Python point release, etc.

All these environments already have a mechanism for downloading and installing Python, so a few unanswered questions come to mind: How is a new delivery mechanism an improvement over the existing ones? Why is a new format required? Is a new package manager required to handle this package format? How does this interoperate with existing distribution package managers? How are runtime dependencies resolved?

I understand that quite a few distributions don’t have a working, vanilla (i.e., unpatched) Python. If I assume that the goal here is to address those particular distributions, there’s an obvious question that remains unanswered: why is packaging in this new package format better than shipping a native package for those distributions?


Finally, I see here a proposal for a new package format (which will then require supporting tooling around it), and I cannot help thinking that this is a perfect example of xkcd 927.

1 Like

The point is to have a consistent way to get any version of Python on any system. Linux distributions generally only offer maybe 2 Python point releases at a time, usually not the latest, and there’s a bewildering variety of ways to install them – never mind cross-building environments for system X when running on system Y, or bundling up a python environment to send to a friend!

Basically the situation is exactly analogous to wheels. And I know some people wish that pip install didn’t exist and instead everyone was forced to get python packages through their distribution, but I don’t feel like I need to spend a lot of words explaining why pip install is useful :slight_smile:

9 Likes

I think this is a bit unfair; in the recent discussion there have been a number of people who have raised concerns about the wheel approach. https://pypackaging-native.github.io is the result of discussions here [I know many people reading this are well aware :wink: ], and while looking for a link to the PEP 704 discussion I came across a question about how to share .so files between wheels.

Many people are choosing to disengage, are using tools more suited to their problems (system packaging, conda, containers), or are quietly finding workarounds to put non-Python dependencies in wheels.


I want to co-sign basically everything @BrenBarn said above

I also suspect that if you go down @indygreg ’s suggested path of shipping your own compilers and start having non-Python software in wheels, there will be pressure to put shared libraries into their own wheels (e.g. packaging libhdf5 for h5py, pytables, and netcdf to depend on), and then you are most of the way to re-writing conda.


I would say the sdists uploaded to pypi are the backbone of the Python ecosystem, not the wheels. Treating sdists and wheels as being “at the same level” is not correct. As I said in another thread, sdists are the point of truth for what a release “is”, and wheels are binary artifacts, derived from the sdist, for one (of many) binary package managers (and by historical path they happen to be hosted adjacent to the sdists).


In all of these discussions I am not sure I have a clear idea of what it is about conda that does not serve people well. Among the reasons I think I have heard:

  • wall time to get from a tag to <tool> install package working with conda-forge. But that is a cost of a central build farm [ok, public CI], and it can be solved by a local channel on top of conda-forge
  • does not work with python.org binaries. But that is because conda provides its own Python that is built consistently with all of the c-extensions.
  • the solver is slow. But that is due to trying to be “correct”, some choices about what versions to (continue to) expose, and they just switched to using a faster solver implementation
  • it is controlled by a company. But that is not true anymore
  • conda envs have to be activated. But that is because you can install scripts / environment variables to be set on activation. Some of this is basically direnv, but for environments rather than paths, and some of it is getting C libraries to behave correctly in all cases.

As my comments suggest, I am not particularly persuaded by these arguments. Are there others I am missing, or am I not giving these issues enough weight?

7 Likes

It depends on the segment of the user base you are considering. For the users[1] I deal with personally (predominantly Windows users, with no compiler, who found Python from python.org[2]) the availability of binary wheels for numpy, pandas, matplotlib etc., is the core factor. Sdists are useless for anything but pure python libraries.


  1. And myself for that matter! ↩︎

  2. Some of whom tried conda because I mentioned it as an alternative, and abandoned it because they hated it (their words, not mine) ↩︎

2 Likes

From the feedback I’ve had, and my personal experience:

  • The vast majority of Python documentation and tutorials describe using pip, venv and similar tools. Using conda means independently learning how to translate such instructions, and accepting the risk that you get something wrong in the process.
  • This may be out of date, but my recollection is that conda didn’t come pre-configured with conda-forge active. So the “out of the box” experience (important for all the people who don’t read the manual!) is suboptimal and frustrating.
  • Very subjectively, some people simply don’t like the conda UX. Personally, I don’t like activating environments (and if I do, I prefer to start a subshell with the environment activated, rather than modifying my existing shell). The conda concept of channels is obscure and frustrating for some people.

All of this can be argued as subjective, but again that’s the whole point. We cannot force a whole section of the Python community to use a new tool that they don’t like, and expect it to be a good experience. Languages like go and rust got to make that decision because they were starting from nothing. We don’t have that luxury.

Conda is great - many people swear by it and there’s no reason to believe they are wrong. But it’s not for everyone, and trying to insist that it is will fail. Creating a new tool that appeals to everyone might be possible, but I’m not sure - people are rather set in their preferences by now. And it’s a lot of work for an uncertain result. Why not just accept that multiple tools can co-exist, and work on making them work well together, and on guiding new users through the process of picking the tool that works best for them?

5 Likes

Speaking for myself, this is one thing I’m not too enthused about with Conda. It seems that if I want to make my package installable “the way everyone expects” (i.e., from conda-forge), and especially if I make this my main method of distribution, I need to tie myself to the conda-forge community and infrastructure. Speaking as the author of the thread about python-poppler-qt5 that you linked to, I might eventually contribute a python-poppler-qt5 recipe to conda-forge, but for the time being I find it a bit daunting to learn about a separate repository, get a recipe reviewed, merged, and then promoted from “staging” to conda-forge, and commit to updating it with later releases (and, as you say, wait a certain delay after a release).

1 Like

It’s very easy (and free!) to get your own channel at anaconda.org, and then your package is available as conda install -c <your channel name> <your package name>. You can even reupload builds of other packages that you need, and your builds will be preferred by anyone installing your package.
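
A minimal sketch of that workflow, in case it’s useful (the recipe path, package name, and channel name here are hypothetical, and it assumes conda-build and anaconda-client are installed):

  conda build ./recipe                        # build the package from a local recipe
  anaconda upload ./mypackage-1.0-0.tar.bz2   # upload the built artifact to your anaconda.org channel
  conda install -c my-channel mypackage       # anyone can now install it from that channel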

There is a general lack of options for Conda repositories, compared to PyPI-style repositories. But there’s certainly no reason you need to get your package into conda-forge. In my experience, packages aren’t even more discoverable in conda-forge - someone is more likely to discover new packages by browsing a smaller channel they heard about, such as pytorch or nvidia.


And I know this is getting off topic, but it seems inevitable that a (successful) PyBI approach would lead to similar patterns. The need to share packages that can trace up to a matching PyBI package is going to require certain scenarios to set up their own complete or near-complete indexes - PyTorch is probably a good example here. So I think the description of how an existing equivalent currently works is useful proof that it can work.

2 Likes

This is of course true, but it will also apply to any future “official” tools that get created. It even applies to the current tools. There are still websites out there talking about running setup.py install. People sometimes still google stuff and somehow get sent to docs pages for Python 2! Any changes that are made to Python packaging will always require doc updates and directing people away from outdated info. (And maybe, as I’ll say below, doc updates should happen even if there are no changes to the official tools.)

I won’t derail this thread by getting into the details here, but I’m very interested in them! I really would like to have that discussion about how a tool should work, and I have the sense (to my frustration) that in these various threads people are often shying away from directly stating their preferences about such matters. So I appreciate you stating yours. :slight_smile:

That said, plenty of people don’t like the UX of the existing official tools either, which is why they use alternatives like conda or poetry. So even now the existing tools do not force anyone to do anything — which means a different set of official tools would also not force anyone to do anything. The question to me is what is the feature set that will be most beneficial to the widest swath of users, and should that feature set then be adopted in the official tools. (It was mentioned in one of the threads that future surveys might get into this, and I really hope that pans out.)

No doubt, but not all the differences are subjective. Things like “conda can manage the version of Python in the environment and pip cannot” are not subjective; they are genuine, factual differences. Of course that doesn’t mean they automatically override the subjective considerations. But again, to me the question is how can we determine the best set of features, taking into account both objective differences between the tools and subjective matters about their UI style. For instance, if there were a tool that combined the pip/venv UX you like with the additional ability to manage the python version in an environment, wouldn’t that be clearly better (according to your own subjective tastes) than the current scenario?
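
To spell out that factual difference with a sketch (the package names are purely illustrative): conda resolves and installs the interpreter itself as part of the environment, whereas venv/pip can only build on an interpreter that is already installed on the system.

  conda create -n py311-env python=3.11 numpy   # the Python version is part of the solve
  conda activate py311-env

  python3.11 -m venv .venv                      # venv needs a pre-existing Python 3.11
  .venv/bin/pip install numpy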

Sounds good to me. In fact, as I’ve mentioned on other threads, I think a good deal of the problem with the current setup is the official docs don’t do that: they basically just say “pip is the tool you should use”, when in fact, as you say, for many people that is not good advice.

This also gets back to what I see as a fundamental difference in viewpoint lurking in the corners of all these discussions, namely the relative importance of tools being “official” (in some sense or other). As I’ve mentioned before, my own belief is that a huge number of people who currently use pip/venv/etc. do not actually “like them” as such; what they like is having documentation available on python.org and having a tool that comes automatically with Python. If a different set of tools appeared in those official channels, many people would happily use those. From this perspective it is natural that multiple tools would co-exist, and the question is just which of them (perhaps more than one) gets foregrounded in the official documentation and/or is a transparent part of the Python install process.

I personally don’t use conda for a few reasons:

  • Every time I’ve tried to use conda in the past, the shenanigans it does to create environments have caused subtle breakages. I assume that’s not the general experience for people using Conda, so there’s something different about my use that’s exposing it… but I’ve rarely had issues with the “standard” Python tools (and when I have, I can normally resolve them by forcing a problematic package to build from source).
  • A non-trivial portion of the time, I already have a Python environment that is being managed by something else and I need to install things into that environment. AFAIK Conda does not provide any mechanism for doing so.
  • AFAICT it’s pretty common for people using Conda to still need to use pip (or similar) to install into their Conda environments, so I still end up using the non-Conda tools anyway.

Perhaps an important thing is that I’ve never in my life installed a Numpy or a Scipy or a Pytorch for any reason other than to test a behavior of packaging tools. I’ve never personally had a reason to use pretty much any of the hard-to-build/distribute Python packages, so the benefits that Conda brings for them simply don’t matter to me.

2 Likes

I don’t think we’re disagreeing, actually :-). Sdists are the backbone of the Python ecosystem. But then my next point is: where do sdists come from? Maintainers have to create and upload them. And to do that, sdists need to be tested, and in particular they need to be tested against the dependencies described in their pypi-level metadata. Or when someone files a bug report about a bad interaction between two specific package releases, then the maintainer needs to be able to test that combination, regardless of whether that combination is distributed in (conda / debian / whatever downstream system that maintainer might prefer).

So: sdist maintainers need the ability to easily create arbitrary Python environments, based on pypi-level metadata, including arbitrary pypi packages. And that specific problem is one that wheels can solve and conda can’t, which is why wheels and related tooling are unavoidable. (And this also happens to be my personal motivation for caring about PyBIs… I want better tooling for contributors to my OSS libraries, and my OSS libraries are distributed on PyPI and use PyPI dependency metadata, so their contribution workflows have to use PyPI-based tooling.)

Of course, if wheels exist, some people with fewer constraints who could have used conda/nix/whatever might choose to use wheels instead. And if these were the only users, then we could talk about whether that’s the best choice for them, or what would need to change to let us get rid of wheels and make everyone happy with conda, etc. But the point about OSS maintainers means we can’t get rid of wheels, so that whole discussion is a waste of energy – wheels have to exist, conda will continue to exist, and we need to focus on how to improve everyone’s experience given those two facts.

(Plus for the PEP’s purposes I wanted to make the point that pybi/wheels are not just duplicative of conda – they do actually have unique benefits that conda doesn’t, even if those benefits aren’t relevant to everyone.)

2 Likes

Those reasons pretty much exactly match mine. Plus, I’ve always struggled to find an easy introduction to “how to use conda”, which means I always end up in a mess because I don’t know how to discover/use the full range of packages available natively in conda (the channel discovery problem).

I don’t believe that’s as important as you might think - in my previous job I was a heavy user of numpy, pandas, matplotlib and other “data science” tools, and I never felt blocked because I didn’t use conda.

1 Like

Just to clarify here: are you basically saying that you consider it a requirement to specifically use pypi metadata because that is what people currently use? And if so, doesn’t that mean something like “wheels can solve this problem and conda can’t” is a tautology? It’s just saying you can’t solve the problem of “this doesn’t use wheels” unless you. . . use wheels.

There’s no doubt a certain logic in that, and it seems to be a common mindset in the software development world, but I’m very leery of it because it leads us down the path of “we can’t improve things because we can’t change things because we’ve gotten used to doing it a certain way”.

I would certainly agree that package authors need to specify metadata, and want to depend on other packages, and so on. But are there any actual, intrinsic features of pypi metadata specifically that make that more possible or better than alternative conceptions of doing that?

1 Like

Nathaniel is the one putting a PEP forward on how to handle this and who built the prototype, and since he is focusing on the pip side of things, that’s where the focus is; that’s my point about “doing the work”. I didn’t mean to suggest that no one from the conda side of the world is participating, or anything like that.

1 Like

Sure, but let’s not forget that tons of work was done outside the PEP system long ago to create conda[1], which, undeniably, at this point offers orders of magnitude more functionality to end users than pybi. If only building on the PyPI system counted as “doing the work” we’d have no hope of reaching the long-dreamed-of improved integration between pip/conda/poetry/etc.


  1. I hasten to add that I did none of this work. :slight_smile: ↩︎

No, I think this has gotten long enough that we’re losing track of the original PEP text :slight_smile: Maybe re-read the section about conda? I’m saying it’s a requirement for upstream OSS package maintainers to use pypi metadata because that’s the higher-level abstract metadata that then gets “projected down” into conda/debian/nix/whatever metadata.

Ha, whoops, thanks for redirecting me. :upside_down_face: I still don’t see that that is true in a necessary sense, though, and I’m not even sure it’s true in a practical sense.

On the practical side, it may be true that conda generates a wheel, but it isn’t true that all the conda metadata is simply a “projection” from the pypi metadata. In particular, the dependency information probably won’t be. It’s perfectly possible to make a conda package where all the dependency info is specified “from scratch” in the meta.yaml. You still need a setup.py or pyproject.toml to get the build to work, but it can list no dependencies at all.

And if pypi-level dependencies do exist for the project, they’re likely to be useless for a conda package, because the conda equivalents may have different names and may be split up in different ways (because non-Python deps can be separated). To some extent this translation can be done automatically, but often it involves a human looking at the pyproject.toml and figuring out what the appropriate conda packages are to reproduce the dependency info in conda land. So in this sense the PyPI metadata is no more special than a text-file readme from the bad old days pre-pip, where the author would say “before installing this you better install that and t’other, but that’s on you”; it’s just something for a human to read to tell them what to do. There’s nothing essential about the format or even the content[1]. The package author’s intent is important, but I don’t see that the pypi metadata format has special importance as a means of expressing that[2]. (I have no idea whether any of what I’m saying here is also true of adapting a pypi package to something like debian or nix, although it certainly seems like it could be.)

When I’ve done this, I see it as simply a nuisance that I have to write a pyproject.toml at all to get conda to build my project. As near as I can tell the only really necessary part is specifying a build backend so conda can use it to build the wheel; there doesn’t actually need to be any contentful pypi-level metadata. I think even the version info there doesn’t matter, because the info in meta.yaml will take precedence over it. So I don’t see this as a super important role for wheels qua wheels; it’s just that wheel-building has been repurposed as an intermediate step in a conda build, and that could well be replaced by something else.

On the conceptual side, I still don’t think that basing something like pybi on that pypi metadata is even a good idea, because it doubles down on the limitations of that system[3]. In particular, it does not solve the larger issue of wanting to depend on arbitrary things that aren’t written in Python.

As far as I can tell the PEP is basically only about installing the Python interpreter. It’s true that that’s probably the most important non-Python dependency[4] for many people, but it’s far from the only one. Also, because it maintains the Python-first model[5], it doesn’t allow authors to actually depend on pybi in the same way they would depend on a normal package. So this would let people install Python itself from pypi (using posy or whatever), but for all the other things they might want to install from pypi that aren’t Python libraries, they’re still out of luck.

So, all in all, I don’t think it’s actually the case that conda packages “derive” (or must derive) their metadata from pypi metadata; and even if it were, I don’t think it’s a good idea to double down on a metadata system that is completely unable to handle non-Python dependencies. What you call the “abstract” pypi metadata is, to my mind, not abstract enough, because (among other things) it conflates a Python package with its bundled non-python dependencies. If that larger problem were solved, then pybi could be just one among many installable things that aren’t Python libraries.


  1. i.e., the actual package names depended on ↩︎

  2. again, apart from the fact that a lot of people use it ↩︎

  3. cogently discussed by @steve.dower here ↩︎

  4. in the sense that Python itself is not a Python library ↩︎

  5. i.e., the environment is inside Python rather than Python inside the environment ↩︎

I don’t think the wheel ecosystem itself needs to exist, just that something needs to be the default. Currently that’s the wheel ecosystem. One could imagine a world where, instead of standardizing wheels, we made PyPI host sdists + a conda channel, and the upstream OSS package maintainers’ role just ended up being served by conda (or some conda-like system). I don’t think that would be an inherently better or worse world, but it would represent different trade-offs than we historically made.

This PEP very much builds on our existing ecosystem of tooling to extend it[1].

However, I think that you’re missing that there is a key strength here in keeping the dependency information for Python-level dependencies and system-level dependencies separate. That strength is that PyBI is an optional thing for people to use to install their Python from. The original Python packages are still wholly independent of the specific Python interpreter that they are installed into.

This drives straight into one of the downsides of Conda that was mentioned upstream: it can only support a Conda-provided Python; it has absolutely no mechanism to install into a non-Conda Python environment. That’s perfectly fine if you can dictate to your users that they can only use Conda, but most OSS developers are unable or unwilling to do that. When someone comes to me, as a Python developer, and tells me that they have a bug that is happening when running under a Python provided by, say, Debian, I need the ability to install my project into that Debian environment, targeting that Debian Python, so that I can explore, fix, and hopefully test my fix against that environment.
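
As a concrete sketch of what that looks like today (the commands are illustrative and assume Debian’s python3-venv package plus a checkout of the project):

  /usr/bin/python3 -m venv repro-env       # environment built on the distro-provided interpreter
  repro-env/bin/pip install -e . pytest    # install the project (plus a test runner) into it
  repro-env/bin/python -m pytest           # reproduce the bug and test the fix against that exact Python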

If Python is treated as “just another dependency”, then it becomes a lot harder to support an arbitrary interpreter to provide that dependency [2].

Yes, this bifurcation means that this system is unable to install nearly as many different types of dependencies… and that’s ok? We don’t need the PyPA ecosystem to support every use case for everyone. If you’re operating in a context where the tradeoffs made to support arbitrary interpreters aren’t useful, but the ability to treat everything as just the same is… then you should definitely use those other systems.


  1. And TBH, I’m not really sure how I feel about the PEP. It feels like a reasonable incremental change, but it also feels like something that we’re entering a world where we start having multiple different “types” of packages, each for their own specific use case, and maybe we would be better served by trying to unify them to a single package format that is flexible enough to satisfy multiple use cases. ↩︎

  2. I’m sure that with enough effort we could do it, but the fact that most, or all, of the conda-like tools don’t provide that functionality is, I think, a sign that trying to do that is perhaps more difficult than expected. ↩︎

2 Likes

I think “pypi metadata” is the wrong term and hints at an incorrect conceptual model (you understand all that I’m sure, but readers of the PEP may not). I’d use “Python package metadata”. It’s source-level metadata that is only defined in a single place as part of the Python package, and it’s equally valid whether it’s hosted on PyPI or taken directly from an sdist created directly from a VCS tag. Then for binaries, whether wheels or any other format, we need different metadata (also on PyPI).

Please hold your horses on this one. The PEP on filling this gap is almost ready for submission. And when that lands, it will be a significant benefit to packaging systems like conda-forge and Linux distros.

You’re missing multiple things here, most importantly that the metadata in meta.yaml got there initially by (mostly) automated translation from pyproject.toml metadata. So it’s definitely not the case that pyproject.toml metadata has no relevance for conda.

I hope you mean something like “in separate sections of pyproject.toml” (and yes, that seems like a good thing). If you mean that it’s a key strength to not record info on system dependencies at all, then I could not disagree more - it’s a huge pain.

5 Likes

If that’s what @njs meant then that mollifies many of my objections. :slight_smile: But I read “pypi metadata” as specifically referring to the type of metadata that pypi packages now have, and not additional stuff (like for instance non-Python deps).

Sounds great! Looking forward to it. :slight_smile:

As I described in my post, that is possible, but not necessary. You can write most of the metadata (and in particular, the dependencies) directly in meta.yaml. Of course, yeah, some people generate the one from the other, but I see that as, again, due to social/marketing factors that are important but orthogonal to the tool functionality. I’m not trying to say that pyproject.toml has no relevance to conda, but rather that pyproject.toml (or a wheel, or a PyPI package) is not necessarily the “single source of truth” that @njs seemed to be suggesting.