PEP 711: PyBI: a standard format for distributing Python Binaries

I do not mean that it’s a strength to not record them at all. Recording that information as independent pieces of metadata seems like a reasonable and positive thing. I mean that the nature of package managers like Conda/Apt/etc, which treat all the types of dependencies as the “same”, while a strength for their use cases, is a weakness for the use case where you want to be able to use different providers, some of which may not be package managers at all, for those dependencies.

The key thing here is that when you’re developing a Python project, it’s useful to be able to install into varying target environments, with varying interpreter and system-level dependencies. That means you need to be able to differentiate between “things I want to get from the system” and “things I want to get from pip et al”. When all of those concerns are collapsed into a single dependency chain, it becomes difficult to do that.

Yeah, I keep making up different handwavy terms here because we don’t really have a standard term.

But, I will defend this one a little bit: one of the fundamental differences between dependency metadata in conda, Debian, nix, etc. is that they all need some mapping from names → projects, and that they each have their own independent namespace to specify this mapping. For the metadata that we put in sdists, PyPI and its operators define the normative package namespace. So PyPI is special for this metadata, even though yeah the metadata itself ends up in all sorts of places and lots of times you can do useful things with it without ever contacting https://pypi.org.

It’d be nice to fix that, but I admit that I’m not sure how. Maybe adding terms to Glossary - Python Packaging User Guide after writing a short informational document somewhere?

Sure, but that’s not really all that interesting, it just means that if you think about packages being able to come from different ecosystems/providers, the pypi: part of pypi:pkgname is implicit and pkgname is the canonical name as chosen by the package authors.

Even that latter bit isn’t always true, e.g. pybind11 and pybind11-global both provide a pybind11 Python package. It is mostly the “I need a pybind11 package installed” part that is generic, not the name of the providing PyPI package. From the PyPI and package manager perspective those are different, but from the package author’s perspective they’re not - the requirement tries to capture what follows from “I have #include <pybind11/pybind11.h> in my code base”.
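To make that concrete, here is a minimal sketch (not from the original post): build scripts typically satisfy the generic requirement by asking whatever provides the importable pybind11 package for its header location, without caring which distribution supplied it.

```python
# A small sketch: the generic requirement is "pybind11's headers are available",
# and build scripts usually satisfy it by asking the installed package for its
# include directory, regardless of which distribution (pybind11, pybind11-global,
# a conda package, ...) provided it.
import pybind11

print(pybind11.get_include())  # directory containing pybind11/pybind11.h
```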

That is true for a Linux distro or Homebrew, which lack multi-version support. It is not true for the likes of Conda/Spack/Nix, where I don’t think you lose anything of importance - that approach is strictly superior I’d say (dependency-management-wise), compared to using two separate package managers with implicit dependencies between them. You seem to be working with Python packages that have almost no system dependencies, so the concerns are orthogonal for you. But in general this is not true; it is simply not an orthogonal space. As an example, if different versions of numpy support different version ranges of openblas, then having that link broken is very painful. You just end up doing manually what you would otherwise get from the package manager.

3 Likes

I agree that the discussion about conda is a waste of energy (here at least) - it’s not like we’re going to discover anything new there in yet another Discourse thread.

I do want to point out that your reasoning about wheels conflates two things: wheels as an intermediate binary format (which is perfectly fine), and wheels as redistributable artifacts (which causes huge amounts of pain). I think the former are necessary, the latter are debatable. No need to have that debate here, but it’d be good to improve the way you phrased this in your PEP and disentangle the two. I’d be happy to take this elsewhere and help review/edit your “Why not just use conda?” section in more detail.

2 Likes

It’s also true for Conda/Spack/Nix afaik. If I’m developing a binding to OpenSSL and I want to test it against a Debian environment, then if my “source” is a conda package, I have to figure out how to split the dependencies I now want Debian to provide from the dependencies I still want to get from my typical toolchain.

Not at all, I suspect you are misunderstanding. Your own “source”, a Python package in VCS, would not change at all. I know that because this is what most projects I work on are already like. We have devs that use Debian, or Conda, or Docker, or Homebrew - it all has to work at the same time, because different folks have different preferences. You can always build one or more Python packages on top of a set of dependencies from any given provider.

The point is not about changing the way a Python package itself works; the point is about whether to install all your dependencies with 1 or 2 package managers. Assuming the package manager has all the versions of all the dependencies you care about, it seems clear that using 1 is better and more general than using 2. This is just the nature of dependency management - you have more complete information, and hence can do a better job. (Analogy: split your Python dependencies in half, and install the first half with pip and the second half with poetry - you perhaps get the same end result, never a better one, and often a worse one than doing it all at once with pip alone.)

2 Likes

Just a quick note - it’s been pointed out to me that this could be interpreted as me dismissing the feedback we’ve had from people who do find the “PyPA tools” experience frustrating. I didn’t mean it like that, I was simply describing my personal experience[1].

Sorry for any confusion I may have caused, and to anyone who feels like I’m dismissing the struggles they might personally have experienced.


  1. And yes, I know I’m not exactly an “average user” of packaging tools :slightly_smiling_face: ↩︎

2 Likes

Sure, but there’s also the question of what package managers you have. On Linux systems, there’s always the “system package manager”. On macOS, I believe Homebrew is extremely common, but not universal. On Windows, there’s nothing (Add/Remove Programs is for applications, not shared library dependencies). I’d argue that people who install an “extra” package manager are very much a minority. Python users get pip/PyPI as part of the default install, so that’s present, but as noted it doesn’t include non-Python dependencies.

So the default choice is between 1 and 2 package managers (Linux, and macOS with Homebrew) or between 0 and 1 (Windows, and macOS without Homebrew). In the 1-vs-2 case, yes, 1 is better than 2. But also, 1 is better than 0.

I think we’re making progress on making sure PyPA tools work with Linux distros and Homebrew. We’re not there yet, but we have the processes in place and we’re working on it.

People using Nix or Spack are likely either specialists, or are using environments managed by specialists for them (HPC being the case that immediately comes to mind). I’m going to ignore them for now, both because I have very little knowledge of them and in the interests of brevity.

And then there’s conda. I don’t know how Linux users/maintainers see conda, and I’d be really interested to better understand that. I guess macOS either feels “Linux-like” (Homebrew) or “Windows-like” (non-Homebrew), but again, I’d love to get actual information here. But on Windows, my impression is that many users[1] view conda as an application, much like RStudio, Eclipse, or Visual Studio[2], which provides a “language environment” for Python users[3]. As such, they don’t think of it as a system-level package manager (Windows users don’t tend to even know what a package manager is!), but more like a “plugin manager” for the application. So you use conda to install stuff for conda. Using pip feels weird and slightly wrong. Finding something is missing from conda seems like something you have to live with, not something you can address yourself. Etc.

To be clear, that’s how the conda users I’ve worked with have perceived conda. It may not be the way people here expect or want users to view it, but in that case there may be an education gap that conda needs to look at. Or maybe not - maybe conda developers are happy with how people use conda and there’s not a problem. But I think it’s something that we should be aware of here, as we have a long history of misunderstanding each other, particularly around the relationship between conda and PyPA, and I think explaining our understandings, even if they seem wrong or misguided to the other parties, is a useful way of establishing some common ground.

PS This is getting quite a long way away from PEP 711. Maybe it should be split off into a separate thread? On the other hand, it’s ground we’ve covered before, so maybe we should simply leave it at this point?


  1. At least in the closed-source corporate “data science” world, where I worked. ↩︎

  2. And not like nodejs or perl. ↩︎

  3. And yes, I know conda offers more than Python, but again, that’s not how people see it - no-one suggests installing conda to people who want access to R, for example. ↩︎

1 Like

I don’t think it’s entirely unrelated, because PyBIs will have the same question to answer: where do you get your external dependencies? As far as I can tell, a main difference between a conda Python and a PyBI will always be that the former has unvendored many libraries that the latter vendors. So can and will there be “conda PyBIs”, which fall back on conda libraries? The reflexive answer is “no, use different environments”, but how much do you bet that people will try and test e.g. all supported Python versions within one conda environment as soon as that’s on the horizon?

1 Like

One or the other, I’d say. There is a bit of overlap with PEP 711, but most of the recent posts indeed did not overlap too much. My personal feeling is that the Discourse format is too limited to make much more progress on the distro/conda mutual understanding - I could reply to every other sentence in your last post, but I think it’d be much more productive if we spent an hour on a video call and had a higher-bandwidth conversation.

1 Like

I’m going to say that if this side conversation goes any further I will split it, but I also agree that spawning yet another thread on this topic is unlikely to help much, so I’m hoping we can stop now and have a more productive conversation in some other way.

2 Likes

I disagree that the Python project dependency metadata can simply be “projected down”[1] to the binary artifacts generated by whichever packaging ecosystem, because the dependencies of the binary artifact are in general different from (but at least always stricter than!) what the project claims.

A concrete example is Matplotlib’s PyPy 3.8 wheels being broken because the metadata claims one minimum supported numpy (true at the API level), but the wheels were built against a newer numpy (so the wheel’s actual minimum, set by ABI, is higher), and pip/wheel tooling cannot (currently) protect the user from that error.
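As a rough illustration (assuming Matplotlib is installed), the installed metadata only exposes the API-level floor the project declared; the stricter ABI floor implied by the numpy the wheel was built against is not recorded anywhere for tooling to check:

```python
# A minimal sketch: installed wheel metadata records only the declared
# API-level numpy requirement; the stricter ABI minimum baked in at build
# time is not recorded, so neither pip nor the user can detect the mismatch.
from importlib.metadata import requires

for req in requires("matplotlib") or []:
    if req.startswith("numpy"):
        print(req)  # e.g. "numpy>=1.20", even if the wheel's ABI needs a newer numpy
```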

A second example is h5py: we support a wide range of versions of the underlying libhdf5, but currently when we publish wheels we have to pick a single version (it tends to be the latest stable). There are also a number of build-time options for libhdf5 which, again, we have to pick something for in the wheels. Users may have good reasons to care about the exact version of libhdf5 they use (or need different build-time options), but users can only install by h5py version and cannot even query which libhdf5 version a wheel was built with (short of installing it and asking h5py).
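For instance (a small illustration, not part of the original post), the only way to learn which libhdf5 a published h5py wheel carries is to install it and ask at runtime:

```python
# You can only pin the h5py version; the bundled libhdf5 version is not
# selectable or queryable up front -- it's only visible after installation.
import h5py

print(h5py.version.version)       # the h5py release, the only thing users can pin
print(h5py.version.hdf5_version)  # the libhdf5 the wheel was built/bundled with
```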


I very much agree with:

and, contra @njs, with my OSS maintainer hat on (h5py and Matplotlib[2]) I do not think redistributed binary wheels are an inevitability. Binary wheels being on PyPI is a convenience for the users / makes installs faster, but if platform-specific binary wheels were to suddenly go away or stop being produced, we would be fine.

Pushing on @dstufft 's scenario of debugging a weird problem on a particular system, if you work entirely from wheels (which bring their system dependencies vendored with them) you are not really testing against the host system, you are testing against wheels on the host platform which is probably not what you want. Going back to the h5py example, if the problem is an interaction with the libhdf5 that debian ships then installing a wheel with a vendored (and maybe mangled) version of libhdf5 is not what you want.


With the context of @indygreg’s comment about the obvious (and I suspect inevitable) scope creep, from my point of view redistributed binary wheels + this PEP are already creating an entirely new packaging ecosystem[3]. However, a lot of that work and those hard problems are being pushed out to the projects in a distributed, ad-hoc way[4] that feels coercive.

This is compounded by the difficulty we all seem to be having communicating with each other.


Sorry for posting this after people are saying to wrap up the conversation…I spent far too long on this and started before those were posted :laughing: .


  1. I suspect even with the upcoming work on including non-Python dependencies ↩︎

  2. to be clear I am speaking as me in these posts not for the projects ↩︎

  3. The effort has been on-going for a while, I’m not claiming this is the start of it. However, it is far from done, and I think it is only starting to understand and address the hard problems! ↩︎

  4. An amazing amount of technical work has been done on tooling to make things “mostly” work! manylinux is great, cibuildwheel is great, and the three different wheel-auditing tools are great ↩︎

2 Likes

I don’t want to go down any further rabbit holes, but as I’ll explain below I think some of these rabbit holes need to be gone down. Anyway, as regards the PEP, my thoughts are this:

The fragmented and confusing nature of the Python packaging ecosystem is the single most important problem with Python packaging today.[1] This is borne out by the recent survey. Every additional development or addition to that system risks making it more complex and worsening that problem.

Therefore, every decision about every packaging PEP, including this one, needs to be considered in light not just of the immediate features of the specific proposal, but in terms of how it affects the overall trajectory of the ecosystem — specifically, does it move us towards making things less confusing and less fragmented for end users. Adding this or that thing that seems cool can actually be harmful if it makes that fragmentation problem worse.

In my view, discussions about what can or cannot be done with pip, conda, poetry, etc., their relative strengths and weaknesses, why people use them, and so on, are not mere distractions from discussion of this PEP or similar proposals. It is essential that those larger conceptual issues be hashed out and resolved to determine how the fragmentation problem is going to be fixed. There is no urgent need to move forward with proposals such as this PEP. It is better to simply wait until we have a better idea of the overall direction we want to go.

From a technical perspective I think PyBI is quite cool and I don’t have any problems with it. My point is just that even the most technically amazing packaging idea in the world won’t seem like a good idea to me unless we have some long-term vision of how it will help (or at least not worsen) the fragmentation problem. This PEP doesn’t seem to make things that much more complex, but it does introduce one more way of installing Python, which is one more option that can confuse users, and that’s a cost. It also doesn’t obviate any existing tools, so there’s no corresponding savings in potential confusion elsewhere. As for benefits, the main one seems to be “it can go on PyPI” which to me does not outweigh that potential confusion.[2]

I recognize that in these discussions I speak from a position of relatively low credibility as someone who doesn’t write any packaging software. But to me the survey results (not to mention many other encounters with users in the wild) indicate that the problem does exist, and is big, even if most of the people complaining about it are people of little credibility like me. And I think that it’s vital to foreground the end-user experience, and especially to consider how it may be affected (and has been affected) by the gradual evolution of the packaging system, and to try to direct that evolution in a global way, rather than just considering individual proposals in isolation.


  1. I’d actually go further and say it’s the single biggest problem Python faces today. ↩︎

  2. And, as I mentioned before, I am leery of the whole idea of building on the existing PyPI system because I see it too as needing significant revision in order to solve the fragmentation problem. ↩︎

2 Likes

I have only skimmed the above conversation regarding the high-level/conda discussion, so apologies if I’m speaking out of turn. I just came here to say that I think this is a great proposal, and I’d love to see it approved and implemented. I had a situation today where cibuildwheel needed a cross-platform way to install a CPython 3.11 interpreter (for building Pyodide wheels), and this would be perfect.

I don’t really understand the hesitation/controversy above; from my perspective, these PyBIs are just a more generally useful version of the installers we already have today on Python.org. Even if we as a community don’t know the right way to go, the worst outcome is gridlock. We should keep moving and improving things.

Brendan covered the hesitation pretty well in the post immediately above yours, and it reads well on its own without all the discussion leading up to it.

Further (official) fracturing is not an improvement over gridlock, especially given that the thing we want to improve is the already huge amount of fracturing. If we’re going to push forward with PyBI as a canonical source of Python binaries, such that no matter where you get yours from they’re going to be the same as in the “python.org” PyBI packages, then this becomes a good way out of gridlock.[1]

Many of us see our job here as evaluating the second- and third-order effects of this proposal becoming reality. So while the first-order effect[2] is very attractive, some of the second-order effects[3] and third-order effects[4] give us reason to pause and ask whether this is the outcome we want, and what we ought to plan for in order to achieve the better effects.


  1. I don’t think this is the proposal, nor do I think it’s a good idea, but it does seem to be the way that PyBI can “solve” packaging. ↩︎

  2. Binaries are available without having to talk to your OS/vendor ↩︎

  3. Harder to distribute binaries compatible with PyBI packages and OS provided builds ↩︎

  4. Fewer binary packages are distributed and everyone needs to build from source or get a different vendor who provides a compatible set of Python packages. ↩︎

2 Likes

second-order effects: Harder to distribute binaries compatible with PyBI packages and OS provided builds

third-order effects: Fewer binary packages are distributed and everyone needs to build from source or get a different vendor who provides a compatible set of Python packages.

Thanks for these notes; these are the kind of concrete details I had missed in the conversation above. I apologise for getting into meta-discussion. I only wish to say this PEP sounds very useful to me on a first-order level.

2 Likes

I want to say something positive here though, since Nathaniel is not claiming to want to solve any of these harder second/third-order effects, I believe. The goals in the PEP abstract read well, and if PyBIs can simply be scoped as “a nicer alternative to python.org Python binaries, as well as other standalone Python binaries that are floating around”, then why not? I’d say that as long as it actually does get rid of the need for those other binaries, and doesn’t just add yet another flavor of them, it seems like a net win.

Regarding the PEP content itself, one more thing stood out to me: the large amount of special-casing for macOS universal2. It’s a “design smell”. We have to recognize that universal2 has turned out to not be the best idea, and the needs in CI and local development are going to lean towards thin PyBIs, just like there’s a large preference for thin wheels[1]. I’d like to see this PEP simplified, and the problem addressed at its root at the same time, by simply using thin macOS interpreter builds.
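As a quick illustration of that preference (a sketch using the packaging library, run on an arm64 Mac): the most-preferred platform tag is the thin arm64 one, with universal2 ranked lower, which is why thin wheels win whenever both are published.

```python
# The first tag yielded by sys_tags() is the most preferred one; on an
# arm64 Mac its platform is a thin arm64 tag, not universal2.
from packaging import tags

best = next(iter(tags.sys_tags()))
print(best.platform)  # e.g. "macosx_14_0_arm64"
```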


  1. I’m sure a few folks will disagree, but I consider this matter settled by people voting with their feet, pip always preferring thin wheels, as well as by the arguments under What’s the deal with universal2 wheels? in Add details on native packaging requirements exposed by mobile platforms by freakboy3742 · Pull Request #27 · pypackaging-native/pypackaging-native · GitHub. The initial argument was poor already, and now that the transition phase to arm64 is over, there really is no need for it anymore. ↩︎

1 Like

I mean, this is an example of just ignoring second and third-order effects :slight_smile:

The additional effects I suggest are potential results from implementing this proposal, which is why you don’t see them written in there. We don’t expect every proposal to explicitly cover every potential effect (though we do predict some - “security implications” and “how to teach this” are two), but we should certainly cover the likely impact during discussion. If we believe they’re unlikely to result, then they get dismissed. If we dispute the likelihood, they probably get mentioned and dismissed explicitly in the PEP, and then the delegate decides whether it’s reasonable or not. Or we adapt the proposal to make them less likely (or more likely, if they’re desirable effects).

But importantly, because they are second- and third-order effects, by definition they are not the target of the proposal. That would have made them first-order effects - the things the proposal says it’s going to do - rather than the things that may happen as a result.

And the question I was answering was “why the hesitation?” I answered by pointing out that people have recognised some potential ramifications of the proposal, citing an earlier post that makes the same point, and noting that we don’t have the luxury of ignoring the ramifications of proposals we accept, and so there is some hesitation. None of this is negative, it’s just how discussions work (we don’t normally get so meta about them, and unless I’m accused of being negative, I’ll say no more on this side-topic).

3 Likes

Oh definitely not! Apologies if that is how you read my phrasing. It seemed to me like the last N posts were overall veering towards the negative and towards more general pain points in Python packaging, while the actual content around PyBIs - which looks quite good - hasn’t yet gotten the attention it deserves.

6 Likes

The thing I don’t follow is why this proposal is being labelled as fragmentation. What is this fragmenting? From what I can tell, the new distribution format solves legitimate issues that are currently blocking certain progress in the ecosystem, and should be considered a strict improvement over the currently mainstream solutions. Considering this fragmentation is the exact reason for the gridlock, and the only way to resolve that involves accepting a proposal like this one.

3 Likes