PEP 711: PyBI: a standard format for distributing Python Binaries

Those reasons pretty much exactly match mine. Plus, I’ve always struggled to find an easy introduction to “how to use conda”, which means I always end up in a mess because I don’t know how to discover/use the full range of packages available natively in conda (the channel discovery problem).

I don’t believe that’s as important as you might think - in my previous job I was a heavy user of numpy, pandas, matplotlib and other “data science” tools, and I never felt blocked because I didn’t use conda.

1 Like

Just to clarify here: are you basically saying that you consider it a requirement to specifically use pypi metadata because that is what people currently use? And if so, doesn’t that mean something like “wheels can solve this problem and conda can’t” is a tautology? It’s just saying you can’t solve the problem of “this doesn’t use wheels” unless you… use wheels.

There’s no doubt a certain logic in that, and it seems to be a common mindset in the software development world, but I’m very leery of it because it leads us down the path of “we can’t improve things because we can’t change things because we’ve gotten used to doing it a certain way”.

I would certainly agree that package authors need to specify metadata, and want to depend on other packages, and so on. But are there any actual, intrinsic features of pypi metadata specifically that make that more possible or better than alternative conceptions of doing that?

1 Like

Nathaniel is the one putting a PEP forward on how to handle this and the one who built the prototype, and since he is focusing on the pip side of things, that’s where the focus is; that’s my point about “doing the work”. I didn’t mean to suggest that no one from the conda side of the world is participating, or anything like that.

1 Like

Sure, but let’s not forget that tons of work was done outside the PEP system long ago to create conda[1], which, undeniably, at this point offers orders of magnitude more functionality to end users than pybi. If only building on the PyPI system counted as “doing the work” we’d have no hope of reaching the long-dreamed-of improved integration between pip/conda/poetry/etc.


  1. I hasten to add that I did none of this work. :slight_smile: ↩︎

No, I think this has gotten long enough that we’re losing track of the original PEP text :slight_smile: Maybe re-read the section about conda? I’m saying it’s a requirement for upstream OSS package maintainers to use pypi metadata because that’s the higher-level abstract metadata that then gets “projected down” into conda/debian/nix/whatever metadata.

Ha, whoops, thanks for redirecting me. :upside_down_face: I still don’t see that that is true in a necessary sense, though, and I’m not even sure it’s true in a practical sense.

On the practical side, it may be true that conda generates a wheel, but it isn’t true that all the conda metadata is simply a “projection” from the pypi metadata. In particular, the dependency information probably won’t be. It’s perfectly possible to make a conda package where all the dependency info is specified “from scratch” in the meta.yaml. You still need a setup.py or pyproject.toml to get the build to work, but it can list no dependencies at all.
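As a rough sketch of what I mean (the package names and pins here are made up, and the exact recipe fields vary), a conda recipe can carry all of the dependency information itself:

```yaml
# meta.yaml (sketch; "mypkg" and "libfoo" are hypothetical)
package:
  name: mypkg
  version: "1.0"

build:
  script: python -m pip install . --no-deps --no-build-isolation -vv

requirements:
  host:
    - python
    - pip
    - setuptools        # whatever build backend pyproject.toml names
  run:
    - python
    - numpy >=1.24      # written here directly, not derived from pyproject.toml
    - libfoo            # a non-Python dep, split out as its own conda package
```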

And if pypi-level dependencies do exist for the project, they’re likely to be useless for a conda package, because the conda equivalents may have different names and may be split up in different ways (because non-Python deps can be separated). To some extent this translation can be done automatically, but often it involves a human looking at the pyproject.toml and figuring out what the appropriate conda packages are to reproduce the dependency info in conda land. So in this sense the PyPI metadata is no more special than a text-file readme from the bad old days pre-pip, where the author would say “before installing this you better install that and t’other, but that’s on you”; it’s just something for a human to read to tell them what to do. There’s nothing essential about the format or even the content[1]. The package author’s intent is important, but I don’t see that the pypi metadata format has special importance as a means of expressing that[2]. (I have no idea whether any of what I’m saying here is also true of adapting a pypi package to something like debian or nix, although it certainly seems like it could be.)

When I’ve done this, I see it as simply a nuisance that I have to write a pyproject.toml at all to get conda to build my project. As near as I can tell the only really necessary part is specifying a build backend so conda can use it to build the wheel; there doesn’t actually need to be any contentful pypi-level metadata. I think even the version info there doesn’t matter, because the info in meta.yaml will take precedence over it. So I don’t see this as pointing to a super-important role for wheels qua wheels; it’s just that wheel-building has been repurposed as an intermediate step in a conda build, and that could well be replaced by something else.
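Concretely, the pyproject.toml I end up writing for such a build is essentially just this (names are placeholders, and the comment about the version reflects my understanding of conda-build rather than anything authoritative):

```toml
# pyproject.toml (minimal sketch; "mypkg" is a placeholder)
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "mypkg"
version = "0.0.0"   # as far as I can tell, the version in meta.yaml is what matters for the conda package
dependencies = []   # no Python-level dependency info here; it all lives in meta.yaml
```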

On the conceptual side, I still don’t think that basing something like pybi on that pypi metadata is even a good idea, because it doubles down on the limitations of that system[3]. In particular, it does not solve the larger issue of wanting to depend on arbitrary things that aren’t written in Python.

As far as I can tell the PEP is basically only about installing the Python interpreter. It’s true that that’s probably the most important non-Python dependency[4] for many people, but it’s far from the only one. Also, because it maintains the Python-first model[5], it doesn’t allow authors to actually depend on a PyBI in the same way they would depend on a normal package. So this would let people install Python itself from pypi (using posy or whatever), but for all the other things they might want to install from pypi that aren’t Python libraries, they’re still out of luck.

So, all in all, I don’t think it’s actually the case that conda packages “derive” (or must derive) their metadata from pypi metadata; and even if it were, I don’t think it’s a good idea to double down on a metadata system that is completely unable to handle non-Python dependencies. What you call the “abstract” pypi metadata is, to my mind, not abstract enough, because (among other things) it conflates a Python package with its bundled non-python dependencies. If that larger problem were solved, then pybi could be just one among many installable things that aren’t Python libraries.


  1. i.e., the actual package names depended on ↩︎

  2. again, apart from the fact that a lot of people use it ↩︎

  3. cogently discussed by @steve.dower here ↩︎

  4. in the sense that Python itself is not a Python library ↩︎

  5. i.e., the environment is inside Python rather than Python inside the environment ↩︎

I don’t think the wheel ecosystem itself needs to exist, just something needs to be the default. Currently that’s the wheel ecosystem. One could imagine a world where instead of standardizing wheels, we made PyPI host sdists + a conda channel, and the upstream OSS package maintainer role just ended up being served by conda (or some conda-like system). I don’t think that would be an inherently better or worse world, but it would represent different trade-offs than the ones we historically made.

This PEP very much builds on our existing ecosystem of tooling to extend it[1].

However, I think that you’re missing that there is a key strength here in keeping the dependency information for Python-level dependencies and system-level dependencies separate. That strength is that PyBI is an optional thing for people to use to install their Python from. The original Python packages are still wholly independent of the specific Python interpreter that they are installed into.

This drives straight into one of the downsides of Conda that was mentioned upthread: it can only support a Conda-provided Python; it has absolutely no mechanism to install into a non-conda Python environment. That’s perfectly fine if you can dictate to your users that they can only use Conda, but most OSS developers are unable or unwilling to do that. When someone comes to me, as a Python developer, and tells me that they have a bug that is happening when running under a Python provided by, say, Debian, I need the ability to install my project into that Debian environment, targeting that Debian Python, so that I can explore, fix, and hopefully test my fix against that environment.
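As a sketch of the kind of workflow I mean (the package names and paths are just the usual Debian ones, from memory):

```shell
# Reproduce and fix a bug against the Debian-provided interpreter (sketch)
sudo apt-get install python3 python3-venv python3-pip   # Debian's own Python
python3 -m venv --system-site-packages ~/repro          # an env on top of that interpreter
~/repro/bin/pip install -e . pytest                      # install *my* project into it
~/repro/bin/python -m pytest                             # poke at the bug / verify the fix there
```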

If Python is treated as “just another dependency”, then it becomes a lot harder to support an arbitrary interpreter as the provider of that dependency[2].

Yes, this bifurcation means that this system is unable to install nearly as many different types of dependencies… and that’s ok? We don’t need the PyPA ecosystem to support every use case for everyone. If you’re operating in a context where the tradeoffs made to support arbitrary interpreters aren’t useful, but the ability to treat everything as just the same is… then you should definitely use those other systems.


  1. And TBH, I’m not really sure how I feel about the PEP. It feels like a reasonable incremental change, but it also feels like something that we’re entering a world where we start having multiple different “types” of packages, each for their own specific use case, and maybe we would be better served by trying to unify them to a single package format that is flexible enough to satisfy multiple use cases. ↩︎

  2. I’m sure that with enough effort we could do it, but the fact that most, or all, of the conda-like tools don’t provide that functionality I think is a sign that trying to do that is perhaps more difficult than expected. ↩︎

2 Likes

I think “pypi metadata” is the wrong term and hints at an incorrect conceptual model (you understand all that I’m sure, but readers of the PEP may not). I’d use “Python package metadata”. It’s source-level metadata that is only defined in a single place as part of the Python package, and it’s equally valid whether it’s hosted on PyPI or taken directly from an sdist created directly from a VCS tag. Then for binaries, whether wheels or any other format, we need different metadata (also on PyPI).

Please hold your horses on this one. The PEP on filling this gap is almost ready for submission. And when that lands, it will be a significant benefit to packaging systems like conda-forge and Linux distros.

You’re missing multiple things here, most importantly that the metadata in meta.yaml got there initially by (mostly) automated translation from pyproject.toml metadata. So it’s definitely not the case that pyproject.toml metadata has no relevance for conda.

I hope you mean something like “in separate sections of pyproject.toml” (and yes, that seems like a good thing). If you mean that it’s a key strength to not record info on system dependencies at all, then I could not disagree more - it’s a huge pain.

4 Likes

If that’s what @njs meant then that mollifies many of my objections. :slight_smile: But I read “pypi metadata” as specifically referring to the type of metadata that pypi packages now have, and not additional stuff (like for instance non-Python deps).

Sounds great! Looking forward to it. :slight_smile:

As I described in my post, that is possible, but not necessary. You can write most of the metadata (and in particular, the dependencies) directly in meta.yaml. Of course, yeah, some people generate the one from the other, but I see that as, again, due to social/marketing factors that are important but orthogonal to the tool functionality. I’m not trying to say that pyproject.toml has no relevance to conda, but rather that pyproject.toml (or a wheel, or a PyPI package) is not necessarily the “single source of truth” that @njs seemed to be suggesting.

I do not mean that it’s a strength to not record them at all. Recording that information as independent pieces of metadata seems like a reasonable and positive thing. I mean that the way package managers like Conda/Apt/etc treat all the types of dependencies as the “same”, while a strength for their use cases, is a weakness for the use case where you want to be able to use different providers, some of which may not be package managers at all, for those dependencies.

The key thing here is that when you’re developing a Python project, it’s useful to be able to install it into varying target environments, with varying interpreters and system-level dependencies. That means you need to be able to differentiate between “things I want to get from the system” and “things I want to get from pip et al”. When all of those concerns are collapsed into a single dependency chain it becomes difficult to do that.
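To make that concrete, the kind of separation I have in mind looks roughly like the sketch below; note that the [external] table is invented purely for illustration and is not an existing standard:

```toml
# Hypothetical sketch, not a real standard: Python-level and system-level
# requirements recorded separately, so different providers can satisfy them.
[project]
name = "mybinding"              # made-up project
dependencies = [
    "cffi >=1.15",              # "things I want to get from pip et al"
]

[external]                      # invented table name, for illustration only
host-requires = ["openssl >=3"] # "things I want to get from the system": apt, conda, Homebrew, ...
```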

Yeah, I keep making up different handwavy terms here because we don’t really have a standard term.

But, I will defend this one a little bit: one of the fundamental differences between dependency metadata in conda, Debian, nix, etc. is that they all need some mapping from names → projects, and that they each have their own independent namespace to specify this mapping. For the metadata that we put in sdists, PyPI and its operators define the normative package namespace. So PyPI is special for this metadata, even though yeah the metadata itself ends up in all sorts of places and lots of times you can do useful things with it without ever contacting https://pypi.org.
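For example, one and the same project ends up under a different name in each of those namespaces (the non-PyPI names here are from memory and may not be exact):

```text
PyPI (normative for sdist/wheel metadata):  numpy
conda-forge:                                numpy
Debian:                                     python3-numpy
nixpkgs:                                    python3Packages.numpy
```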

It’d be nice to fix that, but I admit that I’m not sure how. Maybe adding terms to Glossary — Python Packaging User Guide after writing a short informational document somewhere?

Sure, but that’s not really all that interesting, it just means that if you think about packages being able to come from different ecosystems/providers, the pypi: part of pypi:pkgname is implicit and pkgname is the canonical name as chosen by the package authors.

Even that latter bit isn’t always true, e.g. pybind11 and pybind11-global both provide a pybind11 Python package. It is mostly the “I need a pybind11 package installed” part that is generic, not the name of the providing PyPI package. From the PyPI and package manager perspective those are different, but from the package author’s perspective they’re not - the metadata tries to capture the requirement that follows from “I have #include <pybind11/pybind11.h> in my code base”.

That is true for a Linux distro or Homebrew, which lack multi-version support. It is not true for the likes of Conda/Spack/Nix, where I don’t think you lose anything of importance - that approach is strictly superior I’d say (dependency-management wise), compared to using two separate package managers with implicit dependencies between them. You are using Python packages that have almost no system dependencies, it seems, and the concerns are then orthogonal for you. But in general this is not true; it is simply not an orthogonal space. As an example, if different versions of numpy support different version ranges of openblas, then having that link broken is very painful. You just end up doing manually what you otherwise get from the package manager.

3 Likes

I agree that the discussion about conda is a waste of energy (here at least) - it’s not like we’re going to discover anything new there in yet another Discourse thread.

I do want to point out that your reasoning about wheels conflates two things: wheels as an intermediate binary format (which is perfectly fine), and wheels as redistributable artifacts (which cause huge amounts of pain). I think the former are necessary, the latter are debatable. No need to have that debate here, but it’d be good to improve the way you phrased this in your PEP and disentangle the two. I’d be happy to take this elsewhere and help review/edit your “Why not just use conda?” section in more detail.

2 Likes

It’s also true for Conda/Spack/Nix afaik. If I’m developing a binding to OpenSSL and want to test it against a Debian environment, and my “source” is a conda package, I have to figure out how to split the dependencies that I now want Debian to provide from the dependencies I still want to get from my typical toolchain.

Not at all, I suspect you are misunderstanding. Your own “source”, a Python package in VCS, would not change at all. I know that because this is what most projects I work on are already like. We have devs that use Debian, or Conda, or Docker, or Homebrew - it all has to work at the same time, because different folks have different preferences. You can always build one or more Python packages on top of a set of dependencies from any given provider.

The point is not about changing the way a Python package itself works, the point is about whether to install all your dependencies with 1 or 2 package managers. Assuming the package manager has all the versions of all the dependencies you care about, it seems clear that using 1 is better and more general than using 2. This is just the nature of dependency management - you have more complete information, and hence can do a better job. (analogy: split your python dependencies in half, and install the first half with pip and the second half with poetry - you get perhaps the same end result, never better, often worse than doing it all at once with pip alone).

2 Likes

Just a quick note - it’s been pointed out to me that this could be interpreted as me dismissing the feedback we’ve had from people who do find the “PyPA tools” experience frustrating. I didn’t mean it like that, I was simply describing my personal experience[1].

Sorry for any confusion I may have caused, and to anyone who feels like I’m dismissing the struggles they might personally have experienced.


  1. And yes, I know I’m not exactly an “average user” of packaging tools :slightly_smiling_face: ↩︎

2 Likes

Sure, but there’s also the question of what package managers you have. On Linux systems, there’s always the “system package manager”. On MacOS, I believe Homebrew is extremely common, but not universal. On Windows, there’s nothing (Add/Remove programs is for applications, not shared library dependencies). I’d argue that the number of people who install an “extra” package manager is very much a minority. Python users get pip/PyPI as part of the default install, so that’s present, but as noted doesn’t include non-Python dependencies.

So the choice, by default, is between 1/2 (Linux and MacOS/Homebrew) or 0/1 (Windows and non-Homebrew MacOS). In the 1/2 case, yes 1 is better than 2. But also, 1 is better than 0.

I think we’re making progress on making sure PyPA tools work with Linux distros and Homebrew. We’re not there yet, but we have the processes in place and we’re working on it.

People using Nix or Spack are likely either specialists, or are using environments managed by specialists for them (HPC being the case that immediately comes to mind). I’m going to ignore them for now, both because I have very little knowledge of them and in the interests of brevity.

And then there’s conda. I don’t know how Linux users/maintainers see conda, and I’d be really interested to better understand that. I guess MacOS either feels “Linux-like” (Homebrew) or “Windows-like” (non-Homebrew) but again, I’d love to get actual information here. But on Windows, my impression is that many users[1] view Conda as an application, much like RStudio, Eclipse, or Visual Studio[2], which provides a “language environment” for Python users[3]. As such, they don’t think of it as a system-level package manager (Windows users don’t tend to even know what a package manager is!), but more like a “plugin manager” for the application. So you use conda to install stuff for conda. Using pip feels weird and slightly wrong. Finding something is missing from conda seems like something you have to live with, not something you can address yourself. Etc.

To be clear, that’s how the conda users I’ve worked with have perceived conda. It may not be the way people here expect or want users to view it, but in that case there may be an education gap that conda needs to look at. Or maybe not - maybe conda developers are happy with how people use conda and there’s not a problem. But I think it’s something that we should be aware of here, as we have a long history of misunderstanding each other, particularly around the relationship between conda and PyPA, and I think explaining our understandings, even if they seem wrong or misguided to the other parties, is a useful way of establishing some common ground.

PS This is getting quite a long way away from PEP 711. Maybe it should be split off into a separate thread? On the other hand, it’s ground we’ve covered before, so maybe we should simply leave it at this point?


  1. At least in the closed-source corporate “data science” world, where I worked. ↩︎

  2. And not like nodejs or perl. ↩︎

  3. And yes, I know conda offers more than Python, but again, that’s not how people see it - no-one suggests installing conda to people who want access to R, for example. ↩︎

1 Like

I don’t think it’s entirely unrelated, because PyBIs will have the same question to answer: where do you get your external dependencies? As far as I can tell, a main difference between a conda Python and a PyBI will always be that the former has unvendored many libraries that the latter vendors. So can and will there be “conda PyBIs”, which fall back on conda libraries? The reflexive answer is “no, use different environments”, but how much do you bet that people will try and test e.g. all supported Python versions within one conda environment as soon as that’s on the horizon?

1 Like

One or the other, I’d say. There is a bit of overlap with PEP 711, but most of the recent posts indeed did not overlap much. My personal feeling is that the Discourse format is too limited to make much more progress on the distro/conda mutual understanding - I could reply to every other sentence in your last post, but I think it’d be much more productive if we spent an hour on a video call and had a higher-bandwidth conversation.

1 Like

I’m going to say that if this side conversation goes any farther I will split it, but I also agree that I’m not sure it will help much to spark yet another thread on this topic, so I’m hoping we can stop now and have a more productive conversation in some other way.

2 Likes