Those reasons pretty much exactly match mine. Plus, I've always struggled to find an easy introduction to "how to use conda", which means I always end up in a mess because I don't know how to discover/use the full range of packages available natively in conda (the channel discovery problem).
I don't believe that's as important as you might think - in my previous job I was a heavy user of numpy, pandas, matplotlib and other "data science" tools, and I never felt blocked because I didn't use conda.
Just to clarify here: are you basically saying that you consider it a requirement to specifically use pypi metadata because that is what people currently use? And if so, doesn't that mean something like "wheels can solve this problem and conda can't" is a tautology? It's just saying you can't solve the problem of "this doesn't use wheels" unless you… use wheels.
There's no doubt a certain logic in that, and it seems to be a common mindset in the software development world, but I'm very leery of it because it leads us down the path of "we can't improve things because we can't change things because we've gotten used to doing it a certain way".
I would certainly agree that package authors need to specify metadata, want to depend on other packages, and so on. But are there any actual, intrinsic features of pypi metadata specifically that make that easier or better than alternative ways of doing it?
Nathaniel is the one putting a PEP forward on how to handle this and built the prototype, and since he is focusing on the pip side of things that's where the focus is; that's my point about "doing the work". I didn't mean to suggest that no one from the conda side of the world is participating or anything.
Sure, but let's not forget that tons of work was done outside the PEP system long ago to create conda[1], which, undeniably, at this point offers orders of magnitude more functionality to end users than pybi. If only building on the PyPI system counted as "doing the work" we'd have no hope of reaching the long-dreamed-of improved integration between pip/conda/poetry/etc.
I hasten to add that I did none of this work. ↩︎
No, I think this has gotten long enough that we're losing track of the original PEP text. Maybe re-read the section about conda? I'm saying it's a requirement for upstream OSS package maintainers to use pypi metadata because that's the higher-level abstract metadata that then gets "projected down" into conda/debian/nix/whatever metadata.
Ha, whoops, thanks for redirecting me. I still don't see that that is true in a necessary sense, though, and I'm not even sure it's true in a practical sense.
On the practical side, it may be true that conda generates a wheel, but it isn't true that all the conda metadata is simply a "projection" from the pypi metadata. In particular, the dependency information probably won't be. It's perfectly possible to make a conda package where all the dependency info is specified "from scratch" in the meta.yaml. You still need a setup.py or pyproject.toml to get the build to work, but it can list no dependencies at all.
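For concreteness, here's a rough sketch of what I mean (package names and version numbers are made up for illustration):

```yaml
# Hypothetical meta.yaml: all the dependency info lives here, written
# by hand, rather than being projected down from pyproject.toml.
package:
  name: mypackage
  version: "1.2.0"

requirements:
  host:
    - python
    - pip
  run:
    - python >=3.9
    - numpy >=1.24     # named in conda's namespace, not PyPI's
    - libopenblas      # a non-Python dep that PyPI metadata can't express
```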
And if pypi-level dependencies do exist for the project, they're likely to be useless for a conda package, because the conda equivalents may have different names and may be split up in different ways (because non-Python deps can be separated). To some extent this translation can be done automatically, but often it involves a human looking at the pyproject.toml and figuring out what the appropriate conda packages are to reproduce the dependency info in conda land. So in this sense the PyPI metadata is no more special than a text-file readme like from the bad old days pre-pip, where the author would say "before installing this you'd better install that and t'other, but that's on you"; it's just something for a human to read to tell them what to do. There's nothing essential about the format or even the content[1]. The package author's intent is important, but I don't see that the pypi metadata format has special importance as a means of expressing that[2]. (I have no idea whether any of what I'm saying here is also true of adapting a pypi package to something like debian or nix, although it certainly seems like it could be.)
When I've done this, I see it as simply a nuisance that I have to write a pyproject.toml at all to get conda to build my project. As near as I can tell the only really necessary part is specifying a build backend so conda can use it to build the wheel; there doesn't actually need to be any contentful pypi-level metadata. I think even the version info there doesn't matter, because the info in meta.yaml will take precedence over it. So I don't see this as evidence of a super important role for wheels qua wheels; it's just that wheel-building has been repurposed as an intermediate step in a conda build, and that could well be replaced by something else.
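To make that concrete, the pyproject.toml in that situation can be close to empty - something like this sketch (names and versions are placeholders):

```toml
# About the bare minimum for conda-build to produce its intermediate wheel.
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
# Effectively placeholder values: meta.yaml takes precedence for the
# conda package's actual name and version.
name = "mypackage"
version = "0.0.0"
```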
On the conceptual side, I still don't think that basing something like pybi on that pypi metadata is even a good idea, because it doubles down on the limitations of that system[3]. In particular, it does not solve the larger issue of wanting to depend on arbitrary things that aren't written in Python.
As far as I can tell the PEP is basically only about installing the Python interpreter. It's true that that's probably the most important non-Python dependency[4] for many people, but it's far from the only one. Also, because it maintains the Python-first model[5], it doesn't allow authors to actually depend on pybi in the same way they would depend on a normal package. So this would let people install Python itself from pypi (using posy or whatever), but for all the other things they might want to install from pypi that aren't Python libraries, they're still out of luck.
So, all in all, I don't think it's actually the case that conda packages "derive" (or must derive) their metadata from pypi metadata; and even if it were, I don't think it's a good idea to double down on a metadata system that is completely unable to handle non-Python dependencies. What you call the "abstract" pypi metadata is, to my mind, not abstract enough, because (among other things) it conflates a Python package with its bundled non-Python dependencies. If that larger problem were solved, then pybi could be just one among many installable things that aren't Python libraries.
I don't think the wheel ecosystem itself needs to exist, just something needs to be the default. Currently that's the wheel ecosystem. One could imagine a world where instead of standardizing wheels, we made PyPI host sdists + a conda channel, and the upstream OSS package maintainers' role just ended up served by conda (or some conda-like system). I don't think that would be an inherently better or worse world, but it would represent different trade-offs than we historically made.
This PEP very much builds on our existing ecosystem of tooling to extend it[1].
However, I think that you're missing that there is a key strength here in keeping the dependency information for Python-level dependencies and system-level dependencies separate. That strength is that PyBI is an optional thing for people to use to install their Python from. The original Python packages are still wholly independent of the specific Python interpreter that they are installed into.
This drives straight into one of the downsides of Conda that was mentioned upstream: it can only support a Conda-provided Python, and it has absolutely no mechanism to install into a non-Conda Python environment. That's perfectly fine if you can dictate to your users that they can only use Conda, but most OSS developers are unable or unwilling to do that. When someone comes to me, as a Python developer, and tells me that they have a bug that is happening when running under a Python provided by, say, Debian, I need the ability to install my project into that Debian environment, targeting that Debian Python, so that I can explore, fix, and hopefully test my fix against that environment.
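Concretely, that workflow is just something like this (paths are the Debian defaults; adjust as needed):

```console
$ sudo apt install python3 python3-venv
$ /usr/bin/python3 -m venv ~/bug-repro    # a venv on top of Debian's own Python
$ ~/bug-repro/bin/pip install -e .        # my project, into that environment
```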
If Python is treated as "just another dependency", then it becomes a lot harder to support an arbitrary interpreter to provide that dependency[2].
Yes, this bifurcation means that this system is unable to install nearly as many different types of dependencies… and that's ok? We don't need the PyPA ecosystem to support every use case for everyone. If you're operating in a context where the trade-offs made to support arbitrary interpreters aren't useful, but the ability to treat everything the same is… then you should definitely use those other systems.
And TBH, I'm not really sure how I feel about the PEP. It feels like a reasonable incremental change, but it also feels like we're entering a world where we start having multiple different "types" of packages, each for their own specific use case, and maybe we would be better served by trying to unify them into a single package format that is flexible enough to satisfy multiple use cases. ↩︎
I'm sure that with enough effort we could do it, but the fact that most, or all, of the conda-like tools don't provide that functionality is, I think, a sign that doing so is perhaps more difficult than expected. ↩︎
I think "pypi metadata" is the wrong term and hints at an incorrect conceptual model (you understand all that I'm sure, but readers of the PEP may not). I'd use "Python package metadata". It's source-level metadata that is only defined in a single place as part of the Python package, and it's equally valid whether it's hosted on PyPI or taken directly from an sdist created directly from a VCS tag. Then for binaries, whether wheels or any other format, we need different metadata (also on PyPI).
Please hold your horses on this one. The PEP on filling this gap is almost ready for submission. And when that lands, it will be a significant benefit to packaging systems like conda-forge and Linux distros.
You're missing multiple things here, most importantly that the metadata in meta.yaml got there initially by (mostly) automated translation from pyproject.toml metadata. So it's definitely not the case that pyproject.toml metadata has no relevance for conda.
I hope you mean something like "in separate sections of pyproject.toml" (and yes, that seems like a good thing). If you mean that it's a key strength to not record info on system dependencies at all, then I could not disagree more - it's a huge pain.
If that's what @njs meant then that mollifies many of my objections. But I read "pypi metadata" as specifically referring to the type of metadata that pypi packages now have, and not additional stuff (like, for instance, non-Python deps).
Sounds great! Looking forward to it.
As I described in my post, that is possible, but not necessary. You can write most of the metadata (and in particular, the dependencies) directly in meta.yaml. Of course, yeah, some people generate the one from the other, but I see that as, again, due to social/marketing factors that are important but orthogonal to the tool functionality. I'm not trying to say that pyproject.toml has no relevance to conda, but rather that pyproject.toml (or a wheel, or a PyPI package) is not necessarily the "single source of truth" that @njs seemed to be suggesting.
I do not mean that it's a strength to not record them at all. Recording that information as independent pieces of metadata seems like a reasonable and positive thing. I mean that the way package managers like Conda/Apt/etc. treat all the types of dependencies as the "same", while a strength for their use cases, is a weakness for the use case where you want to be able to use different providers, some of which may not be package managers at all, for those dependencies.
The key thing here is that when you're developing a Python project, it's useful to be able to install into varying target environments, interpreters, and system-level dependencies. That means you need to be able to differentiate between "things I want to get from the system" and "things I want to get from pip et al". When all of those concerns are collapsed into a single dependency chain it becomes difficult to do that.
Yeah, I keep making up different handwavy terms here because we don't really have a standard term.
But, I will defend this one a little bit: one of the fundamental differences between dependency metadata in conda, Debian, nix, etc. is that they all need some mapping from names → projects, and that they each have their own independent namespace to specify this mapping. For the metadata that we put in sdists, PyPI and its operators define the normative package namespace. So PyPI is special for this metadata, even though yeah the metadata itself ends up in all sorts of places and lots of times you can do useful things with it without ever contacting https://pypi.org.
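One well-known project illustrates the namespace point (names below are as I recall them; double-check before relying on this):

| Namespace | Name for the same project |
|---|---|
| PyPI | `Pillow` |
| conda-forge | `pillow` |
| Debian | `python3-pil` |
| Nixpkgs | `python3Packages.pillow` |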
It'd be nice to fix that, but I admit that I'm not sure how. Maybe adding terms to the Glossary in the Python Packaging User Guide after writing a short informational document somewhere?
Sure, but that's not really all that interesting; it just means that if you think about packages being able to come from different ecosystems/providers, the pypi: part of pypi:pkgname is implicit and pkgname is the canonical name as chosen by the package authors.
Even that latter bit isn't always true, e.g. pybind11 and pybind11-global both provide a pybind11 Python package. It is mostly the "I need a pybind11 package installed" part that is generic, not the name of the providing PyPI package. From the PyPI and package manager perspective those are different, but from the package author's perspective they're not - it tries to capture the requirement that follows from "I have #include <pybind11/pybind11.h> in my code base".
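As a sketch of the difference (this is my understanding of the two packages; check their docs for details):

```console
$ pip install pybind11          # headers land inside site-packages
$ pip install pybind11-global   # headers land in the environment's include/
```

Either way the `#include <pybind11/pybind11.h>` requirement is satisfied; which PyPI name provided it is an implementation detail.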
That is true for a Linux distro or Homebrew, which lack multi-version support. It is not true for the likes of Conda/Spack/Nix, where I don't think you lose anything of importance - that approach is strictly superior I'd say (dependency-management wise), compared to using two separate package managers with implicit dependencies between them. It seems you are using Python packages that have almost no dependencies on system dependencies, so the concerns are orthogonal for you. But in general this is not true; it is simply not an orthogonal space. As an example, if different versions of numpy support different version ranges of openblas, then having that link broken is very painful. You just end up doing manually what you otherwise get from the package manager.
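A single package manager can express that coupling directly; a rough sketch in conda-style metadata (version bounds invented for illustration):

```yaml
# The solver sees the numpy<->openblas constraint and enforces it,
# instead of a human tracking it across two package managers.
requirements:
  run:
    - numpy >=1.24,<1.27
    - libopenblas >=0.3.21,<0.4
```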
I agree that the discussion about conda is a waste of energy (here at least) - it's not like we're going to discover anything new in yet another Discourse thread.
I do want to point out that your reasoning about wheels conflates two things: wheels as an intermediate binary format (which is perfectly fine), and wheels as redistributable artifacts (which cause huge amounts of pain). I think the former are necessary, the latter are debatable. No need to have that debate here, but it'd be good to improve the way you phrased this in your PEP and disentangle the two. I'd be happy to take this elsewhere and help review/edit your "Why not just use conda?" section in more detail?
It's also true for Conda/Spack/Nix afaik. If I'm developing a binding to OpenSSL, and I want to test it against a Debian environment, if my "source" is a conda package, I have to figure out how to split the dependencies that I now want Debian to provide from the dependencies I still want to get from my typical toolchain.
Not at all, I suspect you are misunderstanding. Your own "source", a Python package in VCS, would not change at all. I know that because this is what most projects I work on are already like. We have devs that use Debian, or Conda, or Docker, or Homebrew - it all has to work at the same time, because different folks have different preferences. You can always build one or more Python packages on top of a set of dependencies from any given provider.
The point is not about changing the way a Python package itself works; the point is about whether to install all your dependencies with 1 or 2 package managers. Assuming the package manager has all the versions of all the dependencies you care about, it seems clear that using 1 is better and more general than using 2. This is just the nature of dependency management - you have more complete information, and hence can do a better job. (Analogy: split your Python dependencies in half, and install the first half with pip and the second half with poetry - you get perhaps the same end result, never better, often worse than doing it all at once with pip alone.)
Just a quick note - it's been pointed out to me that this could be interpreted as me dismissing the feedback we've had from people who do find the "PyPA tools" experience frustrating. I didn't mean it like that, I was simply describing my personal experience[1].
Sorry for any confusion I may have caused, and to anyone who feels like I'm dismissing the struggles they might personally have experienced.
And yes, I know I'm not exactly an "average user" of packaging tools ↩︎
Sure, but there's also the question of what package managers you have. On Linux systems, there's always the "system package manager". On MacOS, I believe Homebrew is extremely common, but not universal. On Windows, there's nothing (Add/Remove Programs is for applications, not shared library dependencies). I'd argue that the number of people who install an "extra" package manager is very much a minority. Python users get pip/PyPI as part of the default install, so that's present, but as noted it doesn't include non-Python dependencies.
So the choice, by default, is between 1/2 (Linux and MacOS/Homebrew) or 0/1 (Windows and non-Homebrew MacOS). In the 1/2 case, yes 1 is better than 2. But also, 1 is better than 0.
I think we're making progress on making sure PyPA tools work with Linux distros and Homebrew. We're not there yet, but we have the processes in place and we're working on it.
People using Nix or Spack are likely either specialists, or are using environments managed by specialists for them (HPC being the case that immediately comes to mind). I'm going to ignore them for now, both because I have very little knowledge of them and in the interests of brevity.
And then there's conda. I don't know how Linux users/maintainers see conda, and I'd be really interested to better understand that. I guess MacOS either feels "Linux-like" (Homebrew) or "Windows-like" (non-Homebrew), but again, I'd love to get actual information here. But on Windows, my impression is that many users[1] view Conda as an application, much like RStudio, Eclipse, or Visual Studio[2], which provides a "language environment" for Python users[3]. As such, they don't think of it as a system-level package manager (Windows users don't tend to even know what a package manager is!), but more like a "plugin manager" for the application. So you use conda to install stuff for conda. Using pip feels weird and slightly wrong. Finding something is missing from conda seems like something you have to live with, not something you can address yourself. Etc.
To be clear, that's how the conda users I've worked with have perceived conda. It may not be the way people here expect or want users to view it, but in that case there may be an education gap that conda needs to look at. Or maybe not - maybe conda developers are happy with how people use conda and there's not a problem. But I think it's something that we should be aware of here, as we have a long history of misunderstanding each other, particularly around the relationship between conda and PyPA, and I think explaining our understandings, even if they seem wrong or misguided to the other parties, is a useful way of establishing some common ground.
PS This is getting quite a long way away from PEP 711. Maybe it should be split off into a separate thread? On the other hand, it's ground we've covered before, so maybe we should simply leave it at this point?
At least in the closed-source corporate "data science" world, where I worked. ↩︎
And yes, I know conda offers more than Python, but again, that's not how people see it - no-one suggests installing conda to people who want access to R, for example. ↩︎
I don't think it's entirely unrelated, because PyBIs will have the same question to answer: where do you get your external dependencies? As far as I can tell, a main difference between a conda Python and a PyBI will always be that the former has unvendored many libraries that the latter vendors. So can and will there be "conda PyBIs", which fall back on conda libraries? The reflexive answer is "no, use different environments", but how much do you bet that people will try to test e.g. all supported Python versions within one conda environment as soon as that's on the horizon?
One or the other, I'd say. There is a bit of overlap with PEP 711, but most of the recent posts indeed did not overlap too much. My personal feeling is that the Discourse format is too limited to make much more progress on the distro/conda mutual understanding - I could reply to every other sentence in your last post, but I think it'd be much more productive if we spent an hour on a video call once and had a higher-bandwidth conversation.
I'm going to say that if this side conversation goes any further I will split it, but I also agree that sparking yet another thread on this topic may not help much, so I'm hoping we can stop now and have a more productive conversation in some other way.