Wanting a singular packaging tool/vision

johnthagen · November 16, 2022, 2:50am

Just wanted to second what Steve said. In whatever way it is possible, rallying towards providing the kind of unified tooling experience Rust has would have huge benefits for the Python community. The kind of unified, cross platform experience the Rust team has managed with rustup/cargo/rustc that bootstraps so well would be great.

For Python it could be creating some kind of cohesive vision to rally around a central, official, out-of-the-box workflow that provided a similar set of cross platform tooling features (easy multi-interpreter management, pipx-like global installations, Poetry-like locking project dependencies, black-like formatter, etc.).

There are so many great people working on projects in these areas that perhaps the SC could help focus the community to reduce duplication of effort and have an easier onboarding experience for new developers looking to try out Python.

A concrete example of what this could look like is posted here:

Wanting a singular packaging tool/vision - #25 by johnthagen

pf_moore · November 16, 2022, 11:19am

This is a pretty significant change in scope for the SC and should be discussed more visibly than on Brett’s nomination thread. So if people want to discuss expanding the SC’s role to include packaging (or any other area) can it be moved to its own thread (and cross-linked to the packaging category if it’s going to affect them)?

Without wishing to sound defensive, what do you think the PyPA have been trying to do all this time? Simply dumping the problem on the SC isn’t going to make a huge amount of difference by itself, it will just add work for them.

Yes, the SC has an authority that the PyPA maybe doesn’t (we’ve always been uncomfortable with the “A” standing for “Authority”) but an “official vision” won’t directly make anything happen. For a much smaller example, look at the stdlib venv module. That is the “official vision” on how to create virtual environments, but virtualenv still exists and thrives, conda has its own environment creation mechanism, etc.
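For readers who have not used the stdlib route mentioned above, here is a minimal sketch of creating an environment with the `venv` module from Python. The helper name `create_env` and the directory name `demo-env` are made up for illustration, and `with_pip=False` is chosen only to keep the example fast and offline.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_env(target: Path) -> Path:
    """Create a virtual environment at `target` and return its interpreter path.

    `create_env` is a hypothetical helper for this example, not a stdlib API.
    """
    # with_pip=False skips bootstrapping pip, so no network access is needed.
    venv.create(target, with_pip=False)
    # The interpreter lives in Scripts\ on Windows and bin/ elsewhere.
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    exe = "python.exe" if sys.platform == "win32" else "python"
    return target / bin_dir / exe


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "demo-env"
        python = create_env(env_dir)
        # pyvenv.cfg is the marker file that identifies a virtual environment.
        print(python, (env_dir / "pyvenv.cfg").exists())
```

The command-line equivalent is `python -m venv demo-env`. virtualenv and conda each offer their own commands for the same job, which is the point being made: an official option does not displace the alternatives.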

Anyway, as I say, this should be for another thread.

h-vetinari · November 16, 2022, 11:19am

Python has a way, way, WAY harder context here than rust^[1], since it exists as a glue language to just about everything. So without a comprehensive and cross-platform answer how to deal with non-python artefacts (building, distribution) that live in python projects - anything from C/C++/Fortran/Rust/Java/NodeJS/CUDA… - a cargo-like experience is very far away. And doubly so for a solution that doesn’t just hard-copy binary artefacts (e.g. openblas) into wheels, redundantly across packages.

Poetry is well-loved for its UX, but suffers from the same issues as all wheel-based installers – i.e. no comprehensive & portable solution how to deal with non-python dependencies. Conda/mamba has the most complete (though certainly not perfect) answer to this set of problems, but doesn’t get much recognition (especially on DPO), for reasons that elude me - even though it’s effectively been offered for that purpose.

Perhaps this is because packaging is an unglamorous task with lots of responsibility, and consequently tends to attract few people, and those that do work on the topic regardless tend to be people who don’t require much recognition, but have lots of opinions about how things should be done (witness the proliferation of python build tools/frameworks, or almost any packaging PEP discussion).

Don’t get me wrong, I’d love to see python packaging make a big leap towards a cargo-like experience, but it’d IMO need a project that’s at least as organised/funded as faster-cpython, for a likely even larger body of work^[2], in a fractious ecosystem, for – let’s be honest – a topic that’s way less sexy (=harder to sell) than performance.

Undoubtedly this is something the SC could push for, I just fear that the problem space is so big that inevitably people go “nah, not touching that”, which is pretty much what happened periodically ever since numpy/scipy first ran into these issues (ironically, that was the catalyst for conda being created), and settling on standardizing a variation of the most comprehensive existing solution seems to be completely off the table due to non-technical(=human) constraints.

[1] where dependencies are mostly mono-lingual and can be built with the same build system, which conveniently also comes with a uniform compiler everywhere
[2] for reference, even the huge C++ ecosystem famously has no standardized solution for building/distribution, and this is but a subset of the problem a cargo-for-python would have to tackle at least partially.

pitrou · November 16, 2022, 12:44pm

I’m sure the PyPA has done a ton of nice and useful things, but “some kind of cohesive vision to rally around a central, official, out-of-the-box workflow” doesn’t sound like it. On the contrary, the PyPA seems to openly go for an ecosystem of scattered utilities with partially overlapping goals (not to mention the diversity of configuration schemes that goes with it).

Bootstrapping a Python project always leads to looking around the Internet for clues as to “the idiomatic way”, only to find out that there is none and that you have to make a half-informed choice between multiple possibilities (not to mention multiple documentation sources emphasizing different practices).

And I say that as a Python core developer. I have no idea how frustrating it might be for the occasional Python developer who was hoping to do the Right Thing without hassles.

pf_moore · November 16, 2022, 12:44pm

My experience is that there’s a non-trivial subset of users who find the conda UX unpleasant to the point of not being willing to use it^[1]. So while promoting conda as “the solution” is a common (and not unreasonable, from one point of view) suggestion, no-one has yet explained how that would change anything unless conda choose to work on catering for those users’ concerns and issues.

But as I said, we’re way off topic here.

[1] Yes, I’m one of those users, but I know of many others, in various fields and with various levels of experience.

pf_moore · November 16, 2022, 8:00pm

Agreed, we’ve never achieved this. We’ve tried (multiple times - pipenv, poetry, conda, …) but nothing ever pleased enough people to become “the definitive solution”. It’s possible the SC could exert their authority and say “this is the one and only packaging solution”. Would that work? Do they even have that type of authority? I don’t know.

h-vetinari · November 16, 2022, 9:25pm

From my perspective, I think the UX is the least important aspect for having a unified workflow - it’s the bikesheddiest part where everyone has an opinion, but what I’m interested in from the conda side is that it has enough abstraction power to deal with all sorts of non-python dependencies in a way that runs stably (i.e. no random ABI divergences and crashes between packages).

In other words, I think the UX can be polished, but the UX itself without a strong technological foundation is no major improvement of the status quo.

The impression I got from your comments re: conda over the years (I might very well be wrong) is that you haven’t used it much if at all in recent years, and that there are some usecases you’re interested in, which conda doesn’t do well or at all (e.g. running against a development version of python).

That’s fair enough, but overlooks a large chunk of problems that have been solved in a way that leads to - inter alia - certain heavy dependencies in the data stack being conda-only, because it’s essentially impossible to pull off with wheels (e.g. a lot of the Nvidia / rapids.ai ecosystem).

A lot of those benefits come with a substantial cost though, mainly in the form of a lot of integration work (making sure shared dependencies are unvendored, recompiling packages against new library versions, etc.), where conda-forge is essentially a cross-platform distribution, that’s being kept up to date by an army of bots and a substantial number of volunteers.

So the fit with the broader python ecosystem is not trivial, and there’s an infinite number of details to disagree about. Though by my reading of the role of the SC, they could bless a given solution as the “official” way (presumably after a large project to iterate to something acceptable for most people).

But again, it’s a vast problem space, with any number of possible local optima, and so charting a path from “we are here” to “be more like cargo” is just a monumental undertaking IMO.

CAM-Gerlach · November 16, 2022, 9:44pm

There is a ton of outdated and just plain wrong information out there, for sure. But at least after the recent overhaul, the official PyPA Packaging Tutorial describes “the idiomatic way”, and seems to do a pretty good job at it now. It covers pretty much all the key steps, and aside from a few specific differences that it highlights, the basics are common between all the backends.

And configuration-wise, basically all backends now use the standardized, Cargo-inspired pyproject.toml configuration format, in which only the backend-specific settings differ between backends (aside from Poetry, which is still working on its support for the PEP 621 [project] table for metadata).

brettcannon · November 16, 2022, 9:44pm

I’m going to suggest people avoid phrasing anything as “the conda approach is better/worse than the PyPA approach” as that happens enough and never leads to a better outcome. As someone who has to work with both worlds from a tooling perspective as their job I can tell you no one has nailed the packaging problem perfectly.

If people truly want to work towards a unified tooling solution for Python, I would suggest discussing how to even work towards that, not even ignoring what it would have to support. Such a migration would be huge and with such a large ecosystem you will need to work out how to move everyone forward, else you have just added yet another toolchain which is the exact opposite of what people are seemingly wanting.

To this particular point, it has been stated previously that if someone were to come forward and help work out GPU detection in some standardized way that would be great, but no one has. So I don’t think characterizing it as “impossible to pull off with wheels” is fair as much as no one has been up for putting in the work to figure out what would need to change to make that happen. But that is yet a different topic and not important for this specific topic of unifying the packaging toolchain.

barry · November 16, 2022, 10:07pm

I’m coming round to the idea that we can get a lot of mileage and improvements in the ecosystem by separating out the front-end UX concerns of package management, and the backend build-system support. Having been around for a long time, I think we should celebrate just how far Python packaging has come. It really is way better today than it’s ever been, even given the proliferation of tools and gaps in what IMHO we still need. Kudos to everyone who works on this stuff.

Also as someone who is actively working to modernize my corporate ecosystem for Python, I have lots of thoughts, which I plan on sharing later. In general, I think we have more opportunities for unifying the experience, doubling down on pyproject.toml, pushing for more standards/RFCs, etc. Other language ecosystems don’t have the “benefit” of decades of Python’s legacy systems and backward compatibility requirements. I also think that open source and corporate needs have lots of similarities, but also lots of differences, that need to be taken into account.

bryevdv · November 16, 2022, 10:12pm

@h-vetinari @brettcannon It’s definitely not impossible, even today, cf. RAPIDS is back on pip

I think conda’s link-farm environments are really terrific, especially when considering non-Python package dependencies, but I am biased^[1]. I definitely prefer the simplicity and observability of a link farm to the auto-magic of venvs, and of course they are more easily generalizable beyond Python. But I do think, at this point, that “conda packages” themselves are redundant and duplicative for the vast majority of “python packages”. In my ideal world, there would be a marriage of conda environments with wheels for most packages, and “conda packages” reserved for non-python things. (If today’s wheels and PyPA tools had existed circa 2012, perhaps that’s what we would have ended up with, too.)

[1] I am the original author of conda, though I have not worked on it in many years

h-vetinari · November 16, 2022, 11:42pm

It’s not just GPU detection, it’s also bleeding edge C++ toolchains (with no reasonable way to bring along the stack you need, etc.), and so on. I can list many more examples, and each one individually might be rejected as “too niche”, but then, the issue is so much larger in aggregate [extract from here]

Solving this (build toolchain standardization + tracking non-python library dependencies and their ABI) is the core problem of a unified packaging story. Fixing GPU detection (in isolation) is just another bandaid.

I agree broadly, though I want to distinguish “approach = way to solve a problem for the whole ecosystem” from the technological capabilities. On the latter, conda is unquestionably superior. Why? It can take into account system level dependencies (glibc, cuda, …), replace them where necessary (newer libcxx on MacOS), it can reasonably ensure ABI-coherence, it does not lead to multiple redundant artefacts (i.e. many packages vendoring OpenSSL / OpenBLAS / …), it can distinguish build and target environments well enough to support cross-compilation, etc.

And that lack of equivalent capabilities on the pip-side is causing a lot of wasted engineering effort, leading several times to people/companies throwing their hands up and saying: not possible. Occasionally some volunteers later close that gap again. Other times conda-artefacts are fed back to the wheel side, c.f. numpy on osx-arm^[1].

It’s fair enough to say that the specific approach taken by conda is not feasible for PyPA for a myriad reasons, but let’s not pretend that wheels as a paradigm are somehow similarly capable.

[1] which wouldn’t have worked without cross-compilation capabilities, due to the lack of broad availability of free osx-arm CI.

h-vetinari · November 17, 2022, 12:19am

I think the individuals fighting for improvements in this space are heroes in their own right, though I don’t think the situation is worth celebrating.

I once summarized the situation for a presentation as follows:

The core issue is that packaging is an afterthought also on the language level, and it shows. Guido routinely said he’s not interested in the packaging side, and the fact that PyPA operates so far removed from the SC reinforces that. So I’m excited to hear:

… because this issue needs more than a couple lone volunteers trying to firefight the problems with “Step 0” for all of python. It’s a language level issue (already pointed out as a possible Black Swan in Russell Keith-Magee’s PyCon 2019 keynote), and should be treated as such^[1].

Rust doesn’t, that’s for sure. But C++ struggles with the exact same problem, especially w.r.t. tooling/packaging. And it’s a completely unsolved problem there as well – so I empathise with the difficulty of fixing this. But all the languages that have decent answers to this made packaging a first-class citizen – the same can hardly be said about python (yet?!).

As a positive counter-example, even Fortran(!) managed to reinvent itself and now has a package manager.

[1] It’s IMO reminiscent of a tragedy of the commons - everyone needs to install packages, but it’s really unappealing to try to solve the surrounding immense challenges as an individual, so people will just hack something together on their machine until it runs, so they can get back to the fun part.

pf_moore · November 17, 2022, 11:15am

Nobody is pretending that, as far as I’m aware. What is happening is that a significant group of users (significant enough to justify that “wasted engineering effort” you refer to) are not willing to accept the trade-offs that using conda would involve.

For a “Python language packaging vision” we need to address the position of conda. Is conda part of that vision, or is it a separate, specialised, ecosystem that remains forever as an alternative to the core approach?

If we treat conda as an outlier, a specialised ecosystem unrelated to the “Python language packaging vision”, then binary dependency management conda-style is indeed “out of scope” for Python packaging, because “use conda” is the response to people who want that. But people who are in the core audience for the “Python language packaging vision” have made it abundantly clear that they want access to libraries like numpy, scipy, sklearn, pytorch, etc. So the vision needs to look at how to make it possible for the developers of those libraries to make them available to that set of users. And “use conda” is not the answer, so “conda solves those problems so you shouldn’t try to make wheels do so” also isn’t the answer.

If conda is part of the vision, it needs to change, probably quite significantly (to meet the needs of those people who currently don’t use it). Only the conda developers can decide if they want that much change, but if they do, then we (the conda devs, the PyPA and the packaging community) need to get together, because there’s a lot to do…

pitrou · November 17, 2022, 1:22pm

Perhaps you could start by explaining how conda doesn’t fit those needs? (other than “I don’t like the UX”, which as @h-vetinari pointed out is largely irrelevant)

As an aside, to me at least it seems that using vague and lofty words like “vision” tends to drown the debate into theological arguments - which don’t help at all. Unless, of course, you can point me to a precise definition of that “Python packaging vision” you’re talking about?

pf_moore · November 17, 2022, 2:49pm

I don’t agree the UX is irrelevant, I know of people who were learning Python who would have given up if they had to use conda (I can’t get details as I no longer work with the individuals in question, I’m afraid). However, I’m happy to put that point aside.

The most glaring example is that (to my knowledge, at least) conda doesn’t work with python.org builds of Python, the Windows Store distribution or Linux distro builds.

I agree. It wasn’t me that asked for a “vision” here, I’m mainly just exploring the implications if people want to go down that route. Personally, I’m largely happy with the direction things are going in under the PyPA (on the understanding that progress is frustratingly, even glacially, slow), and having conda be a separate ecosystem for people who prefer/need it.

pitrou · November 17, 2022, 3:19pm

Developer ecosystems are often able to standardize on tools with defective UX compared to competitors (git vs. Hg, for example), which is why I think UX is not the primary concern here. Also, let’s not forget that the pip UX isn’t always pretty either (witness the hodgepodge of options pip install has, for example)

I agree that conda not being able to work with non-conda Python installs is one of its major drawbacks. I’ve never had any important concerns with the conda Python builds, but I suppose YMMV.

steve.dower · November 17, 2022, 5:17pm

Of course it doesn’t, conda works with its own builds, because it is for managing the entire system.

conda in no way equals pip. They are fundamentally different tools. Trying to use them both the same is only going to lead to confusion (which you appear to be enjoying already)

FWIW, I don’t think there’s a need to reconcile conda into a “Python packaging vision”. They can remain totally independent and self-promote, because they’re full stack.

steve.dower · November 17, 2022, 5:25pm

Also, I never asked for a singular packaging vision, but a singular vision for growing, developing and supporting the Python user base that includes packaging (and education, and documentation, and outreach, etc.)

I had hoped that the SC was in a position to provide that vision, but it appears they are not. So we’re looking to some other person/group to pull everything together and find the important priorities.

Right now, honestly, Anaconda is doing it best, giving their users multiple tools, docs, guidance, etc. and actively developing new ways to use Python. Meanwhile, python-dev is looking like mere caretakers of a GitHub repository, and PyPA is trying to put out fires and reconcile massive divergence between ideas that became implementations because the discussions were too hard.^[1] I hope we can Discourse our way out of it into something with a bit more focused momentum, but it feels unlikely.

[1] And before anyone takes offence, I am definitely putting myself in the “caretaker” camp. I have no affiliation with Anaconda, and just get to watch on from the outside while they do all the cool stuff.

ofek · November 17, 2022, 5:34pm

I understand UX isn’t the only concern, but I’d argue it is the primary one and quite literally the only thing the OP is asking about.