Wanting a singular packaging tool/vision

@BrenBarn as much as I like using conda-forge, you’re missing a whole bunch of points, and advocating for “more conda-forge, less PyPI” or “conda-forge is better for users because curation” gets us nowhere. I suggest you stop here, because it’s noise at best, and more likely counter-productive to whatever you are trying to achieve.

5 Likes

I’m trying to use terms like “the standard built-in packaging tool” because I’m trying to avoid the assumption that in the utopian future that tool is pip. So no, I wasn’t talking about pip. I’m saying that whatever official packaging tool comes with Python should combine the features that provide the best experience for users, and if that is not pip as we know it, then pip as we know it shouldn’t be the official packaging tool.

Right now, yes, I think almost everyone would be better off switching to conda if the alternative is pip. But in the utopian future maybe not. What I see as the point of this discussion (and those in related threads) is to figure out how we can move towards a future where people have a different and better range of choices available to them, instead of the current situation where (for many people) the best option is to completely ignore everything that PyPA and all the packaging PEPs are doing and just use conda instead.

Your point about the heavy burden on PyPI maintainers is well taken. But to me it gets at a central tension here. PyPI is a service that requires constant maintenance. And the PyPI maintainers are unsung heroes to gazillions of Python users the world over. But how did we wind up in a situation where the official, most-used package repository has two maintainers, while a separate repository, totally unofficial but also community-driven and volunteer-run, and (presumably) used by fewer people, has 26? (Or 6 on the staging-recipes team that actually reviews new recipe PRs.)

I apologize if I came off as criticizing or dismissive toward PyPI, as that’s not my intention. My point is just that I think the discussion here is hampered if it’s always in terms of “what does pip do” or “what does conda do” or “what does PyPI do”. The question is “what can Python packaging do that will give users the best experience?”, and pip, PyPI, and all the rest are just different means to that end.

1 Like

Sorry, posted my last comment just as yours came in. But I’ll just say (as I said there) that although I may seem to be advocating for conda, what I’m really trying to do is advocate for a broader view of what can be done to improve the situation for users, rather than seeing things in terms of a choice between existing options, none of which is perfect. It’s just kind of hard to be consistent about always clarifying the distinction between “how conda/pip/etc. works now” and “what we can learn from how conda/pip/etc. works now to apply it to better tools in the future”. :slight_smile:

conda-forge is not a “more curated PyPI” though. It is an entirely different packaging toolchain that has some overlap with the PyPA provided toolchain, but isn’t a 1:1 replacement. More to the point, conda-forge isn’t a more curated PyPI because I can’t use it, afaik, without throwing out my entire toolchain and using the conda toolchain wholesale, which is an entirely different thing than what Paul was suggesting.

That also means that it becomes really difficult to separate the effects of tradeoffs made by curation of the “default” channels in Conda, and the effects of all of the other trade offs that conda(-forge) did differently than the PyPA tooling.

After all, curation isn’t good, curation isn’t bad. Curation represents a trade off, which means it has positives and negatives.

The positive side of something like PyPI which does not curate is that the bar for publishing, assuming you’ve already packaged the software, is very low. That means people are more likely to publish their software, which means as a whole there is more generally useful software out there.

Any barrier to entry you put in the way of publishing software means that there is going to be less software out there in the world.

Just as an example, I’ve not used conda or conda-forge, but I’m told that it’s popular to use conda+conda-forge and still ultimately use pip to install some things, because the curation process means no one has yet put in the effort to clear those barriers for that particular piece of software.

Obviously the downside to something like PyPI is that, because there is no curation, individual users are more responsible for vetting the software that they choose to install, whereas with something like conda(-forge) there’s a human layer involved that is already doing that vetting for them.

Another benefit to the PyPI style approach is that it scales much higher. If we look at conda-forge, they have something like 30k total packages, if we look at PyPI it has almost 500k total packages. Granted those 30k on conda-forge are all likely to be useful packages, and some number of the 500k on PyPI are not, but we can assume that some number of the 500k that aren’t on conda-forge are also useful since it’s still very popular, afaik, to use pip and conda-forge together.

In fact, I’m not aware of a single curated repository in existence, outside of niche targeted ones like internal company ones, where it’s not considered a regular occurrence that some subset of their users have to bypass their curation and install directly from PyPI. That suggests that literally none of them have managed to scale their curation process to even “all of the useful software on PyPI”, much less “all of the software on PyPI”. Which suggests that if we were to try curation by default, we’d either bottleneck the ecosystem OR everyone would just end up configuring pip to also include the uncurated repository anyway – which means we’d have spent a whole lot of effort to not gain much overall.

Neither answer, curation or free-for-all, is better in all cases; they just represent different tradeoffs, and which of those tradeoffs make sense will depend on the specific facts of the situation at hand.

I’m sympathetic to the view, but you also have to temper this with the fact that most of the people involved here have been working in or around Python’s packaging for a long time now. We’ve seen a lot of people come and go who want to burn the existing system to the ground and start over, or who have broad ideas that are divorced from the reality of the needs of the entire packaging ecosystem – particularly Python’s, which attempts to solve some of the hardest problems that most packaging ecosystems have to deal with, due to the nature of Python as an interpreted glue language.

I think I could probably name no less than 5 attempts to re-envision Python’s packaging that ultimately foundered for a number of reasons, typically things like:

  • They tried to boil the ocean and fix everything all at once, and ultimately burnt out.
  • They assumed that something that works fine for some specific area in Python’s ecosystem would apply globally to all use cases that Python supports.
  • They jumped in and wanted to make broad changes without understanding why things are the way they are, and what benefits come from the way things are.
  • They didn’t include any way to handle the ~500k packages that already exist, and just assumed that everyone would jump onto the new system right away and drop support for the old way immediately.

I think it would be incredibly valuable to try and design what we think an ideal system would look like from first principles, without the anchor of legacy or the status quo holding it back. Not as an exercise in going off into the wilderness and coming back with some brand new ecosystem, but rather as a guide for things that we could evolve in the existing toolchain to get closer to that ideal system.

7 Likes

I think how Mozilla does curation for Firefox makes a lot of sense.

Mozilla badges

I think something to this effect helps curate, without needing a second stand-alone package repository. (It would take some thinking to work out how to put this into place.)

It seems this conversation has gotten a bit sidetracked by mingling two orthogonal concepts:

  1. what should the “standard” package manager do? The key difference (there are many) between conda and pip (i.e. the pip-based stack – venv, etc.) is that pip is a Python package manager, and conda is a general-purpose package manager. That is the entire reason for conda’s existence – folks wanted a system to manage Python and other non-Python packages, and the precursor to the PyPA didn’t want to take that on. Does the PyPA want to expand its focus to take that on now? I have no idea – but that perhaps should be part of this vision discussion.

  2. what are the advantages / disadvantages of a “curated” package repo like conda-forge? Someone could certainly build a wheel-forge if you wanted to, there’s nothing special about conda packages in that regard.

However, other than the curation, the main service that conda-forge provides is a CI system for building packages – this is a really big deal. cibuildwheel, I think, provides some of that, but I don’t know how widely used (or easy to use) it is.
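For what it’s worth, my understanding is that the setup for cibuildwheel on a CI service is fairly lightweight. Something like this minimal GitHub Actions sketch (action versions and the workflow file name here are illustrative, not authoritative – check the cibuildwheel docs for current details):

```yaml
# .github/workflows/wheels.yml -- minimal sketch, versions illustrative
name: Build wheels

on: [push, pull_request]

jobs:
  build_wheels:
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      # builds wheels for each supported CPython on this OS
      - uses: pypa/cibuildwheel@v2.16
      - uses: actions/upload-artifact@v4
        with:
          path: ./wheelhouse/*.whl
```

That’s a far cry from conda-forge’s “we run the whole build farm for you” model, but it does take a lot of the pain out of multi-platform wheel building.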

Well, yes and no. Folks use pip along with conda because there may be no conda package available for the package at hand – but that’s not (necessarily) because of the curation process per se. (and it’s a lot less common than it was)

Anyone can put up a free channel on anaconda.org (or their own server for that matter, the protocol is pretty simple). No curation required.
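To illustrate just how simple: a conda channel is roughly nothing more than a directory per platform plus an index file (package names here are placeholders):

```
my-channel/
    noarch/
        repodata.json              # index of the packages in this subdir
        mypkg-1.0-py_0.tar.bz2
    linux-64/
        repodata.json
        mylib-2.1-0.tar.bz2
```

Running `conda index my-channel` (it comes with conda-build) generates the `repodata.json` files; then you serve the directory over HTTP (or point at it with a `file://` URL) and `conda install -c <channel-url> mypkg` just works.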

If a conda package isn’t available, it’s because no one has made the effort to make it available. It’s not much effort, but it is some – but the curation process is the least of it. It can be a bottleneck, but for the most part putting a package on conda-forge is easier than doing it yourself – you get help, and the CI infrastructure is fantastic. (I haven’t used my org’s conda channel for years …)

The reason many packages on PyPI aren’t on conda-forge (or any other conda channel) is primarily social, not technical or administrative. Package maintainers choose not to put their stuff on there for a wide variety of reasons I won’t get into, but “the conda-forge admins wouldn’t accept my package” is not one of them.

In fact, most of the packages on conda-forge are not maintained by the package authors, but by third parties (I myself maintain quite a few) – I’m not entirely sure why that’s true, but it’s certainly social, not technical.

3 Likes

Actually, I think that’s 6 in addition to the 26 core members.

But it’s a totally different system – the actual packages are served up by anaconda.org, which I’m pretty sure is run by anaconda.com. So a commercial company is actually doing the equivalent of running PyPI.

conda-forge is a system for building packages – and the “curators” are spending most of their time helping folks get their recipes in order, not “curating” per se.

Disclaimer: I’m a newcomer here trying to learn the ropes but am interested in trying to contribute. Take what I say with a grain of salt.

I fully understand the reaction against “burn the system” approaches. It is clear from experience and your nice summary why such approaches will fail in Python. But when reading through these threads, when new ideas are shot down with “well, that might be a good idea but it would be very challenging to build given the current state of tools” or “that’s a good idea but we don’t have the resources for it”, it’s very frustrating, especially in a “vision” conversation like this one. I would prefer to first have a free conversation about what the ideal tools or workflows would look like that is totally divorced from existing tools. Then, once one or more distinct requirements/desiderata lists are agreed upon, we could assess each one for what its workload or feasibility would be. Likely the more satisfying lists of requirements would mean a heavier workload. But I think it would be nicer to assess those things once more possibilities are laid out in summary form.

Again for me, as an outsider looking in, it’s hard to tell why some ideas are discussed endlessly and some ideas are shot down with a “we don’t have the resources to implement that”. I’d rather see complete, digested lists of ideas that can be compared and contrasted with each other all at once, wrt how satisfactory they are, the resources required, etc. For example, maybe the only set of ideas that is easy enough to implement in 2 years is a marginal enough improvement that it’s not even worth spending time on, and it would be better to work on a set of ideas that will take 5 years to implement but will really be worth it in the end. Something like that…

Wanting one or more freely generated list of desiderata/requirements for later vetting is not the same as wanting to burn the system.

edit: For example, one of the requirements would be “At no point should the ~500k existing packages become inaccessible because of a change made on the Python packaging end without N years of notice” (where N could be infinity…)

2 Likes

I have a really hard time responding here, because at some level you are absolutely right. The problem is one of resources, though, but maybe not in the way that it seems at first.

The people involved in packaging are typically overwhelmed with “stuff that people want us to do”. And one of the things people want us to do is discuss stuff. Personally, I spend the vast majority of my packaging time these days just participating in discussions - I very rarely get time to write actual code. This is an example of that - I think your post deserves a response, so I’m spending some of my time trying to put something together that I hope will be helpful.

But I can’t respond to everything, so I have to prioritise. In doing so, I’m certain I come across as “shooting down” ideas. I hope I don’t say precisely “we don’t have the resources to do X”, but I definitely say “that’s hard, by all means look into it and come back if you can get it to work”. That’s shorthand for “we’ve thought about this and can’t work out how to do it” - but the latter invites follow-up questions like “can you point me at previous discussions?” And again, I simply don’t have the time[1] to go and find links for that person.

To give a specific example, the following came to mind for me when looking at the recent discussion on a “curated PyPI” (something which isn’t even that radical an idea, TBH):

  1. What counts as “curation”? I genuinely don’t know, but it looks like conda-forge treats it as “we can get it to build”. OK, so in one sense, everything on PyPI is curated to that extent, because you have to build at least a sdist to upload it.
  2. In the conda-forge case, it’s a bit more, though - they build using a different process, so there’s a check there. But not everyone wants to build for conda (that’s a whole different debate) so we can’t mandate that “builds under conda” is a check that everything must pass. So once again, what is a tool-independent definition of what is a sufficient check here?
  3. We could say “must build into a usable wheel”. There’s a long-running intent to add some sort of build farm to PyPI, but it’s really hard, because existing build systems (notably setuptools) can run arbitrary code when building - so sandboxing, protection against malware, etc, need to be considered. And in any case, what counts as “usable”? A Linux user can’t use a Windows wheel.
  4. There’s discussion about coverage - we can talk forever about what proportion of the packages on PyPI are actually useful, but any curation proposal needs to have policies about what gets priority when the curators have too much to do. So how do we decide that?
  5. And what’s the fallback when the curation isn’t enough and someone wants a package that hasn’t yet been dealt with (or maybe a critical bugfix of something that is included, if curation means approving every new release)?
  6. Saying that project developers can do their own curation makes no sense, because that’s what PyPI is. There will always be fewer curators than project authors. So we have to look at bottlenecks, whether we want to or not.
  7. Transition questions arise as well. Your edit mentioned not losing access to anything - that’s exactly the sort of question that someone needs to answer - how do we maintain access to the packages people are using, when we don’t know what those are for certain? (Download stats may help here, but mirrors confuse things).
  8. Any alternative to PyPI needs hosting - the hosting costs for PyPI would be huge if not provided by sponsors. So how do we host a “PyPI next generation” while it’s being developed without omitting big chunks of stuff? Commercial support might be an answer, but will people accept a new PyPI hosted by (say) Microsoft?
  9. Do we need all the history on PyPI? The 10-year old versions of packages that release weekly? How do we know? Who decides? How do we set a policy? If we say “nothing more than 5 years old”, how does that work for a critical, but stable, project that hasn’t needed a release in 5 years?

That was just a very quick brain dump, nothing more than thoughts I have on one current question. And it took me 20-30 minutes to write. And frankly, I’d be surprised if it’s of much use to anyone (it’s entirely negative, in the sense that it’s purely a list of potential problems - sorry to the people wanting to discuss curated repositories, I genuinely don’t want to stop that discussion happening, if it’s useful!)

If I said “we don’t have the resources to implement curation”, would that be shutting down the discussion? On the other hand, if all of the “insiders” simply ignored someone new coming in and saying “why not have a team reviewing package submissions”, would that be shutting down the suggestion?

I think the reality is that most of the “insiders” (I don’t like that term, but I understand that for a newcomer it feels like that) simply no longer have the time to engage in discussion on the more radical ideas. We’ve come to the conclusion, from bitter experience, that incremental improvement of what we have, while not ideal, is the only option that we can manage while supporting the existing infrastructure (which has to be our priority). That’s not to say that such ideas aren’t allowed, or that they aren’t useful, just that the people with years of packaging experience, and the people maintaining the current systems, haven’t got the bandwidth to explore them, so someone else is going to have to do a lot of work. And that will include taking on board “have you thought of X” style comments, which are likely intended to be helpful, but which will come across as negative, or as outright rejection, simply from lack of time to frame more positively.

OK, I have now spent 45 minutes on this post, and I really need to do some other stuff which I’ve been putting off. Please excuse the fact that this isn’t very well written - I genuinely don’t have time to edit it any further.

Sorry I don’t have a better answer for you.


  1. Or, I’ll be honest, the interest. ↩︎

9 Likes

Thank you for taking the time to send your thoughtful response. I don’t find anything insufficient about it. The only thing I’ll say in response is this: it’s clear that there is tons of work, that these are super hard problems, and that the act of discussing the problems itself (including taking time to educate newcomers like me, as you’ve just done) is an additional burden of work. Acknowledging that burden, I’ll just make the statement (sort of a repeat of the sentiment of my last message, which you already responded to) that, without such summaries of requirements, questions, acknowledged challenges, etc., less “insider” users like myself can only look on and hope that the “insiders” (1) are actually making progress towards a shared understanding of things and (2) are actually moving towards identifying feasible and valuable incremental improvements that can be made.

Glad it was of use.

To clarify one point:

By that, do you mean that you hope the current members of the packaging community will come up with such summaries? If so, I think you’ll be disappointed - more experience in packaging doesn’t make these questions clearer, it makes them harder. It’s the newcomers who think things are simple :slightly_smiling_face:

Basically, my experience is that there is no behaviour of the current system that we could describe as “not needed” without someone screaming. Which is why the experienced users can’t define these requirements - we have too many battle scars. And without wishing to sound ungrateful to the people who created the survey, and the people who took the time to respond, the survey didn’t offer much help in that area either.

I’d love to see a set of requirements like you suggest. But I’m hoping someone with a fresh perspective will produce it (and do the research to make sure it’s reasonable, without the “everything must stay the same” biases of the “old hands”).

1 Like

It’s actually kind of wild, the sheer breadth of use cases that the current tooling is expected to handle, and I’m not sure that any one person even knows all of them. Trying to enumerate them all would be a pretty huge undertaking, but if someone were to do that, it would be pretty useful, I think.

I’ve had a number of ideas for “simple” fixes to even small corners of packaging over time, and I think almost all of them, once I started to explore actually doing them, ended up producing several thousand words just to fully express the idea and factor in various workflows that people are using that seem generally reasonable.

I’m probably someone that gets viewed as “shutting down” conversation since I think I do often push back against “simple” ideas and try to expand into why those things aren’t workable without a lot more thought being put into them. Speaking for myself, I don’t actually mind that kind of discussion. I think it’s one of the best ways we have of doing knowledge transfer in this area right now, and if someone feels strongly enough about a suggestion, I hope they continue to push for it and try to come up with mitigations for the concerns, evidence that the concerns are overblown, or justifications that the change is worth the cost anyways.

I think this back and forth ultimately ends up making for better solutions to these problems as well. An interesting “demo” of this working in practice is the recent threads on dependency confusion, where you can see “simple” ideas get thrown around, pushback against them for not solving X, Y, Z use cases, and then refinement until the idea got forged into something that was no longer quite as simple, but that actually handles most of the use cases. In participating in that discussion, I even personally ended up finding new use cases for our tooling that I didn’t realize people were relying on.

5 Likes

Just a couple additional thoughts on “curation” – it is hard to define, and there are many, many levels of curation – but I think any curation is a big step from no curation.

PyPI has an enormous number of packages that are, to be blunt, worthless. Unmaintained, version 0.0.0.0.1, name squatting, experiments, etc, etc. Even some malicious ones now and again.

Malicious ‘Lolip0p’ PyPI packages install info-stealing malware.

Most of this has no bad intent, and I think a lot is due to mistaken reading of the documentation on packaging – there are so many docs on how to make a package and put it on PyPI that people do it because they think they should, or just to try it out.

Even a tiny bit of gatekeeping would help a lot.

NOTE: I talk a lot about conda-forge below – I’m not advocating that we should just use conda-forge, but it IS a good example that can be learned from, and it’s the one I know. And it works, by some definition of “works” :slight_smile:

Well, not quite that simple – it builds, it works with conda-forge (i.e. all dependencies are available) and it meets certain basic standards of “quality” – includes license files, doesn’t vendor stuff it shouldn’t, etc.

But that sdist could be completely broken and unusable. Quality and functionality aside, even a tiny barrier means folks ask themselves: “do I really need to do this?”

now that I think about it, most of the packages on conda-forge are put up by someone other than the original author – which is curation in itself – at least one person finds this package useful.

I think that’s irrelevant – you are either using conda or you are not. I suppose the conda-forge channel is more like a wheelhouse than PyPI – but that may be a good thing, and another one to keep in mind – should the place to get source (sdists) be the same as the place to get ready-to-run packages? PyPI started before there were wheels, and before there were services like GitHub – maybe we no longer need a repository for sdists?

I think there are lessons to be learned from conda-forge there, too – the “build farm” is a set of CI systems driven by GitHub. The actual serving up of the packages is anaconda.org – totally different systems.

And there are some efforts along those lines for wheels:

https://cibuildwheel.readthedocs.io/en/stable/cpp_standards/

There absolutely need to be other sources of packages that are easy to access: private package repositories, probably a “free-for-all” repository, etc. (conda allows arbitrary “channels”). Having a curated repo should not restrict any less-curated use. Don’t the Linux distros do this?

I don’t think it should – once a package is accepted into conda-forge, further maintenance is done by the package maintainers, not the core team.

Of course, yes. But like a wiki, curation doesn’t have to be as careful as, e.g., committing code to a project. You could have a lot of curators.

See above – a “legacy” channel would be helpful here.

Big problem, yes. conda-forge couldn’t have happened without anaconda.org (or binstar, or whatever it was called before). And Anaconda.com is a very Python-focused, open-source focused small company – and there are folks that still don’t trust it.

I think this shows some of the benefit of the “distribution” maintainer not having to be the package author – if the code still works, and there is no one maintaining it, someone else can maintain the distribution – and if you can’t find anyone to do that, maybe it really isn’t important to have it.

OK – THAT was longer than I intended – whew!

Just an additional data point on this: at $work, a package that hasn’t had a release in ~18 months is considered unmaintained and not suitable for us to rely on unless we have forked and built it ourselves. This is to ensure that if an issue does arise, we’re already in a position to patch and use the fixed version. There is definitely a business opportunity in offering, essentially, insurance to patch incredibly stable software as needed.

(Though anecdotally, when faced with this requirement, many teams decide that they don’t actually need the stable/unmaintained library that badly, and move onto something that is maintained. Generally this brings our teams into better alignment and they share more dependencies, so it’s arguably a bigger win than maintaining the old code would achieve.)

2 Likes

Good point – there are downsides to it being very easy to add a new dependency – I find folks on my team jump on things a bit too quickly – they find something that does something useful, it works right now for the use case at hand – now we have a new dependency. Oops! It’s buggy, it’s not maintained, it’s not available for all platforms …

Anyone remember “left-pad”? That could happen all over again.

I don’t think we should make it impossible for folks to use old, no-longer-maintained software, but we don’t need to make it easy or, critically, the default.

1 Like

Reading through this entire thread, somewhere halfway through it sounded as if the crux of the matter would be resolved if pip packages could declare runtime and build dependencies on the system (“you need libssl and a C compiler and Python with this ABI”) and conda, which would be a provider of such dependencies, could query pip package information. Am I getting that right?

Hmm – I’m not sure it’s the only “crux of the matter”, but I do like the idea. I was just thinking about that today, when working with one of my complex packages:

I’m a heavy conda user, so my python libs have a conda_requirements.txt file sitting right there in the repo. If that was standardized in some way, then a conda package could be auto-built for any python package.

Though now that I say that out loud, maybe that’s a matter of having a conda recipe meta.yaml file in the repo :slight_smile:

The real challenge with this is knowing what the package name is for, e.g., libssl – what namespace is the name in? PyPI defines the default namespace for pip, conda-forge has a namespace for conda-forge, but those are only two specific systems.

Maybe not insurmountable, but a challenge.
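To make the idea concrete, here’s a purely hypothetical sketch of what declaring non-Python dependencies in package metadata might look like – none of these table or key names are a real standard, and the namespace scheme is exactly the open question above:

```toml
# pyproject.toml -- hypothetical sketch, not a real standard
[project]
name = "mypkg"
version = "1.0"

[external]                      # invented table name
build-requires = [
    "virtual:compiler/c",       # "I need a C compiler", provider-agnostic
]
host-requires = [
    "pkg:generic/openssl",      # libssl, named in some generic namespace
]
```

A tool like conda (or apt, or brew) could then, in principle, map the generic names onto its own package names – which is exactly where the namespace question bites.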

That is understandable and (like @jagerber) I appreciate your taking the time to lay out your position here (and similar thoughts throughout the various threads) despite the many other demands on your time. In a related vein:

Thanks for this perspective. This is the spirit of my comments as well, and I hope people in these threads take them in that light.

The main thing is that I’m with @jagerber in that I don’t think the problem will be solved by taking “what incremental improvements can be made right now” as a starting point. We can, as you said, think about what we want things to be like, and then back off from that until we reach a point closer to reality.

It may well be that in the end there is a series of incremental steps to get to the utopian future, but personally I do not have confidence that we’ll take the right incremental steps unless we consider them, not just relative to where we are now, but to where we want to be. And that requires actually envisioning where we want to be. If it’s not “measure twice, cut once” it’s just going to be death by a thousand cuts.

Moving to some more concrete matters. . .

I agree it is different from what Paul was suggesting, and I agree it is different from what is currently the default package installer that comes with Python. But, again, I’m trying to take a broader view here. I’m sorry if I made people think that I’m advocating for immediate replacement of pip with conda, but what I am trying to do is get people to at least consider the possibility that, in the future, the default package installer provided with Python might be something other than pip (at least as we know it now), and the default package repository from which that default installer installs stuff might be different from PyPI (at least as we know it now).

Quite frankly, I don’t see much point in any of this discussion if those possibilities aren’t at least on the table for some time in the non-immediate future. The whole problem here is that the existing toolchains are a pain for users. A lot of people want to throw out their existing toolchains! :slight_smile:

Not everyone wants to build for pip/PyPI either! I think there is a large class of people who don’t care what they’re building “for” as long as it is something that is reasonably painless to use and can reach a reasonably large audience. As @PythonCHB said:

I’ll refrain from reiterating his many other important points, but this one is just essential. The reason I keep talking about conda this and that is just to bring into this discussion that the goal here (at least as I see it) is “distributing and installing Python packages”, not “using pip” or “using PyPI”. A significant component of the reason people use PyPI and pip has nothing to do with how they operate; it’s simply that they are described on python.org as the official solutions. So I come at this from the (perhaps excessively) optimistic position that if a tool works well, simply having it endorsed by Python/PyPA would ease the transition for many people. We still need to come up with that good tool, and yeah, we can’t pull the rug out from under those using the old tools, but I suspect the proportion of people who would gleefully jump ship to a new system if it just worked better is larger than some might think.

To illustrate my perspective on this: A few weeks ago I was at a local Python meetup. Some of the people there are regular Python users, others are using other languages but starting to learn Python, some are hobbyists, some are professional developers, etc. A few people gave brief presentations about some Python projects. One guy talked about Django and in his intro he mentioned some pros and cons of Python.

Two cons he mentioned were lack of static typing (which is arguably a pro :slight_smile: ) and performance (which in many cases is not a practical obstacle). The third con was packaging. When he mentioned this, the half-dozen or so experienced Python users in the audience all shook their heads ruefully, while the Python neophytes looked around nervously, as if wondering “uh oh, what am I getting into”.

As a longtime Python user, booster, teacher, etc., it pains me to see this kind of thing.

3 Likes

A thought that came to mind skimming through this thread (apologies if it was already mentioned) – I wonder if there would be room for a commercial curated index? I.e. a company which maintains a clearly curated PyPI alternative, with categories etc., for a fee? :thinking:

Quite possibly. But as it’s commercial, the question is going to be “would it pay for itself”? I don’t think open source volunteers can answer that. Ultimately, someone interested in creating such an offering would need to go to companies that use Python and say “would you be willing to pay us for something like this?”

What I will say is that the evidence I’ve seen is that very few companies are willing to pay when they can get something for free. Sorry if that is a rather cynical view, but it’s my experience (and it’s a general problem with the sustainability of open source, not specific to Python packaging). So the added value in terms of curation, etc, would have to be significant in order to attract enough customers to sustain a commercial offering.