Actually, I think that’s 6 in addition to the 26 core members.
But it’s a totally different system – the actual packages are served up by anaconda.org, which I’m pretty sure is run by anaconda.com. So a commercial company is actually doing the equivalent of running PyPI.
conda-forge is a system for building packages – and the “curators” are spending most of their time helping folks get their recipes in order, not “curating” per se.
Disclaimer: I’m a newcomer here trying to learn the ropes but am interested in trying to contribute. Take what I say with a grain of salt.
I fully understand the reaction against “burn the system” approaches. It is clear from experience and your nice summary why such approaches will fail in Python. But when reading through these threads, it’s very frustrating to see new ideas shot down with “well that might be a good idea, but it would be very challenging to build given the current state of tools” or “that’s a good idea, but we don’t have the resources for it” – especially in a “vision” conversation like this one. I would prefer to first have a free conversation about what the ideal tools or workflows would look like, totally divorced from existing tools. Then, once one or more distinct requirements/desiderata lists are agreed upon, we could assess each one for its workload and feasibility. Likely the more satisfying lists of requirements would be a heavier workload. But I think it would be nicer to assess those things once more possibilities are laid out in summary form.
Again, for me as an outsider looking in, it’s hard to tell why some ideas are discussed endlessly and some are shot down with a “we don’t have the resources to implement that”. I’d rather see complete, digested lists of ideas that can be compared and contrasted with each other all at once, with respect to how satisfying they are, what resources they’d require, etc. For example, maybe the only set of ideas that is easy enough to implement in 2 years is such a marginal improvement that it’s not even worth spending time on, and it would be better to work on a set of ideas that will take 5 years to implement but will really be worth it in the end. Something like that…
Wanting one or more freely generated list of desiderata/requirements for later vetting is not the same as wanting to burn the system.
edit: For example, one of the requirements would be “At no point should the ~500k existing packages become inaccessible because of a change made on the Python packaging end without N years of notice” (where N could be infinity…)
I have a really hard time responding here, because at some level you are absolutely right. The problem is one of resources, though, but maybe not in the way that it seems at first.
The people involved in packaging are typically overwhelmed with “stuff that people want us to do”. And one of the things people want us to do is discuss stuff. Personally, I spend the vast majority of my packaging time these days just participating in discussions - I very rarely get time to write actual code. This is an example of that - I think your post deserves a response, so I’m spending some of my time trying to put something together that I hope will be helpful.
But I can’t respond to everything, so I have to prioritise. In doing so, I’m certain I come across as “shooting down” ideas. I hope I don’t say precisely “we don’t have the resources to do X”, but I definitely say “that’s hard, by all means look into it and come back if you can get it to work”. That’s shorthand for “we’ve thought about this and can’t work out how to do it” - but the latter invites follow-up questions like “can you point me at previous discussions?” And again, I simply don’t have the time to go and find links for that person.
To give a specific example, the following came to mind for me when looking at the recent discussion on a “curated PyPI” (something which isn’t even that radical an idea, TBH):
What counts as “curation”? I genuinely don’t know, but it looks like conda-forge treats it as “we can get it to build”. OK, so in one sense, everything on PyPI is curated to that extent, because you have to build at least a sdist to upload it.
In the conda-forge case, it’s a bit more, though - they build using a different process, so there’s a check there. But not everyone wants to build for conda (that’s a whole different debate) so we can’t mandate that “builds under conda” is a check that everything must pass. So once again, what is a tool-independent definition of what is a sufficient check here?
We could say “must build into a usable wheel”. There’s a long-running intent to add some sort of build farm to PyPI, but it’s really hard, because existing build systems (notably setuptools) can run arbitrary code when building - so sandboxing, protection against malware, etc, need to be considered. And in any case, what counts as “usable”? A Linux user can’t use a Windows wheel.
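The “usable” question above can be made concrete with a small sketch (this is not any official API – the function names here are made up for illustration; real tools like pip use the `packaging` library’s tag machinery for this). Wheel filenames encode a platform tag, so even a wheel that built fine may be unusable on the consumer’s machine:

```python
# Sketch: wheel filenames follow
#   {dist}-{version}(-{build})?-{python}-{abi}-{platform}.whl
# Platform tags use underscores, never hyphens, so the last
# hyphen-separated field of the stem is the platform tag.

def wheel_platform_tag(filename: str) -> str:
    """Extract the platform tag from a wheel filename."""
    stem = filename[: -len(".whl")]
    return stem.split("-")[-1]

def usable_on(filename: str, platform_tags: set[str]) -> bool:
    """True if the wheel's platform tag is acceptable on this system.

    "any" wheels are pure Python and work everywhere.
    """
    return wheel_platform_tag(filename) in platform_tags | {"any"}

# A Windows wheel is not usable on a Linux system:
linux_tags = {"manylinux_2_17_x86_64", "linux_x86_64"}
print(usable_on("numpy-1.26.0-cp311-cp311-win_amd64.whl", linux_tags))  # False
print(usable_on("requests-2.31.0-py3-none-any.whl", linux_tags))        # True
```

So “must build into a usable wheel” immediately forks into “usable for whom, on which platform?” – a build farm would have to build a whole matrix of wheels, not one.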
There’s discussion about coverage - we can talk forever about what proportion of the packages on PyPI are actually useful, but any curation proposal needs to have policies about what gets priority when the curators have too much to do. So how do we decide that?
And what’s the fallback when the curation isn’t enough and someone wants a package that hasn’t yet been dealt with (or maybe a critical bugfix of something that is included, if curation means approving every new release)?
Saying that project developers can do their own curation makes no sense, because that’s what PyPI is. There will always be fewer curators than project authors. So we have to look at bottlenecks, whether we want to or not.
Transition questions arise as well. Your edit mentioned not losing access to anything - that’s exactly the sort of question that someone needs to answer - how do we maintain access to the packages people are using, when we don’t know what those are for certain? (Download stats may help here, but mirrors confuse things).
Any alternative to PyPI needs hosting - the hosting costs for PyPI would be huge if not provided by sponsors. So how do we host a “PyPI next generation” while it’s being developed without omitting big chunks of stuff? Commercial support might be an answer, but will people accept a new PyPI hosted by (say) Microsoft?
Do we need all the history on PyPI? The 10-year old versions of packages that release weekly? How do we know? Who decides? How do we set a policy? If we say “nothing more than 5 years old”, how does that work for a critical, but stable, project that hasn’t needed a release in 5 years?
That was just a very quick brain dump, nothing more than thoughts I have on one current question. And it took me 20-30 minutes to write. And frankly, I’d be surprised if it’s of much use to anyone (it’s entirely negative, in the sense that it’s purely a list of potential problems - sorry to the people wanting to discuss curated repositories, I genuinely don’t want to stop that discussion happening, if it’s useful!)
If I said “we don’t have the resources to implement curation”, would that be shutting down the discussion? On the other hand, if all of the “insiders” simply ignored someone new coming in and saying “why not have a team reviewing package submissions”, would that be shutting down the suggestion?
I think the reality is that most of the “insiders” (I don’t like that term, but I understand that for a newcomer it feels like that) simply no longer have the time to engage in discussion on the more radical ideas. We’ve come to the conclusion, from bitter experience, that incremental improvement of what we have, while not ideal, is the only option that we can manage while supporting the existing infrastructure (which has to be our priority). That’s not to say that such ideas aren’t allowed, or that they aren’t useful, just that the people with years of packaging experience, and the people maintaining the current systems, haven’t got the bandwidth to explore them, so someone else is going to have to do a lot of work. And that will include taking on board “have you thought of X” style comments, which are likely intended to be helpful, but which will come across as negative, or as outright rejection, simply from lack of time to frame more positively.
OK, I have now spent 45 minutes on this post, and I really need to do some other stuff which I’ve been putting off. Please excuse the fact that this isn’t very well written - I genuinely don’t have time to edit it any further.
Thank you for taking the time to send your thoughtful response. I don’t find anything insufficient about it. The only thing I’ll say in response is this: it’s clear that there is tons of work, that these are super hard problems, and that the act of discussing the problems itself (including taking time to educate newcomers like me, as you’ve just done) is an additional burden of work. Acknowledging that burden, I’ll just repeat the sentiment of my last message: without such summaries of requirements, questions, acknowledged challenges, etc., less “insider” users like myself can only look on and hope that the “insiders” (1) are actually making progress towards a shared understanding of things and (2) are actually moving towards identifying feasible and valuable incremental improvements that can be made.
By that, do you mean that you hope the current members of the packaging community will come up with such summaries? If so, I think you’ll be disappointed – more experience in packaging doesn’t make these questions clearer, it makes them harder. It’s the newcomers who think things are simple.
Basically, my experience is that there is no behaviour of the current system that we could describe as “not needed” without someone screaming. Which is why the experienced users can’t define these requirements - we have too many battle scars. And without wishing to sound ungrateful to the people who created the survey, and the people who took the time to respond, the survey didn’t offer much help in that area either.
I’d love to see a set of requirements like you suggest. But I’m hoping someone with a fresh perspective will produce it (and do the research to make sure it’s reasonable, without the “everything must stay the same” biases of the “old hands”).
The sheer breadth of use cases that the current tooling is expected to handle is actually kind of wild, and I’m not sure that any one person even knows all of them. Trying to enumerate them all would be a pretty huge undertaking, but if someone were to do that, I think it would be pretty useful.
I’ve had a number of ideas for “simple” fixes to even small corners of packaging over time, and I think almost all of them, once I started to explore actually doing them, ended up producing several thousand words just to fully express the idea and factor in various workflows that people are using that seem generally reasonable.
I’m probably someone that gets viewed as “shutting down” conversation since I think I do often push back against “simple” ideas and try to expand into why those things aren’t workable without a lot more thought being put into them. Speaking for myself, I don’t actually mind that kind of discussion. I think it’s one of the best ways we have of doing knowledge transfer in this area right now, and if someone feels strongly enough about a suggestion, I hope they continue to push for it and try to come up with mitigations for the concerns, evidence that the concerns are overblown, or justifications that the change is worth the cost anyways.
I think this back and forth ultimately ends up making for better solutions to these problems as well. An interesting “demo” of this working in practice is the recent threads on dependency confusion, where you can see “simple” ideas get thrown around, pushed back against for not solving X, Y, Z use cases, and then refined until the idea got forged into something that was no longer quite as simple, but that actually handles most of the use cases. In participating in that discussion, I even personally ended up finding new use cases for our tooling that I didn’t realize people had.
Most of this has no bad intent, and I think a lot is due to mistaken reading of the documentation on packaging – there are so many docs on how to make a package and put it on PyPI that people do it because they think they should, or just to try it out.
Even a tiny bit of gatekeeping would help a lot.
NOTE: I talk a lot about conda-forge below – I’m not advocating that we should just use conda-forge, but it IS a good example that can be learned from, and it’s the one I know. And it works, by some definition of “works”.
Well, not quite that simple – it builds, it works with conda-forge (i.e. all dependencies are available) and it meets certain basic standards of “quality” – includes license files, doesn’t vendor stuff it shouldn’t, etc.
But that sdist could be completely broken and unusable. Beyond checking quality and functionality, even a tiny barrier means folks ask themselves: “do I really need to do this?”
Now that I think about it, most of the packages on conda-forge are put up by someone other than the original author – which is curation in itself – at least one person finds this package useful.
I think that’s irrelevant – you are either using conda or you are not. I suppose the conda-forge channel is more like a wheelhouse than PyPI – but that may be a good thing, and another one to keep in mind – should the place to get source (sdists) be the same as the place to get ready-to-run packages? PyPI started before there were wheels, and before there were services like GitHub – maybe we no longer need a repository for sdists?
I think there are lessons to be learned from conda-forge there, too – the “build farm” is CI systems driven by GitHub. The actual serving up of the packages is anaconda.org – totally different systems.
And there are some efforts along those lines for wheels:
There absolutely needs to be other sources of packages that are easy to access. Private package repositories, probably a “free-for-all” repository, etc. (conda allows arbitrary “channels”). Having a curated repo should not restrict any less curated use. Don’t the Linux distros do this?
I don’t think it should – once a package is accepted into conda-forge, further maintenance is done by the package maintainers, not the core team.
Of course, yes. But like a wiki, curation doesn’t have to be as careful as, e.g., committing code to a project. You could have a lot of curators.
See above – a “legacy” channel would be helpful here.
Big problem, yes. conda-forge couldn’t have happened without anaconda.org (or binstar, or whatever it was called before). And Anaconda.com is a very Python-focused, open-source focused small company – and there are folks that still don’t trust it.
I think this shows some of the benefit of the “distribution” maintainer not having to be the package author – if the code still works, and there is no one maintaining it, someone else can maintain the distribution – and if you can’t find anyone to do that, maybe it really isn’t important to have it.
Just an additional data point on this: at $work, a package that hasn’t had a release in ~18 months is considered unmaintained and not suitable for us to rely on unless we have forked and built it ourselves. This is to ensure that if an issue does arise, we’re already in a position to patch and use the fixed version. There is definitely a business opportunity in offering, essentially, insurance to patch incredibly stable software as needed.
(Though anecdotally, when faced with this requirement, many teams decide that they don’t actually need the stable/unmaintained library that badly, and move onto something that is maintained. Generally this brings our teams into better alignment and they share more dependencies, so it’s arguably a bigger win than maintaining the old code would achieve.)
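A policy like the one above can be expressed as a tiny check. This is just a sketch – the 18-month cutoff and the function name are illustrative, and in practice the last-release date would come from somewhere like PyPI’s JSON API (`https://pypi.org/pypi/<name>/json`) rather than being hard-coded:

```python
from datetime import date, timedelta

# Hypothetical policy threshold: ~18 months, per the comment above.
STALE_AFTER = timedelta(days=18 * 30)

def is_stale(last_release: date, today: date,
             limit: timedelta = STALE_AFTER) -> bool:
    """Return True if the most recent release is older than the policy allows."""
    return today - last_release > limit

# Example: a package last released in June 2021, checked in January 2023,
# is ~19 months old and fails the policy; one from September 2022 passes.
print(is_stale(date(2021, 6, 1), date(2023, 1, 15)))   # True
print(is_stale(date(2022, 9, 1), date(2023, 1, 15)))   # False
```

The interesting part, of course, is not the arithmetic but the policy question it encodes: a stable project that simply hasn’t needed a release would fail this check.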
Good point – there are downsides to it being very easy to add a new dependency. I find folks on my team jump on things a bit too quickly – they find something that does something useful, it works right now for the use case at hand, and now we have a new dependency. Oops! It’s buggy, it’s not maintained, it’s not available for all platforms…
Anyone remember “left-pad”? It could be that all over again.
I don’t think we should make it impossible for folks to use old, no-longer-maintained software, but we don’t need to make it easy or, critically, the default.
Reading through this entire thread, somewhere halfway through it sounded as if the crux of the matter would be resolved if pip packages could declare runtime and build dependencies on the system (“you need libssl and a C compiler and Python with this ABI”) and conda, which would be a provider of such dependencies, could query pip package information. Am I getting that right?
Hmm – I’m not sure it’s the only “crux of the matter”, but I do like the idea. I was just thinking about that today, when working with one of my complex packages:
I’m a heavy conda user, so my python libs have a conda_requirements.txt file sitting right there in the repo. If that was standardized in some way, then a conda package could be auto-built for any python package.
Though now that I say that out loud, maybe that’s a matter of having a conda recipe meta.yaml file in the repo
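For reference, a minimal conda-forge-style recipe along those lines might look something like the following sketch – the package name, version, and checksum are placeholders, not a real recipe:

```yaml
# Hypothetical meta.yaml sketch for a pure-Python package "mypkg".
package:
  name: mypkg
  version: "1.0.0"

source:
  url: https://pypi.io/packages/source/m/mypkg/mypkg-1.0.0.tar.gz
  sha256: 0000000000000000000000000000000000000000000000000000000000000000  # placeholder

build:
  noarch: python
  script: python -m pip install . -vv

requirements:
  host:
    - python
    - pip
  run:
    - python
    - openssl  # a system-level dependency, expressible here but not in PyPI metadata

test:
  imports:
    - mypkg

about:
  license: MIT
  license_file: LICENSE
```

The `run:` section is the interesting bit for this discussion: it can name non-Python dependencies, which is exactly what standard Python package metadata currently cannot do.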
The real challenge with this is knowing what the package name is for, e.g., libssl – what namespace is the name in? PyPI defines the default namespace for pip, conda-forge has a namespace for conda-forge, but those are only two specific systems.
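To make the namespace problem concrete, here is a purely hypothetical sketch – the `virtual:ssl` identifier and the mapping table are made up for illustration, not any existing standard. The same logical dependency has a different name in every ecosystem, so cross-tool metadata needs some namespace-qualified spelling plus per-ecosystem translation:

```python
# Hypothetical: one namespace-qualified identifier, many ecosystem names.
EXTERNAL_NAME_MAP = {
    "virtual:ssl": {
        "conda-forge": "openssl",
        "debian": "libssl-dev",
        "fedora": "openssl-devel",
        "homebrew": "openssl@3",
    },
}

def resolve(dep: str, ecosystem: str) -> str:
    """Translate a namespace-qualified dependency to an ecosystem's package name."""
    return EXTERNAL_NAME_MAP[dep][ecosystem]

print(resolve("virtual:ssl", "debian"))       # libssl-dev
print(resolve("virtual:ssl", "conda-forge"))  # openssl
```

Who maintains that mapping table, and for how many ecosystems, is of course the hard part – the code is trivial, the curation isn’t.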
That is understandable and (like @jagerber) I appreciate your taking the time to lay out your position here (and similar thoughts throughout the various threads) despite the many other demands on your time. In a related vein:
Thanks for this perspective. This is the spirit of my comments as well, and I hope people in these threads take them in that light.
The main thing is that I’m with @jagerber in that I don’t think the problem will be solved by taking “what incremental improvements can be made right now” as a starting point. We can, as you said, think about what we want things to be like, and then back off from that until we reach a point closer to reality.
It may well be that in the end there is a series of incremental steps to get to the utopian future, but personally I do not have confidence that we’ll take the right incremental steps unless we consider them, not just relative to where we are now, but to where we want to be. And that requires actually envisioning where we want to be. If it’s not “measure twice, cut once” it’s just going to be death by a thousand cuts.
Moving to some more concrete matters. . .
I agree it is different from what Paul was suggesting, and I agree it is different from what is currently the default package installer that comes with Python. But, again, I’m trying to take a broader view here. I’m sorry if I made people think that I’m advocating for immediate replacement of pip with conda, but what I am trying to do is get people to at least consider the possibility that, in the future, the default package installer provided with Python might be something other than pip (at least as we know it now), and the default package repository from which that default installer installs stuff might be different from PyPI (at least as we know it now).
Quite frankly, I don’t see much point in any of this discussion if those possibilities aren’t at least on the table for some time in the non-immediate future. The whole problem here is that the existing toolchains are a pain for users. A lot of people want to throw out their existing toolchains!
Not everyone wants to build for pip/PyPI either! I think there is a large class of people who don’t care what they’re building “for” as long it is something that is reasonably painless to use and can reach a reasonably large audience. As @PythonCHB said:
I’ll refrain from reiterating his many other important points, but this one is just essential. The reason I keep talking about conda this and that is just to kind of bring into this discussion that the goal here (at least as I see it) is “distributing and installing Python packages”, not “using pip” or “using PyPI”. A significant component of the reason people use PyPI and pip has nothing to do with how they operate; it’s simply that they are described on python.org as the official solutions. So I come at this from the (perhaps excessively) optimistic position that if a tool works well, simply having it endorsed by Python/PyPA would ease the transition for many people. We still need to come up with that good tool, and yeah, we can’t pull the rug out from those using the old tools, but I suspect the proportion of people who would gleefully jump ship to a new system if it just worked better is larger than some might think.
To illustrate my perspective on this: A few weeks ago I was at a local Python meetup. Some of the people there are regular Python users, others are using other languages but starting to learn Python, some are hobbyists, some are professional developers, etc. A few people gave brief presentations about some Python projects. One guy talked about Django and in his intro he mentioned some pros and cons of Python.
Two cons he mentioned were lack of static typing (which is arguably a pro) and performance (which in many cases is not a practical obstacle). The third con was packaging. When he mentioned this, the half-dozen or so experienced Python users in the audience all shook their heads ruefully, while the Python neophytes looked around nervously, as if wondering “uh oh, what am I getting into?”
As a longtime Python user, booster, teacher, etc., it pains me to see this kind of thing.
A thought that came to mind skimming through this thread (apologies if it was already mentioned) – I wonder if there might be room for a commercial curated index? I.e., a company which maintains a clearly curated PyPI alternative, with categories etc., for a fee?
Quite possibly. But as it’s commercial, the question is going to be “would it pay for itself”? I don’t think open source volunteers can answer that. Ultimately, someone interested in creating such an offering would need to go to companies that use Python and say “would you be willing to pay us for something like this?”
What I will say is that the evidence I’ve seen is that very few companies are willing to pay when they can get something for free. Sorry if that is a rather cynical view, but it’s my experience (and it’s a general problem with the sustainability of open source, not specific to Python packaging). So the added value in terms of curation, etc, would have to be significant in order to attract enough customers to sustain a commercial offering.
Not quite. There are multiple independent, or loosely related, big picture topics discussed in this thread. System dependencies is one of those. UX for packaging tools and workflows folks need is another one. Those don’t overlap too much.
very few companies are willing to pay when they can get something for free
I’m not sure if this is as definite as we often think; also, while they can get the packages themselves for free, this would offer extra value in the form of code reviews, security checks, built-in dependency audits and the like.
But you’re right, there certainly are a lot of risks and unknowns involved, which can only be resolved by someone giving it a try…
There are many such offerings, but they’re not limited to PyPI packages and sdists/wheels. You might have heard of Anaconda or commercial Linux distros ;)
You usually get some kind of long-term maintenance guarantees for old versions, and vendor-specific compatibility/integration patches (so customers can pay for fixes that aren’t relevant upstream), so these need to provide more than just a subset of PyPI.
By coincidence, on a minor “ideas” thread, I just made a comment very similar to this quote of yours:
By the very nature of things, the folks that tend to be most actively involved in open source development are those folks working on open source applications and shared abstraction layers.
The folks writing ad hoc scripts or designing educational exercises for their students often won’t even think of themselves as software developers - they’re teachers, system administrators, data analysts, quants, epidemiologists, physicists, biologists, business analysts, market researchers, animators, graphical designers, etc.
[mine was far less articulate]
We do need to keep that in mind – a huge portion of Python users do not have the same needs and experiences as most of the folks involved in these sorts of discussions.