Meta: how we evaluate / reach consensus on / approve ideas as a community

kknechtel · May 1, 2024, 12:09am

I wanted to pull this out to a separate thread because it’s off the topic of Hugo’s actual idea.

Continuing the discussion from Command-line interface for the random module:

Isn’t the usual answer - which I’ve received myself and seen given to many ideas I thought were excellent - “make a package on PyPI and see if it gets popular”? Reusing the name random would be tricky/error-prone for such a tool, but otherwise I see nothing about this idea that couldn’t be implemented externally.

Even things that would involve C extensions often get this treatment - as long as the implementation wouldn’t involve recompiling CPython itself. (For that matter: one of my own ideas involved changing a str method, and I recall seeing at least one other idea that would presumably work that way. Sure, I can define my own subclass, but that doesn’t affect string literals - unless I use a gc hack that I wouldn’t have known to be possible without that discussion. But I still got this stock response.)
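To illustrate the subclass limitation mentioned above, here is a minimal sketch. The `ShoutyStr` class and its `shout` method are made up for illustration; they are not the str change the post refers to.

```python
class ShoutyStr(str):
    """Hypothetical str subclass that adds one extra method."""

    def shout(self) -> str:
        return self.upper() + "!"


wrapped = ShoutyStr("hello")
literal = "hello"

print(wrapped.shout())               # the subclass instance has the new method
print(type(literal) is str)          # string literals are still plain str
print(hasattr(literal, "shout"))     # so the new method is missing on them
print(type(wrapped.upper()) is str)  # inherited methods hand back plain str too
```

So a subclass only helps for values that are explicitly wrapped, which is why it falls short of changing how literals behave.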

Now, I don’t happen to think this is a particularly good answer - considering that a large fraction of ideas come from people who don’t already have any “presence” and would struggle to raise any awareness of their candidate implementations. But I’m a little confused that the current idea seems to be getting treated much differently from usual. (Is this a privilege of already being a core developer?)

But on the other hand, maybe getting functionality added to the standard library isn’t so valuable any more. Some have described the “included batteries” as “leaking” for years now (and this is only someone who was able to give a talk at the Python Language Summit - the underlying idea is surely much older). We’re finally removing some, rather conservatively (the newest thing being removed in 3.13 was added in 2006). Meanwhile, a lot of the standard library has fallen out of popular use in favour of third-party alternatives and wrappers (things like requests and click).

While I don’t mean to advocate for removing anything else specific, I generally wonder if it wouldn’t be better to commit to a general continued slimming down of the standard library. Doing so would reduce the “surface area” for more proposals to change or add things, and encourage more people to provide the functionality as third-party instead.

But it would be nice to have more ways to popularize such utilities when they come from people without an established reputation (or better yet, the backing of a large organization). And it would be nice to have a good place to propose ideas specifically for third-party packages, so that time spent considering them (which seems like it often greatly outweighs any conceivable future “maintenance” burden) isn’t spent by people who are being pulled away from something more important and who will likely reject the idea anyway. (Sorry if I made the problem worse here!)

jamestwebber · May 1, 2024, 12:41am

I’m worried you’re going to fall into another “this thread is about too many things” hole

I think this question is tied to the state of packaging, actually. A simple packaging story makes it easier to move stuff from the stdlib to PyPI. If packaging is painful, included batteries are more valuable.

In the beginning, packaging was non-existent and having lots of batteries was super important. In some beautiful future, installing packages is trivial and a good Python implementation just needs to run code and install stuff.

Right now we’re somewhere in the middle, as packaging is evolving and the stdlib is shrinking, but both processes are happening slowly^[1]. And it’s different for different users–I’m always using third-party packages, so I’m generally on the side of “remove the extra stuff”–I’ll be setting up an env no matter what, and I like to customize.

For other people, installing packages is relatively tricky due to system or environment constraints. I wonder if this group is over-represented among core developers because there’s an overlap in the skillsets involved.

[1] and carefully, as they should

kknechtel · May 1, 2024, 12:58am

I really wish I knew how to pull it apart any better than this.

I don’t suppose you have specific examples off-hand of such constraints? (I know building wheels can be difficult, especially on Windows; but a lot of stuff being removed, or discussed as removal candidates, is pure Python - or at least, Python-only wrappers for things that aren’t getting removed, e.g. uu leveraging binascii.)

jamestwebber · May 1, 2024, 1:06am

I was vague in part because it’s not my experience but I was trying to cover that perspective. My impression from other discussions was that security policies can make it difficult to install from PyPI. And people who are bootstrapping a system or using Python as a utility scripting language (e.g. as part of an OS) might want to stick with the stdlib.

barry-scott · May 1, 2024, 6:53am

In the Linux OS case, many PyPI packages are available as OS packages, which makes running tools written in Python more secure than installing from PyPI over the internet.

hugovk · May 1, 2024, 7:52am

The PEP 594 acceptance recommended regular reviews:

There was another non-PEP 594 module removed in 3.13 (lib2to3). I don’t see any other modules pending removal, but many other things are deprecated and set for future removal.

pf_moore · May 1, 2024, 9:12am

It depends on the cost/benefit proposal. Personally, I tend to respond like that when the proposal lacks use cases beyond the fact that the proposer finds the functionality useful and “thinks it would be worth having in the stdlib”. Proposing a PyPI package forces the user to answer the question “what benefits do stdlib inclusion in particular bring?”

I’ve already stated my objections to this idea on a number of occasions. Please let’s not have another “should we slim down the stdlib” debate…

I can’t comment on the question of over-representation, but in my experience the following issues are the most important:

1. Environment policies or constraints. This might be “the IT department won’t allow unapproved libraries”, or “my Python distribution doesn’t ship this package and I don’t want to mix package managers”, or “I don’t want to have to do the job of a systems integrator ensuring that this package works with the others I am using”. There are many variations on this point in practice.
2. The user (quite reasonably) doesn’t want to dump stuff randomly in their system Python. But virtual environments, neat as they are as a development tool, are an utterly horrible end user mechanism. Adding a PyPI dependency requires taking on the commitment of managing a virtual environment for what may well have been a small, throwaway script (and we all know that today’s throwaway script is tomorrow’s production application, so now you’re making a major infrastructure decision).
3. Distribution difficulties. Packaging and distributing libraries is well-supported in Python, but distributing applications is still very hit and miss. Dependencies make distribution harder. PEP 723 helps for simple scripts, but tooling to run PEP 723 enabled scripts is still not ubiquitous. Pipx is getting there, but it’s still a lot to expect pipx to be present on an arbitrary machine with Python installed.

These do overlap somewhat, but they are very often in my experience factors which make “use a library from PyPI” a lot less attractive than the “slim down the stdlib” advocates assume.

The above is, of course, coloured by my experience. For many years I worked in an IT consultancy where Python wasn’t an “official” tool, but it was used for automation, scripting, and data management. Many of the people who used such scripts (and a lot who wrote them!) were technically proficient, but when it came to running and installing software, anything that went beyond “install this package that has a standard OS installer”, or “copy this executable or script onto your PC and run it”, was a significant deployment problem - often enough to result in the software getting written in another language, or the process remaining manual rather than automated.

Without the “batteries included” stdlib, I’m not sure I’d even be a Python user^[1], much less a core dev and pip maintainer. Even as a pip maintainer, I still regularly find cases where I avoid PyPI in favour of stdlib-only solutions. I wish I had a better answer, but I find that the Python tool development community has a blind spot when it comes to that class of use cases.

Anyway, I said I didn’t want to rehash the “remove stuff from the stdlib” debate, and I don’t have the energy to argue the application deployment issue again, so I’ll leave it at that point.

[1] I switched from Perl to Python because of the comprehensive stdlib. If I hadn’t made that switch, I might easily have found Java as my first “language with a large standard library” and got sucked into that ecosystem.

EpicWink · May 1, 2024, 9:22am

running natively in serverless (deployment size and boot time are important)
distributing scripts to users who can’t reasonably be expected to install pipx
creating a Docker image to be distributed to clients with an unknown security and intellectual property policy
creating tiny modules for use client-side via WebAssembly

Many of these use cases really only require argument parsing, HTTPS requesting, binary data handling, JSON/Zip/etc file handling, parallelism, and a few other odds and ends.
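As a rough illustration of how far the stdlib alone goes for this kind of task, here is a sketch that combines argument parsing, zip handling, JSON parsing and a thread pool. The task itself (counting top-level keys of JSON files inside an archive) and all function names are invented for the example.

```python
import argparse
import io
import json
import zipfile
from concurrent.futures import ThreadPoolExecutor


def count_keys(raw: bytes) -> int:
    """Parse one JSON document and count its top-level entries."""
    return len(json.loads(raw))


def summarize_archive(archive: bytes) -> dict[str, int]:
    """Map each .json member of a zip archive to its top-level entry count."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        names = [name for name in zf.namelist() if name.endswith(".json")]
        payloads = [zf.read(name) for name in names]
    # Parse the members in parallel; pool.map preserves input order.
    with ThreadPoolExecutor() as pool:
        counts = list(pool.map(count_keys, payloads))
    return dict(zip(names, counts))


def main(argv=None) -> int:
    """Command-line entry point: print a JSON summary of the given archive."""
    parser = argparse.ArgumentParser(description="Summarize JSON files in a zip")
    parser.add_argument("archive", help="path to a .zip file")
    args = parser.parse_args(argv)
    with open(args.archive, "rb") as fh:
        print(json.dumps(summarize_archive(fh.read()), indent=2))
    return 0
```

Calling `main(["data.zip"])` (with a hypothetical `data.zip`) would print the summary; nothing here needs to be installed beyond Python itself.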

monk-time · May 1, 2024, 10:59am

Has there been a single improvement to stdlib that went through this route in the last, say, ten years? A suggestion is made, gets referred to PyPI, the author implements it, the package gets popular and then it’s merged into stdlib? The only case that comes to my mind is trio/curio and their influence on asyncio, but even that is not a clean example as asyncio was a quickly evolving new addition that was already part of stdlib, and trio/curio IIUC weren’t merged in completely.

This answer has always struck me as a bit misleading. Even if a library did prove itself to be stdlib-worthy as a popular standalone lib, it’s practically guaranteed that the next reaction to a request to add it to stdlib would not be “Sure, you’ve passed the PyPI test, now let’s merge you in” but rather “It would be better for you to remain on PyPI because you can evolve and react to user feedback much more rapidly there; after all, stdlib is where modules go to die”.

In addition to that, requiring a legitimately good idea to succeed first on PyPI means demanding that it also overcomes people’s reluctance to add an extra dependency, which might be related to a myriad packaging-related reasons as well as a general culture in Python to prefer large meaningful dependencies over small nice-to-haves (even if the latter is universally agreed to be a good addition). This isn’t Node/NPM after all.

(I want to clarify that I’m not arguing here in favor of opening the floodgates and adding stuff to stdlib willy-nilly, only about the value of this “pass the trial by PyPI first” argument. It’d be much better in my opinion to use other more substantial arguments against new ideas than sending people on a path that never leads to the implied outcome of getting added to the stdlib.)

sinoroc · May 1, 2024, 12:06pm

I think dataclasses is one (it came originally from the attrs third-party library). If I am not mistaken, a lot of the typing stuff is done as third-party first, and maybe also some of the importlib.metadata and importlib.resources changes (or is that just there for backport reasons?).

If I recall correctly PEP 582 (__pypackages__) was ultimately supposed to be added to Python itself (or at least parts of it), and so it was tried as 3rd party first.

encukou · May 1, 2024, 12:17pm

tomllib, importlib.resources, various stuff from typing_extensions, zoneinfo, 3.9 compileall improvements, … and many more.
It’s true that some of these are by core devs and perhaps didn’t need to become popular, but do serve as a proof of concept and/or a backport library. (The PyPI package is not a trial, it’s useful on its own.)
If you count third-party libraries in general, iOS and Android support come to mind.
If you count reimplementations of a concept, there are things like exceptions groups (from Trio) and dataclasses (from attrs).

Yes, the “trial by PyPI” is not sufficient. If you succeed there, you usually find that you don’t need to add anything to stdlib.

If you want to build something, you don’t need to ask for permission first. Your work will be valuable even if it’s not integrated into Python itself.

jamestwebber · May 1, 2024, 2:31pm

Points 2 and 3 get back to the packaging issue that I mentioned earlier–if virtual envs are a bad user experience, that’s a big problem that needs to be fixed!

I probably made a similar choice. But if you or I were choosing a scripting language now, the packaging ecosystem would be a bigger factor in the decision than it was back then. PyPI has evolved a lot over the past decade(s), plus the rise of GitHub and other code-sharing sites that make it much easier to install third-party code^[1].

[1] the first Python project I worked on was hosted on sourceforge and used SVN until I moved it to Mercurial…things are better now

pf_moore · May 1, 2024, 4:00pm

It can be taken as misleading, yes. It’s not often stated like this, without context, though. The reality is that in order to be added to the stdlib, proposals need to justify being added to the stdlib - and most proposals only manage to come up with justifications for being publicly available code, maintained by someone who has a commitment to the code. When pressed as to why the stdlib rather than PyPI, there’s typically no reason. (In fact, the proposer rarely even puts the package onto PyPI, suggesting that the actual reason is “because I was hoping someone else would create and maintain my idea for me” - but I may be being excessively cynical in thinking like that…)

kknechtel · May 1, 2024, 6:24pm

Thanks; you explained it far better than I could manage in the OP.

kknechtel · May 1, 2024, 7:24pm

First off: in a couple previous idea threads I’ve been annoyed at this hurdle because the idea was fundamentally not intended as a stdlib addition, but an enhancement for a built-in type. These can be subtyped, sure, but that doesn’t affect literal values without an obscure GC hack (and I’m not sure it’s intentional that it even works).

But the main point here: there seems to be a conflict here, and I think what’s missing is a clear sense of what makes a project stdlib-worthy. Which is to say: if I have an idea and I want to justify adding it to stdlib, how am I supposed to go about that? Putting an implementation on PyPI only seems to strengthen the argument that it could remain on PyPI. I don’t know which batteries you (or anyone else I might be trying to convince) found most useful; I don’t know what you find useful now; I don’t know why you wouldn’t agree that those things would have been better off third-party.

The only concrete argument I’m getting out of this is “the stdlib should contain the kinds of things that are needed by a consensus of people who work in environments that prevent/hinder/limit access to PyPI”. But how is anyone outside of such an environment supposed to have any intuition for that?

In my view: not cynical, but not necessarily reasonable either. Plenty of good ideas will come from people who lack the time, patience and/or know-how to implement them. For example, it doesn’t take familiarity with the CPython code base to come up with “let’s have error messages underline the whole erroring expression with ^ instead of just pointing at the start of it”, but it certainly does to implement it. Or for an older example, “maybe a try: that’s missing any corresponding except: or else: should explicitly describe this problem in the resulting SyntaxError, and maybe it should consistently do that instead of producing an IndentationError in more complex cases”.

I suspect that people also get discouraged because they read between the lines and see this as a rejection. I know anything I put on GitHub isn’t going to get thousands (or even tens) of stars overnight, and so does everyone else (except the few for whom it isn’t true).

But even if we only consider “library” features, it seems really awkward to implement things as a PyPI distribution when the proposed change isn’t a whole new package (or even a single module), but, say, an enhancement to the interface of an existing single function in the standard library. Not to mention, for a transparent experience, that code is going to have to either monkey-patch, duplicate or wrap the functionality it’s trying to enhance. In some cases (cough, random) that could be rather involved.
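A small sketch of the wrap and monkey-patch options, to make the awkwardness concrete. The enhancement shown (a `default` argument for `random.choice` on empty sequences) is invented purely for illustration and is not a proposal from this thread.

```python
import random

# Keep a reference to the real function so the wrapper can delegate to it.
_original_choice = random.choice


def choice(seq, *, default=None):
    """Hypothetical enhancement: return `default` instead of raising on empty input."""
    if len(seq) == 0:
        return default
    return _original_choice(seq)


def install():
    """Monkey-patch the module-level function for a 'transparent' experience."""
    random.choice = choice


def uninstall():
    """Restore the original function."""
    random.choice = _original_choice


install()
print(random.choice([], default="nothing to pick"))
uninstall()
```

Even this tiny patch is leaky: code that already ran `from random import choice` keeps the old function, and `random.Random()` instances have their own `choice` method that knows nothing about `default`. Covering those cases means duplicating or subclassing more of the module.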

tjreedy · May 1, 2024, 7:56pm

On removals: The only module removals I see in the 3.13 list are lib2to3 and tkinter.tix. Both are somewhat special cases. I see none scheduled in the future or even deprecated. The current de facto consensus (middle position) seems to be to move pretty slowly in both removals and additions.

I agree that ‘put it on pypi’ should not be presented as a route to the stdlib but as an alternative route to public availability.

hugovk · May 1, 2024, 8:02pm

The 19 remaining “dead batteries” from PEP 594 are also removed in 3.13.

Rosuav · May 1, 2024, 8:52pm

While that’s true in theory, the trouble is that someone who doesn’t know the codebase has no idea how possible something is. How hard is it to implement that underlining? Well, it requires knowing the beginning and end of an expression. That actually isn’t very common among language parsers (and I don’t think it was the case in Python before that feature was implemented).
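To show what “knowing the beginning and end of an expression” buys you, here is a toy sketch using the `ast` module, whose nodes carry `col_offset` and `end_col_offset`. This is not CPython’s traceback machinery, only an illustration; it assumes a single-line expression and ASCII source, since the offsets are UTF-8 byte offsets.

```python
import ast


def underline(source: str, node: ast.expr) -> str:
    """Return the node's source line with ^ under its full extent."""
    line = source.splitlines()[node.lineno - 1]
    width = node.end_col_offset - node.col_offset
    return line + "\n" + " " * node.col_offset + "^" * width


source = "result = data['key'] / count"
tree = ast.parse(source)
division = tree.body[0].value  # the BinOp node for `data['key'] / count`
print(underline(source, division))
```

With only a start position, the best a tool can do is point a single caret at the start of the expression; the end offset is what makes the full underline possible.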

Ideas are worth approximately nothing. Implementation is worth everything.

dg-pb · May 1, 2024, 9:40pm

There was a touch on this in the e-mail group a while ago.

My personal insight from that was that with enough effort it is possible to devise a set of quantifiable dimensions on which a proposal can be evaluated.

I am still working on something similar for my own needs, but at my current stage of development there hasn’t been a big need for it yet.

However, I think the Python community could potentially benefit from devising something along these lines.

E.g.:
STEP 1. Evaluation of whether the idea is desirable.

1. Necessity
   a) Use cases found by regexp searches in the stdlib
   b) Use cases found by regexp searches in popular external libraries
   c) The actual use case and manually collected examples from which the proposal stems
2. Poll of keenness of the Python community
   a) Poll results of Python core devs
   b) Poll results of the general Python community

STEP 2. Evaluation of the best route for achieving the desired result. In other words, comparison of different alternatives on different dimensions:

1. Implementation efficiency
   a) Memory usage
   b) CPU usage
2. Readability
3. Brevity and elegance of the syntax that will be used to do the desired thing

Weigh these by deemed importance, in line with the long-term objectives of the Python community, and improve the process along the way.
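As a very rough sketch of the weighting step, assuming each dimension has already been scored between 0 and 1 by some agreed process. The dimension names, the weights and the scoring scale are all placeholders, not something proposed by anyone in authority.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    name: str
    weight: float  # relative importance; placeholder values below


# Hypothetical dimensions and weights, loosely following the two steps above.
DIMENSIONS = [
    Dimension("necessity", 4),
    Dimension("community_poll", 2),
    Dimension("efficiency", 2),
    Dimension("readability", 1),
    Dimension("brevity", 1),
]


def weighted_score(scores: dict[str, float], dimensions=DIMENSIONS) -> float:
    """Weighted average of per-dimension scores, each expected in [0, 1]."""
    missing = [d.name for d in dimensions if d.name not in scores]
    if missing:
        raise ValueError(f"missing scores for: {', '.join(missing)}")
    total_weight = sum(d.weight for d in dimensions)
    return sum(d.weight * scores[d.name] for d in dimensions) / total_weight


example = {
    "necessity": 1.0,
    "community_poll": 0.5,
    "efficiency": 0.5,
    "readability": 1.0,
    "brevity": 0.0,
}
print(weighted_score(example))
```

The arithmetic is trivial; the hard part would be agreeing on the dimensions, the weights, and how each score is measured.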

This probably wouldn’t impact much of what core-devs are doing regardless, but could open up a portal for ideas that get unnoticed, forgotten or face unfair shutdown, due to various reasons. To name a few:

1. The headspace of the people making decisions is not friendly to ideas that are not in line with how they see things at the moment.
2. The person proposing the idea gives up quickly due to uncertainty about how to proceed or be heard.
3. The community is unwilling to work more seriously on implementing an idea in CPython because it is uncertain how the decision on whether to merge it will be made. The risk/reward ratio is thus unknown, which is uncertainty on top of uncertainty.

If there was a good framework for this, I think the advantages would outweigh the disadvantages. It could significantly increase willingness to contribute, and if the process is robust enough to not let bad ideas get through, then bad ones will just not get merged. But exploring bad ideas could be as important as working on good ones, and I find that currently there is little motivation and even less encouragement to do so.

If there was a clear signal from the Python team:

1. Ok, interesting idea. We think it isn’t easily achievable / worth it, but if you want to voluntarily work on it - go ahead. If yes, then:
2. Prove that it is useful and needed. These are the things and tools to do so. If that checks out, then:
3. Go ahead and explore implementation. You will be judged according to these dimensions and we will try to destroy your idea in these ways. If that checks out:
4. Ok, good work. Write a PEP. Standard procedure follows.

So the benefit and time spent by someone would be much greater than supervision needed. And the risk of something going wrong would be minimised as the process improves with experience.

Maybe you would need to shut down the idea even if it checked all the points perfectly. Then it would uncover new dimensions that need to be included, etc…

pf_moore · May 1, 2024, 9:48pm

That’s fair. I don’t think there is an obvious way to say if a proposal is worthy of inclusion in the stdlib. But it does need a champion, and if they aren’t willing or able to respond to the question “why not just put it on PyPI?” they almost certainly won’t be able to take their idea through to implementation.

So who is going to implement them? Someone’s going to have to. If it’s not the original proposer, then that person will need to find someone to do the work - or at the very least, engage people in a sufficiently interesting and motivating discussion to encourage someone else to do it.

IMO, a lot of people proposing ideas don’t actually have any real sense of what is involved in adding a new feature to Python - whether it’s a language feature, a library module, or anything else. That’s fine, not everyone needs to be involved to that level. But conversely, if you don’t know what’s involved, you should probably be willing to accept the judgement of people who do have that knowledge - or at the very least be willing to take their concerns seriously. That’s not to say that we should tolerate gatekeeping - ideas have a right to be heard regardless of who they come from - but it’s basic politeness to accept that people who have got experience implementing features in Python will have a better understanding of what’s involved than you do, and to respect that experience.

Well, to some extent it is. It’s a rejection of the unjustified claim that “this should go into the stdlib” at least. If the proposer had successfully made the case that the idea needed to be in the stdlib in order to be effective, the suggestion to start by releasing on PyPI would clearly not be reasonable. And equally, it’s only a rejection to the extent that the proposer isn’t willing to challenge it - and such a challenge would involve clarifying why the feature needs to be in the stdlib, which was the point.

But equally, no one person has the say over what ends up in the stdlib. The process is to get a core dev to support you, write a PEP, address any objections, and get approval from the SC. Some random person like me on the ideas list can’t stop you doing that. But if you’re discouraged by that random person not liking your idea, you’re never going to make it through the actual process.