How do we want to manage additions/removals to the stdlib?

brettcannon · September 16, 2021, 8:11pm

What sort of requirements do we want to place on ourselves for adding modules to the stdlib as well as removing them (this is not about what to do with the stdlib in terms of what kinds of modules should be there, etc.)? Clarifying our views on this affects multiple PEPs such as 4, 411, and 594 (and the last of these is the reason I’m asking now, so that I can help @tiran get that PEP completed and hopefully accepted).

Additions

Do we want to require a PEP? graphlib went in with a discussion on bugs.python.org but no PEP, while the idea of adding a frozen map to collections has an (open) PEP all on its own.

Should all new modules be marked as provisional?

How long can a module be marked provisional? Is there a limit?

Removals

PEP 387 guarantees modules get a two-release deprecation period (unless the SC okays it being shorter). Is there anything else to do there?

My personal opinion

Require a PEP for new modules? Yes, because the stdlib is important enough to require a PEP’s level of exposure and discussion.
Should all new modules be marked as provisional? Yes, as even modules that have existed outside of the stdlib will still end up with way more usage and exposure once in the stdlib itself.
Is there a limit to being a provisional module? Yes, two releases: one for the initial release, and one more to make sure the changes made in response to the initial feedback are the right thing. Plus it gives slower adopters more time to get exposed to a module.
Anything else to do for removed modules? No, because we are dropping them for a reason: they aren’t being maintained and/or used enough anymore. Some have suggested copying the modules over to some “dead modules” repo and packaging them up, but that will still require maintenance as that packaging will eventually grow stale (packaging standards are still being created), and the original code as shipped in the stdlib is still available in CPython’s repo history. Plus people can always copy the source into their own project to vendor it or start a fork.

I would also argue that provisional modules should raise a ProvisionalWarning so that people know they are using something that is subject to change.
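A minimal sketch of what such a warning could look like. Python has no built-in `ProvisionalWarning`; the class, the `warn_provisional` helper, the choice of `Warning` as the base class, and the message text are all assumptions made for illustration, not an agreed design.

```python
import warnings


class ProvisionalWarning(Warning):
    """Hypothetical category for provisional stdlib modules (not a built-in)."""


def warn_provisional(module_name):
    # A provisional module could call this at import time.
    # stacklevel=2 attributes the warning to the caller rather than this helper.
    warnings.warn(
        f"{module_name} is provisional and its API may change without deprecation",
        ProvisionalWarning,
        stacklevel=2,
    )
```

Users who accept the risk could then silence it with the standard machinery, e.g. `warnings.filterwarnings("ignore", category=ProvisionalWarning)`.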

graingert · September 16, 2021, 8:21pm

Removal is quite a strong word; perhaps “promotion to PyPI” would be a better term. For example, I’d prefer if asyncio were promoted to PyPI.

brettcannon · September 16, 2021, 8:26pm

From the perspective of the stdlib it isn’t as that is what’s happening.

Ah, but that’s assuming that’s what occurs to everything we remove, so that’s a position you are taking. Otherwise whether a specific module ends up on PyPI is a separate concern per module based on what people choose to do.

That’s a very specific case, so that can be considered an example of what one could choose to do for a specific module, but unless you’re explicitly advocating this for every module we remove then it seems out of scope for this general discussion.

graingert · September 16, 2021, 8:33pm

but unless you’re explicitly advocating this for every module we remove then it seems out of scope for this general discussion.

Yes this is what I’m advocating, in fact I’m explicitly advocating for the “kernel Python”, remove everything that isn’t needed for running get-pip: https://bootstrap.pypa.io/get-pip.py

Specifically, asyncio is a dependency that would benefit most from this promotion; currently I’m unable to run any asyncio code without either DeprecationWarnings or race conditions.

See also Python Software Foundation News: Amber Brown: Batteries Included, But They're Leaking

brettcannon · September 16, 2021, 8:36pm

What to do with the stdlib is a separate discussion. In this discussion assume a stdlib exists and so we are talking about how to manage what goes in/out, not what sorts of modules should be in the stdlib (that’s going to be a separate discussion I start after this one as this discussion will influence what to do with any modules that may get slated for removal based on that future discussion).

steve.dower · September 16, 2021, 8:48pm

Yes. No question here. Ideally it would be (co-)authored by the initial (core developer) maintainer, but at least it should specify “codeowners” (or your choice of term here).

No. No more provisional modules at all. If it’s not ready, or we’re not sure, host it on PyPI until we are sure.

If the module requires some special interpreter hooks, I don’t have a problem with adding provisional private APIs specifically for one that we’re considering for inclusion (e.g. subinterpreters) or having PyPI-hosted packages that are tied to specific CPython versions. Just because it’s on PyPI doesn’t mean it has to be “pure”.

The biggest problem with provisional modules is that libraries which build on top of them now have to account for very messy changes over time. Just ask anyone who tried to maintain an async library over 3.4-3.7.

And if it turns out the module is doing just fine on PyPI without being in the stdlib, great! We don’t have to merge it in.

N/A.

No, same reasoning as Brett.

graingert · September 16, 2021, 8:59pm

“slated” is an even stronger word! Promotion to PyPI is an honour and privilege!

tim.one · September 16, 2021, 9:10pm

IMO, adding a new module is one of the most minor things that can be done. It’s not like strings for naming new modules are a scarce resource.

For example, it’s long overdue for adding an imath module, to stop the ever-bloating conceptual confusion of adding integer functions to the math module, which was intended to be a relatively thin Python layer over the platform C’s libm facilities on native C doubles. Things like math.comb() and math.factorial() just don’t belong in math. imath would be an obvious home for them. For related “reasons”, math.gcd() was originally added to the fractions module instead - where it also didn’t belong. Another that would be perfectly natural in imath. Instead gcd() was first added to fractions, then later also added to math, and then “deprecated” in fractions, and finally removed from fractions. That’s the kind of silly thrashing you get when putting a new facility in an existing module where it never belonged to begin with.
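A small illustration of the split described above, using only functions that exist in `math` today (`math.comb` needs Python 3.8+). The `results` dictionary is just a container for this sketch.

```python
import math

# Integer-in, integer-out functions that currently live in math.
results = {
    "comb": math.comb(5, 2),
    "factorial": math.factorial(5),
    "gcd": math.gcd(12, 18),
}
all_ints = all(isinstance(value, int) for value in results.values())

# By contrast, the libm-style functions work on C doubles and return floats.
root = math.sqrt(16)
```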

I suppose the good news is that fractions itself wasn’t stuffed into math to avoid “adding a new module”.

I didn’t think more than twice about approving adding graphlib. The sole class it provides now - TopologicalSorter - is a best-of-breed implementation to address one of the FAQiest of FAQs among people looking for basic algorithmic building blocks. It was originally stuffed into the collections module, to avoid kneejerk objections to “adding a new module”. But it didn’t make a lick of sense to put it there, no more than it would, say, to add matrix multiplication to collections. It didn’t belong in any existing module. So we made up a new module name.
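For readers who have not used it, a short sketch of `TopologicalSorter` (Python 3.9+). The task names in the graph are made up for the example; each key maps to the set of nodes that must come before it.

```python
from graphlib import TopologicalSorter

# Hypothetical build pipeline: node -> set of predecessors.
graph = {
    "compile": {"fetch"},
    "build": {"compile"},
    "test": {"build"},
}

# static_order() yields nodes so that every predecessor comes first.
order = list(TopologicalSorter(graph).static_order())
# This graph is a simple chain, so the only valid order is:
# ['fetch', 'compile', 'build', 'test']
```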

What are the potential downsides to adding a new module? “Slim to none” that I can see. As is, the new graphlib is a file of Python code that doesn’t cost anyone anything except:

a smidgen of disk space, unless they go out of their way to import it
a line in the stdlib doc’s table of contents
“head arguments” trying to cast its existence as a big deal instead of the triviality it is

smontanaro · September 16, 2021, 9:14pm

“Promotion to PyPI” might help soften the blow for those people whose instincts are to not remove anything from the stdlib (“it will break my app” or “it will make installation harder for my users”). This harks back to “batteries included,” which is a bit less important now that PyPI and associated packaging tools exist.

In some sense, promoting a module to PyPI would be a form of eating your own dog food.

Sorry, enough walking down the Python cliche memory lane.

brettcannon · September 16, 2021, 9:16pm

Maintenance. Every line we keep has some cost. Add on people requesting new features, etc. and even something that is seemingly bug-free still has overhead.

brettcannon · September 16, 2021, 9:18pm

Sure, but who is going to keep that package functioning? If we had this policy in the Python 2 days would we be expected to update all of those removed modules for Python 3 all the way back to the beginning of Python? What about changes to packaging (e.g. PEP 621)?

pf_moore · September 16, 2021, 9:22pm

Those are downsides of adding new functionality, not of adding a new module.

On the other hand, adding a new module does have one downside - the name will shadow any 3rd party module of that name (because the stdlib takes priority over site-packages).
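The shadowing comes from search order: whichever same-named module sits earlier on `sys.path` wins. A self-contained sketch of that mechanism, using two temporary directories as stand-ins; the directory names, the module name `shadow_demo_mod`, and the helper `which_copy_wins` are all hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def which_copy_wins(module_name="shadow_demo_mod"):
    """Create two same-named modules and report which one gets imported."""
    with tempfile.TemporaryDirectory() as tmp:
        first = Path(tmp) / "stands_in_for_stdlib"
        second = Path(tmp) / "stands_in_for_site_packages"
        for directory in (first, second):
            directory.mkdir()
            source = f"ORIGIN = {directory.name!r}\n"
            (directory / f"{module_name}.py").write_text(source)

        # Earlier sys.path entries are searched first.
        sys.path[:0] = [str(first), str(second)]
        try:
            importlib.invalidate_caches()
            module = importlib.import_module(module_name)
            return module.ORIGIN
        finally:
            # Undo the changes so the demo leaves no trace.
            sys.path.remove(str(first))
            sys.path.remove(str(second))
            sys.modules.pop(module_name, None)
```

Calling `which_copy_wins()` returns the name of the first directory, so the copy in the later directory is never reached.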

I agree with Tim’s point that adding new functionality to the “wrong place” in an effort to avoid adding a new module to the stdlib is potentially more disruptive in the long run than just adding a suitable module to house that functionality. So I don’t think we should add modules lightly, but conversely I don’t think we should make “adding a module” the key factor in whether new functionality needs a PEP.

tim.one · September 16, 2021, 9:28pm

But that has nothing specific to do with adding a new module - there are potential costs for any addition of any kind whatsoever. My claim was “adding a new module is one of the most minor things that can be done”. Are you going to, e.g., also suggest that a PEP be required for every new function added to an existing module? If not, then I’m lost as to why “adding a new module” is a bigger deal than “adding a new function”. I just don’t see any sense in which it is.

I already gave concrete examples of cases where not adding a new module for new functions created other kinds of problems (like the conceptual mess math has become, and the multi-release-cycle pointless dance the gcd() function went through).

BTW, I saw a post today suggesting we generalize math.factorial() to accept float arguments. Why? Well, it’s the math module, and int->int functions just don’t make conceptual sense there. If it had been in an imath module instead, I doubt we’d have seen that random suggestion. So putting a function in a module it doesn’t belong in also incurs its own kinds of costs.

tim.one · September 16, 2021, 10:13pm

Thanks for pointing that out! That is a real cost, and more serious than that adding a new function/class/name to an existing module M can create nasty surprises in the presence of from M import *.
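The `from M import *` surprise can be sketched as follows. The module `m_demo` is a hypothetical stand-in for M built in memory, and `star_import_clobbers` is a helper invented for this example; it runs a tiny "user script" that defines a name and then does a star import.

```python
import sys
import types


def star_import_clobbers(module_names, local_name="comb"):
    """Return what local_name is bound to after `from m_demo import *`."""
    fake = types.ModuleType("m_demo")  # hypothetical stand-in for module M
    for name in module_names:
        setattr(fake, name, f"m_demo.{name}")
    sys.modules["m_demo"] = fake
    namespace = {}
    try:
        user_script = (
            f"{local_name} = 'defined by the user'\n"
            "from m_demo import *\n"
        )
        exec(user_script, namespace)
    finally:
        del sys.modules["m_demo"]
    return namespace[local_name]


# Before M gains a `comb` name, the user's own binding survives.
before = star_import_clobbers(["factorial"])
# After M adds `comb`, the star import silently replaces it.
after = star_import_clobbers(["factorial", "comb"])
```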

brettcannon · September 16, 2021, 11:32pm

You are both right, and I was making too broad of an assumption that “new module” == “new functionality”.

Yes, I think the stdlib is a very important namespace at the top-level.

There is also the possibility of saying, “any new module needs SC approval” due to the namespace/shadowing concern, but the requirement of a PEP is more aligned with the size of functionality being added and thus isn’t necessarily needed (e.g. graphlib).

tim.one · September 16, 2021, 11:33pm

My mistake! It was originally stuffed into the functools module - which, if anything, was even more senseless than stuffing it into collections.

tim.one · September 16, 2021, 11:59pm

I certainly don’t object to reviews. In the case of graphlib, despite that it contains only one relatively small class, the name alone all but solicits a universe of additional graph functions and classes.

Which was fine by me! Graphs are ubiquitous in non-numeric programming, and Python’s stdlib has somehow gotten away with ignoring that area of CompSci.

But I’m overwhelmingly pragmatic about such things, and would restrict additions to the clearest and most universally applicable graph algorithms (like topsort). No interest at all, on my side, in competing with, e.g., NetworkX’s expansive functionality.

“Batteries included - but not nuclear reactors”.

That’s open to debate, though. So probably should have been debated.

uranusjr · September 17, 2021, 7:44am

On the provisional topic, I think it’s been mentioned before, but perhaps an intermediate state of “not in stdlib but installed by default” (a la setuptools and pip) is still useful. This means putting the project on PyPI and generalising ensurepip so it can populate other wheels out of the box. Those projects can still have a different release schedule and can be upgraded by users separately from the normal Python core backward compatibility policies, hence provisional. This also kind of ties back to the “what to do with stdlib” thing.

graingert · September 17, 2021, 7:57am

This means putting the project on PyPI and generalising ensurepip so it can populate other wheels out of the box. Those projects can still have a different release schedule and can be upgraded by users separately from the normal Python core backward compatibility policies, hence provisional.

It’s actually (although rarely) quite handy to have a bunch of deps shipped with Python, and technically this is already the case. On a few occasions I’ve used:

>>> from pip._vendor import requests
>>> requests.get("https://example.com")
<Response [200]>

because I couldn’t get pip install requests approved

tiran · September 17, 2021, 8:46am

+1

I’d like to amend the removed modules policy: In case a well-established group of trustworthy people (say Jazzband) volunteers to take over maintenance of a removed stdlib module, then we might officially endorse the fork and hand the PyPI package name to the group.