How do we want to manage additions/removals to the stdlib?

That’s an underscore-prefixed name for indicating that it’s not a supported thing. :slight_smile:

It is not a good idea to do this, nor is it an example of how to do this “properly”. The fact that pip uses a vendored requests is an implementation detail that folks should NOT rely upon.

In other words, don’t do that.
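To make the discouraged pattern concrete, here is a rough sketch (assuming an environment where pip is present and requests has been installed separately; the underscore-prefixed path is pip’s private vendored copy):

```python
# Discouraged: reaching into pip's private, underscore-prefixed vendored copy.
# It is an implementation detail and can move, change, or disappear at any time.
from pip._vendor import requests as vendored_requests  # don't do this

# Supported: declare requests as your own dependency ("pip install requests")
# and import it directly.
import requests

print(requests.__version__)
```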

(Yes, I know this wasn’t what @graingert is directly suggesting but it seems worth calling out explicitly)

4 Likes

Just a note, setuptools is not in this category. ensurepip is only intended to preinstall pip. The fact that it needs (needed) setuptools is just as much an implementation detail as pip’s vendored libraries. In fact, haven’t we switched pip to use PEP 517 (isolated) builds by default these days? Maybe we should bite the bullet and finally stop preinstalling setuptools, forcing users who still need to use the legacy path to install setuptools manually?

I’m not keen on extending ensurepip as you suggest, because (as the comment by @graingert implies) it’ll very quickly start triggering suggestions that we bundle even more libraries, and the cost of shutting down such requests will outweigh any benefit from adding an intermediate “installed by default” stage to the existing “develop library on PyPI, propose for inclusion in the stdlib” route.

2 Likes

Not yet, no.

Hmm… I disagree. While I do agree there would be more requests for adding packages to the “installed by default” set, a large part of this can be mitigated by having clear rules and process around it. I’m personally fine with a high bar for this, same as what we have for what goes into the standard library.

IMO, having intermediate stages that ease migrations and reduce the effort needed to complete each individual step would be vastly more beneficial. It could also make managing provisional modules easier, although at the cost of treating them like regular third-party packages, with all the nuances that entails.

It is also worth noting that we can use this mechanism the other way – to remove packages from the current standard library, if we want to trim what is part of it. That also feeds in well with the idea of putting them in a graveyard or transferring them to groups that are interested in actively maintaining those packages themselves.

The thing that makes me a bit nervous, though, is what downstream redistributors will do. We’re not in a good situation on that front today, with the variety of ways that redistributors modify (and break) Python’s existing standard library and ensurepip to comply with their policies – and I don’t trust that they’d end up in a good place if we go down this road unless we work closely with downstream distributors to nudge them to not provide a degraded experience to users.

Well… at least the fact there was no PEP discussed made me entirely unaware of the existence of this new module. :wink:

(I’m entirely fine with it, by the way. No discussion there.)

3 Likes

Rats, I’m losing track of what state the various in-progress pieces of work are at. The ongoing “resolver is too slow” traffic is distracting me…

Regarding your other point, OK, I may be being pessimistic. Your point about redistributors is relevant, too. And more generally, the whole question of “what constitutes a Python installation” becomes blurred if people can upgrade, uninstall, or generally mess with packages that other people consider to be “standard” (we already see this with cases where vendors split Python up into separately-installable pieces, such as venv).

Generally though, I think that details aside, we’re both broadly uncomfortable with changing ensurepip to ensurelibs and making it a general mechanism. Is that fair?

2 Likes

Yeah, it’s certainly not good if this triggers suggestions to bundle all the things, but that’s pretty much what is already sort of going on with the stdlib anyway. What I’m thinking is that, instead of adding a new layer of criteria below the existing “this goes into the stdlib”, this installed-by-default status should mean “this will be in the stdlib at some point, but we’re allowing it to evolve more quickly”, replacing the provisional status we had.

1 Like

I have a few thoughts on the matter. I think a “kernel Python” isn’t particularly feasible at this point:

  • python3.9 -IS imports 21 modules on startup, and if you add site.py so that you can actually import your pip-installed modules, that grows to 59 (see the sketch after this list);
  • a lot of the standard library is interdependent; keeping the entire body compatible while distributing it externally on PyPI would mean versioning all the little libraries and recording compatibility: be too restrictive and people complain about version conflicts; be too lax and you get subtle bugs;
  • the existence of a rich standard library provides a form of builtin testing for the core interpreter; trimming the library without embracing “testing main with PyPI packages” will decrease our release quality;
  • “Promoting a library to PyPI” sounds to me like calling being fired “being promoted to a customer”; I don’t think it’s particularly useful to social-engineer core developers into being more open to removals with such clever language. Removals break compatibility. We should be responsible about that. The library being a pip download away still requires manual action from end developers.
  • “Removing from the standard library” definitely is not the same as “moving to PyPI”. You can’t count on the existing core Python team to perform this move, continue maintenance, keep CI for each of those projects up, maintain a compatibility matrix, set up an equivalent of the PSRT for each library, elect new release managers for each one, and so on, and so on. It is simply not feasible.
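As a rough illustration of the first bullet, something like this reproduces the counts (a minimal sketch; exact numbers vary by Python version and platform):

```python
# Count how many modules are in sys.modules right after interpreter startup,
# with and without site processing (-S disables site.py).
import subprocess
import sys

for flags in (["-I", "-S"], ["-I"]):
    out = subprocess.run(
        [sys.executable, *flags, "-c", "import sys; print(len(sys.modules))"],
        capture_output=True,
        text=True,
    ).stdout.strip()
    print(flags, out)  # roughly 21 with -S, roughly 59 once site.py runs
```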

MicroPython proved that it’s perfectly feasible to call some runtime a Python even if it’s not a verbatim copy of CPython. I believe this “kernel Python” idea is interesting and should be pursued for many reasons (cloud computing, embedding for mobile and/or frontend Web, change velocity, and so on…). But I strongly believe this is not CPython’s job.


Moving back to Brett’s original questions, I have a lot of chaotic thoughts on the matter so let me just state my intuitions at the moment:

How to handle removals?

Long-term I think we should stop removing things, period. Deprecations should be deprecations, but they should no longer lead to removals, ever. This is because Python, as an interpreted language, provides a runtime platform for user code, and that code is a dependency tree where usage of deprecated features often happens pretty far from the end user. Currently, each Python release that removes deprecated features is a micro “Python 2-to-3 transition”, and library authors have to keep this entire state in their heads and/or CI. And end users gotta hope that the library authors do a perfect job here.

Note that backwards incompatibilities will always be the bread and butter of coding in Python because often even bug fixes to core require changes to user code. But what I’m talking about is decreasing the surface of future incompatibilities.

The current policy described in PEP 387 allows us to mark a feature as deprecated and remove it two releases later. Traditionally we did that with the “PendingDeprecationWarning” → “DeprecationWarning” → removal dance. PEP 594 has a good list of candidates for removal. I think we should look into that PEP and in 3.11 remove the libraries that were already deprecated in past versions. 3.11 is already a micro-Python 4 in more ways than one.
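For readers unfamiliar with that dance, a minimal sketch of what it can look like in code (the function names and exact timeline here are illustrative only):

```python
import warnings

def old_function():
    # Release N: hidden by default, mainly visible to library authors and test runners.
    warnings.warn(
        "old_function() is pending deprecation; use new_function() instead",
        PendingDeprecationWarning,
        stacklevel=2,
    )

def old_function_next_release():
    # Release N+1: DeprecationWarning, shown by default when triggered from __main__.
    warnings.warn(
        "old_function() is deprecated; use new_function() instead",
        DeprecationWarning,
        stacklevel=2,
    )

# Release N+2 (two releases later, per PEP 387): the function is removed entirely.
```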

And then stop removing stuff. “Stop breaking userspace”.

How to handle additions?

Don’t do it, as much as you can. Require PEPs for new modules, require BPO issues with discussion, as well as peer-approved PRs for function/class additions.

I mean, it would be convenient for a random Python user if every Python version shipped with NumPy, or Pillow, or requests, and so on. But we know very well it’s not something we want to do. Should we then rewrite Pillow and put a cleanroom implementation in 3.11? Also no.

7 Likes

I’m actually in favor of something like that!

FWIW, this exact approach (changing standard library stuff to install-by-default packages) and nuances around it were extensively discussed about a year ago, in:

Notably, Python wouldn’t be the first language to do something like this.

I think there are some clear benefits to doing this – which all come down to the fact that we get a much more gradual removal story.

Maybe. To me, what we’re doing isn’t blurring the lines of what “Python installation” means, but changing what it means – though, that’s probably wordsmithing. :slight_smile:

Require a PEP for new modules?

Definitely yes. A new standard library module is a big commitment; it has one of the highest potentials for growing more and more complicated with each release, unlike a new class or function, which has more definite boundaries.

Generally, a simple discussion on bugs.python.org with even one (but more commonly a few) core developers is enough to add a new public API to an existing module. A new library in the stdlib should draw its goals clearly and come with something very close to its final state. This would:

a) prevent people from adding something at every chance to ‘expand the module further’ (which might help them in the short term, but would be a burden for us for a very long time);
b) provide a unified API across versions, so downstream users don’t have to worry about whether a method is available in version X or not.

Should all new modules be marked as provisional?

Even though it reduces the burden for us, it makes usage of certain modules very tricky for end users (e.g. importlib.metadata’s entry point selection) and introduces conditional code paths just to use simple APIs. As @tiran commented, it might be wiser to just let modules mature on PyPI until they reach a certain level of stability, which would also help with the first question by letting us know the full scope of the library up front.
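To illustrate the kind of conditional code path this creates for downstream users, here is a hedged sketch around importlib.metadata’s entry point selection (details simplified; version boundaries as I understand them):

```python
import sys
from importlib.metadata import entry_points

if sys.version_info >= (3, 10):
    # 3.10+ grew a "selectable" API for filtering entry points by group.
    eps = entry_points(group="console_scripts")
else:
    # 3.8/3.9 returned a plain mapping of group name -> entry points.
    eps = entry_points().get("console_scripts", [])

for ep in eps:
    print(ep.name, ep.value)
```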

Anything else to do for removed modules?

No, the core team should not be responsible for those. If people want to maintain one, they can simply create a PyPI package named <orgname>-<module> and do so. I am not really sure if we should endorse a certain package over others.

Even in major version bumps?

So basically the stdlib eventually becomes a graveyard of unmaintained stuff? Does that mean our distribution size could never shrink, but only grow due to the fact that there’s no way the stdlib won’t get additions going forward (unless we gut it)? What about if they break due to some bug fix or change in some other stdlib module? Do we have to still fix them to keep them working, which means they aren’t really deprecated/dead, just not getting anything new?

So what I’m taking from this is: freeze the stdlib, since anything added can never be taken away.

2 Likes

Honestly, I think yes, if we can help it. Sure, some things we will have to remove because it will be impossible or insecure to keep them running.

We’re not particularly bad about this even today, so it wouldn’t be a huge change in practice, but it would be a big change in optics: putting a higher priority on being able to run old Python programs on newer and newer Pythons.

I mean, I can still run ant build and run my first Java programs from 20 years ago on my current macOS with no changes. And there’s OpenGL there, and music, and GUI, and so on. I can’t run my first Python programs the same way anymore.

“Graveyard” sounds kind of dramatic for what it would actually be: a base of things that still work the same way in 2039 as they did in 2022.

Shrinking the distribution shouldn’t be a goal in and of itself. If stuff breaks beyond all repair or is considered insecure enough that we need to remove it or alter it in incompatible ways, we do it. That’s the way it works today as well. But removing a thing just because it’s no longer the recommended way – that we could stop doing.

7 Likes

Steve Dower said:

“”"
No more provisional modules at all. If it’s not ready, or we’re
not sure, host it on PyPI until we are sure.
“”"

and then:

“”"
The biggest problem with provisional modules is that libraries which
build on top of them now have to account for very messy changes over
time. Just ask anyone who tried to maintain an async library over
3.4-3.7.
“”"

And moving the unstable library to PyPI solves that problem, does it?

Can you explain why you felt it was necessary to rely on a provisional
library with an unstable API? (I’m not saying you shouldn’t have, I just
don’t understand your reasoning.)

If that library was on PyPI, would you have made that same choice, and
if so, how would your experience have been different?

“And if it turns out the module is doing just fine on PyPI without being
in the stdlib, great! We don’t have to merge it in.”

What’s the objective criteria for “doing just fine” on PyPI?

If the intention is to use that library in the stdlib, does that mean
that externally maintained third-party libraries will become a hard
dependency for the stdlib?

1 Like

Pradyun Gedam discussed the Ruby strategy:

“”"

  1. Classic stdlib: code that ships as “part of” the interpreter, no package metadata

  2. What they call “default gems”: treated like 3rd party packages that are installed by default – they have metadata, versions, can be upgraded – BUT there’s a flag set saying that you can’t remove them, and if you try to pip uninstall then it just errors out.

  3. What they call “bundled gems”: ditto, but without the magic flag, so you can uninstall them if you want.

  4. Actual 3rd party packages you get from their version of PyPI

“”"

So if I understand correctly, Ruby has a model where there are:

  • modules that come packaged with the interpreter;

  • modules that come packaged with the interpreter, but they pretend it is just like a third-party package, except that you can’t opt-out of installing them, or remove it; the metadata is just dead weight;

  • modules that come packaged with the interpreter, and are pre-installed, but you can uninstall them if for some reason you want to;

  • and actual third-party modules.

Is this correct?

And does that increased complexity reduce the level of maintenance for the Ruby core devs, or improve the out of the box experience for Ruby users?

If a core library in category 1 depends on another core library in category 3, and the user removes the dependency, what happens?

Pradyun Gedam said:

"I think there are some clear benefits to doing this – which all come

down to the fact that we get a much more gradual removal story."

The benefits are not clear to me.

In what way is this a “much” more gradual removal? Please spell it out.

Perhaps you can walk us through a user-story. I’m a user of the trunnion library, and it gets moved from category 1 to 2 to 3 and then removed. What do I do at each step? Let’s assume that, like 90% of users of the stdlib, I don’t read (carefully, or at all) the What’s New that comes out each time I upgrade.

If I understand correctly, moving from category 1 to 2 doesn’t affect users at all. The library is still installed. And likewise from 2 to 3. It would make no difference to me at all, unless I explicitly go looking; I wouldn’t even know what category the library is in. It is still going to come as a nasty shock when it gets removed and is suddenly not installed any more, breaking my code.

1 Like

FYI @steven.daprano: your posts are not properly formatted for presentation on Discourse. They get rendered as Markdown blocks, and the way you’ve quoted folks and wrapped your comments makes them very difficult to read and respond to on the main Discourse web UI.

I’d really appreciate it if you could please investigate improving the formatting of your responses and, if you feel comfortable using the web UI, edit them to fix the formatting.

(I’ll respond to your comments separately)

8 Likes

I’m not sure putting a module in PyPI (along with some corresponding GitHub or GitLab repo) really needs an official maintainer. It might not even need to continue to function. It serves, if nothing else, as an easy-to-find copy of the source. Maybe PyPI could grow an “unmaintained” user that owns such stuff. That would at least identify it as available to be picked up by anyone interested.

2 Likes

Yes, because the library can then evolve independently of CPython’s release cycle, and downstream packages can depend on it while also setting compatibility constraints (like you can with any Python package).

Are provisional modules to be treated as “never rely on this” by users? If so, then why do they deserve the privileged position of being in the standard library, if it’s not usable / reliable?

The rhetorical-sounding-but-actually-very-serious questions aside – Provisional modules have been fairly fundamental to Python’s approach for introducing “new” features lately. They provide functionality or benefits that aren’t possible without using them (that’s kinda the reason we built them after all!), so there’s usually a good reason to use them. Off the top of my head, typing and asyncio are both examples of this. You can write working code without them, but they’re tackling a problem that’s been deemed worth tackling. And, by being in the standard library + CPython, they serve as a “blessed solution” for the problem in the Python ecosystem. All related projects in the ecosystem work toward building upon or being compatible with that solution. This means they have to adopt that solution, even while the module providing that solution is provisional.

Most library authors maintain compatibility across various versions of an evolving module that they cannot control, because that’s actionable on their end. Pushing back on CPython’s policy for what goes into the standard library – and the promises the core developers make around it – is not something most users will do, even if it directly affects them.

(aside) We do have a mechanism for solving these sorts of compatibility issues in the ecosystem – by declaring compatibility constraints for what you depend on, and using the packaging tooling to handle those constraints.

The broad strokes, yes – because it’s mostly rephrasing what I’d quoted from @njs, but you’ve got some of the important nuances wrong IMO.

  • By definition, it is not a third-party package – it’s a first-party package provided by the same folks as your interpreter. Yes, I know you said “like”, because it has metadata to be more visible to the packaging tooling – but I want to note this explicitly.
  • No, the metadata is also not dead weight – downstream projects can constrain against it, which means they can properly declare their requirements/assumptions about which variants of that package they can work with. You don’t need to declare “needs Python >= 3.7.6”; you can use something like “needs dataclasses > X.Y” instead. This is useful as the module migrates to the lower stages, when that package can change versions. It also means that failures don’t happen at runtime but when you’re installing packages – earlier in the process, where they’re easier to “catch”.
  • The exercise of packaging up the standard library module also makes it much easier to migrate it out of the standard library, and keeps it usable via PyPI once it is outside. A third-party package can say it depends on stdlib_module>1.7 and, if it’s in the standard library today, it’ll work. Once stdlib_module moves to lower stages (3 / 4) that third-party package will still work, and once in stage 4, the labor of maintaining that package becomes external to the CPython core devs (see the sketch after this list).
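A hedged sketch of what that dependency declaration could look like, using setuptools (the stdlib_module name is hypothetical, mirroring the example above):

```python
# setup.py for a hypothetical third-party package. Instead of pinning the whole
# interpreter ("python_requires='>=3.7.6'"), it constrains the packaged module
# directly, so the same constraint keeps working as the module moves through
# stages 2-4 and eventually lives only on PyPI.
from setuptools import setup

setup(
    name="example-package",      # hypothetical package name
    version="1.0",
    install_requires=[
        "stdlib_module>1.7",     # hypothetical module-as-package, per the post
    ],
)
```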

See the last bullet above.

With the current model of provisional modules, honestly, it is a PITA for package maintainers – you have to maintain compatibility with an evolving (standard library) module that makes no promises on backwards compatibility, and yet your users will expect things to work well. Yes, there’s an expectation management thing here regarding support of provisional modules throughout the ecosystem. That said, for “foundational” bits of the ecosystem maintained outside of the standard library – like pip, urllib3, numpy, etc. – they usually don’t have much choice in this area and have to be as broadly compatible as they can reasonably be.

The package maintainers don’t directly control the CPython interpreter that the code is run on, and currently, they don’t have any “clear” mechanism to declare what “version” of the standard library module they’re compatible with – beyond discovering weird breakages when the standard library module is modified in a patch version (usually because a user reports it to them – or because they proactively look for how things change) and restricting it with Requires-Python.

You’re correct about most of this, but you’re also looking at the wrong user persona. :slight_smile:

You’re looking at the user who installs the Python interpreter and runs code on it – I’mma call them the “end-user”. The fact of the matter is that almost no end-users just use the Python interpreter on its own – they also use the ecosystem of third-party packages built on it. The audience that benefits the most from having this staged removal of standard library modules is NOT the end-users. Rather, it’s the authors of those third-party packages. I gave an example of this earlier in this post.


@ambv: Would it be fair to say that you want to never break currently-working Python code, at the cost of keeping code in the standard library that no one has the intention of triaging issue tracker tickets for, reviewing PRs for, or generally doing any sort of maintenance on? And that you feel this additional maintenance cost is worth the stability benefits afforded by it to Python developers?

Most Python projects tend to depend a lot more on third-party libraries that are not maintained by CPython developers. There’s a very decent chance that currently-working Python code stops working because a third-party library no longer supports it. Does that not dilute the value of any such stability guarantees that CPython would provide? If you feel that it’s still worthwhile despite that, could you elaborate a bit more on that?

3 Likes

PyPI already has that via Search results · PyPI, but I bet no one pays attention to it.

Because some people don’t notice the fact that a module is provisional, while others feel like a module has been provisional too long or has become critical for certain things. See Submission for SC consideration: do all standard library modules abide by standard library policies · Issue #68 · python/steering-council · GitHub as an example where expectations didn’t align between folks for a provisional module.

3 Likes

asyncio has keywords that are fundamentally linked to it. How would a developer make use of those keywords without installing additional packages if asyncio were removed?

1 Like

With regard to PEP 387: might we provide a specific discussion thread for users to comment in for each deprecation warning introduced?

I have noticed a very nonchalant attitude about removal of stdlib packages lately in Discussions. As was pointed out, most users don’t read the “What’s New” section of each release. I would go so far as to say most users don’t understand the open source development process Python goes through. I would also say that most don’t understand why things get deprecated.

Many of them would probably jump at the opportunity to give feedback on their usage and why they don’t want something removed, but I would imagine few of them know where to begin to give that feedback.

These same users (or their organization) may also be in a position to direct funding towards CPython maintenance. If the Python Software Foundation made it more straightforward for users to provide feedback and made it easy for them to provide funding (some kind of issue bounty system?), then end users could avoid potentially expensive efforts within their organization to migrate away from deprecated functionality.

1 Like

I purposefully don’t want to get into a discussion about what should be removed from the stdlib or how much should be there as that’s not the point of this topic. I would prefer to stay focused on the mechanisms used when a module is proposed to be added or removed.

So is your suggestion then to require a public discussion somewhere for any module proposed for deprecation?

3 Likes