Sometimes people suggest splitting out parts of the current Python stdlib into independent packages. This could have a number of technical advantages (e.g. being able to ship fixes quicker, creating a supported way for users to slim down their environments by removing unused packages… this is important for containers and mobile apps). But it could also be disruptive for users. So it’s controversial.
For example: imagine that `multiprocessing` gets split out into a standalone package. Whenever a new version of Python is released, we bundle in the latest `multiprocessing` wheel, and install it by default in the environment. But after that it’s treated like any other third-party package, and can be upgraded, downgraded, or even removed, independently of the interpreter itself. (This is exactly how pip works right now, via the `ensurepip` mechanism. There’s also prior art in Ruby.)
This is an extremely hypothetical discussion. Currently there are absolutely no plans to move `multiprocessing` out of the stdlib. But, if we did, how could we mitigate the problems this would cause for users?
I see two big ways where this might be disruptive and packaging could help:
Problem 1: if there’s code that assumes `multiprocessing` is always available, then it will mostly work (because `multiprocessing` is installed by default), but it will break when run in a “slimmed down” environment. For interactive use, I think this isn’t a big deal – if someone does `pip uninstall multiprocessing` and then tries to `import multiprocessing` and gets an error, they can just `pip install multiprocessing`. But if someone is trying to automatically generate a slimmed-down container or redistributable app, then they need some automated way to figure out which packages to include and which to leave out. Normally we do this via package requires metadata. So, we need some way to move from a world where no one declares their dependencies on `multiprocessing`, to a world where packages that use `multiprocessing` declare that in their metadata.
The good news is this doesn’t have to be perfect – if 99% of packages have the right metadata, then the rest can be fixed by hand based on user feedback. But if 99% of packages have the wrong metadata, then it’ll be a mess.
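To make the target state concrete: in that hypothetical world, depending on `multiprocessing` would look like depending on any other package. A sketch of what the declaration might look like in PEP 621-style `pyproject.toml` metadata (the package name and version here are made up for illustration):

```toml
[project]
name = "somepkg"
version = "1.0"
# Hypothetical: once multiprocessing ships as an independent wheel,
# packages that use it would declare it like any other dependency,
# and slimming tools could rely on this metadata.
dependencies = ["multiprocessing"]
```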
`setuptools` can probably detect with 99% accuracy whether a package uses `multiprocessing`, just by scanning all the `.py` files it’s bundling up to see if any say `import multiprocessing`. So maybe `setuptools` would start doing that by default? Plus there’d be some way to explicitly disable the checking for devs who want to have full manual control over their dependencies, maybe a shared library to implement the checking so flit and setuptools can share the code, …
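As a rough sketch of what that shared checking library could look like (the function names are mine, not a real setuptools or flit API), the stdlib `ast` module already gets you most of the way:

```python
import ast
from pathlib import Path

def imported_modules(source: str) -> set[str]:
    """Return the top-level names of all modules imported by some Python source."""
    mods = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                mods.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports (level > 0); they refer to the package itself.
            if node.module and node.level == 0:
                mods.add(node.module.split(".")[0])
    return mods

def uses_module(package_dir: str, module: str) -> bool:
    """Scan every .py file under package_dir for imports of `module`."""
    return any(
        module in imported_modules(p.read_text())
        for p in Path(package_dir).rglob("*.py")
    )
```

This misses dynamic imports (`importlib.import_module`, `__import__` with computed strings), which is part of why the result would only ever be ~99% accurate and an explicit override would still be needed.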
I think that would get us pretty far. It takes care of future uploads, and existing sdists on PyPI (assuming you use a new setuptools when you build them). The main flaw I see is that it doesn’t help with existing wheels on PyPI. Maybe PyPI could also do a scan of existing wheels (using that shared library mentioned above), to add in extra requirements for formerly-stdlib packages? Or maybe pip should do that when downloading pre-built wheels? I guess this would argue for having a bit of metadata in the wheel to say explicitly “The creator of this wheel was aware that `multiprocessing` is no longer part of the stdlib, and took that into account when creating its `requires` metadata”, so that pip/PyPI would know whether to apply a heuristic or not.
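Since wheel `METADATA` is a set of RFC 822-style key/value fields, a new field would slot in easily. Here `Stdlib-Split-Aware` is a made-up field name purely for illustration; the sketch shows how pip/PyPI might decide whether to apply the heuristic:

```python
from email.parser import Parser

# Hypothetical METADATA for a wheel whose author built it with a
# split-aware setuptools. "Stdlib-Split-Aware" is an invented field name.
raw = """\
Metadata-Version: 2.1
Name: somepkg
Version: 1.0
Requires-Dist: multiprocessing
Stdlib-Split-Aware: multiprocessing
"""

metadata = Parser().parsestr(raw)

def needs_heuristic(md, module: str) -> bool:
    """Apply the import-scanning heuristic only when the wheel does NOT
    declare awareness of the stdlib split for `module`."""
    aware = md.get_all("Stdlib-Split-Aware") or []
    return module not in aware
```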
Problem 2: Right now, removing things from the stdlib is complicated and super controversial (see the discussions about PEP 594). Some of this is inherent in the problem, but it would help if we had a smoother path for doing this. Having the infrastructure to bundle wheels with Python would already help with this. For example, if the Python core devs wanted to stop maintaining `nntplib`, they could follow a multistep process:
- Move `nntplib` from a stdlib library → bundled wheel, so it’s still available by default but can be upgraded/removed independently
- Use the tools described above to fix up metadata in any packages that require `nntplib`
- Finally, flip the switch so that it’s no longer installed by default, and users have to either declare a dependency or manually `pip install nntplib`
In a perfect world, I think there would be a step 2.5, where we start issuing deprecation warnings for anyone who imports `nntplib` but hasn’t declared a dependency. (Since these are the cases that will break in step 3.) This suggests we might want some metadata in `nntplib.dist-info` to track whether it’s only installed because it’s bundled by default, or whether it’s been pulled in by a manual install or other dependencies? And then when `nntplib` is imported, issue a `DeprecationWarning` iff it’s only installed because of being bundled.
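A minimal sketch of how that import-time check might work. Everything here is an assumption: I’m inventing a `BUNDLED` marker file that installers would write into the dist-info directory only for default-bundled installs, and delete on any explicit install or when another package pulls it in as a dependency:

```python
import warnings
from pathlib import Path

def warn_if_only_bundled(dist_name: str, site_packages: Path) -> None:
    """Run by a formerly-stdlib module at import time.

    The BUNDLED marker file is hypothetical: present iff the package is
    installed only because it ships with Python by default.
    """
    for dist_info in site_packages.glob(f"{dist_name}-*.dist-info"):
        if (dist_info / "BUNDLED").exists():
            warnings.warn(
                f"you imported {dist_name} without declaring a dependency on "
                f"it; this will break once it is no longer installed by default",
                DeprecationWarning,
                stacklevel=3,
            )
```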
If anyone else is willing to join me in this thought experiment: what other problems do you see? And what do you think of the solutions sketched above?