Sometimes people suggest splitting out parts of the current Python stdlib into independent packages. This could have a number of technical advantages (e.g. being able to ship fixes quicker, creating a supported way for users to slim down their environments by removing unused packages… this is important for containers and mobile apps). But it could also be disruptive for users. So it’s controversial.
For example: imagine that `multiprocessing` gets split out into a standalone package. Whenever a new version of Python is released, we bundle in the latest `multiprocessing` wheel, and install it by default in the environment. But after that it’s treated like any other third-party package, and can be upgraded, downgraded, or even removed, independently of the interpreter itself. (This is exactly how pip works right now, via the `ensurepip` mechanism. There’s also prior art in Ruby.)
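For reference, here’s what the existing `ensurepip` mechanism looks like from Python code; a hypothetical “ensure bundled wheels” step for split-out modules would presumably work along the same lines:

```python
# The real, existing mechanism that installs the pip wheel shipped inside
# the Python distribution itself, without touching the network. A split-out
# multiprocessing wheel could plausibly be installed the same way.
import ensurepip

ensurepip.bootstrap(upgrade=True)
```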
This is an extremely hypothetical discussion. Currently there are absolutely no plans to move `multiprocessing` out of the stdlib. But, if we did, how could we mitigate the problems this would cause for users?
I see two big ways where this might be disruptive and packaging could help:
Problem 1: if there’s code that assumes `multiprocessing` is always available, then it will mostly work (because `multiprocessing` is installed by default), but it will break when run in a “slimmed down” environment. For interactive use, I think this isn’t a big deal – if someone does `pip uninstall multiprocessing` and then tries to `import multiprocessing` and gets an error, they can just `pip install multiprocessing`. But if someone is trying to automatically generate a slimmed-down container or redistributable app, then they need some automated way to figure out which packages to include and which to leave out. Normally we do this via package requires metadata. So, we need some way to move from a world where no-one declares their dependencies on `multiprocessing`, to a world where packages that use `multiprocessing` declare that in their metadata.
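For concreteness, here’s roughly what that declaration might look like in a package’s `setup.py`, assuming the split-out distribution keeps the name `multiprocessing` on PyPI (which is, again, purely hypothetical):

```python
# Hypothetical setup.py for a package that uses multiprocessing, in a world
# where multiprocessing has been split out of the stdlib.
from setuptools import setup, find_packages

setup(
    name="mypackage",
    version="1.0",
    packages=find_packages(),
    install_requires=[
        # Today this line would be unnecessary; after a split, it's how tools
        # that build slimmed-down environments know to keep multiprocessing.
        "multiprocessing",
    ],
)
```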
The good news is this doesn’t have to be perfect – if 99% of packages have the right metadata, then the rest can be fixed by hand based on user feedback. But if 99% of packages have the wrong metadata, then it’ll be a mess.
One idea: `setuptools` can probably detect with 99% accuracy whether a package uses `multiprocessing`, just by scanning all the `.py` files it’s bundling up to see if any say `import multiprocessing`. So maybe `setuptools` would start doing that by default? Plus there’d be some way to explicitly disable the checking for devs who want full manual control over their dependencies, maybe a shared library to implement the checking so flit and setuptools can share the code, …
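A minimal sketch of what that shared checking library might do, using the stdlib `ast` module (the function name and the set of “split-out” modules here are my own invention):

```python
# Sketch of a heuristic import scanner, as the shared library might implement it.
import ast
from pathlib import Path

# Modules that have (hypothetically) been split out of the stdlib.
SPLIT_OUT_MODULES = {"multiprocessing"}

def scan_for_split_imports(source_dir):
    """Return the set of split-out modules imported anywhere under source_dir."""
    found = set()
    for path in Path(source_dir).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except SyntaxError:
            continue  # skip files that don't parse
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                found |= {alias.name.split(".")[0] for alias in node.names}
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                found.add(node.module.split(".")[0])
    return found & SPLIT_OUT_MODULES
```

A build backend could append whatever this returns to the package’s requires metadata, unless the author has opted out. It deliberately ignores dynamic imports (`importlib.import_module` with a computed string), which is part of why the claim is “99% accuracy” rather than 100%.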
I think that would get us pretty far. It takes care of future uploads, and existing sdists on PyPI (assuming you use a new setuptools when you build them). The main flaw I see is that it doesn’t help with existing wheels on PyPI. Maybe PyPI could also do a scan of existing wheels (using that shared library mentioned above), to add in extra requirements for formerly-stdlib packages? Or maybe pip should do that when downloading pre-built wheels? I guess this would argue for having a bit of metadata in the wheel to say explicitly “The creator of this wheel was aware that `multiprocessing` is no longer part of the stdlib, and took that into account when creating its `requires` metadata”, so that pip/PyPI would know whether to apply a heuristic or not.
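Here’s one way pip or PyPI might apply that rule, sketched with a made-up metadata field name (`Stdlib-Split-Aware`) that is purely illustrative:

```python
# Sketch: decide whether to trust a wheel's declared requirements, or to
# augment them heuristically because it predates the stdlib split. The
# "Stdlib-Split-Aware" header is invented for illustration; no such field exists.
import zipfile
from email.parser import Parser

def needs_heuristic_requirements(wheel_path):
    """Return True if this wheel should get the import-scanning heuristic applied."""
    with zipfile.ZipFile(wheel_path) as whl:
        metadata_name = next(
            name for name in whl.namelist()
            if name.endswith(".dist-info/METADATA")
        )
        metadata = Parser().parsestr(whl.read(metadata_name).decode("utf-8"))
    # Wheels built by a split-aware setuptools/flit would set this header.
    return metadata.get("Stdlib-Split-Aware") is None
```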
Problem 2: Right now, removing things from the stdlib is complicated and super controversial (see the discussions about PEP 594). Some of this is inherent in the problem, but it would help if we had a smoother path for doing this. Having the infrastructure to bundle wheels with Python would already help here. For example, if the Python core devs wanted to stop maintaining `nntplib`, they could do a multistep process:
1. Convert `nntplib` from a stdlib library → bundled wheel, so it’s still available by default but can be upgraded/removed independently
2. Use the tools described above to fix up metadata in any packages that require `nntplib`
3. Finally, flip the switch so that it’s no longer installed by default, and users have to either declare a dependency or manually `pip install nntplib`
In a perfect world, I think there would be a step 2.5, where we start issuing deprecation warnings for anyone who imports `nntplib` but hasn’t declared a dependency. (Since these are the cases that will break in step 3.) This suggests we might want some metadata in `nntplib.dist-info` to track whether it’s only installed because it’s bundled by default, or whether it’s been pulled in by a manual install or other dependencies? And then when `nntplib` is imported, issue a `DeprecationWarning` iff it’s only installed because of being bundled.
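As a sketch of how that could look: suppose the bundling step drops a marker file (here called `BUNDLED`, an invented name) into `nntplib.dist-info`, and the marker is removed whenever the package is installed explicitly or pulled in as a dependency. Then the top of `nntplib/__init__.py` could do something like:

```python
# Hypothetical check at the top of nntplib/__init__.py. The BUNDLED marker
# file and its semantics are invented for this sketch.
import warnings
from importlib import metadata

def _installed_only_because_bundled():
    try:
        dist = metadata.distribution("nntplib")
    except metadata.PackageNotFoundError:
        return False  # e.g. running from a source checkout
    return any(path.name == "BUNDLED" for path in (dist.files or []))

if _installed_only_because_bundled():
    warnings.warn(
        "nntplib is only installed because it is bundled with Python by default; "
        "declare it as a dependency now, because a future release will stop "
        "installing it automatically",
        DeprecationWarning,
        stacklevel=2,
    )
```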
If anyone else is willing to join me in this thought experiment: what other problems do you see? And what do you think of the solutions sketched above?