PEP 594: Removing dead batteries from the standard library

I don’t doubt that the approach is bad, but it does illustrate the wider problem: yes, these wrong solutions should go away, but the question is how to make sure the right solutions are then readily available, so that people have a pleasant experience with the proposed changes. If the experience is: find SO solution => solution is deprecated and gone => can’t find new solution => brick wall, there will be tears before bedtime :wink:

P.S/Edit: a further example of the documentation problem: some docs suggest the solution is to use the ‘python-libuser’ package, but trying to search for it is a maze of deprecated pages and dead links. A good example of how a seemingly simple problem (‘create a handful of Linux users using a Python script’) becomes a time sink when code and documentation don’t line up.
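
For what it’s worth, the stdlib-only route does exist; here’s a minimal sketch that shells out to the standard useradd(8) tool instead of hunting for a third-party binding (the usernames are placeholders, and it assumes root privileges):

    import subprocess

    # Create a handful of Linux users via useradd(8); requires root.
    for name in ['alice', 'bob', 'carol']:
        subprocess.run(
            ['useradd', '--create-home', '--shell', '/bin/bash', name],
            check=True,  # raise CalledProcessError if useradd fails
        )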

A gentler approach, I think, could look like this:

  • First, move the deprecated batteries from the standard library to PyPI
  • Then, for each OS:
    • Windows installer: after the new official Python distribution is installed, install them from PyPI (see the sketch after this list)
    • Unix-like packaging: work with package maintainers to provide a similar installation
  • Update the official documentation to point people to PyPI instead of the standard library
  • Wait a few months and gather the download stats for those batteries
  • Either add the still-popular batteries back to the standard library without argument
  • Or keep them all on PyPI
  • Remove the extra installation of the unused batteries from the Windows installer and Unix-like packaging
  • And remove their documentation from the official documentation
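
For the Windows-installer step, a rough sketch of what such a post-install hook might do; the package names here are hypothetical placeholders for whatever the moved batteries end up being called on PyPI:

    import subprocess
    import sys

    # Hypothetical first-run step: fetch the moved batteries from PyPI
    # with the pip that ships alongside the interpreter.
    MOVED_BATTERIES = ['audioop', 'chunk', 'sndhdr']  # placeholder names
    subprocess.run(
        [sys.executable, '-m', 'pip', 'install', *MOVED_BATTERIES],
        check=True,
    )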

Sounds like it would take a long time :cold_sweat:

This kind of tracks with my thinking as well: I think it would assuage a lot of concerns if all of the packages to be removed were added to PyPI by the core team and given at least one “final” release each, representing the package’s final state as it was in the stdlib, with wheels built for at least those platforms that Python still supports at all (e.g. Windows, manylinux for binary wheels). That would come with absolutely zero promise of future maintenance, but it would be an easy way to get those packages (“OK, I just have to pip install audioop now”) and would be an easy way to turn over maintenance of a package if someone were willing to step up to the plate for it.

I think this could mostly be scripted as well.
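
For instance, something along these lines could snapshot each pure-Python module out of a CPython checkout and wrap it in a minimal package skeleton; every path, name, and piece of metadata below is illustrative, and extension modules would need real build configuration on top of it:

    import pathlib
    import shutil
    import textwrap

    CPYTHON = pathlib.Path('cpython')             # a checkout of the 3.9 branch
    DEAD_BATTERIES = ['aifc', 'chunk', 'sndhdr']  # pure-Python examples

    for name in DEAD_BATTERIES:
        pkg = pathlib.Path('dist-src') / name
        pkg.mkdir(parents=True, exist_ok=True)
        # Copy the module's final stdlib state into its own project dir.
        shutil.copy(CPYTHON / 'Lib' / f'{name}.py', pkg / f'{name}.py')
        # Minimal metadata for a one-off "final" release.
        (pkg / 'pyproject.toml').write_text(textwrap.dedent(f'''\
            [build-system]
            requires = ["setuptools"]
            build-backend = "setuptools.build_meta"

            [project]
            name = "{name}"
            version = "1.0.0"
            description = "Final stdlib snapshot of {name} (PEP 594)"
        '''))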

PyPI download stats are known to be a poor measure of use, since a single overly anxious CI setup can download a package a huge number of times.

You mentioned audioop, so let’s use that as an example. Let’s say we build wheels targeting Python 3.10, when the PEP suggests removing audioop (which still requires us to set up the infrastructure for at least each extension module to do this appropriately, so this isn’t free-as-in-time). Those wheels then become unusable 18 months later when Python 3.11 comes out, for the people relying on those PyPI packages, because we wouldn’t be building any more wheels since we are no longer supporting those packages. So all we have done is bought people one more release of use, which would have been equivalent to simply postponing removal by one more release. You might argue for targeting the stable ABI, but that assumes audioop already supports the stable ABI. And I’m sure others will step in saying “but the module I want to keep is pure Python”, but that’s not going to make everyone else happy who wants an extension module maintained, and it assumes the Python code isn’t version-specific in some way. :wink: Plus a pure-Python wheel will only work for Python 3, so if we ever do a Python 4 that wheel won’t carry over unless we decide Python 3 wheels will work in Python 4 (and who knows whether the PyPA will do that or not).
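
To make the stable-ABI point concrete, here’s a quick check with the third-party packaging library: an interpreter’s set of compatible wheel tags keeps accepting old abi3 wheels but drops old version-specific ones:

    from packaging.tags import sys_tags  # pip install packaging

    # All (interpreter, abi) pairs this interpreter can install.
    tags = {(t.interpreter, t.abi) for t in sys_tags()}

    # On CPython 3.11, a 3.10-era extension wheel is dead weight...
    print(('cp310', 'cp310') in tags)  # False
    # ...while a 3.10-era stable-ABI wheel still installs.
    print(('cp310', 'abi3') in tags)   # True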

Or what if there turns out to be a security vulnerability in audioop (which, if you browse through the module’s history, isn’t out of the question)? It’s written in C and is 1927 lines long, so there’s the possibility of a CVE coming up after Python 3.9 drops out of security-fix support. Now what? The responsible thing is to remove the package, but that requires watching for CVEs involving these packages indefinitely, which, once again, isn’t a free-as-in-time thing to do.

We could say that as loudly as possible, but there is bound to be someone who doesn’t read the notice and then comes asking for support, so it still doesn’t absolve us of future work.

I personally think the event-stream incident showed you have to be very careful when handing over the keys to a package’s PyPI entry (i.e. if it isn’t a core dev we probably wouldn’t want to risk it, and we already don’t use the same package name for backports of modules from the stdlib, so I don’t see why we would make these modules a special exception).

I understand where people’s request to do one last release on PyPI after each of these modules are removed comes from, but hopefully I’ve pointed out in multiple ways how this isn’t quite as straightforward as one might think.


Well said, and I agree with all your points. Thank you for expanding on this.

I thought it might at least buy anyone who cares some transitional aid, so that they would have a Python version on which to move toward a new workflow that retrieves the package from PyPI. Of course, they already have the option of using the last Python release (say, 3.9) that included the package, but my point was to ease the transition and show a way forward for anyone still needing that package on future Python versions. That said, as you point out, this has its own problems and is probably not worth the effort.

I think the proposed legacylib repository is a reasonable middle ground.

I would love to hear how that repo would be beneficial to you, if you don’t mind sharing. In my head it doesn’t gain anybody much, because all you need to do is go back to the CPython 3.9 branch in git, find the module, and copy it. I’m not quite sure how putting it into a separate repository actually benefits people in terms of access and such, so I personally would like to hear about potential use cases.
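
To illustrate, grabbing the final stdlib copy of, say, chunk is already close to a one-liner against the CPython repo (assuming GitHub’s usual raw-file URL layout):

    from urllib.request import urlopen

    # Fetch chunk.py exactly as it sits on the CPython 3.9 branch.
    URL = 'https://raw.githubusercontent.com/python/cpython/3.9/Lib/chunk.py'
    with urlopen(URL) as resp, open('chunk.py', 'wb') as out:
        out.write(resp.read())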

In fact, I’m a little worried about that repo going stale. Since we will have to support these modules with bugfixes for 18 months after they are removed – due to supporting e.g. Python 3.9 if they are removed in Python 3.10 – and then for another 3.5 years of security fixes, the code in the CPython 3.9 branch will be the most up-to-date version from the Python core team (unless we are extra diligent in copying code over ASAP to the legacylib repo, but then if we’re just blindly copying, what’s the benefit?). Now if people fork the code and start maintaining their own versions, that’s great, but once again that makes the proposed legacylib repo not that helpful, as people will copy from it once and never look at the repo again.


Interesting proposal!

This might be very nitpicky, but there doesn’t seem to be a “PEP 3818” - did you mean “PEP 3108”, which has the title you mentioned (“Standard Library Reorganization”)?

Thanks!

There’s a lot of sense in this proposal, but the proposal to remove the chunk module seems inconsistent: it is used by the wave module, which is intended to be kept (and also by the aifc module being discussed above).

If wave is to be kept, chunk should also be kept, since the code will still be needed (wherever it is located) and so will still have to be maintained; at that point, breaking code by removing it seems pointless.

More generally, although IFF may no longer be relevant, many other file formats use this basic chunk structure and can be parsed by the chunk module (e.g. PNG, WebP, AVI, and a number of Windows RIFF-based formats). While serious usage of these formats would wrap a C library, tools that extract metadata about these formats (e.g. the size of a PNG image) can easily be written on top of chunk, so the module may have independent value.
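
As a minimal sketch of that kind of tool, here is how one might walk the sub-chunks of a RIFF container (a .wav or .avi, say) with nothing but the chunk module; RIFF stores sizes little-endian, hence bigendian=False, and the filename is a placeholder:

    import chunk

    with open('example.wav', 'rb') as f:
        riff = chunk.Chunk(f, bigendian=False)   # outer RIFF chunk
        assert riff.getname() == b'RIFF'
        print('form type:', riff.read(4))        # e.g. b'WAVE'
        while True:
            try:
                # Chunk objects are file-like, so they nest.
                sub = chunk.Chunk(riff, bigendian=False)
            except EOFError:
                break
            print(sub.getname(), sub.getsize())  # e.g. b'fmt ' 16
            sub.skip()                           # jump to the next sub-chunk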

Removal isn’t pointless when viewed from the standpoint of maintenance. If chunk is made private, then we can break the API at any point based on our needs. But if chunk is kept public, we have to maintain backwards compatibility, keep the docs in good order, etc., which doesn’t come for free.


This PEP is interesting.

This might be slightly off-topic, but I still want to drop my ideas. I mainly develop rustpython, and we face a lot of work with the standard library. It mainly involves copy-pasting Python files from cpython, which is lame and error-prone. I like the idea of a legacylibs repository which can be shared across Python implementations. Even better would be a split of the standard library into several parts: say, a pure-Python part which can be bundled with a Python implementation during the creation of an x-Python release (for x in [java, rust, c#, c, javascript]). So we would have several layers of repositories: cpython, python-std-libs, python-legacy-libs. All of them combined give the full Python experience, but one can also stack them like this: rustpython, python-std-libs, python-legacy-libs.
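
At runtime the stacking could be as simple as extra layers on the module search path (paths purely illustrative):

    import sys

    # The implementation ships only its core; the shared stdlib layers
    # live in separate, independently versioned trees on sys.path.
    sys.path.extend([
        '/opt/python-std-libs',     # shared pure-Python stdlib layer
        '/opt/python-legacy-libs',  # opt-in dead batteries
    ])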

All in all, I would like to say that pip is pretty good, the packaging situation in Python is still a bit weird (pip, conda, flit, poetry, setuptools, distutils, eggs?), and being able to install Python alone and just run scripts is really powerful.

Further adding to this point, keeping a module around solely as a dependency seems like it could easily create a vicious cycle of indefinite backwards compatibility. Turning it into a private module seems like the smoothest first step after deprecation, but it’s also important to ensure that it’s removed entirely after a reasonable period of time; the standard library shouldn’t accumulate too many cobwebs. Even at a significantly reduced cost, private modules still incur a maintenance fee.

Edit: If the functionality of chunk were still required, would it be incorporated privately into wave, or would it remain an entirely separate module that is made private?

From the stdlib’s perspective that’s a technical detail, so who knows until actual work is done.

I just wanted to add that AIFF is still the de facto Mac standard for holding PCM audio in the music production and editing world; e.g. all the major DAWs use AIFF on Macs to import and export uncompressed audio.