PEP 594: Removing dead batteries from the standard library

I don’t doubt that the approach is bad, but it does illustrate the wider problem: yes, these wrong solutions should go away; the question is how to make sure the right solutions are then readily available so that people have a pleasant experience with the proposed changes. If the experience is: find SO solution => solution is deprecated and gone => can’t find new solution => brick wall, there will be tears before bedtime :wink:

P.S/Edit: a further example of the documentation problem: some docs suggest the solution is to use the ‘python-libuser’ package, but trying to search for it is a maze of deprecated pages and dead links. A good example of how a seemingly simple problem (‘create a handful of Linux users using a Python script’) becomes a time sink when code and documentation don’t line up.

A gentler approach, I think, could look like this:

  • First, move those deprecated batteries from the standard library to PyPI
  • And for the different OSes:
    • Windows installer: after the official Python distribution is installed, install them from PyPI
    • Unix-like packaging: work with package maintainers to provide a similar installation
  • Now we can update the official documentation to point people to PyPI instead of the standard library
  • Wait a few months and gather the download stats of those batteries
  • Add the still-popular batteries back to the standard library without further argument
  • Or keep them all on PyPI
  • Remove the extra installation of the unused batteries from the Windows installer and Unix-like packaging
  • And remove their documentation from the official documentation

Sounds like it would take a long time :cold_sweat:

This kind of tracks with my thinking as well: I think it would assuage a lot of concerns if all of the packages to be removed were added to PyPI by the core team and given at least one “final” release, representing their state as they left the stdlib, with wheels built for at least the platforms that Python still supports (e.g. Windows, manylinux for binary wheels). That would come with absolutely zero promise of future maintenance, but it would be an easy way to get those packages (“OK, I just have to pip install audioop now”) and an easy way to turn over maintenance of a package if someone were willing to step up to the plate for it.

I think this could mostly be scripted as well.
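For instance, a script could generate something like this per extension module (the package name and source path below are purely hypothetical; this just sketches what automated output could look like):

```python
# Hypothetical, auto-generated setup.py for one "final release" package.
# None of these names are official; they only illustrate the idea.
from setuptools import setup, Extension

setup(
    name='stdlib-audioop',        # made-up PyPI name
    version='3.9.0',              # mirrors the last stdlib release it came from
    description='Final snapshot of the audioop module as shipped in CPython 3.9',
    ext_modules=[
        Extension('audioop', sources=['Modules/audioop.c']),
    ],
)
```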

1 Like

PyPI download stats are known not to be a good measurement of use, because a single overly anxious CI setup can download a package a huge number of times.

You mentioned audioop, so let’s use that as an example. Say we build wheels targeting Python 3.10 when the PEP suggests removing audioop (which still requires us to set up infrastructure for at least each extension module to do this appropriately, so this isn’t free-as-in-time). Those wheels become unusable for people relying on those PyPI packages 18 months later when Python 3.11 comes out, because we wouldn’t be building any more wheels for packages we no longer support. So all we would have done is buy people one more release of use, which is equivalent to simply postponing removal by one release. You might argue for targeting the stable ABI, but that assumes audioop already supports the stable ABI. And others, I’m sure, will step in saying “but the module I want to keep is pure Python”, but that doesn’t help everyone else who wants an extension module maintained, and it assumes the Python code isn’t version-specific in some way. :wink: Plus a pure-Python wheel will only work on Python 3, so if we ever do a Python 4 that wheel won’t carry over unless we decide Python 3 wheels will work on Python 4 (and who knows whether the PyPA will do that or not).
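To make the tag problem concrete, here is a rough illustration of my own (using the third-party packaging library) of why a cp310 wheel stops being installable on 3.11 while a stable-ABI wheel would not:

```python
# Rough illustration only: an installer accepts a wheel when one of its
# tags matches the running interpreter's supported tags.
from packaging.tags import sys_tags  # third-party "packaging" library

accepted = {str(tag) for tag in sys_tags()}

# On a 64-bit Windows CPython 3.10 both of these print True; on CPython
# 3.11 the first becomes False because cp310 wheels are no longer
# accepted, while the abi3 (stable ABI) tag still matches.
print('cp310-cp310-win_amd64' in accepted)
print('cp36-abi3-win_amd64' in accepted)
```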

Or what if there turns out to be a security vulnerability in audioop (which, if you browse through the module’s history, isn’t out of the question)? It’s written in C and is 1927 lines long, so there’s the possibility of a CVE coming up after Python 3.9 drops out of security-fix support. Now what? The responsible thing would be to remove the package, but that requires watching for CVEs involving these packages indefinitely, which, once again, isn’t a free-as-in-time thing to do.

We could say that as loudly as possible, but there is bound to be someone who doesn’t read the notice and then comes asking for support, so it still doesn’t absolve us of future work.

I personally think the event-stream incident showed you have to be very careful when handing over the keys to a package’s PyPI entry (i.e. if it isn’t a core dev we probably wouldn’t want to risk it, and we already don’t use the same package name for backports of modules from the stdlib, so I don’t see why we would make these modules a special exception).

I understand where people’s request to do one last release on PyPI after each of these modules is removed comes from, but hopefully I’ve pointed out in multiple ways that this isn’t quite as straightforward as one might think.

1 Like

Well said, and I agree with all your points. Thank you for expanding on this.

I thought it might at least give anyone who cares a bit of transitional aid: at least one Python version on which to move toward a new workflow that retrieves the package from PyPI. Of course, they already have the option of staying on the last Python release (say, 3.9) that included the package, but my point was to show a way forward for anyone still needing to maintain that package for future Python versions. That said, as you point out, this has its own problems and is probably not worth the effort.

I think the proposed legacylib repository is a reasonable middle ground.

I would love to hear how that repo would be beneficial to you, if you don’t mind sharing. In my head it doesn’t gain anybody much, because all you need to do is go back to the CPython 3.9 branch in git, find the module, and copy it. I’m not quite sure how putting it into a separate repository actually benefits people in terms of access and such, so I personally would like to hear about potential use cases.

In fact, I’m a little worried about that repo going stale. Since we will have to support these modules with bugfixes for 18 months after they are removed – due to supporting e.g. Python 3.9 if they are removed in Python 3.10 – and then another 3.5 years for security fixes, the code in the CPython 3.9 branch will be the most up-to-date version from the Python core team (unless we are extra diligent in copying code over ASAP to the legacylib repo, but if we’re just blindly copying, what’s the benefit?). If people fork the code and start maintaining their own versions, that’s great, but once again that makes the proposed legacylib repo not that helpful, as people will copy from it once and never look at the repo again.

4 Likes

Interesting proposal!

This might be very nitpicky, but there doesn’t seem to be a “PEP 3818” - did you mean “PEP 3108”, which has the title you mentioned (“Standard Library Reorganization”)?

Thanks!

There’s a lot of sense in this proposal, but the proposal to remove the chunk module seems inconsistent: it is used by the wave module, which is intended to be kept (and also by the aifc module being discussed above).

If wave is to be kept, chunk should also be kept, since the code will still be needed (wherever it is located) and so will still have to be maintained; at that point, breaking code by removing it seems pointless.

More generally, although IFF itself may no longer be relevant, many other file formats use this basic chunk structure and can be parsed by the chunk module (e.g. PNG, WebP, AVI, and a number of Windows RIFF-based formats). While serious usage of these formats would wrap a C library, tools that extract metadata about these formats (e.g. the size of a PNG image) can easily be written on top of chunk, so the module may have independent value.
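For instance, here is a rough sketch of the kind of lightweight tool I have in mind (the filename is made up), using nothing but chunk to list the top-level chunks of a RIFF-based file such as a .wav:

```python
import chunk  # the stdlib module under discussion

def list_riff_chunks(path):
    """Print the ID and size of each top-level chunk in a RIFF container
    (e.g. a .wav or .avi file): lightweight metadata extraction with no
    third-party dependencies."""
    with open(path, 'rb') as f:
        # RIFF stores chunk sizes little-endian, hence bigendian=False.
        outer = chunk.Chunk(f, bigendian=False)
        print('container:', outer.getname(), 'form type:', outer.read(4))
        while True:
            try:
                sub = chunk.Chunk(f, bigendian=False)
            except EOFError:
                break
            print(sub.getname(), sub.getsize())
            sub.skip()  # jump to the next chunk without reading the payload

list_riff_chunks('example.wav')  # made-up filename
```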

1 Like

Removal isn’t pointless when viewed from a maintenance point of view. If chunk is made private then we can break the API at any point based on our needs. But if chunk is kept public then we have to maintain backwards compatibility, keep the docs in good order, etc., which doesn’t come for free.

2 Likes

This PEP is interesting.

This might be slightly off-topic, but I still want to drop my ideas. I’m mainly developing RustPython, and we face a lot of work with the standard library. It mainly involves copy-pasting Python files from CPython, which is lame and error-prone. I like the idea of a legacylibs repository that could be shared across Python implementations. Even better would be a split of the standard library into several parts: say, a pure-Python part which can be bundled with a Python implementation during the creation of an x-Python release (for x in [java, rust, c#, c, javascript]). So we would have several layers of repositories: cpython, python-std-libs, python-legacy-libs. All of them combined give the full Python experience. But one can also stack them like this: rustpython, python-std-libs, python-legacy-libs.

All in all, I would like to say that pip is pretty good, the packaging situation in Python is still a bit weird (pip, conda, flit, poetry, setuptools, distutils, eggs?), and being able to install Python alone and run scripts is really powerful.

Further adding to this point, the justification of keeping a module around solely as a dependency seems like it could easily create a rather vicious cycle of indefinite backwards compatibility. Turning it into a private module seems like the smoothest first step after deprecation, but it’s also important to ensure that it’s removed entirely after a reasonable period of time. It’s important for the standard library to not have too many cobwebs. Even if it’s at a significantly reduced cost, private modules still incur a maintenance fee.

Edit: If the functionality of chunk was still required, would it be incorporated privately into wave, or remain as an entirely separate module that is made private?

2 Likes

From the stdlib’s perspective that’s a technical detail, so who knows until actual work is done.

I just wanted to add, about AIFF, that it is still the de facto Mac standard for holding PCM audio in the music production and editing world; e.g. all the major DAWs use AIFF on Macs to import and export uncompressed audio, etc.

4 posts were split to a new topic: Moving all stdlib packages into wheels

Thanks Christian for driving this! I think clearing dead batteries out will be quite helpful.

A few smallish comments:

  • In the list of substitutes for getopt and optparse, I would list not only argparse but also Click. I think it’s a good illustration of the point that for many problems the community can produce better solutions working outside the stdlib than within it, with its (necessary) constraints on release cycles and stability. (A quick sketch follows after this list.)

    (But I agree with keeping them.)

  • For aifc, I find the linked feedback on python-dev persuasive that it’s best to keep it, unless we have reason to think it comes with more of a maintenance burden than the average old and stable module does.

    In particular, I think given what we learned from that feedback it doesn’t make sense to call this a “dead” battery. It might be a good candidate for finding a way to hand off to the community that uses it, but it’s one that clearly works well for its use case today.

  • In a few places you highlight how long ago something was first introduced. This line in the rationale particularly stuck out at me:

    30-year-old multimedia formats like the sunau audio format, which was used on SPARC and NeXT workstations in the late 1980s.

    Python didn’t exist in the late 1980s, so this clearly isn’t why the module was added – which makes this line feel not entirely fair. (Indeed the module was added in 1993.) You could make your point just as well by saying it was used on such workstations “in the 1980s and 1990s”.

    That way the argument isn’t parallel to one that says “C was used on PDP-11 minicomputers in the early 1970s” and concludes that platforms should stop supporting it. :slight_smile:
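Here is the quick Click sketch mentioned in the first bullet: a toy command, entirely made up, just to show the declarative style that would be hard to evolve under stdlib stability constraints.

```python
# Toy example only: a small CLI written with Click (third-party).
import click

@click.command()
@click.argument('name')
@click.option('--shout', is_flag=True, help='Upper-case the greeting.')
def greet(name, shout):
    """Greet NAME from the command line."""
    message = f'Hello, {name}!'
    click.echo(message.upper() if shout else message)

if __name__ == '__main__':
    greet()
```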

2 Likes

Hello,

Thanks to the PEP we have cleaned up the codebase of a quite large Python project which was still using the legacy email API (by the way, you really did a great job with the new API!), but there are some deprecated utility functions that we use internally that don’t seem to have any replacement (yet?).

The functions we are using are email.utils.formataddr and email.utils.getaddresses; both format/parse RFC 2822-compliant email addresses (email addresses are described by a 24-rule-long BNF grammar). We would like to know whether any alternative is planned, or whether you can suggest a PyPI package.
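For context, this is roughly how we use them (simplified, with made-up addresses):

```python
from email.utils import formataddr, getaddresses

# Build an RFC 2822 address header value from a (display name, address) pair.
print(formataddr(('Jane Doe', 'jane@example.com')))
# -> 'Jane Doe <jane@example.com>'

# And parse one or more header field values back into pairs.
print(getaddresses(['Jane Doe <jane@example.com>, bob@example.com']))
# -> [('Jane Doe', 'jane@example.com'), ('', 'bob@example.com')]
```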

Regards,

You might need to ask a wider audience about this as the people maintaining the email package like @maxking and @barry might not be monitoring this topic. Probably an email to python-dev is your best bet for the proper audience if my looping in of Abhilash and Barry doesn’t work.

1 Like

Just so you know, I’ve volunteered to take on maintaining cgi/cgitb in my copious spare time (ho ho) since I started using it in a project shortly before this PEP appeared.

For the anecdotal evidence files: the target was an embedded Linux system that needed some HTTP-based controls. Since this is a system with limited RAM and even more limited disc space, the usual big flexible server solutions were utterly inappropriate. We settled on thttpd since we’ve used it before for similar things. I quickly knocked together a Python script as a proof of concept and refined it into the final version fairly easily.
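The proof of concept was roughly along these lines (heavily simplified here; the real controls are more involved):

```python
#!/usr/bin/env python3
# Simplified sketch of the CGI proof of concept; details invented for
# illustration, not the actual project code.
import cgi
import cgitb

cgitb.enable()  # readable tracebacks in the browser while developing

form = cgi.FieldStorage()
action = form.getfirst('action', 'status')

print('Content-Type: text/plain')
print()
print(f'requested action: {action}')
```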

Had cgi not been in the standard library, we wouldn’t have used Python. Period. Bludgeoning the build system into acquiring PyPI packages is non-trivial, and frankly rather daunting compared with writing the equivalent script as a C program.

2 Likes

This kind of data point concerns me, because if you have a build/deploy step anywhere then it should be trivial to inject a package.

Note that there are very few cases where you must run pip on the target machine. Most can be satisfied by installing packages as part of the build and then deploying them as if they were part of your own sources. (The exceptions will be files that you can’t deploy, possibly links or things required outside of the package files. Native extensions are trickier, but totally doable.)
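A rough sketch of what I mean, with made-up paths and package names: pip runs only on the build machine, and the installed files ship alongside your own sources.

```python
# Build-time step (names are hypothetical): install dependencies into a
# directory that gets deployed with the application, so the target never
# runs pip at all.
import subprocess
import sys

subprocess.check_call([
    sys.executable, '-m', 'pip', 'install',
    '--target', 'build/vendor',   # files land here and are deployed as-is
    'some-needed-package',        # hypothetical dependency
])

# At runtime on the target, the application just puts that directory on
# sys.path before importing, e.g.:
#   sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'vendor'))
```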

People are already choosing to not use Python for this reason, so I think we need to market this flexibility better. Do we know what gives people the idea that package install is a post-deployment step? Is it just the proliferation of quick and easy web tutorials?

Probably the hundreds of project READMEs that say: pip install mypackage. Beginners come in, run it, and boom, everything magically works.

Naturally, when people learn this in their initial Python usage, it sticks. Contrast that with the number of tutorials that explain how to “[trivially] inject a package” for the thousands of build/deploy systems out there.

3 Likes