PEP 594: Removing dead batteries from the standard library

Removal isn’t pointless when viewed from the standpoint of maintenance. If chunk is made private, then we can break the API at any point based on our needs. But if chunk is kept public, then we have to maintain backwards compatibility, keep the docs in good order, etc., which doesn’t come for free.
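For illustration, the deprecation step itself is cheap: a minimal sketch of the usual import-time warning a module like chunk would grow first (the message text here is an assumption, not the PEP’s exact wording):

```python
# At the top of Lib/chunk.py, before eventual removal:
import warnings

warnings.warn(
    "the chunk module is deprecated and slated for removal",
    DeprecationWarning,
    stacklevel=2,
)
```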

2 Likes

This PEP is interesting.

This might be slightly off-topic, but I still want to share my ideas. I’m mainly developing RustPython, and we face a lot of work with the standard library. It mainly involves copy-pasting Python files from CPython, which is lame and error-prone. I like the idea of a legacylibs repository which could be shared across Python implementations. Even better would be a split of the standard library into several parts. Say, a pure-Python part which can be bundled with a Python implementation during the creation of an x-Python release (for x in [java, rust, c#, c, javascript]). So we would have several layers of repositories: cpython, python-std-libs, python-legacy-libs. All of them combined give the full Python experience. But one can also stack them like this: rustpython, python-std-libs, python-legacy-libs.

All in all, I would like to say that pip is pretty good, the packaging situation in Python is still a bit weird (pip, conda, flit, poetry, setuptools, distutils, eggs?), and being able to install Python alone and run scripts is really powerful.

Further adding to this point, the justification of keeping a module around solely as a dependency seems like it could easily create a rather vicious cycle of indefinite backwards compatibility. Turning it into a private module seems like the smoothest first step after deprecation, but it’s also important to ensure that it’s removed entirely after a reasonable period of time. It’s important for the standard library to not have too many cobwebs. Even if it’s at a significantly reduced cost, private modules still incur a maintenance fee.

Edit: If the functionality of chunk were still required, would it be incorporated privately into wave, or remain an entirely separate module that is made private?

2 Likes

From the stdlib’s perspective that’s a technical detail, so who knows until actual work is done.

I just wanted to add that AIFF is still the de facto Mac standard for holding PCM audio in the music production and editing world; e.g., all the major DAWs on Macs use AIFF to import and export uncompressed audio.

4 posts were split to a new topic: Moving all stdlib packages into wheels

Thanks Christian for driving this! I think clearing dead batteries out will be quite helpful.

A few smallish comments:

  • In the list of substitutes to getopt and optparse, I would list not only argparse but also Click. I think it’s a good illustration of the point that for many problems the community can produce better solutions when working outside the stdlib than working within it, with its (necessary) constraints on release cycles and stability.

    (But I agree with keeping them.)

  • For aifc, I find the linked feedback on python-dev persuasive that it’s best to keep it, unless we have reason to think it comes with more of a maintenance burden than the average old and stable module does.

    In particular, I think given what we learned from that feedback, it doesn’t make sense to call this a “dead” battery. It might be a good candidate for finding a way to hand off to the community that uses it, but it’s one that clearly works well for its use case today.

  • In a few places you highlight how long ago something was first introduced. This line in the rationale particularly stuck out at me:

    30-year-old multimedia formats like the sunau audio format, which was used on SPARC and NeXT workstations in the late 1980s.

    Python didn’t exist in the late 1980s, so this clearly isn’t why the module was added – which makes this line feel not entirely fair. (Indeed the module was added in 1993.) You could make your point just as well by saying it was used on such workstations “in the 1980s and 1990s”.

    That way the argument isn’t parallel to one that says “C was used on PDP-11 minicomputers in the early 1970s” and concludes that platforms should stop supporting it. :slight_smile:

1 Like

Hello,

Thanks to the PEP, we have cleaned up the codebase of a quite large Python project which was still using the legacy email API (btw, you guys really did a great job with the new API!), but there are some deprecated utility functions that we are using internally that don’t seem to have any replacement (yet?).

The functions we are using are email.utils.formataddr and email.utils.getaddresses, both of which format/parse RFC 2822-compliant email addresses (email addresses are described by a BNF grammar some 24 rules long). We would like to know if any alternative is planned, or if you can suggest a PyPI package.
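For reference, a minimal sketch of what the two functions do (the addresses are made up):

```python
from email.utils import formataddr, getaddresses

# formataddr() builds one RFC 2822 address from a (display name, address) pair.
formataddr(("Jane Doe", "jane@example.com"))
# -> 'Jane Doe <jane@example.com>'

# getaddresses() parses a list of header values into (display name, address) pairs.
getaddresses(["Jane Doe <jane@example.com>, bob@example.com"])
# -> [('Jane Doe', 'jane@example.com'), ('', 'bob@example.com')]
```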

Regards,

You might need to ask a wider audience about this as the people maintaining the email package like @maxking and @barry might not be monitoring this topic. Probably an email to python-dev is your best bet for the proper audience if my looping in of Abhilash and Barry doesn’t work.

1 Like

Just so you know, I’ve volunteered to take on maintaining cgi/cgitb in my copious spare time (ho ho) since I started using it in a project shortly before this PEP appeared.

For the anecdotal evidence files, the target was an embedded Linux system that needed some HTTP-based controls. Since this is a system with limited RAM and even more limited disc space, the usual big, flexible server solutions were utterly inappropriate. We settled on thttpd since we’ve used it before for similar things. I quickly knocked together a Python script as a proof of concept and refined it into the final version fairly easily.

Had cgi not been in the standard library, we wouldn’t have used Python. Period. Bludgeoning the build system into acquiring PyPI packages is non-trivial, and frankly rather daunting compared with writing the equivalent script as a C program.
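To give a flavour of what such a proof of concept looks like, here’s a hypothetical sketch using only the stdlib modules in question (the form field and output are assumptions, not the actual project code):

```python
#!/usr/bin/env python3
# Hypothetical CGI control script of the kind described above, served by thttpd.
import cgi
import cgitb

cgitb.enable()  # render tracebacks in the browser while debugging

form = cgi.FieldStorage()                    # parses QUERY_STRING or POSTed form data
action = form.getfirst("action", "status")   # "action" is an assumed field name

print("Content-Type: text/plain")
print()
print(f"requested action: {action}")
```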

2 Likes

This kind of data point concerns me, because if you have a build/deploy step anywhere, it should be trivial to inject a package.

Note that there are very few cases where you must run pip on the target machine. Most can be satisfied by having it install packages as part of build and then deploying them as if they were part of your own sources. (The exceptions will be files that you can’t deploy, possibly links or things required outside of the package files. Native extensions are trickier, but totally doable.)

People are already choosing to not use Python for this reason, so I think we need to market this flexibility better. Do we know what gives people the idea that package install is a post-deployment step? Is it just the proliferation of quick and easy web tutorials?

Probably the hundreds of project READMEs that say pip install mypackage. Beginners come in, run it, and boom, everything magically works.

Naturally, when people learn this in their initial Python usage, it sticks. Contrast this with the number of tutorials that explain how to “[trivially] inject a package” for the thousands of build/deploy systems out there.

3 Likes

I guess I’ll write up a blog post describing the approach I use, and see how far it gets.

At the very least, this is a scenario that’s only really supported or explained by Docker, but it’s easy to think that the freeze tools are the best alternative (they’re not) or that pip-running-on-the-target is the only okay approach (it’s not).

Maybe I’ll get some time during the sprints next week… :thinking:

4 Likes

That would probably be good. I definitely don’t consider myself a beginner (:slightly_smiling_face:) and I’m not entirely clear what you’re suggesting here…

1 Like

In this case we’re dealing with buildroot (well, a whole lash-up of stuff that uses buildroot for the Linux half of its life). If the stars align, adding a Debian-style package is easy. If they don’t, it’s a voyage of discovery :frowning:

The short version is a CI setup that looks like:

  • git clone (my project)
  • (depending on project, extract Python distro into .\python)
  • pip install -r requirements.txt --target .
  • tests
  • zip .\**\*
  • copy zipped package to target machine and extract it

And obviously there are 100 things to watch out for here, primarily in your dependencies, but provided you do look out for them this is totally reliable. I’ve done it with Linux web apps and Windows GUI apps (where the app was in a self-extracting ZIP that then launched itself).
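For what it’s worth, the reason the zip-and-extract step needs no sys.path fiddling when everything was installed with --target . is that Python puts the script’s own directory first on sys.path. A minimal sanity check (the dependency name is a placeholder):

```python
# check_vendored.py - hypothetical sanity check for the recipe above.
import importlib.util
import sys

print("script dir on sys.path:", sys.path[0])

# Anything `pip install --target .` dropped next to this script resolves
# like any other package; find_spec() returns None if it isn't there.
spec = importlib.util.find_spec("requests")  # placeholder dependency
print("requests found at:", spec.origin if spec else "not installed here")
```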

An even simpler view of what Steve is advocating: if you can deploy your own code, then you can deploy 3rd-party code as if it were your own code. Basically, 3rd-party code is deployed differently from your own code only if you choose to treat it differently. Otherwise, treat it the same and thus deploy it the same way, as part of your code.

As such, there are only three scenarios where not having something in the stdlib is a true blocker.

  1. Issues with getting legal approval (I know Paul is familiar with this :wink:)
  2. It’s an extension module and you don’t have access to the compiler toolchain for some reason
  3. You can’t deploy a single line of code and you just have access to the REPL

After those it seems to me it’s a matter of convenience for certain users whether something is included in the stdlib or not. (And I would argue that even the above scenarios are still about convenience as Python can’t solve everyone’s deployment and legal problems.)

2 Likes

That’s a very optimistic way of looking at it.

https://duckduckgo.com/?q=pip+not+working

https://duckduckgo.com/?q=pip+install+doesn't+work+3.7

People who have pip working for them consistently underestimate the difficulty many people have in getting pip working.

A data point: I just tried to run ensurepip from Python 3.8a and got a series of warnings followed by an exception:

SyntaxWarning: 'str' object is not callable; perhaps you missed a comma?
SyntaxWarning: 'str' object is not callable; perhaps you missed a comma?
SyntaxWarning: invalid escape sequence \w
SyntaxWarning: invalid escape sequence \w

ModuleNotFoundError: No module named '_ctypes'

I do have a successfully installed pip running under 3.5, so I just tried pip3.5 install numpy and got:

No matching distribution found for numpy

I’m not looking for advice on how to solve this problem. I’m just demonstrating that pip is not a magic bullet. Getting pip working is not a trivial step; for many people it will be a significant barrier to entry.

I understand that people in marketing and customer service expect that for every complaint made directly to them, there could be anything from 10 to 100 unhappy customers who simply went away unsatisfied and won’t be back. For every person who asks a question on Stack Overflow “why isn’t pip working”, we should expect that there are probably a hundred who silently experienced the same problems but didn’t complain about it anywhere we can see.

Some proportion of those will have solved their problems by just giving up and doing without. We rarely hear from them, like the customers who receive bad service but don’t say anything and never come back. As a result, we suffer from survivorship bias.

Because we are surrounded by those who have pip working and can use it, but never hear from those who can’t, we’re biased to think that pip “Just Works” and that installation of third-party libraries is trivial.

“Boom, everything magically works” – except when it doesn’t, but we don’t see those cases. Out of sight, out of mind.

3 Likes

Ah, sorry. I was thinking about the problem from a different angle, specifically this post from @njs

When developing a new project, I would expect the initial “write a script” stage to start with using just stdlib features, but to transition fairly rapidly to using 3rd-party packages. Whether the user uses a virtualenv or just the system Python is a matter of preference/experience, but at this point pip install stuff is the easy and obvious way to go [1].

The problem comes at the transition to the “sharing with others” stage. At that point, package install is a post-deploy step, in that you say to the recipient, “here’s my script - you’ll need to install Python and a few dependencies; here’s a requirements.txt file you can use to do that; give me a shout if you have problems”. It’s still a bit fiddly and manual, but you have a direct interaction with the people you’re sharing your code with, and “oh yes, I forgot I have foo installed” is just troubleshooting, not a “failed deployment”.

It’s only when you get to step 3 of @njs’s description (deploy a webapp/distribute a standalone app) that you need to consider “deployment” as a formal thing. And at this point you quite possibly already have a routine of “ship the code, install dependencies in the target environment”, so switching to shipping dependencies with the code is a big change - not only do you need to include the dependencies, you also need to add code to your application to fix up sys.path, and you need to test all this as it’s an architectural change to your code. Not hard, maybe, but it’s a fairly big shift in how you think about what you’re doing (“sharing a script” vs “deploying an application”).
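For concreteness, the sys.path fix-up itself is only a couple of lines; here’s a minimal sketch, assuming the dependencies were installed at build time into a vendor/ directory next to the application (the directory name and layout are assumptions):

```python
# app.py - hypothetical fix-up for dependencies shipped alongside the code,
# e.g. installed at build time with `pip install -r requirements.txt --target vendor`.
import os
import sys

HERE = os.path.dirname(os.path.abspath(__file__))
# Put the vendored packages ahead of site-packages so the shipped versions win.
sys.path.insert(0, os.path.join(HERE, "vendor"))
```

The testing concern stands, of course - this changes import resolution for the whole application.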

There’s very little in the Python packaging documentation that really helps with this final step. Not many descriptions of how to do it, few or no “best practice” recommendations, and essentially no help for people trying to get things working. So people stick with “what they know works”, the stage 2 “dependency installation as a post-deployment step” approach, because it’s a known problem - maybe not easy, but at least familiar.

Having said all of the above, to bring the focus back to “Removing dead batteries from the standard library”: in my experience it’s very rare that the problem is a fundamental, insurmountable one where you can’t use external modules; rather, it’s a succession of annoying road blocks that add up to the point where it’s just not worth using Python for the task at hand [2]. Bundling dependencies is pretty much just as annoying a stumbling block in that case as having to get a 3rd-party tool added to the target environment, so I’m not sure it makes much difference there.

[1] And yes, pip install stuff isn’t always as easy as we’d like it to be, but there are lots of people and resources who will help you solve issues, whereas there are very few places you can go for help on bundling stuff…

[2] My experience here is with Python as a support tool, not the core business language - if your business is based on Python and you haven’t formulated a workable policy on PyPI modules, let’s just say I’m surprised…

1 Like

What is the current state of this PEP?

Is it still realistic that the dead batteries will be deprecated with 3.8 and the removal will take place in 3.10?

cgi.FieldStorage is used by Zope and thus Plone, and there is a new discussion about whether to vendor it or migrate to https://pypi.org/project/multipart/

Thank you!