PEP 594, take 2: Removing dead batteries from the standard library

brettcannon · February 28, 2022, 8:24pm

This PEP has already gone to the SC, so it’s considered done unless the SC asks for changes.

gpshead · March 11, 2022, 6:49pm

On behalf of the Python Steering Council,

We are accepting PEP-594 Removing dead batteries from the standard library.

It removes a non-controversial set of very old unmaintained or obsolete libraries from the Python standard library. We expect this PEP to be a one-time event, and for future deprecations to be handled differently.

One thing we’d like to see happen while implementing it: Document the status of the modules being deprecated and removed and backport those deprecation updates to older CPython branch documentation (at least back to 3.9). That gets the notices in front of more people who may use the docs for their specific Python version.

Particular care should also be taken during the pre-release cycles that remove deprecated modules. If it turns out the removal of a module proves to be a problem in practice despite the clear deprecation, deferring the removal of that module should be considered to avoid disruption.

Doing a “mass cleanup” of long obsolete modules is a sign that we as a project have been ignoring rather than maintaining parts of the standard library, or not maintaining them with the diligence that their inclusion in the standard library implies they deserve. Resolving ongoing discussions around how we define the stdlib for the long term does not block this PEP. It seems worthwhile for us to conduct regular reviews of the contents of the stdlib every few releases so we can avoid accumulating such a large pile of dead batteries, but this is outside the scope of this particular PEP.

– Greg for the PSC

jmr · March 17, 2022, 6:08am

For cgi and cgitb, I just pulled these out into a separate PyPI package: https://pypi.org/project/legacy-cgi/

I’ve got a number of old CGI scripts I’d like to keep working with future Python releases, and I imagine there are others in the same boat. Hopefully the community can collaborate on future maintenance efforts here (and I’ll keep cherry-picking from the CPython tree until it’s dropped there too).

smontanaro · March 17, 2022, 8:55am

You could make a snapshot of cgi and cgitb, toss them up on GitHub then make them available through PyPI. I did that years ago with bsddb185 (well, at least the GitHub part). That at least keeps the code from being lost, and people who are motivated can keep it working.

steve.dower · March 17, 2022, 8:26pm

This was discussed pretty thoroughly already. If they remain on GitHub under “python” or “psf”, we still have the obligation to maintain them, and they would probably become harder, rather than easier, to maintain. The code is never lost - it’ll always be in the history of cpython.

Having someone willing to actually maintain them (or at least merge PRs) is a much better option. For that, they need to be in their own repositories. (If they were willing to maintain them in core, we assume they would have volunteered at some point in the last decade.)

smontanaro · March 17, 2022, 8:48pm

That wasn’t at all what I suggested. When I wrote “You could make a snapshot of cgi and cgitb, toss them up on GitHub”, I meant that the interested user could decide to maintain it, make a snapshot of the code, and host it as a separate repo on GitHub. I did not intend to imply the code would live on somehow connected to the python/cpython repo. To wit:

My apologies for not searching for the bsddb185 code first. As it turns out that was long before CPython was hosted on GitHub. I doubt I created a repo anywhere, just snagged the code and uploaded it to PyPI.

Maybe there’s some way to fork just the modules of interest from python/cpython. I’m not at all a git wizard. If that’s possible, perhaps part of the dead batteries PEP should show people how to do that so they don’t lose the dead batteries’ histories when they decide to tilt at windmills.

steve.dower · March 17, 2022, 9:30pm

I know this wasn’t the intent, and nobody else who suggested it earlier intended it either. It’s just the practicalities of how it works out. If “we” put it on GitHub, it’ll be attached to someone’s account, and it’s hard to disown at that point. If “we” put it on PyPI, it’ll be attached to someone’s account - same as bsddb185 - and only that person can update it until it gets transferred.

Getting it out of the repo is just git checkout 3.10 (or whichever branch it is last in) and then copying the files. Or browse to an earlier branch in GitHub and download the file directly. It’s really no less obvious than trying to find another repository somewhere else, and it’s far more obvious what you get out of it (i.e. the file, and not somewhere to file issues, or someone to contact about it).

If you personally feel passionate enough about it, then you can be the person who does it. There’s nothing wrong with that, and nobody will stop you. But we decided not to do it “officially” for those reasons.

smontanaro · March 18, 2022, 12:54am

Who do you mean by “we”? I meant the guy who wants to keep cgi and cgitb alive.

CAM-Gerlach · March 18, 2022, 2:04am

I’m not sure I understand the response here. It sounds like @smontanaro was just suggesting to @jmr how to pull the code out into their own personal repository, publish it themselves to PyPI, and maintain it independently, which it seems @jmr had in fact already done on his own. Looking at the package name and description, both on PyPI and GitHub, prominent mention is made of the fact that it is a fork, and the standard library version is stated to be deprecated and slated for removal. Indeed, per the Wikipedia definition of “fork”:

In software engineering, a project fork happens when developers take a copy of source code from one software package and start independent development on it, creating a distinct and separate piece of software.

Therefore, given the meaning of “fork” is quite clear, I don’t see any real risk of user confusion as to whether the CPython core dev team is responsible for the forked version.

As to

it appears @smontanaro was specifically asking about ways to do so while preserving the Git history of the existing project in the new one, which can often be very useful when maintaining the code. Unfortunately, while I am aware of a few possible ways to do it (and have done it on occasion), all the ways I know of involve a lot of time, effort and Git black magic, particularly for a repo as large and long-lived as CPython, so I don’t think it’s really in scope for the PEP (but it could be brought up elsewhere, such as this thread).

steve.dower · March 18, 2022, 11:37am

You may be right. I noticed it had already been done, which made Skip’s suggestion seem more generic (like the past ones had been) rather than specifically intended for Jack.

fungi · March 18, 2022, 1:27pm

Sure, git filter-branch is a bit of black magic, with scary warnings in its manpage. The alternative it suggests, though, is actually pretty great and not all that hard to use: newren/git-filter-repo on GitHub, “Quickly rewrite git repository history (filter-branch replacement)” (bonus points: it’s written in Python).

CAM-Gerlach · March 18, 2022, 6:23pm

That looks like a great solution, thanks; it might come in handy in the future. The last time I had to do this (a couple of years or more ago), those warnings weren’t there and git-filter-repo had been effectively unmaintained for many years, so I had to make do with git filter-branch, BFG and manual patching, which were all poorly suited to what I was trying to do, a task git-filter-repo is explicitly designed for.

arhadthedev · March 18, 2022, 7:07pm

Shameless plug: I’ve updated the test for os.sendfile that also used asyncore (GH-31876); it needs a review.

I’m posting here because that PR was spamming rebase notifications over the last few days while I was hunting down problems on the Ubuntu runner, so I suspect its move out of draft could go unnoticed.

jmr · March 19, 2022, 3:44pm

The commands I used to start jackrosenthal/python-cgi on GitHub (“Fork of the standard library cgi and cgitb modules, being deprecated in PEP-594”):

git clone https://github.com/python/cpython
cd cpython
git filter-repo --force --path Lib/cgi.py --path Lib/cgitb.py --path LICENSE --path Doc/library/cgitb.rst --path Doc/library/cgi.rst
git mv Lib/cgi.py cgi.py
git mv Lib/cgitb.py cgitb.py
mkdir docs
git mv Doc/library/cgi.rst docs/cgi.rst
git mv Doc/library/cgitb.rst docs/cgitb.rst
git commit -m "Move files from their cpython paths"

Then the rest was throwing in a pyproject.toml, README.rst, and publishing a package.

So yeah… the git history was preserved.

zitterbewegung · March 23, 2022, 8:18pm

I am thinking about making a Python package that would maintain these dead batteries as a pip package, but with no mention of Python, the PSF, or anything Python-related, no trademarks, and no claim of (or direction to interact with) any PSF members. I plan on calling it dead batteries. Is there anything I haven’t thought of, or anything else I should do?

CAM-Gerlach · March 23, 2022, 8:46pm

I would strongly urge you to consider making the modules you need separate (distribution) packages, one for each module, and preferably separate source repositories (perhaps under a single dead-batteries GH org), perhaps with some form of automation for generating and maintaining the packaging/deployment infrastructure. They are a collection of otherwise mostly-unrelated code with unrelated purposes, and conflating them doesn’t seem to have much benefit aside from a modest reduction in the initial overhead of creating separate repos with boilerplate infra. On the other hand:

Any given user is likely to only want/need ≈one specific module, and having to install all of them just to get one is rather undesirable, since not only does it consume many times more resources, but it also pulls in a substantial amount of old, unmaintained, vulnerable code (much of it security-relevant) in the process.
This means that many modules and/or packages will get installed with a single distribution package, which (particularly the former, which is very rare) is uncommon, generally discouraged and can result in unintuitive, unexpected or unintended behaviors when creating, installing and using the package. Furthermore, it means that all of them will be exposed as top-level import packages, instead of only the one the user actually needs.
Whenever any of them is updated, you’ll have to release an update with all of them, which is inefficient, leads to higher update churn and can bottleneck improvements getting out to users, since you can’t release specific subcomponents separately; furthermore, this makes the version number less meaningful with respect to changes in a specific module.
There are additional difficulties on the source repo side, as you can’t delegate access to specific maintainers/contributors to be responsible for only specific modules, it’s more difficult to apply different coding standards, documentation and packaging methods to each one, and you cannot easily drop maintenance of specific modules without removing them from the codebase and distribution.
Contributors are likely to only be interested in a specific module, so this increases the size and complexity of the codebase they have to deal with, as well as the overall overhead of single-module changes.

So as mentioned, my recommendation is to pull out the modules you want to actively maintain into separate GitHub/GitLab repos under a common organization with a common boilerplate template, and then release them as separate PyPI (distribution) packages. You can use tools like All-Repos, cookietemple and cruft to easily automate common boilerplate changes, minimizing any extra overhead this incurs past initial creation (which can be done relatively quickly with tools like hub and gh).

merwok · March 23, 2022, 10:03pm

Hello,

I’m agreeing with the bulk of what you said, but wanted to raise this point:

This means that many modules and/or packages will get installed with a single distribution package, which (particularly the former, which is very rare) is uncommon, generally discouraged

I don’t think this specific point is true, and hope it’s not the consensus.

Python distributions have always been allowed to contain combinations of zero or more modules, zero or more packages (with optional package data files), and zero or more scripts (let’s ignore data files here).

It’s true that it is very common to have one distribution install one top-level package, with code neatly organized in submodules and a good chance of avoiding module naming clashes (helped when the package name is the same as the distribution project name, but even then not guaranteed), but it should be fine to install more than one package, or even a few modules, if that’s the organization that makes sense for the project or the author. I don’t think it should be discouraged (but it’s good if tutorials show the typical layout, and that’s enough IMO).

Regards

brettcannon · March 23, 2022, 10:45pm

Only thing you really have to make sure to do is to keep the license with the code.

Yeah, I think disparate modules shouldn’t be shipped together, but we shouldn’t restrict the general size of anything (else Django has issues). The middle ground is to pull modules together based on their grouping at The Python Standard Library — Python 3.8.17 documentation (I picked 3.8 because the docs are going to be listing the deprecations, and thus they won’t be grouped that way in 3.9 and newer). Otherwise it’s open source, and if people want a different grouping they can do the work to group it separately.

fungi · May 8, 2022, 8:27pm

With the cgi import warning in 3.11.0b1 now, it’s become apparent that pip relies on cgi.parse_header() in a couple of places to split HTTP response header values into their main value and parameters. I see that http.client has a parse_headers() function, but it needs to operate on a byte stream of the raw response and doesn’t actually split the parameters out of the header values anyway. I had hoped requests would be a way out, but the bits that looked promising to me there aren’t considered part of its public API.

Is there a recommended alternative to cgi.parse_header(), or is it better to just forklift the code from that module directly into pip?

fungi · May 8, 2022, 8:37pm

Sorry, I should have kept digging. I found an answer in the old PEP 594 thread which suggests using email.message.Message objects.

Edit: here’s the link, in case anyone else gets stumped like I was: “PEP 594: Removing dead batteries from the standard library”, post #14 by mjpieters.
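For anyone else landing here, a minimal sketch of what the email.message.Message approach can look like as a stand-in for cgi.parse_header(). The helper name parse_header is hypothetical (it is not a stdlib API), and this is an illustration, not necessarily what pip ended up doing. It relies on Message.get_params(), which returns the main value as the first (value, '') pair, followed by (name, value) pairs with lowercased names and unquoted values.

```python
from email.message import Message
from email.utils import collapse_rfc2231_value


def parse_header(line: str) -> tuple[str, dict[str, str]]:
    """Hypothetical stand-in for cgi.parse_header().

    Splits e.g. 'text/html; charset=utf-8' into
    ('text/html', {'charset': 'utf-8'}) using email.message.Message.
    """
    msg = Message()
    # Store the raw value under Content-Type so get_params() can parse it.
    msg["content-type"] = line
    # [(main_value, ''), (name, value), ...]; names are lowercased and
    # quoted values are unquoted.
    params = msg.get_params(header="content-type")
    main_value, _ = params[0]
    pdict = {}
    for name, value in params[1:]:
        # RFC 2231 parameters (e.g. filename*=utf-8''...) come back as a
        # (charset, language, text) tuple; decode those to a plain str.
        if not isinstance(value, str):
            value = collapse_rfc2231_value(value)
        pdict[name] = value
    return main_value, pdict
```

For example, parse_header('attachment; filename="my file.txt"') gives ('attachment', {'filename': 'my file.txt'}). It is not an exact clone of cgi.parse_header(): as far as I can tell, a bare parameter without "=" (such as "text/html; flag") comes back as {'flag': ''} instead of being dropped, and an RFC 2231 parameter like filename*= is decoded and keyed as 'filename' rather than returned raw as 'filename*'.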