PEP 594, take 2: Removing dead batteries from the standard library

I have submitted the PEP to the SC for consideration: PEP 594 -- Removing dead batteries from the standard library (issue #109 in python/steering-council on GitHub).

I realise I’m very late to this discussion, but scanning through the PEP, there are a couple of deprecations I was (mildly) surprised to see:

cgi: Yes, launching a new process for every request is inefficient, but there are cases where it’s good enough. It’s conceptually much easier than writing an application with a proper web framework - it’s the web equivalent of scripting. And there are (or used to be?) cheap web hosts that give you CGI support as the only way to run custom code on the server, and they certainly don’t make it easy to install something from PyPI.

Reading bits of the previous discussion and making educated guesses, I think it will still be possible to write CGI scripts using only the standard library even without the cgi module, but I’m not 100% sure. If that is right, maybe it’s worth spelling it out in the PEP?
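
For what it’s worth, here is a minimal sketch of what such a script could look like, assuming only GET parameters are needed. It parses QUERY_STRING with urllib.parse.parse_qs instead of cgi.FieldStorage; the render_response helper and the name parameter are made up for illustration, and POST bodies or file uploads would need more work than this.

```python
#!/usr/bin/env python3
"""Hypothetical CGI script that does not import the cgi module."""
import os
from html import escape
from urllib.parse import parse_qs


def render_response(environ):
    """Build the full CGI response (headers, blank line, body) as a string.

    `environ` is any mapping holding the CGI variables; a real script
    would pass os.environ. Only the query string is inspected here.
    """
    query = parse_qs(environ.get("QUERY_STRING", ""))
    # parse_qs maps each key to a list of values; take the first one.
    name = query.get("name", ["world"])[0]
    # Escape user input before embedding it in HTML.
    body = f"<p>Hello, {escape(name)}!</p>"
    return "Content-Type: text/html; charset=utf-8\n\n" + body


if __name__ == "__main__":
    print(render_response(os.environ), end="")
```

Keeping the logic in a function that takes a mapping makes it possible to try it out without a web server, e.g. render_response({"QUERY_STRING": "name=Ada"}).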

telnetlib: Telnet is old hat, but still used at times for devices on a local network. I ran into it when playing with an old e-reader: you could enable a telnet server to get shell access. A quick look on PyPI turns up two different packages to control VLC over telnet, packages for controlling network devices and other things, and a package adding SOCKS proxy support. These are all using telnetlib and released in the last couple of years.

There’s probably no particular reason this needs to be in the standard library, and I see there are already several alternatives on PyPI (telnetlib3 appears to have the most attention of the ones I found). Maybe the PEP can explain this - the current detail gives no rationale for removing the module, and its brevity implies that telnet is self-evidently useless, which doesn’t seem right.

Agreed, as someone who maintains an RFC-compliant Telnet server in
Python (a MUD framework specifically), I’m mildly disappointed to
see this basic implementation being dropped from the stdlib. In my
project’s case it’s not going to imply significant impact at least,
since I’m only using it for automated testing and even then only
rely on telnetlib’s client implementation and a few of its constants
for sending raw option negotiations, but as protocols go it’s no
more venerable than, say, ftplib or poplib.
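
To give an idea of how small the "raw option negotiation" part is, here is
a rough sketch that does not depend on telnetlib. The command byte values
come from RFC 854; the refuse_all helper is hypothetical, assumes it is
handed complete commands, and does not handle subnegotiation (SB ... SE).

```python
# Telnet command bytes as defined in RFC 854.
IAC = 255   # "Interpret As Command" escape byte
DONT = 254
DO = 253
WONT = 252
WILL = 251


def refuse_all(data):
    """Split received bytes into (plain_text, replies).

    Every DO is answered with WONT and every WILL with DONT, i.e. all
    options are refused. Hypothetical helper: a truncated command at the
    end of the buffer is dropped, and subnegotiation is not parsed.
    """
    text = bytearray()
    replies = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if byte != IAC:
            text.append(byte)
            i += 1
            continue
        if i + 1 >= len(data):
            break  # lone IAC at the end of the buffer
        cmd = data[i + 1]
        if cmd == IAC:
            # IAC IAC is an escaped literal 0xFF data byte.
            text.append(IAC)
            i += 2
        elif cmd in (DO, DONT, WILL, WONT):
            if i + 2 >= len(data):
                break  # option byte has not arrived yet
            option = data[i + 2]
            if cmd == DO:
                replies += bytes([IAC, WONT, option])
            elif cmd == WILL:
                replies += bytes([IAC, DONT, option])
            i += 3
        else:
            i += 2  # other two-byte commands are skipped
    return bytes(text), bytes(replies)
```

A client would write the returned replies back to the socket and hand
the plain text to whatever consumes the session output.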

Since this is about minimizing maintenance on Python itself, we would need to know explicitly how widely used CGI is as a hosting platform before considering reverting that part of the PEP.

If you have some suggested wording I would happily take a PR for that to clarify things.

No one said some of us wouldn’t like to drop those modules as well. :wink: But telnetlib is not widely used and shouldn’t be used unless you know what you’re doing due to the security risk. And everyone is going to have differing opinions about which modules are more used or not. But the key point with both cgi and telnetlib is we didn’t get enough pushback to exclude them from the PEP and we wouldn’t accept them into the stdlib today (which is what I believe Christian used as a guideline and I agree with).

This PEP has already gone to the SC, so it’s considered done unless the SC asks for changes.

On behalf of the Python Steering Council,

We are accepting PEP-594 Removing dead batteries from the standard library.

It removes a non-controversial set of very old unmaintained or obsolete libraries from the Python standard library. We expect this PEP to be a one time event, and for future deprecations to be handled differently.

One thing we’d like to see happen while implementing it: Document the status of the modules being deprecated and removed and backport those deprecation updates to older CPython branch documentation (at least back to 3.9). That gets the notices in front of more people who may use the docs for their specific Python version.

Particular care should also be taken during the pre-release cycles that remove deprecated modules. If it turns out the removal of a module proves to be a problem in practice despite the clear deprecation, deferring the removal of that module should be considered to avoid disruption.

Doing a “mass cleanup” of long obsolete modules is a sign that we as a project have been ignoring rather than maintaining parts of the standard library, or not doing so with the diligence being in the standard library implies they deserve. Resolving ongoing discussions around how we define the stdlib for the long term does not block this PEP. It seems worthwhile for us to conduct regular reviews of the contents of the stdlib every few releases so we can avoid accumulating such a large pile of dead batteries, but this is outside the scope of this particular PEP.

– Greg for the PSC
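
For anyone wanting to check their own environment against the deprecation notices, the following is a rough sketch of one way to see whether importing a module fails or emits a DeprecationWarning. The import_status helper is hypothetical, and it only observes warnings raised on the first import in a process; a module that is already in sys.modules will simply report "ok".

```python
import importlib
import warnings


def import_status(name):
    """Return "missing", "deprecated" or "ok" for the module `name`.

    Hypothetical helper: "deprecated" means a DeprecationWarning was
    emitted while importing, which only happens on the first import.
    """
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones normally filtered out.
        warnings.simplefilter("always")
        try:
            importlib.import_module(name)
        except ImportError:
            return "missing"
    if any(issubclass(w.category, DeprecationWarning) for w in caught):
        return "deprecated"
    return "ok"
```

Running it over the module names a project imports, on each Python version it supports, gives a quick picture of what needs replacing before a removal lands.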

For cgi and cgitb, I just pulled these out into a separate PyPI package: https://pypi.org/project/legacy-cgi/

I’ve got a number of old CGI scripts I’d like to keep working with future Python releases, and I imagine there are others in the same boat. Hopefully the community can collaborate on future maintenance efforts here (and I’ll keep cherry-picking from the CPython tree until it’s dropped there too).

You could make a snapshot of cgi and cgitb, toss them up on GitHub then make them available through PyPI. I did that years ago with bsddb185 (well, at least the GitHub part). That at least keeps the code from being lost, and people who are motivated can keep it working.

This was discussed pretty thoroughly already. If they remain on GitHub under “python” or “psf”, we still have the obligation to maintain them, and they probably get harder rather than easier. The code is never lost - it’ll always be in the history of cpython.

Having someone willing to actually maintain them (or at least merge PRs) is a much better option. For that, they need to be in their own repositories. (If they were willing to maintain them in core, we assume they would have volunteered at some point in the last decade.)

That wasn’t at all what I suggested. “You could make a snapshot of cgi and cgitb, toss them up on GitHub” was meant to imply that the interested user could decide to maintain it, make a snapshot of the code and host it as a separate repo on GitHub. I did not intend to imply the code would live on somehow connected to the python/cpython repo. To wit:

My apologies for not searching for the bsddb185 code first. As it turns out that was long before CPython was hosted on GitHub. I doubt I created a repo anywhere, just snagged the code and uploaded it to PyPI.

Maybe there’s some way to fork just the modules of interest from python/cpython. I’m not at all a git wizard. If that’s possible, perhaps part of the dead batteries PEP should show people how to do that so they don’t lose the dead batteries’ histories when they decide to tilt at windmills.

I know this wasn’t the intent, and nobody else who suggested it earlier intended it either. It’s just the practicalities of how it works out. If “we” put it on GitHub, it’ll be attached to someone’s account, and it’s hard to disown at that point. If “we” put it on PyPI, it’ll be attached to someone’s account - same as bsddb185 - and only that person can update it until it gets transferred.

Getting it out of the repo is just git checkout 3.10 (or whichever branch it is last in) and then copying the files. Or browse to an earlier branch in GitHub and download the file directly. It’s really no less obvious than trying to find another repository somewhere else, and it’s far more obvious what you get out of it (i.e. the file, and not somewhere to file issues, or someone to contact about it).

If you personally feel passionate enough about it, then you can be the person who does it. There’s nothing wrong with that, and nobody will stop you. But we decided not to do it “officially” for those reasons.

Who do you mean by “we”? I meant the guy who wants to keep cgi and cgitb alive.

I’m not sure I understand the response here. It sounds like @smontanaro was just suggesting to @jmr how to pull the code out to their own personal repository, publish it themselves to PyPI, and maintain it independently, which it seems @jmr had in fact already done on his own. Looking at the package name and description, both on PyPI and GitHub, prominent mention is made of the fact that it is a fork, and the standard library version is stated to be deprecated and slated for removal. Indeed, per the Wikipedia definition of “fork”:

In software engineering, a project fork happens when developers take a copy of source code from one software package and start independent development on it, creating a distinct and separate piece of software.

Therefore, given the meaning of “fork” is quite clear, I don’t see any real risk of user confusion as to whether the CPython core dev team is responsible for the forked version.

As to

it appears @smontanaro was specifically asking about ways to do so while preserving the Git history of the existing project in the new one, which can often be very useful when maintaining the code. Unfortunately, while I am aware of a few possible ways to do it (and have done it on occasion), all the ways I know of involve a lot of time, effort and Git black magic, particularly for a repo as large and long-lived as CPython, so I don’t think it’s really in scope for the PEP (but could be brought up elsewhere, such as this thread).

You may be right. I noticed it had already been done, which made Skip’s suggestion seem more generic (like the past ones had been) rather than specifically intended for Jack.

Sure, git filter-branch is a bit black magic, with scary
warnings in its manpage. The alternative it suggests, though, is
actually pretty great and not all that hard to use:
git-filter-repo (newren/git-filter-repo on GitHub, a filter-branch
replacement for quickly rewriting git repository history). Bonus
points: it’s written in Python.

That looks like a great solution, thanks; it might come in handy in the future. The last time I had to do this (a couple of years or more ago), those warnings weren’t there and git-filter-repo had been effectively unmaintained for many years, so I had to make do with git filter-branch, BFG and manual patching, which were all terribly suited for what I was trying to do, and which git-filter-repo is explicitly designed for.

Shameless plug: I’ve updated the test for os.sendfile that also used asyncore (GH-31876), a review is needed.

I’m posting here because that PR was spamming rebase notifications over the last few days while I was hunting down problems on the Ubuntu runner, so I suspect that its move out of draft could pass unnoticed.

The commands I used to start jackrosenthal/python-cgi on GitHub (a fork of the standard library cgi and cgitb modules, being deprecated in PEP-594):

git clone https://github.com/python/cpython
cd cpython
git filter-repo --force --path Lib/cgi.py --path Lib/cgitb.py --path LICENSE --path Doc/library/cgitb.rst --path Doc/library/cgi.rst
git mv Lib/cgi.py cgi.py
git mv Lib/cgitb.py cgitb.py
mkdir docs
git mv Doc/library/cgi.rst docs/cgi.rst
git mv Doc/library/cgitb.rst docs/cgitb.rst
git commit -m "Move files from their cpython paths"

Then the rest was throwing in a pyproject.toml, README.rst, and publishing a package.

So yeah … the git history was preserved :slight_smile:

I am thinking about making a package that would maintain these dead batteries as a pip-installable package, but with no mention of Python, the PSF or anything Python-related, no trademarks, and no claim or direction to interact with any PSF members. I plan on calling this “dead batteries”. Is there anything I haven’t thought of, or anything else I should do?

I would strongly urge you consider making the modules you need separate (distribution) packages, one for each module, and preferably separate source repositories (perhaps under a single dead-batteries GH org), perhaps with some form of automation as to the generation and maintenance of the packaging/deployment infrastructure. They are a collection of otherwise mostly-unrelated code with unrelated purposes, and conflating them doesn’t seem to have much benefit aside from a modest reduction in the initial overhead of creating separate repos with boilerplate infra. On the other hand:

  • Any given user is likely to only want/need ≈one specific module, and having to install all of them just to get one is rather undesirable, since it not only consumes many times more resources, but also pulls in a substantial amount of old, unmaintained, vulnerable code (much of it security-relevant) in the process.
  • This means that many modules and/or packages will get installed with a single distribution package, which (particularly the former, which is very rare) is uncommon, generally discouraged and can result in unintuitive, unexpected or unintended behaviors when creating, installing and using the package. Furthermore, it means that all of them will be exposed as top-level import packages, instead of only the one the user actually needs.
  • Whenever any of them is updated, you’ll have to release an update with all of them, which is inefficient, leads to higher update churn and can bottleneck improvements getting out to users, since it means you can’t release specific subcomponents separately; furthermore, this leads to the version number becoming less meaningful wrt changes in a specific module.
  • There are additional difficulties on the source repo side, as you can’t delegate access to specific maintainers/contributors to be responsible for only specific modules, it’s more difficult to apply different coding standards, documentation and packaging methods to each one, and you cannot easily drop maintenance of specific modules without removing them from the codebase and distribution.
  • Contributors are likely to only be interested in a specific module, so this increases the size and complexity of the codebase, as well as the overall overhead with single-module changes.

So as mentioned, my recommendation is to pull out the modules you want to actively maintain to separate GitHub/GitLab repos under a common organization with a common boilerplate template, and then release them as separate PyPI (distribution) packages. You can use tools like All-Repos, cookietemple and cruft to easily automate common boilerplate changes, minimizing any extra overhead this incurs past initial creation (which can be done relatively quickly with tools like hub and gh).