I’m not sure I understand the response here. It sounds like @smontanaro was just suggesting to @jmr how to pull the code out into their own personal repository, publish it themselves to PyPI, and maintain it independently, which it seems @jmr had in fact already done on his own. The package name and description, on both PyPI and GitHub, prominently mention that it is a fork and state that the standard library version is deprecated and slated for removal. Indeed, per the Wikipedia definition of “fork”:
Therefore, given that the meaning of “fork” is quite clear, I don’t see any real risk of users being confused about whether the CPython core dev team is responsible for the forked version.
It appears @smontanaro was specifically asking about ways to do so while preserving the Git history of the existing project in the new one, which can often be very useful when maintaining the code. Unfortunately, while I know of a few possible ways to do it (and have done it on occasion), all of them involve a lot of time, effort and Git black magic, particularly for a repo as large and long-lived as CPython, so I don’t think it’s really in scope for the PEP (though it could be brought up elsewhere, such as this thread).
That looks like a great solution, thanks; it might come in handy in the future. The last time I had to do this (a couple of years ago or more), those warnings weren’t there and git-filter-repo had been effectively unmaintained for many years, so I had to make do with git filter-branch, BFG and manual patching, all of which were poorly suited to what I was trying to do, which is exactly what git filter-repo is designed for.
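For anyone attempting this, here is a minimal sketch of the kind of git filter-repo invocation involved, assuming git-filter-repo is installed separately and is run inside a throwaway fresh clone of CPython (it refuses to rewrite a repo that doesn’t look like a fresh clone unless `--force` is given). `--path` keeps only the listed files along with their history, and `--path-rename OLD:NEW` moves them. The helper names, the chosen paths and the target layout are illustrative assumptions, not a recipe from this thread.

```python
import shlex
import subprocess


def filter_repo_command(keep_paths, renames=None):
    """Build a git filter-repo argv that keeps only `keep_paths` (with their
    history) and then applies the OLD -> NEW renames in `renames`."""
    cmd = ["git", "filter-repo"]
    for path in keep_paths:
        # Filters are listed first so they match the original paths;
        # the renames that follow are applied to the files that were kept.
        cmd += ["--path", path]
    for old, new in (renames or {}).items():
        cmd += ["--path-rename", f"{old}:{new}"]
    return cmd


def extract(clone_dir, keep_paths, renames=None):
    """Rewrite history in `clone_dir`, which must be a disposable fresh clone."""
    subprocess.run(filter_repo_command(keep_paths, renames), cwd=clone_dir, check=True)


if __name__ == "__main__":
    # Illustrative only: keep cgi and its tests, moved into a new layout.
    cmd = filter_repo_command(
        ["Lib/cgi.py", "Lib/test/test_cgi.py"],
        {"Lib/cgi.py": "src/cgi.py", "Lib/test/test_cgi.py": "tests/test_cgi.py"},
    )
    print(shlex.join(cmd))
```

Printing the command first and only then calling `extract()` makes it easy to double-check the path list before any history is rewritten.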
I am thinking about making a Python package that would maintain these dead batteries as a pip package, but with no mention of Python, the PSF or anything Python-related, no trademarks, and no claims about, or directions to interact with, any PSF members. I plan on calling it “dead batteries”. Is there anything I haven’t thought of, or anything else I should do?
I would strongly urge you to consider making the modules you need separate (distribution) packages, one per module, preferably in separate source repositories (perhaps under a single dead-batteries GitHub org), with some form of automation for generating and maintaining the packaging and deployment infrastructure. The modules are a collection of otherwise mostly unrelated code with unrelated purposes, and bundling them together doesn’t seem to have much benefit beyond a modest reduction in the initial overhead of creating separate repos with boilerplate infra. On the other hand:
Any given user is likely to want or need only about one specific module, and having to install all of them just to get one is rather undesirable: it not only consumes many times more resources, but also pulls in a substantial amount of old, unmaintained, vulnerable code (much of it security-relevant) in the process.
This means that many modules and/or packages will get installed with a single distribution package, which (particularly the former, which is very rare) is uncommon, generally discouraged and can result in unintuitive, unexpected or unintended behaviors when creating, installing and using the package. Furthermore, it means that all of them will be exposed as top-level import packages, instead of only the one the user actually needs.
Whenever any of them is updated, you’ll have to release an update containing all of them, which is inefficient, leads to higher update churn and can bottleneck improvements getting out to users, since you can’t release specific subcomponents separately; furthermore, the version number becomes less meaningful with respect to changes in any specific module.
There are additional difficulties on the source repo side: you can’t delegate access to specific maintainers or contributors so that they are responsible for only specific modules, it’s more difficult to apply different coding standards, documentation and packaging methods to each one, and you can’t easily drop maintenance of specific modules without removing them from the codebase and distribution.
Contributors are likely to be interested in only a specific module, so bundling increases the size and complexity of the codebase they have to deal with, as well as the overall overhead of single-module changes.
So, as mentioned, my recommendation is to pull the modules you want to actively maintain out into separate GitHub/GitLab repos under a common organization with a common boilerplate template, and then release them as separate PyPI (distribution) packages. You can use tools like All-Repos, cookietemple and cruft to automate common boilerplate changes, minimizing any extra overhead beyond initial creation (which can be done relatively quickly with tools like hub and gh).
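As a rough illustration of that automation, the sketch below renders a minimal `pyproject.toml` for each single-module distribution from one shared template. The `deadbatteries-` name prefix, the metadata fields and the one-directory-per-module layout are hypothetical, and it assumes a setuptools build backend (version 61 or later reads `[project]` tables and `[tool.setuptools] py-modules` from `pyproject.toml`). Template tools like cruft or cookietemple do the same job with more features.

```python
from pathlib import Path
from string import Template

# Hypothetical shared template; every generated repo gets the same structure.
PYPROJECT_TEMPLATE = Template('''\
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "$dist_name"
version = "$version"
description = "Maintained fork of the $module module removed from the standard library"

[tool.setuptools]
py-modules = ["$module"]
''')


def render_pyproject(module: str, version: str, prefix: str = "deadbatteries-") -> str:
    """Render the pyproject.toml for a distribution that ships exactly one module."""
    return PYPROJECT_TEMPLATE.substitute(
        dist_name=f"{prefix}{module}", module=module, version=version
    )


def write_all(root: Path, modules: dict[str, str]) -> None:
    """Write <root>/<module>/pyproject.toml for each module -> version entry."""
    for module, version in modules.items():
        repo = root / module
        repo.mkdir(parents=True, exist_ok=True)
        (repo / "pyproject.toml").write_text(render_pyproject(module, version))
```

Because the template lives in one place, a metadata change becomes a single edit followed by a regeneration pass across all the repos.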
I agree with the bulk of what you said, but wanted to raise this point:
This means that many modules and/or packages will get installed with
a single distribution package, which (particularly the former, which
is very rare) is uncommon, generally discouraged
I don’t think this specific point is true, and hope it’s not the consensus.
Python distributions have always been allowed to contain combinations of
zero or more modules, zero or more packages (with optional package data
files) and zero or more scripts (let’s ignore data files here).
It’s true that it is very common to have one distribution install one
top-level package, with code neatly organized in submodules and a good
chance of avoiding module naming clashes (helped when the package name is
the same as the distribution project name, but even then not
guaranteed), but it should be fine to install more than one package, or
even a few modules, if that’s the organization that makes sense for the
project or the author. I don’t think it should be discouraged (though it’s
good if tutorials show the typical layout, and that’s enough IMO).
The only thing you really have to make sure of is to keep the license with the code.
Yeah, I think disparate modules shouldn’t be shipped together, but we shouldn’t restrict the general size of anything (otherwise Django has issues). The middle ground is to group modules according to their sections on “The Python Standard Library” page of the Python 3.8.13 documentation (I picked 3.8 because in 3.9 and newer the docs will list the deprecated modules together, so they won’t be grouped by topic). Otherwise, it’s open source, and if people want a different grouping they can do the work to group the modules separately.
With the cgi import warning now in 3.11.0b1, it’s become apparent that pip relies on cgi.parse_header() in a couple of places to split the parameters out of HTTP header values in responses. I see that http.client has a parse_headers() function, but it needs to operate on a byte stream of the raw response and doesn’t actually split the parameters out of the header values anyway. I had hoped requests would offer a way out, but the bits that looked promising to me there aren’t considered part of its public API.
Is there a recommended alternative to cgi.parse_header(), or is it better to just forklift the code from that module directly into pip?
Yes, thanks! I’m clearly just going blind. I’m sure I read straight
past that several times over the last year and completely forgot it
was in there, then later failed to even expect something so specific
would be covered within the text of the PEP itself.
Anyway, to wrap it up, this does seem to satisfy pip’s use case for
the cgi module quite nicely.
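For reference, here is a minimal sketch of a cgi.parse_header() stand-in built on email.message.Message, the approach the PEP points to. `Message.get_params()` reads the Content-Type header by default, returns the main value as the first `(value, '')` pair, lowercases parameter names and unquotes values. It assumes plain header values: RFC 2231-encoded parameters (such as `filename*=`) come back as tuples rather than strings, which this sketch does not handle.

```python
from email.message import Message


def parse_header(line: str) -> tuple[str, dict[str, str]]:
    """Split a header value like 'text/html; charset="utf-8"' into its main
    value and a dict of parameters, roughly like cgi.parse_header()."""
    msg = Message()
    # get_params() reads Content-Type by default, so store the value there.
    msg["content-type"] = line
    # -> [(main_value, ''), (name, value), ...] with names lowercased, values unquoted
    params = msg.get_params()
    main_value = params[0][0]
    return main_value, dict(params[1:])
```

The same function works for other parameterized headers such as Content-Disposition, since only the value is parsed and the header name used internally does not matter.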
Questions about the fate of, or a replacement for, cgi.parse_header() (and several other cgi utility functions) seem to have been, by a large margin, the most asked-about item deprecated by this PEP across various threads. While the PEP contains a good chunk of useful information that helps address this, it is evidently not that easy to find deep in the body text, particularly in the important case of users coming from the cgi module docs, which (unlike the PEP, as you note) are the canonical, up-to-date documentation once the PEP is accepted.
Right now, the only mention of the deprecation in the docs, much less of potential replacements, is a note to see the PEP for details, with the link pointing to the top of the PEP. Users are therefore going to scroll through the PEP looking for more information and for what to replace the module with, and the first thing they will come across that mentions cgi is the table, which indicates there is no replacement and doesn’t link to the cgi section several pages further down that contains that information.
Therefore, the docs should link directly to the relevant sections in the PEP and to any stdlib alternatives, and the PEP table should link to the respective subsections to make navigation easier. I’ve opened issue python/cpython#92611 and PR python/cpython#92612 to do the former.
As discussed above, what people overwhelmingly ask about is not the module itself, but replacements for a handful of small utility functions in cgi, some of them undocumented. These have drop-in or near-drop-in replacements described in the PEP, can be replaced 1:1 by simply switching to the aforementioned legacy-cgi forward-port, or can even just be copied into one’s own code.
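Two of those helpers show how small the gap can be. The sketch below reimplements the undocumented cgi.valid_boundary() with the same regular expression it uses internally (but returning a bool instead of a Match object or None), and calls urllib.parse directly for the query-string parsing that cgi.parse() delegates to urllib.parse.parse_qs(). It covers only these two cases; multipart bodies and FieldStorage need more than this.

```python
import re
from urllib.parse import parse_qs, parse_qsl

# Same pattern as cgi.valid_boundary(): 1 to 201 printable ASCII characters,
# the last of which must not be a space.
_VB_STR = re.compile(r"^[ -~]{0,200}[!-~]$")
_VB_BYTES = re.compile(rb"^[ -~]{0,200}[!-~]$")


def valid_boundary(s) -> bool:
    """Check a multipart boundary given as str or bytes."""
    pattern = _VB_BYTES if isinstance(s, bytes) else _VB_STR
    return pattern.match(s) is not None


query = "name=Ada&lang=py&lang=c"
print(parse_qs(query))   # -> {'name': ['Ada'], 'lang': ['py', 'c']}
print(parse_qsl(query))  # -> [('name', 'Ada'), ('lang', 'py'), ('lang', 'c')]
```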
An alternative might be to keep a “stub” cgi module around that only provides that handful of utility functions. They might be implemented by calling the recommended replacements. This would keep a lot of code working, and stem the tide of questions, without incurring much maintenance. We’ve done things like this before; I recall the string module. We can be flexible.