Handling modules on PyPI that are now in the standard library?

dstufft · May 25, 2023, 4:54am

A number of modules lived on PyPI before being added to the standard library, or have backports that are no longer relevant. Unfortunately, these tend to garner a fair number of downloads for various reasons, and they often confuse people when they’re used. This is particularly a problem when the import name of the PyPI package is the same as the one used in the standard library, since the standard library will typically shadow the installed package.^[1]
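The shadowing comes from import order. Built-in and frozen modules are found before `sys.path` is searched, and the stdlib directories come before site-packages on `sys.path`. Below is a minimal sketch for checking where a given name actually resolves. It assumes the running interpreter's `sysconfig` paths are accurate; some Linux distributions patch these, so treat `"other"` as "inspect manually". The helper name `module_origin` is hypothetical.

```python
import importlib.util
import sysconfig
from pathlib import Path


def module_origin(name: str) -> str:
    """Classify where `import name` would load from: stdlib, site-packages, missing or other."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "missing"
    if spec.origin in ("built-in", "frozen"):
        return "stdlib"  # compiled into, or frozen inside, the interpreter
    if spec.origin is None:
        return "other"  # e.g. a namespace package with no single origin file
    origin = Path(spec.origin).resolve()
    paths = sysconfig.get_paths()
    site_dirs = {Path(paths[key]).resolve() for key in ("purelib", "platlib")}
    stdlib_dirs = {Path(paths[key]).resolve() for key in ("stdlib", "platstdlib")}
    # Check site-packages first: in non-venv installs it lives inside the stdlib directory.
    if any(d in origin.parents for d in site_dirs):
        return "site-packages"
    if any(d in origin.parents for d in stdlib_dirs):
        return "stdlib"
    return "other"


if __name__ == "__main__":
    for name in ("json", "sys", "typing"):
        print(name, module_origin(name))
```

In a healthy environment, a name like `typing` should report `stdlib` even when the PyPI backport is installed. That is exactly why the installed copy usually goes unused.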

For backports, modern packaging lets you set requires-python from the backport’s first release. This prevents the project from being installed on Python versions where the module is already part of the standard library, at least when the import names match.

However, that still leaves packages that were developed on PyPI first, as well as older backports that never added requires-python metadata. People can still install these packages, but can’t actually use them because the stdlib takes precedence.

The outcome tends to range from “harmless but silly” to “actively causing issues and confusing everyone involved”. The confusion arises because two versions of the same thing are installed (PyPI and standard library), and a bunch of expectations are being violated all over the place.

So the question then becomes, what should the recommended approach be for projects in these situations? An obvious suggestion going forward is properly setting requires-python for all backports, but beyond that?

I see a few options:

Do nothing: let the projects sit there unmaintained, and accept the occasional issue that crops up from the confusion.
Delete the files that have been uploaded, forcing everyone using them to stop, even on those older (presumably no longer supported) versions of Python.
Yank the files, allowing people to still pin to those versions (with a warning message) but otherwise acting as if those versions had been deleted.
- This also shows a yanked marker in the PyPI web interface.
Decide none of these work, come up with something better, then write a PEP and get it approved.

The behavior of the first two options is pretty easily understood. The behavior of the yanked option, however, can be a tad subtle: it depends on which version of pip does the install, and on how the package is referenced.

For pip 22+, the behavior is basically:

If the version specifier is NOT == or ===, then ignore yanked versions.
If the version specifier is == or === and a yanked file is the only available file, then use it but print a warning (which can include a message to the user from the package); otherwise, use the unyanked version.

For pip prior to 22.0, the behavior is basically:

Prefer unyanked versions over yanked versions, assuming they both match the version specifier.
If the only versions matching the specifier (whatever it is, even no specifier at all) are yanked, then use one but print a warning (which can include a message from the package).
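To make the two behaviors concrete, here is a simplified model of the selection rules described above. It is not pip's actual code. Versions are plain integer tuples (no pre-releases or version normalization), and `Candidate` and `choose` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Version = Tuple[int, ...]


@dataclass(frozen=True)
class Candidate:
    """One release file as the installer sees it (simplified)."""
    version: Version
    yanked: bool = False


def choose(
    candidates: List[Candidate],
    matches: Callable[[Version], bool],
    exact_pin: bool,
    pip_22_or_newer: bool = True,
) -> Tuple[Optional[Candidate], bool]:
    """Return (chosen candidate or None, whether to print the yank warning)."""
    matching = [c for c in candidates if matches(c.version)]
    unyanked = [c for c in matching if not c.yanked]
    if unyanked:
        # Both behaviors prefer the newest non-yanked match when one exists.
        return max(unyanked, key=lambda c: c.version), False
    if matching and (exact_pin or not pip_22_or_newer):
        # Only yanked files match: use the newest one, but warn.
        return max(matching, key=lambda c: c.version), True
    # pip 22+ with a specifier other than == / === never selects a yanked file.
    return None, False


# Usage, mirroring the pip 21.2 examples below:
releases = [Candidate((21, 1, 3)), Candidate((21, 2), yanked=True), Candidate((21, 2, 1))]


def in_range(v: Version) -> bool:
    return (21, 2) <= v < (21, 2, 1)  # 'pip>=21.2,<21.2.1'


def pinned(v: Version) -> bool:
    return v == (21, 2)  # 'pip==21.2'


print(choose(releases, in_range, exact_pin=False))  # (None, False)
print(choose(releases, in_range, exact_pin=False, pip_22_or_newer=False))  # yanked 21.2, with warning
print(choose(releases, pinned, exact_pin=True))  # yanked 21.2, with warning
```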

Here is the resulting output in each case:

pip 22+, not using == or ===

❯ pip download 'pip>=21.2,<21.2.1'
ERROR: Could not find a version that satisfies the requirement pip<21.2.1,>=21.2 (from versions: 0.2, 0.2.1, 0.3, 0.3.1, 0.4, 0.5, 0.5.1, 0.6, 0.6.1, 0.6.2, 0.6.3, 0.7, 0.7.1, 0.7.2, 0.8, 0.8.1, 0.8.2, 0.8.3, 1.0, 1.0.1, 1.0.2, 1.1, 1.2, 1.2.1, 1.3, 1.3.1, 1.4, 1.4.1, 1.5, 1.5.1, 1.5.2, 1.5.3, 1.5.4, 1.5.5, 1.5.6, 6.0, 6.0.1, 6.0.2, 6.0.3, 6.0.4, 6.0.5, 6.0.6, 6.0.7, 6.0.8, 6.1.0, 6.1.1, 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.1.0, 7.1.1, 7.1.2, 8.0.0, 8.0.1, 8.0.2, 8.0.3, 8.1.0, 8.1.1, 8.1.2, 9.0.0, 9.0.1, 9.0.2, 9.0.3, 10.0.0b1, 10.0.0b2, 10.0.0, 10.0.1, 18.0, 18.1, 19.0, 19.0.1, 19.0.2, 19.0.3, 19.1, 19.1.1, 19.2, 19.2.1, 19.2.2, 19.2.3, 19.3, 19.3.1, 20.0, 20.0.1, 20.0.2, 20.1b1, 20.1, 20.1.1, 20.2b1, 20.2, 20.2.1, 20.2.2, 20.2.3, 20.2.4, 20.3b1, 20.3, 20.3.1, 20.3.2, 20.3.3, 20.3.4, 21.0, 21.0.1, 21.1, 21.1.1, 21.1.2, 21.1.3, 21.2, 21.2.1, 21.2.2, 21.2.3, 21.2.4, 21.3, 21.3.1, 22.0, 22.0.1, 22.0.2, 22.0.3, 22.0.4, 22.1b1, 22.1, 22.1.1, 22.1.2, 22.2, 22.2.1, 22.2.2, 22.3, 22.3.1, 23.0, 23.0.1, 23.1, 23.1.1, 23.1.2)
ERROR: No matching distribution found for pip<21.2.1,>=21.2

pip 22+, using == or ===

❯ pip download 'pip==21.2'
Collecting pip==21.2
  Downloading pip-21.2-py3-none-any.whl (1.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 34.3 MB/s eta 0:00:00
WARNING: The candidate selected for download or install is a yanked version: 'pip' candidate (version 21.2 at https://files.pythonhosted.org/packages/03/0f/b125bfdd145c1d018d75ce87603e7e9ff2416e742c71b5ac7deba13ca699/pip-21.2-py3-none-any.whl (from https://pypi.org/simple/pip/) (requires-python:>=3.6))
Reason for being yanked: See https://github.com/pypa/pip/issues/8711
Saved ./pip-21.2-py3-none-any.whl
Successfully downloaded pip

pip < 22, using anything

$ pip download 'pip>=21.2,<21.2.1'
Collecting pip<21.2.1,>=21.2
  Downloading pip-21.2-py3-none-any.whl (1.6 MB)
     |████████████████████████████████| 1.6 MB 6.4 MB/s
WARNING: The candidate selected for download or install is a yanked version: 'pip' candidate (version 21.2 at https://files.pythonhosted.org/packages/03/0f/b125bfdd145c1d018d75ce87603e7e9ff2416e742c71b5ac7deba13ca699/pip-21.2-py3-none-any.whl#sha256=71f447dff669d8e2f72b880e3d7ddea2c85cfeba0d14f3307f66fc40ff755176 (from https://pypi.org/simple/pip/) (requires-python:>=3.6))
Reason for being yanked: See https://github.com/pypa/pip/issues/8711
Saved ./pip-21.2-py3-none-any.whl
Successfully downloaded pip

I don’t personally have a preference, but others might! If a strong consensus doesn’t form, most likely it makes sense to just accept the status quo and let “do nothing” stand.

[1] But not always! Users are ever inventive in finding ways to do unexpected things, and can get the PyPI version installed so that it shadows the standard library.

CAM-Gerlach · May 25, 2023, 7:46am

Dustin’s great summary already covers most of the relevant details, thanks.

For reference, here’s a few relevant examples of backports that were brought up:

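| Name | Links | Added to stdlib | Backport for | Last maintained | Wheel | Upper cap | Downloads/mo. |
|---|---|---|---|---|---|---|---|
| argparse | pypistats, pepy | 2.7/3.2 | 2.3/3.1 | 2015 | Yes | No | 6.7 M |
| asyncio | pypistats, pepy | 3.4 | 3.3 | 2015 | Yes | No | 2.1 M |
| typing | pypistats, pepy | 3.5 | 2.7/3.4 | 2019/2021 | Yes | Latest | 7.9 M |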

Observations:

The latest version of typing is upper-capped (unlike the other two), though this is likely to change in the near future. In any case, pip just backsolves to older versions. You can see the effect of this in the pepy stats. The previous, non-upper-capped version gets about 38% of downloads (vs. 52% for the latest and 10% for older versions). That is nearly exactly what you would expect: the share of downloads from Python versions affected by the upper cap is 42%, and about 10% of those (roughly 4 percentage points) are separately pinned or constrained to other older versions, so 42% − 4% ≈ 38%.
Something else interesting is going on with typing: the proportion of users on older Python versions, particularly 2.7, is far higher for typing than for the others. Given the version usage patterns, it seems probable that this is driven by some specific set of CI services or similar.
By contrast, a very large majority of asyncio downloads (>95%) and an overwhelming majority of argparse downloads (>98%) were for Python versions in which the stdlib module is present, which indicates they are almost certainly by mistake. However, the distribution of Python versions and OSes was generally consistent with most installations being driven by deep-in-the-stack dependencies and automated processes. It does not look like users manually installing these packages by mistake.
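The backsolving effect described for typing can be illustrated with a toy resolver that picks the newest release whose requires-python admits the running interpreter. This sketch rests on simplifying assumptions: requires-python is reduced to an optional lower bound and upper cap, and the three-release history is hypothetical. It is not how pip's resolver is actually implemented.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

PyVersion = Tuple[int, int]


@dataclass(frozen=True)
class Release:
    version: str
    # Simplified requires-python: inclusive lower bound, exclusive upper cap.
    min_python: Optional[PyVersion] = None
    max_python: Optional[PyVersion] = None

    def supports(self, python: PyVersion) -> bool:
        if self.min_python is not None and python < self.min_python:
            return False
        if self.max_python is not None and python >= self.max_python:
            return False
        return True


def newest_compatible(releases: List[Release], python: PyVersion) -> Optional[Release]:
    """Pick the newest release (list ordered oldest first) whose requires-python admits `python`."""
    for release in reversed(releases):
        if release.supports(python):
            return release
    return None


# Hypothetical history in which only the latest release carries an upper cap.
history = [Release("1.0"), Release("2.0"), Release("3.0", max_python=(3, 5))]
print(newest_compatible(history, (2, 7)).version)   # 3.0
print(newest_compatible(history, (3, 11)).version)  # 2.0, backsolved past the capped release
```

Capping only the newest release pushes affected interpreters back to the previous uncapped one. In this model, the cap blocks installation only when every release carries it.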

In any case, in addition to those above, there are two other options for handling this:

Release a new, sdist-only version lower-capped to only Python versions on which the module is in the stdlib, and that just raises an informative error when installing
Do the above, except making it a warning rather than an error and still installing as normal
- Or, perhaps better, it could just install an empty import package
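The warning and error variants could be driven by a small check that the placeholder release's setup.py runs before calling setuptools. The following is a minimal sketch. The project name `examplebackport`, the function names and the message wording are all hypothetical. Note that pip typically captures build output and may only display it when the build fails or when run with `-v`, so the warning variant could be less visible than intended.

```python
import sys
import warnings
from typing import Optional, Tuple

PyVersion = Tuple[int, int]


def obsolete_backport_notice(project: str, stdlib_since: PyVersion,
                             python: PyVersion = tuple(sys.version_info[:2])) -> Optional[str]:
    """Return a user-facing notice if `project` duplicates a stdlib module on `python`."""
    if tuple(python) < tuple(stdlib_since):
        return None  # the backport might still be genuinely needed here
    since = ".".join(map(str, stdlib_since))
    return (f"'{project}' has been part of the standard library since Python {since}; "
            "this PyPI release is an empty placeholder. "
            f"Please remove '{project}' from your dependencies.")


def check_obsolete_backport(project: str, stdlib_since: PyVersion, fail: bool = False,
                            python: PyVersion = tuple(sys.version_info[:2])) -> None:
    """Error variant (fail=True) aborts the build; warn variant lets the install continue."""
    notice = obsolete_backport_notice(project, stdlib_since, python)
    if notice is None:
        return
    if fail:
        raise RuntimeError(notice)
    warnings.warn(notice, UserWarning, stacklevel=2)


# Hypothetical use at the top of the placeholder's setup.py (sdist only):
#
#     check_obsolete_backport("examplebackport", (3, 4))
#     setuptools.setup(
#         name="examplebackport",
#         version="2.0",
#         python_requires=">=3.4",  # lower cap: only Pythons that ship the module
#         packages=[],              # no modules; only distribution metadata is installed
#     )
```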

My personal conclusions:

Writing, discussing, deciding and implementing a PEP:
- Seems like a large amount of work for relatively little practical gain
- Would likely take a substantial amount of time to see any of that benefit
Yanking seems strictly superior to deleting, as it:
- Behaves similarly on newer pip versions (which people will progressively upgrade to)
- Allows raising an informative, customizable warning on older pip versions (on the older Python versions where installing it actually might be desired) without erroring
- Even older pip versions (which don’t support yanking) are unaffected, but those matter least here
- Preserves history
- Clearly marks the versions as yanked on PyPI
- Breaks valid (old Python) unpinned use cases but allows unbreaking them
Releasing a new stdlib-version-only sdist that errors on install seems in turn mostly superior to yanking, as relative to that, it:
- Shows a more helpful warning on recent pip versions
- Has consistent behavior on older pip versions (and other installers)
- Only affects Python versions that don’t have the stdlib module
- Avoids backsolving for earlier versions that are upper capped
- Downside: Older pip versions will also error rather than warn
The status quo seems generally preferable to releasing a stdlib-only sdist that errors, as it:
- Will immediately break users of libraries or applications that inadvertently depend on these packages
- Many of those users won’t be able to do much about it except flood the package developers with issues
- It might never get fixed, as many such packages may be unmaintained, and thus permanently broken on all Python versions
- The practical benefits (i.e. the harms of the status quo) seem relatively small, at least compared to the large breakage
Releasing a new stdlib-version-only sdist that warns but doesn’t error seems generally superior to the status quo, as it has all the advantages of erroring, and in addition:
- Doesn’t actually break anything, avoiding the problem above
- Still provides an informative warning of the problem likely to reach developers
- Minor downside relative to status quo: a relatively small-ish increase in install time for users on newer Python versions
Best of all, IMO, is releasing an sdist that warns and installs an empty package. It is strictly superior to the previous option: it has all the same advantages, plus it:
- Minimizes the bandwidth and space costs of the status quo
- Only breaks in highly unlikely pathological cases (that are probably broken, perhaps more subtly, anyway)
- Reduces (though does not eliminate) the one downside of the previous option (modestly increased install time)

So, TL;DR: for these packages, I suggest releasing a new sdist-only version, lower-capped to the Python versions that include the stdlib module, which raises a warning on install and installs only an empty package. It raises a helpful warning on the relevant versions. It also minimizes most of the costs of the status quo (network bandwidth, local disk space) and avoids upper-cap backsolving. It does all this without breaking any working use cases or affecting non-stdlib Python versions, at only a small cost in install time.

CAM-Gerlach · May 25, 2023, 8:52am

As a followup, I programmatically investigated how many packages on PyPI have project names matching stdlib top-level package/module names. I used the following quick n’ dirty code:

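import sys  # sys.stdlib_module_names requires Python 3.10+

import requests

# Strip leading/trailing underscores, so e.g. "_ast" and "__future__" are checked as "ast" and "future".
stdlib_module_names_filtered = {name.strip("_") for name in sys.stdlib_module_names}
with requests.Session() as session:
    results = {name: session.get(f"https://pypi.org/project/{name}/", timeout=30).status_code
               for name in stdlib_module_names_filtered}
# A 200 response means a PyPI project with that name exists.
pypi_packages = {name for name, status in results.items() if status == 200}
print("\n".join(sorted(pypi_packages)))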

which produced a total of 59 project results (a combination of backports and mostly un-maintained squatters—I haven’t yet done a complete manual survey):

List of PyPI projects sharing names with stdlib modules

antigravity
argparse
ast
asyncio
blake2
calendar
chunk
configparser
contextvars
csv
ctypes
dataclasses
datetime
dis
distutils
elementtree
email
enum
functools
future
graphlib
hashlib
hmac
html
http
importlib
io
ipaddress
logging
mailbox
modulefinder
multiprocessing
nntplib
numbers
pathlib
pydecimal
pyio
readline
resource
secrets
select
selectors
sha1
sha256
sha3
shelve
signal
ssl
statistics
time
token
turtle
typing
unittest
uuid
wave
weakrefset
winapi
wsgiref

One notable example of the latter is the turtle PyPI package, which came up in the discussion of whether to allow someone maintaining nntplib to claim the otherwise-protected name on PyPI. It is an ancient, unmaintained package with a totally different purpose from the stdlib module. It only ever had two releases, 0.0.1 and 0.0.2, both on the same day in 2009. Its homepage URL is now blocked as a malware site, and it was only ever compatible with Python 2 (maybe even only 2.6). Yet it gets 40k downloads per month.

I’ve analyzed it in more detail in the case study that follows:

Case study of the `turtle` package

It is mostly just an empty placeholder anyway. It cannot work with any Python version that has the stdlib turtle installed, nor with any version newer than 2.7 (and potentially even 2.6). It also only gets a few dozen downloads per day on the latter, which implies its working use is basically infinitesimal. Given all that, it seems it should just be deleted per PEP 541 as a placeholder, and the name reserved like the others. However, it is perhaps a useful case study to note, in case there are more such packages.

It’s worth noting that for turtle, around 95-98% of downloads are on Python 3, on which the package will error out on import. Half a percent are on Python 2, and around 3% are unknown (which I’m guessing may also be mostly Python 2). That Python 3 share is substantially higher than for other packages:
- requests is at around 93-94% Python 3, 3.5% unknown and 2.5% Python 2. Its proportions appear to be dominated by containers, CIs, services, etc. rather than direct installs, since it sits very deep in the stack.
- At the other end of the spectrum, Spyder (an end-user application installed directly by users, mostly via means other than pip) is at around 85% Python 3, 10% unknown (null) and 5% Python 2.

Looking at Python minor version for turtle, 3.11 has a large plurality, at over 30% on average, with 3.10 taking 2nd at modestly over 20%, and 3.8 and 3.9 being tied for third at around 15% each, 3.7 at 5-10%, with the remaining being Python 2 and increasingly smaller amounts in earlier versions. By contrast, both requests and spyder follow the conventional Python adoption curve, of a normal distribution with a peak around Python 3.9 and trailing off to both sides. This, coupled with the high proportion of Windows users for turtle (60%) relative to spyder (around 35%) seems to suggest that this is likely a high proportion of confused beginners downloading Python for the first time (rather than, as some initially speculated, users trying to download turtle on Linux distros that didn’t ship it).

pf_moore · May 25, 2023, 9:48am

One (relatively minor?) disadvantage is that users installing with --only-binary or --prefer-binary will not see the sdist. It’s quite possible that at some stage, pip will switch to --only-binary by default (see here) and then we’d be back where we started.

barry · May 25, 2023, 4:46pm

What about the use case where modules are removed from the stdlib and end up with the canonical version back on PyPI?

dstufft · May 25, 2023, 5:01pm

Presumably they would just upload new versions that aren’t yanked or deleted with appropriate requires python metadata?

kknechtel · May 25, 2023, 5:15pm

I joined the forum because of issues related to that, so thank you for additional analysis.

Turtle will break on Windows if you uncheck the option in the installer to install IDLE and Tkinter, which is not going to be intuitive for a lot of beginners. While the people doing this are probably mostly beginners, I don’t think we can assume that they were expecting turtle to be third-party. A lot of them probably ran into an error before trying the download. (Granted, some of them will have tried naming their source file turtle.py; but the fact that this causes problems is also not ideal IMO.)

CAM-Gerlach · May 25, 2023, 7:50pm

Yeah, that’s a good point (and one that did cross my mind at one point). However, as framed now, this mostly affects a bunch of existing packages, and it can be avoided in the future by upper-capping requires-python on all non-pre-release versions. So if we implemented the solution now, it should presumably get the problem mostly solved long before that becomes a factor. All it takes is for a fraction of users to see the warning and report it to the relevant maintainer so they can fix it. And even if --only-binary does become the default, this is still strictly no worse than the status quo.

At least for things that aren’t already there as backports, stdlib names are blocked for new PyPI uploads to protect against major security implications and user mistakes. Given the nntplib discussion I linked above, IMO they should remain that way. That avoids having to pick who gets ownership of the name, user confusion over whether the core team is still involved, the continuing security implications (and potential liability), and users installing them by mistake instead of doing their own due diligence as they would for any other package. Community members who decide to take on their own maintenance of a stdlib module after it is removed can simply publish it under a different name (and the Python docs can link to one, if deemed appropriate, as was done for nntplib).

fungi · May 25, 2023, 8:22pm

So how does this idea mesh with the prior discussions about how
upper bounds on requires-python metadata don’t really work the way
people expect and should be discouraged (or even generate warnings,
or maybe be ignored by the dep solver completely)?

CAM-Gerlach · May 25, 2023, 11:42pm

Linking the thread for others’ reference:

I posted about it over there:

But for others’ reference, current thinking there appears to lean toward just documenting that upper bounds shouldn’t be used and having them generate a warning. That doesn’t help this scenario at all and, if anything, makes it somewhat worse. Suppose --only-binary is made the default, and we don’t want a warning every time users try to install the package (not to mention pip backtracking through every version, which already happens). In that case, unless the requires-python metadata is correct from the start, it seems we have no real options other than breaking all current users or living with this problem forever.

By contrast, Option 2 there would be close to the perfect fix for new backported packages over time, as all it would require is a post-release of the most recent release with a requires-python upper bound to produce the same result as our sdist solution (aside from a not quite as nice error message), except cleaner, simpler and compatible with --only-binary and non-Setuptools build backends.

CAM-Gerlach · May 27, 2023, 6:00am

As I just posted on the relevant nntplib thread, it seems that at least one exception to the standard PyPI policy was recently made to hand over an existing stdlib name to a third-party project. This topic focuses specifically on PyPI projects that were superseded by standard library modules, and I don’t want to drag things further off topic. So I’ve opened a new thread in Packaging to discuss, and hopefully reach consensus on, a coherent policy for this “opposite” case, in collaboration with the PyPI admins involved:

Also, @dstufft, is there any particular reason this is in Core Development and shouldn’t be moved to Packaging? It seems like it’s much more of a packaging concern than something directly related to the development of those modules in CPython, and would attract more relevant interest there, but there could be something I’m missing here.

dstufft · May 27, 2023, 7:48am

I was on the fence about where to put it. I ended up here since it’s specifically about stdlib Python module names on PyPI, so it felt slightly more likely that core devs would have an opinion. I’m happy to have it moved to Packaging if you think people might have more thoughts there.

I also just suspect nobody really feels strongly one way or the other.

ppentchev · May 31, 2023, 4:43pm

Hi, and sorry if what I’m about to write is already well-known and taken into account.

There is another side to this: static checkers that parse Python source code. At least the mypy type checker, when presented with a Python source file that contains the following lines:

import sys

if sys.version_info >= (3, 11):
    import tomllib
else:
    import tomli as tomllib

…will expect to be able to import the “tomli” module and examine its interface even if mypy itself is being run with Python 3.11. I assume that this is because mypy wants to make sure that my code will also work on earlier Python versions, but the fact remains that mypy expects the tomli module to be installable and to at least export all the symbols that my program tries to use, even on Python 3.11. One obvious workaround - mark the tomli import as # type: ignore[import] - would not really be very friendly to potential contributors who want to hack on this code using Python 3.10 - their run of mypy will not flag any incorrect use of tomli’s functions and methods.

Of course, “this is a mypy problem” is a valid answer; I just thought I’d make sure that people are aware of this aspect.

Thanks for all you people are doing for Python itself and for the ecosystem as a whole!

G’luck,
Peter

ppentchev · May 31, 2023, 4:47pm

…and I just realized that none of what I wrote makes sense if the module on PyPI has the exact same name as the one in the standard library. So if that’s what makes my point moot, okay, I can accept that.

G’luck,
Peter

Jelle · May 31, 2023, 5:43pm

I don’t think that’s true. mypy (1) never imports your code, since it’s purely static, and (2) understands the sys.version_info check, so when checking in 3.11+ mode it won’t care about the else branch.

Perhaps you have mypy’s python_version setting set to a lower version of Python, so mypy will type check your code for that version.

kknechtel · June 2, 2023, 1:17am

I just closed a question on Stack Overflow as a duplicate of this reference Q&A:

It seems that the basic setup in this case is:

The user installs typing for a more recent version of Python that doesn’t require it.
Some environments are in some way subtly incompatible with this backport; lower-level stuff (either read from a .pth file by site.py, or bootstrapped by Jupyter, etc.) will cause mysterious errors wherein stack traces end up at:
```
    import typing
  File "/path/to/site-packages/typing.py", line 1356, in <module>
    class Callable(extra=collections_abc.Callable, metaclass=CallableMeta):
  File "/path/to/python3.7/site-packages/typing.py", line 1004, in __new__
    self._abc_registry = extra._abc_registry
AttributeError: type object 'Callable' has no attribute '_abc_registry'
```
That is to say, something or other imports the backport of typing rather than the standard library typing, but in a context that relies on implementation details (i.e., collections.abc.Callable having a _abc_registry attribute, where collections.abc seems to have been aliased as collections_abc).
According to user reports, newer versions of Pip cannot uninstall the package. (In the question I closed, it seems that OP can’t install basically anything with Pip, so maybe attempts to uninstall also run into this problem?)
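To diagnose the first step above, a small script can list installed distributions whose module the running Python already ships. This sketch assumes Python 3.8+ (`importlib.metadata`). The table is illustrative, drawn from packages mentioned in this thread, and is not exhaustive; `redundant_backports` is a hypothetical helper.

```python
import sys
from importlib import metadata
from typing import Dict, List, Tuple

# Illustrative, not exhaustive: PyPI project name -> first Python version whose
# stdlib ships a module of the same name (argparse also landed in 2.7).
STDLIB_SINCE: Dict[str, Tuple[int, int]] = {
    "argparse": (3, 2),
    "asyncio": (3, 4),
    "hashlib": (2, 5),
    "multiprocessing": (2, 6),
    "pathlib": (3, 4),
    "typing": (3, 5),
}


def redundant_backports(python: Tuple[int, ...] = tuple(sys.version_info[:2]),
                        table: Dict[str, Tuple[int, int]] = STDLIB_SINCE) -> List[Tuple[str, str]]:
    """Return (project, installed version) for backports the given Python already ships."""
    found = []
    for project, since in sorted(table.items()):
        if tuple(python) < since:
            continue  # the backport may still be needed on this Python
        try:
            found.append((project, metadata.version(project)))
        except metadata.PackageNotFoundError:
            pass  # not installed in this environment
    return found


if __name__ == "__main__":
    for project, version in redundant_backports():
        print(f"{project} {version} is installed, but this Python's stdlib already provides it")
```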

I tried installing typing in a virtualenv based on 3.8, and found that it successfully installed (and uninstalled) 3.7.4.3 (the current version being 3.10.0). The current version demands a Python version before 3.5, but it seems like this requirement wasn’t applied to most earlier releases.

Has anyone here seen anything like this? Any particular insight? (Is it possible that some versions of Pip would break in this situation, but the most up-to-date ones are okay again?) I’d like to be able to improve the information available on Stack Overflow about the problem.

CAM-Gerlach · June 3, 2023, 5:59pm

Most of the folks who can actually do something about this don’t browse the #users category, and there’s already an active thread about this, so I’ve merged these posts into that one. It’s particularly interesting to see cases where installing the backport apparently breaks things, beyond being a minor nuisance or a source of confusion.

In fact, browsing the #users section as I occasionally do, I noticed at least three instances in the past week of problems and general confusion directly caused by these old backports:

@kknechtel’s post here, linking the Stack Overflow post “AttributeError: type object 'Callable' has no attribute '_abc_registry'”, where unnecessarily installing the typing backport can actually cause breakage in certain edge cases.
The post “Pip install hashlib help”, where the user attempted to install the ancient (Python <2.5) hashlib backport. It fails to install on Python 3 due to incompatible syntax, resulting in a relatively cryptic error.
The post “Conda Remove error on creation of new .exe file”, where a conda environment with the pathlib backport installed resulted in PyInstaller erroring out (with an apparently clear error, but the user was unsure what to do about it). This also linked to a Stack Overflow post where a number of users had the same problem with that backport.

Seems there’s more motivation than I thought to do something about this…if the discussion trickles off here, I can move this to #packaging to get more feedback there.

mwichmann · June 13, 2023, 2:59pm

Ran across one of these just now - someone failed to install multiprocessing, which was “Python 2.5/2.4 back port of the multiprocessing package”. The error in this case was just like the hashlib one - an error due to syntax (print usage) that didn’t really make the user in question any wiser.

Any thought of just deleting some of these old and at the time well-meaning backports?

gpshead · June 13, 2023, 9:15pm

I just yanked the hashlib release on PyPI - I wonder what old dusty configs that is going to “break”.

It has been in the stdlib since Python 2.5 so anything with that still listed as a requirement is in serious need of maintenance.

vstinner · June 13, 2023, 10:29pm

FYI, pip is no longer usable on Python 2.6, since Python 2.6 doesn’t implement the crypto that pypi.org’s TLS setup requires. Maybe that’s even the case for Python 2.7; I forget the details.