Handling modules on PyPI that are now in the standard library?

A number of modules lived on PyPI prior to being added to the standard library, or they have backports that are no longer relevant. Unfortunately these tend to garner a fair amount of downloads for various reasons and can often cause people confusion when they’re used. This is particularly a problem in situations where the import name in the PyPI package is the same name as is used in the standard library, since the standard library will typically shadow the installed package [1].

In the backport cases, with modern packaging you can set requires-python from the very first release of the backport to prevent that project from being installed on Python versions where the module is already part of the standard library, at least when the import names match.
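As a rough illustration of what such a bound does, here is a minimal, stdlib-only sketch of how an installer might evaluate a requires-python string against an interpreter version. It is a simplification under stated assumptions: it handles only plain release versions and six basic comparison operators, whereas real installers such as pip use the `packaging` library with full PEP 440 semantics. The function name `python_satisfies` and the `>=3.3,<3.4` bound are hypothetical.

```python
import operator
import sys

# Only these operators are supported in this sketch; PEP 440 also defines
# "~=", "===" and wildcard forms such as "==3.*", which are not handled.
# Two-character operators come first so they are matched before ">" / "<".
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
}


def _parse_version(text):
    # "3.4" -> (3, 4); pre-release/dev suffixes are not supported.
    return tuple(int(part) for part in text.split("."))


def _pad(a, b):
    # Zero-pad the shorter tuple so (3, 4) compares equal to (3, 4, 0).
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def python_satisfies(requires_python, python_version):
    """Return True if python_version (e.g. (3, 11, 2)) satisfies every
    comma-separated clause of requires_python (e.g. ">=3.3,<3.4")."""
    for clause in requires_python.split(","):
        clause = clause.strip()
        if not clause:
            continue
        for symbol, compare in _OPS.items():
            if clause.startswith(symbol):
                bound = _parse_version(clause[len(symbol):].strip())
                left, right = _pad(tuple(python_version), bound)
                if not compare(left, right):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


# Hypothetical backport of a module that joined the stdlib in Python 3.4:
print(python_satisfies(">=3.3,<3.4", (3, 3, 7)))   # True
print(python_satisfies(">=3.3,<3.4", (3, 11, 2)))  # False
# Any running interpreter >= 3.4 is excluded as well:
print(python_satisfies(">=3.3,<3.4", tuple(sys.version_info[:3])))  # False
```

Note that a bound like this only protects the releases that carry it; an installer can still backsolve to older releases that were published without it.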

However, that leaves the cases of packages that were developed on PyPI first, or older backports where the requires-python metadata wasn’t added, where people can still install these packages but not actually use them, because the stdlib takes precedence.

The outcome of people doing this tends to range from “harmless but silly” to “actively causing issues, but confusing everyone involved because there are two versions of the same thing installed (PyPI and standard library) and a bunch of expectations are being violated all over the place.”

So the question then becomes, what should the recommended approach be for projects in these situations? An obvious suggestion going forward is properly setting requires-python for all backports, but beyond that?

I see a few options:

  • Do nothing, let the projects just sit there unmaintained and just accept the occasional issue that crops up from the confusion.
  • Delete the files that have been uploaded, forcing everyone using it to stop using it even on those older (presumably no longer supported) versions of Python.
  • Utilize yanking to yank the files, allowing people to still pin to those versions (with a warning message) but otherwise act like those versions have been deleted.
    • This also comes with a yanked marker in the PyPI Web Interface.
  • Decide none of these work, and come up with something better and try and write a PEP and get it approved.

The behavior of the first two options is pretty easily understood; however, the behavior of the yanked option can be a tad subtle, and it actually depends on which version of pip is being used to do the install, and how the thing to be installed is being referenced.

For pip 22+, the behavior is basically:

  1. If the version specifier is NOT == or ===, then ignore yanked versions.
  2. If the version specifier is == or === and a yanked file is the only available file, then use it but print a warning (that can include a message to the user from the package) otherwise use the unyanked version.

For pip prior to 22.0, the behavior is basically:

  1. Prefer unyanked versions over yanked versions, assuming they both match the version specifier.
  2. If the only versions matching the specifier (whatever it is, even no specifier at all) are yanked versions, then use one but print a warning (that can include a message from the package).
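To make the difference between the two rule sets concrete, they can be modeled in a small sketch. This is a simplified model written for illustration, not pip’s actual implementation: versions are integer tuples, `===` is treated like `==`, and the `Candidate` and `select` names are hypothetical. The example uses 21.2 as the yanked release (as in the output below), with two neighbouring releases treated as unyanked purely for illustration.

```python
import operator
from dataclasses import dataclass

# Simplified operator table; "===" (arbitrary equality) is approximated as "==".
_OPS = {
    "===": operator.eq,
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


@dataclass(frozen=True)
class Candidate:
    version: tuple  # release version as ints, e.g. (21, 2) for "21.2"
    yanked: bool = False


def _pad(a, b):
    # Zero-pad so (21, 2) compares as (21, 2, 0) against (21, 2, 1).
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def _matches(version, specifiers):
    # specifiers is a list of (operator, version) pairs that must all hold.
    return all(_OPS[op](*_pad(version, bound)) for op, bound in specifiers)


def select(candidates, specifiers, pip_22_or_newer=True):
    """Return (chosen candidate or None, whether a yanked warning is shown)."""
    matching = [c for c in candidates if _matches(c.version, specifiers)]
    unyanked = [c for c in matching if not c.yanked]
    if unyanked:
        # Both rule sets prefer the newest unyanked match.
        return max(unyanked, key=lambda c: c.version), False
    yanked = [c for c in matching if c.yanked]
    if not yanked:
        return None, False
    pinned = any(op in ("==", "===") for op, _ in specifiers)
    if pip_22_or_newer and not pinned:
        # pip 22+: yanked releases are ignored unless pinned with == / ===.
        return None, False
    # Pinned on pip 22+, or any specifier on older pip: use it, but warn.
    return max(yanked, key=lambda c: c.version), True


candidates = [Candidate((21, 1, 3)), Candidate((21, 2), yanked=True), Candidate((21, 2, 1))]
version_range = [(">=", (21, 2)), ("<", (21, 2, 1))]
print(select(candidates, version_range))  # (None, False)
print(select(candidates, version_range, pip_22_or_newer=False))
# (Candidate(version=(21, 2), yanked=True), True)
print(select(candidates, [("==", (21, 2))]))
# (Candidate(version=(21, 2), yanked=True), True)
```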

For example, here is the output in each case:

pip 22+, not using == or ===

❯ pip download 'pip>=21.2,<21.2.1'
ERROR: Could not find a version that satisfies the requirement pip<21.2.1,>=21.2 (from versions: 0.2, 0.2.1, 0.3, 0.3.1, 0.4, 0.5, 0.5.1, 0.6, 0.6.1, 0.6.2, 0.6.3, 0.7, 0.7.1, 0.7.2, 0.8, 0.8.1, 0.8.2, 0.8.3, 1.0, 1.0.1, 1.0.2, 1.1, 1.2, 1.2.1, 1.3, 1.3.1, 1.4, 1.4.1, 1.5, 1.5.1, 1.5.2, 1.5.3, 1.5.4, 1.5.5, 1.5.6, 6.0, 6.0.1, 6.0.2, 6.0.3, 6.0.4, 6.0.5, 6.0.6, 6.0.7, 6.0.8, 6.1.0, 6.1.1, 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.1.0, 7.1.1, 7.1.2, 8.0.0, 8.0.1, 8.0.2, 8.0.3, 8.1.0, 8.1.1, 8.1.2, 9.0.0, 9.0.1, 9.0.2, 9.0.3, 10.0.0b1, 10.0.0b2, 10.0.0, 10.0.1, 18.0, 18.1, 19.0, 19.0.1, 19.0.2, 19.0.3, 19.1, 19.1.1, 19.2, 19.2.1, 19.2.2, 19.2.3, 19.3, 19.3.1, 20.0, 20.0.1, 20.0.2, 20.1b1, 20.1, 20.1.1, 20.2b1, 20.2, 20.2.1, 20.2.2, 20.2.3, 20.2.4, 20.3b1, 20.3, 20.3.1, 20.3.2, 20.3.3, 20.3.4, 21.0, 21.0.1, 21.1, 21.1.1, 21.1.2, 21.1.3, 21.2, 21.2.1, 21.2.2, 21.2.3, 21.2.4, 21.3, 21.3.1, 22.0, 22.0.1, 22.0.2, 22.0.3, 22.0.4, 22.1b1, 22.1, 22.1.1, 22.1.2, 22.2, 22.2.1, 22.2.2, 22.3, 22.3.1, 23.0, 23.0.1, 23.1, 23.1.1, 23.1.2)
ERROR: No matching distribution found for pip<21.2.1,>=21.2

pip 22+, using == or ===

❯ pip download 'pip==21.2'
Collecting pip==21.2
  Downloading pip-21.2-py3-none-any.whl (1.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 34.3 MB/s eta 0:00:00
WARNING: The candidate selected for download or install is a yanked version: 'pip' candidate (version 21.2 at https://files.pythonhosted.org/packages/03/0f/b125bfdd145c1d018d75ce87603e7e9ff2416e742c71b5ac7deba13ca699/pip-21.2-py3-none-any.whl (from https://pypi.org/simple/pip/) (requires-python:>=3.6))
Reason for being yanked: See https://github.com/pypa/pip/issues/8711
Saved ./pip-21.2-py3-none-any.whl
Successfully downloaded pip

pip < 22, using anything

$ pip download 'pip>=21.2,<21.2.1'
Collecting pip<21.2.1,>=21.2
  Downloading pip-21.2-py3-none-any.whl (1.6 MB)
     |████████████████████████████████| 1.6 MB 6.4 MB/s
WARNING: The candidate selected for download or install is a yanked version: 'pip' candidate (version 21.2 at https://files.pythonhosted.org/packages/03/0f/b125bfdd145c1d018d75ce87603e7e9ff2416e742c71b5ac7deba13ca699/pip-21.2-py3-none-any.whl#sha256=71f447dff669d8e2f72b880e3d7ddea2c85cfeba0d14f3307f66fc40ff755176 (from https://pypi.org/simple/pip/) (requires-python:>=3.6))
Reason for being yanked: See https://github.com/pypa/pip/issues/8711
Saved ./pip-21.2-py3-none-any.whl
Successfully downloaded pip

I don’t personally have a preference, but others might! If a strong consensus doesn’t form, most likely it makes sense to just accept the status quo and let “do nothing” stand.


  1. But not always! Users are ever inventive in finding ways to do unexpected things, and can get the PyPI version installed so it shadows the standard library.


Dustin’s great summary already covers most of the relevant details, thanks.

For reference, here are a few relevant examples of backports that were brought up:

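| Name | Links | In stdlib since | Backport supports | Maintained | Wheel | Uppercap | Downloads/mo. |
|---|---|---|---|---|---|---|---|
| argparse | pypistats, pepy | 2.7/3.2 | 2.3/3.1 | 2015 | Yes | No | 6.7 M |
| asyncio | pypistats, pepy | 3.4 | 3.3 | 2015 | Yes | No | 2.1 M |
| typing | pypistats, pepy | 3.5 | 2.7/3.4 | 2019/2021 | Yes | Latest | 7.9 M |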

Observations:

  • The latest version of typing is upper-capped (unlike the other two), though this is likely to change in the near future; meanwhile, pip just backsolves to older, uncapped versions. You can see the effect of this in the pepy stats: the previous, non-upper-capped version gets a share of downloads (≈38%, vs. 52% for the latest and 10% for older versions) almost exactly consistent with the share of downloads from Python versions that would be affected by the upper cap (42%), minus the share of those downloads (10%, or about 4 percentage points) separately pinned or constrained to other older versions.
  • Something else interesting is going on with typing: the proportion of users on older Python versions, particularly 2.7, is far higher for typing than for the others. Given the version usage patterns, it seems probable that this is driven by some specific set of CI services or similar.
  • By contrast, a very large majority of asyncio downloads (≈95% or more) and an overwhelming majority of argparse downloads (>98%) were for Python versions in which the stdlib version is present, indicating they are almost certainly by mistake. The distribution of Python versions and OSes, however, was generally consistent with most installations being driven by deep-in-the-stack dependencies and automated processes rather than by users manually installing them by mistake.
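The arithmetic in the first observation can be checked quickly. The inputs are the percentages quoted above; the variable names are just for illustration.

```python
# Back-of-the-envelope check of the typing download shares quoted above.
affected_by_cap = 42.0        # % of downloads from Python versions excluded by the cap
pinned_elsewhere_rate = 0.10  # fraction of those pinned/constrained to older versions

pinned_pp = affected_by_cap * pinned_elsewhere_rate    # percentage points
expected_previous_share = affected_by_cap - pinned_pp  # % expected on previous release

print(f"{pinned_pp:.1f} percentage points pinned elsewhere")           # 4.2 ...
print(f"{expected_previous_share:.1f}% expected on previous release")  # 37.8% ...
```

The result (≈37.8%) lines up with the ≈38% actually observed for the previous, uncapped release.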

In any case, in addition to those above, there are two other options to handle this:

  • Release a new, sdist-only version lower-capped to only Python versions on which the module is in the stdlib, and that just raises an informative error when installing
  • Do the above, except making it a warning rather than an error and still installing as normal
    • Or, perhaps better, it could just install an empty import package
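For illustration, a hypothetical setup.py for such a final, sdist-only release might look roughly like the sketch below. The project name, the `3.4` bound, the placeholder version and the `len(sys.argv) > 1` guard (which assumes build tools always pass setup.py a command such as `egg_info` or `bdist_wheel`) are all assumptions; `python_requires`, `packages` and `py_modules` are standard setuptools arguments. Replacing the warning with `raise SystemExit(tombstone_message())` gives the erroring variant.

```python
"""Hypothetical setup.py for a final "tombstone" release of a backport whose
module is now in the stdlib. All names and versions are placeholders."""
import sys
import warnings

PROJECT_NAME = "example-backport"  # hypothetical project name
STDLIB_SINCE = "3.4"               # hypothetical: first Python shipping the module
TOMBSTONE_VERSION = "9999"         # placeholder; must sort above existing releases


def tombstone_message(name=PROJECT_NAME, since=STDLIB_SINCE):
    return (
        f"{name} has been part of the standard library since Python {since}; "
        "this release installs nothing. Please remove it from your requirements."
    )


def warn_on_install():
    # Warning variant: install proceeds, but the user is told why it is pointless.
    warnings.warn(tombstone_message(), stacklevel=2)


def main():
    # Imported lazily so the helpers above work without setuptools installed.
    from setuptools import setup

    warn_on_install()
    setup(
        name=PROJECT_NAME,
        version=TOMBSTONE_VERSION,
        # Lower-capped: only offered on Pythons that already ship the module.
        python_requires=f">={STDLIB_SINCE}",
        # No packages or modules, so the installed distribution is empty.
        packages=[],
        py_modules=[],
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

One caveat worth checking before relying on the warning: pip generally hides a build’s output when it succeeds unless run with `-v`, so a warning emitted from setup.py may not reach many users.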

My personal conclusions:

  • Writing, discussing, deciding and implementing a PEP:
    • Seems like a large amount of work for relatively little practical gain
    • Would likely take a substantial amount of time to see any of that benefit
  • Yanking seems strictly superior to deleting, as it:
    • Behaves similarly on newer pip versions (which people will progressively upgrade to)
    • Allows raising an informative, customizable warning on older pip versions (and on older Python versions, where installing the backport actually might be desired) without erroring
    • Even older pip versions (which don’t support yanking, and are the least important here) are unaffected
    • Preserves history
    • Clearly marks the versions as yanked on PyPI
    • Breaks valid (old Python) unpinned use cases but allows unbreaking them
  • Releasing a new stdlib-version-only sdist that errors on install seems in turn mostly superior to yanking, as relative to that, it:
    • Shows a more helpful warning on recent pip versions
    • Has consistent behavior on older pip versions (and other installers)
    • Only affects Python versions that don’t have the stdlib module
    • Avoids backsolving to earlier, uncapped versions when the latest release is upper-capped
    • Downside: Older pip versions will also error rather than warn
  • The status quo seems generally preferable to releasing a stdlib-only sdist that errors, as it:
    • Will immediately break users of libraries or applications that inadvertently depend on these packages
    • Many of those users won’t be able to do much about it except flood the package developers with issues
    • It might never get fixed, as many such packages may be unmaintained, and thus permanently broken on all Python versions
    • The practical benefits (i.e. the harms of the status quo) seem relatively small, at least compared to the large breakage
  • Releasing a new stdlib-version-only sdist that warns but doesn’t error seems generally superior to the status quo, as it has all the advantages of erroring, and in addition:
    • Doesn’t actually break anything, avoiding the problem above
    • Still provides an informative warning of the problem likely to reach developers
    • Minor downside relative to the status quo: a small increase in install time for users on newer Python versions
  • Best of all, IMO, is releasing an sdist that warns and installs an empty package, as it is strictly superior to the previous option: it has all the same advantages, but also:
    • Minimizes the bandwidth and space costs of the status quo
    • Only breaks in highly unlikely pathological cases (that are probably broken, perhaps more subtly, anyway)
    • Reduces (though not eliminates) the one downside of the previous (modestly increased install time)

So, TL;DR, I suggest that for these packages we release a new sdist-only version, lower-capped to the Python versions that contain the stdlib module, that raises a warning on install and installs only an empty package. It raises a helpful warning on the relevant versions, minimizes most of the costs of the status quo (network bandwidth, local disk space), and avoids upper-cap backsolving, all without breaking any working use cases or affecting non-stdlib Python versions at all, at only a small cost in install time.

As a followup, I programmatically investigated how many packages on PyPI have project names matching stdlib top-level package/module names. I used the following quick n’ dirty code:

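import sys  # sys.stdlib_module_names requires Python 3.10+

import requests

# Strip underscores so e.g. "__future__" and "_pydecimal" are checked
# as "future" and "pydecimal".
stdlib_module_names_filtered = {name.strip("_") for name in sys.stdlib_module_names}

with requests.Session() as session:
    # A 200 from the project page means a project with that name exists on PyPI.
    results = {name: session.get(f"https://pypi.org/project/{name}/", timeout=10).status_code
               for name in stdlib_module_names_filtered}

pypi_packages = {name for name, status in results.items() if status == 200}
print("\n".join(sorted(pypi_packages)))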

which produced a total of 59 project results (a combination of backports and mostly unmaintained squatters; I haven’t yet done a complete manual survey):

List of PyPI projects sharing names with stdlib modules
antigravity
argparse
ast
asyncio
blake2
calendar
chunk
configparser
contextvars
csv
ctypes
dataclasses
datetime
dis
distutils
elementtree
email
enum
functools
future
graphlib
hashlib
hmac
html
http
importlib
io
ipaddress
logging
mailbox
modulefinder
multiprocessing
nntplib
numbers
pathlib
pydecimal
pyio
readline
resource
secrets
select
selectors
sha1
sha256
sha3
shelve
signal
ssl
statistics
time
token
turtle
typing
unittest
uuid
wave
weakrefset
winapi
wsgiref
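As a possible next step for the manual survey, a sketch like the following could pull each project’s metadata from PyPI’s JSON API (`https://pypi.org/pypi/<name>/json`) and report whether its latest release declares requires-python and which releases are entirely yanked. It relies on the `info.requires_python` field and the per-file `yanked` flag in `releases`; the helper names and the sample payload are hypothetical, and only `summarize` is exercised offline here.

```python
import json
import urllib.request


def fetch_project_json(name, timeout=10):
    """Fetch a project's metadata from PyPI's JSON API (needs network access)."""
    url = f"https://pypi.org/pypi/{name}/json"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def summarize(payload):
    """Report the latest version, its requires-python (None if unset), and
    releases whose files are all yanked (sorted as strings, not versions)."""
    info = payload["info"]
    fully_yanked = sorted(
        version
        for version, files in payload.get("releases", {}).items()
        if files and all(f.get("yanked", False) for f in files)
    )
    return {
        "name": info["name"],
        "latest": info["version"],
        "requires_python": info.get("requires_python") or None,
        "fully_yanked": fully_yanked,
    }


# Offline example with a made-up payload:
sample = {
    "info": {"name": "demo", "version": "2.0", "requires_python": ""},
    "releases": {"1.0": [{"yanked": True}], "2.0": [{"yanked": False}], "0.1": []},
}
print(summarize(sample))
# {'name': 'demo', 'latest': '2.0', 'requires_python': None, 'fully_yanked': ['1.0']}

# Against PyPI (network), e.g.:
# for name in ["argparse", "asyncio", "typing"]:
#     print(summarize(fetch_project_json(name)))
```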

One notable example of the latter is the turtle PyPI package, which came up in the discussion of whether to allow someone maintaining nntplib to claim the otherwise-protected name on PyPI. It is an ancient, unmaintained package with a totally different purpose from the stdlib module: it only ever had two releases, 0.0.1 and 0.0.2, both on the same day in 2009; its homepage URL is now blocked as a malware site; and it was only ever compatible with Python 2 (maybe even only 2.6). Yet it gets 40k downloads per month.

I’ve analyzed it in more detail in the case study that follows:

Case study of the `turtle` package

Given that it is mostly just an empty placeholder anyway, cannot work on any Python version that ships the stdlib turtle, nor on any version newer than 2.7 (and potentially even 2.6), and only gets a few dozen downloads per day on Python 2 (implying that genuine usage is basically infinitesimal), it seems it should just be deleted per PEP 541 as a placeholder and the name reserved like the other stdlib names. However, it is perhaps a useful case study to note, in case there are more such packages.

It’s worth noting that for turtle, around 95-98% of downloads are on Python 3, on which the package errors out on import, with about half a percent on Python 2 and around 3% unknown (which I’m guessing may also be mostly Python 2). This is substantially higher than for other packages like requests, which is at around 93-94% Python 3, 3.5% unknown and 2.5% Python 2 (proportions that appear to be dominated by containers, CI, services, etc. rather than direct installs, as requests is very deep in the stack), or, at the other end of the spectrum, Spyder (an end-user application installed directly by users, mostly via means other than pip), at around 85% Python 3, 10% unknown and 5% Python 2.

Looking at Python minor versions for turtle, 3.11 has a large plurality, at over 30% on average, with 3.10 in second place at modestly over 20%, 3.8 and 3.9 tied for third at around 15% each, and 3.7 at 5-10%; the remainder is Python 2 and increasingly smaller amounts on earlier versions. By contrast, both requests and Spyder follow the conventional Python adoption curve: a roughly normal distribution peaking around Python 3.9 and trailing off to both sides. This, coupled with the high proportion of Windows users for turtle (60%) relative to Spyder (around 35%), suggests that a large share of these are likely confused beginners who have just downloaded Python for the first time (rather than, as some initially speculated, users trying to download turtle on Linux distros that didn’t ship it).


One (relatively minor?) disadvantage is that users installing with --only-binary or --prefer-binary will not see the sdist. It’s quite possible that at some stage, pip will switch to --only-binary by default (see here) and then we’d be back where we started.

What about the use case where modules are removed from the stdlib and end up with the canonical version back on PyPI?

Presumably they would just upload new versions that aren’t yanked or deleted, with appropriate requires-python metadata?

I joined the forum because of issues related to that, so thank you for additional analysis.

Turtle will break on Windows if you uncheck the option in the installer to install IDLE and Tkinter, which is not going to be intuitive for a lot of beginners. While the people doing this are probably mostly beginners, I don’t think we can assume that they were expecting turtle to be third-party. A lot of them probably ran into an error before trying the download. (Granted, some of them will have tried naming their source file turtle.py; but the fact that this causes problems is also not ideal IMO.)

Yeah, that’s a good point (and one that did cross my mind). However, since as framed now this mostly just affects a bunch of existing packages, and it can be avoided in the future by upper-capping requires-python on all non-pre-release versions, if we implemented the solution now it should presumably get the problem mostly solved long before that becomes a factor: all it takes is for a fraction of users to see the warning and report it to the maintainer. And even if that change is implemented, this approach is still strictly no worse than the status quo.

At least for things that aren’t already there as backports, stdlib names are blocked for new PyPI uploads to protect against major security implications and user mistakes, and (given the nntplib discussion I linked above) IMO they should remain that way. This avoids having to pick who gets ownership of the name, user confusion over whether the core team is still involved, the continuing security implications (and potential liability), and user mistakes installing them when they don’t mean to (rather than doing their own due diligence, as with any other package). Community members who decide to take on their own maintenance of a stdlib module after it is removed can simply publish it under a different name (and the Python docs can link to one, if deemed appropriate, as was done for nntplib).

So how does this idea mesh with the prior discussions about how
upper bounds on requires-python metadata don’t really work the way
people expect and should be discouraged (or even generate warnings,
or maybe be ignored by the dep solver completely)?

Linking the thread for others’ reference:

I posted about it over there:

But for others’ reference, current thinking appears to be leaning toward just documenting that upper bounds shouldn’t be used and having them generate a warning, which doesn’t help this scenario at all and, if anything, makes it somewhat worse. If --only-binary is made the default, and we don’t want a warning every time users try to install the package (not to mention pip attempting to backtrack through every older version, which already happens unless the requires-python metadata was correct from the start), then it seems we don’t have any real options here other than to either break all current users or live with this problem forever.

By contrast, Option 2 there would be close to the perfect fix for new backported packages over time, as all it would require is a post-release of the most recent release with a requires-python upper bound to produce the same result as our sdist solution (aside from a not quite as nice error message), except cleaner, simpler and compatible with --only-binary and non-Setuptools build backends.

As I just posted on the relevant nntplib thread, it seems that at least one exception to the standard PyPI policy was recently made to hand over an existing stdlib name to a third-party project. Since this topic focuses specifically on PyPI projects that were superseded by standard library modules, and I don’t want to drag things further off topic, I’ve opened a new thread in #packaging to discuss, and hopefully reach consensus on, a coherent policy for this “opposite” case, in collaboration with the PyPI admins involved:

Also, @dstufft, any particular reason this is in #core-dev rather than #packaging? It seems like much more of a packaging concern than something directly related to the development of those modules in CPython, and it would attract more relevant interest if moved there, but there could be something I’m missing here.

I was on the fence about where to put it, and ended up here since it’s specifically about stdlib Python module names on PyPI, so it felt slightly more likely that core devs would have an opinion. I’m happy to have it moved to #packaging if you think people might have more thoughts there.

I also just suspect nobody really feels strongly one way or the other 🙂