Handling modules on PyPI that are now in the standard library?

As a followup, I programmatically investigated how many packages on PyPI have project names matching stdlib top-level package/module names. I used the following quick n’ dirty code:

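import sys

import requests

# sys.stdlib_module_names requires Python 3.10+.
# Strip underscores so private modules (e.g. _pyio) map to plausible project names.
stdlib_module_names_filtered = {name.strip("_") for name in sys.stdlib_module_names}
# A 200 response from the project page means a project with that name exists on PyPI.
results = {name: requests.get(f"https://pypi.org/project/{name}/", timeout=30).status_code
           for name in stdlib_module_names_filtered}
pypi_packages = {name for name, status in results.items() if status == 200}
print("\n".join(sorted(pypi_packages)))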

which produced a total of 59 project results (a combination of backports and mostly un-maintained squatters—I haven’t yet done a complete manual survey):

List of PyPI projects sharing names with stdlib modules
antigravity
argparse
ast
asyncio
blake2
calendar
chunk
configparser
contextvars
csv
ctypes
dataclasses
datetime
dis
distutils
elementtree
email
enum
functools
future
graphlib
hashlib
hmac
html
http
importlib
io
ipaddress
logging
mailbox
modulefinder
multiprocessing
nntplib
numbers
pathlib
pydecimal
pyio
readline
resource
secrets
select
selectors
sha1
sha256
sha3
shelve
signal
ssl
statistics
time
token
turtle
typing
unittest
uuid
wave
weakrefset
winapi
wsgiref
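
For anyone wanting to reproduce the name-matching step without hitting the network, here is a minimal sketch of how the candidate project names could be derived. The helper name `candidate_project_names` is made up for illustration, and the normalization step (lowercasing and collapsing runs of `-`, `_` and `.` per PEP 503) is my own addition rather than something the script above does:

```python
import re
import sys


def candidate_project_names(module_names):
    """Map stdlib module names to the PyPI project names they could collide with.

    Hypothetical helper: strips leading/trailing underscores (as the script
    above does) and then applies PEP 503 name normalization.
    """
    candidates = set()
    for name in module_names:
        stripped = name.strip("_")
        if stripped:
            candidates.add(re.sub(r"[-_.]+", "-", stripped).lower())
    return candidates


# sys.stdlib_module_names only exists on Python 3.10+, hence the fallback.
stdlib_names = getattr(sys, "stdlib_module_names", frozenset())
print(len(candidate_project_names(stdlib_names)))
```

Each resulting name would then still need to be checked against PyPI, as in the original script.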

One notable example of the latter is the turtle PyPI package, which came up in the discussion of whether to allow someone maintaining nntplib to claim the otherwise-protected name on PyPI. It is an ancient, unmaintained package with a totally different purpose from the stdlib module. It only ever had two releases, 0.0.1 and 0.0.2, both on the same day in 2009; its homepage URL is now blocked as a malware site; and it was only ever compatible with Python 2 (maybe even only 2.6). Yet it still has 40k downloads per month.

I’ve analyzed it in more detail in the case study that follows:

Case study of the `turtle` package

Given that it is mostly just an empty placeholder anyway, cannot work on any Python version with the stdlib turtle installed, nor on any newer than 2.7 (and potentially even 2.6), and only gets a few dozen downloads per day on Python 2 at all (implying that the share on installs that also lack the stdlib turtle is basically infinitesimal), it seems it should just be deleted per PEP 541 as a placeholder and the name reserved like the others. However, it is perhaps a useful case study to note, in case there are more such packages.

It’s worth noting that for turtle, around 95-98% of downloads are on Python 3, for which the package will error out on import, with half a percent on Python 2 and around 3% unknown (which I’m guessing may also be mostly Python 2). This is substantially higher than for other packages like requests, which is at around 93-94% Python 3, 3.5% unknown and 2.5% Python 2 (proportions that appear to be mostly dominated by containers, CIs, services, etc. rather than direct installs, given it is very deep in the stack), or at the other end of the spectrum, Spyder (an end user application directly installed by users, mostly via means other than pip) at around 85% Python 3, 10% null and 5% Python 2.

Looking at Python minor version for turtle, 3.11 has a large plurality, at over 30% on average, with 3.10 taking 2nd at modestly over 20%, and 3.8 and 3.9 being tied for third at around 15% each, 3.7 at 5-10%, with the remaining being Python 2 and increasingly smaller amounts in earlier versions. By contrast, both requests and spyder follow the conventional Python adoption curve, of a normal distribution with a peak around Python 3.9 and trailing off to both sides. This, coupled with the high proportion of Windows users for turtle (60%) relative to spyder (around 35%) seems to suggest that this is likely a high proportion of confused beginners downloading Python for the first time (rather than, as some initially speculated, users trying to download turtle on Linux distros that didn’t ship it).

One (relatively minor?) disadvantage is that users installing with --only-binary or --prefer-binary will not see the sdist. It’s quite possible that at some stage, pip will switch to --only-binary by default (see here) and then we’d be back where we started.

What about the use case where modules are removed from the stdlib and end up with the canonical version back on PyPI?

Presumably they would just upload new versions that aren’t yanked or deleted with appropriate requires python metadata?

I joined the forum because of issues related to that, so thank you for additional analysis.

Turtle will break on Windows if you uncheck the option in the installer to install IDLE and Tkinter, which is not going to be intuitive for a lot of beginners. While the people doing this are probably mostly beginners, I don’t think we can assume that they were expecting turtle to be third-party. A lot of them probably ran into an error before trying the download. (Granted, some of them will have tried naming their source file turtle.py; but the fact that this causes problems is also not ideal IMO.)
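
To make the two failure modes above concrete, here is a rough diagnostic sketch. The function name `turtle_import_problems` and its messages are hypothetical, and it assumes that a missing `_tkinter` extension module is a reasonable proxy for "Tkinter was not installed":

```python
import importlib.util
from pathlib import Path


def turtle_import_problems(script_dir):
    """Return human-readable reasons why `import turtle` might fail or misbehave.

    Hypothetical helper covering only the two cases discussed above.
    """
    problems = []
    # A turtle.py next to the user's script is found before the stdlib module.
    if (Path(script_dir) / "turtle.py").exists():
        problems.append("a local turtle.py shadows the stdlib turtle module")
    # Assumption: no _tkinter extension means Tkinter was left out of the install.
    if importlib.util.find_spec("_tkinter") is None:
        problems.append("Tkinter is not installed, so the stdlib turtle cannot be imported")
    return problems


print(turtle_import_problems("."))
```

Neither case is fixed by `pip install turtle`, which is presumably what many of these users try next.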

Yeah, that’s a good point (and one that did cross my mind at one point). However, since as framed now this mostly just affects a bunch of existing packages, and it can be avoided in the future with requires-python upper-capped on all non-pre-release versions, if we implemented the solution now it should presumably get the problem mostly solved long before that becomes a factor: all it takes is a fraction of users seeing it and reporting it to the maintainer to get it fixed. And even if that is implemented, it is still strictly no worse than the status quo.
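
To illustrate what an upper cap on requires-python does, here is a deliberately simplified sketch. It is not how pip evaluates the metadata (that follows the full PEP 440 specifier rules); the helper `python_allowed` is hypothetical and only understands `<`, `<=`, `>` and `>=` clauses on plain release numbers:

```python
import operator

# Only the four ordering operators are handled in this sketch.
_COMPARATORS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}


def _release(text, width=3):
    """Parse '3.5' into (3, 5, 0) so shorter versions compare as zero-padded."""
    parts = [int(part) for part in text.strip().split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def python_allowed(python_version, requires_python):
    """Return True if python_version satisfies every clause of requires_python."""
    version = _release(python_version)
    for clause in requires_python.split(","):
        clause = clause.strip()
        symbol = clause[:2] if clause[:2] in _COMPARATORS else clause[:1]
        bound = _release(clause[len(symbol):])
        if not _COMPARATORS[symbol](version, bound):
            return False
    return True


# A hypothetical backport capped below the release that added the stdlib module.
print(python_allowed("2.7.18", ">=2.6,<3.5"))
print(python_allowed("3.11.4", ">=2.6,<3.5"))
```

With a cap like this on every release, an installer running on a newer Python finds no compatible version instead of installing a backport that shadows the stdlib.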

At least for things that aren’t already there as backports, stdlib names are blocked for new PyPI uploads to protect against major security implications and user mistakes, and (given the nntplib discussion I linked above) IMO they should remain that way. This avoids having to pick who gets ownership of the name, user confusion over whether the core team is still involved, the continuing security implications (and potential liability), and users installing them by mistake (or without doing the due diligence they would for any other package). Community members who decide to take on their own maintenance of a stdlib module after it is removed can simply publish it under a different name (and the Python docs can link to one, if deemed appropriate, as was done for nntplib).

So how does this idea mesh with the prior discussions about how
upper bounds on requires-python metadata don’t really work the way
people expect and should be discouraged (or even generate warnings,
or maybe be ignored by the dep solver completely)?

Linking the thread for others’ reference:

I posted about it over there:

But for others’ reference, current thinking appears to be leaning toward just documenting that upper bounds shouldn’t be used and having them generate a warning, which doesn’t help this scenario at all and if anything is somewhat worse. If --only-binary is made the default and we don’t want a warning every time users try to install the package (not to mention pip attempting to backtrack through every version, which is already the case, and having to get the requires-python metadata correct from the start), then it seems we don’t have any real options here other than to either break all current users or live with this problem forever.

By contrast, Option 2 there would be close to the perfect fix for new backported packages over time, as all it would require is a post-release of the most recent release with a requires-python upper bound to produce the same result as our sdist solution (aside from a not quite as nice error message), except cleaner, simpler and compatible with --only-binary and non-Setuptools build backends.

As I just posted on the relevant nntplib thread, it seems at least one exception to the standard PyPI policy was recently made to hand over an existing stdlib name to a third-party project. Since this topic focuses specifically on PyPI projects that were superseded by standard library modules, and I don’t want to drag things further off topic, I’ve opened a new thread in Packaging to discuss and hopefully come to consensus on a coherent policy for this “opposite” case, in collaboration with the PyPI admins involved:

Also, @dstufft, is there any particular reason this is in Core Development and shouldn’t be moved to Packaging? It seems like it’s much more of a packaging concern than something directly related to the development of those modules in CPython, and it would attract more relevant interest if moved there, but there could be something I’m missing here.

I was on the fence about where to put it, and ended up here since it’s specifically about stdlib Python module names on PyPI, so it felt slightly more likely that core devs would have an opinion. I’m happy to have it moved to Packaging if you think people might have more thoughts there.

I also just suspect nobody really feels strongly one way or the other :slight_smile:

Hi, and sorry if what I’m about to write is already well-known and taken into account.

There is another side to this: static checkers that parse Python source code. At least the mypy type checker, when presented with a Python source file that contains the following lines:

import sys

if sys.version_info >= (3, 11):
    import tomllib
else:
    import tomli as tomllib

…will expect to be able to import the “tomli” module and examine its interface even if mypy itself is being run with Python 3.11. I assume that this is because mypy wants to make sure that my code will also work on earlier Python versions, but the fact remains that mypy expects the tomli module to be installable and to at least export all the symbols that my program tries to use, even on Python 3.11. One obvious workaround - mark the tomli import as # type: ignore[import] - would not really be very friendly to potential contributors who want to hack on this code using Python 3.10 - their run of mypy will not flag any incorrect use of tomli’s functions and methods.

Of course, “this is a mypy problem” is a valid answer; I just thought I’d make sure that people are aware of this aspect.

Thanks for all you people are doing for Python itself and for the ecosystem as a whole!

G’luck,
Peter

…and I just realized that none of what I wrote makes sense if the module on PyPI has the exact same name as the one in the standard library. So if that’s what makes my point moot, okay, I can accept that.

G’luck,
Peter

I don’t think that’s true; mypy (1) never imports your code, it’s purely static, and (2) understands the sys.version_info check, so when checking in 3.11+ mode it won’t care about the else branch.

Perhaps you have mypy’s python_version setting set to a lower version of Python, so mypy will type check your code for that version.
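
As a rough illustration of that behaviour (a sketch, not mypy’s actual implementation), the branch that gets type checked depends on the target version, which is set via python_version in the config or --python-version on the command line and otherwise defaults to the interpreter running mypy. The helper `toml_module_for` is hypothetical:

```python
import sys


def toml_module_for(target_version):
    """Name of the module the version-guarded import resolves to for a target version."""
    return "tomllib" if tuple(target_version) >= (3, 11) else "tomli"


# Target defaults to the running interpreter: only that branch is examined.
print(toml_module_for(sys.version_info[:2]))
# With python_version = 3.10 configured, the tomli branch is the one checked,
# so tomli (or stubs for it) needs to be available to mypy.
print(toml_module_for((3, 10)))
```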

I just closed a question on Stack Overflow as a duplicate of this reference Q&A:

It seems that the basic setup in this case is:

  1. The user installs typing for a more recent version of Python that doesn’t require it.

  2. Some environments are in some way subtly incompatible with this backport; lower-level stuff (either read from a .pth file by site.py, or bootstrapped by Jupyter, etc.) will cause mysterious errors wherein stack traces end up at:

    import typing
      File "/path/to/site-packages/typing.py", line 1356, in <module>
        class Callable(extra=collections_abc.Callable, metaclass=CallableMeta):
      File "/path/to/python3.7/site-packages/typing.py", line 1004, in __new__
        self._abc_registry = extra._abc_registry
    AttributeError: type object 'Callable' has no attribute '_abc_registry'
    

    That is to say, something or other imports the backport of typing rather than the standard library typing, but in a context that relies on implementation details (i.e., collections.abc.Callable having a _abc_registry attribute, where collections.abc seems to have been aliased as collections_abc).

  3. According to user reports, newer versions of Pip cannot uninstall the package. (In the question I closed, it seems that OP can’t install basically anything with Pip, so maybe attempts to uninstall also run into this problem?)
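
One way to confirm the shadowing described in the steps above is to check where a stdlib module name actually resolves from. This is a minimal sketch; the helper `shadowing_origin` is hypothetical, and it assumes that the purelib and platlib directories reported by sysconfig are the only site-packages locations that matter (user site directories and .pth tricks are not covered):

```python
import importlib.util
import sysconfig
from pathlib import Path


def shadowing_origin(module_name, site_dirs=None):
    """Return the file a module resolves to if it lives under site-packages, else None."""
    if site_dirs is None:
        paths = sysconfig.get_paths()
        site_dirs = [paths["purelib"], paths["platlib"]]
    resolved_dirs = {Path(directory).resolve() for directory in site_dirs}

    spec = importlib.util.find_spec(module_name)
    # Built-in and frozen modules have no file location and cannot be shadowed this way.
    if spec is None or not spec.has_location or spec.origin is None:
        return None

    origin = Path(spec.origin).resolve()
    if any(directory in origin.parents for directory in resolved_dirs):
        return str(origin)
    return None


# Prints a path only if a site-packages copy is being picked up instead of the stdlib one.
print(shadowing_origin("typing"))
```

A non-None result for typing on a recent Python would point at the installed backport as the likely culprit.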

I tried installing typing in a virtualenv based on 3.8, and found that it successfully installed (and uninstalled) 3.7.4.3 (the current version being 3.10.0). The current version demands a Python version before 3.5, but it seems like this requirement wasn’t applied to most earlier releases.

Has anyone here seen anything like this? Any particular insight? (Is it possible that some versions of Pip would break in this situation, but the most up-to-date ones are okay again?) I’d like to be able to improve the information available on Stack Overflow about the problem.

Most of the folks who can actually do something about this don’t browse the #users category, and there’s already an active thread about this, so I’ve merged these posts with that one. It’s particularly interesting to see cases where installing the backport apparently actually breaks things, beyond being a minor nuisance or a source of confusion.

In fact, browsing the #users section as I occasionally do, I noticed at least three instances in the past week of problems and general confusion directly due to these old backports:

Seems there’s more motivation than I thought to do something about this…if the discussion trickles off here, I can move this to #packaging to get more feedback there.

Ran across one of these just now - someone failed to install multiprocessing, which was “Python 2.5/2.4 back port of the multiprocessing package”. The error in this case was just like the hashlib one - an error due to syntax (print usage) that didn’t really make the user in question any wiser.

Any thought of just deleting some of these old and at the time well-meaning backports?

I just yanked the hashlib release on PyPI - I wonder what old dusty configs that is going to “break”.

It has been in the stdlib since Python 2.5 so anything with that still listed as a requirement is in serious need of maintenance.

FYI pip is no longer usable on Python 2.6 since it doesn’t implement crypto used by pypi.org TLS security. Maybe it’s even Python 2.7, I forgot the details.

Yeah, for really old (<2.6) backports like multiprocessing, hashlib, etc., that see ≈<100,000 monthly downloads, the calculus above changes and yanking all releases (like @gpshead did) is probably the way to go (and is strictly superior to deleting, as discussed above), as the breakage is less, the benefit is more, and it’s the easiest option to implement (just click a button). (Newer or more widely used backports will need more thought, though, per the analysis above.)

Right. Um. Yeah. So it seems that I had, indeed, misunderstood the intent of specifying the Python version to mypy. I thought it was more like “…but bear in mind that this version of Python might also need to run that code, so do complain if I use newer features”, and now that I think about it, this kind of contradicts the other thing I thought mypy should do: “…but please also use the newer features of the currently-running version” :slight_smile:

So yeah, thanks for pointing this out. In the past two weeks since you wrote that I have already started dropping the python_version option from the mypy invocations in some of my projects, so, yeah. Thanks, and sorry for the noise. It seems that this is indeed not a valid concern WRT the actual topic.

G’luck,
Peter