PyPI policy on handing over protected standard library names to third party maintainers

Until now, as far as I’m aware, names of modules and import packages in the Python standard library have not been permitted for new PyPI uploads, in order to protect against the obvious security implications and user mistakes. However, with the PEP 594 mass deprecation and removal of obsolescent stdlib modules, questions have arisen about how to handle third-party requests to claim these protected names in order to publish standalone versions of these modules, including:

  • Allow nntplib on PyPI, where a third-party developer maintaining a standalone version of the nntplib standard library module requested that ownership of the name be transferred to them. Most community members appeared not to support this and raised a number of significant concerns about transferring the name, some specific to the particular case (notably, the project’s development practices and the maintainer’s discouragingly combative responses to questions and critique), but most focused on the broader issues this would entail for the removed modules in general. Despite this, while I can’t find any public mention or discussion in the thread, on the PyPI GitHub org, or via search, it seems the name was in fact later transferred to the maintainer in question.
  • Handling modules on PyPI that are now in the standard library? discusses the issues with, and potential solutions to, the inverse case: backport and unrelated/squatter PyPI projects that share names with current stdlib modules and still see very high usage, years to even over a decade after all supported Python versions gained the module [1]. The thread both highlighted how widespread inadvertent installation of such packages is, and more directly discussed the concerns with allowing this for removed modules.
  • PEP 594 has been implemented: Python 3.13 removes 20 stdlib modules presents a detailed summary of the PEP 594 removals and a path forward for modules the community is still interested in; the discussion included questions and concerns about the PyPI naming, as well as ideas for how to direct people to third-party replacements without simply handing someone the official name.

While the potential harms in the extant example case of nntplib are somewhat more limited, given that the module is for a relatively niche, obsolescent (though still in-use) protocol, other PEP 594 modules like cgi that see far more continued use are a whole different story, and we should be consistent and fair to the other maintainers involved who’ve already moved to their own non-conflicting names. Therefore, I believe we really should discuss and decide on a coherent policy in collaboration with the PyPI folks (e.g. @dustin, @EWDurbin and @dstufft) as soon as practical, before this becomes a larger problem.

Some of the issues with handing out the stdlib names to third-party maintainers include:

  • The PyPI admins will have to pick and choose who gets permanent ownership of the name based on very limited initial information, which may change over time.
  • Using the original, canonical name could cause user confusion over whether the package is official or supported by the core team, rather than by a third party
  • There are continuing security (and potential liability) implications to handing out well-known, “official” stdlib names to arbitrary third-party maintainers, if any maintainer (or their successors) proves untrustworthy or their accounts are compromised
  • A high volume of devs and users are likely to attempt to install the package based on its name alone (from error messages, or recognition) rather than its quality, with a lower chance of doing the due diligence they would apply to any other third-party PyPI package

By contrast, the downsides of not allowing this are quite limited. Since the import module/package name can stay the same and only the PyPI project name changes, the fork is still just as much a drop-in replacement (see the sketch below). Developers and users will still need to actively install/add a dependency on the new PyPI package; they just can’t simply assume that the package with the “official” name of the stdlib module is the “canonical” one they should use and trust. Instead, this ensures they perform the same due diligence as with any other third-party PyPI package they choose to install or depend on.
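To make the distribution-name vs. import-name distinction concrete, here’s a minimal sketch; the project name nntplib-backport and the server hostname are hypothetical, and the example assumes the fork preserves the old stdlib API:

    # Install the fork under its unique PyPI project name (hypothetical):
    #     $ pip install nntplib-backport
    # or declare it in requirements.txt / pyproject.toml:
    #     nntplib-backport>=1.0
    #
    # Application code needs no changes, because the fork can still ship
    # its code under the original *import* name:
    import nntplib

    # Same API as the removed stdlib module (hostname is a placeholder):
    with nntplib.NNTP("news.example.com") as conn:
        resp, count, first, last, name = conn.group("comp.lang.python")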

Additionally, in the near term to ease migration, the deprecation notices and What’s New announcement of the deprecation/removal can suggest one or more suitable PyPI replacements, if available (as done for nntplib), which doesn’t require committing to one “blessed” successor in near-perpetuity.

Therefore, I would personally strongly recommend that the names of modules that are removed continue to be protected for at least the immediate future (and the nntplib name reclaimed as soon as practical to minimize disruption, with the maintainer given sufficient time to move to a new unique name). Should a well-supported popular community-maintained successor to one or more of these modules emerge that retains significant interest over time, and compelling reasons are found to use the stdlib name, then this can always be revisited later.


  1. For example, 6.8 million downloads/mo. for argparse, added in Python 2.7 in 2010, or 36k downloads/mo. for turtle (96–98% of which are on Python versions >2.7, where it errors on install due to using the print statement in the setup.py script), a quasi-placeholder unrelated to the stdlib module, uploaded on one day in 2009 and never touched again ↩︎

7 Likes

I wouldn’t expect anything on PyPI to be “official” or “blessed”, unless I can see it’s been published by an “official” or “blessed” source.

If I want to use something from PyPI, I have to evaluate whether I trust the publisher based on who they are, not based on the package name.

If I need oldlib, and it’s only available on PyPI as oldlib-legacy, or python-oldlib, or py-oldlib, the longer name doesn’t necessarily make me feel as if it’s any less secure or trustworthy than just oldlib.

If anything, having to find a similar but slightly different name makes things harder for me. Also, maybe oldlib2 has been packaged with a different prefix or suffix, and oldlib3 differently again.

Once I decide to use a replacement, whether it’s called oldlib or python-oldlib, in both cases there remains a risk the developer replaces it with something bad. But I’ve already decided to trust it, so the actual name is irrelevant, and I need to ensure I can still trust updates.

2 Likes

Right, of course. The problem is that many, many users simply won’t understand all of this when they are typing pip install name or adding name to their dependencies/requirements list. As a real-world example, thirty-six thousand people each month see references to the turtle module, or get an error trying to import it, and jump to pip install turtle without so much as skimming the PyPI page to see that it is a totally different package with a totally different purpose than what they are expecting.

They think they understand what they are getting, because they see lots of references to the stdlib package/module name in official, trusted sources or popular sources (giving the name a substantial amount of social currency), or code they trust attempts to import the stdlib module and gets a ModuleNotFoundError, and their default response is to attempt to install the PyPI project with that name.
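As a minimal console sketch of that failure mode (assuming an environment where the stdlib module happens to be unavailable, e.g. a distro that splits the tkinter/turtle components into a separate system package):

    >>> import turtle
    Traceback (most recent call last):
      ...
    ModuleNotFoundError: No module named 'turtle'

    # The natural (but wrong) next step:
    #     $ pip install turtle
    # which resolves to the unrelated 2009 PyPI package (and, per the
    # footnote above, then fails to build on Python 3), not the stdlib module.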

However, unlike with most PyPI projects, the actual package they get is not, in fact, the project widely referenced by that name, but a third-party fork owned by someone else and potentially containing different (or even malicious) code. This is a classic case of dependency confusion, which both creates difficulties and frustration for users, and can lead to dependency confusion attacks, one of the most dangerous and increasingly common attack vectors on PyPI and similar FOSS repositories today (the torchtriton incident that prompted PEP 708 being just one of the most recent examples).

True, the fraction of projects which have distinct names for the distribution package vs. the import package (e.g. scikit-learn vs. sklearn, or in your case pillow vs. PIL), and those which have package names on different indices (e.g. PyTorch), do have the latter part of the problem (users naively attempting to install the import name from PyPI), though not the former, as the actual PyPI project name is widely documented and used online.

However, this has been mostly mitigated by registering dummy packages at those names that fail to install with a clear error (or the existing package being deleted, as with PIL), or by preventing the names from being registered at the PyPI level (with more sophisticated solutions like PEP 708 and others in the works). I’m proposing we do the latter here, rather than regressing on that and contributing to the problem further; a sketch of the dummy-package approach follows below.
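To be concrete, here’s what such a dummy (“tombstone”) package can look like; the names oldlib and oldlib-community are hypothetical, and this assumes the project is published as an sdist only (no wheels), so that setup.py actually runs at install time:

    # setup.py for a hypothetical tombstone project registered under a
    # reserved name.  It ships no code and aborts every install with a
    # clear pointer to where a maintained fork might live.
    import sys

    sys.exit(
        "The name 'oldlib' is reserved: this module was removed from the\n"
        "standard library (PEP 594) and this package is intentionally empty.\n"
        "For a maintained community fork, see e.g.: pip install oldlib-community"
    )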

It’s also true that this would mean the import name wouldn’t match the distribution name for the “one true fork”, but that would still be the case for any other forks regardless, and it validly reflects the fact that there is no perpetual “one true fork” and that none of these projects are the original module developed and controlled by the core devs.

Yes, it does require that each user spend a bit of time looking at the alternatives and determining which one they trust and fits their needs, assuming they don’t just go with the suggested replacement(s) in the deprecation notice or the What’s New (which I am in favor of, to ease migration). But they should do this anyway, even if one fork did happen to have the original name; protecting that name from registration ensures that they have to do at least the minimal diligence of searching for and selecting an option themselves. Sure, they still might select the wrong option, but at least they are prompted to make an affirmative choice rather than relying on (and reinforcing) a common assumption.

2 Likes

I tend to agree with @CAM-Gerlach. I expect that anyone posting on here knows enough not to blindly trust pip install cgi, but given how widely used Python is, it’s easy to imagine that plenty of people will jump from import cgi to pip install cgi.

2 Likes

I know this might be asking a lot of the Steering Council, but given that these module names are of great import, perhaps they should appoint the maintainers who would control them.

1 Like

I’m not a SC member, but the impression I got from the previous PEP 594 discussion and the SC comments surrounding it was that, as the modules are no longer supported by CPython and the Python core developers, it should be up to the community to self-organize and maintain one or more PyPI versions of the module in question if there’s sufficient interest in doing so, and the SC/core dev team needn’t and shouldn’t be officially involved. For example, it was initially part of the PEP proposal to maintain a separate official repo for the deprecated modules (as some suggested here), but that was later rejected. It seems to me that formally appointing a new maintainer to control the name would invoke similar concerns.

And in any case, that involves essentially the same issues as described above for putting the PyPI admins in that position (having to pick a winner, confusion over whether the package is “official” or supported by the core team, long-term security and maintenance implications, accidental installations, etc.), just with the SC in their place. By contrast, it seems to me all of that can be avoided by just keeping the names reserved, and letting the community interested in and responsible for the old module sort things out.

@brettcannon, as you’re both a SC member and the current author of PEP 594, and also involved in discussions concerning the stdlib’s future, we’d love to hear your thoughts here!

I recommend a blanket ban on the stdlib names, and encouraging backports to publish under suitable names such as “backport-nntplib”, etc.

5 Likes

Given how core Python works, it might be easier to pitch this as just having the core team control these names, rather than a set of individuals.

I think the proposal would basically be to have the names of standard library modules reserved by the core team, with users able to request a name’s release if a module gets taken out.

The main question this brings up, I think, is whether the core team should actually be the party responsible for this, as opposed to, for example, the PyPI security team.

Personally, I don’t really care much about who is doing it, as long as someone is (and is a trusted party, naturally).

Could you share your reasons for this? It seems to me that it would be best to first discuss and decide on whether we want to be handing out stdlib names to third-party maintainers in the first place before getting into the weeds of who would be responsible for doing so. The central thesis of the OP (and several others here) is that the potential benefits of doing so are outweighed by the costs, and the alternative and de-facto status quo—third party community backports selecting their own third-party PyPI names—should continue instead, so it would be helpful to understand how you see things differently. Thanks!