PyPI as a project repository vs. name registry (a.k.a. PyPI name squatting, e.g. for Fedora packages)

Hello,
As PyPI has become the de-facto canonical namespace for freely redistributable Python packages, I would like to encourage maintainers of packages in the Fedora Linux distro to synchronize their Distribution Package names with PyPI – effectively, to merge Fedora’s namespace into PyPI’s.

This has not yet been discussed in Fedora; I’m kicking off the discussion there as well. Here, I’d like to discuss the PyPI side of things.

While this discussion is ongoing, the names that Fedora uses but that are not yet on PyPI have been kindly blocked by the PyPI admins, so that:

  • good faith actors are advised that the name is taken in some sense (though not in the PEP 541 sense), so using the name for a different purpose might confuse users, and
  • bad-faith actors can’t snatch the names that easily.

This will need to be reverted and replaced with a better solution, once we know what that is.
But let me frame the issue a bit more generally.

PyPI as a namespace vs. project repository

I feel that currently, PyPI has a bit of a split identity. Per PEP 541, it tries to be purely a repository of useful projects. However, PEP 518 uses names on PyPI as “markers” in pyproject.toml.
This puts us in a tough spot when we want to use tools that aren’t packaged on PyPI.
(Note: I’m not necessarily arguing that pyproject.toml tools are a good idea, but that reasonable people like PEP 517 authors view PyPI as a namespace.)

Why would a tool not be packaged on PyPI? For example, there are bindings to software written in other languages, which use non-Python build systems (e.g. make; make install). One could write a PEP 517 build backend for such software, but in some cases it wouldn’t be very useful: for reasons like non-Python build-time dependencies, building wheels sometimes just doesn’t make sense. In Fedora, pyproject.toml handling could reasonably be implemented as RPM macros rather than as a Python library.

It does make sense to have well-known names for projects that aren’t installable from PyPI. For example, if I’m querying system-wide installed packages from Python, I want to include my system package manager in install_requires. But for that package manager, “installable from PyPI” is more of a box to tick than a useful feature.

Blocking and squatting

There are several legitimate reasons to block certain names on PyPI.

The proper way to do that is to ask the PyPI admins to mark the names as unregistrable (which they will generally be happy to do, given good reason – but it takes their time). The “guerrilla way” is name squatting: registering an empty or uninstallable package. Note that the current rules (PEP 541) prohibit name squatting.

Some examples:

  • Parts of the Python standard library, e.g. math, are not registrable.
  • The trademark microsoft is currently name-squatted.
  • The name ldap is currently name-squatted (by me) for reasons described on the PyPI page.
  • The package manager dnf is currently name-squatted; AFAIK the build system it uses doesn’t produce sdists/wheels.

My proposal

Personally, I actually find name squatting more user-friendly than blocking:

  • The explanation can be much more verbose than with marking the package non-registrable
  • It’s self-service: anyone can name-squat (though it’s discouraged), and conflict resolution only needs admin intervention if it escalates (as it does now)

Do you think it would be good to update PEP 541 to allow name-squatting, under some rules, for:

  • Packages installable from elsewhere (like from a Linux distro) that are Python-related (other Python projects might want to depend on them, or they may be build tools referenced in pyproject.toml), and
  • New, non-abandoned packages with a repo before their first release (see my rationale)?

Perhaps a register command could be brought back, so you could upload or edit the description without wasting a version number – but that’s a detail :)

Of course, another solution could work as well.


As the owner of the various Microsoft trademark [squatting packages], I’m obviously in favour of squatting for those purposes (we had to use lawyers to get a couple of the names). Being able to explain why a name is unavailable is very useful.

There are quite a few packages that have typosquatted themselves this way, e.g. piptools (which I just happened to run into last night).

It would be nice to be able to register the name and description without a real package, and possibly even add a friendly error to send back to installers requesting the package, but otherwise I’m quite happy with this approach.


I think adding some kind of dedicated support for “owning” a name without actually uploading a release for that would be a reasonable addition. There would be some open questions about how that got represented in our APIs like /simple/ but those are all solvable problems.

I don’t think we would allow you to claim a name for a reason that goes against the spirit of PEP 541, but I also think we can amend PEP 541 if needed, and that the things mentioned that are technically against PEP 541 would be fine to add to it. Roughly, we just don’t want people locked out of names by registrations that don’t serve some kind of real purpose. Coordination of namespaces with downstream and upstream tooling, or for legal reasons, all seem like real purposes that can be served.


Below is my proposal, to go under Invalid projects in PEP 541. Does that look good?
PEP 541 calls for amendments to go through PSF General Counsel, but I think we should agree here before I start a formal approval process.

PEP 541 is marked Final, but it seems Active would be a better fit (per the end of the “PEP Review & Resolution” section of PEP 1). Is Final→Active a valid transition?


Name reservations

Usually, a package that has no functionality or is empty is considered
“name squatting” and is invalid. As an exception, it is allowed to register
an empty project to reserve a name for:

  • a mistyped name of a popular project on PyPI (“typo-squatting”);
  • a project that is freely available from elsewhere and would otherwise be
    valid (for example: a project only installable by specific installers or
    package managers, or a part of the standard library of a Python
    implementation); or
  • a trademark or another name that would infringe the
    Intellectual property policy below if used as project name without
    the owner’s permission.

A name reservation project’s description must state reasons for the reservation
and include relevant links. It is recommended to use a low pre-release version
(e.g. 0.0.dev0) and to make the package not installable with pip
(e.g. by uploading a source distribution that fails to build
with an informative message).

Note that private projects should be hosted on a private package
index, and generally should not have a name reservation on the public
Package Index.

And the point in “Invalid projects” should be changed to:

  • project is name squatting (package has no functionality or is empty,
    except name reservations as described below);
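The proposal recommends making a reservation non-installable by uploading a source distribution that fails to build with an informative message, but it leaves the mechanism open. A minimal sketch of one way to do that, assuming an in-tree PEP 517 backend (loaded via `backend-path`) whose hooks all raise. The reserved name, reason, URL, and the module name `reservation_backend` are hypothetical, and the resulting archive has not been checked against PyPI's upload validation.

```python
"""Sketch: build a name-reservation sdist whose installation always fails."""
import io
import re
import tarfile

NAME = "example-reserved-name"            # hypothetical reserved name
VERSION = "0.dev0"                        # low pre-release, as the proposal suggests
REASON = "Installable from a Linux distribution, not from PyPI."  # hypothetical
URL = "https://example.org/why-reserved"  # hypothetical link

MESSAGE = (
    f"{NAME} is a name reservation on PyPI, not an installable package. "
    f"{REASON} See {URL}"
)

# In-tree PEP 517 backend: every hook raises, so a frontend such as pip
# stops with MESSAGE before it can build or install anything.
BACKEND = f"MESSAGE = {MESSAGE!r}\n" + '''
def _refuse(*args, **kwargs):
    raise RuntimeError(MESSAGE)

get_requires_for_build_wheel = _refuse
prepare_metadata_for_build_wheel = _refuse
build_wheel = _refuse
build_sdist = _refuse
'''

PYPROJECT = '''[build-system]
requires = []
build-backend = "reservation_backend"
backend-path = ["."]
'''

PKG_INFO = f"""Metadata-Version: 2.1
Name: {NAME}
Version: {VERSION}
Summary: Name reservation (not installable)
Project-URL: Reservation, {URL}

{MESSAGE}
"""


def sdist_basename() -> str:
    # sdist file names use the normalized project name with underscores.
    normalized = re.sub(r"[-_.]+", "_", NAME).lower()
    return f"{normalized}-{VERSION}"


def build_reservation_sdist() -> bytes:
    """Return the bytes of a .tar.gz containing PKG-INFO, pyproject.toml and the backend."""
    base = sdist_basename()
    files = {
        "PKG-INFO": PKG_INFO,
        "pyproject.toml": PYPROJECT,
        "reservation_backend.py": BACKEND,
    }
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for filename, text in files.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(f"{base}/{filename}")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()
```

To try it, one could write `build_reservation_sdist()` to a file named `sdist_basename() + ".tar.gz"` and point `pip install` at it; pip should stop while getting the build requirements and show the message in the backend's traceback. Raising from every hook, rather than only from `build_wheel`, keeps the failure early regardless of which hook a frontend calls first.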

That looks fine to me. What do you think @dustin @EWDurbin ?

I’m not 100% sure about the status; maybe @brettcannon knows.

I think typosquats should just be permanently banned (via the blocklist), but other than that this looks fine to me.

Maybe worth mentioning that the versions 0.0.dev0 and 0.dev0 are equivalent, so that suggestion could be slightly simplified.

I’m not on the PSF board so I have no idea about any status.

@brettcannon I think he was referring to this question about the PEP process:

Ah, then yes. :) Process PEPs actually can’t be “final” since there is no explicit implementation that only happens once. Since things about a process are always happening, those PEPs should be marked as Active.


I agree, I would prefer if we didn’t encourage empty uploads for this. However we don’t have a great process for a necessarily private request to add a name to the reserved set.

I’d say empty uploads for now and if/once we get a dedicated mechanism for it, then ask people to use that.

I think typosquats should just be permanently banned (via the blocklist), but other than that this looks fine to me.

My intention is to help PyPI admins by making name reservations self-service: rather than up-front review, they’d only require admin attention if they’re disputed. Let me know if that’s not the right direction.

Also, consider this case: I squat ldap (commonly typed when people mean python-ldap), which has historical releases of another project. pip will still install these if you pin the version. I’m afraid that if I ask the admins to block the name, those releases would get deleted.


Another thing that occurred to me is making these packages searchable. This could also allow converting name reservations en masse once there’s a better process in place. Do you think adding Trove classifiers is a good idea?


Proposal with classifiers:

Name reservations

Usually, a package that has no functionality or is empty is considered
“name squatting” and is invalid. As an exception, it is allowed to register
an empty project to reserve a name for:

  • a project that is freely available from elsewhere and would otherwise be
    valid (for example: a project only installable by specific installers or
    package managers, or a part of the standard library of a Python
    implementation); or
  • a trademark or another name that would infringe the
    Intellectual property policy below if used as project name without
    the owner’s permission.

A name reservation project’s description must state reasons for the reservation
and include relevant links.
It must also include the Trove classifier “Name Reservation :: External” or
“Name Reservation :: Legal” corresponding to the reason for the reservation.
It is recommended to use a low pre-release version
(e.g. 0.0.dev0) and to make the package not installable with pip
(e.g. by uploading a source distribution that fails to build
with an informative message).

The classifier “Name Reservation :: Typo” is available to reserve mistyped names
of popular projects (“typo-squatting”).
Such projects are considered invalid and Package Index maintainers may
remove them without warning or discussion. (This will not necessarily make the name
available: if the Package Index maintainers agree with the reservation,
they may block the name for security reasons when they remove the project.)

Note that private projects should be hosted on a private package
index, and generally should not have a name reservation on the public
Package Index.

If and when a better process for name reservation is implemented, Package Index
maintainers may remove some or all packages marked with the Name Reservation
Trove classifiers and replace them with another form of reservation.

And the point in “Invalid projects” should be changed to:

  • project is name squatting (package has no functionality or is empty,
    except name reservations as described below)

And, of course, the classifiers need to be added.

I appreciate this but I think it’s actually easier for us to block a list of project names en masse rather than handle potential PEP 541 requests for the names in question.

You are correct, they would. I think in the case of “historical backwards-compatibility” squatting is acceptable.

No, I don’t believe this is necessary. This happens infrequently enough that I don’t think we need additional tooling/support around it.

Yes, I assume blocking is easy. But what about unblocking the packages as maintainers get ready to share the code on PyPI? At least in the “package in another collection/distro” case, the ideal would be to eventually unblock all the packages again, and that will not happen at once.
I also expect mass blocking to be rare, only done if/when a collection/distro first chooses to sync project names with PyPI.

I guess edge cases like this are adequately covered by PEP 541 requests being handled by humans. (Thank you!)


Removing the classifiers idea:

Name reservations

Usually, a package that has no functionality or is empty is considered
“name squatting” and is invalid. As an exception, it is allowed to register
an empty project to reserve a name for:

  • a project that is freely available from elsewhere and would otherwise be
    valid (for example: a project only installable by specific installers or
    package managers, or a part of the standard library of a Python
    implementation); or
  • a trademark or another name that would infringe the
    Intellectual property policy below if used as project name without
    the owner’s permission.

A name reservation project’s description must state reasons for the reservation
and include relevant links.
It is recommended to use a low pre-release version
(e.g. 0.dev0) and to make the package not installable with pip
(e.g. by uploading a source distribution that fails to build
with an informative message).

Projects that reserve mistyped names of popular projects
(“typo squatting”) are still considered invalid and Package Index
maintainers may remove them without warning or discussion.
(This will not necessarily make the name available: the
Package Index maintainers may block the name for security
reasons when they remove the project.)

Note that private projects should be hosted on a private package
index, and generally should not have a name reservation on the public
Package Index.

And the point in “Invalid projects” should be changed to:

  • project is name squatting (package has no functionality or is empty,
    except name reservations as described below);

I think this is still preferable over any potential for PEP 541 requests or additional user confusion.


OK. Then, this change is not necessary at all! That’s certainly easier for me :)

However, I’d like all packages reserved for Fedora to go to the relevant maintainers as soon as they ask.

Sadly, I recently saw—a few weeks too late—that libiio’s maintainer heard the package name was taken, and worked around that by uploading under a different name.
How can I best help to ensure this doesn’t happen in the future?

(This is not to blame the individual PyPI admin who responded; he’s but an agent of a system that is not working as I would like.)

Unfortunately we have no way to tell who is a maintainer of the upstream project who should have the name and who isn’t. Perhaps going forward we can CC you on any similar requests?

That would be great. Thank you.

To clarify: where do I send a list of project names I’d like blocked (due to typosquatting risk)? And do you prefer the real names, or should I send the typos?


Email admin@pypi.org. Send us the typo’d names if you can, I don’t have any way to generate common typos.
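Since generating the typos falls to the requester, here is a crude sketch of one way to do it, using a hypothetical helper `typo_candidates`: single deletions, doubled characters, and adjacent transpositions of the real name, each run through PEP 503 normalization. Case and separator variants (pip_tools, Pip.Tools) normalize to the same name as the original, so they are dropped rather than listed. Real typosquatting lists often also consider keyboard-adjacent substitutions, which this sketch skips.

```python
import re


def normalize(name: str) -> str:
    # PEP 503 normalization: runs of -, _ and . become "-", lowercased.
    return re.sub(r"[-_.]+", "-", name).lower()


def typo_candidates(name: str) -> set[str]:
    """Return normalized single-edit typo variants of a project name."""
    base = normalize(name)
    raw = set()
    for i in range(len(base)):
        raw.add(base[:i] + base[i + 1:])        # drop one character
        raw.add(base[:i] + base[i] + base[i:])  # double one character
        if i + 1 < len(base):
            # swap two adjacent characters
            raw.add(base[:i] + base[i + 1] + base[i] + base[i + 2:])
    candidates = {normalize(c).strip("-") for c in raw}
    candidates.discard(base)  # e.g. a doubled "-" normalizes back to the original
    candidates.discard("")
    return candidates


print(sorted(typo_candidates("dnf")))
```

For example, `typo_candidates("pip-tools")` includes piptools, the self-typosquat mentioned earlier in the thread.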
