PEP 541 - Should name squatting be actively discouraged?

Hi all,

@jamadden and I are moderators on PyPI, and as part of our work we deal with PEP 541 requests from users. Many of them fall under the invalid project category of PEP 541 for name squatting:

project is name squatting (package has no functionality or is empty);

In my travels on the admin interface I’ve found many other instances, sometimes with the same user squatting a long list of names.

Given that the PEP is quite clear about this not being allowed, I wonder if the overall packaging ecosystem, e.g. twine, warehouse and others, should do more to actively discourage this practice.

I realize the key to this issue is being able to assess what constitutes a “functional” package. I’ve found examples of packages with a single bogus function or a chunk of copy/pasted code. But I do think we could do a better job of detecting very clear cases of empty packages, and either prevent them from being uploaded or show a warning in PyPI/pip letting users know of this fact.

I’d love to hear some thoughts from the community about this.

Thanks.

I think it would be appropriate to have an automated process for screening empty packages, but packages whose code wouldn’t qualify as functional (or is copy/pasted) would be more difficult to screen. Those cases would likely require manual review, and it would only be worthwhile to do so if that package name is significantly in demand or specifically requested by another author. Otherwise it could easily become an endless time sink.

I’d recommend starting with the easier, clearer cases by automatically screening empty packages (if nothing is in place at the moment). From there, specific guidelines can be created that define what would qualify as a functional package. If an author with a functional package (already in place or newly created) wants to request that name, a manual review would have to be done on the existing package to ensure that a legitimate package isn’t being replaced.
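
To make the "clear cases" concrete, here is a minimal sketch of what such a screen might look like. It is only an illustration: `looks_empty` and `count_meaningful_statements` are hypothetical names, not anything in warehouse or twine, and it assumes the upload has already been unpacked into a mapping of file names to source text.

```python
import ast
from typing import Dict


def count_meaningful_statements(source: str) -> int:
    """Count top-level statements that are not docstrings, `pass`, or `...`."""
    count = 0
    for node in ast.parse(source).body:
        if isinstance(node, ast.Pass):
            continue
        # A bare constant expression covers module docstrings and `...`.
        if isinstance(node, ast.Expr) and isinstance(node.value, ast.Constant):
            continue
        count += 1
    return count


def looks_empty(files: Dict[str, str]) -> bool:
    """Return True if no .py file other than setup.py has a meaningful statement."""
    for name, source in files.items():
        if not name.endswith(".py") or name.rsplit("/", 1)[-1] == "setup.py":
            continue
        try:
            if count_meaningful_statements(source) > 0:
                return False
        except SyntaxError:
            # Unparseable source is left for manual review rather than flagged.
            return False
    return True
```

A check like this only catches the trivially empty case. A package with a single bogus function, or even just a `__version__` assignment, passes it, which is why those cases would still need manual review.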

@aeros provided sensible suggestions.

Note that some things with little or no code such as https://pypi.org/project/anyreadline are not squatting names… but these are rare enough.

So I would be very much in favor of a few things:

  1. have tools (in particular warehouse) check for some patterns that would then trigger a warning and that would be something you could query for review

  2. as you mentioned, there are repeat offenders: these may be also easier to spot with some checks done when publishing on warehouse (and go in some review queue or query as above?)

In order to avoid someone (you) being in the way (and possibly overwhelmed), I would not make any such check blocking. A warning as you suggested would go a very long way. And later, if such automated checks prove solid, the warning could become a publishing block?

Just to add into the discussion, I am (legitimately, I believe) squatting on a few names that are US/international trademarks of my employer. We might want to release packages under these names in future, but we certainly don’t want anyone else doing it. In some cases, we’ve had to use lawyers to get them back from more malicious owners.

Please make sure any checks and/or blocks take this scenario into account.

This is an important point, and while I’m not trying to say you’re expected to answer this question, do you have any suggestions on how we’d do that? If you’re saying that blanket blocking of name squatting isn’t possible because there are “legitimate” reasons for people to do it, then that may be a fair position to take. But the original post here linked to PEP 541 which, from my reading, doesn’t allow for any legitimate cases of name squatting (although as it’s an advisory document for a manual process, that’s fine in practice).

I’m fine with saying that this is hard or maybe even impossible to do right in an automatic process. My understanding of the original proposal was that it was about adding extra reporting and warnings to make it easier to manually address cases of name squatting. Maybe we should stick to that remit, at least for now.

At some point I think we will probably have to bite the bullet and be more explicit about the rules on name ownership on PyPI (in particular, nasty questions like which legal jurisdictions apply). There are just too many people relying on PyPI and believing it provides greater assurances than it actually does. But I think that needs to be a proper debate, and not a side-thread on a relatively obscure technical discussion like this one. If that means we have to rely on a combination of warnings/reporting and manual management for now, then maybe that’s what we have to do.

One (possibly strained) reading of the PEP is that the whole document is only about conflict resolution.

Here’s my use case for squatting: I’m implementing a new library. I want to eventually release it (under the same name as its importable module). So I squat the name as a signal to the community that I’m planning to use it, knowing that if anyone disputes the name, they’ll automatically get it and I’ll need to rename my project. And that’s fine; there’s a balance between “The namespace shouldn’t be polluted by unfinished projects” and “it’s useful to have a project name before implementing it”.
Today, in practice, the balance is at “if someone wants the name enough to contact the moderators, they’re free to have it.”

I understand this places unnecessary work on the PyPI moderators, and those doing the work should have the most say in the rules. But still, I’d be sad to lose a way to “weakly signal the intent to use a name”. In basically any project, there’s a period where the name is known, but there’s no usable code to release. In an open-source project, that name is even publicly known (and itself vulnerable to squatting by trolls).

Perhaps we need a better workflow for legitimate problems people now solve by squatting. Detecting useless packages could soon become an arms race.

I think this is the only reasonable outcome. Last time I read PEP 541 I didn’t feel like it was going to cause any problems for preserving trademarks, but that’s probably because it implied (or I assumed) that manual intervention would always be required.

As far as I can recall, the post directly above mine is the first suggestion of automatically blocking empty packages from even being published.

Perhaps if we one day make this an active block rather than a conflict resolution process (as Petr describes, and I like his description), we would also need a name reservation system that allows “this is a trademark I have responsibility for” as an option (possibly also “I intend to publish this package within 30 days” or “I believe there are security reasons this package name should not be allowed”?)

PyPI already has a blocklist for such invalid names. Right now it mostly contains typosquats and stdlib names, but I don’t see any reason why we can’t use it for trademarked names as well (the block can always be removed).

I’m quite happy with my current approach of pushing an empty package (e.g. microsoft, windows) with a more detailed description. If the name is already taken (e.g. xbox) then it has to go through conflict resolution anyway, so the blocklist won’t help.

I’d also be quite happy if a prominent banner was added “this package looks empty - here’s how to claim the name if you think you would use it better”, then let the resolution process go ahead as normal.

I would not be happy if I had to include “enough” code in the empty packages just to reserve the name, and I’d be less happy if the name simply showed as unavailable without the explanation/links/email I have there today.

All I’m arguing against is blocking uploads completely (suggested in a couple of posts). Warning the uploader is pretty pointless, because they’ve already jumped through enough hoops to get this far that they almost certainly know what they’re doing.

I’m sympathetic to the workload of the moderators though, and grateful for their work. I’m just not sure how to help lighten the load here.

All I’m arguing against is blocking uploads completely (suggested in a couple of posts), and warning the uploader is pretty pointless because they’ve already jumped through enough hoops to get this far they almost certainly know what they’re doing.

While I agree that completely blocking uploads is not ideal, I do think repeat offenders should be considered for such a measure, particularly where the same user uploads multiple packages in a short period of time, which I’ve seen happen.
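
As a rough illustration of how that pattern could be surfaced for review (not blocked automatically), here is a sketch. Everything in it is an assumption: `flag_burst_uploaders` is a hypothetical helper, the `(user, uploaded_at, looks_empty)` tuples stand in for whatever upload records are actually available, and the window and threshold are arbitrary.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def flag_burst_uploaders(uploads, window=timedelta(hours=24), threshold=5):
    """Return users with at least `threshold` empty-looking uploads inside any `window`.

    `uploads` is an iterable of (user, uploaded_at, looks_empty) tuples.
    """
    times_by_user = defaultdict(list)
    for user, uploaded_at, looks_empty in uploads:
        if looks_empty:
            times_by_user[user].append(uploaded_at)

    flagged = set()
    for user, times in times_by_user.items():
        times.sort()
        start = 0
        for end, current in enumerate(times):
            # Advance the left edge until the span fits inside the window.
            while current - times[start] > window:
                start += 1
            if end - start + 1 >= threshold:
                flagged.add(user)
                break
    return flagged
```

The output would feed a moderator review queue. Whether any of those users is actually squatting is still a human call.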

I disagree with your stance on warnings, though. I think a warning would send a clearer message about PyPI’s position on name squatting.

we would also need a name reservation system that allows “this is a trademark I have responsibility for” as an option (possibly also “I intend to publish this package within 30 days” or “I believe there are security reasons this package name should not be allowed”?)

I’d like to see explicit reservations of names in PyPI, maybe subject to moderation approval but definitely better than having someone like yourself coming up with a dummy package.

Temporary reservation of names also sounds interesting, maybe limited to a small number per user to avoid mass reservations, though I can see it being more work to coordinate warehouse and twine to block uploads for reserved names.
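
To sketch what I have in mind, here is a toy in-memory model. None of this exists in warehouse: `ReservationRegistry`, `Reason`, the 30-day lifetime and the per-user cap of 3 are all assumptions for illustration. The only part taken from an existing standard is the name normalization, which follows PEP 503.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Dict, Optional


class Reason(Enum):
    TRADEMARK = "trademark"
    PLANNED_RELEASE = "planned release"
    SECURITY = "security"


@dataclass
class Reservation:
    name: str
    owner: str
    reason: Reason
    expires_at: Optional[datetime]  # None means no automatic expiry


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


class ReservationRegistry:
    def __init__(self, max_temporary_per_user: int = 3,
                 temporary_ttl: timedelta = timedelta(days=30)) -> None:
        self.max_temporary_per_user = max_temporary_per_user
        self.temporary_ttl = temporary_ttl
        self._by_name: Dict[str, Reservation] = {}

    def _drop_expired(self, now: datetime) -> None:
        self._by_name = {
            key: r for key, r in self._by_name.items()
            if r.expires_at is None or r.expires_at > now
        }

    def is_reserved(self, name: str, now: datetime) -> bool:
        self._drop_expired(now)
        return normalize(name) in self._by_name

    def reserve(self, name: str, owner: str, reason: Reason,
                now: datetime) -> Reservation:
        self._drop_expired(now)
        key = normalize(name)
        if key in self._by_name:
            raise ValueError(f"{key!r} is already reserved")
        expires_at = None
        if reason is Reason.PLANNED_RELEASE:
            # Only temporary reservations count against the per-user cap.
            held = sum(1 for r in self._by_name.values()
                       if r.owner == owner and r.reason is Reason.PLANNED_RELEASE)
            if held >= self.max_temporary_per_user:
                raise ValueError(f"{owner!r} holds too many temporary reservations")
            expires_at = now + self.temporary_ttl
        reservation = Reservation(key, owner, reason, expires_at)
        self._by_name[key] = reservation
        return reservation
```

Trademark and security reservations do not expire here and would presumably go through moderation approval, while a "planned release" reservation lapses on its own, so nobody has to chase it.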

Yea, that’s my reading too. In cases where there’s a conflict, we’d get rid of the conflicting package w/o question if it’s clearly a name-squat.

Seconded, though my concern is an implementation one – we don’t have a good mechanism to warn on uploads. I guess twine could learn to print messages or we could drop an email to the maintainers of a package?

Aha! Another use case for user-flagging of names / packages. :)

My impression is that the PyPI team already struggles to keep up with PEP 541 requests in cases where someone is actively interested in the name (current list). So maybe it would be a bad idea to go out looking for more packages to add to the list?

OTOH if the idea is to provide maintainers better tools to respond to these requests when they come in, that seems like an obvious improvement.

As someone who initially suggested automatic blocking, I would consider something with an adequately detailed description to not be considered empty for upload blocking purposes. I’m not certain as to what would qualify as being “adequately detailed” though.

Ideally, use cases such as yours, where authors publish a package without code but include a description of why they are claiming the name, would not be blocked. Requiring a description could potentially help with conflict resolution as well, particularly in cases where trademarks are involved.
