PEP 541 - Should name squatting be actively discouraged?

yeraydiazdiaz · September 29, 2019, 5:18pm

Hi all,

@jamadden and I are moderators on PyPI, and as part of our work we deal with PEP 541 requests from users. Many of them fall under the invalid project category of PEP 541 for name squatting:

project is name squatting (package has no functionality or is empty);

In my travels on the admin interface I’ve found many other instances, sometimes with the same user name squatting a long list of names.

Given the PEP is quite clear about this not being allowed, I wonder if the overall packaging ecosystem, e.g. twine, warehouse and others, should do more to actively discourage this practice.

I realize the key to this issue is being able to assess what constitutes a “functional” package. I’ve found examples of packages with a single bogus function or a chunk of copy/pasted code. But I do think we could do a better job of detecting very clear cases of empty packages and preventing them from being uploaded, or maybe showing a warning in PyPI/pip letting users know of this fact.

I’d love to hear some thoughts from the community about this.

Thanks.

aeros · September 29, 2019, 9:00pm

I think it would be appropriate to have an automated process for screening empty packages, but the ones that have code that wouldn’t qualify as being functional (or that is copy/pasted) might be a bit more difficult. Those cases would likely require manual review, and it would only be worthwhile to do so if that package name is significantly in demand or specifically requested by another author. Otherwise it could easily become an endless time sink.

I’d recommend starting with the easier and more clear cases first with automatically screening empty packages (if nothing is in place at the moment). From there, specific guidelines can be created that define what would qualify as a functional package. If a package author with a functional package (already in place or created) wants to request that name, a manual review would have to be done on the existing package to ensure that a legitimate package isn’t being replaced.
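To make the “clearly empty” case a bit more concrete, here is a rough sketch of what such a screen could look like. It is only an illustration: the function names are made up, and the definition of “empty” (no extension modules, and no Python file containing anything beyond docstrings and `pass`) is my assumption, not anything PEP 541 or warehouse defines. It only looks at wheels, which are zip archives.

```python
import ast
import io
import zipfile


def module_is_trivial(source: str) -> bool:
    """True if the module has nothing besides docstrings/bare strings and `pass`."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Unparseable code is not auto-flagged; leave it for a human.
        return False
    for node in tree.body:
        if isinstance(node, ast.Pass):
            continue
        if (
            isinstance(node, ast.Expr)
            and isinstance(node.value, ast.Constant)
            and isinstance(node.value.value, str)
        ):
            continue  # docstring or bare string literal
        return False
    return True


def wheel_looks_empty(wheel_bytes: bytes) -> bool:
    """Hypothetical screen: flag a wheel for review, never reject it outright."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as archive:
        names = archive.namelist()
        # Compiled extension modules count as real content.
        if any(name.endswith((".so", ".pyd")) for name in names):
            return False
        sources = [
            name for name in names
            if name.endswith(".py") and ".dist-info/" not in name
        ]
        # A wheel with no Python files at all also counts as empty here.
        return all(
            module_is_trivial(archive.read(name).decode("utf-8", errors="replace"))
            for name in sources
        )
```

A check like this would also flag metadata-only packages that exist purely to pull in dependencies, so its result is better treated as a hint for manual review than as a verdict.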

pombredanne · September 30, 2019, 7:21am

@aeros provided sensible suggestions.

Note that some things with little or no code such as https://pypi.org/project/anyreadline are not squatting names… but these are rare enough.

So I would be very much in favor of a few things:

have tools (in particular warehouse) check for some patterns that would then trigger a warning and that would be something you could query for review
as you mentioned, there are repeat offenders: these may also be easier to spot with some checks done when publishing on warehouse (and go in some review queue or query as above?)

In order to avoid someone (you) being in the way (and possibly overwhelmed) I would not make any such check blocking. A warning as you suggested would go a very long way. And later, if such automated checks prove solid, the warning could become a publishing block?
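As a sketch of the non-blocking flow described above: every upload goes through, an empty-looking one produces a warning, and users who accumulate several flagged projects show up in a queryable review list. The class name, the threshold of three, and the warning text are all hypothetical; nothing here reflects how warehouse actually works.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ReviewQueue:
    # Hypothetical: number of flagged projects before a user is listed for review.
    threshold: int = 3
    flagged: dict = field(default_factory=lambda: defaultdict(list))

    def record_upload(self, user: str, project: str, looks_empty: bool) -> list:
        """Never blocks the upload; returns warnings to show the uploader."""
        warnings = []
        if looks_empty:
            if project not in self.flagged[user]:
                self.flagged[user].append(project)
            warnings.append(
                f"'{project}' looks empty; see PEP 541 on name squatting."
            )
        return warnings

    def users_to_review(self) -> dict:
        """Users with at least `threshold` flagged projects, for moderators to query."""
        return {
            user: projects
            for user, projects in self.flagged.items()
            if len(projects) >= self.threshold
        }
```

Keeping the warning and the review list separate means the check can be wrong without getting in anyone’s way, and the threshold can be tuned later if the heuristic proves solid.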

steve.dower · September 30, 2019, 1:47pm

Just to add into the discussion, I am (legitimately, I believe) squatting on a few names that are US/international trademarks of my employer. We might want to release packages under these names in future, but we certainly don’t want anyone else doing it. In some cases, we’ve had to use lawyers to get them back from more malicious owners.

Please make sure any checks and/or blocks take this scenario into account.

pf_moore · September 30, 2019, 2:31pm

This is an important point, and while I’m not trying to say you’re expected to answer this question, do you have any suggestions on how we’d do that? If you’re saying that blanket blocking of name squatting isn’t possible because there are “legitimate” reasons for people to do it, then that may be a fair position to take. But the original post here linked to PEP 541 which, from my reading, doesn’t allow for any legitimate cases of name squatting (although as it’s an advisory document for a manual process, that’s fine in practice).

I’m fine with saying that this is hard or maybe even impossible to do right in an automatic process. My understanding of the original proposal was that it was about adding extra reporting and warnings to make it easier to manually address cases of name squatting. Maybe we should stick to that remit, at least for now.

At some point I think we will probably have to bite the bullet and be more explicit about the rules on name ownership on PyPI (in particular, nasty questions like which legal jurisdictions apply). There are just too many people relying on PyPI and believing it provides greater assurances than it actually does. But I think that needs to be a proper debate, and not a side-thread on a relatively obscure technical discussion like this one. If that means we have to rely on a combination of warnings/reporting and manual management for now, then maybe that’s what we have to do.

encukou · September 30, 2019, 3:03pm

One (possibly strained) reading of the PEP is that the whole document is only about conflict resolution.

Here’s my use case for squatting: I’m implementing a new library. I want to eventually release it (under the same name as its importable module). So I squat the name as a signal to the community that I’m planning to use it, knowing that if anyone disputes the name, they’ll automatically get it and I’ll need to rename my project. And that’s fine; there’s a balance between “The namespace shouldn’t be polluted by unfinished projects” and “it’s useful to have a project name before implementing it”.
Today, in practice, the balance is at “if someone wants the name enough to contact the moderators, they’re free to have it.”

I understand this places unnecessary work on the PyPI moderators, and the ones doing the work should have the most say in the rules. But still, I’d be sad to lose a way to “weakly signal the intent to use a name”. In basically any project, there’s a period where the name is known, but there’s no usable code to release. In an open-source project, that name is even publicly known (and itself vulnerable to squatting by trolls).

Perhaps we need a better workflow for legitimate problems people now solve by squatting. Detecting useless packages could soon become an arms race.

steve.dower · September 30, 2019, 4:59pm

I think this is the only reasonable outcome. Last time I read PEP 541 I didn’t feel like it was going to cause any problems for preserving trademarks, but that’s probably because it implied (or I assumed) that manual intervention would always be required.

As far as I can recall, the post directly above mine is the first suggestion of automatically blocking empty packages from even being published.

Perhaps if we one day make this an active block rather than a conflict resolution process (as Petr describes, and I like his description), we would also need a name reservation system that allows “this is a trademark I have responsibility for” as an option (possibly also “I intend to publish this package within 30 days” or “I believe there are security reasons this package name should not be allowed”?)

dustin · September 30, 2019, 5:17pm

PyPI already has a blocklist for such invalid names. Right now it mostly contains typosquats and stdlib names, but I don’t see any reason why we can’t use it for trademarked names as well (the block can always be removed).

steve.dower · September 30, 2019, 5:34pm

I’m quite happy with my current approach of pushing an empty package (e.g. microsoft, windows) with a more detailed description. If the name is already taken (e.g. xbox) then it has to go through conflict resolution anyway, so the blocklist won’t help.

I’d also be quite happy if a prominent banner was added “this package looks empty - here’s how to claim the name if you think you would use it better”, then let the resolution process go ahead as normal.

I would not be happy if I had to include “enough” code in the empty packages just to reserve the name, and I’d be less happy if the name simply showed as unavailable without the explanation/links/email I have there today.

All I’m arguing against is blocking uploads completely (suggested in a couple of posts), and warning the uploader is pretty pointless because they’ve already jumped through enough hoops to get this far that they almost certainly know what they’re doing.

I’m sympathetic to the workload of the moderators though, and grateful for their work. I’m just not sure how to help lighten the load here.

yeraydiazdiaz · September 30, 2019, 7:28pm

All I’m arguing against is blocking uploads completely (suggested in a couple of posts), and warning the uploader is pretty pointless because they’ve already jumped through enough hoops to get this far they almost certainly know what they’re doing.

While I agree that completely blocking uploads is not ideal, I do think repeat offenders should be considered for such a measure, particularly where the same user uploads multiple packages in a short period of time, which I’ve seen happen.

I disagree with your stance on warnings, though; I think they would send a clearer message about PyPI’s position on name squatting.

we would also need a name reservation system that allows “this is a trademark I have responsibility for” as an option (possibly also “I intend to publish this package within 30 days” or “I believe there are security reasons this package name should not be allowed”?)

I’d like to see explicit reservations of names in PyPI, maybe subject to moderation approval but definitely better than having someone like yourself coming up with a dummy package.

Temporary reservation of names also sounds interesting, maybe limited to a small number per user to avoid mass reservations, though I can see it being more work coordinating warehouse and twine to block uploads for reserved names.
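A very rough sketch of how temporary reservations with a per-user cap might behave, just to have something concrete to poke at. The registry class, the cap of three, and the 30-day lifetime (borrowed from the “within 30 days” suggestion) are illustrative assumptions, not an existing PyPI feature. Names are compared using the PEP 503 normalization so that `My_Package` and `my-package` count as the same name.

```python
import re
from datetime import datetime, timedelta, timezone

MAX_RESERVATIONS_PER_USER = 3               # hypothetical cap
RESERVATION_LIFETIME = timedelta(days=30)   # hypothetical lifetime


def normalize(name: str) -> str:
    """PEP 503 name normalization."""
    return re.sub(r"[-_.]+", "-", name).lower()


class ReservationRegistry:
    def __init__(self):
        self._reservations = {}  # normalized name -> (user, expires_at)

    def _active(self, now):
        # Expired reservations are simply ignored.
        return {n: r for n, r in self._reservations.items() if r[1] > now}

    def reserve(self, user, name, now=None) -> bool:
        now = now or datetime.now(timezone.utc)
        active = self._active(now)
        key = normalize(name)
        if key in active:
            return False  # someone already holds an active reservation
        held = sum(1 for owner, _ in active.values() if owner == user)
        if held >= MAX_RESERVATIONS_PER_USER:
            return False  # cap reached, to avoid mass reservations
        self._reservations[key] = (user, now + RESERVATION_LIFETIME)
        return True

    def may_upload(self, user, name, now=None) -> bool:
        now = now or datetime.now(timezone.utc)
        holder = self._active(now).get(normalize(name))
        return holder is None or holder[0] == user
```

Because a reservation lapses on its own, an abandoned name becomes available again without a moderator having to step in.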

pradyunsg · September 30, 2019, 8:29pm

Yea, that’s my reading too. In cases where there’s a conflict, we’d get rid of the conflicting package w/o question if it’s clearly a name-squat.

Seconded, though my concern is an implementation one – we don’t have a good mechanism to warn on uploads. I guess twine could learn to print messages or we could drop an email to the maintainers of a package?

Aha! Another use case for user-flagging of names / packages.

njs · September 30, 2019, 8:39pm

My impression is that the PyPI team already struggles to keep up with PEP 541 requests in cases where someone is actively interested in the name (current list). So maybe it would be a bad idea to go out looking for more packages to add to the list?

OTOH if the idea is to provide maintainers better tools to respond to these requests when they come in, that seems like an obvious improvement.

aeros · October 3, 2019, 5:33am

As someone who initially suggested automatic blocking, I would not consider something with an adequately detailed description to be empty for upload blocking purposes. I’m not certain as to what would qualify as being “adequately detailed” though.

Ideally, use cases such as yours, where authors publish a package without code but include a description of why they are claiming that name, would not be blocked. Requiring a description could potentially help with conflict resolution as well, particularly in cases where trademarks are involved.

Paddy3118 · November 4, 2019, 9:08pm

If we’re not lawyers, and we want similar access from around the world, then it might be best to automatically reject any name that is copyrighted and that its copyright holder defends. We should have the right to get out of any potential mess by telling lawyers “No one gets it”.

pitrou · November 4, 2019, 10:47pm

Names are not copyrighted. There can be trademarks on them, but trademark law is somewhat complicated…

croemer · July 12, 2024, 9:39am

@steve.dower you wrote:

We might want to release packages under these names in future, but we certainly don’t want anyone else doing it.

What you’re doing is the exact definition of name squatting. If you own a trademark in some location for a certain product you don’t own the name globally for all possible products. The burden of proof that such squatting is legitimate should be on the squatter.

Wikipedia on this:

A registered trademark confers a bundle of exclusive rights upon the registered owner, including the right to exclusive use of the mark about the products or services for which it is registered. The law in most jurisdictions also allows the owner of a registered trademark to prevent unauthorized use of the mark about products or services which are identical or “colorfully similar” to existing registered products or services, and in certain cases, prevent the use of entirely dissimilar ones. The test is always whether a consumer of the goods or services will be confused as to the identity of the source or origin, not just the area of rights specified by the trademark. An example might be a very large multinational electronics brand such as Sony Corporation where a non-electronic product such as a pair of sunglasses might be assumed by a consumer to have come from Sony Corporation of Japan despite being outside a class of goods to which Sony has rights, yet still protected by Sony’s trademark; a similarly named psychotherapy office or line of hamburger buns or summer camps, however, would not be infringing on Sony Corporation’s trademark because the service or products being offered are so vastly different from Sony Corporation’s trademark claim of rights and range of manufactured goods.
Trademark - Wikipedia

So it depends very much on how well known your employer’s trademark is. Unless it’s a brand as well known as Coca-Cola, I would expect that there could conceivably be a legitimate reuse of the name, making you an illegitimate squatter.

pf_moore · July 12, 2024, 9:40am

Are you aware that Steve’s employer is Microsoft? That seems reasonably well-known…

(Edit: Sorry, I just noticed the original thread was 5 years old. Please don’t resurrect old threads that have run their course, it’s generally not very helpful - better to start a new discussion if you have something you want to raise).