Today I happened to notice that there is a PyPI package called nuumpy, which looks very much like a case of typo squatting. However, a quick look at the code suggests it’s not malware (at this time). It doesn’t seem particularly useful (or “loved”, as I’d say) either. This raises the question: should we be reporting such packages?
While I mean no offense to the package owner, I foresee some risk in an arbitrary person holding such a project name, as technically a future version could be malware.
I found some more similar cases: bokey (for bokeh), jupter, nlkt (for nltk), panadas, pythest, and one that is suspiciously close to actual typo squatting, since it doesn’t provide anything: pytorch-gpu.
PyPI’s “Report project as malware” button at the bottom of each project page links to a malware support form, which has the following text:
Examples of malicious activity in projects include typo-squatting, dependency confusion, data exfiltration, obfuscation, command/control, and other similar behaviors.
By a strict reading, typo-squatting might be enough to make something malware. For the Inspector link, you can probably just link the metadata as evidence of typo squatting. I went through this process for pytorch-gpu and nuumpy.
Also, I don’t think there’s much harm in reporting at least a couple? If someone asks me to stop after that report, I’ll write that here, but the first one or two reports should be pretty much no-foul.
I would go ahead and submit them as typo squatting. They don’t have malware now, but could easily be changed to host malware at any point in the future.
Someone pointed me to this thread, which explains why I’m seeing multiple reports from some of you.
These aren’t malware - so please don’t report them as such. I’ll probably need to update the language on that page to make it clearer, and to remove the things that are not explicitly malicious.
There’s a limited set of possible names in the Python project namespace, and if any of these projects were performing any sort of impersonation, or data fingerprinting/exfiltration, they could be considered malicious and subject to administrative action. Most of them will be left alone unless there are very clear indicators of what they are today, not what they may become.
That decision doesn’t fill me with confidence. I can easily picture a series of events where one of those creates a dependency on the package they’re approximately squatting, making it possible to install them by “accident” without careful lockfile reading, and then at a future point introducing malware either by the current owners or through the malicious intervention of a third party.
That’s a gut feeling on my part, though; I’d be interested in why you don’t see this risk as sufficiently motivating?
I can’t speak for them, but publishing to PyPI is easily scriptable, and I assume the PyPI maintainers don’t want to start a futile game of whack-a-mole.
Is this something that might be better suited to a build system or package manager? Checking Levenshtein distance isn’t difficult, and having pip emit a “nuumpy is similar to numpy, did you misspell it?” warning would mean the PyPI folks wouldn’t have to be as active in whacking these.
Python already does this for __getattr__ near misses. And compiling a list of the top 500 or so packages to check against wouldn’t be too difficult, I’d think.
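To make the idea concrete, here is a rough sketch of what such a check could look like. It is not how pip or PyPI work today; the `POPULAR` tuple is a tiny hypothetical stand-in for a real "top 500" list, and `similar_popular_names` and the `max_distance=2` threshold are names and values I made up for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


# Hypothetical stand-in for a curated list of popular project names.
POPULAR = ("numpy", "pandas", "bokeh", "nltk", "pytest", "jupyter", "requests")


def similar_popular_names(name, popular=POPULAR, max_distance=2):
    """Return popular names within max_distance edits of name (assumed threshold)."""
    name = name.lower()
    if name in popular:
        return []  # an exact match is the real project, not a near miss
    return sorted(p for p in popular if levenshtein(name, p) <= max_distance)


for candidate in ("nuumpy", "nlkt", "requests"):
    for match in similar_popular_names(candidate):
        print(f"{candidate} is similar to {match}, did you misspell it?")
```

Note that plain Levenshtein counts a swapped pair of letters (nlkt vs nltk) as two edits, so a threshold of 1 would miss transpositions, while a threshold of 2 will flag more legitimate short names.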
So, PyPI has a similarity checker built in, but I’m not sure what its algorithm is. “mcp-shield” was too close to an existing package name, but I was able to set up a pending Trusted Publisher for “automta” (which I’d say is too close to the real package “automat”).
It’s a good idea for an opt-in code quality or security tool. But as an opt-out new feature in core tools like pip? For starters, I use uv far more than pip these days, and many other package managers are available too. Secondly, it’s very easy and cheap to suggest adding these checks, but neither you nor I will have to deal with the hassle from all the inevitable false positives.
“Why is pip giving warnings to users of my package on PyPI?”
“Why is it giving me this warning when I install nuumpy?”
etc. etc.
Perhaps a good first step would be to reach out to the maintainers of these packages, and point out that they actually end up in production environments due to people making typos (don’t ask…).
I’m sorry that doesn’t fill you with confidence - I don’t know what would. Names are a series of characters - there’s nothing “special” about any name, until it becomes “special”.
There’s nothing different about the behavior you described for a typo vs any other name - every project’s name has the same potential security implications - you’re downloading some code off the internet and running it.
The PyPI policies aren’t clear on future, potential attacks, and I don’t have a good-enough future scrying glass to predict what folks might do - if you’ve got one, please share it! The policies are clearer on items that are easier to rule upon, like impersonation - not a week goes by without some new requests-like project that uses the same metadata, description, etc - in those cases, it’s much easier to make a decision.
With many of these “suspicious looking” projects, until there’s a clear signal of abuse or a security problem, I leave them alone.
You can read the entry point here and follow the paths used to check names.