Reducing the risks and consequences of typosquatting on PyPI

[Disclaimer: I write better in French than in Globish.]
[Disclaimer: the forum’s stupidity forces me to break most links.]

Hi,

Some facts first

  • I was recently pwned because of a typo in a package name (request instead of requests).
  • I wrote up my story here (in French): https://metrodore.fr/i-have-been-powned.html
  • There is no CVE and no public statement about the malware. It is not found in any publicly available database.
  • The trojan stayed on my machine for about 20 days after the PyPI admins removed the package.
  • There is almost no publicity about this threat (I found only one serious Asian source warning about it, reproduced by two Asian web media).
  • Pypinfo reports about 12,000 downloads between the start of the name squatting (31 July) and the removal.

Next, I wish to share some thoughts about the current situation. I hope they can help make the Python movement better. I also suppose that many popular software repositories (npm, cargo, etc.) face similar difficulties, and that they may have produced thinking and answers worth analyzing.

Ideas for fighting the consequences of pytosquatting

The main idea is to publicize the threat, so that fewer people naively keep hosting the malicious code (which can go on long after the pip package has been removed, upstream or downstream).

Public statement about removals

When a PyPI package is removed by an admin for security reasons, a public statement should be issued.
This could be done without extra work for the PyPI admins if warehouse takes care of it. The admins would simply state the reasons for the removal and, if possible, link to resources helping infected people understand and mitigate the threat.

IMHO there is no scenario that turns out better when the threat is hushed up than when it is made public.

Maybe there are good reasons to do the same when a removal is done for other reasons (policy violation).

Exploit the upgrade path

Currently, after a pytosquatting, PyPI admins try to prevent the re-creation of another project with the same name.

There is a way to warn infected users: the moment they upgrade the package. Currently they simply get an error caused by the upstream removal, and may not understand that they have been victims of pytosquatting. One can imagine PyPI admins keeping an upgradable “ultimate version” of the package. During setup, this upgrade would raise an exception with a highlighted message warning about the threat and linking to the removal statement.

Again, this could be done without extra work for the PyPI admins if warehouse takes care of it.

Notes:

  • This upgrade would break users’ upgrades, but no more than an upstream removal already does.
  • Whether one can build an upgradable but non-installable package is beyond my knowledge of pip/PyPI.
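
To make the idea concrete, here is a minimal sketch of what the setup.py of such an “ultimate version” might look like. It assumes the release is published only as a source distribution, since a wheel would be installed without running setup.py. The project name and the statement URL are hypothetical placeholders, and nothing here is an existing warehouse or pip feature.

```python
# setup.py of a hypothetical "tombstone" release (source distribution only).
import sys

PROJECT = "example-squatted-name"  # hypothetical project name
STATEMENT_URL = "https://example.invalid/removal-statement"  # hypothetical URL


def tombstone_message(project: str, url: str) -> str:
    """Build the warning shown to anyone who tries to install or upgrade."""
    return (
        f"SECURITY WARNING: '{project}' was removed from PyPI because a "
        "release contained malware.\n"
        "If you installed it earlier, your machine may be compromised.\n"
        f"Removal statement: {url}\n"
    )


def refuse_install() -> None:
    """Print the warning and abort, so that nothing gets installed."""
    sys.stderr.write(tombstone_message(PROJECT, STATEMENT_URL))
    raise SystemExit(1)  # a non-zero exit makes the build fail


# In a real tombstone setup.py this call would be made unconditionally:
# refuse_install()
```

The upgrade would still fail, as it does today after a removal, but the failure would carry an explanation instead of a bare "not found" error.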

Ideas for fighting pytosquatting

Not reversing responsibilities

I have read advice along the lines of “you can’t trust anything on PyPI, pin & hash your dependencies” given as a standard response about PyPI security. Fortunately this is not the official position, and it does not appear on:

  • pypi. org/security/
  • packaging .python.org/tutorials/installing-packages/
  • packaging .python.org/guides/analyzing-pypi-package-downloads/
  • pypi .org/help/
  • python-security .readthedocs.io/packages.html

I say “fortunately” because this kind of statement seems counterproductive to me, and I suppose it is not what PyPI is intended for. Here is how I hear it: “hey look, we have a wonderful infrastructure maintained by thousands of wonderful volunteers helping millions of developers; but we urge you not to use it if you care about your butt”.

More seriously, checking each pip package individually is almost impossible. It is very hard to trust code you didn’t write. So we have two alternatives: moving toward a community where we can trust each other, or moving toward no longer using PyPI.

I would prefer the first one. More generally, even if things are difficult, I think the Python community as a whole should keep in mind that it is important to remain a safe place to stay (and not a place people are glad to leave).

Helping pypi admins to detect squatting occurrences

See: github .com/pypa/warehouse/issues/4998
And more generally: github .com/python/request-for/blob/master/2019-Q4-PyPI/RFP.md#milestone-2—systems-for-automated-detection-of-malicious-uploads
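
As a rough illustration of the kind of check that could help admins, the sketch below flags a newly registered name that is within one edit of a popular project name. The list of popular names and the distance threshold are assumptions made for the example; this is not how warehouse works today, and a real detector would need far more signals.

```python
import re
from typing import Iterable, List


def normalize(name: str) -> str:
    """Normalize a project name the way PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep)
            ))
        previous = current
    return previous[-1]


def lookalikes(candidate: str, popular: Iterable[str],
               max_distance: int = 1) -> List[str]:
    """Return the popular names that the candidate could be mistaken for."""
    wanted = normalize(candidate)
    return [
        name for name in popular
        if 0 < edit_distance(wanted, normalize(name)) <= max_distance
    ]


# Hypothetical list of popular projects, for illustration only.
POPULAR = ["requests", "numpy", "django"]
print(lookalikes("request", POPULAR))
```

A name that matches could be held for manual review rather than rejected outright, since legitimate projects can have similar names too.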

Make a policy against convenience typosquatting

If I understood correctly, the request package was a harmless package for a long time. The owner deleted it, and only afterwards was the name squatted.

IMHO, even convenience typosquatting is a bad idea. Here I see two cases:

  • If the typosquatting is done by the owner of the original package, it is probably still a bad idea, but quite safe.
  • Otherwise it should be forbidden by policy and enforced by rejecting the creation. Unfortunately, an informal convenience hack made by an isolated person has a good chance of being abandoned and, in the long term, of leaving the name in malicious hands.

A different owner means a different package

Note: I didn’t test the current behavior in this respect.

Imagine a package owner abandons it. If another owner makes a new release, the old release should by default not be upgradable to the new one. Maybe the package’s HTML page and any published description should also be prefixed with a notice explaining that the name has been recycled.

For legitimate ownership changes, this behavior could be avoided by a transfer procedure (similar to DNS name transfers) and/or by using multiple owners for a package (who could then be added or removed dynamically).
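
The rule could be sketched as follows, assuming the installer knew who owned the project at the time of each release. The Release record and the transferred flag are hypothetical; they are not part of any existing pip or warehouse API.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Release:
    """Hypothetical record: a version plus the owners at release time."""
    version: str
    owners: FrozenSet[str]


def upgrade_allowed(installed: Release, candidate: Release,
                    transferred: bool = False) -> bool:
    """Allow an upgrade only if ownership is continuous.

    Continuity means either an explicit transfer procedure took place,
    or at least one owner is common to both releases.
    """
    if transferred:
        return True
    return bool(installed.owners & candidate.owners)


old = Release("1.0", frozenset({"alice"}))
team = Release("1.1", frozenset({"alice", "bob"}))
recycled = Release("2.0", frozenset({"mallory"}))

print(upgrade_allowed(old, team))      # shared owner: allowed
print(upgrade_allowed(old, recycled))  # no shared owner: blocked by default
```

With multiple owners, adding "bob" before "alice" leaves keeps the chain unbroken, which is the dynamic add/remove case mentioned above.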

Use redemption delays

After a package is abandoned, its name should be blocked for some time. This would not protect future new installations, but it should protect pip upgrades (assuming pip allows upgrading across different owners) and users who may not have noticed that the package’s owner and/or purpose has changed.

Again, for legitimate ownership changes, this behavior could be avoided by a transfer procedure and/or by using multiple owners.
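
A minimal sketch of such a rule, assuming the index records the date on which a name was abandoned. The one-year delay is an arbitrary value chosen for the example, not a proposal for the actual duration.

```python
from datetime import date, timedelta

# Hypothetical redemption delay; the right value is a policy question.
REDEMPTION_DELAY = timedelta(days=365)


def name_reusable(abandoned_on: date, today: date,
                  transferred: bool = False,
                  delay: timedelta = REDEMPTION_DELAY) -> bool:
    """Tell whether an abandoned name may be registered by a new owner.

    An explicit transfer bypasses the delay; otherwise the name stays
    blocked until the delay has fully elapsed.
    """
    if transferred:
        return True
    return today - abandoned_on >= delay


print(name_reusable(date(2020, 7, 1), date(2020, 7, 31)))  # still blocked
print(name_reusable(date(2020, 7, 1), date(2022, 1, 1)))   # delay elapsed
```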

Improving the trustworthiness of PyPI uploaders

PyPI isn’t the first open community in the free software movement. As an example, Debian is an open community involving thousands of people around the world. To my knowledge, there are almost no cases of malicious software installed from Debian’s repositories. I could say the same about Fedora, Gentoo, etc. In comparison, PyPI looks like a malware nest.

I am not a specialist in how Debian is organized, but I suppose there are lessons to learn from it and from other open organizations in the free software movement:

  • package signing
  • web of trust
  • developer co-optation
  • package review?

I imagine PyPI cannot move to such an organization overnight, and maybe does not wish to make trust/co-optation a prerequisite for the right to publish.

Still, one can imagine PyPI progressively adopting package signing and owner trust as good practices to be enforced. As a first step, trusting owners could rely not only on cryptographic co-optation but also on a score computed from chosen parameters (packages published and their popularity, seniority, team membership, co-working distance to trusted owners, etc.). Pip could have a default option --enable-untrusted which, after some years, could become --only-trusted.

The future is open.
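
To show what such a scoring might look like, here is a toy sketch that uses three of the parameters listed above. Every weight, cap and threshold is invented for the example, and the Uploader record is hypothetical; choosing real values would be a policy decision.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Uploader:
    """Hypothetical view of an uploader, for illustration only."""
    packages_published: int
    account_age_years: float
    hops_to_trusted_owner: Optional[int]  # None: no known link


def trust_score(uploader: Uploader) -> float:
    """Combine the parameters into a score between 0 and 1.

    All weights below are arbitrary assumptions.
    """
    score = 0.0
    score += min(uploader.packages_published, 10) * 0.03  # at most 0.3
    score += min(uploader.account_age_years, 5) * 0.06    # at most 0.3
    if uploader.hops_to_trusted_owner is not None:
        # Closer co-working distance to a trusted owner counts more.
        score += 0.4 / (1 + uploader.hops_to_trusted_owner)
    return score


def is_trusted(uploader: Uploader, threshold: float = 0.5) -> bool:
    return trust_score(uploader) >= threshold


newcomer = Uploader(packages_published=1, account_age_years=0.1,
                    hops_to_trusted_owner=None)
veteran = Uploader(packages_published=25, account_age_years=8,
                   hops_to_trusted_owner=1)
print(is_trusted(newcomer), is_trusted(veteran))
```

A score like this can be gamed, for example by publishing many empty packages, so it would complement cryptographic co-optation rather than replace it.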

Internalize a vulnerability audit in pip

Even if it serves different goals, pip should also internalize a “safety-like” check. This is done e.g. by npm with the --audit option, which is triggered at least after each install.

I have somehow encountered this package (indirectly) as well. So here are some links and keywords to hopefully help people who are looking up this issue connect the dots…
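
A very small sketch of such a check: compare the distributions installed in the current environment with a list of names known to have been removed for malware. The list itself is hypothetical, since no such public list exists today; the rest relies only on the standard library.

```python
import re
from importlib import metadata
from typing import Iterable, List, Optional


def normalize(name: str) -> str:
    """Normalize a project name the way PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def audit(known_bad: Iterable[str],
          installed: Optional[Iterable[str]] = None) -> List[str]:
    """Return the installed project names that appear in known_bad.

    When `installed` is not given, the current environment is inspected.
    """
    if installed is None:
        installed = [dist.metadata["Name"]
                     for dist in metadata.distributions()]
    bad = {normalize(name) for name in known_bad}
    found = {normalize(name) for name in installed if name}
    return sorted(found & bad)


# Hypothetical list of removed malicious packages, for illustration only.
KNOWN_BAD = ["request"]
for name in audit(KNOWN_BAD):
    print(f"WARNING: '{name}' was removed from the index for malware")
```

Run after each install, this would have told me about the problem as soon as the name was listed, instead of about 20 days later.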

Reports of known malware should go directly to the security team (see https://www.python.org/dev/security).

This thread is for discussing general mitigations or responses.

There are several different possibilities which you might call ‘abandoning’ a package:

  1. The owner stops working on the package, and no-one else can make a new release.
  2. The owner adds additional uploaders who they trust. You may want to be careful in that case, but many popular projects have releases made by different people in a team, so it’s probably not practical to warn by default every time the uploader changes.
  3. The owner has completely deleted a package from PyPI, and someone else has claimed the name. It would certainly make sense to be concerned in that case, but as far as I know it’s vanishingly rare for someone to delete a package.

The central difference between PyPI & Debian is that anyone can upload packages to PyPI. Debian strictly manages a list of people who are trusted to upload packages, and getting onto that list is non-trivial. I don’t think PyPI should or will go in that direction.

If you accept that PyPI is an open repository - where anyone can upload a package so long as they pick a new name - then adding package signatures isn’t the obvious win it might sound like. It’s not totally meaningless, and PEP 480 proposed a signing scheme, but it’s a lot of work for something that wouldn’t prevent the obvious and simple attacks, where the attacker simply tries to trick people into making a mistake and installing a malicious package.

Btw, as far as I know, that’s the way the request package became malicious around 31 July.

Personal experience

It doesn’t seem that rare to me. I am not sure what happened exactly in the current case (request). But I certainly now “own” a PyPI name that had seen at least 1 release before I took ownership of it (I didn’t know it had had a previous owner and had already seen a release before, but I can see it in the project’s journal on PyPI). Now as far as I understood, existing but seemingly abandoned projects are carefully reviewed by PyPI’s maintainers before being handed over to a new owner (PEP 541; there are quite some cases, see: PEP 541 requests on GitHub). So I can only assume that this project I now own has been considered as having no dependents and safe to hand over.

Opinion

I believe this can of course be mitigated by pinning versions (and using hashes).
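
For reference, pip supports this through its hash-checking mode: each line of the requirements file pins a version and carries a `--hash=sha256:...` option, and `pip install --require-hashes -r requirements.txt` refuses anything that does not match. The underlying check is simple, as this minimal sketch shows (the function names are mine, not pip's):

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pin(path: str, expected_sha256: str) -> bool:
    """Tell whether a downloaded archive matches the pinned digest."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

Note that this protects against a pinned release being swapped later. It does not help on the first install if the typo is already in the requirement name.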

I do not know if transferring ownership is the right thing to do at all.

I feel like (and I believe it has been discussed in other threads already), using namespaces would add another level of mitigation (à la user/foo, organization/foo, company/foo, malware/foo; although it would be easy to come up with ogranization/foo which would only move the typo slightly to the left). But for it to be possible, I believe Python’s import mechanisms should be adjusted as well (probably quite a lot of work, if even possible at all, or maybe – wishful thinking – it could be as simple as enforcing import organization.foo vs. import ogranization.foo; and even more unrelated wishful thinking import organization.foo['1.2.3'] as foo123 for multiple versions of the same project in a single environment).

Questions

I wonder what kind of processing happens on PyPI’s side when distributions are uploaded. Would it be possible, sensible, and helpful to run some kind of malware detection (bandit?) on incoming distributions?
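
To give an idea of what the simplest form of such a scan could look like, here is a sketch that walks the syntax tree of a setup.py and reports a few constructs often seen in malicious install scripts. It is far cruder than bandit, the two lists are arbitrary assumptions, and it says nothing about what PyPI actually runs on uploads.

```python
import ast
from typing import List, Tuple

# Arbitrary, incomplete lists chosen for the example.
SUSPICIOUS_CALLS = {"exec", "eval", "compile", "__import__"}
SUSPICIOUS_MODULES = {"socket", "subprocess", "urllib", "base64"}


def suspicious_findings(source: str) -> List[Tuple[int, str]]:
    """Return (line number, description) pairs, sorted by line."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in SUSPICIOUS_CALLS):
            findings.append((node.lineno, f"call to {node.func.id}()"))
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in SUSPICIOUS_MODULES:
                    findings.append((node.lineno, f"import of {alias.name}"))
        elif isinstance(node, ast.ImportFrom) and node.module:
            if node.module.split(".")[0] in SUSPICIOUS_MODULES:
                findings.append((node.lineno, f"import from {node.module}"))
    return sorted(findings)


sample = "import base64\nexec(base64.b64decode(payload))\n"
for line, message in suspicious_findings(sample):
    print(f"line {line}: {message}")
```

Plenty of legitimate packages import subprocess in setup.py, so hits like these could only ever be a signal for human review, and a determined attacker can obfuscate around them.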

Links

Most likely, all of this has been discussed already. I’d be thankful for links.

I already found (and am currently reading) those:

I’m actually not sure what this thread/topic is for.

This seems to be listing a bunch of existing plans for how to improve PyPI and installers like pip and a few new ideas that… well, it’ll be easier to discuss them if they were brought up separately. Many of them already have corresponding issues on the issue trackers of the relevant projects. Others are likely better served as feature requests over at https://github.com/pypa/warehouse.

I suspect the thread is an attempt to try and collect some of those efforts in a central location :wink: It’s not that easy to find them all right now, which leaves the impression that we don’t know about them or aren’t investing in them.

So a communication issue, combined with (slow) open source timelines, rather than anything especially new.

I initially wrote that

I was hoping this would help to understand the big picture and to bring to light some effective measures to develop or try. Maybe I would have enjoyed contributing to them.

Several months later, I leave this discussion disappointed. My understanding is that distributing malware through pip is not considered a big problem (PyPI is open to all, including malware; users should take care), and this point of view is consistent with my own bad experience.

This is disappointing because from now on I will distrust a service I found very useful in the past.

From my point of view there are numerous aspects to the problem, including policy and organizational ones. The most trivial example is publishing a list of the malware packages that were removed after having been published (which is almost never done).

Knowing where to find the corresponding issues requires experience I don’t have (sorry for being a noob).

And more generally, going into details before sharing the big picture makes no sense to me.

Could you be more specific about what caused you to get that impression? We’re actively seeking funding to improve security on PyPI in several ways, including productionizing malware detection.

Facts:
Just one example: the request package was published containing actual malware. It was removed a few days later (once the malware had been discovered). During that time it was downloaded about twelve thousand times. There has been no attempt to record or announce anywhere on the internet that the package contained malware.

And it seems this is not an isolated case, and it is still going on as before.

Impression:
After I asked security@python.org about the latter, I was advised to post on this forum, as maybe the best place to discuss it. In the end there has been almost no discussion. I essentially learned that the problem is not new, that this is not the place to talk about problems, and that I should have filed issues on warehouse. I started this thread happy that my bad experience could help make PyPI better, but I actually leave it disappointed.

Opinion:
I am quite skeptical that automated malware detection can effectively mitigate the risk (but maybe I am wrong).