How to handle Security blocking PyPi.org

ashishbijlani · January 8, 2023, 5:02pm

I’m building Packj [1] to quickly scan packages for “risky” attributes, such as expired account emails, spawning of a shell, access to sensitive files (e.g., SSH keys), decode+exec behavior, and mismatches between GitHub code and packaged code (provenance). While it cannot detect all malware, it can accurately point out risky code (down to the line number). Hopefully, this will be helpful for developers and security vetting teams. We found and reported a bunch of malicious packages on PyPI in 2021/22. I’d love to integrate this into PyPI if at all possible.

[1] GitHub - ossillate-inc/packj: The vetting tool 🚀 behind our "dependency firewall" to block malicious/risky open-source packages in your software supply chain

kgraham · January 9, 2023, 6:43pm

I have been off the discussion board for a few days. As the OP, I get the feeling that this is left up to those who want to use Python to figure out on their own. Over the holidays there was malicious code added to the PyTorch module on PyPI. That makes me think our Security Director is right. If there isn’t better security from PyPI and GitHub, those sites will be blocked by more and more companies. Open Source needs to be more secure. /sigh

oscarbenjamin · January 9, 2023, 7:19pm

This is not quite true as I understand it. The PyTorch module on PyPI was unaffected and no one installing it from PyPI was affected either. The problem affected people installing PyTorch’s nightly builds from PyTorch’s separate nightly index (i.e. not from PyPI).

The malicious package itself was uploaded to PyPI but would only be installed if you installed PyTorch from elsewhere. You can see the explanation from PyTorch folks here:

kgraham · January 9, 2023, 7:33pm

“PyTorch-nightly Linux packages installed via pip during that time installed a dependency, torchtriton, which was compromised on the Python Package Index (PyPI) code repository and ran a malicious binary.”

That sounds like something was compromised on PyPI last month.

That does not give me much confidence.

steve.dower · January 9, 2023, 7:45pm

The name on PyPI was previously unclaimed, which is how it was “compromised”. It had been used on a private index, but not on the public one, so this is simple dependency confusion.

In short, you should only ever use the --index-url option to pip, and never use --extra-index-url or --find-links without also using --no-index. [1]

There’s a bug against pip somewhere to fix how they handle multiple indexes to handle this case, but the maintainers are opposed to it, so it probably won’t happen. Switching to conda (which prioritizes feeds safely) is a viable option, if they include all the packages you want.

To date, only a single package has ever been truly compromised on PyPI, and it was via a password reset issue unique to that account. Nobody has otherwise replaced a legitimate package with an illegitimate one.

[1] And there’s no point mixing --extra-index-url with --no-index, so just use --index-url as originally suggested.
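For readers who want to see why mixing indexes is risky, here is a toy model in Python. It is only a sketch of the dependency confusion described above, not pip's actual code: the index contents, the version numbers and the `pick_candidate` helper are all made up for illustration. The one assumption it encodes is that candidates from every configured index are pooled and the highest version wins, with no preference for the index you "meant".

```python
# Toy model of dependency confusion. NOT pip's real resolver.
# All index contents and version numbers here are hypothetical.


def parse_version(text):
    """Turn '2.0.0' into (2, 0, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def pick_candidate(name, indexes):
    """Pool candidates for `name` from every index and return the highest version.

    `indexes` maps an index label to {package name: [version strings]}.
    Returns (index label, version string), or None if no index has the package.
    """
    pooled = [
        (parse_version(version), label, version)
        for label, packages in indexes.items()
        for version in packages.get(name, [])
    ]
    if not pooled:
        return None
    _, label, version = max(pooled)
    return label, version


private_index = {"torchtriton": ["2.0.0"]}
public_index_before = {}                          # name not yet claimed publicly
public_index_after = {"torchtriton": ["3.0.0"]}   # attacker claims the unused name

# Roughly "--index-url <private>": only one index is consulted.
only_private = pick_candidate("torchtriton", {"private": private_index})

# Roughly "--extra-index-url": both indexes are consulted and pooled.
mixed_before = pick_candidate(
    "torchtriton", {"private": private_index, "public": public_index_before}
)
mixed_after = pick_candidate(
    "torchtriton", {"private": private_index, "public": public_index_after}
)

print(only_private)
print(mixed_before)
print(mixed_after)
```

With a single index the private package is always chosen. With two indexes the result silently changes the moment someone uploads a higher version under the same name to the other index.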

dustin · January 9, 2023, 8:05pm

I’m not sure what you’re specifically referring to, but there has been more than one instance of “legitimate package replaced with illegitimate package”:

the ctx project, compromised via domain resurrection (Account Takeover and Malicious Replacement of ctx Project — Python Security 0.0 documentation)
exotel, spam and deep-translator projects, compromised via phishing attacks (https://twitter.com/pypi/status/1562442207079976966, https://twitter.com/pypi/status/1562544091719958528)

That said, blocking all of PyPI is definitely an overreaction here: following the pip documentation (Secure installs - pip documentation v23.3.2) to enable hash-checking mode is enough to mitigate these types of attacks (and other ‘malware’ attacks that depend on typosquats, etc).
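To make the idea behind hash-checking concrete, here is a minimal Python sketch. It is not how pip implements it (pip does this itself when requirements carry `--hash=sha256:...` entries and `--require-hashes` is in effect). It only shows the underlying check: record the SHA-256 of a file you trust, then refuse anything whose digest differs. The file name and contents are stand-ins invented for the demo.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path, expected_sha256):
    """True only if the file's digest matches the pinned one."""
    return sha256_of(path) == expected_sha256.lower()


# Demo with a stand-in file; a real check would use a downloaded wheel or sdist.
with tempfile.TemporaryDirectory() as workdir:
    artifact = os.path.join(workdir, "projectname-1.0.0.tar.gz")

    with open(artifact, "wb") as handle:
        handle.write(b"original release contents")
    pinned = sha256_of(artifact)          # the digest you would record up front
    ok_before = verify_artifact(artifact, pinned)

    with open(artifact, "wb") as handle:  # simulate a swapped file
        handle.write(b"tampered contents")
    ok_after = verify_artifact(artifact, pinned)

print(ok_before, ok_after)
```

The point of pinning a digest rather than just a name and version is that a different file, even one published under the same version, no longer passes.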

pf_moore · January 9, 2023, 8:16pm

The issue is “index-url extra-index-url install priority order” (Issue #8606 · pypa/pip · GitHub). The maintainers aren’t fundamentally opposed to the idea, it’s just that no-one has yet come up with a proposal that addresses all of the questions involved, and handles the transition in an acceptable manner. Feel free to read the issue for a lot of discussion on “why it isn’t quite as simple as people think”.

But to cut a long story short, this can happen if someone puts together a PR and a transition plan to implement it. But if it’s just left to the pip maintainers, it will probably be quite a long time before it happens (not least because we’re all burned out on the subject).

steve.dower · January 9, 2023, 8:23pm

ctx is the one I was thinking of, but I’d forgotten about these.

Still, phishing isn’t something that’s really PyPI’s fault, and all we can do is help teach our users how not to get tricked. So there’s no reason to count it against the service itself (by comparison, I know we cleaned up expired domains prior to ctx, so the fact that we didn’t have something in place to catch ones that expired since then could be put on us, which is why I changed my “not once” to “exactly once”).

Simple version pinning is enough, which is my point. Nobody has replaced a known version of a package with a different file, hash or otherwise. And there’s only been a very small number of cases where a trustworthy name became untrustworthy.

The vast majority are new names that have never been a useful package.

dustin · January 9, 2023, 9:08pm

This is getting a bit off topic, but: version pinning is not actually enough. If a project has only published projectname-1.0.0.tar.gz and an attacker comes along and publishes projectname-1.0.0-py3-none-any.whl, pip and other installers will start installing the latter instead, even though the version pin is unchanged.

The same is true for projects that publish only wheels, because there can be multiple “compatible” wheels published, and pip will prefer wheels with more specific tags over less specific tags.
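A small Python sketch of that selection behavior, for illustration only. This is a simplified model, not pip's real candidate evaluation: the file names are hypothetical, and `SUPPORTED_TAGS` is a made-up stand-in for the installer's list of compatible tags, ordered from most to least specific. It assumes every wheel passed in carries one of those tags.

```python
# Simplified model of "same version pin, different file". NOT pip's real code.
# Hypothetical list of compatible tags, most specific first.
SUPPORTED_TAGS = ["cp311-cp311-manylinux_2_17_x86_64", "py3-none-any"]


def parse_version(text):
    """Turn '1.0.0' into (1, 0, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def describe(filename):
    """Return (version tuple, preference rank); a higher rank is preferred."""
    if filename.endswith(".tar.gz"):
        # sdist: name-version.tar.gz, ranked below every compatible wheel
        stem = filename[: -len(".tar.gz")]
        version = stem.rsplit("-", 1)[1]
        return parse_version(version), -len(SUPPORTED_TAGS)
    # wheel: name-version-pythontag-abitag-platformtag.whl
    stem = filename[: -len(".whl")]
    _name, version, tag = stem.split("-", 2)
    return parse_version(version), -SUPPORTED_TAGS.index(tag)


def pick_file(filenames):
    """Pick the file this toy installer would use for a pinned version."""
    return max(filenames, key=describe)


sdist = "projectname-1.0.0.tar.gz"
generic_wheel = "projectname-1.0.0-py3-none-any.whl"
specific_wheel = "projectname-1.0.0-cp311-cp311-manylinux_2_17_x86_64.whl"

only_sdist = pick_file([sdist])
after_wheel_upload = pick_file([sdist, generic_wheel])
after_specific_upload = pick_file([sdist, generic_wheel, specific_wheel])

print(only_sdist)
print(after_wheel_upload)
print(after_specific_upload)
```

The pin `projectname==1.0.0` is satisfied in all three cases, yet the file that gets installed changes each time a "better" file for the same version appears. That is the gap a hash pin closes and a version pin does not.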

oscarbenjamin · January 9, 2023, 11:23pm

It’s quite a subtle point here but also very important: had you been installing pytorch from PyPI then you would not have been exposed to this.

A malicious package was uploaded to PyPI under the name torchtriton which was not previously used as a package name in PyPI. The word “compromised” above was poorly chosen because there had been no previous torchtriton package in PyPI that was compromised and no one had their PyPI account compromised or anything like that. Rather a new package was uploaded to PyPI with a previously unused package name. That name matched a package name that also existed in pytorch’s own index though (outside of PyPI).

The vulnerability was that once the torchtriton package was added to PyPI it might be installed by default by pip in a situation where the user tried to install the pytorch nightly package (from outside of PyPI). The pytorch package in PyPI did not ever depend on this torchtriton package and so no one installing the ordinary pytorch package from PyPI was affected.

Ihavenofear · August 18, 2023, 2:06pm

I might be late to this issue, but I hope you have found a way to “bypass” the security mayhem.

Since a few high-profile hacks took place, the security team in my company has become so self-entitled, arrogant even. They implemented all sorts of security measures and totally disregard other people’s work, e.g. ours as developers. I can say I feel your pain.

I have tried most if not all of the solutions proposed on the internet over the last year or two. Many claimed to work, but none of them worked in my situation. I am facing the same issue with NuGet, as I am also a .NET developer.

Anyway, I haven’t got a “solution” to this “issue”, but my “workaround” is to manually download and install the package, including its dependencies. It is very painful, but at least I can get my work going, and I only need to go through it once, though for every package…
Because not every package documents its dependencies well, I usually download the package as a wheel file first. I try installing it locally with pip install; if I am missing a dependency, it shows up in the error message. I then download and install that package, and repeat until I can install a child package successfully, then work backwards. Sometimes the wheel file doesn’t work; in that case I download the .gz and extract it, and there will be a setup.py file, so I run python setup.py install to install it.
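One possible way to shorten that loop, sketched in Python under some assumptions: once all the downloaded wheels and .tar.gz files sit in one folder, pip can be told to resolve dependencies from that folder only, using `--no-index` together with `--find-links`. The folder name `./downloads` and the package name `requests` are just examples, and `offline_install_command` is a hypothetical helper. The sketch only builds and prints the command; running it is left commented out.

```python
import shlex
import sys


def offline_install_command(requirement, download_dir):
    """Build a pip command that installs only from files in `download_dir`."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index",                  # never contact PyPI or any other index
        "--find-links", download_dir,  # look for wheels/sdists in this folder
        requirement,
    ]


# Example values only: substitute your own package and folder.
command = offline_install_command("requests", "./downloads")
print(shlex.join(command))

# To actually run it:
# import subprocess
# subprocess.run(command, check=True)
```

If a dependency is missing from the folder, pip still fails and names it, but everything already downloaded is picked up in one go rather than installed by hand one package at a time.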

Another way: if you can copy files from other computers, say a home computer, then you are in a better place (not in my case). You can install the package there and copy the files over to your work computer. I can’t verify this approach, since copying files from my home computer to my work computer is blocked.

Python packages aren’t too bad; most of them don’t have that many dependencies. NuGet packages, however, are a different story.

kgraham · August 18, 2023, 2:45pm

It is a shame that wheels don’t just work right.

Hopefully there can be something done to prevent malware modules.

I know there is minimal staff at PyPI. But for Python to be secure, something needs to be done.

Thank you,

Kirk!

merwok · August 18, 2023, 3:41pm

What do you mean?

PyPI team responds to malware reports, but they can only do so much.
Vetting software is a big responsibility, and for each company it requires time or delegating to a third party that sells the service.

That’s way too vague to be actionable. What precisely would you want to see?

Ihavenofear · August 21, 2023, 4:09am

Update: this probably does not apply to everyone who hits a security block when installing packages from PyPI.

In my company, they have actually set up an Artifactory repo and a proxy, so all I need to do is specify them:
pip install --user -i <index-url> --trusted-host <host> <package>

As mentioned above, it may not be applicable to everyone, but for some it hopefully gives a direction for solving this issue.

Mythobeast · August 29, 2023, 8:03pm

With all of this in mind, is there an initiative where a group of organizations might work together to create a list of reasonably vetted packages? Having everybody do their own investigations sounds like an immense duplication of effort. Is there perhaps a set of flags within PyPI that might be used as a proxy for this collaboration?

hugovk · August 29, 2023, 9:28pm

Google provides a service called Assured Open Source Software that does something like this: Assured Open Source Software | Google Cloud