How to handle Security blocking PyPI.org

kgraham · January 5, 2023, 9:43pm

[It was suggested that I ask this here, so forgive me if this is the wrong place.]

Weird problem I ran into last month. I started having issues using pip to download modules at work.
I would get a series of WARNING messages that SSL certificate verify failed, and this message:
“There was a problem confirming the ssl certificate: HTTPSConnectionPool(host=‘pypi.org’, port=443)”

It took a while, but I found out that the Security Director here blocked pypi.org on the Umbrella (Cisco’s OpenDNS) because there were “29 malwared malicious modules” at the site. As far as I can tell that was from last fall, and all those modules have been removed.

My question is, what can I do about correcting his thinking on this? Is there any simple to understand policy or plan going on so that I can put his mind at ease and unblock pypi?

Does anyone else have this situation at their work?

Thanks,
Kirk (noob)

brettcannon · January 5, 2023, 10:26pm

Any malicious projects found on PyPI are taken down immediately, and there are guidelines on how to report them at Security · PyPI.

My first question would be whether they block every project index out there (e.g., npm, crates.io, etc.), as they all have the same problem? Or what about GitHub? I mean where does the line get drawn for protecting you from potentially malicious code?

My follow-up is whether they expect you to use any open source Python code at all. If so, how are you supposed to get that code? Straight from the repositories? I know lots of large companies that ban pulling directly from code indexes like PyPI, but those are large companies with dedicated teams to get the source, store it internally, do their own builds of wheels, etc. If you block access to what the projects provide, you have to be up for doing all the work they do to get you those files.

kgraham · January 5, 2023, 10:31pm

Yes, they blocked the entire pypi.org URL. They’re talking about blocking GitHub, but at least they are talking about that one. I pointed out that all compromised modules were deleted already and got nowhere.

One answer I was given was to buy non-open source software to do what I want to do. Which is kinda stupid as nearly everything has open source built into it these days. No way to avoid it.

Basically Python is dead at our company. I am a novice programmer and don’t (yet) know how to get around not being able to reach pypi, so I’m asking other programmers what they’re doing. Maybe a united front?

Beyond frustrating. Can I point pip to other resources? Seems to default to pypi.

Rosuav · January 5, 2023, 10:32pm

It also assumes that pay-for software is perfect. From experience, I can assure you that this is not the case.

kgraham · January 5, 2023, 10:45pm

Yes, I agree. APT Hackers just attacked Excel, but we can still use Excel in-house.

I get the pressure he’s under, but paranoia isn’t a winning strategy.

I guess I could dig out my old FORTRAN book and try writing in that. Security through obscurity!

brettcannon · January 5, 2023, 10:56pm

That’s if you can find it already available, else you’re hiring someone to build the bespoke software for you.

It’s on Mars. If it’s good enough for NASA isn’t it good enough for you?

PyPI is only the default; you can point pip at any index you want as long as it implements the appropriate API.
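For a sense of what that API involves: the baseline is the Simple Repository API from PEP 503, where each project has an HTML page whose anchor tags link to the downloadable files. Here is a minimal sketch of reading such a page. The sample HTML, host name, and file names are made up for illustration, and real indexes carry more attributes than this handles.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class SimplePageLinks(HTMLParser):
    """Collect (filename, url, hash_fragment) from a PEP 503-style project page."""

    def __init__(self):
        super().__init__()
        self.files = []
        self._href = None  # href of the anchor we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        # The anchor text is the file name; the href is where to download it.
        if self._href is not None and data.strip():
            url, fragment = urldefrag(self._href)
            self.files.append((data.strip(), url, fragment))
            self._href = None

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None


# Invented sample page; a real index would serve this at /simple/<project>/
SAMPLE_PAGE = """
<!DOCTYPE html>
<html><body>
<a href="https://files.example.invalid/demo-1.0.tar.gz#sha256=abc123">demo-1.0.tar.gz</a>
<a href="https://files.example.invalid/demo-1.0-py3-none-any.whl#sha256=def456">demo-1.0-py3-none-any.whl</a>
</body></html>
"""

parser = SimplePageLinks()
parser.feed(SAMPLE_PAGE)
for name, url, fragment in parser.files:
    print(name, url, fragment)
```

Any server that publishes pages of this shape can, in principle, stand in for PyPI as far as pip is concerned.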

mattip · January 5, 2023, 11:20pm

There are tools to manage the complexity of vetting external software. The best known of these is Artifactory, but there are others. The security team will need resources to implement full lifecycle management of software parts (similar to how manufacturing firms manage part validation). If the third-party components (open source or proprietary) will become part of a product (and not just build tools or testing/QA tools), the product managers will have to learn to manage a software bill-of-materials catalog of all the components they use, the versions, and the licenses. If management is not willing to allocate resources to these efforts, you will have a hard time moving forward.

In my experience this is not a simple process, especially if there is no existing culture of outsourcing software components. It requires a lot of convincing, and a feel for how the wind blows in your company. Can you find a high-level management patron who is willing to listen to the advantages of Open Source? Are there conferences for your industry where you can reach out to like-minded people and form a consortium for “Open Source in the … Industry”? Perhaps such an organization already exists. I was lucky to be in the right place at the right time to effect this change in a large bureaucratic manufacturing firm, and the end result was very satisfying. It took a multi-year effort though, so don’t give up quickly. There are many resources on-line about how Open Source is eating the software world that you could put on your company blog, and many speakers who would be willing to come give a talk about moving to using Open Source at your periodic company “external lecture” events.

kgraham · January 5, 2023, 11:39pm

Thanks everyone, but the Security Director guy is the only one that’s making the decisions and right now he’s being a complete pest. I’d blame it on him being a boomer, but he’s a bit younger than me.

He listens to outside vendors that tell him “oh that’s bad” and then assumes that must be true. Never mind that they are selling something. I guess it gives him CYA.

I am trying to get another Director that has programmers using python in the same situation to join the fight.

mwichmann · January 6, 2023, 12:23am

There’s little point in us giving you arguments in favor because as you’ve described it, you have no leverage. Companies that intend to use open source software ought to have an Open Source Program Office (maybe not named so, and maybe just a single individual’s role, depending on circumstances) which works with stakeholders to develop policies and strategies for the various things that should or need to happen to enable such usage. The Security Director has a specific remit, and it’s not “let’s make things easy for developers”, so things can’t reside there alone. That’s a completely different thread (organizations exist to help with this, like the TODO Group - see https://todogroup.org for them - and others, if your org wants to go that way). But that’s not going to help you in the short term.

If you think you can proceed in the face of a block of a specific domain without violating a broader policy, you might consider that the actual packages (wheels and sdists) are not hosted on pypi.org so if you have another way to find the links, like say, a personal device, you may be able to download the things you need anyway, and you can install from those packages directly. This is NOT advice to do so: you have to figure out what is acceptable to do in your organization and what is not.
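If that route turns out to be acceptable where you work, pip can install from already-downloaded files without contacting any index, using its `--no-index` and `--find-links` options. A minimal sketch follows; the `./downloads` directory and the `requests` package name are placeholders.

```python
import sys


def offline_install_command(package, directory):
    """Build a pip command that installs only from files in a local directory."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index",               # never contact PyPI or any other index
        "--find-links", directory,  # look for wheels/sdists in this directory
        package,
    ]


command = offline_install_command("requests", "./downloads")
print(" ".join(command))
# To actually run it: subprocess.run(command, check=True)
```

Dependencies are not fetched from anywhere else either, so every wheel or sdist the package needs has to be in that directory too.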

kgraham · January 6, 2023, 12:25am

I was looking for any others that had run into a similar situation and what they were doing to resolve it.

fungi · January 6, 2023, 12:36am

Not quite a “boomer” but as someone with decades in the information
security profession, it sounds like your company needs to reconsider
this individual’s qualifications. Security is always a balance
between safety and convenience. Take away enough convenience and you
can make the company almost completely safe, but also out entirely
of business.

kgraham · January 6, 2023, 12:37am

I couldn’t agree more. Frustrating to say the least.

mwichmann · January 6, 2023, 12:46am

I have been in the situation - working for a large company whose IT department occasionally blocked domains for “security reasons” that had a substantial impact on certain developers. My point was that you’re unlikely to be able to battle this on your own. There has to be some kind of corporate buy-in to a strategy of using open source, for the myriad reasons that exist to do so, and then decisions can be measured against that, and not just “oh, we heard there were some trojaned packages once hosted on that site, so it’s blocked”. Wish the news could be brighter.

Rosuav · January 6, 2023, 12:48am

Sorta-kinda. In practical terms, what usually happens is: Take away enough convenience and people will do whatever it takes to get around your security barriers, usually weakening security massively in the process. (For instance: “Your password must contain one uppercase, one lowercase, one digit, one symbol, and at least three characters that only exist in codepage 437” - which basically amounts to “keep your password in clear text where you can paste it in”.)

fungi · January 6, 2023, 1:07am

Yes, or when it comes to the specifically identified inconvenience
in this case: downloading through covert VPNs, tethering to their
cell phones, and so on.

uranusjr · January 6, 2023, 10:40am

There are also various PyPI mirrors out there you can use. There’re a ton of them in China (basically your same reason but worse), for example. You’ll need to decide on your own whether to trust them, but they’re not fundamentally worse than VPNs from a technical view. Those can be used with --index-url.
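A minimal sketch of that, where the mirror URL is a placeholder and not a recommendation of any real mirror:

```python
import os
import sys


def mirror_install_command(package, index_url):
    """Build a pip command that uses index_url in place of the default index."""
    return [sys.executable, "-m", "pip", "install", "--index-url", index_url, package]


# Placeholder URL; substitute a mirror you have decided to trust
MIRROR = "https://mirror.example.invalid/simple"

command = mirror_install_command("requests", MIRROR)
print(" ".join(command))

# pip reads the same setting from the PIP_INDEX_URL environment variable,
# which avoids repeating the flag on every invocation.
env = dict(os.environ, PIP_INDEX_URL=MIRROR)
# To actually run it: subprocess.run(command[:4] + ["requests"], env=env, check=True)
```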

steve.dower · January 6, 2023, 11:44am

Thanks for posting. As you can see, this is a far more interesting discussion than we’d have had on the smaller list.

Matti’s answer above is very good, and yeah, it’s not an easy challenge to win. Where I work we’ve largely outsourced this effort (to Anaconda, though there are a few other vendors) and contractually cover ourselves for a lot of the risk.

Where outsourcing isn’t an option, we just do the work ourselves, typically from PyPI sdists directly (which haven’t been totally blocked… yet… but I’m hoping to arrange an exception for the workaround builds since that’s the point!) but occasionally from forked repositories (where we need our own patches).

It sounds like paying a vendor is your best way forward. If the money can’t be found, using a mirror is a reasonable option - a tool like Azure Artifacts (if you are a Microsoft shop you may have this already) or Artifactory can also be a private and transparent mirror [1], and may provide additional security coverage [2].

[1] Meaning you get full access to PyPI on-demand, and then it’ll cache anything you’ve installed before, without you having to push the packages to the private repository yourself. The features in these two are known as “Upstreams” and “Remote Repositories”.
[2] Caching is one (you’ll have a copy of a malicious package you installed for later analysis), name blocking/filtering is another possibility, and I’ll be very surprised if there aren’t automated malware scans built in over the next couple of years.

barry · January 6, 2023, 6:19pm

+1 and that’s essentially what we do where I work too. Our CI machines don’t have access to the internet, and thus PyPI and other language repositories are off limits. They’re accessible from local dev machines, which does sometimes cause failures in CI. We have a process for mirroring external packages into our internal repositories (for the most part, Artifactory), which involves security and license scanning before the import is allowed. We import sdists by default, so this workflow gives us a hook to build wheels for our internal downstream consumers. We do have a process for manually importing binary wheels when building from sdist is problematic. It’s a fair bit of machinery, but it all works pretty well and we use essentially similar processes for other language ecosystems.

smontanaro · January 6, 2023, 7:01pm

Same at my last job. Our network admin teams (or some subset thereof) were responsible for keeping the local cache in sync. That meant they could (I presume) exclude packages which were problematic. In fact, I suspect the OP’s security folks could probably start with an empty cache and a known blacklist, then only whitelist packages as deemed necessary. (That said, I don’t know the ins and outs of Artifactory. I could certainly be way off-base on this.)

steve.dower · January 6, 2023, 11:55pm

There’s no “known blacklist”, because everything we know about has already been removed and blocked from PyPI. But yes, manually approving each package that may be used is the ideal (depending on how thorough this approval is… but anything is better than nothing) and being able to later remove the package from internal access (quicker than from PyPI, if necessary) is better.
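As a rough sketch of what a per-package approval check could look like (this is not how Artifactory or any particular product does it; the approved names are invented, and only the name normalization follows the rule given in PEP 503):

```python
import re


def normalize(name):
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Hypothetical allowlist maintained by whoever approves packages
APPROVED = {normalize(n) for n in ["requests", "NumPy", "typing_extensions"]}


def is_approved(project_name):
    """Return True if the project, after normalization, is on the allowlist."""
    return normalize(project_name) in APPROVED


for candidate in ["Typing.Extensions", "numpy", "reqeusts"]:
    print(candidate, is_approved(candidate))
```

Comparing normalized names matters because pip treats `typing_extensions` and `Typing.Extensions` as the same project, while a misspelling such as `reqeusts` is a different project altogether.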

One downside here is that most packages removed from PyPI are taken down silently, so you may end up keeping them around for longer. But only if you happened to try to install it while it was available, in which case you’re already compromised and it’s likely not getting any worse. I’m still quietly hopeful that PyPI will come up with some mechanism for publishing taken down packages so that mirroring tools can also do it (potentially more aggressively, if they want), but haven’t been pushing on this recently.