How to Ensure the Security of Libraries during Installation

Vincent · May 18, 2023, 1:30am

I have a question about library security. When installing a library, how can we make sure that the code being installed has no security problems? Are there tools or methods that can check a library's security before installation?

Rosuav · May 18, 2023, 2:08am

How do you detect the security of anything? Your web browser? Your operating system? Your BIOS? Your CPU?

Ultimately - you don’t. You have to trust that, with enough eyeballs on the code, it’s probably safe. This is easier to believe when there are more eyeballs, and that works better with some projects than others, but at no point can you ever be truly sure.

Whatever software you’re using - whether it’s a library or an application - you have to figure out what level of confidence you’d be happy with. What kind of assurances would lead you to buy a piece of software from a vendor? What would let you buy a CPU? Do you get into a car, knowing that it has a computer inside it, which has software that is potentially buggy or even malicious?

Every situation has a threshold. It’s up to you to decide what that threshold is, and it doesn’t have to be the same for everything. Maybe you’re not really sure about this particular library, so you decide NOT to install it in the usual way, but to first isolate it in a virtual machine. Or maybe, after reading over the list of contributors and eyeballing some of their release notes, you decide that any problems it has will affect far more people than you, and would most likely have been found already.

It’s worth noting, by the way, that the PyPI admins are pretty vigilant about outright malicious packages (typosquatting, packages that exist solely to deliver malware, etc.). I believe someone said recently that malware packages generally disappear within a few days of initial upload. But it’s still worth double-checking that you’re getting what you think you are.
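
As a rough sketch of that kind of double-check, the snippet below flags a requested package name that is suspiciously close to, but not the same as, a name you already trust. The `KNOWN_PACKAGES` list, the `possible_typosquat` helper, and the 0.8 similarity cutoff are all made up for illustration; this is not how PyPI itself detects typosquatting.

```python
import difflib

# Hypothetical allow-list: the names you actually intend to depend on.
KNOWN_PACKAGES = ["requests", "numpy", "pandas", "urllib3"]


def possible_typosquat(name, known=KNOWN_PACKAGES, cutoff=0.8):
    """Return the trusted name that `name` closely resembles, or None.

    An exact match is fine; a near miss is worth a second look.
    The cutoff is an arbitrary assumption, not a recommended value.
    """
    normalized = name.lower().replace("_", "-")
    if normalized in known:
        return None
    matches = difflib.get_close_matches(normalized, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for candidate in ("requests", "reqeusts", "flask"):
        print(candidate, "->", possible_typosquat(candidate))
```

A hit only means "look again before installing"; plenty of legitimate packages have similar names.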

Vincent · May 18, 2023, 3:28am

Thank you, Chris! Your answer really impressed me. You mentioned using virtual machines for isolation in your response. I’m wondering whether a conda environment can achieve a similar isolation effect.
Another point that puzzles me is that you mentioned installing libraries is like installing software. However, security-conscious companies often consider security checks for software installation packages. Is there a way to also check the source code of the libraries? What’s frustrating is that I can’t fully understand the source code, and even if I could, it would take a long time and be quite inefficient.

Rosuav · May 18, 2023, 3:48am

No, a conda environment doesn’t give you that kind of isolation; one potential attack vector is the installation process itself - while a Python package is being installed, its setup script can do whatever your user is permitted to do. So one possibility would be running the installation as a restricted-permissions user, although that is probably not all that useful; for full safety, complete isolation would be required.

It’s worth noting that this would be a measure of extreme paranoia though. It’s usually much more practical to examine the source code of the package, and to have a look at the project’s safety record.
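
One cheap first step when examining a package is to look at what its source archive contains before anything gets a chance to run. The sketch below assumes you already have a `.tar.gz` source distribution as bytes; `install_time_scripts` is a hypothetical helper, and it only checks for `setup.py`, so treat it as a starting point rather than a scanner.

```python
import io
import tarfile


def install_time_scripts(sdist_bytes):
    """List setup scripts inside a .tar.gz source distribution.

    Only the archive index is read; nothing is extracted or executed.
    """
    found = []
    with tarfile.open(fileobj=io.BytesIO(sdist_bytes), mode="r:gz") as archive:
        for member in archive.getmembers():
            # Compare the final path component, e.g. "pkg-1.0/setup.py".
            filename = member.name.rsplit("/", 1)[-1]
            if filename == "setup.py":
                found.append(member.name)
    return found
```

Any file it reports is the one to read first, since that is the code that would run with your user's permissions during installation.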

Yes, libraries are software like any other. Some might not get a chance to execute until you link against them and run your app, others get to do things as they’re being installed. Either way, though, they’re fully-privileged software.

And it generally IS possible to check the source code - or at least, the purported source code - for most Python packages. Check the PyPI page for a link, often to the project’s GitHub repository or other hosted source code.
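
The same links can be pulled out programmatically. This is a minimal sketch that assumes the JSON metadata PyPI serves at `https://pypi.org/pypi/<name>/json`, with its `info.project_urls` and `info.home_page` fields; `source_links` and `fetch_metadata` are names invented here. The links are whatever the uploader declared, so they point at the purported source and nothing more.

```python
import json
import urllib.request


def source_links(metadata):
    """Collect the project links declared in a PyPI JSON metadata dict."""
    info = metadata.get("info", {})
    # project_urls may be missing or null for some packages.
    links = dict(info.get("project_urls") or {})
    if info.get("home_page"):
        links.setdefault("Homepage", info["home_page"])
    return links


def fetch_metadata(package):
    """Download a package's metadata from PyPI (needs network access)."""
    url = f"https://pypi.org/pypi/{package}/json"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    for label, link in source_links(fetch_metadata("requests")).items():
        print(f"{label}: {link}")
```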

Indeed, reading it all yourself would take far too long, which is the problem with ALL software. There is no way in the world that I’m going to read through every line of Firefox’s source code before firing up a web browser, which means that the trust I place in the Mozilla community is just as serious as the trust that I place in any other software company or organization. Do you use Google Chrome? Then you have to trust Google. Do you use Safari? Then you have to trust Apple.

There’s a bit of a difference with open source, in that the sheer number of potential eyeballs can be far higher, but it’s sobering to note just how few experts there are in areas like cryptography; bugs in major libraries like OpenSSL can lie there for years and years before someone notices them. Still, I prefer to trust the people behind OpenSSL to have done their due diligence in ensuring that the code is non-malicious, and that’s usually the most important threshold to cross.