I am a new Python user and accidentally pip-installed a library called “pgeocode”. The command was:
pip install pgeocode
I have been reading about the security risks associated with installing packages and realize that I was not aware of them when I downloaded this one.
Being new, I want to be cautious, but I cannot find much discussion of this library on internet forums that investigates the code for malicious activity. I do not know how to read the code myself, so I apologize in advance, but could you please read the code for “pgeocode” and let me know whether there are any security concerns associated with it?
Also, to avoid this in the future, could you please share any best practices for viewing a package’s code before installing it, so I can check it for malicious activity?
In general, malicious software on PyPI is very uncommon, though it isn’t completely unheard of. However, it doesn’t hurt to be a little cautious, especially with packages that have only a few downloads, have names similar to popular packages, are not open source, or have suspicious READMEs or links.
I took a look at this package and its code and while I’m no security expert, it looked perfectly innocuous to me. It does read from some URLs, but this is apparently necessary for its function.
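If you want a rough first pass over a file before reading it line by line, one option is to list the modules it imports, for example to see where it touches the network, like the URL reading mentioned above. Below is a minimal sketch using the standard-library `ast` module. It only parses the source and never runs it. The helper name `imported_modules` and the `NETWORK_OR_PROCESS` set are illustrative assumptions, not a vetted list. An empty result proves nothing, because code can also import modules dynamically.

```python
from __future__ import annotations

import ast


def imported_modules(source: str) -> set[str]:
    """Return the top-level module names imported by a piece of Python source.

    The source is only parsed, never executed. This is a crude reading aid,
    not a malware scanner.
    """
    tree = ast.parse(source)
    names: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # "import urllib.request, os" -> "urllib", "os"
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # "from urllib.request import urlopen" -> "urllib"
            # (relative imports such as "from . import x" are skipped)
            names.add(node.module.split(".")[0])
    return names


# Illustrative (hypothetical) set of modules worth a closer look when
# skimming unfamiliar code: networking and process/OS access.
NETWORK_OR_PROCESS = {"urllib", "requests", "http", "socket", "subprocess", "os"}
```

To use it, read a file's text and intersect the result with the set, e.g. `imported_modules(text) & NETWORK_OR_PROCESS`. Then read the places that use those modules.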
Realistically, as a new user you may find this somewhat difficult to judge. However, if you want to look at the code, just download the wheel or the sdist from the Files tab on the project’s PyPI page and open it with your favorite unarchiving tool. The code will be in there (exactly where depends on the distribution archive format, the age of the tool that built it, and how the original code was organized).
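If you would rather not rely on a separate unarchiving tool, Python’s standard library can list what is inside a distribution. Below is a minimal sketch that assumes the file has already been downloaded from PyPI. `list_distribution` is a hypothetical helper name. A wheel is a zip archive, and most sdists are .tar.gz files, so `zipfile` and `tarfile` can read the table of contents without extracting or running anything.

```python
from __future__ import annotations

import tarfile
import zipfile
from pathlib import Path


def list_distribution(path: str) -> list[str]:
    """List the file names inside a wheel (.whl) or sdist (.tar.gz / .zip).

    Only the archive's table of contents is read: nothing is extracted to
    disk and nothing inside the archive is executed.
    """
    p = Path(path)
    if p.suffix in {".whl", ".zip"}:
        # A wheel is an ordinary zip archive with a .whl extension.
        with zipfile.ZipFile(p) as zf:
            return sorted(zf.namelist())
    if p.name.endswith((".tar.gz", ".tgz")):
        with tarfile.open(p, "r:gz") as tf:
            return sorted(m.name for m in tf.getmembers() if m.isfile())
    raise ValueError(f"unrecognised distribution format: {p.name}")
```

Pass it the path of the file you downloaded. Once you spot a file of interest in a wheel, `zipfile.ZipFile(path).read(name)` returns its raw bytes so you can read it.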
You can also find the files pip downloaded on your local PC and read them. I expect that if you search your Python path, you will find a file called pgeocode.py, most likely in a directory called “site-packages”. That’s the source code.
It’s probably too late now: pip has already run the installer, so if it were malware, it would already have run and could have covered its tracks.
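To find those files without importing the package, which would run its code, you can ask the standard library where a module lives. Below is a minimal sketch that assumes Python 3.8 or later. `module_source_path` and `installed_files` are hypothetical helper names. For a top-level name, `importlib.util.find_spec` searches the import path without executing the module. `importlib.metadata.files` returns the file list that was recorded when the distribution was installed.

```python
from __future__ import annotations

import importlib.metadata
import importlib.util


def module_source_path(module_name: str) -> str | None:
    """Return the file a top-level module would be loaded from, or None.

    Unlike a plain `import`, find_spec() on a top-level name only locates
    the module; it does not execute it. (For dotted names, the parent
    package *is* imported, so stick to top-level names here.)
    """
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None


def installed_files(dist_name: str) -> list[str]:
    """List the absolute paths of the files recorded for an installed distribution."""
    try:
        files = importlib.metadata.files(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return []  # not installed
    # files() may return None if the record file is missing.
    return [str(f.locate()) for f in files or []]
```

For example, `module_source_path("pgeocode")` would likely point into site-packages. `installed_files("pgeocode")` lists every file that came with the package, and you can open any of them in a text editor.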
Since this is a security question, I must point out that if you don’t trust the developers/publishers, it’s important to check the exact code you’re installing. In particular:
there’s no guarantee that the linked repository corresponds to the installable package, and
there’s no guarantee that built distributions (wheels) correspond to any particular source archive.
If you trust the developers of pgeocode, neither is a concern. But if you’re downloading an archive and reading through it (or having someone else do an audit for you), install it with pip install downloaded-file.whl rather than pip install pgeocode.
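Before installing a downloaded file, it can also help to confirm that it is byte-for-byte the file PyPI publishes, or the one someone else audited for you. PyPI lists a SHA256 hash for each file on the Files tab. Below is a minimal sketch of the comparison. `sha256_of` and `matches_published_hash` are hypothetical helper names.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until f.read() returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, published_sha256: str) -> bool:
    # published_sha256 is the value copied from the file's hash details on
    # PyPI; surrounding whitespace and letter case are ignored.
    return sha256_of(path) == published_sha256.strip().lower()
```

pip can also enforce this check itself in hash-checking mode, using `--hash=sha256:...` entries in a requirements file together with the `--require-hashes` option.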
Hi Petr, thanks for the feedback, but I don’t understand your advice.
Where do you get the “downloaded-file.whl” from?
Once you run pip install on the whl, does that not execute code in the wheel? So couldn’t a malicious wheel do whatever bad things it wants and then hide the traces?
I think it is also important to point out that we’re not trusting the developers. We’re trusting the open source system that if the package was malicious, somebody else would have noticed by now and raised the issue.
Nope, that’s the purpose of wheels: they don’t require executing dynamic code to be installed; they only need to be unpacked and spread (copied into place). Installing an sdist, by contrast, first requires running the project’s build backend and possibly (when the setuptools backend is used with a setup.py instead of a setup.cfg or pyproject.toml for metadata and configuration) executing arbitrary code from within the package itself, which could do anything. The procedure @encukou specified ensured that the wheel was installed rather than the sdist, so that no code would be run. Alternatively, the same could be achieved by passing --only-binary :all: to pip install.
Of course, once you import the package you’re vulnerable, but @Newpythonuser seemed to be specifically referring to package installation.
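You can script the --only-binary route mentioned above, combined with downloading the wheel first for review. Below is a minimal sketch that only builds the pip command lines. The package name, the destination directory and the helper names are placeholders. Nothing runs until you pass a command to `run`. With `--only-binary :all:`, pip fails rather than falling back to building an sdist when no suitable wheel exists. With `--no-deps`, pip skips the package's dependencies, which still need their own review.

```python
from __future__ import annotations

import subprocess
import sys


def wheel_only_download(package: str, dest: str) -> list[str]:
    """Command that fetches a package's wheel into `dest` without installing it."""
    return [sys.executable, "-m", "pip", "download", package,
            "--no-deps", "--only-binary", ":all:", "--dest", dest]


def wheel_only_install(target: str) -> list[str]:
    """Command that installs a package name or a local .whl file, refusing sdists."""
    return [sys.executable, "-m", "pip", "install", "--only-binary", ":all:", target]


def run(cmd: list[str]) -> None:
    # check=True raises subprocess.CalledProcessError if pip exits with an error.
    subprocess.run(cmd, check=True)
```

A typical sequence would be `run(wheel_only_download("pgeocode", "wheels"))`, then review the file that appears in `wheels/`, then `run(wheel_only_install("wheels/<downloaded file>.whl"))`. As noted above, importing the package afterwards still runs its code.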