Security question - pgeocode

Newpythonuser · July 12, 2022, 12:08am

I am a new user to python and accidentally did a pip install of a library “pgeocode”. My code was:

pip install pgeocode

I am reading about security risks associated with installing packages and realize that I was not familiar with potential risks with downloading packages.

Being new, I want to be cautious, and I cannot find much discussion of this library on internet forums that would help me judge whether the code is malicious. I do not know how to read the code myself, so I apologize in advance, but could you please read the code for “pgeocode” and let me know if there are any security concerns associated with it?

Also, in order to avoid this in future, could you please share any best practices about how to view the code before installing so I can check for malicious activity?

CAM-Gerlach · July 12, 2022, 1:49am

In general, malicious software on PyPI is very uncommon, though it isn’t completely unheard of. However, it doesn’t hurt to be a little cautious, especially with packages that only have a few downloads, have names similar to popular packages, are not open source, or have suspicious readmes or links.

I took a look at this package and its code and while I’m no security expert, it looked perfectly innocuous to me. It does read from some URLs, but this is apparently necessary for its function.

Realistically, as a new user it seems somewhat difficult to judge this. However, if you want to take a look at the code, just download the wheel or the sdist from the Files tab on the project’s PyPI page and open it with your favorite unarchiving tool, and the code will be in there (exactly where depends on the distribution archive format, how old the tool is and how the original code was organized).
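As a rough illustration of the "open it with an unarchiving tool" step, here is a minimal Python sketch that lists the contents of a downloaded archive without extracting or running anything. It relies on wheels being ordinary zip files and assumes the sdist is a tar archive; the function name list_archive_members is made up for this example.

```python
import tarfile
import zipfile
from pathlib import Path


def list_archive_members(archive_path):
    """Return the file names inside a wheel (a zip file) or a tar-based sdist.

    Nothing is extracted or executed; only the archive index is read.
    """
    path = Path(archive_path)
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            return archive.namelist()
    if tarfile.is_tarfile(path):
        with tarfile.open(path) as archive:
            return archive.getnames()
    raise ValueError(f"{path} is neither a zip/wheel nor a tar archive")
```

Calling list_archive_members on the file you downloaded from the Files tab shows which .py files are worth opening in a text editor.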

steven.daprano · July 12, 2022, 2:10am

I don’t think that you actually ran the command pip install pgeocode from the Python command prompt, >>>. If you did, you would have got a SyntaxError.

So what did you actually do?

I believe that this is the source repository of pgeocode, so you can read the code there.

You can also find the files downloaded by pip on your local PC and read them. I expect that if you search your python path, you will find a file pgeocode.py, most likely in a directory called “site-packages”. That’s the source code.
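If searching by hand is awkward, Python can report where an installed module lives. The sketch below uses importlib.util.find_spec, which for a top-level name such as pgeocode only searches the import path and does not execute the module. The helper name find_module_source is hypothetical.

```python
import importlib.util


def find_module_source(module_name):
    """Return the file path of an installed top-level module, or None.

    For a top-level name, find_spec only locates the module; it does not
    import it. (For dotted names it would import the parent package first.)
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    return spec.origin
```

For example, print(find_module_source("pgeocode")) should print a path inside site-packages if the package is installed, or None if it is not.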

It’s probably too late now: pip has already run the installer, so if it were malware, it has already run and could have covered its tracks.

I say that only as a hypothetical; there is no indication that pgeocode is malware or anything other than what it claims to be. It is mentioned on Stack Overflow and seems to be a small but reasonably healthy project with no security issues.

I have to ask though, as a beginner to Python, how would you recognise a security threat in the first place?

encukou · July 12, 2022, 1:19pm

Since this is a security question, I must point out that if you don’t trust the developers/publishers, it’s important to check the exact code you’re installing. In particular:

there’s no guarantee that the linked repository corresponds to the installable package, and
there’s no guarantee that built distributions (wheels) correspond to any particular source archive.

If you trust the developers of pgeocode, neither is a concern. But if you’re downloading an archive and reading through it (or having someone else do an audit for you), install it with pip install downloaded-file.whl rather than pip install pgeocode.
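One way to make sure the file you install is byte-for-byte the file that was reviewed is to compare its SHA-256 digest with a digest you recorded at review time (PyPI also shows hashes for each file on its download page). The following is only a sketch; the helper names sha256_of_file and matches_expected are invented for illustration.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_sha256):
    """Compare a file's digest with a digest recorded earlier (case-insensitive)."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

If matches_expected returns False, the archive on disk is not the one that was audited and should not be installed.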

Newpythonuser · July 12, 2022, 4:58pm

Thanks all for the inputs. Super helpful.

steven.daprano · July 13, 2022, 1:13am

Hi Petr, thanks for the feedback, but I don’t understand your advice.

Where do you get the “downloaded-file.whl” from?

Once you run pip install on the whl, does that not execute code in the wheel? So couldn’t a malicious wheel do whatever bad things it wants and then hide the traces?

I think it is also important to point out that we’re not trusting the developers. We’re trusting the open source system that if the package was malicious, somebody else would have noticed by now and raised the issue.

encukou · July 13, 2022, 9:57am

To get the wheel without installing it, run pip download pgeocode instead of pip install (do this in a new directory, since it will fetch several files).
Or get the file from PyPI’s download page.
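The download-then-install workflow can be scripted roughly as follows. This is a sketch, not a recommended procedure: the directory name in the usage comment is a placeholder, and the --only-binary=:all: option is an addition of this example (it asks pip to fetch wheels only, so it will fail if any dependency has no wheel). The second command uses --no-index and --find-links so that pip installs only from the files already reviewed.

```python
import subprocess
import sys


def pip_download_command(package, dest_dir):
    """Build a 'pip download' command that fetches wheels without installing."""
    return [
        sys.executable, "-m", "pip", "download",
        "--only-binary=:all:",  # assumption of this sketch: wheels only
        "--dest", dest_dir,
        package,
    ]


def pip_offline_install_command(package, source_dir):
    """Build a 'pip install' command limited to files already in source_dir."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index",                # do not contact the package index
        "--find-links", source_dir,  # use only the locally reviewed files
        package,
    ]


def run(command):
    """Run a command and raise CalledProcessError if it exits with an error."""
    subprocess.run(command, check=True)


# Usage (placeholder directory name):
#   run(pip_download_command("pgeocode", "pgeocode-review"))
#   ... inspect the files in pgeocode-review ...
#   run(pip_offline_install_command("pgeocode", "pgeocode-review"))
```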

(Technically, installing wheels should not run the downloaded code, but that’s a minor detail – other kinds of installs as well as the next invocation of python can run it.)

As for trusting the open source system: yes. That works best if you have some idea of whether there actually is a “somebody else” for a given package, and how likely it is that there’s an attack everyone else would miss.

CAM-Gerlach · July 23, 2022, 1:43am

Nope, installing a wheel does not execute its code. That’s the purpose of wheels: they don’t require executing dynamic code to be installed, they only need to be unpacked and have their files copied into place. Installing an sdist, by contrast, first requires running the project’s build backend, and possibly (with the setuptools backend and a setup.py instead of a setup.cfg or pyproject.toml for metadata and configuration) executing arbitrary code from within the package itself, which could do anything. The procedure @encukou specified ensures that it is the wheel that gets installed rather than the sdist, so that no code is run. The same can be achieved by passing the --only-binary option (for example, --only-binary :all:) to pip install.
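To see whether a given sdist falls into the setup.py case described above, one can check the archive listing before building it. The sketch below assumes a tar-based sdist that unpacks into a single top-level directory; the function name sdist_has_setup_py is hypothetical, and finding no setup.py does not by itself prove that the build runs no project code.

```python
import tarfile
from pathlib import PurePosixPath


def sdist_has_setup_py(sdist_path):
    """Return True if a tar-based sdist ships a setup.py in its root directory.

    Only the archive index is read; nothing is extracted or executed.
    """
    with tarfile.open(sdist_path) as archive:
        for name in archive.getnames():
            parts = PurePosixPath(name).parts
            # Expect entries like "<name>-<version>/setup.py".
            if len(parts) == 2 and parts[1] == "setup.py":
                return True
    return False
```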

Of course, once you import the package you’re vulnerable, but @Newpythonuser seemed to be specifically referring to package installation.