Pip installation reports

Hi all,

Recently we had a discussion about optional reports that pip could produce when it installs software.

Consider an example use case: installing packages into a containerized environment. The container image is shipped, and one can inspect which packages are installed by running pip list. Note that the listing produced by pip list does not capture the index URL that pip installed the software from. Artifact-level information (which wheel build matched and was used) is not captured either. Could pip eventually support optional, easy-to-read reports, something like pip install flask --report pip_install_report.json?
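
To make the gap concrete, here is a minimal sketch that checks which of the desired fields are present in the output of pip list --format json. The key names index_url, artifact and hash are hypothetical placeholders for the information discussed above, not fields pip is known to emit, and the helper names are made up for illustration.

```python
import json
import subprocess
import sys

# "name" and "version" are what the JSON listing provides; the other three
# key names are hypothetical stand-ins for the data this thread asks about.
WANTED = {"name", "version", "index_url", "artifact", "hash"}


def run_pip_list():
    """Return the raw JSON text printed by `pip list --format json`."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "list", "--format", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def missing_fields(pip_list_json):
    """Return the wanted fields that no entry in the listing provides."""
    present = set()
    for entry in json.loads(pip_list_json):
        present.update(entry)
    return WANTED - present
```

Calling missing_fields(run_pip_list()) in an environment with pip available should show which of the wanted fields the listing lacks.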

I would like to ask if the packaging community would be interested in such a feature. I haven’t found any way to do such reporting with a recent pip version (other than increasing the verbosity of pip logs, which are essentially unstructured). Sorry if I missed anything.

Thanks in advance for any response.
Fridolin

This should probably just be raised on the pip tracker. I’d personally be cautious about how we did this, as there’s a risk we burden pip with a big maintenance overhead, generating reports for all sorts of use cases. But adding an option to the install command to dump a bunch of extra raw data to a file, for 3rd party reporting, would be a reasonable feature to request.

Instead of pip producing a multitude of reports upon request, perhaps it would be better for pip to just begin storing the necessary metadata for these reports alongside an installation, so any third-party tool could generate any format of report, without needing to execute pip itself?

For example, this could eventually allow a report to be generated from the static filesystem of a container image with Python packages installed, without needing to run the image.

Yes, that would probably be better (if only because it avoids the possibility that someone loses the file from the pip install invocation :wink:). Doing that should be raised here, though, as it would need a packaging PEP to define new installation metadata.

OK, sounds reasonable. Thanks for the replies and discussion.

To clarify the effort, the goal here would be to eventually extend pip list (or something similar) with an option to list more of the metadata kept for each installation. Eventually, something like pip list --full. The listing could initially report package name, package version, index URL, the artifact used and its hash, and the solution should be extensible enough for future additions, such as information about the installation of signed packages. The implementation could integrate with other pip list options, such as pip list --full --user --pre --format json.

It might be worth clarifying whether the implementation will use a single “database” to store, associate and maintain such metadata (a single consistent file) or keep the metadata spread out per installation. I’m not that familiar with pip internals, and this will probably indeed require a proper design proposal. The former probably fits Dustin’s “obtain metadata without running the image” idea better.

I would be against having the reporting itself in pip. I’m fine with the Python packaging tools recording the information needed to report on (I prefer @dustin’s suggestion of making that standard metadata), but I’d want any reporting to be handled by a dedicated tool, rather than making it part of pip. After all, the whole point of having packaging standards is so that we don’t have to bundle everything into pip.

It would be stored with the installation, in the .dist-info directory. This is the relevant standard you would be looking to extend.
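
As a rough illustration of what a third-party tool can already read from .dist-info, here is a minimal sketch using the standard library's importlib.metadata. The function name describe_installed and the shape of the returned rows are made up for this example. Passing a list of directories (for instance a site-packages directory extracted from an image) lets it inspect a filesystem without running that environment.

```python
import json
from importlib import metadata


def describe_installed(search_path=None):
    """Collect data already stored in .dist-info for each distribution.

    search_path is an optional list of directories to scan; when omitted,
    the current interpreter's sys.path is used.
    """
    if search_path is None:
        dists = metadata.distributions()
    else:
        dists = metadata.distributions(path=list(search_path))

    rows = []
    for dist in dists:
        # INSTALLER and direct_url.json are optional files in .dist-info;
        # read_text returns None when a file is absent.
        installer = dist.read_text("INSTALLER")
        direct_url = dist.read_text("direct_url.json")
        rows.append({
            "name": dist.metadata["Name"],
            "version": dist.version,
            "installer": installer.strip() if installer else None,
            "direct_url": json.loads(direct_url) if direct_url else None,
        })
    return rows
```

Note that nothing here yields the index URL or the artifact hash for index-installed packages, which is the gap new metadata would need to fill.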

Full disclosure, I’m mostly thinking about this in the context of the recently-announced pip-audit tool, which already has support for generating SBOMs, and is planning to add support for auditing container images. I created “Detailed installation reports” (Issue #170, trailofbits/pip-audit on GitHub) to track this feature request there.

FYI this is all covered by PEP 665 if you record the wheel that gets installed (not sure if you want to record the sdist that a wheel may have been built from?), so if this information is recorded it could be used to create a lock file from what’s been installed so others can reproduce the environment.

The key things in that list that aren’t covered in the core metadata are:

  1. Index URL
  2. The used artifact
  3. Hash of the used artifact

I guess this is sort of the index-installed equivalent of Recording the Direct URL Origin of installed distributions — Python Packaging User Guide?
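
To sketch what recording those three items might involve, the snippet below builds a small record for a downloaded artifact. The record layout and the key names index_url, artifact and hash are purely hypothetical and are not part of any existing standard; only the hashing via hashlib is standard-library behaviour.

```python
import hashlib
from pathlib import Path


def artifact_record(artifact_path, index_url):
    """Build a hypothetical provenance record for one installed artifact."""
    path = Path(artifact_path)
    # Hash the artifact bytes; sha256 is an assumption made for this sketch.
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    return {
        "index_url": index_url,        # where the artifact was found
        "artifact": path.name,         # e.g. the wheel filename that was used
        "hash": f"sha256={digest}",    # hash of the exact file installed
    }
```

A tool could serialize such a record to JSON and store it next to the rest of the installation metadata, if a standard for that were defined.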