Pip installation reports

fridex · December 3, 2021, 10:48am

Hi all,

recently we had a discussion about possible reports created by pip that would be optionally produced when pip installs software.

Consider an example use case: one installs packages into a containerized environment. The container image is shipped, and one can introspect which packages are installed by issuing pip list. Note that the listing created by pip list does not capture the index URL from which pip installed the software. Artifact-level information (which wheel build matched and was used) is not captured either. Could pip eventually support optional, easy-to-read reports, something like pip install flask --report pip_install_report.json?

I would like to ask if the packaging community would be interested in such a feature. I haven’t found any way to do such reporting with a recent pip version (without increasing the verbosity of pip logs, which are essentially unstructured). Sorry if I missed anything.

Thanks in advance for any response.
Fridolin

pf_moore · December 3, 2021, 11:27am

This should probably just be raised on the pip tracker. I’d personally be cautious about how we did this, as there’s a risk we burden pip with a big maintenance overhead, generating reports for all sorts of use cases. But adding an option to the install command to dump a bunch of extra raw data to a file, for 3rd party reporting, would be a reasonable feature to request.

dustin · December 3, 2021, 12:23pm

Instead of pip producing a multitude of reports upon request, perhaps it would be better for pip to just begin storing the necessary metadata for these reports alongside an installation, so any third-party tool could generate any format of report, without needing to execute pip itself?

For example, this could eventually allow a report to be generated from the static filesystem of a container image with Python packages installed, without needing to run the image.

pf_moore · December 3, 2021, 1:47pm

Yes, that would probably be better (if only because it avoids the possibility that someone loses the file from the pip install invocation). Doing that should be raised here, though, as it would need a packaging PEP to define new installation metadata.

fridex · December 3, 2021, 2:33pm

OK, sounds reasonable. Thanks for the replies and discussion.

To clarify the effort, the goal here would be to eventually extend pip list (or something similar) and provide an option to list more metadata kept for each installation. Eventually, something like pip list --full. At first, the listing could report package name, package version, index URL, used artifact, and hash. The solution should be extensible enough for future additions, such as information about the installation of signed packages. The implementation could integrate with other pip list options, such as pip list --full --user --pre --format json.

It might be worth clarifying whether the implementation will use a single “database” to store, associate and maintain such metadata (a single consistent file) or keep metadata spread per installation. I’m not that familiar with pip internals, and this will probably indeed require a proper design proposal. The former probably fits Dustin’s “obtain metadata without running the image” better.

pf_moore · December 3, 2021, 2:47pm

I would be against having the reporting itself in pip. I’m fine with the Python packaging tools recording the information needed to report on (I prefer @dustin’s suggestion of making that standard metadata), but I’d want any reporting to be handled by a dedicated tool, rather than making it part of pip. After all, the whole point of having packaging standards is so that we don’t have to bundle everything into pip.

It would be stored with the installation, in the .dist-info directory. This is the relevant standard you would be looking to extend.
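As a rough illustration of what a third-party tool can already read from .dist-info directories on a static filesystem (without running pip or the image), here is a minimal sketch using the standard library's importlib.metadata. The function name describe_installed and the example_pkg distribution are made up for this example. Note that direct_url.json is only written for direct URL installs, so for packages installed from an index the index URL and artifact hash are simply absent, which is the gap discussed in this thread.

```python
import json
import tempfile
from importlib import metadata
from pathlib import Path


def describe_installed(site_packages):
    """List what the .dist-info directories under `site_packages` record.

    `site_packages` is a plain directory path, e.g. one extracted from a
    container image layer; nothing is imported or executed from it.
    """
    records = []
    for dist in metadata.distributions(path=[str(site_packages)]):
        installer = dist.read_text("INSTALLER")        # None if the file is missing
        direct_url = dist.read_text("direct_url.json")  # None for index installs
        records.append(
            {
                "name": dist.metadata["Name"],
                "version": dist.version,
                "installer": installer.strip() if installer else None,
                "direct_url": json.loads(direct_url) if direct_url else None,
            }
        )
    return sorted(records, key=lambda record: record["name"])


# Demo against a fake, hand-written .dist-info directory (hypothetical package).
with tempfile.TemporaryDirectory() as tmp:
    info = Path(tmp) / "example_pkg-1.0.dist-info"
    info.mkdir()
    (info / "METADATA").write_text(
        "Metadata-Version: 2.1\nName: example-pkg\nVersion: 1.0\n"
    )
    (info / "INSTALLER").write_text("pip\n")
    demo_records = describe_installed(tmp)

print(demo_records)
```

Any new standard metadata file would presumably be readable the same way, via dist.read_text with whatever file name the standard ends up defining.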

dustin · December 3, 2021, 3:01pm

Full disclosure, I’m mostly thinking about this in the context of the recently-announced pip-audit tool, which already has support for generating SBOMs, and is planning to add support for auditing container images. I created an issue, “Detailed installation reports” (Issue #170 in pypa/pip-audit on GitHub), to track this feature request there.

brettcannon · December 3, 2021, 8:53pm

FYI this is all covered by PEP 665 if you record the wheel that gets installed (not sure if you want to record the sdist that a wheel may have been built from?), so if this information is recorded it could be used to create a lock file from what’s been installed so others can reproduce the environment.

The key things in that list that aren’t covered in the core metadata are:

Index URL
The used artifact
Hash of the used artifact

I guess this is sort of the index-installed equivalent of Recording the Direct URL Origin of installed distributions - Python Packaging User Guide?

fridex · November 24, 2022, 7:36am

We are evaluating the possibility of working on this and investing some time to deliver this feature. @brettcannon, and others, do you see any dependency here on the lockfile-related work that we should consider?

pf_moore · November 24, 2022, 9:14am

Are you aware of the new pip install --report <file> option?
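For readers who have not used it, the report is a JSON document. The sketch below pulls out the fields asked for earlier in the thread (name, version, source URL, hashes). The key names reflect my understanding of the report format in pip's documentation and should be checked against the pip version in use; summarize_report is a made-up helper, and the sample data is a hand-written placeholder, not real pip output.

```python
import json


def summarize_report(report):
    """Reduce a pip installation report (already parsed from JSON) to a few fields."""
    rows = []
    for item in report.get("install", []):
        meta = item.get("metadata", {})
        download = item.get("download_info", {})
        archive = download.get("archive_info", {})
        rows.append(
            {
                "name": meta.get("name"),
                "version": meta.get("version"),
                "url": download.get("url"),          # where the artifact came from
                "hashes": archive.get("hashes", {}),  # e.g. {"sha256": "..."}
                "requested": item.get("requested"),   # True if named by the user
            }
        )
    return rows


# Placeholder report with the assumed structure; URL and hash are dummies.
sample_report = json.loads(
    """
    {
      "version": "1",
      "install": [
        {
          "download_info": {
            "url": "https://example.invalid/example_pkg-1.0-py3-none-any.whl",
            "archive_info": {"hashes": {"sha256": "0000placeholder"}}
          },
          "is_direct": false,
          "requested": true,
          "metadata": {"name": "example-pkg", "version": "1.0"}
        }
      ]
    }
    """
)

summary = summarize_report(sample_report)
print(summary)
```

With a real file, the same function could be applied to json.load(open("pip_install_report.json")) after running pip install flask --report pip_install_report.json.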

fridex · November 24, 2022, 10:36am

Nice, I missed it. It looks like an implementation of the feature initially requested. Thanks for pointing it out.

We wanted to implement the feature discussed and proposed later in this thread: storing installation information in the .dist-info directory. It might help with a different use case, namely reconstructing what was actually installed from the filesystem (an example could be containerised environments).

brettcannon · November 24, 2022, 8:14pm

Depending on how much information is captured, there is the possibility of generating a lock file someday from what’s already installed.

fridex · January 30, 2023, 6:00pm

I have proposed a PEP with a WIP patch to pip (I will work on it once the PEP is accepted). I was not sure about editables in the PEP. Let’s see how it evolves.

davidism · February 1, 2023, 2:48am

7 posts were split to a new topic: PEP sponsors and CODEOWNERS

sbidoul · January 30, 2023, 8:53pm

@fridex you should create a new thread here to discuss the idea.