We would like to share a tool with the Python packaging community. The tool is called "thoth-solver", and we use it to aggregate package metadata. As PyPI does not provide additional metadata about packages, we developed this tool to extract metadata from published packages. So far we have used it to aggregate metadata about Python packages hosted on PyPI (roughly 20% of all package releases on PyPI were analyzed, starting with the most popular ones). Part of the resulting dataset can be found at [2,3].
In short, the tool installs the specified packages from any PEP 503 compliant package index and extracts the metadata available for each package using standard library functions. The most valuable part for us was the dependency information (hence the name), but the tool can extract additional information as well (see the project README file for details).
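To give a rough idea of what "extracting metadata using standard library functions" can look like, here is a minimal sketch based on `importlib.metadata` (Python 3.8+). It is not thoth-solver's own code: the `describe` and `make_demo_dist` helpers and the `demo-pkg` package are made up for illustration, and the fake `.dist-info` directory only stands in for a package that would normally have been installed from an index.

```python
import tempfile
from importlib import metadata
from pathlib import Path


def describe(dist: metadata.Distribution) -> dict:
    """Collect a few core metadata fields from an installed distribution."""
    return {
        "name": dist.metadata["Name"],
        "version": dist.version,
        # Raw requirement strings taken from the Requires-Dist fields.
        "requires": list(dist.requires or []),
        "requires_python": dist.metadata.get("Requires-Python"),
    }


def make_demo_dist(root: Path) -> metadata.Distribution:
    """Write a made-up *.dist-info directory so the example needs no network."""
    info = root / "demo_pkg-1.0.dist-info"
    info.mkdir()
    (info / "METADATA").write_text(
        "Metadata-Version: 2.1\n"
        "Name: demo-pkg\n"
        "Version: 1.0\n"
        "Requires-Python: >=3.8\n"
        "Requires-Dist: requests>=2.0\n"
        "Requires-Dist: click; extra == 'cli'\n"
    )
    return metadata.Distribution.at(info)


# Read the metadata while the temporary directory still exists.
with tempfile.TemporaryDirectory() as tmp:
    demo = describe(make_demo_dist(Path(tmp)))

print(demo)
```

For a package that is really installed in the current environment, something like `describe(metadata.distribution("requests"))` should work the same way, assuming `requests` is present.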
Feel free to use the tool. We would be happy to receive any input or suggestions from the community.
[1] GitHub - thoth-station/solver: Dependency solver for the Thoth project
[2] datasets/notebooks/thoth-solver-dataset at master · thoth-station/datasets · GitHub
[3] Thoth Solver Dataset v1.0 | Kaggle