Script to get top-level packages from source tree

Hi all, for some time I’ve wanted a standards-compliant way of getting the top-level packages of a source tree (the ones that will end up in a wheel). It turns out that the mechanisms build backends use to collect files are not standardised, so I figured that the only 100% correct way of doing this (modulo bugs) was to install the build backend in an isolated environment, build the wheel, and extract the resulting RECORD file. That’s what I did here:

It works on itself:

>>> from pygetpackages import get_packages
>>> get_packages(".")
['pygetpackages']

It’s annoyingly slow and requires an Internet connection, so if there’s a better way of doing this, I’ll be happy to know.
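For anyone curious about the moving parts, here is a minimal sketch of the pipeline described above. It is not the pygetpackages implementation. It assumes pypa/build is installed (`python -m build` sets up the isolated environment and fetches the backend, which is where the slowness and the network access come from). `build_wheel`, `top_level_from_record` and `get_top_level` are hypothetical names. The sketch treats any top-level directory as a package and ignores `.data/purelib` and `.data/platlib` contents.

```python
import csv
import io
import subprocess
import sys
import tempfile
import zipfile
from pathlib import Path

MODULE_SUFFIXES = (".py", ".so", ".pyd")


def build_wheel(srcdir, outdir):
    """Build a wheel for `srcdir` into `outdir` and return the wheel's path.

    Assumes pypa/build is installed. By default `python -m build` creates an
    isolated environment and installs the backend from pyproject.toml into it.
    """
    subprocess.run(
        [sys.executable, "-m", "build", "--wheel", "--outdir", str(outdir), str(srcdir)],
        check=True,
    )
    return next(Path(outdir).glob("*.whl"))


def top_level_from_record(wheel):
    """Return the top-level import names listed in a wheel's RECORD file."""
    with zipfile.ZipFile(wheel) as whl:
        # RECORD lives at <name>-<version>.dist-info/RECORD in the wheel root.
        record = next(
            n for n in whl.namelist()
            if n.count("/") == 1 and n.endswith(".dist-info/RECORD")
        )
        text = whl.read(record).decode("utf-8")

    names = set()
    # Each RECORD row is: path,hash,size
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        top, sep, _ = row[0].partition("/")
        if top.endswith((".dist-info", ".data")):
            continue  # metadata and install-scheme directories, not importable
        if sep:
            names.add(top)  # first component of a package path
        elif top.endswith(MODULE_SUFFIXES):
            names.add(top.split(".", 1)[0])  # single-file module, e.g. foo.py
    return sorted(names)


def get_top_level(srcdir):
    """Hypothetical end-to-end helper: build the wheel, then read RECORD."""
    with tempfile.TemporaryDirectory() as tmp:
        return top_level_from_record(build_wheel(srcdir, tmp))
```

Calling `get_top_level(".")` from a project root would go through the same slow, online build step; `top_level_from_record` on its own works on any wheel already on disk.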

I don’t see why you need to parse RECORD. Can’t you just get the file list from zipfile?
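Following that suggestion, a sketch that skips RECORD and reads the wheel’s zip listing directly (`top_level_from_namelist` is a hypothetical name). The zip entries and the RECORD rows should describe the same files, so the result ought to match. The caveats are the same: top-level directories count as packages, single-file modules are recognised only by a `.py`/`.so`/`.pyd` suffix, and `.data` contents are skipped.

```python
import zipfile

MODULE_SUFFIXES = (".py", ".so", ".pyd")


def top_level_from_namelist(wheel):
    """Collect top-level import names directly from a wheel's zip listing."""
    names = set()
    with zipfile.ZipFile(wheel) as whl:
        for entry in whl.namelist():
            top, sep, _ = entry.partition("/")
            if top.endswith((".dist-info", ".data")):
                continue  # skip metadata and .data install-scheme directories
            if sep:
                names.add(top)  # package directory (or any top-level dir)
            elif top.endswith(MODULE_SUFFIXES):
                names.add(top.split(".", 1)[0])  # single-file module
    return sorted(names)
```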

Also note that by looking for __init__.py files you are excluding namespace packages. Depending on your use case, this might be good or bad.
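To illustrate the namespace-package point, here is a hypothetical helper that splits the top-level directories of a wheel’s file list into those with an `__init__.py` and those without. Since PEP 420, a directory without `__init__.py` is still importable as a namespace package, so checking for `__init__.py` alone would drop the second group.

```python
def split_packages(paths):
    """Split top-level directories in a wheel listing into two groups.

    Returns (regular, namespace): regular packages ship a top-level
    __init__.py; directories without one can still be imported as
    PEP 420 namespace packages, so an __init__.py check misses them.
    """
    tops, regular = set(), set()
    for path in paths:
        top, sep, rest = path.partition("/")
        if not sep or top.endswith((".dist-info", ".data")):
            continue  # single-file modules and metadata are out of scope here
        tops.add(top)
        if rest == "__init__.py":
            regular.add(top)
    return sorted(regular), sorted(tops - regular)


paths = [
    "pkg/__init__.py",
    "pkg/core.py",
    "acme/plugin/__init__.py",  # "acme" itself has no __init__.py
    "demo-1.0.dist-info/RECORD",
]
print(split_packages(paths))  # (['pkg'], ['acme'])
```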

I would guess that building is the only way, yes.

Consider things like setuptools’ package_dir, which effectively allows rewriting the project directory structure at build time. If I am not mistaken, pymsbuild and probably other build backends allow this kind of thing too, so the installed tree can end up completely different from what is in the source tree.
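Here is a deliberately simplified model of that remapping, to show why the source tree alone is not enough. `installed_path` is a hypothetical helper, not a setuptools API. In `package_dir`, keys are package names (`""` meaning the root package) and values are source directories. The sketch ignores which packages setuptools would actually include.

```python
from typing import Dict, Optional


def installed_path(source_path: str, package_dir: Dict[str, str]) -> Optional[str]:
    """Simplified, hypothetical model of setuptools' package_dir remapping.

    Returns where `source_path` would land inside the wheel, or None if no
    mapping covers it. Real setuptools also requires the package to be
    listed or discovered; that part is ignored here.
    """
    # Prefer the most specific (longest) source directory.
    for package, directory in sorted(package_dir.items(), key=lambda kv: len(kv[1]), reverse=True):
        prefix = directory.rstrip("/") + "/" if directory else ""
        if source_path.startswith(prefix):
            rest = source_path[len(prefix):]
            target = package.replace(".", "/")
            return f"{target}/{rest}" if target else rest
    return None


# src layout: src/foo/... installs as foo/...
print(installed_path("src/foo/__init__.py", {"": "src"}))
# arbitrary rename: lib/weird/core.py installs as mypkg/core.py
print(installed_path("lib/weird/core.py", {"mypkg": "lib/weird"}))
```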

Note that, as a bit of a hacky shortcut for wheels built with setuptools (still the large majority of extant ones), a list of the top-level import packages can be trivially parsed from top_level.txt in the .dist-info directory. But it’s an unstandardized relic of eggs, and other backends do not appear to emit it (at least meson-python and hatchling, which I checked).
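Here is a sketch of that shortcut, reading `top_level.txt` straight out of a wheel (`read_top_level_txt` is a hypothetical name). Because the file is non-standard, `None` is the expected result for wheels from backends that don’t write it.

```python
import zipfile
from typing import List, Optional


def read_top_level_txt(wheel) -> Optional[List[str]]:
    """Return entries of setuptools' top_level.txt, or None if the wheel has none.

    top_level.txt is not standardised; backends other than setuptools may
    not write it, so None is an expected result, not an error.
    """
    with zipfile.ZipFile(wheel) as whl:
        for name in whl.namelist():
            if name.count("/") == 1 and name.endswith(".dist-info/top_level.txt"):
                text = whl.read(name).decode("utf-8")
                return [line.strip() for line in text.splitlines() if line.strip()]
    return None
```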

I should share my Gist from Oct 2022 here, which handles namespace packages as well as the .data directory magic, for projects that use it.
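The Gist isn’t reproduced here, but the `.data` part can be sketched separately. Per the wheel spec, files under `<name>-<version>.data/purelib/` and `.data/platlib/` are installed into the purelib/platlib locations (typically site-packages), so they can contribute top-level names. Other `.data` subdirectories such as `scripts` and `headers` cannot. The helper name and the suffix-based module detection are assumptions of this sketch.

```python
import re

MODULE_SUFFIXES = (".py", ".so", ".pyd")
# Files under <name>-<version>.data/purelib/ or .../platlib/ are installed
# alongside the wheel's root-level files (typically into site-packages).
DATA_LIB = re.compile(r"^[^/]+\.data/(?:purelib|platlib)/(.+)$")


def importable_top_level(paths):
    """Top-level import names for a wheel listing, honouring .data/purelib|platlib."""
    names = set()
    for path in paths:
        match = DATA_LIB.match(path)
        if match:
            path = match.group(1)  # strip the .data/<scheme>/ prefix
        top, sep, _ = path.partition("/")
        if top.endswith((".dist-info", ".data")):
            continue  # metadata, or scripts/headers/data schemes: not importable
        if sep:
            names.add(top)
        elif top.endswith(MODULE_SUFFIXES):
            names.add(top.split(".", 1)[0])
    return sorted(names)
```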

I had a bit of déjà vu, and realised we’ve discussed this specific topic before and I posted this link there too.

@pradyunsg While we wait, I took your gist and turned it into a package: pygetimportables on PyPI

@pradyunsg and @astrojuanlu Thanks for sharing these helpful resources. I often wish lock files separated the highest-level dependencies from the dependencies brought in by those libraries.

Sorry, I’m rather confused by what you are describing and how it connects to the topic here. Reading between the lines, it sounds like you might be intending to suggest that lock files distinguish direct from transitive dependencies, but this topic is discussing how to enumerate the top-level import packages in a given project source tree, which is entirely unrelated.

Did you mean to post this in another topic, sorry? Or was this all one big misunderstanding brought on by the unfortunate overloading of the word “package” (meaning both import package, as in the case here, and distribution package, as perhaps was mistakenly assumed), coupled with a misinterpretation of “top-level packages” to instead imply “direct dependencies”?
