“expect” isn’t quite right if it’s being prescribed to me. I’m very happy with it being calculated and covering only the top-level name, since that’s what’s “claimed” by an installation of a project. I only brought this up because Provides is currently defined in a PEP to cover any/all names a distribution provides, which no back-end can definitively calculate thanks to things like __path__ manipulation (pywin32, I believe, typically gets mentioned at this point).
I’m personally only interested in the top-level names. If people want a PEP that reclaims Provides for top-level names only, with nothing in pyproject.toml, under the assumption that back-ends will fill that detail in based on what would get unpacked, then I’m happy to write that PEP. I’m also happy to skip a PEP entirely and just document Provides with a SHOULD for top-level names and a MAY for everything else.
Yes, but that’s up to whoever chooses to index things. I would assume most projects would be given some hard cap, and other metrics such as download count would help weed out people trying to trick others into using a malicious package.
I have another dream which is related to this one. It would be handy to have a mapping of executables to PyPI distributions.
On Ubuntu, here’s what happens if I run a command that is not found:
$ foobar
Command 'foobar' not found, but can be installed with:
sudo snap install foobar # version 0.12.3, or
sudo apt install foobar # version 0.12.2-2
I’m dreaming of a future where PyPI is searched for executables as well, so that you could see output like:
$ cowsay
Command 'cowsay' not found, but can be installed with:
sudo apt install cowsay # perl implementation
pipx install cowsay # python implementation
For that feature to work, there would need to be a mapping of executables to PyPI projects. This mapping would preferably be stored offline, if possible. PyPI’s API could also offer this information.
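To make the idea concrete, here is a purely hypothetical sketch of what the offline lookup could look like; the mapping file path and its contents are invented for illustration, nothing like it exists today:

```python
import json
import sys

# Hypothetical: a locally cached snapshot of an executable -> PyPI project
# mapping, periodically refreshed from PyPI. No such file is published today.
with open("/usr/share/pypi/executables.json") as f:
    exe_to_project = json.load(f)  # e.g. {"cowsay": "cowsay", "pyright": "pyright"}

command = sys.argv[1]
if command in exe_to_project:
    print(f"Command '{command}' not found, but can be installed with:")
    print(f"  pipx install {exe_to_project[command]}  # python implementation")
```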
If people are interested in this, let’s create a new thread, as I don’t want to derail the conversation. I only bring it up here because I can see how the implementations of this mapping and the module mapping might overlap.
I’m not sure. I downloaded the files for pyright to check. In the .tar.gz sdist, I can find the executable listed in entry_points.txt under pyright-1.1.337/pyright.egg-info/ in the archive. In the .whl file, I can’t find the executable listed in METADATA, but I can find it listed in entry_points.txt.
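For reference, this is roughly how the wheel can be inspected with only the standard library; the wheel file name here assumes pyright’s pure-Python py3-none-any tag:

```python
import zipfile

# entry_points.txt lives inside the wheel's .dist-info directory.
with zipfile.ZipFile("pyright-1.1.337-py3-none-any.whl") as whl:
    ep_path = next(
        n for n in whl.namelist() if n.endswith(".dist-info/entry_points.txt")
    )
    # The [console_scripts] section is what lists the executable.
    print(whl.read(ep_path).decode())
```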
I also had a quick look at the PyPI API, and I don’t think it returns entry points or console_scripts. There also isn’t an API that will redirect you to a project given a particular executable name.
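For anyone who wants to double-check, PyPI’s JSON API (https://pypi.org/pypi/&lt;project&gt;/json) is easy to poke at; as far as I can tell, nothing in the returned info block mentions entry points:

```python
import json
import urllib.request

with urllib.request.urlopen("https://pypi.org/pypi/pyright/json") as resp:
    info = json.load(resp)["info"]

# The keys mirror core metadata fields; there is no 'entry_points' or
# 'console_scripts' key anywhere.
print(sorted(info))
```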
While developing FawltyDeps, a dependency checker, we bumped into the same issue and wished it were easy to map a package name to the import names it exposes.
We implemented this mapping in a few steps, where the source of truth usually ended up being the top_level.txt file (see the blog post on our mapping strategy).
For PyPI packages that are not available in the user’s virtual environment, FawltyDeps may download the package and peek at what is in the top_level.txt file.
PyPI currently does not expose top-level names via its API. For context: [PyPI-5375], [PyPI-12710].
In FawltyDeps we solve it in the following way: we create a temporary virtual environment and pip install the unmapped packages there, then extract the import names from top_level.txt. Other libraries with a similar purpose, like pigar, unpack packages downloaded from PyPI and read top_level.txt. It would be a huge simplification if we could obtain this mapping directly from the PyPI API. The condition is that the information should be reliable, and thus not user-provided.
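As an illustration of the “install, then read top_level.txt” step (a simplified sketch, not FawltyDeps’ actual code), assuming the package is already installed in the temporary environment:

```python
from importlib.metadata import distribution

def import_names(package: str) -> list[str]:
    # top_level.txt is written by setuptools but is optional, so fall
    # back to an empty list when it is absent.
    text = distribution(package).read_text("top_level.txt")
    return text.split() if text else []

print(import_names("pip"))  # ['pip']
```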
That sounds really good! I see you have a strategy of a constant mapping plus a user-defined mapping. Do you find a lot of corner cases where it doesn’t work? Or is the majority of packages covered by the constant mapping, with the rest left up to the user?
I ask because in FawltyDeps we had similar dilemmas, and that is why we went with mapping dependencies to imports only.
For the reverse-lookup issue, I think the user should be aware that it may not be 100% accurate and that the suggested library may not be the one they actually want to use. It should not be confused with the guarantees given by, for example, linters: that when you run them with the --fix option, the code will work.
Seems like there’s agreement that this is useful and desirable, that it belongs in the Provides field in METADATA, and that a PEP is needed to at least un-deprecate Provides.
Most people (myself included) only care about top-level imports, so we could start with those (adding subpackages later should be backwards compatible?).
There still seems to be some uncertainty around (1) how to handle namespace packages and (2) whether this should be manually entered in pyproject.toml or only written by backends.
I’d love to have it in pyproject.toml (essentially a standardized version of setuptools’ packages feature), but I also see how it gets in the way of how certain backends infer the packages.
Am I missing something? Shall we focus on clarifying the semantics around namespace packages and, for now, let backends write whatever they infer to METADATA.Provides (hence codifying the current behavior and deferring the pyproject.toml question to a later stage)?
I think having the build backend infer it should work for 99% of packages.
As you mentioned in a previous post, there are some cases like pywin32 where that might not be possible, so there might be a need for a manual override. But if that’s a tiny proportion of packages, as I suspect, how you specify the override can be left up to individual backends.
Does PyPI handle range requests and HTTP/2? If so, one could partially read the .zip file to get its file index (the central directory, which sits at the end of the archive), and then download just top_level.txt instead of the whole package. That could save a lot of bandwidth and time for some big packages.
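As a rough proof of concept (entirely my own sketch, with a made-up wheel URL): zipfile only needs a seekable file-like object, so a minimal wrapper around HTTP Range requests is enough to pull out just top_level.txt:

```python
import io
import urllib.request
import zipfile


class HttpRangeFile(io.RawIOBase):
    """Minimal seekable file-like object backed by HTTP Range requests."""

    def __init__(self, url):
        self.url = url
        self.pos = 0
        # HEAD request to learn the total file size.
        head = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(head) as resp:
            self.size = int(resp.headers["Content-Length"])

    def seekable(self):
        return True

    def readable(self):
        return True

    def tell(self):
        return self.pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            self.pos = offset
        elif whence == io.SEEK_CUR:
            self.pos += offset
        else:  # io.SEEK_END
            self.pos = self.size + offset
        return self.pos

    def read(self, n=-1):
        if n < 0:
            n = self.size - self.pos
        if n == 0 or self.pos >= self.size:
            return b""
        end = min(self.pos + n, self.size) - 1
        req = urllib.request.Request(
            self.url, headers={"Range": f"bytes={self.pos}-{end}"}
        )
        with urllib.request.urlopen(req) as resp:
            data = resp.read()
        self.pos += len(data)
        return data


url = "https://files.pythonhosted.org/packages/.../example.whl"  # hypothetical URL
with zipfile.ZipFile(HttpRangeFile(url)) as whl:
    # Only the central directory and this one member get fetched.
    name = next(n for n in whl.namelist() if n.endswith("top_level.txt"))
    print(whl.read(name).decode())
```

A real implementation would want to buffer reads (zipfile issues many small ones, and each currently becomes its own HTTP request), but it shows the idea is feasible with range support alone.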