Standardized way for receiving dependencies

Hello,

With the growing number of packaging tools like setuptools, flit and poetry (thanks to PEP 517), it’s getting harder for dependency management tools like pipenv and poetry to retrieve the dependencies a package needs if it’s not already built.

@bernatgabor had some thoughts about it in Structured, Exchangeable lock file format (requirements.txt 2.0?)

Is there any ongoing work on this?

I could imagine something similar to the optional hooks in PEP517.

fin swimmer

See: https://twitter.com/brettsky/status/1246232498998095872

I would like to qualify, however, that this is still better left for build tools to perform. Since a build tool can choose to generate metadata dynamically (like how Setuptools uses a Python file), your best shot to retrieve dependency information is still to build the package, even after the standard is established. Dependency information from the source distribution is not reliable unless there are very drastic changes to the ecosystem.
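To illustrate what "dynamic" means here, a minimal sketch of the kind of logic a Python-based build script could run before handing its requirements to the build tool. The function name and the package list are illustrative assumptions, not taken from any real project:

```python
import sys


def compute_install_requires(platform=sys.platform,
                             python_version=sys.version_info[:2]):
    """Return the requirement list a dynamic build script might compute.

    The result depends on the machine running the build, which is why it
    cannot be read reliably from the source tree without executing code.
    """
    requires = ["requests"]  # illustrative unconditional dependency
    if platform == "win32":
        requires.append("pywin32")  # only needed on Windows
    if python_version < (3, 8):
        requires.append("importlib-metadata")  # backport for older Pythons
    return requires
```

Two machines building the same source can therefore end up with different dependency lists, e.g. `compute_install_requires("win32", (3, 7))` versus `compute_install_requires("linux", (3, 11))`.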

cc @brettcannon :stuck_out_tongue:

To expand on this point a little, the vast majority of packages have a fixed set of dependencies that only ever changes when the package version changes. For those packages, it makes perfect sense that install tools like pip should be able to say “what does foo 2.0 depend on?”

Packages may depend on other packages only some of the time - classic examples being backports (if I’m installing on Python 3.7, I depend on the importlib.metadata backport) or platform-specific modules (on Windows, I depend on pywin32). These can be handled with environment markers, but not all projects do that (yet, and older versions never will).

And finally, there are probably a small number of projects which genuinely have dependencies that can only be determined by checking something that isn’t covered by the marker feature. But these are (in my experience) rare enough that it’s hard to say anything specific.

The problem with standardising something is that it always gets bogged down in those edge cases. And with the current set of standards, there’s no way for a project to say "I’m not an edge case" [1]. Plus, metadata is held at the distribution file level (each wheel has its own metadata file), not at the project/version level - so tools like pip can’t assume that metadata from one file applies to any other.

That’s not to say that there couldn’t be such a standard. Consider the following:

  1. An extension to PEP 503 that added a link to a static metadata file for version X.Y.Z of a project.
  2. A standard pyproject.toml extension that let projects specify their metadata. The metadata would apply unchanged for all builds of that source, and backends are not allowed to override a metadata field that’s specified in pyproject.toml. Frontends would be allowed to read metadata from pyproject.toml without building the project.

Mechanisms would be needed to let tools like twine upload a versioned static metadata file, for Warehouse to validate that the versioned metadata does not get modified by subsequent uploads, etc. We’d also need to do some work to handle versions (metadata depends on the version, but lots of projects want to dynamically set their version from somewhere else in the project like a __version__ attribute).

But there’s the bare bones of a proposal. So the idea isn’t impossible, it’s just:

  1. Hard to get the details right
  2. Necessarily opt-in, so tools will have to maintain support for older projects (and newer projects that don’t opt in) essentially forever.

Maybe I just reinvented @brettcannon’s idea - who knows? :slightly_smiling_face:

[1] Insert Monty Python “We are all individuals!” “I’m not” reference here :slightly_smiling_face:

Yes, that’s why I would like to see a definition for an API like the optional hooks in PEP 517. With something like this, the build tools can decide where this information comes from.

I would extend my original request: it would be nice to get not only the dependencies but also the version. Something that can answer “Tell me who you are and what you need.”

Wait, are you asking for a build hook? If so, what’s wrong with prepare_metadata_for_build_wheel?

I will just clarify that my idea is to come up with a standard way to specify the metadata that can be specified statically, while still letting metadata that is dynamic be marked as such.

I agree, which is why I have an escape hatch idea. :grin:

Anyway, I am not up for having a public discussion on this topic until I have vetted it with people privately to help minimize the guaranteed bikeshedding on this topic (and that’s if my idea isn’t bonkers).

Thanks for pointing to prepare_metadata_for_build_wheel!

While this can be used to get metadata like version and dependencies, this would be just a workaround. Why? It creates folder(s) and file(s) which need to be cleaned up. It is necessary to parse the content. The requirements are not given in a PEP 508 conformant form.

I can’t help but read this in the most ironic way. Creating files and folders is a given if you need to build stuff; disallowing that would mean banning dynamic metadata, and we’re back to square one. What prepare_metadata_for_build_wheel is supposed to generate is the canonical form of package metadata, and also the only format that has a parser built into the standard library (importlib.metadata). You literally can’t be more canonical than it.
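For what it's worth, the METADATA file is a set of email-style headers, so even the standard library's `email.parser` can read it. A minimal sketch, with made-up file contents:

```python
from email.parser import Parser

# Illustrative METADATA contents; not taken from a real project.
METADATA_TEXT = """\
Metadata-Version: 2.1
Name: foo
Version: 2.0
Requires-Dist: requests (>=2.0)
Requires-Dist: pywin32 ; sys_platform == "win32"
"""


def parse_core_metadata(text):
    """Parse core metadata text into a plain dict."""
    msg = Parser().parsestr(text)
    return {
        "name": msg["Name"],
        "version": msg["Version"],
        # A multi-use field: one header line per requirement.
        "requires_dist": msg.get_all("Requires-Dist") or [],
    }


print(parse_core_metadata(METADATA_TEXT))
```

The requirement strings come back exactly as written in the file; turning them into structured requirement objects is a separate step that this sketch does not attempt.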

The situation is entirely the other way around from what you perceive. The standard way to receive dependencies is to build the package (either wholly or partially). Reading static dependencies directly from source is a workaround to conserve resources, while achieving approximately the same result.

OK. I think I see what @finswimmer is trying to get at here. We may be getting distracted by the general question of metadata, where the original proposal was only about dependencies.

For a project on PyPI, or indeed any wheel or sdist, it’s possible to parse the project’s name and version from the filename. In the case of sdists, it’s possible that the name/version may turn out not to match the generated metadata when you build the project, but it’s relatively reasonable to simply treat this as an error and reject that case. (Pip’s new resolver currently does this, although we’re discussing possible alternatives, like backtracking when we get the revised information).
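A simplified sketch of that filename parsing follows. The helper names are made up, and it assumes well-formed filenames: wheels with five or six dash-separated fields, and sdists named `{name}-{version}` plus an archive extension.

```python
def parse_wheel_filename(filename):
    """Return (name, version) from a wheel filename.

    Expected layout: name-version[-build]-python-abi-platform.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    return parts[0], parts[1]


def parse_sdist_filename(filename):
    """Return (name, version) from an sdist filename.

    The name may itself contain dashes, so split on the last dash only.
    """
    for ext in (".tar.gz", ".zip"):
        if filename.endswith(ext):
            name, sep, version = filename[: -len(ext)].rpartition("-")
            if not sep:
                raise ValueError(f"no version in: {filename!r}")
            return name, version
    raise ValueError(f"not an sdist: {filename!r}")


print(parse_wheel_filename("foo-2.0-py3-none-any.whl"))
print(parse_sdist_filename("foo-bar-2.0.tar.gz"))
```

As noted above, for sdists the result is only a claim made by the filename; a resolver would still want to compare it with the metadata produced at build time.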

So for a dependency resolver like poetry or pip-tools, or pip itself, that leaves only the dependency data that needs a build (or at least build metadata) step to generate. And therefore that’s the only part of the puzzle that’s costly to produce.

So having a file that specifies the project dependencies in a static format would allow resolvers to complete the resolution process without doing that costly build step. OK, that makes sense as a use case.

But the thing is, we already have that. In a correctly formed sdist, there’s a file PKG-INFO, that contains the project metadata in standard format. The problem is, there’s no guarantee that it’s correct - because it can be generated on the fly, at build time. See this discussion for plenty more on this topic. But if you’re willing to accept the risk that the data is wrong, you already have a standard location for dependency data.
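If you do accept that risk, reading the file is cheap. A minimal sketch using only the standard library; the in-memory sdist is fabricated for the demonstration, and the code assumes PKG-INFO sits directly inside the archive's single top-level directory:

```python
import io
import tarfile
from email.parser import Parser


def build_example_sdist():
    """Create a tiny in-memory .tar.gz sdist (illustrative contents only)."""
    pkg_info = (
        "Metadata-Version: 2.1\n"
        "Name: foo\n"
        "Version: 2.0\n"
        "Requires-Dist: requests>=2.0\n"
    ).encode("utf-8")
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo("foo-2.0/PKG-INFO")
        info.size = len(pkg_info)
        tar.addfile(info, io.BytesIO(pkg_info))
    buf.seek(0)
    return buf


def read_sdist_requires(fileobj):
    """Return the Requires-Dist entries from PKG-INFO, or None if it is absent.

    The values are whatever was recorded when the sdist was built; nothing
    here can tell whether they would be the same on another machine.
    """
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        for member in tar.getmembers():
            parts = member.name.split("/")
            if len(parts) == 2 and parts[1] == "PKG-INFO":
                text = tar.extractfile(member).read().decode("utf-8")
                return Parser().parsestr(text).get_all("Requires-Dist") or []
    return None


print(read_sdist_requires(build_example_sdist()))
```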

It may even be that dependencies are static enough in practice that doing this is reliable. I don’t know. I do know that we’ve chosen in pip to go down the route of using the PEP 517 prepare hook to get accurate dependency metadata. That was our choice - other people may have different priorities. One problem with the PKG-INFO file is that it doesn’t have a flag in it that says “this data is static, so you can rely on it”. So you could be reading data that’s only valid on the machine of whoever built the sdist. Caveat emptor.

So in summary, there’s no need for a new format of static metadata file for sdists. We have one. The problem is that sdist metadata isn’t actually what you want when resolving - what you want is the wheel metadata that the sdist would generate. And with existing tools, there’s no guarantee that these would be the same, and no way for projects to assert that it is the case.

Personally, I wonder whether merely adding some sort of field saying “the following fields won’t change, so you can use them for all targets” would be enough. But the discussion I linked to above goes into plenty more depth about that approach (I haven’t read it in a long while, so I don’t recall whether the conclusion was that it’s a workable idea, or whether it has flaws that I’ve forgotten…)

Thanks a lot for your answer @pf_moore.

You pointed out that there are several situations in which a dependency resolution tool must be able to read a package’s name, its version and its dependencies:

  1. It’s already built as an sdist or a wheel.
  2. It’s not built. So we have a setup.py or a pyproject.toml.

In the first case we could try to get the needed information from the metadata files that the packages should provide. But as it looks right now, no one is forced to provide this information when building these packages. I would be really happy if PyPI et al. would refuse those packages in the near future.

The second case arises when trying to resolve a package from a local path or git repository. Building the package to get access to the needed metadata is possible, but has a huge overhead. prepare_metadata_for_build_wheel is an alternative, but isn’t really an API, for the reasons I mentioned above.

Metadata can be dynamic. But shouldn’t it be available at a very early step of the build process?

(reading through your linked discussion now)

Maybe. But the problem is in the details - can you clarify what you’re proposing here? For example, what changes would you propose to setuptools to make the metadata available any earlier than it is now? And what additional hook(s) would PEP 517 need to expose that capability? (I’m not asking for a full implementation strategy, just a clearer picture of what you think could be done better in the context of existing tools)

@pf_moore Typo of PEP 517?

This would kind of go against backends having the liberty of storing their dependencies where and how they wish. Why not just use what PEP 517 (A build-system independent format for source trees, peps.python.org) already provides and parse that?

Correct. That’s a trade-off that someone would have to argue for if they wanted to push such a proposal. I don’t have much of a preference myself, so I’ll leave it to someone else to pursue, if they want to.

Hello,

I don’t think we are that far away from what I have in mind :slight_smile:

With PEP 566 and the Core metadata specifications, a lot of the work is done. One exception: the recommendation not to surround version specifiers with parentheses should be changed into a rule that disallows parentheses, so that requirements are fully PEP 508 compatible.

With prepare_metadata_for_build_wheel from PEP 517 we get nearly the other half of the way. What’s the use case for this hook to create files? What I’m missing is a hook that returns this information directly as a Python data structure. For tools that consume this data, that would avoid writing a parser (which might have bugs) and cleaning up those files.

Please keep in mind that PEP 517 hooks are designed specifically to be run in a separate process. It is not viable to pass around rich Python objects, and you’ll need a parser for anything more complicated than a plain string.

And since the metadata format is parsable through modules in the standard library, I don’t think there’s a large possibility of issues around parsing of the metadata.

I’d say anything that’s not JSON-dumpable would be out of the question, at which point we can just use the importlib metadata parser, after the prepare hook finishes, on the content written to disk.
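A rough sketch of how a consumer could hide the files and the parsing behind one call that returns JSON-dumpable data. `FakeBackend` is a hypothetical stand-in for a real backend, and the hook is called in-process only to keep the example short (real frontends run hooks in a separate process); `email.parser` is used here simply because the file is in email-header format:

```python
import json
import os
import tempfile
from email.parser import Parser


class FakeBackend:
    """Hypothetical stand-in for a PEP 517 backend, for illustration only."""

    @staticmethod
    def prepare_metadata_for_build_wheel(metadata_directory, config_settings=None):
        dist_info = "foo-2.0.dist-info"
        os.mkdir(os.path.join(metadata_directory, dist_info))
        path = os.path.join(metadata_directory, dist_info, "METADATA")
        with open(path, "w", encoding="utf-8") as f:
            f.write("Metadata-Version: 2.1\nName: foo\nVersion: 2.0\n"
                    "Requires-Dist: requests>=2.0\n")
        # The hook returns the basename of the .dist-info directory it created.
        return dist_info


def metadata_as_dict(backend):
    """Run the prepare hook in a throwaway directory and return plain data."""
    with tempfile.TemporaryDirectory() as tmp:  # removed again on exit
        dist_info = backend.prepare_metadata_for_build_wheel(tmp)
        with open(os.path.join(tmp, dist_info, "METADATA"), encoding="utf-8") as f:
            msg = Parser().parse(f)
    return {
        "name": msg["Name"],
        "version": msg["Version"],
        "requires_dist": msg.get_all("Requires-Dist") or [],
    }


print(json.dumps(metadata_as_dict(FakeBackend)))
```

The temporary directory takes care of the cleanup, and the returned dict contains only strings and lists, so it can cross a process boundary as JSON.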

Yes, it’s possible to read it like this. But the headline is “Build backend interface”. And an interface is not necessarily a one-way road. With get_requires_for_build_wheel and get_requires_for_build_sdist we have hooks that return a data structure.

I was hoping that this is possible. But I couldn’t find out how to use it on packages that are not installed, or on just the METADATA file when the output of prepare_metadata_for_build_wheel is in a totally different location. Any hint?

I have such a usage on the tox rewrite branch; I can link it when I get back to a computer.
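Independently of that branch, a minimal sketch of one way to point importlib.metadata at a .dist-info directory that is not installed, via `Distribution.at`. The directory contents are fabricated for the demonstration and the helper names are made up:

```python
import os
import tempfile
from importlib.metadata import Distribution


def write_example_dist_info(root):
    """Create an illustrative foo-2.0.dist-info directory under root."""
    dist_info = os.path.join(root, "foo-2.0.dist-info")
    os.mkdir(dist_info)
    with open(os.path.join(dist_info, "METADATA"), "w", encoding="utf-8") as f:
        f.write("Metadata-Version: 2.1\nName: foo\nVersion: 2.0\n"
                "Requires-Dist: requests>=2.0\n")
    return dist_info


def load_from_dist_info(dist_info_path):
    """Read name, version and requirements from an arbitrary .dist-info path."""
    dist = Distribution.at(dist_info_path)
    return dist.metadata["Name"], dist.version, dist.requires or []


with tempfile.TemporaryDirectory() as tmp:
    example = load_from_dist_info(write_example_dist_info(tmp))
    print(example)
```

The path passed to `Distribution.at` can be wherever the prepare hook wrote its output; it does not need to be on `sys.path`.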