Standardized way for receiving dependencies

Yes, that’s why I would like to see a definition for an API like the optional hooks in PEP 517. With something like this, the build tools can decide where this information comes from.

I would extend my original request: it would be nice to get not only the dependencies but also the version. Something that can answer “Tell me who you are and what you need.”

Wait, are you asking for a build hook? If so, what’s wrong with prepare_metadata_for_build_wheel?

I will just clarify that my idea is to come up with a standard way to specify the metadata that can be specified statically, while still allowing metadata that is dynamic to be marked as such.

I agree, which is why I have an escape hatch idea. :grin:

Anyway, I am not up for having a public discussion on this topic until I have vetted it with people privately to help minimize the guaranteed bikeshedding on this topic (and that’s if my idea isn’t bonkers).

Thanks for pointing to prepare_metadata_for_build_wheel!

While this can be used to get metadata like version and dependencies, it would be just a workaround. Why? It creates folder(s) and file(s) that need to be cleaned up. It is necessary to parse the content. Requirements are not given in a PEP 508-conformant form.

I can’t help but read this in the most ironic way. Creating files and folders is a given if you need to build stuff; disallowing that would mean banning dynamic metadata, and we’re back to square one. What prepare_metadata_for_build_wheel is supposed to generate is the canonical form of package metadata, and also the only format that has a parser built into the standard library (importlib.metadata). You literally can’t be more canonical than that.

The situation is entirely the other way around from what you perceive. The standard way to receive dependencies is to build the package (either wholly or partially). Reading static dependencies directly from source is a workaround to conserve resources, while achieving approximately the same result.

OK. I think I see what @finswimmer is trying to get at here. We may be getting distracted by the general question of metadata, where the original proposal was only about dependencies.

For a project on PyPI, or indeed any wheel or sdist, it’s possible to parse the project’s name and version from the filename. In the case of sdists, it’s possible that the name/version may turn out not to match the generated metadata when you build the project, but it’s relatively reasonable to simply treat this as an error and reject that case. (Pip’s new resolver currently does this, although we’re discussing possible alternatives, like backtracking when we get the revised information.)
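
As a rough illustration of the filename parsing mentioned above, here is a minimal sketch. The function name is made up for this example, and it assumes the usual `{name}-{version}-...whl` and `{name}-{version}.tar.gz` layouts with no hyphen inside the version; real tools are stricter than this.

```python
def name_version_from_filename(filename):
    """Guess (name, version) from a wheel or sdist filename.

    Hypothetical helper for illustration only; it does not validate
    the name or the version.
    """
    if filename.endswith(".whl"):
        # Wheel: {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl
        parts = filename[: -len(".whl")].split("-")
        if len(parts) not in (5, 6):
            raise ValueError(f"unexpected wheel filename: {filename!r}")
        return parts[0], parts[1]

    for suffix in (".tar.gz", ".zip"):
        if filename.endswith(suffix):
            # Sdist: {name}-{version}{suffix}. The name may itself contain
            # hyphens, so split on the last one (assumes none in the version).
            stem = filename[: -len(suffix)]
            name, sep, version = stem.rpartition("-")
            if not sep or not name or not version:
                raise ValueError(f"unexpected sdist filename: {filename!r}")
            return name, version

    raise ValueError(f"not a wheel or sdist filename: {filename!r}")


wheel_info = name_version_from_filename("requests-2.23.0-py2.py3-none-any.whl")
sdist_info = name_version_from_filename("pip-tools-5.0.0.tar.gz")
```

As noted, for an sdist the result is only a guess until the generated metadata confirms it.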

So for a dependency resolver like poetry or pip-tools, or pip itself, that leaves only the dependency data that needs a build (or at least build metadata) step to generate. And therefore that’s the only part of the puzzle that’s costly to produce.

So having a file that specifies the project dependencies in a static format would allow resolvers to complete the resolution process without doing that costly build step. OK, that makes sense as a use case.

But the thing is, we already have that. In a correctly formed sdist, there’s a file PKG-INFO, that contains the project metadata in standard format. The problem is, there’s no guarantee that it’s correct - because it can be generated on the fly, at build time. See this discussion for plenty more on this topic. But if you’re willing to accept the risk that the data is wrong, you already have a standard location for dependency data.
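
For anyone willing to accept that risk, a minimal sketch of reading PKG-INFO out of an sdist with only the standard library might look like this. The helper names and the in-memory demo archive are invented for the example; whether a given sdist actually lists Requires-Dist in PKG-INFO depends on the tool that built it.

```python
import io
import tarfile
from email.parser import HeaderParser


def read_pkg_info(sdist_fileobj):
    """Return the parsed PKG-INFO of a .tar.gz sdist (hypothetical helper)."""
    with tarfile.open(fileobj=sdist_fileobj, mode="r:gz") as archive:
        for member in archive.getmembers():
            # PKG-INFO is expected directly inside the top-level directory.
            parts = member.name.split("/")
            if len(parts) == 2 and parts[1] == "PKG-INFO":
                text = archive.extractfile(member).read().decode("utf-8")
                return HeaderParser().parsestr(text)
    raise LookupError("no PKG-INFO found in sdist")


def make_demo_sdist():
    """Build a tiny in-memory sdist so the sketch runs standalone."""
    pkg_info = (
        "Metadata-Version: 2.1\n"
        "Name: demo\n"
        "Version: 1.0\n"
        "Requires-Dist: requests>=2.0\n"
        "Requires-Dist: attrs\n"
    ).encode("utf-8")
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        info = tarfile.TarInfo("demo-1.0/PKG-INFO")
        info.size = len(pkg_info)
        archive.addfile(info, io.BytesIO(pkg_info))
    buffer.seek(0)
    return buffer


metadata = read_pkg_info(make_demo_sdist())
requires = metadata.get_all("Requires-Dist") or []
```

Nothing in the parsed result says whether the values are static, which is exactly the caveat above.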

It may even be that dependencies are static enough in practice that doing this is reliable. I don’t know. I do know that we’ve chosen in pip to go down the route of using the PEP 517 prepare hook to get accurate dependency metadata. That was our choice - other people may have different priorities. One problem with the PKG-INFO file is that it doesn’t have a flag in it that says “this data is static, so you can rely on it”. So you could be reading data that’s only valid on the machine of whoever built the sdist. Caveat emptor.

So in summary, there’s no need for a new format of static metadata file for sdists. We have one. The problem is that sdist metadata isn’t actually what you want when resolving - what you want is the wheel metadata that the sdist would generate. And with existing tools, there’s no guarantee that these would be the same, and no way for projects to assert that it is the case.

Personally, I wonder whether merely adding some sort of field saying “the following fields won’t change, so you can use them for all targets” would be enough. But the discussion I linked to above goes into plenty more depth about that approach (I haven’t read it in a long while, so I don’t recall whether the conclusion was that it’s a workable idea, or whether it has flaws that I’ve forgotten…)

Thanks a lot for your answer @pf_moore.

You were pointing out that there are several situations where a dependency resolution tool must be able to read a package’s name, its version and its dependencies:

  1. It’s already built as an sdist or a wheel.
  2. It’s not built. So we have a setup.py or a pyproject.toml.

In the first case we could try to get the needed information from the metadata files the packages should provide. But as it looks now, no one is forced to provide this information when building these packages. I would be really happy if PyPI et al. would refuse those packages in the near future.

The second case arises when trying to resolve a package from a local path or git repository. Building the package to get access to the needed metadata is possible, but has a huge overhead. prepare_metadata_for_build_wheel is an alternative, but isn’t really an API, for the reasons I mentioned above.

Metadata can be dynamic. But shouldn’t it be available at a very early step of the build process?

(reading through your linked discussion now)

Maybe. But the problem is in the details - can you clarify what you’re proposing here? For example, what changes would you propose to setuptools to make the metadata available any earlier than it is now? And what additional hook(s) would PEP 517 need to expose that capability? (I’m not asking for a full implementation strategy, just a clearer picture of what you think could be done better in the context of existing tools)

@pf_moore Typo of PEP 517?

This kinda would go against backends having the liberty of storing their dependencies where and how they wish. Why not just use PEP 517’s prepare_metadata_for_build_wheel (https://www.python.org/dev/peps/pep-0517/#prepare-metadata-for-build-wheel) and parse that?

Correct. That’s a trade-off that someone would have to argue for if they wanted to push such a proposal. I don’t have much of a preference myself, so I’ll leave it to someone else to pursue, if they want to.

Hello,

I don’t think we are that far away from what I have in mind :slight_smile:

With PEP 566 and the Core metadata specifications, a lot of work is done. Except that the recommendation not to surround version specifiers with parentheses should be changed to disallow parentheses, to be completely PEP 508 compatible.
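
To illustrate the two spellings in question: older metadata may write `zope.interface (>3.5.0)` where the recommended form is `zope.interface>3.5.0`. Below is a rough sketch of how a consumer might normalise the former. It assumes a simple regular expression is good enough, is not a full PEP 508 parser, and `strip_parentheses` is a name made up for this example.

```python
import re

# name, optional [extras], a parenthesised version spec, optional "; marker"
_PAREN_SPEC = re.compile(
    r"^\s*(?P<name>[A-Za-z0-9][A-Za-z0-9._-]*(?:\s*\[[^\]]*\])?)"
    r"\s*\(\s*(?P<spec>[^()]*?)\s*\)\s*(?P<rest>;.*)?$"
)


def strip_parentheses(requirement):
    """Rewrite 'name (spec)' as 'namespec'; leave other strings untouched."""
    match = _PAREN_SPEC.match(requirement)
    if match is None:
        # No parenthesised version specifier found, nothing to rewrite.
        return requirement.strip()
    rest = match.group("rest") or ""
    return match.group("name") + match.group("spec") + rest


plain = strip_parentheses("zope.interface (>3.5.0)")
with_marker = strip_parentheses("pywin32 (>1.0); sys_platform == 'win32'")
unchanged = strip_parentheses("requests>=2.0")
```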

With prepare_metadata_for_build_wheel from PEP 517 we get nearly the other half of the way. What’s the use case for this hook to create files? What I’m missing is a hook that returns this information directly as a Python data structure. This would spare tools that consume this data from writing a parser (which might have bugs) and from cleaning up those files.

Please keep in mind that PEP 517 hooks are designed specifically to be run in a separate process. It is not viable to pass around rich Python objects, and you’ll need a parser for anything more complicated than a plain string.

And since the metadata format is parsable through modules in the standard library, I don’t think there’s a large possibility of issues around parsing of the metadata.

I’d say anything that’s not JSON-dumpable would be out of the question, at which point we can just use the importlib.metadata parser on the content written to disk after prepare_metadata_for_build_wheel finishes.
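
A small sketch of why the return value has to be serialisable: the hook runs in a child process, so the caller only ever sees bytes. The child code below is a stand-in backend invented for this example (a real frontend would import the backend named in pyproject.toml), and printing JSON to stdout is just one possible transport.

```python
import json
import subprocess
import sys

# Stand-in for a build backend, executed in a separate interpreter.
CHILD_CODE = """
import json, sys

def get_requires_for_build_wheel(config_settings=None):
    return ["setuptools>=40.8.0", "wheel"]

json.dump(get_requires_for_build_wheel(), sys.stdout)
"""


def call_hook_in_subprocess():
    """Run the stand-in hook in a child process and decode its JSON result."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        check=True,
    )
    # Only text crossed the process boundary, so it must be parsed again.
    return json.loads(result.stdout.decode("utf-8"))


requirements = call_hook_in_subprocess()
```

A list of strings survives this round trip; a rich Python object would need its own serialisation format and parser.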

Yes, it’s possible to read it like this. But the headline is “Build backend interface”. And an interface is not necessarily a one-way road. With get_requires_for_build_wheel and get_requires_for_build_sdist we have hooks that return a data structure.

I was hoping that this is possible. But I couldn’t find out how to use it on packages that are not installed, or on a METADATA file alone when the output of prepare_metadata_for_build_wheel is in a totally different location. Any hint?

I have such a usage on the tox rewrite branch; I can link it when I get back to a computer.

I’m not following where you’re heading with the argument. There is not a standard format to serialise multiple PEP 508 requirements, so the hook treats this as an implementation detail, and makes the hook caller de-serialise for you. But dist-info has a standard format, so you can de-serialise transparently. Or are you looking for a convenience API to wrap the dist-info de-serialisation part so you don’t have to do it yourself?

import os
from importlib.metadata import Distribution

container = ... # Create an empty directory.
dist_info_name = prepare_metadata_for_build_wheel(container, ...)  # Call the hook.
dist = Distribution.from_path(os.path.join(container, dist_info_name))
dist.metadata  # This holds metadata.

hmm

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: type object 'Distribution' has no attribute 'from_path'

:thinking:

$ python --version
Python 3.8.2

I think @uranusjr intended to use PathDistribution or Distribution.at.
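
For reference, a minimal sketch of the `Distribution.at` variant. The directory name and METADATA contents are made up so the snippet runs standalone; in practice the path would be the container joined with the name returned by prepare_metadata_for_build_wheel.

```python
import os
import tempfile
from importlib.metadata import Distribution

# Made-up metadata standing in for what the hook would write to disk.
METADATA = (
    "Metadata-Version: 2.1\n"
    "Name: demo\n"
    "Version: 1.0\n"
    "Requires-Dist: requests>=2.0\n"
)

with tempfile.TemporaryDirectory() as container:
    # Mimic the .dist-info directory the hook is expected to create.
    dist_info = os.path.join(container, "demo-1.0.dist-info")
    os.mkdir(dist_info)
    with open(os.path.join(dist_info, "METADATA"), "w", encoding="utf-8") as f:
        f.write(METADATA)

    # Distribution.at points at a metadata directory that is not installed.
    dist = Distribution.at(dist_info)
    name = dist.metadata["Name"]
    version = dist.version
    requires = dist.requires
```

The values are read while the temporary directory still exists, which also takes care of the cleanup mentioned earlier in the thread.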