PEP 621: how to specify dependencies?

ofek · August 12, 2020, 6:18pm

Seems easy to rectify by splitting out whatever pip does for parsing into a separate lib (or put in packaging).

sdispater · August 12, 2020, 6:35pm

Of course you can do it this way but my point was, with TOML, you can have the information directly without introducing another dependency and complexity.

pf_moore · August 12, 2020, 6:38pm

To be fair, what pip does isn’t standardised, so do we want to make pip’s behaviour a standard here?

But this begs the question, if we do go with an exploded format, which has extensions like this that make it more expressive than PEP 508, what happens to PEP 508? If “exploded format” can’t be translated to a semantically identical PEP 508 format, how do we store what people specify in pyproject.toml in the package metadata (which mandates PEP 508 format here)?

So I think that we should be very cautious about suggesting that the “expanded format” would express anything that couldn’t also be expressed as a PEP 508 string. That’s a much wider question, and not one I want to open here, if I have the choice.

Hmm, there’s a thought. Can someone please clearly state what precisely is the “expanded format” that is being proposed here? If I end up being PEP delegate, I’d expect the PEP to be very explicit about the format being proposed (I assume anyone else doing the job would too).

PEP 508 format:

[project]
dependencies = [
"foo",
"bar >= 2.0",
'requests [security,tests] >= 2.8.1, == 2.8.* ; python_version < "2.7"',
"pip @ https://github.com/pypa/pip/archive/1.3.1.zip#sha1=da9234ee9982d4bbb3c72346a6de940a148ea686"
]

Exploded format:

What?

I’d take the view that if there’s even a question to be answered here, someone needs to decide pretty quickly exactly what is being proposed as the “expanded format”, and write it up in the same format as the rest of the PEP. (And how to map between PEP 508 and the expanded format should be clearly specified, too).

The example in the PEP should also be updated to include dependencies, in whichever format is going to be proposed.

ofek · August 12, 2020, 6:40pm

Not really. You’d want a central algorithm for parsing, even if it’s as simple as if 'git' in data: dep_type = GitDep.... Otherwise we all have to implement the same thing for every tool.
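To illustrate the kind of shared helper being described, here is a minimal sketch. The table keys (git, url, path, version) and the DepType names are assumptions made up for illustration; nothing in this thread, in pip, or in any PEP defines them.

```python
from enum import Enum


class DepType(Enum):
    """Hypothetical dependency kinds; not defined by any PEP."""

    GIT = "git"
    URL = "url"
    PATH = "path"
    VERSION = "version"


def classify(data: dict) -> DepType:
    """Return the kind of dependency that a parsed table describes.

    The first matching key wins, so every tool sharing this helper
    would resolve an ambiguous table in the same way.
    """
    for dep_type in (DepType.GIT, DepType.URL, DepType.PATH):
        if dep_type.value in data:
            return dep_type
    # A table without a source key is treated as a plain version requirement.
    return DepType.VERSION


print(classify({"git": "https://example.invalid/repo.git", "rev": "main"}))
print(classify({"version": ">= 2.0"}))
```

The point is only that the dispatch lives in one place; the actual rules would still have to be specified somewhere.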

finswimmer · August 12, 2020, 7:08pm

To expand @sdispater’s list:

Easily extensible: We might come to a point where we need more keys to describe a dependency, e.g. what type of GPU must be present. Users just need to learn what this new key is called. That is much easier for them than learning how it is integrated into an endless string.

ofek · August 12, 2020, 7:40pm

That’s the strongest argument for the table imo.

brettcannon · August 12, 2020, 8:00pm

Probably not. I know I at least purposefully never specified it as that’s a separate bikeshed to paint. Pipenv and Poetry have differing names and such, so there isn’t a unified format to go with. If you want to start with what Poetry has, you can look at Poetry’s “Dependency specification” documentation (although it doesn’t appear to document extras, which is a field that takes an array of strings).

If you want a bunch of alternatives, https://github.com/brettcannon/pyproject-metadata/issues/17 has a lot from when we initially discussed this topic (that repo is private to PEP co-authors).

If you want to see other languages and how they do it, https://github.com/brettcannon/pyproject-metadata/issues/17#issuecomment-624624249 covers Rust, Dart, and Ruby.

But it still needs to encode to PEP 508 in order to work in a wheel file. So that doesn’t buy us anything unless you can show that some future addition cannot be nicely expressed using some addition to PEP 508 while it can be nicely expressed in TOML (and this is purposefully ignoring how we would then handle it for expressing in wheels as that’s a separate discussion if this were to occur).

And I think this is the current crux of the problem. There may be a time where PEP 508 is demonstrably worse than some exploded format. But that day has not come, and planning for a possibility instead of a guaranteed eventuality is quite pricey when we would be asking the Python community to understand two formats for who knows how long (e.g. tox or PEP 518 may never ditch PEP 508 and so that will help keep that encoding around “forever”).

I will say I don’t think supporting both in a transition would have an ambiguity problem. This is not meant to support or argue against the point that there is a definite cost if we went with an exploded format due to this, just that we could transition if we had the stomach for it. I would expect we would extend ‘packaging’ with a function that takes a dict from a TOML parse and returns the appropriate representation for the build system regardless of how it was specified (which we maybe should do anyway unless pep517 has something like that already).

EpicWink · August 12, 2020, 11:42pm

At the end of the day, PEP 621 allows for dynamic fields, including dependencies. Tools could have a feature which allows the user to specify the dependencies in exploded table format in another key of pyproject.toml (e.g. project-dependencies-table).

Perhaps this feature would become popular enough that tools have a fast path to handle this key for dependencies (yay for ecosystem-defined standards; at least it would be borne out of popularity)

domdfcoding · August 14, 2020, 9:33am

Brett, the GitHub links you gave return a 404 error. I don’t know if your repository is private?

brettcannon · August 14, 2020, 4:08pm

Yep, it’s private to the co-authors of the PEP; sorry for forgetting to mention that (I also never got permission to open it up, hence why I probably will leave it private). Paul has access, though, and the examples aren’t that interesting; they are just variations on the exploded table format which you can only get so creative with.

pf_moore · August 14, 2020, 5:18pm

Thanks. I won’t look at these again now (I do recall the discussion) as ultimately what matters is what’s in the PEP that gets submitted for approval. If it’s going to have an “exploded” format, the colour of that bikeshed needs to have been decided by that point - that’s really all I was trying to point out here.

With my personal / co-author hat on, I don’t really care, and I’m explicitly trying not to develop an opinion as I feel that I may end up being PEP delegate (no-one’s objected yet!) and I want to remain somewhat impartial for that. But the “exploded format” proponents do need to get their act together and come up with an agreed definition, or they stand every chance of losing the debate by default (I definitely won’t accept a PEP with “details to be decided later”).

pganssle · August 14, 2020, 5:46pm

I’ll note that I think one of the reasons this has not been done is that the first part of the discussion we’ve been having is whether or not to do an exploded format. I think we have a general sense what such an exploded format would look like, and what its advantages are, and the question of whether or not to go with one has not come down to the fine details.

Personally, I think the fact that we don’t have to design a second way to specify dependencies is one of the perks of going with PEP 508: those decisions are already made and we don’t have to spend any further time debating them. I probably wouldn’t weigh in on design decisions of an “exploded TOML” format until after it was clear that we were abandoning PEP 508.

finswimmer · August 14, 2020, 6:28pm

One more argument for exploded tables:

Custom properties: If we allow this - and in my opinion we should - build-backends could use the exploded table to store custom properties they need for their work (maybe prefixed with something like x- to say that this is custom). For example, Poetry has a develop flag to indicate whether a dependency should be installed in editable mode. More are imaginable.

pganssle · August 14, 2020, 6:53pm

I think pretty much anything that involves expansions will require a corresponding change in PEP 508, as was mentioned in the somewhat related argument above.

Even for situations where it’s just information for the specific backend (and I would think that would probably be stuff that goes in the backend-specific configuration if it’s not actually standardized, TBH), we should be able to allow explicit flags in even the PEP 508 version in a potentially backwards-compatible way by accepting either a string or a list (or some other structure) for any given entry, so some variation on this:

[project]
dependencies = [
"requests",
{dependency = "something @ git+https://somewhere", x-custom = "value"},
]
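A tool could accept such mixed entries roughly as follows. This is only a sketch: the dependency key and the x- prefix come from the example above, while the function name split_entry and the strict rejection of unknown keys are assumptions made for illustration.

```python
def split_entry(entry):
    """Split one dependency entry into (pep508_string, custom_flags).

    An entry is either a plain PEP 508 string or a table holding a
    "dependency" key plus optional "x-" prefixed tool-specific keys.
    """
    if isinstance(entry, str):
        return entry, {}
    if isinstance(entry, dict):
        unknown = [
            key for key in entry
            if key != "dependency" and not key.startswith("x-")
        ]
        if "dependency" not in entry or unknown:
            raise ValueError(f"invalid dependency entry: {entry!r}")
        flags = {key: value for key, value in entry.items() if key.startswith("x-")}
        return entry["dependency"], flags
    raise TypeError(f"expected str or table, got {type(entry).__name__}")


# Entries in the shape a TOML parser would hand back.
entries = [
    "requests",
    {"dependency": "something @ git+https://somewhere", "x-custom": "value"},
]
for entry in entries:
    print(split_entry(entry))
```

Either way the first element stays a PEP 508 string, so existing parsers keep working and only the extra flags need new handling.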

finswimmer · August 16, 2020, 5:29pm

It depends. If it’s something related to how the dependency is described, yes. If it’s additional information for a specific tool (in my example, poetry should install this dependency in editable mode), no.

This leads us to the question, already asked somewhere earlier in this thread: who is the audience of PEP 621 and the dependency specification? I see three groups:

Human beings
For me this is the most important audience. These can be people who just want to look up information about the package, or those who want to add or modify the metadata. Their knowledge can vary widely: they can be beginners or experts, and they may or may not have experience in other language ecosystems.
In my opinion, the best way we have to satisfy most of these people is the exploded table, because it is self-explanatory when read and familiar to those coming from some other languages.
Tools using the metadata (except build-backends)
This is mostly already described in the Motivation part of PEP 621. Those tools can be package managers that try to resolve dependencies, but also tools that compute statistics on the metadata. With an exploded table for dependency definitions, they have direct access to the keys and their values, which makes things a lot easier.
Another use case is shown by poetry, which uses the dependency definition both for building the package and for managing the virtualenv. For the latter, additional keys can be added to the dependency to provide more functionality. Without that possibility, users would have to store those flags somewhere else, leading to more or less redundant data, as they would have to define the dependencies (at least the name) a second time.
Build-Backends
Of course the build-backend needs the information as well. It needs to create a PEP 508 compliant string for wheels. Concatenating the several parts from the exploded table is not a big thing.
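As a rough idea of what that concatenation could look like, here is a sketch. The table keys (version, extras, markers, url) are assumptions, since no exploded format has been specified in this thread; only the output syntax follows PEP 508.

```python
def to_pep508(name: str, spec) -> str:
    """Build a PEP 508 string from a hypothetical exploded entry.

    `spec` is either a bare version string or a table that may hold
    the assumed keys "version", "extras", "markers" and "url".
    """
    if isinstance(spec, str):
        spec = {"version": spec}
    if "url" in spec and "version" in spec:
        # The PEP 508 grammar allows a URL or a version specifier, not both.
        raise ValueError("cannot combine 'url' and 'version'")

    parts = [name]
    extras = spec.get("extras", [])
    if extras:
        parts.append("[" + ",".join(extras) + "]")
    if "url" in spec:
        parts.append(" @ " + spec["url"])
    elif "version" in spec:
        parts.append(" " + spec["version"])
    if "markers" in spec:
        # Whitespace before ";" is required when it follows a URL.
        parts.append(" ; " + spec["markers"])
    return "".join(parts)


print(to_pep508("bar", ">= 2.0"))
print(to_pep508(
    "requests",
    {
        "version": ">= 2.8.1, == 2.8.*",
        "extras": ["security", "tests"],
        "markers": 'python_version < "2.7"',
    },
))
```

Any key that has no PEP 508 counterpart would have nowhere to go in this function, which is the mapping problem raised earlier in the thread.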

The order in which I listed these groups corresponds to the importance I see. PEP 621 should be an interface. And so it should be made as easy as possible for people and tools to use it.

pf_moore · August 16, 2020, 6:25pm

If it’s for a specific tool, it should be in the tool-specific section of pyproject.toml, not in one of the standard-defined sections.

As you say, this is a matter of opinion.

Tools analysing metadata should be using the final METADATA file available in the wheel - or whatever ends up being the formal metadata in the sdist, once sdists are standardised. PEP 621 is not intended as a way to let tools avoid using the official source of metadata in those cases - and the official metadata spec mandates PEP 508 format. Tools will have to use PEP 508 for those situations, so using PEP 508 for pyproject.toml is easier because you then only have one thing you need to implement.
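For reference, reading the dependencies from a wheel’s METADATA file needs nothing beyond the standard library, since the file uses email-style headers. The sketch below works on an inline sample; the project name and requirement values are made up for illustration.

```python
from email.parser import Parser

# Shortened, made-up METADATA content of the kind found in a wheel's
# *.dist-info directory.
METADATA = """\
Metadata-Version: 2.1
Name: example
Version: 1.0
Requires-Dist: foo
Requires-Dist: bar >=2.0
Requires-Dist: requests[security] >=2.8.1 ; python_version < "2.7"
"""


def requires_dist(metadata_text: str) -> list:
    """Return the PEP 508 strings from the Requires-Dist headers."""
    message = Parser().parsestr(metadata_text)
    # get_all() returns None when the header is absent.
    return message.get_all("Requires-Dist") or []


for requirement in requires_dist(METADATA):
    print(requirement)
```

Each returned value is a PEP 508 string, so a tool still needs a PEP 508 parser to interpret it.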

Again, it’s mostly a matter of opinion which of those two arguments you find compelling.

Not having to do so is even less of a big thing…

steve.dower · August 16, 2020, 6:29pm

Maybe it’s time to run the survey. I propose these options:

I think we should use PEP 508
I volunteer to develop, test, validate, and specify a new PEP so I can vote for that one

(FTR, I’m voting for the first.)

finswimmer · August 16, 2020, 7:20pm

That’s a matter of definition. If we define: “Here are the dependencies for this package, the following keys describe the dependency […] and there might be additional information for different use-cases other than dependency resolution”, it’s absolutely fine to store them here, because all this information belongs to the dependency.

So what’s the goal then for PEP 621? I’m really interested in the answer! Because at the moment the PEP seems to contradict itself:

Once a build back-end has produced an artifact, then the metadata contained in the artifact that the build back-end produced should be considered canonical and overriding what this PEP specifies. In the eyes of this PEP, a source distribution is considered a build artifact, thus people should not read the metadata specified in this PEP as the canonical metadata in a source distribution.

But later:

When metadata is specified using this PEP then it is considered canonical […]

Sure. But throwing away all other advantages, just to make a quite easy thing even easier?

ofek · August 16, 2020, 7:46pm

Not really imo: PEP 621: how to specify dependencies? - #144 by ofek

njs · August 16, 2020, 8:47pm

I think one of the major gaps here is that to some of us it’s obvious that the exploded table is more readable, but to others that’s not obvious at all :-). How is one of these more self-explanatory to the naive reader?

# exploded table
[dependencies]
some-package = ">= 1.2"
another-package = {version = ">= 0.3", python = "<3"}

# PEP 508
dependencies = [
    "some-package >= 1.2",
    "another-package >= 0.3; python_version < '3'"
]

I just don’t see how slightly different quoting, or the choice of python versus python_version, is going to make one of them more self-explanatory for naive users.

I can see that you and @sdispater and others are passionate about the exploded form being better, so I know I must be missing something. Can you help us bridge that gap? What is it that you’re fighting for, that makes this decision so important to you?