PEP 633: Dependency specification in pyproject.toml using an exploded TOML table

Thanks for the heads-up. I wanted to get things in by the spec deadline for the PEP 621 submission. I’ve extended the date until 2020-09-08 7pm Pacific.

Ah yeah, I forgot about that. But I suspect the Canadian who set the deadline also forgot about the long weekend in the US :wink:

Get it in on time though. Any late feedback can still be taken into account, I’m sure.

Agreed. Are these intended to be two different ways of saying exactly the same thing? That seems pretty sure to be confusing, IMO. At a minimum, the PEP should clarify why you’re proposing allowing 2 different ways.

First of all, I would like to thank everyone involved in writing this PEP, and especially @EpicWink for kickstarting it.

I will try to lay out my reservations regarding the PEP’s current form and address some of the points made in this thread.

I think this comes down to the name optional-dependencies being a poor choice: it is not a table of dependencies but a table of groups of dependencies, which somewhat contradicts the expectations set by dependencies.

Regarding the “direct” references, I personally dislike the current form and I think we should go with specialized keys for clarity. It also has precedent in other languages, where the git key is pretty standard. Basically I would prefer to have this:

[project.dependencies]
sphinx = { git = "ssh://git@github.com/sphinx-doc/sphinx.git", revision = "main" }
pip = { url = "https://github.com/pypa/pip/archive/1.3.1.zip" }
local-package = { path = "path/to/directory"}
local-file-package = { path = "path/to/file.tar.gz"}

The forward compatibility argument is not that strong, since I am not aware of any new VCS emerging currently, and if one should pop up we could always write a new PEP to add support for it, since it would require work to support it anyway.

It also allows for each VCS to have its own validation rule, if needed.
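To illustrate the per-VCS validation point, here is a minimal sketch of how a tool might dispatch on the specialised key. The function names and both rules are hypothetical, chosen only for illustration; nothing here is taken from the PEP draft.

```python
from urllib.parse import urlparse


def _check_git(req):
    # Illustrative rule (an assumption): accept only common git transports.
    if urlparse(req["git"]).scheme not in {"https", "ssh", "git", "file"}:
        raise ValueError("unsupported git URL scheme")


def _check_svn(req):
    # Illustrative rule (an assumption): Subversion revisions are numeric.
    revision = req.get("revision")
    if revision is not None and not str(revision).isdigit():
        raise ValueError("svn revision must be a number")


# One validator per specialised key; a new VCS would add one entry here.
VALIDATORS = {"git": _check_git, "svn": _check_svn}


def validate_direct_reference(req):
    """Run the validator for the single VCS key present and return that key."""
    present = [key for key in VALIDATORS if key in req]
    if len(present) != 1:
        raise ValueError("expected exactly one VCS key")
    VALIDATORS[present[0]](req)
    return present[0]
```

With a generic vcs = "git" key the same dispatch is possible, but it would key off a value rather than off the presence of a key.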

Regarding the hash key, I am failing to see the purpose of it since it’s not tied to the metadata of the project as far as I know.

I think it would be better for empty constraint dependencies to either be represented by an empty string or "*".

For the validation part, once we have mostly agreed on the specification it would be good to have a JSON schema in the PEP for people to use for validation.

Also, one thing in the draft that bothers me as the creator of Poetry is:

Some dependency specifications, as in Poetry, separate the PEP 508 environment markers into separate keys in the requirement. This loses the general Python-like syntax for environment markers, and also removes the ability to logically combines the markers with grouped and and or operations.

This is not true, since for complex cases the markers key is available (see “Dependency specification” in the Poetry documentation).

I do not think this is increasing existing complexity. Even in the PEP 508 form, this requires parsing, decomposition and validation. To me, when this is done this is equivalent to someone writing foo @ git+@master.

On that note, while I like the explicit use of direct in the dependency specification examples, I am wondering if this is strictly necessary. My view is that whatever parses this (hopefully a library) will need to validate the tables anyway and also identify the keys (similar to how the grammar/regexes are currently used for PEP 508 strings) prior to processing them.

I would prefer this over the current draft.

sphinx = { vcs = "git", url = "ssh://git@github.com/sphinx-doc/sphinx.git" }

Although, I would much rather have specialised keys. I am with @sdispater on this one.

sphinx = { git = "ssh://git@github.com/sphinx-doc/sphinx.git" }

Also, for what it is worth, I have written a quick and dirty lib to:

  1. take a toml file (in the discussed form) and translate to PEP 508
  2. take PEP 508 and dump it in TOML form. The PEP 508 parsing needs a bit more tweaking re: URI grammar, and the TOML dump does not take inline arrays into consideration at the moment.
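For readers following along, the first direction (exploded table to PEP 508 string) could look roughly like the sketch below. This is not the library mentioned above. The function name is hypothetical, the key names (git, revision, url, version, markers) follow the specialised-key form discussed in this thread, and the path key is left out because PEP 508 has no direct equivalent for a relative path.

```python
def to_pep508(name, req):
    """Translate one exploded requirement (a string or a table) to PEP 508."""
    if isinstance(req, str):
        # A bare string is treated as the version specifier.
        return f"{name}{req}"

    spec = name
    vcs_keys = [key for key in ("git", "hg", "svn", "bzr") if key in req]
    if vcs_keys:
        # VCS direct reference: "<vcs>+<url>[@<revision>]".
        vcs = vcs_keys[0]
        url = f"{vcs}+{req[vcs]}"
        if "revision" in req:
            url += f"@{req['revision']}"
        spec += f" @ {url}"
    elif "url" in req:
        spec += f" @ {req['url']}"
    else:
        spec += req.get("version", "")

    if "markers" in req:
        # The space before ";" keeps the marker separate from a trailing URL.
        spec += f" ; {req['markers']}"
    return spec


print(to_pep508("sphinx", {"git": "ssh://git@github.com/sphinx-doc/sphinx.git", "revision": "main"}))
```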

I might also add examples for the specialised key usages as well.

These are related concepts, but different things. From my understanding, an extra is an addition to a distribution which adds functionality, while extras have a set of dependencies of their own. Now that I think about it, “optional dependencies” doesn’t make sense as a term, as if something is optional, then you don’t depend on it: perhaps “extra dependencies”?

Instead of dependency groups, I could lay out the optional dependencies as in dependencies, but require an extra key for-extras as a list of extra names in each requirement. The issue with that is that there’s a lot of duplication of specification of the extra names with that approach. I personally think the most elegant and least error-prone solution I can think of is shown in the docker-compose example.

I would like to present two options:

  1. Be explicit and have users know all keys but memorise/look-up values: have a direct table with url, vcs and revision keys
  2. Be implicit and have users memorise/look-up keys and value schema: have git, hg, etc, and revision and url keys at the top-level

I think option 2 is cleaner, more readable, more concise and more confident, however as I said it’s more implicit.

It’s very straightforward to specify validation based on type value using constants in schema.

You’re right, I suppose it’s more for lock files, not metadata. Is PEP 621 intended to allow pyproject.toml to become a lock file?

"*" is not a valid PEP-440 version specifier, even if it’s accepted in the ecosystem. The empty string, however, is currently disallowed and could be allowed. It was left out to be consistent with the version key not allowing the empty string, which is mostly a safety for foot-shooters. An empty object (table or string) representing no constraint makes sense to me.

Will do

I’ll update the mention: I’m mostly focusing on the marker keys.

In this case, I believe the pragmatic choice is (2). While it is implicit, it is still considerably more ergonomic and keeps things at the first level rather than nesting things further without good reason. I think at the very least we must drop the direct nested table. Using vcs = "git" vs git = "url" is more a question of conciseness. Either way, VCS-specific options will creep in eventually.

The way the optional-dependencies are specified has a significant impact on any tooling. In the lib I linked to earlier, the notion that there are two “types” of dependencies makes things a tad more complex to handle. Personally, I always considered extras as a grouping of dependencies (as described by @sdispater), which is why I originally recommended the approach where the “extras” section simply specifies the names of dependencies for each package-extra.

All approaches discussed here introduce either redundant specifications or validation concerns.

If we go with the optional-dependencies table, we end up with scenarios where a dependency used by multiple extras is re-specified for each extra.

On the other hand, if we used groupings, the name of the dependency is what is redundant. Additionally, here we also lose the ability to specify a different version of one dependency for each extra (group) it is used in. Whether this is a good thing (flexibility) or a bad thing (dependency resolution nightmare), I am not sure.

If we go with what @EpicWink suggests and use for-extras, we introduce a validation issue, and in cases where conditional dependencies exist, you can end up with redundant configurations. However, this could simply mean that extras/groups are dynamic. The validation issue can be mitigated by using something like what’s shown below, if required.

project.extras = ["foo", "bar", "baz"]
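As a rough sketch of that mitigation: with the extras declared up front, a tool could flag any for-extras entry that names an undeclared extra. The function name and the shape of the parsed data (plain dicts and lists, as a TOML parser would return them) are assumptions made for illustration.

```python
def undeclared_extras(declared, dependencies):
    """Return {dependency name: sorted list of extras not in `declared`}."""
    known = set(declared)
    problems = {}
    for name, req in dependencies.items():
        if not isinstance(req, dict):
            # Bare version strings cannot carry a for-extras key.
            continue
        unknown = sorted(set(req.get("for-extras", [])) - known)
        if unknown:
            problems[name] = unknown
    return problems


# Hypothetical data: "qux" is used below but was never declared.
extras = ["foo", "bar", "baz"]
deps = {
    "a": {"version": ">=1.0", "for-extras": ["foo"]},
    "b": {"for-extras": ["foo", "qux"]},
    "c": ">=2.0",
}
print(undeclared_extras(extras, deps))
```

Without the declared list, a typo in an extra name silently creates a new extra, which is the validation issue described above.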

The caveat here is that the whole optional-dependencies table requirement is not baked yet as part of PEP 621.

It would be great to at least allow an empty string. This can be further improved upon later if "*" becomes formally accepted.
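A small sketch of what “empty means any version” could look like in a parser, assuming the empty string and the empty table are both accepted and "*" is rejected because it is not a PEP 440 specifier. The function name is hypothetical.

```python
def is_unconstrained(value):
    """Return True when a requirement value places no constraint on the version."""
    if value == "*":
        # Rejected explicitly rather than silently treated as "any version".
        raise ValueError('"*" is not a PEP 440 specifier; use "" for any version')
    if isinstance(value, str):
        return value.strip() == ""
    # A table with no version key (or an empty one) carries no version constraint.
    return value.get("version", "") == ""
```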

Is PEP 621 intended to allow pyproject.toml to become a lock file?

No.

Cool, it seems like the most popular option. I’ll make the change in the next update.

I think combining dependencies with extra dependencies into one iterable is what causes some of that complexity. The reference implementation in the PEP keeps them separate, and the dependency object never gets to know whether it belongs to an extra, or to which one.

I agree

Which I think is fine, as you can update a dependency for one extra without having to change any of the other dependencies. In my opinion, duplicate-specification leans more to being a good thing (as opposed to duplicate declaration or duplicate implementation).

I think this makes this option a non-starter. We can’t be losing functionality from the current situation (or from our competitors, PEP 631 :smiling_imp:). Note that it’s not just version: it’s everything. This solution just didn’t feel right when I made some examples, and I kept forgetting to write optional = true (although I’ve yet to implement example parsing, and maybe there lies the benefit).

That was my thinking.

That is also a possibility, but is a new key which could introduce more friction when incorporating with PEP 621.

Are you suggesting a different name for the table, or for a different design of its contents? For the different content design, see the rest of this post. For a different name, were you thinking something like optional-dependency-groups, extra-dependencies, extras-dependencies or optional-dependency-extras? For that, see the following point:

For what it’s worth, my preference would be extra-dependencies, with some real documentation on what an “extra” is in Python packaging.

@EpicWink I have created a couple of PRs to your draft. One of them includes PEP 508 compatibility examples. For now I have used both the vcs = "git" and for-extras = [] representations, but these are just placeholders and can be updated once a decision has been made.

I am unsure if this is the case, because today, when specifying dependencies for a package extra, it is in effect the same iterable. For example, this is what aiohttp lists as its dependencies.

$ curl -sL https://pypi.org/pypi/aiohttp/json | jq -r ".info.requires_dist[]"
attrs (>=17.3.0)
chardet (<4.0,>=2.0)
multidict (<5.0,>=4.5)
async-timeout (<4.0,>=3.0)
yarl (<2.0,>=1.0)
idna-ssl (>=1.0) ; python_version < "3.7"
typing-extensions (>=3.6.5) ; python_version < "3.7"
aiodns ; extra == 'speedups'
brotlipy ; extra == 'speedups'
cchardet ; extra == 'speedups'
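To make the “same iterable” point concrete, here is a sketch that splits such a flat requires_dist list back into base dependencies and per-extra groups. It is deliberately simplified: it only looks for an extra == '...' clause and drops the rest of the marker for those lines, so it is an assumption-laden illustration rather than a real PEP 508 parser. The function name is hypothetical.

```python
import re
from collections import defaultdict

# Matches the extra clause of a marker, with either quote style.
_EXTRA = re.compile(r"""extra\s*==\s*['"]([^'"]+)['"]""")


def split_by_extra(requires_dist):
    """Return (base requirements, {extra name: [requirements]})."""
    base = []
    extras = defaultdict(list)
    for line in requires_dist:
        requirement, _, marker = line.partition(";")
        match = _EXTRA.search(marker)
        if match:
            # Simplification: any other marker clauses on this line are dropped.
            extras[match.group(1)].append(requirement.strip())
        else:
            base.append(line.strip())
    return base, dict(extras)


base, extras = split_by_extra([
    "attrs (>=17.3.0)",
    'idna-ssl (>=1.0) ; python_version < "3.7"',
    "aiodns ; extra == 'speedups'",
    "brotlipy ; extra == 'speedups'",
])
print(extras)
```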

I am partial to the for-extras (or similar) option, but as I mentioned before if PEP 621 has already made this decision this discussion is rather moot.

The issue, I find, with the optional-dependencies table approach is that we are transforming something linear into nested tables. That can get error prone. On the other hand, keeping it linear will also make a reasonable case for a straightforward migration. See the examples here.

I had completely forgotten about the extra environment marker. After seeing that, I have rethought how the extras’ dependencies are specified. Currently, it is possible to specify an extra in a dependency’s markers that is different from the extra that the dependency is required for.

If for-extra were to be employed, why not add the rest of the environment markers? I guess the spec could be that the key environment markers are combined with the markers string using and, like how Poetry does it.
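A sketch of that combining rule, under loud assumptions: the key names (python-version, sys-platform) and the convention that each key’s value is an operator followed by a quoted literal are invented here for illustration and are not taken from the draft or from Poetry.

```python
# Hypothetical mapping from table keys to PEP 508 marker variables.
MARKER_KEYS = {
    "python-version": "python_version",
    "sys-platform": "sys_platform",
}


def combine_markers(req):
    """Join key-based markers and the free-form markers string with 'and'."""
    parts = []
    for key, variable in MARKER_KEYS.items():
        if key in req:
            # Assumed value form: an operator plus a quoted literal, e.g. '< "3.7"'.
            parts.append(f"{variable} {req[key]}")
    if req.get("markers"):
        # Parenthesised so an 'or' inside the string cannot escape the 'and'.
        parts.append(f"({req['markers']})")
    return " and ".join(parts)


print(combine_markers({
    "python-version": '< "3.7"',
    "markers": 'os_name == "nt" or os_name == "posix"',
}))
```

The parentheses matter: without them, a markers string containing or would change the meaning of the combined expression.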

Edit: @abn, @sdispater what are your thoughts on the form represented in this PR?

I’ve added another PR on including environment markers as keys. You can comment/approve that if you’d like.

One idea I had was to prefix all environment markers with if-. This to me seems more intuitive, and actually provides a replacement for for-extra (in if-extra), which in this PR is currently extra.

I support the “competing” PEP, but I think that is a fantastic idea!!!

@finswimmer did you have time to check out the draft (and the current proposed changes)? The deadline is tomorrow at 7pm.

Updated with above decisions (rendered: https://github.com/EpicWink/peps/blob/pep-621-exploded-dependencies/pep-9999.rst). Change-log:

  • Allow empty string for any-version
  • Add work-around for environment marker keys drawback
  • Remove hash from requirement
  • Re-open ‘for-extra’ key issue
  • Move direct-reference keys to top-level
  • Cleanup TOML example snippets (#2)
  • Syntax highlighting (#1)

Open issues:

@EpicWink I am thinking that, for the sake of not increasing the scope of this PEP, we should leave the exploded markers out of this for now and keep relying on the markers key. The only exception would be for-extra(s) / if-extra / extra.

Final draft of the submission now available at: https://github.com/EpicWink/peps/blob/pep-621-exploded-dependencies/pep-9999.rst. Change-log:

  • Defer the environment marker keys idea
  • Convert optional-deps to table of reqs with extra key

The Canadian didn’t forget because it’s also a long weekend in Canada. :slight_smile: But the deadline was stated a while ago and enough people can only do open source on weekends that it should all balance out.

I will also say there’s no real hard deadline here. The dates are to make sure people don’t drag this out. I’m definitely not going to say, “you can’t change things” and I bet neither is @pf_moore.

No, any potential lock file format would be a separate PEP.

Correct. That can change with the proper argument. But do note that naming was done based on how other communities name things (which is ironic as that’s an argument this group is making against PEP 508 :wink:).

@abn can you sign the Python CLA (or add your GitHub to your bugs.python.org account) please?