PEP 633: Dependency specification in pyproject.toml using an exploded TOML table

Competing with PEP-631 for specifying the dependencies in PEP-621.

Rendered: https://github.com/EpicWink/peps/blob/pep-621-exploded-dependencies/pep-9999.rst

Some thoughts:

  • Instead of dependencies being a table, it could instead be an array of tables, to avoid the same-name-multiple-requirements problem. In this version this is handled by allowing values to be either a table or an array of tables.
  • Currently in the ecosystem the key for version specifiers is version, but I think versions makes more sense. I haven't seen it in discussion however, so I've stuck with version.
  • Another option for the value of version is an array of version specifier clause strings, eg version = [ '~= 3.1', '!= 3.1.3' ] instead of version = '~= 3.1, != 3.1.3'. This would better represent the data-models in packaging.
  • Specifying a version-only requirement is very straight-forward using dotted keys, which I hadn't realised before: numpy.version = '~= 1.18'. You can of course use dotted keys to specify the other requirement keys, but it's up to the writer to choose when to switch to an inline-table.
  • How can I help getting syntax highlighting (via ofek's PR) in these PEPs?
Change-log:

2020-09-03

  • Add open-issue notes on removing 'optional-dependencies' table
  • Add open-issue note on environment-marker keys
  • Add motivation to contrast with the PEP-508 string implementation
  • Fix incorrect 'version' keys in example
  • Switch to 'direct' table for direct references
  • Add dependencies-array to rejected ideas
  • Add note on why markers weren't split

2020-09-06

  • Add example implementation for conversion to PEP 508
  • Show consistent examples
  • Make note about 'file' protocol for direct requirements
  • Reject alternate definitions of extra requirements
  • Include more arguments against separate 'revision' field
  • Allow version specifiers as requirement specifiers

2020-09-08

  • Allow empty string for any-version
  • Add work-around for environment marker keys drawback
  • Remove hash from requirement
  • Re-open 'for-extra' key issue
  • Move direct-reference keys to top-level
  • Cleanup TOML example snippets (#2)
  • Syntax highlighting (#1)

2020-09-09

  • Defer the environment marker keys idea
  • Convert optional-deps to table of reqs with extra key

Just to start the ball rolling on discussions, why "git"? Are other VCS systems like Mercurial or Subversion not supported? (Pip supports them currently).

(I don't really have a vested interest in the answer, just in triggering a discussion here - if no-one comments on this proposal it's dead in the water, TBH).

Yeah probably you want vcs key, and within that git/hg/svn/etc :+1:

It makes more sense if you think the common thing is specifying more than a single version. But then again you are specifying version constraints, not multiple versions to install. So I would still argue that version is a better fit.

I think that's your call if you want to go all-in on TOML. If you do then you could even propose dropping markers in favour of keys for each marker type. That then ties into TOML schema validation (if that becomes a thing).

Huh, I had no idea either!

I would just assume you can't. It's not like you can guarantee people writing this stuff will have syntax highlighting anyway, so seeing it in stark black-and-white is a reasonable thing for comparison.

Regarding the vcs dependencies, I reckon they should be inline with the url dependencies etc. And use the key to identify the VCS in use.

git-project = { git = "https://git.example.com/MyProject.git", revision = "da39a3ee5e6b" }
hg-project = { hg = "http://hg.example.com/MyProject", revision = "da39a3ee5e6b" }
svn-project = { svn = "svn://svn.example.com/svn/MyProject", revision = "2020" }

This is a bit more concise than using the vcs = "git" syntax, and while they are all "VCS" dependencies, they need to be treated differently by the tooling anyway.

Regarding the optional-dependencies, I would like to suggest that we use optional = true in the dependency table. This is something poetry has been using for a while, and it makes reusing a dependency in multiple extras a bit easier.

[project.dependencies]
foo = { version = ">3.6,<4.0", optional = true }
bar = { optional = true }

[project.extras]
foo = ["foo"]
bar = ["foo", "bar"]

Thanks for starting this discussion @EpicWink. @finswimmer and I had started on some related content as well; we will consolidate that and pull it into this thread as soon as we get a chance, if that makes sense to do so.


I was thinking along the same lines as well. However, I wonder if we should maintain markers as an escape hatch until this is more solidified.

An example for this might be something like this.

foo = [
    { version = "1.0.1", python.version = "~=3.6", platform.system = "Windows", implementation.name = "cpython" },
    { version = "0.5.9", python.version = "~=3.6", platform.system = "Windows", implementation.name = "pypy" },
    { version = "1.0.2", python.version = "~=3.6", platform.system = "Linux", implementation.name = "cpython" },
    { version = "2.0.0", python.version = ">=3.7", platform.system = "Windows" },
    { version = "2.0.1", python.version = ">=3.7", platform.system = "Linux"}
]
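To make the idea concrete, here is a rough sketch of how per-marker keys like those in the example above might be folded back into a PEP 508 marker string. The key-to-variable mapping, the rule that a bare value means equality, and the function name are all assumptions for illustration; none of this is specified anywhere in the proposal.

```python
import re

# Hypothetical mapping from nested TOML keys (as produced by dotted keys such
# as python.version) to PEP 508 environment marker variables.
MARKER_KEYS = {
    ("python", "version"): "python_version",
    ("platform", "system"): "platform_system",
    ("implementation", "name"): "implementation_name",
}

# Leading comparison operator, if the value carries one (longest forms first).
_OPERATOR = re.compile(r"^(~=|===|==|!=|<=|>=|<|>)\s*(.+)$")


def marker_string(requirement: dict) -> str:
    """Join the marker keys found in a requirement table with 'and'."""
    clauses = []
    for (table, key), variable in MARKER_KEYS.items():
        value = requirement.get(table, {}).get(key)
        if value is None:
            continue
        match = _OPERATOR.match(value)
        # Assumption: a value without an operator means equality.
        operator, operand = match.groups() if match else ("==", value)
        clauses.append(f'{variable} {operator} "{operand}"')
    return " and ".join(clauses)


first = {
    "version": "1.0.1",
    "python": {"version": "~=3.6"},
    "platform": {"system": "Windows"},
    "implementation": {"name": "cpython"},
}
print(marker_string(first))
```

Note that this only covers markers combined with "and"; it has no way to express "or" or parenthesised groups, which is one of the gaps such a scheme would have to address.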

You and @pfmoore are right. I need to research the different VCSs to get the forms of the different revision identifiers.

I was more thinking along the lines that I read version = '> 3.2' as "a valid version is greater than 3.2", and versions = '> 3.2' as "valid versions are greater than 3.2". This is more an idea I wanted to make sure is out there and argued against, and I don't have a preference.

This format needs to be a consensus, but like the version key, I want the idea to be considered. I'm not sure there's a benefit, when it's so easy to split and strip the version string, and brackets add to clutter.
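As a minimal sketch of the "split and strip" mentioned above (the function name is hypothetical, and it assumes clauses are separated by commas only):

```python
from __future__ import annotations


def split_specifiers(version: str) -> list[str]:
    """Split a comma-separated version specifier string into its clauses."""
    return [clause.strip() for clause in version.split(",") if clause.strip()]


print(split_specifiers("~= 3.1, != 3.1.3"))  # ['~= 3.1', '!= 3.1.3']
```

So the string form and the array form carry the same information; the choice is mainly about how the file reads.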

I had thought about that: if we did that, the format would get more complicated. We'd have to come up with a TOML replacement for the logical operators allowed in the environment markers (even if they're not likely common in the real-world). In addition, we'd lose the concise and familiar Pythonic syntax of environment markers, and probably some extensibility.

I need to put this in the rationale.

I like the idea of explicitly stating the VCS in the requirement, so a reader can know exactly which to install to use the package. I'm not a fan of using the key to specify which VCS, as it can (highly unlikely) conflict with a future key and the fact that it's a VCS-type requirement becomes more implicit, but it does make sense from a readability and ease-of-use standpoint.

Again, I have to research other VCSs' revision identifiers, but I'd rather keep the revision in the same string, as that seems most common in the wider industry.

This makes sense, however remember that the optional dependencies are currently used to define dependencies for each of the project's extras, so rather than optional = true, it would have to specify which extra it is for, eg for-extra = 'accelerated'.

Thanks for the feedback, I'll update the PEP tonight.

The discussions around PEP 610 might be useful for this perhaps.

This is a double-edged sword, unfortunately. From a tooling perspective, this has caused several headaches. For example, updating one marker requires parsing and then constructing the string again. Additionally, there is a slightly higher probability of errors (although this is no different from what we have today).

Considering the most common change is to update the revision (both by tooling and by users), I would be of the opinion that we should lean into TOML's capabilities and make use of a separate revision key. Additionally, the use of the string is common, I feel, because there are not many other options. The disconnect between pip's VCS URLs and standard git URLs is, I feel, also problematic. Just adding my 2c here. Also, the ability to make use of other keys like tag and branch for git dependencies could have future benefit.

I had considered this as well. However, being biased from a tooling perspective, it made more sense to have extras specified as indicated in my example above. But either works from my point of view. I would prefer one of these approaches over the optional-dependencies section approach.

I agree this is an easier format to write, but this would put extra burden to validate, both for tools and a user reading it. Making each VCS its own separate key would open the possibility of internally inconsistent inputs, e.g.

package = { git = "...", hg = "..." }

Python packaging does not support such specifications, but the user may (not unreasonably) expect this means e.g. use one of these URLs to fetch the package and ask for this feature from the tools. The vcs format would make it clear that you can only specify one URL at a time. [1]

It is also more difficult to maintain forward compatibility, since the format would need to review every new VCS Python packaging wants to support and invent a new (non-conflicting) key. The vcs specification OTOH only needs to specify one unique key for each VCS it supports, and does not need to worry about conflicts with other unrelated keys (e.g. a hypothetical future VCS that's commonly known as url).


[1] The same argument actually applies to another part of the exploded table format as well, [2] specifically the version and url keys. The specification package = { version = "...", url = "..." } is also internally inconsistent, and I would strongly prefer if the format can eliminate this possibility. But it seems like people like this format a lot, so maybe this is where the line should be drawn to balance writability and validity.

[2] And to be clear, PEP 508 has the same issue. It is not immediately obvious to many people why they can't do package >= 1.0 @ https://example.com/package.tar.gz. They can't, it's not a problem, but the exploded TOML format provides an opportunity to improve the UX around it.
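Whichever layout is chosen, tooling would have to reject the internally inconsistent inputs described above. A minimal sketch of such a check, assuming one key per source as in the git/hg/svn examples earlier in the thread (the key list and function name are illustrative, not taken from the PEP):

```python
# Hypothetical set of keys of which at most one may appear in a requirement.
SOURCE_KEYS = ("version", "url", "git", "hg", "svn")


def check_single_source(name: str, requirement: dict) -> None:
    """Raise ValueError if a requirement names more than one source."""
    present = [key for key in SOURCE_KEYS if key in requirement]
    if len(present) > 1:
        raise ValueError(f"{name}: keys {present} are mutually exclusive")


check_single_source("numpy", {"version": "~= 1.18"})  # accepted
try:
    check_single_source("package", {"git": "...", "hg": "..."})
except ValueError as error:
    print(error)
```

With a single vcs key the git/hg conflict cannot be written in the first place, but the version/url conflict from footnote [1] would still need a check like this.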


I hadn't considered tooling to update pyproject.toml. With that in mind, perhaps it makes sense to really lean in to making everything a key. Having said that, the following should be reliable to update the revision in this case:

req = pyproject["project"]["dependencies"]["my-package"]
req["vcs"] = req["vcs"].rsplit("@", 1)[0] + "@" + new_revision
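One caveat with the rsplit approach: if the URL carries no revision but does contain user information (for example git@ in an SSH URL), the last "@" is not a revision separator. A slightly more defensive sketch, assuming pip-style URLs of the form scheme://host/path[@revision] (the function name is hypothetical):

```python
def set_revision(vcs_url: str, new_revision: str) -> str:
    """Replace or append the '@revision' suffix of a pip-style VCS URL."""
    scheme_end = vcs_url.find("://")
    # First "/" after the host part; a revision can only follow the path.
    path_start = vcs_url.find("/", scheme_end + 3 if scheme_end != -1 else 0)
    at = vcs_url.rfind("@")
    if path_start != -1 and at > path_start:
        vcs_url = vcs_url[:at]  # drop the existing revision
    return f"{vcs_url}@{new_revision}"


print(set_revision("git+https://git.example.com/MyProject.git@da39a3ee5e6b", "0beec7b5"))
print(set_revision("git+ssh://git@example.com/MyProject.git", "0beec7b5"))
```

This is the kind of string surgery a separate revision key would make unnecessary.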

I should have stated my main issue with your proposal in my previous reply, which is that there's a separation of some of the specification of the requirement: some in the dependencies table, and some in the extras table. In your example, it's not egregious, but for libraries with many required and extra dependencies, it could cause de-synchronisation. In addition, I can't think of a simple way to specify which of an array of requirements an entry in the extras table refers to.

Thanks for the tip. I'm updating the PEP with PEP 610 and PEP 440's definition of direct URLs, which consolidates URL and VCS targets. Should I explicitly mention file URLs?

The PEP is now updated (renders: https://github.com/EpicWink/peps/blob/pep-621-exploded-dependencies/pep-9999.rst). Change-log:

  • Add open-issue notes on removing 'optional-dependencies' table
  • Add open-issue note on environment-marker keys
  • Add motivation to contrast with the PEP-508 string implementation
  • Fix incorrect 'version' keys in example
  • Switch to 'direct' table for direct references
  • Add dependencies-array to rejected ideas
  • Add note on why markers weren't split

How does that translate into PEP 508 syntax? I'm genuinely not clear what the example you give is meant to mean.

(Remember, project metadata is not changing, it will be in PEP 508 format. This is just an input format. So it's essential that the proposal is completely clear how you construct a PEP 508 format string from TOML input. No matter how much you dislike PEP 508, that's a pre-existing reality that you have to deal with).


@abn @EpicWink thanks for picking this up before I could.


puts on Python-packaging-contributor hat

I feel like I should say: Please don't try to provide/do anything different from what PEP 508 does. We're not reinventing the dependency specification format. If you're looking to reinvent that, well, that's a much broader conversation and now is not the time to do it. You're setting yourself up for an uphill battle and, IMO, failure of both this PEP and perhaps PEP 621 as a knock-on effect due to folks getting frustrated by the scope-creep of the process. :slight_smile:

IMO, the approach to take for this PEP would be defining a clear TOML value → PEP 508 string transformation. Essentially, if it can be represented in a list-of-PEP-508-strings, it should be representable in whatever format is specified here. Anything else, and it's not going to work out.

Please, let's not go in that direction.

This makes it significantly more difficult to specify extras (which is what has been renamed into the optional-dependencies table) and I strongly suggest not doing this. This does not have a parallel with PEP 508 and you're making it more difficult for users who do specify PEP 508 strings to adapt.


puts on his TOML-maintainer hat

I have a few suggestions about the schema design as well as how you are presenting it here.

  • Have a single preferred format that you suggest to users. Either use one-dependency-table-per-line or the each-dependency-is-a-table for all the examples. Otherwise, you're creating confusion for the reader which is counter to your intent in any documentation. IMO the one-dependency-per-line format, with the "if you're doing something more complex, you can use a table too!" will work well here.

  • For same-name-multiple-requirements, allowing values to be an array of tables seems to be the most appropriate choice. Permitting the outer table to be represented as an array is trying to solve the problem at the wrong level.

  • dependencies = { flask = {}, django = {}, numpy.version = '~=1.18' }
    

    This is not a good example, since it'll need users who add more dependencies to split them across multiple lines and create additional work unnecessarily. I suggest not even presenting this.

  • You have "simple/common" and "complex/uncommon" formats (normal version-only dependencies, and fancier ones). It's good form to define a transform from the simple one into the complex one. Have numpy = ">= 1.18" automatically get transformed into numpy = { version = ">= 1.18" }. This avoids "leaking" information about the complex cases into the simpler ones and lets users who only care about the simple cases get away without needing additional keys/tables.

  • Don't use dotted keys to make things terser when they don't add clarity: numpy.direct = "URL" in the middle of lines like scipy = {...} is not easier to read, and using that will make the table less easy to skim.

Notably, TOML does allow you to represent values in various ways, but the idea should always be to prefer more consistent representations that make it easier to skim through the file and find the relevant values. Presenting all the possible variations of TOML syntax for a value, should not be what you provide in specifications of behaviours IMO.
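The simple-to-complex transform suggested in the list above could be sketched roughly like this, operating on already-parsed TOML values. The function name is made up, and the handling of arrays follows the array-of-tables idea for same-name-multiple-requirements:

```python
def normalise(value):
    """Return a list of requirement tables for one dependency value."""
    if isinstance(value, str):
        # numpy = ">= 1.18"  becomes  numpy = { version = ">= 1.18" }
        return [{"version": value}]
    if isinstance(value, dict):
        return [value]
    if isinstance(value, list):
        # Same-name-multiple-requirements: flatten each entry in turn.
        return [table for item in value for table in normalise(item)]
    raise TypeError(f"unsupported requirement value: {value!r}")


print(normalise(">= 1.18"))  # [{'version': '>= 1.18'}]
```

After this step, the rest of a backend only ever has to deal with the table form.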


I'd like to see this PEP discuss the costs of the transition involved as well (since it's a legitimate concern against making the switch). Ideally, it'd also make recommendations for tooling (like tox) that currently uses the PEP 508 form.


Oh, I missed a couple of points here:

  • I'd originally suggested the renaming of the requires/extras tables to dependencies/optional-dependencies in PEP 621's discussions. This was based on a quick look around at how other ecosystems represent their dependencies. We settled on this naming based on a survey of other ecosystems + the mechanism we used for making decisions among the PEP 621 authors.

    So, PEP 621 does specify both these tables. If this PEP wants to deviate from that, please provide strong reasoning for why it does so, and what the trade-offs are here. Or... stick to what PEP 621 does and avoid that additional work. :slight_smile:

  • This change would make it harder for users to switch from existing setup.py based files to these formats, since there's now more effort needed to specify extras.

@pf_moore @pradyunsg appreciate your input on this. Regarding the optional dependencies, I was not aware that the "optional-dependencies" table was already part of the PEP 621 discussion. I might have overlooked that. With that in mind, and in the interest of limiting the scope of this PEP to keep things in line with PEP 508 so that we have a sensible transition, I agree that using optional = true might not be the right approach. The main, probably the only, reason I dislike the current approach is that it leads to some re-definition of a dependency for each extra. If the consensus is that this is okay, then I am happy to be on-board with this approach. However, I do feel using something similar to what @EpicWink suggested might be more "parallel" to the PEP 508 specification.

Responding just for completeness, the intent was more to decouple the extras definition, ie. what extras the package provides and what dependencies (among the optional ones) each extra needs. In retrospect, I do accept that I am wearing my poetry coloured glasses as we have been using that syntax a while now. As @pradyunsg pointed out, this might not be the right forum for this change as this is scope creep.

I feel like we should as this will be required soon anyway for direct references as far as I understand it.

Having had to deal with the headache of doing just this on various occasions, the disparity between what git considers a valid URL and what pip considers a valid URL has been painful. I would like to avoid this in the future, if possible. This was/is my motivation to have the type, url and revision separated at the very least.

I can definitely see the argument here. However, no matter how we proceed, the tooling will need to validate these anyway. Since TOML does not (at least I do not think so) have a native way to validate the "schema" of a table (inline or otherwise), this I feel is inevitable. At least from the poetry side, we have had to rely on internal schema validation for our implementation. The burden to validate exists for both cases, I feel.

I would have thought this is the same in both options. In case we use the key vcs = "foo" we still need to ensure that it does not conflict with a pre-existing VCS type. I doubt there is a forward compatibility trade-off here.

My question still remains, though - and it's something the PEP should clarify in general. The PEP needs to explain precisely how the syntax it proposes translates to PEP 508 form, as that's a translation that every backend using this syntax will have to implement, if it's to write standards-compliant metadata. Having a well-defined translation process in the standard is therefore key.
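To illustrate what such a translation might look like, here is a minimal sketch for a single requirement table. It assumes the keys version, extras, url and markers (names taken from the discussion so far, not from a settled specification), and the function name is hypothetical:

```python
def to_pep508(name: str, requirement: dict) -> str:
    """Build a PEP 508 requirement string from one exploded requirement table."""
    if "url" in requirement and requirement.get("version"):
        # PEP 508 cannot express a version specifier together with a URL.
        raise ValueError(f"{name}: 'version' and 'url' cannot be combined")
    result = name
    extras = requirement.get("extras")
    if extras:
        result += "[" + ",".join(extras) + "]"
    if "url" in requirement:
        result += " @ " + requirement["url"]
    elif requirement.get("version"):
        result += " " + requirement["version"]
    markers = requirement.get("markers")
    if markers:
        # The space before ';' keeps the marker from being read as part of a URL.
        result += " ; " + markers
    return result


print(to_pep508("numpy", {"version": "~= 1.18"}))  # numpy ~= 1.18
print(to_pep508("requests", {"extras": ["security", "tests"], "version": ">= 2.8.1"}))
```

VCS references and per-extra requirements are left out here; how those map onto PEP 508 is exactly the part the PEP would need to spell out.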


Thanks for the feedback. For the most part, people are happy with the schema it seems. The rest is other PEP stuff that isn't critical for the Tuesday September 8th deadline. Just two items to address:

Makes sense, and the implementation will already need an isinstance for the array-form, so this option is okay. However, it may raise the question "why can't we have the rest of PEP 508 in the specifier string" down the line. We'll see what the community wants I guess.

Perhaps it makes sense to split revision out into a separate key. The issue I'm having with that is that there are now two keys which can only exist when pointing to a VCS repo, and revision without vcs is invalid. This is complicated.


Regarding the other stuff, I'll make the examples consistent and have a note saying other TOML forms are recommended after gaining familiarity with TOML.

You can't split an inline-table (in TOML v1.0rc2, at least).

I don't understand what you mean. Aren't the costs for the user just "if you want to use the new format then you'll need to learn it"? Are there costs for tooling as well, beyond implementation of PEP 621?

I've successfully used jsonschema to validate parsed TOML in the past. Remember that TOML directly maps to JSON. I think the toml package has some missing validation however, or perhaps uses an older version of the spec than what I was referencing against.

I'll attempt to add that. As far as I know, that's little more than a string template, right?

I don't know, that's the point :wink: There have been some ideas floating around that may not translate well (broken down VCS revisions, for example).

As tox isn't reading source tree metadata directly, it won't care about PEP 621, and so this would be irrelevant, surely? Using "expanded TOML form" as an alternative to PEP 508 in any context other than PEP 621 is way out of scope for this discussion, surely?

If I don't get any comment on the specification by 2020-09-08 7pm Pacific, I'm going to take that as no one having any issues with it. The current main open issue is splitting the VCS revision from the URL. Other issues include:

  • I'm on the fence about allowing string requirements to represent version-only requirements (it is, however, a straight-forward path to allowing PEP-508 strings as requirement specifiers).
  • Having a separate key for each environment marker (see PEP for arguments).
  • Splitting hash-type from hash-value in direct table.

Everything that's currently in the PEP is the current proposal, and will be submitted at 2020-09-08 7pm Pacific.


The PEP is now updated (rendered: https://github.com/EpicWink/peps/blob/pep-621-exploded-dependencies/pep-9999.rst). Change-log:

  • Add example implementation for conversion to PEP 508
  • Show consistent examples
  • Make note about 'file' protocol for direct requirements
  • Reject alternate definitions of extra requirements
  • Include more arguments against separate 'revision' field
  • Allow version specifiers as requirement specifiers

I dislike the mismatch between, say:

requests = { extras = [ 'security', 'tests' ], ... }

and:

[project.optional-dependencies]

I think having two possible answers for "what is security in requests[security]", "an extra" and "a set of optional dependencies", is really confusing.

Just FYI, it's a long weekend in most (all?) of the United States this weekend, and hopefully people are taking advantage of that to not be arguing about stuff online for a change :slight_smile: You might want to allow an extra day.
