PEP 621: how to specify dependencies?

The discussion on PEP 621: Storing project metadata in pyproject.toml has gotten long enough that I wanted to break out and explicitly call out the open issue in the PEP: how to specify dependencies? Basically the authors couldn’t agree between using PEP 508 string specifiers (like how Flit does it) or going with a table approach (like how Poetry or pipenv do it). The relevant parts of the PEP are pasted below.


dependencies/optional-dependencies
''''''''''''''''''''''''''''''''''''''''''

  • Format: TBD

  • Core metadata_: Requires-Dist
    (link <https://packaging.python.org/specifications/core-metadata/#requires-dist-multiple-use>__)

  • Synonyms

    • Flit_: requires for required dependencies, requires-extra
      for optional dependencies
      (link <https://flit.readthedocs.io/en/latest/pyproject_toml.html#metadata-section>__)
    • Poetry_: [tool.poetry.dependencies] for dependencies (both
      required and for development),
      [tool.poetry.extras] for optional dependencies
      (link <https://python-poetry.org/docs/pyproject/#dependencies-and-dev-dependencies>__)
    • Setuptools_: install_requires for required dependencies,
      extras_require for optional dependencies
      (link <https://setuptools.readthedocs.io/en/latest/setuptools.html#metadata>__)

See the open issue on How to specify dependencies?_ for a
discussion of the options of how to specify a project’s dependencies.


How to specify dependencies?

People seem to fall into two camps on how to specify dependencies:
using :pep:508 strings or TOML tables (sometimes referred to as the
“exploded table” format due to it being the equivalent of translating
a :pep:508 string into a table format). There is no question as to
whether one format or another can fully represent what the other can.
This very much comes down to a question of familiarity and (perceived)
ease of use.

Supporters of :pep:508 strings believe familiarity is important as
the format has been in use for 5 years and in some variant for 15
years (since the introduction of :pep:345). This would facilitate
transitioning people to using this PEP as there would be one less new
concept to learn. Supporters also think the format is reasonably
ergonomic and understandable upon first glance, so using a DSL for it
is not a major drawback.

Supporters of the exploded table format believe it has better
ergonomics. Tooling which can validate TOML formats could also help
detect errors in a pyproject.toml file while editing instead of
waiting until the user has run a tool in the case of :pep:508's DSL.
Supporters also believe it is easier to read and reason about (both in
general and for first-time users). They also point out that other
programming languages have adopted a format more like an exploded
table thanks to their use of standardized configuration formats (e.g.
Rust <https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html>
and Dart <https://dart.dev/tools/pub/dependencies>). The thinking
is that an exploded table format would be more familiar to people
coming to Python from another programming language.

The authors briefly considered supporting both formats, but decided
that it would lead to confusion as people would need to be familiar
with two formats instead of just one.


Just so we have something specific to compare, what exactly would each format look like? I can make some guesses after looking through the links, but I’d rather not confuse people by having the top reply show something made up on the spot by someone not even involved with the PEP.

Has it been decided that you want to include dependencies in the project metadata, and that you want to include them in the initial spec of the metadata? For example, setuptools has the dependencies in the build-options section. I don’t want to start this discussion if it’s not the purpose of this post however.

How would multiple version specifiers work in the table format? Would the version have to support a string for the common case of a single specifier, and an (inline or multiline) array otherwise? For example:

[[project.dependencies]]
name = "spam"
versions = "~=4.2"

[[project.dependencies]]
name = "eggs"
versions = [
  ">1.2rc1",
  "<2.0a0",
]
markers = [
  'python_version < "3.9"',
]

Edit: added environment-marker example

I think the PEP’s authors were referring to the following form:

[project.dependencies]
spam = "~=4.2"
eggs = ">1.2rc1,<2.0a0"

I’m interested in whether an array would be nicer in this case; one difficulty would be the distinction between the , and the ; separators in the specifier string.
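To make the , versus ; distinction concrete, here is a rough Python sketch of how a tool might pull such a value apart. The function name split_specifier is made up for illustration, and the splitting is deliberately naive: it assumes the first ; starts the environment marker and that no comma appears anywhere except between version clauses.

```python
def split_specifier(value):
    """Naively split 'clause,clause; marker' into (clauses, marker).

    Hypothetical helper, not part of any packaging library. It does not
    handle a ';' or ',' that appears inside a quoted marker value.
    """
    # Everything after the first ';' is treated as the environment marker.
    version_part, _, marker = value.partition(";")
    # Commas separate the individual version clauses.
    clauses = [clause.strip() for clause in version_part.split(",") if clause.strip()]
    return clauses, marker.strip() or None


print(split_specifier('>1.2rc1,<2.0a0; python_version < "3.9"'))
print(split_specifier("~=4.2"))
```

With an array of clauses the comma handling would disappear, but the marker would still need its own separator or its own key.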

Can you detail what the goal of this topic is? I can see equal arguments for both sides, and it probably just boils down to a matter of taste. Will we put it to a vote? Or is it just to give users a platform to express their views? For what it’s worth, I support using the PEP 508 format; it’s cleaner to read IMHO.


If PEP 508 is good enough to express what needs to be expressed here (or can be made good enough with a reasonable amount of modification), then I think I’d prefer using this notation, so that there is some consistency (at least within PyPA tools, requirements.txt, on the command line, etc.).

But I get the advantage of having some of the validation delegated to the file format parser itself. I’m not entirely sure it’s worth it though; the TOML validation would still deliver lots of false positives anyway.

I don’t know enough about TOML, but would it be possible for a field to accept both notations: either a PEP 508 string or an exploded table (assuming both notations stay compatible with each other)?

As per https://github.com/uranusjr/packaging-metadata-comparisons/blob/master/topics/dependency-entries.md#disadvantages-1, I think not being able to select different versions based on markers is a huge downside.

At Datadog, this feature is critical in how we define the dependencies of integrations that are shipped with the Agent.


The marker-dependent version is possible with Poetry, which also uses a table format. It is a restriction of the Pipfile format, not TOML tables in general.

I purposefully didn’t give any examples as there is no exact agreement on what the proposal would be until we know what style people prefer. If you want preexisting ideas you can look at what Flit, Poetry, and Pipenv each do (I linked to the appropriate parts of their docs in my opening post).

If you really want a strawman for each:

requires = [
    # simple
    "colorama",
    # complex
    "win32[all] >2.2.0,<3.0.0; os_name == 'nt'",
]

versus

# simple
colorama = {}
# complex
win32 = { version = ">2.2.0, <3.0.0", extras = ["all"], markers = "os_name == 'nt'" }
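To illustrate that the two strawmen carry the same information, here is a minimal Python sketch that renders a table entry as a PEP 508 string. The function name table_to_pep508 is hypothetical, and the field names (version, extras, markers) are simply the ones from the strawman, not anything the PEP has settled on. The dicts stand in for what a TOML parser would return.

```python
def table_to_pep508(name, entry):
    """Render one exploded-table entry as a PEP 508 requirement string.

    Hypothetical helper; the keys 'version', 'extras' and 'markers' are
    taken from the strawman and are not an accepted specification.
    """
    parts = [name]
    extras = entry.get("extras", [])
    if extras:
        parts.append("[" + ",".join(extras) + "]")
    version = entry.get("version", "")
    if version:
        # Drop whitespace around commas: ">2.2.0, <3.0.0" -> ">2.2.0,<3.0.0".
        clauses = [clause.strip() for clause in version.split(",")]
        parts.append(" " + ",".join(clauses))
    requirement = "".join(parts)
    markers = entry.get("markers", "")
    if markers:
        requirement += "; " + markers
    return requirement


# The two strawman entries as the dicts a TOML parser would produce.
dependencies = {
    "colorama": {},
    "win32": {"version": ">2.2.0, <3.0.0", "extras": ["all"], "markers": "os_name == 'nt'"},
}
requires = [table_to_pep508(name, entry) for name, entry in dependencies.items()]
print(requires)
```

Run on the table strawman, this yields the same two strings as the list strawman, which is the sense in which either format can represent the other.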

Yes, although we could potentially be convinced to leave it out for some reason.

Remember that this PEP is to help standardize what we reasonably can that build tools need to build a wheel, so that makes sense. :slight_smile:

Depends on which format people prefer. PEP 508 has this baked in, the table approach typically still takes a string for the version specifier which can be separated similarly to PEP 508.

The key distinction is how you list all the parts of the dependency. So with PEP 508 you just do it as you do today in tools like setuptools and tox. In the table format you would probably have a version field, markers field, etc.

To have a conversation about how to specify dependencies as the authors of the PEP couldn’t reach an agreement among themselves.

Quite possibly. :slight_smile:

Right now the idea is to see if there is any consensus from the community around one versus another approach. If there’s not then I don’t know how we will decide. Maybe we punt on it and leave it out of the PEP, maybe a vote, maybe I choose (people already blame me for PEP 518 anyway :wink:), maybe I flip a coin.

By definition it is.

We already rejected that idea:

That is specific to Pipfile, not inherent to a table solution. There is nothing saying a table format couldn’t allow something like:

django = [
    {version=">=2.0", markers="os_name!='nt'"},
    {version=">2.1", markers="os_name=='nt'"}
]

(That might require TOML 1.0.)

It’ll only need TOML 1.0 if you’re mixing it with strings. This specific form will work just fine on the existing TOML 0.5.0 or 0.4.0 parsers.


This is a human-edited file, so we should use the DSL (i.e. flit-style).

In an ideal world, we’d have a library to parse it into some suitable in-memory data structure, and that structure might even serialise easily as tables for exchanging in an externally verifiable syntactic form, but we should not ask people to write that by hand.

If we do this, packaging will gain this functionality – to be able to generate a Requirement object from this form (taking the dictionary loaded from the TOML file).


The PEP 508 syntax is already used elsewhere (e.g. pip install win32[all]). If a table format is used, people will still probably need to know the PEP 508 syntax, at least to some degree.

Maybe that’s a tradeoff worth making, I don’t know :upside_down_face: I wasn’t a fan of the strawman table-style example:

win32 = { version = ">2.2.0, <3.0.0", extras = ["all"], markers = "os_name == 'nt'" }

but it suddenly looks a lot nicer (IMO) if you expand it out:

[requires.win32]
version = ">2.2.0, <3.0.0"
extras = ["all"]
markers = "os_name == 'nt'"

If you choose string-style, you’ll have to build on top of PEP 508 to expand functionality, or in the future allow tables for requirement specification. If you choose table-style, then expanding functionality is trivial.

Is the idea of supporting either format rejected indefinitely, or would you be open to it in the future?

Whatever is specified in pyproject.toml needs to eventually be serialised into Core Metadata, so PEP 508 still needs to be expanded even if we build on top of the table style. The difference between the two formats is strictly only readability/writability (and potentially ease of parsing and validating).

Dependency specification is already complex, so another advantage of forcing new complexity to fit into a DSL is to drive simplicity.

Or alternatively, it will encourage packages to find better ways to offer a compatible interface rather than making the consumer do all the work.


This is a human-edited file, so absolutely we should use a style optimised for people to write and edit it.

However, that doesn’t necessarily mean expanded style. The expanded style is (maybe) easier for non-experts, but the PEP 508 style is concise, which suits some people’s preference. (Yes, I’m someone who prefers a concise style, and I’m an expert, so I’m in favour of PEP 508 on a personal level).

Ultimately, we have a range of users so we may need to support both styles. If we choose just one style, it will be a matter of making a trade-off - inconveniencing one group in order to support another.

Also, PEP 508 style is just as much a DSL as table-style. It’s optimised for a different use case, is all. The question is whether this use case (human edited requirements definitions) better matches one or the other of the available DSLs…

To ensure that PEP 621 doesn’t inadvertently allow the definition of dependency declarations that can’t actually be published as part of the resulting artifact metadata, I think it makes the most sense for it to specifically use PEP 508 markers.

However, I’d suggest a hybrid of the examples Brett gave, where a table is still used to separate the dependencies on different packages, but the values within the table are just PEP 508 strings rather than subtables:

# simple
colorama = "*"
# environment dependent
django = [
    ">=2.0; os_name!='nt'",
    ">=2.1; os_name=='nt'" # Affected by Windows-specific bug in 2.0
]
# with extras defined
win32 = "[all] >2.2.0, <3.0.0; os_name == 'nt'"

The normal case would just be a single string, but a list of strings would also be allowed to cover the “multiple mutually exclusive and/or mutually compatible environment markers” case.
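As a rough illustration of how a tool could consume this hybrid form, the Python sketch below expands each key/value pair into full PEP 508 strings. The name expand_hybrid is made up, the "*" handling follows the Pipfile-style shorthand used in the example, and combinations such as "*" followed by a marker are deliberately not handled.

```python
def expand_hybrid(name, value):
    """Expand one hybrid entry into a list of PEP 508 requirement strings.

    Hypothetical helper. 'value' is either a single string or a list of
    strings, each holding everything that follows the package name.
    """
    values = [value] if isinstance(value, str) else list(value)
    requirements = []
    for item in values:
        item = item.strip()
        if item == "*":
            # Shorthand for "any version": the bare name is enough.
            requirements.append(name)
        elif item.startswith("["):
            # Extras attach directly to the name, with no space.
            requirements.append(name + item)
        else:
            requirements.append(name + " " + item)
    return requirements


print(expand_hybrid("colorama", "*"))
print(expand_hybrid("django", [">=2.0; os_name!='nt'", ">=2.1; os_name=='nt'"]))
print(expand_hybrid("win32", "[all] >2.2.0, <3.0.0; os_name == 'nt'"))
```

Each resulting string is an ordinary requirement, so it can go straight into Requires-Dist or onto a pip install command line.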

It isn’t as self-documenting as the table version if you don’t already know the PEP 508 syntax, but it’s much easier to translate into explicit pip install commands.

As noted in Brett’s initial post, Pipfile/pipenv mostly went with the “table-with-subtables” option, but there’s a shorthand for the simplest case similar to the one I describe above: if you’re only specifying a version constraint, you can just use a string.

That means the simplest way to specify a dependency is as:

pkg_name = "<PEP 440 specifier>"

The wildcard string pkg_name = "*" is a non-PEP-440 shorthand for “any version”: pkg_name and pkg_name= aren’t legal TOML, the full spelling (pkg_name = "== *") is quite verbose, and pkg_name = "" and pkg_name = {} didn’t feel like they conveyed “any version” strongly enough.

poetry offers a similar shorthand for version specifiers (although the syntax for some of the comparison operators differs from PEP 440).

While anything more complex than that uses a subtable (typically written in the inline-dict format, rather than as a full multi-line table), many of the options that Pipfile and poetry support (e.g. editable installs, local path installs, direct URL installs, direct-from-VCS installs) are ones that should only arguably be supported in pyproject.toml as part of intended-for-publication dependency declarations. It’s those local development and private deployment focused features, which either aren’t covered by PEP 508 or are covered by PEP 508 but aren’t allowed in PyPI uploads (and hence aren’t very well known or supported by libraries), that really motivated using the table format.

It’s OK for poetry & pipenv to support unpublishable dependency declarations, as they’re both used as environment managers instead of or as well as package build tools (exclusively in pipenv’s case, optionally in poetry’s case). By contrast, it’s not really OK for PEP 621 to support unpublishable dependency declarations.

I’m aware that using full PEP 508 markers in PEP 621 would likely create demand for Pipfile/pipenv and poetry to also support the extended input format, but I don’t see that posing a major problem in the long run (the tools already cover the relevant semantics, so it should mostly just require adjustments at the file parsing layer).


One small worry I have with specifying package names as keys (a top-level win32 = ...) is that, at some point, someone will probably try to depend on something like importlib.metadata and be tripped up by TOML’s dotted-key syntax.

That seems like another motivation to require normalized names. https://www.python.org/dev/peps/pep-0503/#normalized-names
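For reference, the normalization PEP 503 describes is a one-liner; the sketch below wraps it in a function (the name normalize is just for illustration). Runs of -, _ and . collapse to a single hyphen and the result is lowercased, so a normalized name never contains a dot and can be written as a bare TOML key.

```python
import re


def normalize(name):
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


print(normalize("importlib.metadata"))  # importlib-metadata
print(normalize("Friendly_Bard"))       # friendly-bard
```

Requiring names in this form would sidestep the dotted-key problem, at the cost of asking people to write importlib-metadata rather than the spelling they may be used to.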