PEP 621: how to specify dependencies?

It’ll only need TOML 1.0 if you’re mixing it with strings. This specific form will work just fine on the existing TOML 0.5.0 or 0.4.0 parsers.

1 Like

This is a human-edited file, so we should use the DSL (i.e. flit-style).

In an ideal world, we’d have a library to parse it into some suitable in-memory data structure, and that structure might even serialise easily as tables for exchanging in an externally verifiable syntactic form, but we should not ask people to write that by hand.

If we do this, packaging will gain this functionality – to be able to generate a Requirement object from this form (taking the dictionary loaded from the TOML file).

2 Likes

The PEP 508 syntax is already used elsewhere (e.g. pip install win32[all]). If a table format is used, people will still probably need to know the PEP 508 syntax, at least to some degree.

Maybe that’s a tradeoff worth making, I don’t know :upside_down_face: I wasn’t a fan of the strawman table-style example:

win32 = { version = ">2.2.0, <3.0.0", extras = ["all"], markers = "os_name == 'nt'" }

but it suddenly looks a lot nicer (IMO) if you expand it out:

[requires.win32]
version = ">2.2.0, <3.0.0"
extras = ["all"]
markers = "os_name == 'nt'"

If you choose string-style, you’ll have to build on top of PEP508 to expand functionality, or in the future allow tables for requirement specification. If you choose table-style, then expanding functionality is trivial.

Is the idea of supporting either format rejected indefinitely, or would you be open to it in the future?

Whatever specified in pyproject.toml needs to eventually be serialised into Core Metadata, so PEP 508 still needs to be expanded even if we build on top of the table style. The difference between the two formats is strictly only readability/writability (and potentially ease to parse and validate).

Dependency specification is already complex, so another advantage of forcing new complexity to fit into a DSL is to drive simplicity.

Or alternatively, it will encourage packages to find better ways to offer a compatible interface rather than making the consumer do all the work.

1 Like

This is a human-edited file, so absolutely we should use a style optimised for people to write and edit it.

However, that doesn’t necessarily mean expanded style. The expanded style is (maybe) easier for non-experts, but the PEP 508 style is concise, which suits some people’s preference. (Yes, I’m someone who prefers a concise style, and I’m an expert, so I’m in favour of PEP 508 on a personal level).

Ultimately, we have a range of users so we may need to support both styles. If we choose just one style, it will be a matter of making a trade-off - inconveniencing one group in order to support another.

Also, PEP 508 style is just as much a DSL as table-style. It’s optimised for a different use case, is all. The question is whether this use case (human edited requirements definitions) better matches one or the other of the available DSLs…

To ensure that PEP 621 doesn’t inadvertently allow the definition of dependency declarations that can’t actually be published as part of the resulting artifact metadata, I think it makes the most sense for it to specifically use PEP 508 markers.

However, I’d suggest a hybrid of the examples Brett gave, where a table is still used to separate the dependencies on different packages, but the values within the table are just PEP 508 strings rather than subtables:

# simple
colorama = "*"
# environment dependent
django = [
    ">=2.0; os_name!='nt'",
    ">=2.1; os_name=='nt'" # Affected by Windows-specific bug in 2.0
]
# with extras defined
win32 = "[all] >2.2.0, <3.0.0; os_name == 'nt'"

The normal case would just be a single string, but a list of strings would also be allowed to cover the “multiple mutually exclusive and/or mutually compatible environment markers” case.

It isn’t as self-documenting as the table version if you don’t already know the PEP 508 syntax, but it’s much easier to translate into explicit pip install commands

As noted in Brett’s initial post, Pipfile/pipenv mostly went with the “table-with-subtables” option, but there’s a shorthand for the simplest case similar to the one I describe above: if you’re only specifying a version constraint, you can just use a string.

That means the simplest way to specify a dependency is as:

pkg_name = "<PEP 440 specifier>"

The wildcard string pkg_name = "*" is a non-PEP-440 shorthand for “any version”, since pkg_name and pkg_name= aren’t legal TOML, the full spelling (pkg_name = "== *") is quite verbose, but pkg_name = "" and pkg_name = {} didn’t feel like they conveyed “any version” strongly enough.

poetry offers a similar shorthand for version specifiers (although the syntax for some of the comparison operators differs from PEP 440).

While anything more complex than that uses a subtable (typically written in the inline-dict format, rather than as a full multi-line table), many of the options that Pipfile and poetry support (e.g. editable installs, local path installs, direct URL installs, direct-from-VCS installs) are ones that should only arguably be supported in pyproject.toml as part of intended-for-publication dependency declarations, and it’s those local development and private deployment focused features that either aren’t covered by PEP-508 or are covered by PEP 508 but aren’t allowed in PyPI uploads (and hence aren’t very well known or supported by libraries) that really motivated using the table format.

It’s OK for poetry & pipenv to support unpublishable dependency declarations, as they’re both used as environment managers instead of or as well as package build tools (exclusively in pipenv’s case, optionally in poetry’s case). By contrast, it’s not really OK for PEP 621 to support unpublishable dependency declarations.

I’m aware that using full PEP 508 markers in PEP 621 would likely create demand for Pipfile/pipenv and poetry to also support the extended input format, but I don’t see that posing a major problem in the long run (the tools already cover the relevant semantics, so it should mostly just require adjustments at the file parsing layer).

1 Like

One small worry I have with specifying package names as keys (a top-level win32 = ...) is that, at some point, someone will probably try to depend on something like importlib.metadata and be tripped up by TOML’s dotted-key syntax.

That seems like another motivation to require normalized names. https://www.python.org/dev/peps/pep-0503/#normalized-names

Wait, does flit use the expanded table? I thought it used PEP 508.

I definitely meant PEP 508 style. Hopefully Paul is just bringing in some additional concerns, and not disagreeing with me while also agreeing with me :slight_smile:

Apologies. I’m not familiar with flit so I misread “the DSL” as meaning table-style.

Apparently I was agreeing with you while confusing the issue. Sorry :slight_smile:

1 Like

Thanks for bringing this up. I made a similar suggestion in the pre-PEP private conversation, but it never managed to catch on among the authors. Let’s give it another shot since I still believe this is an acceptable middle-ground.

The only difference in my proposal was to use this form in the simplest example:

simple = ""  # Instead of "*"

The main reason I prefer this is it’s easier to create a PEP 508 string from the table:

from packaging.requirements import Requirement

requirements = [
    Requirement(f"{key}{val}")
    for key, val in pyproject["dependencies"].items()
]
1 Like

Yeah, while I was in favour of the star for Pipfile, I now agree it would have been better to stay fully PEP 440 compliant and use the empty string for “any version”.

If PEP 621 opts to use the empty string, I expect Pipfile will follow suit, making the explicit star purely a backwards compatibility feature.

1 Like

As the author of Poetry some might say that I am biased but there is a reason why I did not choose to go with the PEP 508 specification: User experience/friendliness.

We are all seasoned Python developers here and we somewhat know how to read a PEP 508 string but I assure you that for newcomers and people coming from other languages this is far from the case. And reading a complete specification to even remotely understand what it does is proof of user unfriendliness.

What we are talking about here is user facing metadata, which is completely different from metadata that end up in the distributions, and should be readable and intuitive.

This becomes more apparent in cases like this:

name [fred,bar] @ http://foo.com ; python_version=='2.7'

Compare that to (Poetry example):

name = { url = "http://foo.com", extras = ["fred", "bar"], python = "2.7.*" }

To me the second one is much more explicit and readable. And there is no awkward parsing to be done, the parsing already happened at the TOML level.

And like others mentioned, if only the version range needs to be specified, it’s possible to reduce it to a single string. That’s what Poetry does: https://python-poetry.org/docs/dependency-specification/

And I’ll repost something I said elsewhere:

To me it seems Python tries really hard to be an exception in the overall programming languages ecosystem. Almost every language that I know of represents its dependencies as a dict-like structure (which helps tremendously to know if a dependency is declared by a given package instead of going through a complete list of requirements) and some of them have support an exploded representation for complex cases (Cargo, Dart and Ruby come to mind).

6 Likes

Hey, I’d said I like it. :stuck_out_tongue:

I’ll also repost something I’d looked up and mentioned earlier (albeit not on discuss.python.org):

One thing to note about these forms is that PEP 508’s is the only representation that doesn’t mention “groups”/“extras” (or equivalent) – at least, amongst the dependency specification formats I could look up easily.

The only difference is additional punctuation, a (likely redundant) “extras”, and a definitely redundant “url”.

PEP 508 syntax also supports or/and expressions in environment markers. That won’t work in the example you gave from Poetry without using an alternate table (which I assume exists, but means you have redundant syntax in there).

This is an interesting idea: what if environment markers had their own tables? That would allow reuse of platform detection expressions without copy/paste, but doesn’t complicate the simple version constraint cases.

This is basically what setuptools used to do when environment markers are keys in extras_require, but I thought people decided it’s not a good idea and chose the PEP 508 approach instead :slightly_smiling_face:

1 Like

Yeah, I’m fairly neutral on that idea, but I don’t see anything else that compelling from any other ecosystem.

The biggest advantage for new users of the expanded table format is that they’re more likely to guess that there’s another key they could use without having seen it before. They’ll still have to look up a reference to find it, but if they’ve only ever seen simple version constraints then they won’t realise that URL references or environment markers even exist (same applies to any table format with a “simplified” mode).

Some of the comparisons ought to be done on libraries with a lot of dependencies. There’s only limited difference between the two formats with a single specification, but once you multiply it by ten or twenty the PEP 508 syntax is much more efficient and no harder to read.

(Also, it’s much easier to include an existing specification by reference than creating a second specification. We also won’t have to spend the rest of time explaining why it’s arbitrarily different from what we’ve previously agreed on.)