PEP 621: how to specify dependencies?

steve.dower · July 1, 2020, 4:28pm

Wait, does flit use the expanded table? I thought it used PEP 508.

I definitely meant PEP 508 style. Hopefully Paul is just bringing in some additional concerns, and not disagreeing with me while also agreeing with me.

pf_moore · July 1, 2020, 5:24pm

Apologies. I’m not familiar with flit so I misread “the DSL” as meaning table-style.

Apparently I was agreeing with you while confusing the issue. Sorry

uranusjr · July 2, 2020, 3:24am

ncoghlan:

However, I’d suggest a hybrid of the examples Brett gave, where a table is still used to separate the dependencies on different packages, but the values within the table are just PEP 508 strings rather than subtables:
# simple
colorama = "*"
# environment dependent
django = [
    ">=2.0; os_name!='nt'",
    ">=2.1; os_name=='nt'" # Affected by Windows-specific bug in 2.0
]
# with extras defined
win32 = "[all] >2.2.0, <3.0.0; os_name == 'nt'"
The normal case would just be a single string, but a list of strings would also be allowed to cover the “multiple mutually exclusive and/or mutually compatible environment markers” case.

Thanks for bringing this up. I made a similar suggestion in the pre-PEP private conversation, but it never managed to catch on among the authors. Let’s give it another shot since I still believe this is an acceptable middle-ground.

The only difference in my proposal was to use this form in the simplest example:

simple = ""  # Instead of "*"

The main reason I prefer this is it’s easier to create a PEP 508 string from the table:

from packaging.requirements import Requirement

requirements = [
    Requirement(f"{key}{val}")
    for key, val in pyproject["dependencies"].items()
]
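
To make the comparison concrete, here is a minimal sketch of that conversion extended to the list-of-strings case from the quoted proposal. It assumes the table has already been parsed from TOML into a plain dict. The function name table_to_pep508 and the sample data are hypothetical, and the sketch only builds strings; passing each one to packaging.requirements.Requirement would be the validation step.

```python
def table_to_pep508(dependencies):
    """Flatten a {name: spec-or-list-of-specs} table into PEP 508 strings."""
    requirements = []
    for name, value in dependencies.items():
        # A bare string is the common case; a list covers several
        # environment-dependent entries for the same package.
        specs = [value] if isinstance(value, str) else list(value)
        for spec in specs:
            # With "" meaning "any version", plain concatenation needs
            # no special case (a "*" would have to be stripped first).
            requirements.append(f"{name}{spec}".strip())
    return requirements


# Hypothetical data, as it would look after parsing the TOML table.
dependencies = {
    "colorama": "",
    "django": [">=2.0; os_name!='nt'", ">=2.1; os_name=='nt'"],
    "win32": "[all] >2.2.0, <3.0.0; os_name == 'nt'",
}

for line in table_to_pep508(dependencies):
    print(line)
```

The list case produces one requirement line per entry, which is how the same information would be spelled in a flat list of PEP 508 strings.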

ncoghlan · July 2, 2020, 11:23am

Yeah, while I was in favour of the star for Pipfile, I now agree it would have been better to stay fully PEP 440 compliant and use the empty string for “any version”.

If PEP 621 opts to use the empty string, I expect Pipfile will follow suit, making the explicit star purely a backwards compatibility feature.

sdispater · July 3, 2020, 2:00pm

As the author of Poetry some might say that I am biased but there is a reason why I did not choose to go with the PEP 508 specification: User experience/friendliness.

We are all seasoned Python developers here and we somewhat know how to read a PEP 508 string but I assure you that for newcomers and people coming from other languages this is far from the case. And reading a complete specification to even remotely understand what it does is proof of user unfriendliness.

What we are talking about here is user facing metadata, which is completely different from metadata that end up in the distributions, and should be readable and intuitive.

This becomes more apparent in cases like this:

name [fred,bar] @ http://foo.com ; python_version=='2.7'

Compare that to (Poetry example):

name = { url = "http://foo.com", extras = ["fred", "bar"], python = "2.7.*" }

To me the second one is much more explicit and readable. And there is no awkward parsing to be done, the parsing already happened at the TOML level.
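
For readers who want to see how the two spellings relate, the following is a rough sketch that renders the expanded form as a PEP 508 string. It is not Poetry's actual conversion logic: the function name expanded_to_pep508 is hypothetical, only the keys from the example are handled, version strings are assumed to already be PEP 440 specifiers, and treating python = "2.7.*" as python_version == '2.7' is a simplifying assumption.

```python
def expanded_to_pep508(name, spec):
    """Render a table-style dependency entry as a PEP 508 string.

    Illustration only: handles "extras", "url", "version" and "python".
    """
    if isinstance(spec, str):
        # Shorthand form: the value is just a version constraint.
        return f"{name} {spec}".strip()

    parts = [name]
    extras = spec.get("extras")
    if extras:
        # Extras go directly after the project name.
        parts.append("[" + ",".join(extras) + "]")
    if "url" in spec:
        parts.append(f" @ {spec['url']}")
    elif "version" in spec:
        parts.append(f" {spec['version']}")
    if "python" in spec:
        python = spec["python"]
        # Assumption: "2.7.*" is read as "any 2.7 release".
        if python.endswith(".*"):
            python = python[:-2]
        # The space before ";" matters when it follows a URL.
        parts.append(f" ; python_version == '{python}'")
    return "".join(parts)


entry = {"url": "http://foo.com", "extras": ["fred", "bar"], "python": "2.7.*"}
print(expanded_to_pep508("name", entry))
print(expanded_to_pep508("requests", ">=2.0"))
```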

And like others mentioned, if only the version range needs to be specified, it’s possible to reduce it to a single string. That’s what Poetry does; see “Dependency specification” in the Poetry documentation.

And I’ll repost something I said elsewhere:

To me it seems Python tries really hard to be an exception in the overall programming languages ecosystem. Almost every language that I know of represents its dependencies as a dict-like structure (which helps tremendously to know if a dependency is declared by a given package instead of going through a complete list of requirements) and some of them support an exploded representation for complex cases (Cargo, Dart and Ruby come to mind).

pradyunsg · July 3, 2020, 2:29pm

Hey, I’d said I like it.

I’ll also repost something I’d looked up and mentioned earlier (albeit not on discuss.python.org):

pradyunsg:

This was the final push needed, for me, to aggregate what other ecosystems are doing, and… here’s the result:

Cargo

Specifying Dependencies - The Cargo Book

[dependencies]
time = "0.1.12"

[dependencies]
some-crate = { version = "1.0", registry = "my-registry" }

[dependencies]
rand = { git = "https://github.com/rust-lang-nursery/rand", branch = "next" }

[target.'cfg(windows)'.dependencies]
winhttp = "0.4.0"

[target.'cfg(unix)'.dependencies]
openssl = "1.0.1"

[target.'cfg(target_arch = "x86")'.dependencies]
native = { path = "native/i686" }

[target.'cfg(target_arch = "x86_64")'.dependencies]
native = { path = "native/x86_64" }

Dart

Package dependencies | Dart

dependencies:
  transmogrify: ^1.0.0

dependencies:
  transmogrify:
    hosted:
      name: transmogrify
      url: http://some-package-server.com
    version: ^1.0.0

dependencies:
  kittens:
    git:
      url: git://github.com/munificent/kittens.git
      ref: some-branch

dependencies:
  transmogrify:
    path: /Users/me/transmogrify

Ruby

https://bundler.io/man/gemfile.5.html

gem "nokogiri"
gem "RedCloth", ">= 4.1.0", "< 4.2.0"
gem "wirble", :groups => [:development, :test]
gem "some_internal_gem", :source => "https://gems.example.com"
# There's shorthands for GitHub (:github, :gist) and BitBucket (:bitbucket) as well.
gem "rails", :git => "git://github.com/rails/rails.git"

gem "rails", :path => "vendor/rails"

One thing to note about these forms is that PEP 508’s is the only representation that doesn’t mention “groups”/“extras” (or equivalent) – at least, amongst the dependency specification formats I could look up easily.

steve.dower · July 3, 2020, 8:50pm

sdispater:

This becomes more apparent in cases like this:
name [fred,bar] @ http://foo.com ; python_version=='2.7'
Compare that to (Poetry example):
name = { url = "http://foo.com", extras = ["fred", "bar"], python = "2.7.*" }
To me the second one is much more explicit and readable. And there is no awkward parsing to be done, the parsing already happened at the TOML level.

The only difference is additional punctuation, a (likely redundant) “extras”, and a definitely redundant “url”.

PEP 508 syntax also supports or/and expressions in environment markers. That won’t work in the example you gave from Poetry without using an alternate table (which I assume exists, but means you have redundant syntax in there).

This is an interesting idea: what if environment markers had their own tables? That would allow reuse of platform detection expressions without copy/paste, but doesn’t complicate the simple version constraint cases.
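
One way to picture that idea, purely as a sketch: named marker expressions are defined once and dependencies are grouped under those names, then expanded back into ordinary PEP 508 strings. Nothing here is proposed syntax. The layout, the function name expand_marker_tables and the sample package versions are all hypothetical, and the dicts stand in for parsed TOML tables.

```python
def expand_marker_tables(markers, unconditional, conditional):
    """Expand dependencies grouped by named markers into PEP 508 strings."""
    expanded = list(unconditional)
    for marker_name, requirements in conditional.items():
        # A KeyError here means a group refers to an undefined marker,
        # which is safer than silently making it unconditional.
        expression = markers[marker_name]
        for requirement in requirements:
            expanded.append(f"{requirement} ; {expression}")
    return expanded


# Marker expressions are written once and reused by name.
markers = {
    "windows": "os_name == 'nt'",
    "old-unix": "os_name != 'nt' and python_version < '3.8'",
}
unconditional = ["colorama"]
conditional = {
    "windows": ["pywin32 >=228"],
    "old-unix": ["importlib-metadata >=1.0"],
}

for line in expand_marker_tables(markers, unconditional, conditional):
    print(line)
```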

uranusjr · July 4, 2020, 3:16am

This is basically what setuptools used to do when environment markers are keys in extras_require, but I thought people decided it’s not a good idea and chose the PEP 508 approach instead

steve.dower · July 4, 2020, 7:58am

Yeah, I’m fairly neutral on that idea, but I don’t see anything else that compelling from any other ecosystem.

The biggest advantage for new users of the expanded table format is that they’re more likely to guess that there’s another key they could use without having seen it before. They’ll still have to look up a reference to find it, but if they’ve only ever seen simple version constraints then they won’t realise that URL references or environment markers even exist (same applies to any table format with a “simplified” mode).

Some of the comparisons ought to be done on libraries with a lot of dependencies. There’s only limited difference between the two formats with a single specification, but once you multiply it by ten or twenty the PEP 508 syntax is much more efficient and no harder to read.

(Also, it’s much easier to include an existing specification by reference than creating a second specification. We also won’t have to spend the rest of time explaining why it’s arbitrarily different from what we’ve previously agreed on.)

pradyunsg · July 4, 2020, 9:10pm

I think you might be missing the point being made here – this redundancy and use of structures that are from the syntax of the language being used (TOML here, YAML for Dart, Ruby code for Gemfiles) is exactly what would make this more user friendly, compared to a DSL (PEP 508) that works well to compactly represent the information.

I’ll reiterate that the audience for this includes not-seasoned Python users, who haven’t heard of PEP 508 or know what that even looks like.

steve.dower · July 4, 2020, 9:51pm

But they are already fluent in TOML?

And they’ve managed to get to this point in packaging without looking anything up?

This is not a place where we need to optimise for untrained users. Readability-in-context is the most important criterion here, along with ease of finding the reference material (which hopefully will be well served by searching for the filename and section heading).

EpicWink · July 5, 2020, 1:38am

What are all of the criteria? What are their levels of importance?

I can think of some which may apply

Readability
Ease-of-use
Ease of finding and understanding documentation
Non-ambiguity (how confident a user is in knowing what they’ll get)
Similarity to existing Python dependency specifications
Similarity to other languages’ dependency specifications
Conciseness (how small each requirement’s text is)
Difficulty of implementation
Feature support (logically combined environment markers etc, or not a regression from setuptools)
Extensibility

steve.dower · July 5, 2020, 7:54am

I appreciate the intent of making the list, but I’m not going to indulge it, sorry

I’ve been a professional tools developer for nearly a decade, and at this point I’m convinced the three most important things to have are strong gut feel (that you can back up with logic, as I did above), a broad range of anecdotes (to push back against gut feel from directions you may not be used to) and a commitment to whatever you decide (or else it’s doomed to failure).

Trying to turn it all into a ranked list is tempting, but has never been significantly helpful in my experience. (Sometimes items on the list come with helpful anecdotes, though. And occasionally the list is a good tool to persuade someone else that you’ve done more work than they are willing to do to argue with you, but that’s politics, not design.)

We definitely have all three of those things here. The only one I worry about is commitment to the final solution, which includes following through with docs, tutorials, talks, advocacy, etc., and not immediately creating an alternative project just to prove a point. If we can do that part, either approach will work fine.

uranusjr · July 5, 2020, 11:47am

They probably didn’t get there without looking anything up, but in my experience they most likely got to this point by looking up only the bare minimum, even to the point that experienced eyes may not understand how that bare minimum was enough to get them there. I can’t say exactly what percentage of users fall into this category, but pypa/packaging-problems has many examples of this. I tend to agree that untrained users are not the top priority here, but they are also definitely not a problem that can be hand-waved away.

This also reminds me of pypa/pip#8285, which describes how pip fails to pick up the extras from 'setuptools_scm >= 3.5[toml]', reported by @bernatgabor. It turned out to not be a bug, but I failed to diagnose the problem immediately. Maybe you can, but the point is that PEP 508 is evidently not easy to get right, even for experienced Python developers with years of context under their belt.
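
For reference, PEP 508 puts the extras directly after the project name, so the intended spelling is setuptools_scm[toml] >= 3.5. The snippet below is a rough sketch of the kind of check a tool could run to flag the misplaced form. The helper name has_misplaced_extras and its regular expression are hypothetical, cover only this one mistake, and are not a PEP 508 parser or anything pip actually does.

```python
import re

# Matches a project name followed by a version operator and, later on but
# before any ";" marker, a bracketed group that should have come first.
_MISPLACED_EXTRAS = re.compile(
    r"^\s*[A-Za-z0-9._-]+\s*[<>=!~][^;\[]*\[[^\]]*\]"
)


def has_misplaced_extras(requirement):
    """Return True if "[extras]" appears after the version specifier."""
    return _MISPLACED_EXTRAS.match(requirement) is not None


for candidate in (
    "setuptools_scm >= 3.5[toml]",   # extras in the wrong place
    "setuptools_scm[toml] >= 3.5",   # extras directly after the name
):
    print(candidate, "->", has_misplaced_extras(candidate))
```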

pf_moore · July 5, 2020, 1:50pm

I agree to a point, but nor should we optimise for non-experts, when those non-experts are likely to have many other things that they’ll struggle to deal with.

Better documentation and tutorials, and more accessible examples of “how to set your project/workflow up”, would be of far more use, I suspect, to the average “non-expert” than simplified syntax for one fairly niche area of packaging. The problem is that no-one is writing such documentation, no-one is (to my knowledge) sharing problems and solutions, and people are picking stuff up cargo cult fashion.

That’s maybe a really good area for funded work - developing newcomer-friendly guides and tutorials for packaging. We could get specialist technical writers and trainers to produce such things, rather than (as we currently do) relying on packaging specialists who are too close to the problem to know how to present things.

Honestly, that probably mainly demonstrates that tools like pip should validate their inputs more eagerly and bail out on errors. (It may be that “compatibility with legacy forms” is the justification for not doing so, but maybe we should just bite the bullet and start rejecting invalid syntax).

I agree that PEP 508 syntax isn’t simple to get right (but nor is regex syntax, for example!) but tools not telling you when you made a mistake is not going to help. Once you know you made a mistake, you can look up the syntax. It’s when you think you got it right, but the tool veered into “undefined behaviour” territory without warning, that you have problems…

steve.dower · July 5, 2020, 4:13pm

Never suggested ignoring them completely, though I think we can ignore untrainable users. But untrained users will need the resources to become trained, and enough context to locate those resources.

As one example, imagine we could use backticks in Python to perform “is None” checks. A user seeing that for the first time can’t google for backticks, and may not even know what name to use. Whereas “is None” can be searched for. This makes self-training possible.

In this case, we should have a section header (“dependencies”) and a file name (“pyproject.toml”) that can be searched for, and then we make sure something useful shows up in the results. This covers not-yet-trained users, and makes the actual syntax less relevant to how we address that particular audience.

In contrast, if every section header was the dependency name, then you couldn’t search for it, because you’d find the package itself rather than the guides. That would be one way for us to get this wrong, and again, the actual syntax of the dependencies is irrelevant.

pganssle · July 7, 2020, 1:27pm

To me, I think the most important point here is that we already have PEP 508 syntax which will be used in requirements.txt, tox and tons of other places where dependencies are specified in something other than a TOML file. If PEP 508 were super complicated, then maybe it would be justified to add the additional mental load of a second specification, but by and large it’s not that complicated.

The parts that are complicated tend to be complicated on the “write” side rather than the “read” side. So:

name [fred, bar] @ https://foo.com; python_version=='2.7'

I don’t think I would intuit that syntax or the ordering or anything, but I think that if you see something like this it’s obvious what it means in most cases. name @ https://... is pretty obviously “pull the package name from the URL https://...”. The ; python_version=='2.7' is obviously some sort of filter on the Python version, and I think you can easily infer that it means “this rule only applies when python_version is 2.7”. The only possibly confusing thing is the [fred, bar], but the names people choose for their extras and the fact that [] is often used to delimit optional things almost always makes it clear that these are feature flags of some sort, so coverage[toml] is very obviously “coverage with toml support”.

I think the case for making people learn two ways to write down their dependencies is very weak, particularly since I know that I for one would have to look up the names of the keys every time I wrote anything remotely complicated anyway (as I frequently do when writing Rust).

uranusjr · July 7, 2020, 6:13pm

pganssle:

The parts that are complicated tend to be complicated on the “write” side rather than the “read” side. So:
name [fred, bar] @ https://foo.com; python_version=='2.7'
I don’t think I would intuit that syntax or the ordering or anything, but I think that if you see something like this it’s obvious what it means in most cases.

And that’s invalid PEP 508 right here

>>> from packaging.requirements import Requirement
>>> r = Requirement("name [fred, bar] @ https://foo.com; python_version=='2.7'")
Traceback (most recent call last):
  File "packaging/requirements.py", line 98, in __init__
    req = REQUIREMENT.parseString(requirement_string)
  File "pyparsing.py", line 1955, in parseString
    raise exc
  File "pyparsing.py", line 3814, in parseImpl
    raise ParseException(instring, loc, self.errmsg, self)
pyparsing.ParseException: Expected stringEnd, found 'p'  (at char 36), (line:1, col:37)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "packaging/requirements.py", line 100, in __init__
    raise InvalidRequirement(
packaging.requirements.InvalidRequirement: Parse error at "'python_v'": Expected stringEnd

You need a space after the URL, because URL can contain semicolons:

>>> r = Requirement("name [fred, bar] @ https://foo.com ; python_version=='2.7'")
>>> r.url
'https://foo.com'
>>> r.marker
<Marker('python_version == "2.7"')>

I don’t know… maybe PEP 508 is super complicated? I get the point that it’s on the write side, and tools can catch the error for you before it’s read by anyone else. But I just can’t help but see the irony here. And I also get that you only need to get that wrong once, but at some point that one write can be more frustrating than all the readability difficulties combined.
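
The rule can be sketched in a few lines. This is a simplified illustration that mirrors the behaviour in the traceback above, not the grammar packaging actually uses: the function name split_url_requirement is hypothetical, and the sketch assumes the URL simply runs up to the first whitespace.

```python
def split_url_requirement(requirement):
    """Split "name [extras] @ url ; marker" into (head, url, marker)."""
    head, at, rest = requirement.partition("@")
    if not at:
        raise ValueError("not a URL requirement")
    pieces = rest.split(None, 1)
    if not pieces:
        raise ValueError("missing URL after '@'")
    # The URL runs up to the first whitespace, so a ";" glued to it is
    # part of the URL rather than the start of a marker.
    url = pieces[0]
    tail = pieces[1].strip() if len(pieces) > 1 else ""
    if not tail:
        return head.strip(), url, None
    if not tail.startswith(";"):
        raise ValueError(f"unexpected text after URL: {tail!r}")
    return head.strip(), url, tail[1:].strip()


good = "name [fred, bar] @ https://foo.com ; python_version=='2.7'"
bad = "name [fred, bar] @ https://foo.com; python_version=='2.7'"

print(split_url_requirement(good))
try:
    split_url_requirement(bad)
except ValueError as exc:
    print("rejected:", exc)
```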

pf_moore · July 7, 2020, 7:01pm

IMO, PEP 508 is relatively simple¹ for simple cases, but it gets complex and messy for complicated situations. Conversely, the table form scales up better to complicated cases, but (1) it’s a second format for people to learn, (2) it’s more complex than PEP 508 in simple cases, and (3) it’s not immediately obvious to me that TOML syntax doesn’t add its own layer of complexity here for people to learn (the string syntax is sufficiently close-but-not-the-same to Python’s that I can see that causing issues).

Realistically, this is the same old question - are we trying to optimise for “simple” and “common” cases, or for worst-case situations? Can you find an example as complex as that name [fred, bar] @ ... case that has actually occurred in a real-life requirement specification?

Personally, I’m mildly in favour of PEP 508 syntax because it’s “one less thing to learn” (and specifically, I don’t need to learn a new way to spell pip >= 20.0 or foo[extra]²). But I’m strongly against deciding based on complicated examples that will almost never occur in real life.

A survey of how the two forms would look if used in a number of common projects on PyPI would IMO be far more useful than spending a load of time debating edge cases.

¹ I appreciate I am biased here, but I’d argue that the effect is mainly to adjust the precise point where things get “complicated” rather than fundamentally affecting my argument.
² I went hunting, but couldn’t find a good example of a project with extras that you’d put in a dependency - maybe I didn’t have enough imagination, or maybe even extras are on the “unusually complicated” side of things???

Regardless of my comments above, yes it is ironic, and I apologise for taking your example over-seriously

pganssle · July 7, 2020, 7:06pm

I mean, that’s not “super complicated”, it’s a syntax rule. By the same token you could call TOML “super complicated” because it forbids multiline inline tables, so I’d have to rely on the tool to know that:

name = {
    url = "https://foo",
    extras = ["fred", "bar"],
    python = "2.7"
}

is invalid.

Presumably there would be other gotchas, like maybe that python = "2.7" can’t be python = 2.7 or something of that nature.

I think in the end the fact that I had a typo in there that could have instantly been caught by the tool doesn’t invalidate my point at all. Especially since my point was that it is acceptable for a DSL like this to be a little hard to write.

TBH, you acknowledge that my point was, “Yeah it might be a bit hard to write but it’s easy to read.” It’s not like a big “gotcha” if there’s a minor failure when writing basically the most complicated format PEP 508 has, considering it’s still very simple to read it.