PEP 621: how to specify dependencies?

I never suggested ignoring them completely, though I think we can ignore untrainable users. But untrained users will need the resources to become trained, and enough context to locate those resources.

As one example, imagine we could use backticks in Python to perform “is None” checks. A user seeing that for the first time can’t google for backticks, and may not even know what name to use. Whereas “is None” can be searched for. This makes self-training possible.

In this case, we should have a section header (“dependencies”) and a file name (“pyproject.toml”) that can be searched for, and then we make sure something useful shows up in the results. This covers not-yet-trained users, and makes the actual syntax less relevant to how we address that particular audience.

In contrast, if every section header was the dependency name, then you couldn’t search for it, because you’d find the package itself rather than the guides. That would be one way for us to get this wrong, and again, the actual syntax of the dependencies is irrelevant.

To me, I think the most important point here is that we already have PEP 508 syntax which will be used in requirements.txt, tox and tons of other places where dependencies are specified in something other than a TOML file. If PEP 508 were super complicated, then maybe it would be justified to add the additional mental load of a second specification, but by and large it’s not that complicated.

The parts that are complicated tend to be complicated on the “write” side rather than the “read” side. So:

name [fred, bar] @ https://foo.com; python_version=='2.7'

I don’t think I would intuit that syntax or the ordering or anything, but I think that if you see something like this it’s obvious what it means in most cases. name @ https://... is pretty obviously “pull the package name from the URL https://...”. The ; python_version=='2.7' is obviously some sort of filter on the Python version, and I think you can easily infer that it means “this rule only applies when python_version is 2.7”. The only possibly confusing thing is the [fred, bar], but the names people choose for their extras, and the fact that [] is often used to delimit optional things, almost always make it clear that these are feature flags of some sort, so coverage[toml] is very obviously “coverage with toml support”.

I think the case for making people learn two ways to write down their dependencies is very weak, particularly since I know that I for one would have to look up the names of the keys every time I wrote anything remotely complicated anyway (as I frequently do when writing Rust).

7 Likes

And that’s invalid PEP 508 right here :wink:

>>> from packaging.requirements import Requirement
>>> r = Requirement("name [fred, bar] @ https://foo.com; python_version=='2.7'")
Traceback (most recent call last):
  File "packaging/requirements.py", line 98, in __init__
    req = REQUIREMENT.parseString(requirement_string)
  File "pyparsing.py", line 1955, in parseString
    raise exc
  File "pyparsing.py", line 3814, in parseImpl
    raise ParseException(instring, loc, self.errmsg, self)
pyparsing.ParseException: Expected stringEnd, found 'p'  (at char 36), (line:1, col:37)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "packaging/requirements.py", line 100, in __init__
    raise InvalidRequirement(
packaging.requirements.InvalidRequirement: Parse error at "'python_v'": Expected stringEnd

You need a space after the URL, because URLs can contain semicolons:

>>> r = Requirement("name [fred, bar] @ https://foo.com ; python_version=='2.7'")
>>> r.url
'https://foo.com'
>>> r.marker
<Marker('python_version == "2.7"')>
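
For completeness, the rest of the parsed fields come out the way the earlier post describes (continuing the same interpreter session with the packaging library):

>>> r.name
'name'
>>> sorted(r.extras)
['bar', 'fred']
>>> r.specifier
<SpecifierSet('')>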

I don’t know… maybe PEP 508 is super complicated? I get the point that it’s on the write side, and tools can catch the error for you before it’s read by anyone else. But I just can’t help but see the irony here. And I also get that you only need to get that wrong once, but at some point that one write can be more frustrating than all the readability difficulties combined.

1 Like

IMO, PEP 508 is relatively simple¹ for simple cases, but it gets complex and messy for complicated situations. Conversely, the table form scales up better to complicated cases, but (1) it’s a second format for people to learn, (2) it’s more complex than PEP 508 in simple cases, and (3) it’s not immediately obvious to me that TOML syntax doesn’t add its own layer of complexity here for people to learn (the string syntax is sufficiently close-but-not-the-same to Python’s that I can see that causing issues).
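
As a small illustration of that last point, here’s one close-but-not-the-same string gotcha (a sketch only, using the stdlib tomllib module from Python 3.11+; the third-party tomli package provides the same loads() on older versions):

import tomllib  # Python 3.11+; "tomli" offers the same API on older versions

# TOML basic ("...") strings process backslash escapes, while literal ('...')
# strings do not -- close to Python's rules, but not identical, which is the
# kind of thing that can trip people up.
doc = tomllib.loads(r"""
basic   = "C:\\temp"
literal = 'C:\temp'
""")
assert doc["basic"] == doc["literal"] == "C:\\temp"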

Realistically, this is the same old question - are we trying to optimise for “simple” and “common” cases, or for worst-case situations? Can you find an example as complex as that name [fred, bar] @ ... case that has actually occurred in a real-life requirement specification?

Personally, I’m mildly in favour of PEP 508 syntax because it’s “one less thing to learn” (and specifically, I don’t need to learn a new way to spell pip >= 20.0 or foo[extra]²). But I’m strongly against deciding based on complicated examples that will almost never occur in real life.

A survey of how the two forms would look if used in a number of common projects on PyPI would IMO be far more useful than spending a load of time debating edge cases.

¹ I appreciate I am biased here, but I’d argue that the effect is mainly to adjust the precise point where things get “complicated” rather than fundamentally affecting my argument.
² I went hunting, but couldn’t find a good example of a project with extras that you’d put in a dependency - maybe I didn’t have enough imagination, or maybe even extras are on the “unusually complicated” side of things???

Regardless of my comments above, yes it is ironic, and I apologise for taking your example over-seriously :slightly_smiling_face:

2 Likes

I mean, that’s not “super complicated”, it’s a syntax rule. By the same token you could call TOML “super complicated” because it forbids multiline inline tables, so I’d have to rely on the tool to know that:

name = {
    url = "https://foo",
    extras = ["fred", "bar"],
    python = "2.7"
}

is invalid.
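
If it helps, that is easy to confirm mechanically (a quick sketch, assuming Python 3.11+ for the stdlib tomllib parser, or the tomli backport on older versions):

import tomllib

invalid = """
name = {
    url = "https://foo",
    extras = ["fred", "bar"],
    python = "2.7"
}
"""

try:
    tomllib.loads(invalid)
except tomllib.TOMLDecodeError as exc:
    # TOML 1.0 forbids newlines inside inline tables, so this parse fails.
    print("rejected:", exc)

Either way, the parser flags it immediately, which is exactly the kind of tool assistance being discussed.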

Presumably there would be other gotchas, like maybe that python = "2.7" can’t be python = 2.7 or something of that nature.

I think in the end the fact that I had a typo in there that could have instantly been caught by the tool doesn’t invalidate my point at all. Especially since my point was that it is acceptable for a DSL like this to be a little hard to write.

TBH, you acknowledged that my point was, “Yeah, it might be a bit hard to write, but it’s easy to read.” It’s not like a big “gotcha” if there’s a minor failure when writing basically the most complicated form PEP 508 has, considering it’s still very simple to read.

2 Likes

Hello again! I’m making good progress on the Hatch rewrite and would really like to fully support https://www.python.org/dev/peps/pep-0621/ for 1.0.0rc1. Namely, this and SPDX license expressions.

I know we haven’t reached consensus, but is there a format we are leaning toward?

edit: btw I prefer Nick’s suggestion PEP 621: how to specify dependencies?

1 Like

Not that I can explicitly tell. And with PEP 621 on hold until we get the “static sdist” discussion resolved, I don’t know when it will get settled.

This? https://www.python.org/dev/peps/pep-0625/

PEP 625 is part of the discussion, but not all of it.

Oh I see, thanks! Where are the other discussions?

They haven’t happened yet (I’m writing up the opening post to one right now).

1 Like

Is there any benefit to allowing some dependencies to be dynamic, or is that introducing too much complication?

Example:

[dependencies]
pyopengl = "~= 3.1"
numpy = "dynamic"

For:

  • Can specify most dependencies statically in packages where only one or a few dependencies are determined at build time

Against:

  • Adds a reasonable amount of complication for PEP 621 implementers
  • Probably confusing for end-users
  • Doesn’t really help dependency resolvers if PEP 621 goes into sdists, as you can’t resolve the unknown
  • Already exists a mechanism (kind-of): environment markers

Further research:

  • Look at what packages are actually doing when dependencies are computed dynamically

I think that’s too much complication for not enough benefit from the PEP’s perspective. If a tool wants to support that idea it can, though.

1 Like

(I had to step away from my PC and take a stroll after reading this, so well, hopefully, not too much of my frustration comes through)

My point is that basically every other language / packaging ecosystem includes the name of the thing it’s affecting more directly than PEP 508 does, without the need for carefully reading a specification. (“groups”, “target” etc).

I’d also call this a false equivalence – learning TOML/JSON/YAML is a transferable skill. You can use this understanding elsewhere or bring it from elsewhere. Ruby’s Gemfile specification is basically a special-sauce Ruby file. I’d be very surprised to hear that someone has used PEP 508’s syntax for anything other than Python packaging.

I’m sorry but this feels like grasping at straws.

Notice how in all the aforementioned cases, the syntax is not inherently tied to just-this-thing. And I think we’d be in a better place if we did what basically every other ecosystem has done. :slight_smile:

A user can learn that TOML doesn’t allow multiline inline tables, or that JSON doesn’t allow trailing commas, and use this knowledge elsewhere. It’s much easier not to trip on things like these when you can reuse the syntax in other places/contexts.

I agree that it’s rare for someone to write a PEP 508 string that complicated, and it’s probably fine if they trip while writing one. But surely reducing the likelihood of that happening is not a bad idea.

Agreed. On the other hand, the contents/underlying structure is really not complicated, unlike regex. I mean, AFAIK, no one else has invented a string DSL for dependency specification other than Python packaging.

Don’t get me wrong, I’m not going to pick up a pitchfork or walk away if we decide to use PEP 508 here. I like that PEP 508 strings are super concise, but I still don’t think PEP 508 is particularly well optimized for humans to write OR read. They’re writable only if you’ve spent a whole bunch of time reading PEP 508, and readable only if you know what each of the various forms means.

And none of the PEP 508 forms are searchable/discoverable. If you present “coverage[toml]”, the user would understand it’s doing something related to coverage and toml, but would have no easy way to go and search for what this actually means. If we have a good way to go from this to “extras” or “markers”, surely that’s not a negative. :slight_smile:

And the fact that nearly no one else in this space has something like PEP 508 for dependency specification makes me feel like, maybe, it’s not the best thing to just assume that’s the best choice for us.

This is where our perspectives differ, I guess. :slight_smile:

I think the additional mental overhead isn’t really going to be a big deal, and this pivot today would actually be helpful when pip itself moves away from requirements.txt to a requirements 2.0. The required knowledge would be knowing TOML and the key names, and it would also allow us to augment the dependency specification in the future.

E.g. we could switch to having name = { url = ... } instead of name = "@ URL", in addition to whatever we specify here, and that’d significantly simplify the parsing pipeline.
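
For concreteness, here’s roughly how such an exploded form might look (purely illustrative; the version/extras/url key names below are hypothetical and not defined by PEP 621 or any other PEP):

import tomllib  # Python 3.11+; the "tomli" backport has the same API

# Hypothetical spelling only -- none of these key names are standardized.
doc = tomllib.loads("""
[dependencies]
requests = ">= 2.24"
coverage = { version = ">= 5.0", extras = ["toml"] }
mypkg = { url = "https://foo.com/mypkg-1.0.tar.gz" }
""")
print(doc["dependencies"]["coverage"])
# -> {'version': '>= 5.0', 'extras': ['toml']}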

I want Python’s dependency specification to be something more “friendly” than PEP 508 strings, so that a future version of pip supporting requirements 2.0 can use it to specify environments that way. And the fact that it gives us an extra dimension to evolve our dependency specification format is a worthwhile benefit IMO.

IOW, the end goal in my head is NOT 2 formats for specifying dependencies, but only 1 format which is not a just-for-this DSL and is extensible for other use cases as well. And a string-that-evolves-into-a-dictionary model is strictly more functionally capable than the PEP 508 model, even if it’s less concise.

To reiterate, I view the two formats co-existing as a transitory step. I’d like us to start aligning our packaging formats more with what everyone else in this space is doing, and this would be a big step in that direction (along with PEP 621 as a whole).

4 Likes

What I like about PEP 508-style strings is that they’re copy-pastable as CLI arguments to pip :thinking: If you introduce the table format, how would users copy/paste a definition as a pip argument? Would they now need to use two different formats depending on whether they are inside pyproject.toml or on the CLI? Is it worth adding two ways of doing the same thing, and paying for a long transition phase while people use both? In my view, all this probably costs us more than it gets us.

2 Likes

They are copy-pastable as pip arguments, except when they are not :laughing: e.g. on Windows you trigger the shell’s redirection syntax when you run pip install foo>1.0. I am likely biased since I read too much of pypa/packaging-problems and pip’s issue tracker, but I feel this “almost always copy-pastable” feature ends up causing problems for less experienced users eventually.

1 Like

You can still call it copy-pastable if it’s placed inside single quotes, e.g.

pip install 'foo > 1.0'

(It seems a space is required to trigger redirection in PowerShell 7.0.)

1 Like

True, but compared to “most of the time copy-pastable”, “never copy-pastable” does seem worse to me :thinking:

I guess that’s a difference in philosophy then. Never copy-pastable seems more explicit to me, because the user’s first thought when something does not work would be “hmm maybe I wrote it wrong,” not “pip obviously got it wrong since the same thing works in pyproject.toml.”

But now you have two ways to specify things :thinking: while in the other case you could use just one 99% of the time (and escape correctly in the remaining 1%).