PEP 621: how to specify dependencies?

ofek · July 12, 2020, 9:16pm

Hello again! I’m making good progress on the Hatch rewrite and would really like to fully support https://www.python.org/dev/peps/pep-0621/ for 1.0.0rc1. Namely, this and SPDX license expressions.

I know we haven’t reached consensus, but is there a format we are leaning toward?

edit: btw I prefer Nick’s suggestion PEP 621: how to specify dependencies?

brettcannon · July 14, 2020, 12:59am

Not that I can explicitly tell. And with PEP 621 on hold until we get the “static sdist” discussion resolved, I don’t know when it will get settled.

ofek · July 14, 2020, 1:04am

This? https://www.python.org/dev/peps/pep-0625/

brettcannon · July 14, 2020, 1:04am

PEP 625 is part of the discussion, but not all of it.

ofek · July 14, 2020, 1:12am

Oh I see, thanks! Where are the other discussions?

brettcannon · July 14, 2020, 1:12am

They haven’t happened yet (I’m writing up the opening post to one right now).

EpicWink · July 15, 2020, 5:18pm

Is there any benefit to allowing some dependencies to be dynamic, or is that introducing too much complication?

Example:

[dependencies]
pyopengl = "~= 3.1"
numpy = "dynamic"

For:

Can specify most dependencies statically in packages where only one or a few dependencies are determined at build-time

Against:

Adds a fair amount of complication for PEP 621 implementers
Probably confusing for end-users
Doesn’t really help dependency resolvers if PEP 621 goes into sdists, as you can’t resolve the unknown
A mechanism already exists (kind of): environment markers

Further research:

Look at what packages are actually doing when dependencies are computed dynamically

brettcannon · July 15, 2020, 8:39pm

I think that’s too much complication for not enough benefit from the PEP’s perspective. If a tool wants to support that idea it can, though.

pradyunsg · July 17, 2020, 6:22am

(I had to step away from my PC and take a stroll after reading this, so well, hopefully, not too much of my frustration comes through)

My point is that basically every other language / packaging ecosystem includes the name of the thing it’s affecting more directly than PEP 508 does, without the need for carefully reading a specification. (“groups”, “target” etc).

And, I’d also call this false equivalence – learning TOML/JSON/YAML is a transferable skill. You can use this understanding elsewhere or bring it from elsewhere. Ruby’s Gemfile specification is basically a special sauce Ruby file. I’d be very surprised to hear that someone has used PEP 508’s syntax for anything other than Python Packaging.

I’m sorry but this feels like grasping at straws.

Notice how in all the aforementioned cases, the syntax is not inherently tied to just-this-thing. I think we’d be in a better place if we did what basically every other ecosystem has done.

A user can learn that TOML doesn’t allow for multiline inline tables or that JSON doesn’t allow for trailing commas and use this knowledge elsewhere. It’s much easier to not trip on things like these when you may reuse this syntax in other places/contexts.

I agree that it’s rare that someone writes a PEP 508 string that is so complicated and it’s probably fine if they trip while writing them. But surely, reducing the likelihood of that happening is not a bad idea.

Agreed. On the other hand, the contents/underlying structure is really not complicated, unlike regex. I mean, AFAIK, no one else invented a string DSL for dependency specification other than Python Packaging.

Don’t get me wrong, I’m not going to pick up a pitchfork or walk away if we decide to use PEP 508 here. I like that PEP 508 strings are super concise, but I still don’t think PEP 508 is particularly well optimized for humans to write OR read. They’re writable, only if you’ve spent a whole bunch of time reading PEP 508. Readable, only if you know what each of the various forms means.

And none of the PEP 508 forms are searchable/discoverable. If you present “coverage[toml]”, the user would understand it’s doing something related to coverage and toml, but would have no easy way to go ahead and search for what this actually means. If we have a good way to go from this to “extras” or “markers”, surely that’s not a negative.

And the fact that nearly no one else in this space has something like PEP 508 for dependency specification makes me feel like, maybe, it’s not the best thing to just assume that’s the best choice for us.

This is where our perspectives differ, I guess.

I think the additional mental overhead isn’t really going to be a big deal, and this pivot today would actually be helpful when pip itself moves away from requirements.txt to a requirements 2.0. The required knowledge would be TOML and the key names, and it would also allow us to augment the dependency specification in the future.

Eg: we could switch to have name = { url = ... } instead of name = "@ URL" in addition to whatever we specify here, and that’d significantly simplify the parsing pipeline.

I want Python’s dependency specification to be something more “friendly” than PEP 508 strings, so that a future version of pip supporting requirements 2.0 can use it to specify environments. And the fact that it gives us an extra dimension along which to evolve our dependency specification format is a worthwhile benefit IMO.

IOW, the end goal in my head is NOT 2 formats for specifying dependencies, but only 1 format which is not a just-for-this DSL and is extensible for other use cases as well. And a string-that-evolves-to-dictionary model is strictly more functionally capable than the PEP 508 model, even if it’s less concise.

To reiterate, I view the two formats co-existing as a transitory step. I’d like us to start aligning our packaging formats more with what everyone else in this space is doing and this would be a big step in that direction (along with PEP 621 as a whole).

bernatgabor · July 17, 2020, 7:36am

What I like about PEP 508 style strings is that they’re copy-pasteable as CLI arguments to pip. If you introduce the table format, how would users copy/paste a definition as a pip argument? Would they now need to use two different formats depending on whether they are inside pyproject.toml or on the CLI? Is it worth adding two ways of doing the same thing, and paying for a long transition phase when people use both? In my view, all of this probably buys us less than it would cost us.

uranusjr · July 17, 2020, 7:41am

They are copy-pastable as pip arguments, except when they are not; e.g. on Windows you trigger the piping syntax when you pip install foo>1.0. I am likely biased since I read too much of pypa/packaging-problems and pip’s issue tracker, but I feel this “almost always copy-pastable” feature ends up causing problems for less experienced users eventually.

EpicWink · July 17, 2020, 8:11am

You can still call it copy-paste if it’s placed inside single-quotes, eg

pip install 'foo > 1.0'

(It seems space is required to trigger piping in PowerShell 7.0)
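
As a rough illustration of the quoting point, here is a minimal Python sketch that builds a copy-pastable command from a PEP 508 string. It assumes a POSIX shell: the standard library's shlex module does not model cmd.exe or PowerShell quoting rules, so this says nothing about the Windows case discussed above.

```python
import shlex

# A PEP 508 requirement containing a character the shell would interpret.
requirement = "foo>1.0"

# shlex.quote() wraps the string in single quotes because ">" is not a
# shell-safe character; a plain name such as "foo" would be left unquoted.
command = "pip install " + shlex.quote(requirement)
print(command)

# Splitting the command the way a POSIX shell would gives back the
# original requirement as a single argument.
arguments = shlex.split(command)
```

With the quoting applied, the requirement reaches pip as one argument instead of being read as a redirection to a file named 1.0.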

bernatgabor · July 17, 2020, 8:50am

True, but “most of the time copy-pastable” versus “never copy-pastable”: the latter does seem worse to me

uranusjr · July 17, 2020, 9:18am

I guess that’s a difference in philosophy then. Never copy-pastable seems more explicit to me, because the user’s first thought when something does not work would be “hmm maybe I wrote it wrong,” not “pip obviously got it wrong since the same thing works in pyproject.toml.”

bernatgabor · July 17, 2020, 9:38am

But now you have two ways to specify things, while in the other case you could use just one 99% of the time (and escape correctly in the remaining 1%).

pf_moore · July 17, 2020, 9:44am

Maybe more a difference in experience. Are we looking to add convenience for people who have a certain baseline understanding (know how shell quoting works, for example), or for people who have to support beginners, for whom a compact, “copyable” string has a whole load of unspoken assumptions and implications that they aren’t familiar with?

That’s a genuine question - it’s not at all clear to me who the target audience is here. I genuinely don’t understand how we have users working with technologies intended to help with building redistributable Python packages for others to consume, who don’t understand basic command line ideas like “quoting stuff is a nightmare black art”. But we do have such users, so we should decide whether we target our designs at their baseline level of knowledge, or not.

dstufft · July 17, 2020, 4:49pm

FWIW, I don’t think it’s true that Python is alone here, for instance npm maps dependency data down to a single string per dependency (they don’t have anything like environment markers or such so their string isn’t quite as complicated as ours).

Additionally the problem with a complex data structure, is that it doesn’t translate well to the CLI, which means you end up inventing a PEP 508 like structure for the CLI like gem install has (gem install akami:1.2.0 atomic:1.1.14 aws-s3:0.6.2), or you use CLI flags and disallow trying to use those flags when you’re installing multiple things like cargo has (and the PR that added support for cargo install supporting multiple things mentions this limitation and talks about possibly inventing a PEP 508 like syntax for getting around this limitation).

As best as I can tell, you only have a couple real options when translating requirements to the CLI:

1. Use a string based DSL
2. Use command line arguments, and disallow installing multiple things when using those arguments.
3. Use command line arguments, and do context sensitive parsing of CLI arguments so that something like pip install foobar --git ... spam --git ... knows to associate the --git flag with the correct thing to install.
4. Don’t support anything but the most basic operations on the CLI (or don’t support it at all).

It appears that currently pip and npm use (1) for everything, gem uses (1) for the CLI only, cargo uses (2), go uses (4), composer uses (1) for the CLI only, nuget uses (2).
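
To make the context sensitive option concrete, here is a minimal Python sketch of how a tool might attach each flag to the package named just before it. The --git flag is the hypothetical one from the example above (it is not a real pip option), and the function name and URLs are placeholders invented for illustration.

```python
def group_args(argv):
    """Attach each `--git URL` pair to the requirement named just before it."""
    requirements = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "--git":
            # The flag only makes sense after a package name and before a URL.
            if not requirements:
                raise ValueError("--git must follow a package name")
            if i + 1 >= len(argv):
                raise ValueError("--git needs a URL")
            requirements[-1]["git"] = argv[i + 1]
            i += 2
        else:
            # Any other argument starts a new requirement.
            requirements.append({"name": arg, "git": None})
            i += 1
    return requirements


parsed = group_args([
    "foobar", "--git", "https://example.com/foobar.git",
    "spam", "--git", "https://example.com/spam.git",
    "eggs",
])
```

Even in this small form the parser has to track position and reject misplaced flags, which is the kind of cost that makes a string based DSL attractive on the CLI.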

I’m sure there are others that make different decisions, but I think that the idea we’re going to standardize on a single format, and that format is going to be some rich TOML data structure is a pip dream, since that implies (4), and while some tools out there make that decision (mainly tools like bundler that manage projects and thus refuse to operate without some sort of input file/project), I don’t think it’s something that pip itself or another general purpose Python package manager is going to be able to do.

So you’re more or less guaranteed to have two formats in play for the foreseeable future, unless we make the CLI format the “blessed” format, then we can get down to a single format.

For the record, I have a slight preference for just using PEP 508 everywhere, as I think there is value in having a single shared format no matter what context you’re in.

That being said, if I were to argue for the rich data structure approach, I would argue not that we’re going to get down to a single format, and it’ll be a TOML data structure (because we can’t and won’t) but that the PEP 508 string has a steeper learning curve, and while we’re not going to be able to get away from its use on the CLI, the CLI is less likely to use the more advanced features that ramp up the learning curve, whereas the more advanced features are more likely to happen inside a file, so it could make sense to pay the cost of two formats, given that split of simpler / more advanced between CLI and files.

Personally I still don’t think that argument justifies the additional cost (extras are likely to get used on the CLI, environment markers are not, but the bulk of the complexity of environment markers comes from the markers themselves, not the fact there is a semicolon separating them from the rest of the specifier).

uranusjr · July 17, 2020, 5:34pm

I’m probably misunderstanding something. Since we already have PEP 508, if we invent a rich TOML structure, wouldn’t pip become like Composer and Gem, i.e. “(1) for the CLI only”? Nobody seems to propose we drop PEP 508 and only standardise on a single format.

While we’re comparing tools, I want to give another shoutout to @ncoghlan’s proposal up-thread:

ncoghlan:

# simple
colorama = "*"
# environment dependent
django = [
    ">=2.0; os_name!='nt'",
    ">=2.1; os_name=='nt'" # Affected by Windows-specific bug in 2.0
]
# with extras defined
win32 = "[all] >2.2.0, <3.0.0; os_name == 'nt'"

This, combined with pip install using PEP 508, would match npm’s approach (and composer.json), which I find quite reasonable.
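
A minimal Python sketch of how a tool could turn the quoted table back into ordinary PEP 508 strings. The dictionary stands in for the parsed TOML, and the function name and the joining rules (a bare "*" means any version, extras attach directly to the name) are my reading of the proposal, not something it spells out.

```python
def to_pep508(name, spec):
    """Turn one `name = spec` entry into a list of PEP 508 strings."""
    specs = spec if isinstance(spec, list) else [spec]
    result = []
    for item in specs:
        item = item.strip()
        if item == "*":
            result.append(name)               # any version: the bare name
        elif item.startswith("["):
            result.append(name + item)        # extras attach to the name
        else:
            result.append(name + " " + item)  # version and/or marker
    return result


# Stand-in for the parsed TOML table from the quoted proposal.
dependencies = {
    "colorama": "*",
    "django": [">=2.0; os_name!='nt'", ">=2.1; os_name=='nt'"],
    "win32": "[all] >2.2.0, <3.0.0; os_name == 'nt'",
}

flat = [
    requirement
    for name, spec in dependencies.items()
    for requirement in to_pep508(name, spec)
]
```

Because the value is just the tail of a PEP 508 string, the conversion is plain concatenation, and the result can be handed to pip install unchanged.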

dstufft · July 17, 2020, 5:47pm

It’s possible I’m misunderstanding, but the quoted part of @pradyunsg’s post read to me like his goal is to standardize on a single format, and I was just pointing out that I don’t believe that’s actually possible if we use the rich data structure in TOML idea.

ncoghlan · July 19, 2020, 12:59am

Similar to others, my main concern with standardising on a TOML dict syntax for complex dependencies is that it necessarily becomes a second format to learn.

However, I don’t think we should take lightly the fact that the original developers of both pipenv and poetry independently decided that they didn’t want to make understanding the full complexity of PEP 508 a pre-requisite for using their tools.

I’ll also note that in the long run the “two formats” concern can potentially be mitigated by also accepting the exploded format in CLI tools, where wrapping single quotes around the entire string should suffice to deal with most (but not all) CLI quoting issues.

Conversely, TOML based tools should accept both the abbreviated format and the exploded one. That way copying and pasting in either direction should mostly just work.

If we do that, then the exploded form can become a stepping stone & teaching aid for the shorthand form, by presenting equivalent declarations in the two formats:

pip install 'win32[all] >= 1.0; os_name == "nt"'
pip install 'win32{version=">= 1.0",extras=["all"], environment="os_name == \"nt\""}'

pip install 'name [fred, bar] @ https://foo.com; python_version=="2.7"'
pip install 'name{direct_url="https://foo.com",extras=["fred","bar"],environment="python_version == \"2.7\""}'

Here we can see that nested CLI quoting issues in the exploded format mainly come up with environment markers, and those would mostly appear in files where getting the quoting right in the derived command is a pre-existing tooling problem rather than something users have to worry about directly.
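
As a minimal sketch of the equivalence between the two forms, the Python function below collapses the exploded fields into the abbreviated string. The field names (version, extras, direct_url, environment) are the ones used in the examples above; the function name and the choice to let a direct URL take precedence over a version are assumptions for illustration only.

```python
def collapse(name, version=None, extras=(), direct_url=None, environment=None):
    """Build the abbreviated PEP 508 string from the exploded fields."""
    text = name
    if extras:
        text += "[" + ",".join(extras) + "]"
    if direct_url:
        text += " @ " + direct_url
    elif version:
        text += " " + version
    if environment:
        # PEP 508 requires whitespace between a URL and the ";" that
        # introduces the marker, since ";" can be part of a URL.
        text += " ; " if direct_url else "; "
        text += environment
    return text


by_version = collapse(
    "win32", version=">= 1.0", extras=["all"], environment='os_name == "nt"'
)
by_url = collapse(
    "name",
    direct_url="https://foo.com",
    extras=["fred", "bar"],
    environment='python_version == "2.7"',
)
```

Going in this direction is simple string assembly; the reverse direction needs a real PEP 508 parser, which is part of why the exploded form is the easier one to teach first.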

By choosing appropriate names for the fields (e.g. having “direct_url” and “vcs_url” be distinct), we can provide a gentler introduction to the full complexity of Python’s dependency declarations than requiring new users to learn the more CLI friendly abbreviated dependency syntax in order to declare dependencies in pyproject.toml.

I still like PEP 621: how to specify dependencies? as a broad structure, and the main change needed to accommodate the exploded form is to permit an inline table anywhere a PEP 508 string appears in the examples.