PEP 621: how to specify dependencies?

True. This is an ongoing problem with packaging discussions.

It would, but it’s typically been near-impossible to do. I’d love to get such input (in an unbiased form - I’d be cautious about anecdotal information from groups that potentially have a level of bias, such as experienced developers, or users of a particular tool, etc). But I don’t want to block progress on chasing after an unattainable ideal.

That’s a matter of opinion, as I’m sure you’ll agree.

Agreed, but it’s also not a strong argument for TOML-style syntax. Any format that includes quotes is IMO a potential usability problem, given the quirks of quoting in different shells. The best I can say here is that of the two proposed formats, PEP 508 uses fewer special characters. But I don’t feel that either proposal can reasonably claim to be ideal for use on the command line. Inventing an additional, CLI-only, format may be even worse, though. Having said all that, this discussion is not about CLI usage; “can it be copied to the CLI” is an incidental question at best.

Correct. I find the PEP 508 version far easier to read. So there’s little point to the comparison, as all it does is confirm that people have different views.

I’m not aware of pip/setuptools users complaining about PEP 508 format, either. But I haven’t investigated much, so if we really care about that sort of data, we should do a proper survey.

True. But unless we get 100% consensus, it is always going to be the case that some tools will need to change to conform to the standard. I sympathise with the fact that Poetry deliberately chose to use something other than PEP 508, and so this would feel like a regression, but so far the arguments that convinced you to take the route you did with Poetry either haven’t been explained clearly enough in this discussion, or haven’t been sufficiently convincing to change people’s minds.

I think this is the key here. We don’t all agree on what to optimise for, so we keep circling round the same arguments - but no-one is convinced, because we all have different goals.

To give a personal perspective, I consider it “obvious” that the following are the highest priorities:

  1. Simple cases can be written with minimal syntax overhead.
  2. Knowledge of how to write dependencies in pyproject.toml should be transferable to other situations (CLI usage, other configuration files, metadata, …) and vice versa.

Conversely, I don’t put any priority on (for example):

  1. Making it easy to specify complex cases.
  2. Similarity to other language ecosystems.
  3. What libraries tools need to use to parse the data.

There are plenty of other aspects that come somewhere in the middle (teachability, discoverability, for example).


For a real example, the dependency declaration in would become:

dependencies = [
  'cached-property >= 1.2.0, < 2',
  'distro >= 1.5.0, < 2',
  'docker[ssh] >= 4.2.2, < 5',
  'dockerpty >= 0.4.1, < 1',
  'docopt >= 0.6.1, < 1',
  'jsonschema >= 2.5.1, < 4',
  'PyYAML >= 3.10, < 6',
  'python-dotenv >= 0.13.0, < 1',
  'requests >= 2.20.0, < 3',
  'texttable >= 0.9.0, < 2',
  'websocket-client >= 0.32.0, < 1',

  # Conditional
  'backports.shutil_get_terminal_size == 1.0.0; python_version < "3.3"',
  'backports.ssl_match_hostname >= 3.5, < 4; python_version < "3.5"',
  'colorama >= 0.4, < 1; sys_platform == "win32"',
  'enum34 >= 1.0.4, < 2; python_version < "3.4"',
  'ipaddress >= 1.0.16, < 2; python_version < "3.3"',
  'subprocess32 >= 3.5.4, < 4; python_version < "3.2"',
]

socks = [ 'PySocks >= 1.5.6, != 1.5.7, < 2' ]
tests = [
  'ddt >= 1.2.2, < 2',
  'pytest < 6',
  'mock >= 1.0.1, < 4; python_version < "3.4"',
]

Note: it would also be cool if we allowed multiline literal strings in addition to arrays to get rid of quoting and commas. Basically, the same parsing for files read by pip with the -r flag.
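To make the multiline-string idea concrete, here is a minimal Python sketch of the line splitting it implies. It assumes the TOML parser has already handed back the multiline literal string as a plain `str`; the function name `split_requirement_lines` is hypothetical, and pip's real `-r` handling covers far more (options, line continuations, URLs) than this simplification does.

```python
def split_requirement_lines(text: str) -> list[str]:
    """Split a multiline string into requirement strings, one per line."""
    requirements = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip whole-line comments.
        if line.startswith("#"):
            continue
        # Drop a trailing comment introduced by whitespace followed by '#'.
        comment_at = line.find(" #")
        if comment_at != -1:
            line = line[:comment_at].rstrip()
        # Skip blank lines.
        if line:
            requirements.append(line)
    return requirements


# Hypothetical content of a TOML multiline literal string ('''...''').
DEPENDENCIES = """
cached-property >= 1.2.0, < 2
docker[ssh] >= 4.2.2, < 5

# Conditional
colorama >= 0.4, < 1; sys_platform == "win32"
"""

parsed = split_requirement_lines(DEPENDENCIES)
```

Each surviving line is an ordinary PEP 508 string, so no quoting or commas are needed inside the block.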


I do very much like @sdispater’s idea to have a real poll en masse. It may sound odd, but I think the best option we have is for one of us to write a comment here with a simple example of both approaches, then whoever here has let’s say more than 500 followers on Twitter, do a poll that links to the comment.

We could have real data by next week.

For what it’s worth: I have been following the Python packaging topics on Stack Overflow for a bit now, and I can’t remember people having issues with the PEP 508/PEP 440 notations themselves. The cases I remember were about people being confused about which notation to use because of the deprecation of setuptools dependency_links (that was a bit rough). Once pointed to the right document, people seemed to be satisfied (or at least they went quiet). And I haven’t seen such a question in months (of course I haven’t seen every question).

A wider, more public poll would be welcome.

For the simple cases (which might be the most common cases: just the name, plus sometimes a pinned version or a range), PEP 508 feels more readable to me. My gut feeling is that when people need more than that, then they are somewhat experienced enough that they don’t get scared by such notation. A pleasant user experience is important, but we are talking about people who at the very least managed to write the very Python code they are trying to package, so I believe it’s not a big ask.

But obviously I would feel bad, if poetry (and others) had to give up on their notation. I would vote for a reasonable hybrid notation (I think there were some suggestions here), but I am aware it would make for more complex specification and implementations. I would encourage the proponents of a TOML notation to come up with more suggestions.

Maybe something like that:

dependencies = [
  'A [one, two] ~= 1.2.3 ; python_version < "2.7"',
  { name = 'B [one, two] ~= 1.2.3 ; python_version < "2.7"' },
  { name = 'C [one, two] ~= 1.2.3', markers = 'python_version < "2.7"' },
  { name = 'D [one, two]', version = '~= 1.2.3', markers = 'python_version < "2.7"' },
  { name = 'E', extras = [ 'one', 'two' ], version = '~= 1.2.3', markers = 'python_version < "2.7"' },
]

Basically parse name as PEP 508 first and then everything that comes after replaces (no questions asked, conflicts are user’s fault) what’s already in the Requirement object. I believe it’s quite close to poetry’s notation.
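A rough Python sketch of that "parse the name first, then let explicit keys override" rule might look like the following. The regular expression is a deliberate simplification and not a full PEP 508 parser, and the `Requirement` dataclass and `parse_entry` function are hypothetical names used only for illustration.

```python
import re
from dataclasses import dataclass, field

# Deliberately simplified pattern; NOT a full PEP 508 parser.
_REQ = re.compile(
    r"""^\s*(?P<name>[A-Za-z0-9][A-Za-z0-9._-]*)\s*
        (?:\[(?P<extras>[^\]]*)\])?\s*
        (?P<version>[^;]*?)\s*
        (?:;\s*(?P<markers>.*?))?\s*$""",
    re.VERBOSE,
)


@dataclass
class Requirement:
    name: str
    extras: list[str] = field(default_factory=list)
    version: str = ""
    markers: str = ""


def parse_entry(entry) -> Requirement:
    """Accept a PEP 508-style string or a table (dict) with a 'name' key."""
    table = {"name": entry} if isinstance(entry, str) else dict(entry)
    match = _REQ.match(table.pop("name"))
    if match is None:
        raise ValueError(f"cannot parse requirement: {entry!r}")
    extras = [e.strip() for e in (match["extras"] or "").split(",") if e.strip()]
    req = Requirement(match["name"], extras, match["version"], match["markers"] or "")
    # Explicit keys replace whatever the string provided, no questions asked.
    for key, value in table.items():
        if key not in ("extras", "version", "markers"):
            raise ValueError(f"unknown key: {key!r}")
        setattr(req, key, value)
    return req


entries = [
    'A [one, two] ~= 1.2.3 ; python_version < "2.7"',
    {"name": 'B [one, two] ~= 1.2.3 ; python_version < "2.7"'},
    {"name": "C [one, two] ~= 1.2.3", "markers": 'python_version < "2.7"'},
    {"name": "D [one, two]", "version": "~= 1.2.3", "markers": 'python_version < "2.7"'},
    {"name": "E", "extras": ["one", "two"], "version": "~= 1.2.3",
     "markers": 'python_version < "2.7"'},
]
parsed = [parse_entry(e) for e in entries]
```

Under these assumptions all five spellings end up with the same extras, version, and markers; only the name differs.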

I feel like I’m going against the current, but I don’t think it would be beneficial. As I said above, the drawback of needing a migration path and supporting two ways of doing things for at least the next 3 years (but more likely 5) outweighs, in my opinion, any potential benefits we might get from the table format. And this assumes the table format is easier to read, which we can’t all agree on.

In the best-case scenario we’ll end up in a place where these dependencies are a bit easier to read and write. However, to get there we’d have to support both formats in the mid-term across various packaging tools and provide assistance for people mixing up the two (not to mention the time users themselves would waste when they inevitably use one format where only the other is allowed). IMHO spending both maintainers’ and new users’ time on this is not the most effective use of our (very limited) resources.

For example, an editable mode for PEP-517 is a much more important topic if we have availability.


They are not actually: one is a “standard” that is specific to Python while the other is a more generic standard that spans multiple languages and tools.

So, you agree that the TOML approach has more advantage than PEP-508 from a metadata file standpoint?

Because you know the specification. What about new users or occasional Python developers?

And yet, when I see the number of files that get it wrong (or use programmatic checks instead), I feel that it might not be the clearest specification.

I chose it because of the following main reasons:

  • Readability (debatable)
  • Discoverability: it’s easier to find if a dependency exists in a dict than in a list
  • Explicitness
  • Programmatic manipulation: You can’t easily manipulate PEP-508 strings to change parts of it compared to TOML elements.
  • Consistency with what exists in other popular languages. This was the principal factor that led me to this decision.

Nothing stops you from loading that list into a dictionary once you read the file. And they’re equal.
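As a minimal sketch of that extra step, the snippet below indexes a list of PEP 508 strings by normalized project name. The helper names are hypothetical, the name pattern is a simplification of the PEP 508 rule, and the values are lists because the same project can legitimately appear more than once with different markers.

```python
import re

# Leading project name only; a simplification of the PEP 508 name rule.
_NAME = re.compile(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name: str) -> str:
    """Normalize a project name the way PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def index_by_name(dependencies: list[str]) -> dict[str, list[str]]:
    """Map each normalized project name to the requirement strings naming it."""
    index: dict[str, list[str]] = {}
    for spec in dependencies:
        match = _NAME.match(spec)
        if match is None:
            raise ValueError(f"no project name found in {spec!r}")
        index.setdefault(normalize(match[1]), []).append(spec)
    return index


index = index_by_name([
    'PyYAML >= 3.10, < 6',
    'docker[ssh] >= 4.2.2, < 5',
    'colorama >= 0.4, < 1; sys_platform == "win32"',
])
has_yaml = "pyyaml" in index
```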

Not sure what part of it you consider it to be more explicit?

Why? What’s wrong with You update the property, and then call str on it?

The biggest problem here is that getting to where other languages are would be a lot of pain. So the question is: are we prepared to hurt a lot in the next few years just to be on par with other languages? And, while getting there, also losing some current features (e.g. copy-paste-ability of specifications).


That’s an extra step just to circumvent an issue that could be solved at the specification level.

You have the name of the elements specified directly in the file (like extras) which helps make it more self documenting.

That’s an extra dependency you need to have while you could rely on the fact that any TOML parser will return native types that are easily manipulable.

So, we just give up and don’t try to improve on what we have? Shouldn’t this be a goal in itself? To provide a user/developer experience that is on par with what other languages have?

That being said, one reason other languages were able to pull this off is because they mostly have one or two tools of reference, instead of several tools like we have in Python, so making a transition like this is easier.

I think the point of Donald was:

  • one is a string following a single specification: PEP-508,
  • the other is a string following & combining several specifications: TOML & PEP-508.

If we find out that most people outside of our bubble prefer the TOML way, then the answer should be a resounding “yes” from all of us here.

That’s a good thing! There should be a core dependency parser to avoid duplicate work by setuptools, Poetry, Hatch, Flit, etc.


I don’t think this is an issue because even in your specification, validating that e.g. the version strings are valid requires an extra step. So given that you already need an extra step, that extra step might as well do this operation automatically too.

AFAIK the only part of PEP-508 that doesn’t specify the key explicitly is the extras. You get just as explicit with platform/python version keys. So in this sense the table format helps a bit, but just for extras.

The validation that TOML offers is light at best. You can specify a lot of entries that are valid TOML but an incorrect Python specification. Package names, versions, and Python requirement specifiers come immediately to mind (and one could go on to what happens when someone passes a list instead of a dict as a value in the table, or uses an integer as a key instead of a string, etc.). Considering that for good UX you’d want to validate all of these, you are already looking at an extra dependency.
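To illustrate the kind of checks a TOML parser will not do for you, here is a small Python sketch that validates one entry of a hypothetical table format after parsing. The `version` and `extras` keys and the `validate_entry` function are assumptions made for the example; the name pattern follows the rule given in PEP 508, while the version-clause pattern is intentionally much looser than PEP 440.

```python
import re

# Project name rule from PEP 508 (case-insensitive).
_NAME = re.compile(r"^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$", re.IGNORECASE)
# Rough check of a single version clause; far looser than PEP 440.
_CLAUSE = re.compile(r"^(~=|===|==|!=|<=|>=|<|>)\s*[A-Za-z0-9.*+!_-]+$")


def validate_entry(name, table) -> list[str]:
    """Return a list of problems found in one parsed dependency entry."""
    errors = []
    if not isinstance(name, str) or not _NAME.match(name):
        errors.append(f"invalid project name: {name!r}")
    if not isinstance(table, dict):
        # Valid TOML, but not the shape this hypothetical format expects.
        errors.append(f"expected a table, got {type(table).__name__}")
        return errors
    version = table.get("version", "")
    if not isinstance(version, str):
        errors.append("version must be a string")
    else:
        for clause in (c.strip() for c in version.split(",")):
            if clause and not _CLAUSE.match(clause):
                errors.append(f"invalid version clause: {clause!r}")
    extras = table.get("extras", [])
    if not (isinstance(extras, list) and all(isinstance(e, str) for e in extras)):
        errors.append("extras must be a list of strings")
    return errors


ok = validate_entry("requests", {"version": ">= 2.20.0, < 3"})
wrong_shape = validate_entry("requests", ["2.20"])
wrong_values = validate_entry("-bad-", {"version": "two"})
```

All three inputs are representable in TOML; only the first passes these checks, which is the gap between "valid TOML" and "valid dependency specification".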

We should improve things, but we need to balance benefits against the price we as a community have to pay to get there. And in this case, from what I have seen so far, we’re talking about marginal benefits with a significant resource investment to get there.

We need to work with what we have, not what we wish we had. Going down the TOML path will put strain on the entire ecosystem, not just 1-2 tools in Python, sadly.

I agree that the TOML approach has advantages when you’re utilizing multiple features of PEP508 at once. I think it’s either slightly worse or about the same in the simple cases.

Like I said. I don’t think this is a situation where either solution is just better across the board for the end users experience. I think you can construct real world situations where either one “wins” depending on which aspects you personally want to optimize for.

If one solution was just better in every situation I think you’d see a lot more enthusiasm for standardizing on that one solution. When there is no clear winner, status quo is typically the winner.

I would be very interested in hearing from people like @rhettinger who teach or have taught Python professionally. Do any of us here have regular interactions with newcomers outside of bug reports?

When it comes to a PEP, the PEP author makes a call and the PEP delegate either agrees or doesn’t. :slight_smile:

Oh, I’m not. We will reach a conclusion somehow.

FYI I once asked people on Twitter who knew both formats which one they preferred, and the results were inconclusive/leaned towards PEP 508. But once again, that was a selected audience that had exposure to both.

I think we have acknowledged that everyone in this conversation is bringing biases based on the tool(s) they maintain and what that tool currently supports. That pretty much guarantees a clean answer will not happen among ourselves.

So, how do we want to settle this? A poll here that we promote as widely as possible? A bunch of individual polls where we then come back with the results? We reach out to trainers and teachers and ask them to talk to their current classes to see if beginners have a true preference?

I think that the unstated assumption here is that we don’t realize what people find difficult about using our tools because we don’t find it difficult, but I think it’s actually really hard to get the information we want here.

In some ways, people already involved in Python packaging are the perfect people to ask about it, because we’re the ones dealing with a diverse group of users, dealing with bug reports, etc. We also tend to do the most complicated things with packaging and know the right way to do things. Also, many of us were motivated to get involved in packaging to fix problems we had ourselves.

I am not saying we should ignore our users (quite the contrary), but I also think that we need to acknowledge that often if you ask beginners whether they like X or Y, they’re often making that choice without a deeper context, and in the end we would get a worse experience by taking them at their word. (I say this as someone who has made UX suggestions that were accepted in beta tests and come to regret them enough times to feel hesitant about it.)

I don’t think we should be polling people for preferences. I think if we can agree that the question of which way to specify dependencies comes down to (or would be significantly informed by) a disputed factual question, we should come up with a strategy to determine the truth.

That said, I’m not convinced that our differences really come down to factual questions. People don’t usually complain or have problems with PEP 508 and they don’t really complain about poetry's way to specify dependencies. That suggests to me that both are good enough and that people won’t be actively confused by using either one. Given that PEP 508 is already standardized and already in wide use, that we cannot deprecate it in favor of a TOML-based system (which won’t work with all config systems), and that people will need to learn it anyway, I’d say that a tie should go to PEP 508. Maybe the result would be different if we were designing this from scratch in a vacuum, but I find it much more plausible that people will lament the proliferation of ways to declare dependencies than they will lament the fact that we’re using a compact DSL.


I haven’t formally polled this, but my gut-feel (based on many conversations, for example, I’ve just spent an entire day at EuroPython chatting with attendees about all kinds of stuff) is that the most popular approach would be “just pick one and tell me exactly what to do”.

And I agree with Paul (all of the Pauls :wink: ): existing standard wins over writing a new standard.


Is this true though?

Both pipenv and Poetry changed the metadata format and I didn’t see any backlash for that. So I think people are more than willing to follow any new standard.

And since we are specifying a brand new standard anyway I don’t see why we couldn’t go all in.

Note that last I heard, the long-term plan is to deprecate the concept of “extras” and make them into regular packages with [brackets] in the name. (So e.g. we’ll have a requests[security]-2.24.0-py2.py3-none-any.whl, which is a regular package that contains no code, and depends on the appropriate versions of requests + pyOpenSSL.)

So it doesn’t make a lot of sense for a new format today to hardcode the string extras, or to split up the extra from the package name.

First time I am hearing of it honestly and I would like to know the rationale behind this because it seems like a bad idea.

And I gave the example of extras but that applies to markers and git dependencies too, for instance.


This is off-topic of course, but that sounds like an absolutely horrible plan.
