PEP 621: how to specify dependencies?

[mod hat on] Please don’t make this kind of content-free judgement about other folk’s work without even knowing the details; it’s not helpful for having good technical discussions. [mod hat off]

FWIW, my personal feeling is that most of this discussion is a red herring / classic bikeshedding. IMO the two genuinely complicated parts are learning the operators for version comparisons and the mini-language for markers. These still exist in the “exploded” version; they just have a TOML-shaped frame around them instead of a PEP 508-shaped frame.

I appreciate the comparison here, but to me it’s pretty underwhelming once you fix the spacing in the PEP 508 version to make it readable. There are a few edge cases that might favor one or the other, but seriously, who cares about "cachy ~= 0.3.0" versus cachy = "^0.3.0".
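To make the `~=` versus `^` comparison concrete, here is a rough sketch that expands both operators into a half-open range `[lower, upper)`. The function names are made up for illustration, and the logic is deliberately simplified: it only handles plain dotted integers and ignores pre-releases, epochs and everything else that PEP 440 or a real caret implementation has to deal with.

```python
def compatible_release(version: str) -> tuple[str, str]:
    """Approximate bounds for PEP 440's ``~= version`` (simplified sketch)."""
    parts = [int(p) for p in version.split(".")]
    if len(parts) < 2:
        raise ValueError("~= needs at least two release components")
    # Drop the last component, then bump the new last one.
    upper = parts[:-1]
    upper[-1] += 1
    return version, ".".join(map(str, upper))


def caret(version: str) -> tuple[str, str]:
    """Approximate bounds for a Poetry/Cargo-style ``^version`` (simplified sketch)."""
    parts = [int(p) for p in version.split(".")]
    for i, p in enumerate(parts):
        if p != 0:
            # Bump the leftmost non-zero component and drop the rest.
            upper = parts[:i] + [p + 1]
            break
    else:
        # All components are zero: bump the last one that was given.
        upper = parts[:-1] + [1]
    return version, ".".join(map(str, upper))


print(compatible_release("0.3.0"))  # ('0.3.0', '0.4')
print(caret("0.3.0"))               # ('0.3.0', '0.4')
print(compatible_release("1.2.3"))  # ('1.2.3', '1.3')
print(caret("1.2.3"))               # ('1.2.3', '2')
```

For `0.3.0` the two spellings select the same range; they only diverge once the major version is non-zero, which is the kind of edge case mentioned above.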

I think the only reason other languages use TOML/JSON is because they started out using those formats for all their metadata, so it wasn’t worth the bother of specifying a complete DSL for dependencies. Python OTOH doesn’t have that history and already has the DSL specified, so we might as well use it.

Anyway, that’s my 2 cents. IMO the most productive thing would be to pick one and move on :slight_smile:


That said, I guess there is a substantive question here, which is how to encode extensions to PEP 508. AFAIK every pinning format has a mechanism for doing things like requesting in-place installs, allowing dev-dependencies, specifying hashes, etc., and it would be good to have a place to put those. Maybe that should be a separate thread though, since this one is so deep in the weeds?

I definitely could have been clearer about explaining it, but this is the key benefit pipenv/Pipfile get from moving the package names out as TOML table keys: it allows the value to be a string for a plain dependency, or an inline table for something more complicated.
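As a rough illustration of the "string or inline table" idea, the sketch below normalizes both shapes into one dictionary per dependency. The data is written as an already-parsed Python dict, with the TOML it is meant to mirror shown in a comment; the package names, the `normalize` helper and the output layout are assumptions for illustration only, not what pipenv actually does internally.

```python
# Parsed form of a Pipfile-style table, roughly equivalent to:
#   requests = ">=2.20"
#   mylib = { path = ".", editable = true }
packages = {
    "requests": ">=2.20",
    "mylib": {"path": ".", "editable": True},
}


def normalize(name: str, value) -> dict:
    """Turn either value shape into one uniform dict (hypothetical helper)."""
    if isinstance(value, str):
        # Plain dependency: the string is just the version constraint.
        return {"name": name, "version": value}
    if isinstance(value, dict):
        # More complicated dependency: keep whatever keys the table carries.
        return {"name": name, **value}
    raise TypeError(f"{name}: expected a string or a table, got {type(value).__name__}")


normalized = [normalize(name, value) for name, value in packages.items()]
for entry in normalized:
    print(entry)
```

The point is only that the table key gives the tool somewhere to hang extra data without changing how the simple case is written.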

That said, it would be reasonable to say that these standard fields are intended for the kind of dependency metadata that can go into a built package, and anything like editable installs will remain in tool specific formats for now.

Interestingly, many tools that have used the exploded form for specification from the beginning still invent a DSL at some point anyway. Cargo, for example, uses exploded TOML, but dependencies in Cargo.lock (which is also in TOML!) use a string form of the same specification. Bundler uses a Ruby-based DSL for specification, but the DSL is compiled into a string form in Gemfile.lock, even though the rest of the file is in a YAML-ish custom syntax that clearly supports structured data.

I don’t know their reasoning behind it, but it does not seem that strange to me in practice to have different dependency specification formats for humans and machines.


And some of them invented yet another DSL for the CLI.

Thus far the discussion on specifying dependencies is mainly on the format. I think another topic related to specifying deps is important to have at this point, and that is whether the chosen format should be “ready” in case we want to support specifying native dependencies or even consider cross-compilation.

One thing I would really like to see in the future is a field where a package can list the run-time executables it expects, as well as the run-time libraries that are dlopen’ed. In Nixpkgs we commonly need to patch such references, which is quite a pain. With conda that should also be done, though I don’t know how much effort is put into it there. I imagine that for other distros it also becomes easier this way to compute which native packages should be added as dependencies.

I imagine starting off with two sets:

  • dependencies-runtime-executables is a list of executables
  • dependencies-runtime-libraries is a list of libraries

Then, in the code, instead of directly invoking an executable by name, or calling find_library with the library name, one uses something like dependencies.executables("bash") and dependencies.libraries("archive"). The dependencies are stored, like we do now with entry points, in a file in the dist-info folder. That way downstreams can patch the references in that one file.
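A minimal sketch of what such a lookup helper could look like, purely to make the idea tangible. Nothing here exists today: the `NativeDependencies` class, the JSON layout and the `/opt/patched/...` path are all assumptions. Only `shutil.which` and `ctypes.util.find_library` are real standard-library calls, used as the unpatched fallback.

```python
import ctypes.util
import json
import shutil
from pathlib import Path


class NativeDependencies:
    """Hypothetical lookup helper; not part of any existing packaging API."""

    def __init__(self, mapping: dict):
        # Overrides that a downstream (Nixpkgs, conda, a distro) may have patched in.
        self._executables = mapping.get("executables", {})
        self._libraries = mapping.get("libraries", {})

    @classmethod
    def from_file(cls, path: Path) -> "NativeDependencies":
        # Assumed layout: a single JSON file shipped next to the package metadata.
        return cls(json.loads(Path(path).read_text(encoding="utf-8")))

    def executables(self, name: str) -> str:
        # Prefer the patched path, otherwise fall back to a PATH search.
        resolved = self._executables.get(name) or shutil.which(name)
        if resolved is None:
            raise FileNotFoundError(f"executable not found: {name}")
        return resolved

    def libraries(self, name: str) -> str:
        # Prefer the patched path, otherwise ask the platform's library search.
        resolved = self._libraries.get(name) or ctypes.util.find_library(name)
        if resolved is None:
            raise FileNotFoundError(f"library not found: {name}")
        return resolved


# A downstream has patched the reference to bash in the one mapping file.
dependencies = NativeDependencies({"executables": {"bash": "/opt/patched/bin/bash"}})
print(dependencies.executables("bash"))  # /opt/patched/bin/bash
```

The package code never hard-codes a path, so a downstream only has to rewrite the mapping file rather than patch every call site.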

Why did I not include build-time dependencies here? Because those are the responsibility of the back-end. One could argue that’s the case for the pure-Python dependencies as well, but those are also solely run-time dependencies.

That’s an entirely different topic, and is likely to spark a fairly extensive debate (as far as I know, general dependency management is a significant and not-well-solved problem that goes far beyond Python).

I don’t think there’s much value in trying to make this spec “ready” for such a hypothetical future extension. Rather, we should initially focus on the question of how we’d persist such data in package metadata, and only when we have an agreed specification for that (and ideally, a working implementation so that we have an assurance that the spec is usable in practice) should we worry about how we handle the data entry side of that.

If you want to start that discussion, feel free, but please create a new topic for it.


Phew, long thread, but I made it through :muscle:

Here are my 2 cents. I vote strongly for the exploded table format. And this is why:

  • The draft of PEP 621 already uses TOML syntax. Anyone who uses it has to get familiar with that syntax. Once you have grasped the quite simple key-value concept, the user experience becomes confusing if you suddenly find a (more or less) long string that packs in multiple pieces of information.

  • For me, consistency is important for a good user experience. TOML was chosen for good reasons for PEP518. It is the favored format for the proposed PEP621. So for the sake of consistency we should follow this road all the way.

  • It’s easy to extend. If necessary, new keys can be added to the table or obsolete keys removed from it. No hacky regex is needed to read the necessary information.

  • It’s conceivable to allow tools to add additional information (e.g. with a key like x-some-key=) for their work without breaking anything.

  • In my opinion, the argument that one can copy&paste a PEP508 string to pip doesn’t count, because this should be out of scope for this discussion. CLIs should define their own way for arguments to look, as this is a question of user experience, and here it is absolutely ok to have different tastes.

Is the PEP supposed to define a minimum spec, where extra fields are ignored (and tools can’t rely on their existence), or a complete spec, where an error should be raised if extra fields are found?

If tools need further information about a project dependency, it would be nice to have it all in one place, rather than in an external file or in another section of the pyproject.toml. Adding this information in a string-style (PEP-508) specifier would be unwieldy.

You’re contradicting yourself here. You never addressed one of the major concerns of this topic: the pain of having to migrate from PEP-508 to the TOML format, and the strain that puts on all the tool maintainers. Furthermore, as discussed above, TOML cannot validate the format by itself (e.g. version constraints need their own DSL), so in practice it offers a lot less benefit than you’d think: users still have to learn a DSL, and tools still need a custom parser besides a plain old TOML parser to manipulate it.


I was afraid that it could be read like this :slight_smile: To be more precise: the dependency definition acts as an interface. So it affects all users, and they have no other choice. So consistency must exist within the specification itself. Having a mixture of exploded tables and PEP508 would violate this.

Tools built on top of this interface can have their own opinion about how the user experience should be. Users can choose which one they like most and use it.

Work has to be done once PEP621 is official. And that’s independent of how the dependencies are described.

I believe this has not been discussed yet. I would like to see a spec which defines mandatory and optional fields, plus a way to integrate custom fields with a defined prefix, e.g. x-, to mark them as custom. Validators should then raise an error if fields are found that don’t match these specs.
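A rough sketch of what such a validator might look like for a single exploded dependency table. The specific key names (`name`, `version`, `extras`, `markers`) are placeholders chosen for illustration; no spec defines them yet. Only the x- prefix convention comes from the proposal above.

```python
# Placeholder key sets; a real spec would have to define these.
REQUIRED_KEYS = {"name"}
OPTIONAL_KEYS = {"version", "extras", "markers"}
CUSTOM_PREFIX = "x-"


def validate_dependency(table: dict) -> None:
    """Raise ValueError if the table misses required keys or has unknown ones."""
    missing = sorted(REQUIRED_KEYS - table.keys())
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    known = REQUIRED_KEYS | OPTIONAL_KEYS
    unknown = sorted(
        key for key in table
        if key not in known and not key.startswith(CUSTOM_PREFIX)
    )
    if unknown:
        raise ValueError(f"unknown keys: {unknown}")


# Passes: every key is either specified or marked as custom.
validate_dependency({"name": "cachy", "version": "~=0.3.0", "x-some-key": "tool data"})

# Fails: a typo in a key name is caught instead of being silently ignored.
try:
    validate_dependency({"name": "cachy", "verison": "~=0.3.0"})
except ValueError as exc:
    print(exc)  # unknown keys: ['verison']
```

Note that this only checks the shape of the table; the value of `version` would still need its own parser.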

We already need a custom PEP-508 DSL for version specifiers at the very least, so given that we’re already violating this constraint, we might as well take the path of least resistance and go with PEP-508 entirely. It has the added benefit of being what people have already used for the last 10 years. I feel you’d have a better case if TOML could validate the entirety of the PEP-508 domain in table format, but it can only do a very limited part of it: exploding key-values into a table.

As someone who is maintaining multiple packaging projects, let me tell you that the chances of me updating projects are better if the amount of work needed is as low as possible. So keeping the status quo for dependency specification/parsing frees me to address other outstanding issues. For example, I consider editable installs for PEP-517 much more important.


100% agree with this point. If you don’t want to depend on the packaging library (“hacky regex” is a personal choice, not a forced requirement), you need to fully explode the definition into TOML. Once you do that, nobody is going to be happy with translating all their version constraints.

This is not exactly true since version specifiers come from PEP-440 and are not tied to PEP-508.

Well then what’s the point of trying to specify another metadata standard in this case?

Not the totality, but most of it can be represented as TOML. Only the more complex cases with non-trivial markers, which are rare in my own experience of analyzing a lot of packages when developing Poetry, are not easily represented as a full TOML specification. That’s why we introduced the markers property in Poetry to circumvent that.

Similarly, people won’t be happy to adopt the new metadata standard at first either, but there is no reason it won’t happen eventually, so we might as well go all-in on this new standard.

We shouldn’t make bad changes just because we’ll win in the end. The point is to make things easier for users, not to rule over them.

Depends on how we define it in the spec. :wink: We would have to specify whether unrecognized keys were an error or simply passed on. Since a key argument from the exploded table side is validation, I suspect we would want to make it an error.

Whether it’s a bad change or not is subjective, until we have proper data to back it up. I think it would be a good change overall and considering the traction Poetry is gaining I think a lot of people think so too.

We don’t have data to make this claim. Poetry might be gaining traction for reasons other than how it treats requirement specification, just as we don’t have data on how many people decide not to use Poetry because of it and walk away. Considering we can’t agree among ourselves whether a TOML table is better than PEP-508, I’d guess that if we ran a survey, users would be similarly split.


Maybe, but that at least shows that people are willing to make the change to another specification if necessary.

Which we know, but this is a mindset we should not take. If we can’t show through popularity that a particular scheme is better, we need to decide based on other criteria.

The pre-existing spec argument is good enough for me. Reference it and we don’t have to write a new one.

ALTERNATIVELY, and this will be controversial, declare dependencies off-limits for PEP 621 and tell backends to do their own thing. (Which I assume will be PEP 508 as that’s their output format and what they all already use, unless Poetry has become a PEP 517 backend while I wasn’t looking.) If there really are strong enough reasons to prefer the exploded table format, people will gravitate to those backends over time, as the theory goes.