PEP 735: Dependency Groups in pyproject.toml

sirosen · November 20, 2023, 9:50pm

Hi all! This is my attempt to follow up on Projects that aren't meant to generate a wheel and `pyproject.toml`.

Some notes and questions to seed discussion:

I have mostly stayed close to @brettcannon’s proposed simplified solution from that thread, but have made a couple of changes. That’s part of why I listed him as sponsor, rather than co-author.
It may be that his original, slightly simpler, proposal is preferable, but I think the draft PEP strikes the right balance between simplicity and future-facing extensibility. I’m open to arguments on this front, but have attempted to put my reasoning into the PEP.
I did not write down user-stories in very much detail. I’m not sure how much is “enough” vs “too much”, but would be happy to elaborate on these quite a bit if they seem too terse and unclear.
There are elements of this PEP which seem related to 633 (rejected), 725 (in draft), and other discussions. If someone has recommended reading which I should examine and possibly refer back to from the PEP, I would be very thankful for that input.
I have not yet written a reference implementation. Before I do so, I’d like to gauge whether or not the one I have proposed, in the current draft, would be a satisfactory proof of concept?
Is anything underspecified? Overspecified?

jeanas · November 20, 2023, 11:03pm

I have some quick questions. As a word of warning, I did not fully read the discussion “Projects that aren’t meant to generate a wheel and pyproject.toml”^[1].

Is it intended that this can/should also be used for projects which are meant to generate a wheel? Currently, some projects (ab)use optional-dependencies for that. Does this replace this hack? The PEP motivation focuses on projects which aren’t meant to be turned into distributions, so it’s not clear whether it also addresses that case.
How is the data intended to be actually used? For example, if you define a test environment, you don’t just want to specify the dependencies, you also need what actual command is needed to run the tests (e.g., in Hatch, this would be

[tool.hatch.envs.default]
dependencies = ["pytest"]
[tool.hatch.envs.default.scripts]
test = "pytest --some --options --here"

).

What are the reasons not to reuse PEP 723’s [run] table? (I suggest to answer by adding to the “Rejected Ideas” section.)

The sudden flurry of verbosity and meta-meta-proposals made it hard for me to catch up in reasonable time. ↩︎

h-vetinari · November 20, 2023, 11:18pm

Thanks for writing this up!

In general I think the PEP is aiming in a good direction, though I think if we’re introducing a table with a name as fundamental as “requirements”, we should IMO not key it with such an ambiguous name as “packages”. Quoting from another discussion

In the meantime, PEP 725 is also proposing external.{build-requires, host-requires, dependencies}, as well as optional- versions of all those.

This is creating a zoo of ways for specifying various kinds of requirements, which is becoming increasingly confusing and inconsistent IMO. Therefore, I would really like to not close the door to eventually have requirements become a unified home for all those in some way.

IOW, if we choose requirements.packages today, we close the door on that for a long time, because it’s not clear which packages (build, host, external, wheel-build, run-time, etc.). It’d be great if we could choose a name like requirements.run for this (in the quoted post above, I used [dependencies.run] and [dependencies.optional] as strawmen – I care much less about the exact naming than about having a consistent API for dependency specification at some point)

original quote said run.{...}, I adapted it for this discussion. ↩︎

kknechtel · November 20, 2023, 11:38pm

Aside from some typos and minor matters of writing style, my initial observations (not explicitly labelled as PMI, just getting it out there):

While this idea came out of the “projects not intended to build a wheel” thread, that thread got very sidetracked. The idea is equally applicable to projects (whatever that means) that do intend to build a wheel, which typically need to list requirements for tasks in addition to wheel-building - tasks which heavily overlap with the needs of others. It would be better not to phrase the Motivation in terms of contrast with “traditional” projects. You explain this at the end of the section currently, but I think this idea needs reworking for a smoother presentation.
I’m not entirely sure of the intent behind the extra layer of namespacing. It’s hard to imagine non-package requirements, unless you mean the sort of thing that PEP 725 is covering - but it seems like they want a separate table anyway. (Ah, I see you have in mind to leave room for a future resolution for the Python version issue. Fair enough.)
Adding the ability to give a list of names of other dependency groups seems redundant and needlessly complex for a minor convenience; i.e. ["[typing,test]", "useful-types"] could just as easily be ["[typing]", "[test]", "useful-types"].
I do very much like the semantics of the leading dot referring to the current package (at first it would make sense to just use the current package’s name, but it should explicitly stand out because this information will be used in development such that the current package generally isn’t available on PyPI). That makes more sense than the original idea of requiring it before the [] enclosing references to other dependency groups (which would just be redundant signalling).
- This also has the benefit that it offers an elegant way to specify extras of the current package.
- There is, fortunately, no ambiguity between the use of []: when they enclose the name of an extra, there will always be something in front, whereas when they enclose another list name (or names, if you keep it as is) there is no prefix.
- Some people who encounter this might be confused as to why a project might want to mention itself in a dependency list. There should probably be some explicit mention here of the use case for isolated test/build/etc. environments.
It still seems premature to me to add an object representation now when it doesn’t offer any additional functionality yet. In the future, if someone specifies a dependency using an object with additional keys, and feeds it to an outdated tool that doesn’t implement the new PEP specifying the new keys… it’s not clear to me that “only recognize the spec key and assume the rest is meaningless” is actually a graceful degradation path. Maybe that does more harm than failing loudly. (And if the intended handling is instead “complain about unrecognized keys”, then “complain about receiving an object instead of a string” works just as well.)
I wasn’t aware of the requirements table name being reserved. Maybe this part should include a citation?
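To make the point about `["[typing,test]", "useful-types"]` versus `["[typing]", "[test]", "useful-types"]` concrete, here is a minimal sketch of how a tool might expand group references. It assumes the draft syntax as described in this thread (a bare `[name]` or `[a,b]` entry refers to other groups); the function name `expand_group` and the sample group names are hypothetical, and entries such as `.[extra]` are passed through untouched because they do not start with `[`.

```python
def expand_group(groups, name, _seen=()):
    """Return the flat list of requirement strings for one group.

    Assumption: an entry written as "[a]" or "[a,b]" is a reference to
    other groups, per the draft syntax discussed in this thread.
    """
    if name in _seen:
        chain = " -> ".join(_seen + (name,))
        raise ValueError(f"cycle in dependency groups: {chain}")
    if name not in groups:
        raise KeyError(f"unknown dependency group: {name}")
    result = []
    for entry in groups[name]:
        if entry.startswith("[") and entry.endswith("]"):
            # A group reference: expand each comma-separated name in order.
            for ref in entry[1:-1].split(","):
                result.extend(expand_group(groups, ref.strip(), _seen + (name,)))
        else:
            # An ordinary requirement string; kept as written.
            result.append(entry)
    return result


# Hypothetical data, shaped like a parsed table of named dependency lists.
groups = {
    "typing": ["mypy"],
    "test": ["pytest", "coverage"],
    "combined": ["[typing,test]", "useful-types"],
    "split": ["[typing]", "[test]", "useful-types"],
}

# Both spellings expand to the same list.
assert expand_group(groups, "combined") == expand_group(groups, "split")
```

Under these assumptions the comma-separated form adds only one `split(",")` to the implementation, so the trade-off is mostly about readability rather than tool complexity.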

kknechtel · November 20, 2023, 11:49pm

As I understood the idea, the entire point is that it’s “any packages”, and up to the author to name the dependency list and assign its semantics. Unfortunately, the design of pyproject.toml has already ear-marked build and run-time dependencies as “special”, and the PEP 725 authors seem to feel strongly that non-Python build-time dependencies (i.e. for compiling and linking C code) should be kept separate from dependencies of the build system itself (and I’m not even sure which category something like CMake goes into! Or maybe it depends on whether a given CMake-like tool is implemented in Python and available on PyPI??).

I don’t think this closes the door on migrating the information currently in build-system.requires or project.dependencies to the requirements table - after all, they’re requirements, that consist of packages. There just needs to be a) the willingness to make some potentially breaking changes (or else decide how to handle having those same packages potentially listed in both places) and b) a way to ascribe the needed semantics (which is sort of where I was trying to go with my own proposal).

Essentially, I don’t really understand what you imagine as an alternative. If your thinking is that we should be more specific, like requirements.for-testing etc., I kinda already tried that and it seems to have been rejected. It doesn’t seem fair to try to exhaustively list all the “kinds” of dependencies developers might need, in advance.

sirosen · November 20, 2023, 11:50pm

FWIW, even though I have read all of it at some point, I can’t pretend to remember everything. I think we should very gently close the door on a lot of that discussion, and consider this proposal on its own merits and faults.

Yes, it is meant to support/improve both cases – and in precisely the way you suggest, by replacing the use of optional-dependencies/extras. I’ll need to think through how to make that clearer up-front.

Jean Abou Samra:

How is the data intended to be actually used? For example, if you define a test environment, you don’t just want to specify the dependencies, you also need what actual command is needed to run the tests (e.g., in Hatch, this would be
[tool.hatch.envs.default]
dependencies = ["pytest"]
[tool.hatch.envs.default.scripts]
test = "pytest --some --options --here"
).

The idea is that this defines the requirements data, but tools will need to refer back to it in the same way that some tools refer out to requirements.txt files.
For example, hatch could support the following usage:

[tool.hatch.envs.default]
dependency_groups = ["test"]
[tool.hatch.envs.default.scripts]
test = "pytest --some --options --here"

I don’t want to speak to the ideal/best way to integrate this for various tools. I know that as a long-time tox user, I’d like it best if I could use a flag in tox.ini’s deps grouping, e.g., deps = --group test. That has a nice symmetry with being allowed to use -r test.txt in deps already.
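As a rough illustration of "tools refer back to the data", the sketch below shows how a tool could combine its own environment config with named groups. The `dependency_groups` key is the hypothetical one from the Hatch-style example above, not an existing Hatch option, and `deps_for_env` is an invented name; both dicts stand in for already-parsed TOML.

```python
def deps_for_env(env_config, groups):
    """Collect requirements for one environment.

    env_config: the tool's own table for the environment (hypothetical shape).
    groups: mapping of dependency group name -> list of requirement strings.
    """
    deps = list(env_config.get("dependencies", []))
    for name in env_config.get("dependency_groups", []):
        if name not in groups:
            raise KeyError(f"environment refers to unknown group: {name}")
        deps.extend(groups[name])
    return deps


# Hypothetical parsed data.
groups = {"test": ["pytest", "coverage"], "docs": ["sphinx"]}
env_config = {
    "dependency_groups": ["test"],
    "scripts": {"test": "pytest --some --options --here"},
}

print(deps_for_env(env_config, groups))
```

The groups hold only requirement data; the command to run stays in the tool's own configuration, which is the division of labour described above.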

I take it that the open-ended nature of this is currently unclear from the PEP text. I’ll look into a way of refining this to be more apparent.

I think PEP 723 used the [run] table name only because it was what seemed like the front-runner for this table name at the time that it was written.

Whatever table name is chosen though, this is a valid point. Many of the other major options should be accounted for under Rejected Ideas.

I agree with this sentiment, although I hadn’t reached the same conclusion. I immediately disliked the fact that we were going to have, at a minimum, project.dependencies, project.optional-dependencies, and <NEW THING>.

I hadn’t considered taking requirements as closing the door though – I thought it was opening it (!).

It sounds like this is tightly related to the fact that I chose packages as the subtable name.

So if I wanted to push for that “door opening” feeling of starting to use requirements as the top-level name, could I achieve it with requirements.groups? Or does it need to be a more verbose name like requirements.dependency_groups?

kknechtel · November 21, 2023, 12:01am

FWIW: I agree that this could be explained better, but it also matches my default expectation anyway. pyproject.toml is supposed to be declarative; no matter how much meaning you ascribe to specific names in PEPs, it’s up to the tools to take action.

That is: sure, Hatch could do things the way you describe. Another backend might have an internal concept of what a testing task is, and thus automatically privilege the name test and look specifically for that named dependency group when asked to awesome-toolchain run-tests, without needing to be told any more.

And I think this freedom is a good thing.

Regarding PEP 723, I’m not sure how much it needs to be discussed here at all. I’ll just refer back to the “projects not intended to build a wheel” thread. But the simple answer for “why not reuse the name [run]” is that this clearly describes a single specific dependency list oriented towards a specific task (running the code), and the goal here is explicitly to specify multiple dependency lists that could pertain to separate tasks.

sirosen · November 21, 2023, 12:04am

Good point! I’ll incorporate this when I do my second draft pass. Overall, I think the Motivation section is less clear than it could be. This is probably a big part of why.

I noticed the same thing while writing it up. But at the same time, they look a lot like extras, which can be comma separated. So I’m torn – will it be more confusing if we disallow comma-separated listing? I’ll put this out there as something which I’m unsure about. If there’s broad sentiment in favor of removing support for comma-delimited lists here, I’ll make the change.

I think the question that needs answering here is “what is the behavior of a tool which implements PEP 735 if the data is extended with a future spec that the tool does not implement?”
My opinion is that the tool should apply its PEP 735 behavior as previously implemented.

That opinion – about what should happen – more or less mandates the inclusion of the object format today. Without it, we can’t describe a smooth upgrade-friendly path for future extensions.
If you hold the opposite opinion, that unknown keys should cause a hard failure, then and only then does it look premature or unnecessary.

My concern is the situation of a user using two tools, call them foo and bar, which read these data. foo supports PEP 735, bar supports a future PEP NNN, which adds a field to the data which is useful for bar but irrelevant for foo. With the current definition, foo doesn’t need to update in order for the user to start using the latest version of bar and its PEP NNN features.

So I have a pretty particular motivating scenario in mind, in which tools update at different rates, probably because a new PEP is only relevant to one of them, which drives my desire for this non-failing behavior. I should write this up in the rationale section.
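The foo/bar scenario can be sketched as follows. This is only an illustration of the "ignore unknown keys, possibly warn" behaviour, assuming an object form whose one recognized key is `spec` (as mentioned earlier in the thread); the function name, the `future-key` field, and the choice to reject objects without `spec` are assumptions of this sketch.

```python
import warnings

KNOWN_KEYS = {"spec"}  # the only key this (older) tool understands


def requirement_specs(entries):
    """Extract requirement strings from string or object entries."""
    specs = []
    for entry in entries:
        if isinstance(entry, str):
            specs.append(entry)
        elif isinstance(entry, dict):
            if "spec" not in entry:
                # Assumption of this sketch: an object without "spec" is invalid.
                raise ValueError("object entries need a 'spec' key")
            unknown = sorted(set(entry) - KNOWN_KEYS)
            if unknown:
                # Warn, but do not fail, on keys from a future spec.
                warnings.warn(f"ignoring unrecognized keys: {', '.join(unknown)}")
            specs.append(entry["spec"])
        else:
            raise TypeError(f"unsupported entry type: {type(entry).__name__}")
    return specs


# "future-key" stands in for a field added by a later PEP that only `bar` reads.
entries = ["pytest", {"spec": "coverage", "future-key": "something"}]
print(requirement_specs(entries))
```

Here `foo` keeps working on data written for a newer `bar`, at the cost of silently skipping whatever the new key was meant to enforce.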

My understanding is that all of the non-tool tables were reserved in PEP 621. I’ll double-check that.

kknechtel · November 21, 2023, 12:11am

Thinking about it more, maybe “they look a lot like extras” is a bad thing, and a different sigil would work better. My intuition suggests prefixing with @.

I don’t strongly hold this opinion, but my understanding is that the alternative is that foo just ignores the unknown key. As written the PEP doesn’t really seem to spell that out, and I’m worried that it could actually cause a problem to do it that way. For example, if the key is to specify some kind of hash, a security-conscious foo user might be upset that the dependency was installed without checking the hash.

But this is something where I’d want to hear more people weigh in.

Ah, that sounds familiar. I think I just read too much into what you were saying.

h-vetinari · November 21, 2023, 12:30am

Yes, absolutely (w.r.t. the point I was trying to make about leaving the door open for requirements being used more universally in the future).

Naming is hard, so I don’t pretend to have the right answer. I still feel “groups” has a very similar problem as “packages”, in that it’s too generic to know which one of the various requirement-roles it describes.

I’m currently tempted to have requirements.run (for the baseline requirements), and optionally requirements.run.optional (which would match your requirements.groups). I guess it could also be spelled requirements.run.extra. Anyway, that’s just my take (with hopes about a unified interface at the forefront of my mind, so don’t mind me if you have other priorities)^[1].

Perhaps I’ll have to write a PEP about my dreams for a one-stop-shop for that eventually ↩︎

kknechtel · November 21, 2023, 12:40am

Why would it say run, when the intent is explicitly to describe groups of dependencies that are not used for running the code (such as building documentation)? Again, it doesn’t describe any particular requirement-role, so the genericity is intentional.

sirosen · November 21, 2023, 1:42am

Indeed, it’s quite hard for this… Specifically, with reference to this:

I agree, it’s intentionally generic.

But! There’s a middle ground here. These requirements are not without any broader context. They exist in contrast with package dependencies and extras.

So how do we name such a thing?
The clumsiest option which comes to mind is requirements.non_distributed_dependency_groups.
It’s accurate, descriptive, and… awfully long and wordy. It has the important downside that it phrases this idea negatively as “the ones which aren’t packaged”.

Is there an alternative out in this space? requirements.dependency_groups is growing on me because it’s positively stating the idea (which, per the PEP, I want us to teach people as “Dependency Groups”).
Perhaps a rephrase: is there another valid name for this feature, other than “Dependency Groups”?

flyinghyrax · November 21, 2023, 2:05am

FWIW I agree and appreciated this part of the proposal. Does this fall under the term “forward compatibility”? (I get forward vs backward compat reversed.) I’m not sure why we wouldn’t want this, since it specifies that “old” tools can consume their supported subset of “new” metadata.

sirosen · November 21, 2023, 2:10am

I’m familiar with it under the name “future compatible”, but I assume “forward compatible” is another name for the same thing. Backwards compatibility is – in my experience – always about the new tool or spec being compatible with what already exists. Future compatibility is about planning for some future and trying to be compatible with it today.

One upshot from the discussion of this detail: it seems that it might have been a little unclear the degree to which I meant to specify this. The next draft will state explicitly that a tool “MUST NOT” error if it sees unrecognized keys – this was meant to be conveyed by “MAY warn”, but they aren’t quite the same thing.

h-vetinari · November 21, 2023, 2:11am

Because – as you can see from PEP 725 – you can have optional dependencies at several layers, e.g. at least build, host & run (latter name up for discussion). Having requirements.{x}.optional across the board would be a clear improvement IMO.

Now, w.r.t. the name “run”, the separation I proposed is certainly influenced by what I’d consider the “broad strokes” (i.e. test-specific dependencies are much much closer to “run” than to “build”), so grouping those dependencies that come after the main package has been built under “run” is a consequence of that.

Of course, it’s possible that naming and conceptual separation can be improved, though in the case of your example it might actually be worth considering if “building documentation” should go under

[requirements.build.optional]
doc = [...]

i.e. put “building docs” under build requirements.

The genericity is exactly what’s incompatible with a broad range of possibilities for future unification of the dependency specification API^[1]. Unless of course people are fine with the divergent zoo of current (resp. soon-to-be) knobs, and happy to say “we’ll never unify that”, which I would find profoundly disappointing TBH.

How does an extra called “test” interact with a dependency group called “test”? Presumably one would say: “one is published, the other is not” or “that should be an error”, but different ways to specify the same thing, with the same name, with different outcomes, and wildly diverging APIs (for what’s conceptually closely related) IMO sounds like a recipe for user confusion and frustration.

At some point we’re going to have to stop stapling on things and come up with a coherent design. Don’t get me wrong, this is not directed at @sirosen’s PEP in the slightest, and PEP {723, 725, 735} solve real issues that need to be fixed, but with each one focussed on changing one part of the dependency specification API independently, the end result would be a mighty mess.

I think it boils down to the fact that pyproject.toml (including project.dependencies) was once meant first and foremost for wheels, but has clearly outgrown that role. We should acknowledge that, instead of twisting ourselves into pretzels to cling to a v1 API which unforeseeable(!) evolution has shown to be inadequate.

that is, without yet another thing to deprecate and replace in the future, and one which we haven’t even introduced yet! ↩︎

kknechtel · November 21, 2023, 2:42am

This part, I absolutely agree with, and it’s one of the things I wanted to touch on in the other thread that I remember I haven’t started yet.

sirosen · November 21, 2023, 2:54am

Just to share a couple of quick thoughts on this (I’ll probably have more to say later), I had an early idea for dependency groups which looked something like this:

[[requirements.groups]]
name = "test"
is_extra = false
deps = ["pytest", "coverage"]
[[requirements.groups]]
name = "mysql"
is_extra = true
deps = ["sqlalchemy"]

But I nixed it because I thought it was so much more complicated to explain and it would require entering into new and broader debates about how extras are defined and how build backends have to handle these new data. Plus the idea that some of these would be extras gets weird given that extras are additive on top of the package dependencies (which is not a characteristic which I think is desirable for dependency groups in general).
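For readers trying to picture that early idea: a TOML array of tables parses to a list of dicts, so a consumer would have to sort the entries itself. The sketch below is only an illustration of that shape; `split_groups` is a hypothetical helper, and treating a missing `is_extra` as false is an assumption.

```python
def split_groups(tables):
    """Split the early-idea group tables into extras and non-extras."""
    extras, local = {}, {}
    for table in tables:
        # Assumption: a missing is_extra flag means "not an extra".
        target = extras if table.get("is_extra", False) else local
        target[table["name"]] = list(table["deps"])
    return extras, local


# What [[requirements.groups]] from the example above would parse to.
tables = [
    {"name": "test", "is_extra": False, "deps": ["pytest", "coverage"]},
    {"name": "mysql", "is_extra": True, "deps": ["sqlalchemy"]},
]

extras, local = split_groups(tables)
print("extras:", extras)
print("non-extra groups:", local)
```

Every consumer, including build backends, would need logic like this, which hints at why the design pulls in the broader questions about extras mentioned above.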

Maybe I backed away from such ideas too much, but I really wanted to narrow focus to something which I thought would stand a chance at passing review. I don’t think that such ideas are “bad ideas” – if nothing else, they’re interesting – I just can’t convince myself that they’d be as likely to be accepted as the current pitch.

I’m not sure we should be concerned about the possibility of an extra and a dependency group having the same name. IMO it’s in the category of “users are allowed to do confusing things”.

However, there’s one particular wrinkle in that line of thinking:
poetry and pdm already have dependency groups. poetry allows them to collide with extras, but pdm does not.

That means that allowing for collisions is incompatible with pdm’s approach but compatible with poetry.

I’m still inclined to be permissive here – I don’t see that PDM or Poetry are having trouble with user confusion between dependency groups and extras, although I’m open to being corrected by their maintainers.
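The permissive and strict options could look roughly like this in a tool. This is a sketch only: the function names are hypothetical, it compares names exactly as written (no name normalization), and it does not reproduce how Poetry or PDM actually implement their checks.

```python
def name_collisions(extras, groups):
    """Return names used both as an extra and as a dependency group."""
    return sorted(set(extras) & set(groups))


def check_names(extras, groups, allow_collisions=True):
    """Permissive by default; raise on shared names when allow_collisions is False."""
    shared = name_collisions(extras, groups)
    if shared and not allow_collisions:
        raise ValueError(f"names used as both extra and group: {', '.join(shared)}")
    return shared


# Hypothetical parsed data.
extras = {"test": ["pytest"], "mysql": ["sqlalchemy"]}
groups = {"test": ["pytest", "coverage"], "docs": ["sphinx"]}

print(check_names(extras, groups))  # permissive: just reports the overlap
```

A permissive spec leaves room for a tool to be strict on its own, whereas a strict spec would rule out the permissive behaviour.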

My concern / fear is that there isn’t a broad appetite for starting on a v2 definition.
If my read of the community feeling is right, we’re not getting a v2 definition soon. So either we make additive changes to the existing spec, or we don’t make changes at all.

This is a matter of defining the best possible addition to the v1 pyproject.toml definition, not the best possible content for the file overall.

pradyunsg · November 21, 2023, 9:35am

First off: I haven’t been keeping up with the meandering thread that is the one about non-wheel-distribution Python projects. Life and all that.

Process notes:

Please don’t post links to PRs adding a new PEP until they are merged. Having PEPs go through review by PEP editors before we start discussing them is a good thing, especially since important details like confirming the PEP number as well as ensuring the sponsor is actually interested in sponsoring the PEP are engaged in the discussion. We (as a group) should really stop being overeager in sharing draft PRs here – we learnt that PEPs change/improve meaningfully in the initial iteration as a result of the PEP editors’ review (eg: PEP 704, 722, 723).
I’ve edited the title since “Draft PEP” has a meaning distinct from what you’ve used here.

On the actual proposal: Isn’t the idea here basically the one discussed extensively in Adding a non-metadata installer-only `dev-dependencies` table to pyproject.toml?

I have reservations about a few aspects of the current proposal here:

We intentionally avoided using “requires”/“requirements” as a name/key in PEP 621, since it is an overloaded term. Let’s not reintroduce that here.
The syntax distinction between .[test] vs [test] is extremely subtle, especially since it’s “just a dot”.

I’m not gonna comment on more aspects of this potential PEP until it actually ends up on peps.python.org.

pf_moore · November 21, 2023, 9:38am

I agree with this. I think this PEP should make an unbiased choice of the most appropriate name, and ignore the implications on PEP 723. It will then be up to the PEP 723 author and PEP delegate to decide how 723 should change to match what happens here.

More generally, though, I like this PEP.

pradyunsg · November 21, 2023, 9:52am

Oh, and… one last thing before I vanish because $work.

There are plans for pip to implement the following…

github.com/pypa/pip

Add `--only-deps` (and `--only-build-deps`) option(s)

opened 02:18PM - 08 Sep 22 UTC

flying-sheep

state: awaiting PR type: feature request

https://github.com/pypa/pip/issues/11440#issuecomment-1445119899 is the currently agreed upon user-facing design for this feature.

---

### What's the problem this feature will solve?

In #8049, we identified a use case for installing just the dependencies from `pyproject.toml`.

As described in the solution section below `--only-deps=<spec>` would determine all dependencies of `<spec>` excluding that package itself and install those without installing the package. It could be used to

1. allow specifying environment variables that are active only while building a package of interest (without having it be active while potentially building its dependencies).
2. separate dependency installation from building and installing a package, allowing to rebuild a package in a docker build while the dependency installation step is loaded from cache.

This example shows both use cases:

```dockerfile
# copy project metadata and install (only) dependencies
COPY pyproject.toml /myproj/
WORKDIR /myproj/
RUN pip install --extra-index-url="$PIP_INDEX" --only-deps=.[floob]

# copy project source files, build in a controlled environment and install our package
COPY src/mypkg/ /myproj/src/mypkg/
RUN env SETUPTOOLS_SCM_PRETEND_VERSION=2.0.2 python3 -m build --no-isolation --wheel
RUN pip install --no-cache-dir --no-dependencies dist/*.whl
```

Instead of the solution from #8049, @pradyunsg prefers a solution similar to the one below: https://github.com/pypa/pip/issues/8049#issuecomment-1079882786

### Describe the solution you'd like

One of those two, or similar:

1. (used in the example above) `--only-deps` would work like `-r` in that it’s not a flag globally modifying pip’s behavior but a CLI option with one argument that can be specified multiple times. Unlike `-r` it accepts a dependency spec and not a path to a file containing dependency specs.

   Where `pip install <spec>` first installs all dependencies and then (builds and) installs the package referred to by the spec itself, `pip install --only-deps=<spec>` would only install the dependencies.

2. `--only-deps` would work like `--[no|only]-binary`, in that it requires an argument specifying what package not to install. A placeholder like `:requested:` could be used, e.g.:

```bash
pip install --only-deps=:requested: .[floob]
```

### Alternative Solutions

- Re-using `-r` instead of adding `--only-deps`. I don’t think this is a good idea, since people would be tempted to do `-r pyproject.toml` which would be wrong (Dependency specs including file paths look like `./path/to/pkg[extra1,extra2]`)
- Making `--only-deps` a global switch modifying pip’s behavior like e.g. `--pre`. I have found that global switches like that are dangerous and not very intuitive. To install a dev version of your package, doing `pip install --pre mypkg` seems innocuous but will actually install dev versions of `mypkg` and *all* its dependencies that have any dev versions. It’s safer to do something like `pip install mypkg>=0.1.post0.dev0` to limit dev version installations to one package. Similarly it’s unclear what a `--only-deps` switch would apply to. Would `pip install -r reqs.txt --only-deps` install the dependencies of every package specified in the file but none of those packages?
- Using e.g. [beni](https://github.com/Quansight-Labs/beni) to convert PEP 621 dependencies to a requirements.txt. This works even today but feels like it shouldn’t be necessary as it involves quite a few steps, including writing a file to disk.

### Additional context

NA

### Code of Conduct

- [X] I agree to follow the [PSF Code of Conduct](https://www.python.org/psf/conduct/).

With that, we’d basically bless the extras-based model as a good way to declare such dependencies since you can then do:

[project.optional-dependencies]
test = ["pytest"]

and then use pip like:

pip install --only-deps .[test]

This will avoid needing to introduce a new concept at the standards level, recognise the fact that this is something that people are already doing, and enable the existing workflows without needing a substantial review of what the UX would need to look like for a completely new named concept.
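To show what the extras-based model amounts to in data terms, here is a sketch of computing "only the dependencies" for a project plus selected extras. It is not pip's implementation (`--only-deps` is a proposal in the linked issue); `only_deps` is a hypothetical helper that works on an already-parsed `[project]` table and ignores dynamic metadata, build backends, and extra-name normalization.

```python
def only_deps(project, extras=()):
    """List the requirements of a project and the chosen extras.

    The project itself is never included, mirroring the intent of the
    proposed --only-deps option.
    """
    deps = list(project.get("dependencies", []))
    optional = project.get("optional-dependencies", {})
    for extra in extras:
        if extra not in optional:
            raise KeyError(f"unknown extra: {extra}")
        deps.extend(optional[extra])
    return deps


# Hypothetical parsed [project] table.
project = {
    "name": "mypkg",
    "dependencies": ["requests"],
    "optional-dependencies": {"test": ["pytest"]},
}

print(only_deps(project, extras=["test"]))
```

Note that the result always includes the package's runtime dependencies, since extras are additive on top of them.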

I do think it’s a bit suboptimal that this exposes information to users, but based on how many projects on PyPI have a test / tests / testing extra, it’s clearly something that a lot of people do. I think we should recognise that this will have migration costs for users, if we introduce a new concept and syntax here.

My current feeling is that keeping the status quo (i.e. not moving forward with adding a new table-of-named-dependency-lists), implementing the aforementioned pip feature, and blessing some of the existing “optional dependencies” names as a convention for the community at large (e.g. tests, docs, lint) would be the approach with better trade-offs.

If the PEP author doesn’t agree, I do think the PEP needs to cover this approach in the rejected ideas section; since (as noted earlier in the thread) adding more places to put dependency lists has potential for confusion.