PEP 735: Dependency Groups in pyproject.toml

Here’s where I get stuck on this idea: if you have “dev” dependencies and gain a benefit from a standard format for specifying them, doesn’t that kinda imply that you’re a maintainer of someone else’s “Project X”?

Like, I can understand the benefit of setting up isolated test and build environments, but that’s about it. Presumably, each dev is only expected to set up pre-commit hooks, linters etc. once. Similarly it’s hard for me to imagine that there are single-author projects out there that gain a benefit from, in essence, reinstalling Sphinx every time it’s run.

So it seems like “people who want to do things differently” (or, for that matter, people who have little concept of a “dev environment” or don’t care about it much) are only in the target audience for this feature if they’re - well, in the target audience for PEP 722/723.

And their needs might be different. Even if they’re nominally distributing an “application” that by conventional wisdom ought to pin all the dependencies, maybe they don’t know, or don’t care, about the version numbers for those dependencies. Maybe they make relatively light use of whatever SciPy stack components and expect the part they’re using to be stable. Maybe they don’t want to specify their current version because they’re part of a team where they don’t know what versions anyone else has, and don’t want to force upgrades/downgrades on their colleagues, and especially don’t want to get stuck being unable to use multiple colleagues’ work in the same “project” because they needlessly specified incompatible versions.


Aside from that, I feel like the general attitude of “our project comes with a script that will set up a dev environment according to our expectations” is counter to the spirit of open source. I’m a responsible adult. I don’t want to have to learn someone else’s toolchain - even if all the components are popular - every time I submit something to a different project, and I don’t want to have to build the entire project if I’m trying to improve one small aspect that fits within a separately testable module. I’m not expecting the project maintainers to understand or “support” my tools, because they’re my tools and I’m accustomed to using them. I just want to submit a PR that follows the project’s human-readable style guide and passes automated checks. There’s already a high enough barrier having to read yet another Code of Conduct (even though I think I’m a generally nice person and anyway don’t expect to be involved in any social interaction beyond discussing my own PR), sign a contributor covenant, fork the repo so that I can speak GitHub’s “pull request” language, also clone the fork so I can actually work on it locally…

1 Like

I think there’s an ambiguity here. I can imagine two different designs being described:

  1. ```toml
     [project.dependencies.foo]
     version = ">=1, <2"
     index-url = "https://example.com/wheels/simple"
     # similarly for other PEP 508-specifiable things, and possibly extensions

     [project.dependencies.bar]
     # etc.
     ```

  2. ```toml
     [[project.dependencies]]
     # PEP 508 data inline in a spec string
     spec = "foo @ https://example.com/wheels/foo-1.2.3-py3-none-any.whl"
     # and possibly some other keys?

     [[project.dependencies]]
     spec = "bar"
     # etc.
     ```

The second one is structured more like PEP 735 is proposing (which also allows for “inline” dependencies). The first sounds more like what I understood from your description, but maybe I misread.

Sorry about that! I added an example to my original post.

It’s the first one. Note how I don’t mention a spec key, so I’m not sure how you’re synthesizing your second example from my post.

In such a design, there would have to be some key under which that data was specified, so I cribbed it from the PEP draft :slight_smile: But it’s clear now that you are proposing a fundamentally different structure.

I don’t see anything wrong with the design, and I’m inclined to support it. It does, of course, raise the standard questions: “how to teach this”, and what will happen when existing tools see this new data with a structure the existing PEPs don’t allow for. (Especially since it allows for a differently-structured specification of data that can already be specified today.)

I think it’s a spectrum. For the beginner case, my idea left project.dependencies alone for what is currently supported since that’s the easiest to explain. But I also don’t think my tweak for allowing a string just for a version specifier like Cargo.toml is hard to explain either. Optional dependencies are an intermediate thing, so I think they can be more complicated.

Probably error out saying that they don’t recognize something and so they can’t handle it. There’s a reason why packaging standards can take a while to catch on. :sweat_smile:

1 Like

Yes, I’m trying to feel out where people would prefer to go in terms of direction in general for format/approach.

First thank you Stephen @sirosen and Brett @brettcannon for the work you’ve been doing and to everyone else contributing to the discussion. It’s exciting to see progress happening in this area and always interesting to hear everyone’s perspectives.

I’d like to reiterate that Python is not only used for open-source libraries. There’s a lot of overlap between people involved in these online discussions and people who contribute to open source projects. But I would speculate that if you tally up all the Python in the world, you might find that most of it never leaves a single company, and most of it doesn’t use what we might think of as “normal” project structure. There’s a lot of middle ground between the audience of PEP 723 and the current state of pyproject.toml for building wheel files, and a feature not being a good fit for open source projects doesn’t mean it isn’t helpful for a big audience.

Your hypothetical examples based on Cargo were very appealing to me. It seems like a nice balance between what is already standardized and expanding the use cases that pyproject.toml can help with. To me it feels very self-consistent with existing parts of pyproject.toml and with similar project configurations from other languages.

5 Likes

I propose $-format for specifying other dependency groups:

```toml
foo = ["numpy"]
bar = ["$foo", "scipy"]
```

I believe `$` is an invalid PEP 508 start character.


Perhaps relative path support could be optional, and some tools may wish not to support it. I’m not convinced of this one.


Is there any problem with always specifying POSIX relative paths? I can convert them using pathlib:

```python
from pathlib import Path, PurePosixPath

platform_path = Path(PurePosixPath(path))
```

To me, the [project] table in pyproject.toml means the metadata for the installed project; it just happens to be implemented as matching the wheel metadata. I think things which are not displayed in the installed metadata should go in other tables.


Pip’s --only-deps is not a strict superset, because you are forced to install the project dependencies, unless the project has no dependencies.

I see --only-deps as useful for installing everything but the project itself in a Docker image, so the project can be installed as editable in a mapped volume of a container of the image. I don’t think this use case needs to be solved by dependency groups, so both features could exist.

1 Like

I’m not sure what you mean by this, in particular who is being referred to by “you” and “someone else”. In my post I was assuming that “you” are a first-time contributor to a project that has a defined dev environment. At that point: no, you’re not a maintainer, yet.

What I was trying to say is: it’s reasonable for a project to have a specific dev environment, defined in a project configuration file. It’s not mandatory that all projects do that, but it should be a supported thing that projects can do if they choose.

Defining a standard like this increases flexibility for contributors, because they can use any of the tools that implements the standard, rather than being restricted to the one tool that everyone else on the team uses for development.

1 Like

I’ll echo a sentiment that others have expressed in thanking everyone for keeping up a lively discussion but also avoiding getting really sidetracked!

I will make some time over the weekend or maybe tomorrow to work more on this. In the meantime, I want to let everyone know what the current direction for this PEP is, and where I’ll be putting my efforts:

  1. Combining dependency groups with extras is complicated. I’ve decided that this PEP will focus only on declaring Dependency Groups as a new piece of data in pyproject.toml.

I accept that combining the two ideas – or, if you prefer this phrasing, extending project.optional-dependencies to cover more use cases – might be a better solution. I’d like to put in a lot more work to complete this PEP, especially in defining its use cases, and then we will hopefully be better able to see whether or not this is the correct path.

Brett’s proposal above after looking at Cargo.toml is still very useful thinking and input, as are other ideas like it, but I want to be clear that I don’t intend to go down that path right now.
I was more open to making such changes to the PEP only a couple of days ago, but I’ve realized

  • it’s harder to combine these than I initially considered
  • changing the PEP too much makes it harder to know what we’re discussing
  2. An object specification will return.

We have three important ideas to express here regarding local filesystem dependencies: path (str), editable (bool), and only_deps (bool). If I understand the cases correctly, pip install --only-deps is only important for local dependencies, and in particular for ., so making it a key here matters.
Other data may also fit into this structure, like references to other dependency groups, which could allow us to avoid an extended string syntax beyond PEP 508.

I’m currently thinking that we will have string and object representations, that strings will be strictly PEP 508, and that objects will be local dependencies or dependency group specifiers. I’m not aware of cases which this fails to cover, based on PDM and Poetry features, but it’s important to note that the detailed user stories for this PEP still need to be written.

  3. I intend to move the table to [project.dependency-groups].

I haven’t heard anyone say that using project is a major issue. Everyone agrees naming is difficult and important. This seems like the best name for the new table to me.

  4. Two appendices will be added: Prior Art and User Stories.

We have some great content in the thread already for one of these, but not the other.

For the User Stories, it’s important to show how the PEP aims to satisfy each one. My current idea is that they will consist of a short explanation which sets up a use case and an example config which is appropriate to that case. I tried writing one this morning but I had to scrap it – keeping these short, clear, and complete is a difficult balancing act in terms of what information is safe to omit.

3 Likes

Say I’m the sole developer on something. Why would I want to record in my pyproject.toml the fact that I’ve already set up e.g. pytest in my development virtual environment? Alternately, why would I prefer to write something there and then use some meta-tool, vs. just directly installing it? I assume this can only be because I want to invite other developers on to the project, and want to recommend those tools to them (or expect them to run the tests a certain way before submitting a PR).

My understanding is that most of these lists would be used to specify “dependencies” like MyPy, Black etc. In my mind, those are “the tools”, for which I as a hypothetical developer might prefer alternates (as my actual self, generally I would prefer to go without entirely).

1 Like

I’m getting the feeling that this subthread is really just some kind of mix-up rather than a deeper disagreement. I could be mistaken, but let’s consider changing the level of abstraction to be more concrete.
(This doubles as a chance for me to try to write a user story for the PEP…)

Suppose I have a project which formats with black – and specifically, a certain version of it. I set up tox.ini to run that task as tox r -e format. I document it as such.
A developer, let’s name him “Michele”, shows up as a new contributor and reads the contribution doc, which says “We format with black and you can use tox to run it.” To which his reaction is that that’s all well and good, but he prefers to configure his editor to run autoformatters for him.
How can Michele configure his editor to use the version of black specified by the project? Note that the version used could change over time.

Without this spec, there is no agreement on where or how the requirements data are stored. It could be somewhere convenient like a requirements.txt or somewhere inconvenient like the tox.ini. If the formatting requirements are in a Dependency Group, Michele can configure his editor to run autoformatters in the project using that Dependency Group. So he has an easier time integrating his preferred tools with my project.

There are several assumptions here. It assumes that Michele’s editor supports this feature, for starters, which is (*glances awkwardly at his vimrc file*) nontrivial. And I have to have a nice Dependency Group setup for him to use. Without that, at best Michele files a ticket asking for support, and I oblige by moving the dependency into a Dependency Group. But for that to succeed, he must describe what he needs well and I need to be willing to accommodate him.

Finally, while I’m enumerating shortcomings, this case can be solved today with a requirements.txt file. The only issues with those are

  • they aren’t standardized (so supporting them requires that the tools use pip)
  • to be blunt, I find it very silly that I would define a dedicated requirements.txt file merely to list the version of black in use and would probably close a ticket asking for this on my projects as a wontfix

The main point stands. The maintainer is using tox and a contributor wants to use his editor integration on the same data. Under the new spec it becomes possible to do this via a standard mechanism, using as little as two lines of new config in pyproject.toml.

3 Likes

This is a much better explanation of what I was trying to say. And one point that is easy to miss is that a lot of this effort is already happening[1]

If you contribute to an established project it is very likely that they’ll require some checks before merging:

  • your contribution is formatted according to their style
  • it passes any relevant tests, which of course means using their test framework
  • it passes any other linting/typing/etc. checks that they use

Right now, either you can rely on automation and PR reviews to catch that stuff, or you can reproduce the dev environment and run it yourself[2]. But ignoring their choices is just making busywork for everyone involved.

Having a standard for this, in the project metadata, should make it easier for anyone to clone a repo and edit with their preferred tools. edit: where “preferred tools” are not the dev-dependencies themselves, but their big-picture setup. I guess that was the disconnect.


  1. although GitHub Actions, bots like Miss Islington, etc. have automated pieces of it ↩︎

  2. which is where you’ll get into “I don’t want to use that tool that way” territory ↩︎

3 Likes

Do you intend to change the semantics of the [project] table to make name and version optional, or do you intend to limit this feature to projects which specify a name and version?

1 Like

I would hope because the tools around pyproject.toml make it easier to get the work done than inventing my own scripts.

Not everyone has such complex needs.

My “scripts” are one-liners in my .bash_aliases, things like

```bash
# Activate the local venv, which is specially named/located.
alias activate-local="source .local/.venv/bin/activate"

# Run an acceptance test in the local private folder.
alias try-it="(cd .local/ && source run-acceptance-test)"
```

where the acceptance test has to be custom anyway (because the results have to be evaluated manually).

And then for unit tests I just have python -m pytest ., which doesn’t need an alias.

As I was writing up, here, a reply which asserted “yes, and here’s what I’ll change to make it happen!” I became nervous that choosing this name is biting off too much additional complexity and imperils the rest of the proposal. At the same time, I think this name, in the project table, really is the best for matching user expectations.

It’s important to me that a pyproject file with only this table is valid. I would like it if a file with only [project.dependencies] were also valid.

I would therefore like to make those fields optional, but I may end up needing help writing it into the PEP. This may end up being simply too difficult to include in this PEP – perhaps it would require a precursor which focuses only on making those fields optional?

I’m going to leave that stated as my desired name for the table but possibly use a top-level [dependency-groups] table as an interim solution, today. I’ll include notes on the inclusion of the table in [project] under the naming item in Open Issues.

4 Likes

What are the implications of such a workflow on this proposal?
Some of us benefit from the features and some don’t? That seems fine.


@sinoroc’s post which started this part of the discussion seems to be about the concern that a project could over-specify information which isn’t its business. For example, the project could expect that ../my_protobuf is a source tree copy of the project’s protocol buffer package. It’s not just specifying that there is a dependency, but exactly where it has to be.
And in some projects that could be wrong or inappropriate.

I think that this issue is mostly a matter of responsible use. requirements.txt supports setting up projects in this way, but we don’t see it terribly often in open source projects because it’s… a bad idea there. Simple as that. :person_shrugging:
But, as was hinted at in the original post, it’s pretty normal for a monorepo to do this.

So, are we uncomfortable with features which could be misused? It’s always a legitimate concern, but we have seen how requirements.txt can be used, and in my experience severe misuse is rare. So in this case, I’m not that worried.

@sinoroc also mentioned editable install control. I think it’s important to be able to specify it for Poetry and PDM parity, but I’m going to need to specify how tools consume that information carefully. e.g. The spec should allow for tools to override that behavioral flag.

2 Likes

I feel bad to be the contrarian here, because I really appreciate you taking the time to enumerate use cases. But for this specific scenario, and if we were to generalize it to others, I think the proper solution is Brett’s proposal of every environment manager exposing metadata for consumers. In this model, every tool can access the same environment without a reimplementation and potential introduction of bugs/inconsistencies.

1 Like

Please don’t feel bad! I appreciate that your perspective is quite different from my own and is informed by extensive experience as a tool maintainer.

Your response indicates that we agree that scenarios like the one described exist, even if we disagree about how to handle them.

We have to keep you and other maintainers engaged. If this spec becomes something which pip, hatch, PDM, and Poetry maintainers all dislike strongly, I’d call that a major failure – I would not support the proposal if that were to happen.
(But also, compromise is the art of making everyone almost equally unhappy. :wink: )

1 Like