PEP 735: Dependency Groups in pyproject.toml

pf_moore · November 22, 2023, 5:13pm

… and can we please enumerate some more actual user scenarios that this feature is intended to support (and some that it’s not, for contrast)?

For example, my scenario above.

What’s wrong with the current options? How will the new proposal be better? How will the user know to choose the new mechanism rather than the existing ones? What are the existing mechanisms targeted at, if it’s not this scenario? What are the distinguishing features which would prompt choosing one mechanism over another?

Why POSIX? Relative paths are never going to be portable (except possibly in the case of a monorepo) as moving the source tree will break the reference. So what’s the advantage of making them portable? Also, why only relative paths? I can easily imagine having some local wheels at (say) C:\Users\Me\common. I’d suggest allowing any string that can be passed to pathlib.Path.

Also, what’s the semantics of a relative path? Can it point to a wheel? To a sdist? A source tree? Must all consumers be prepared to implement pip’s semantics for local pathnames? Because the only semantics that currently exist are pip-defined. Yes, maybe the semantics are obvious, but if so, then let’s write them down and not assume. Who knows what options might exist in 10 years?

sirosen · November 22, 2023, 5:46pm

Paul Moore:

sirosen:

a relative posix path which must begin with .

Why POSIX? Relative paths are never going to be portable (except possibly in the case of a monorepo) as moving the source tree will break the reference. So what’s the advantage of making them portable? Also, why only relative paths? I can easily imagine having some local wheels at (say) C:\Users\Me\common. I’d suggest allowing any string that can be passed to pathlib.Path.

Also, what’s the semantics of a relative path? Can it point to a wheel? To a sdist? A source tree? Must all consumers be prepared to implement pip’s semantics for local pathnames? Because the only semantics that currently exist are pip-defined. Yes, maybe the semantics are obvious, but if so, then let’s write them down and not assume. Who knows what options might exist in 10 years?

Basically +1 to all of your feedback, except that I think there is an advantage to making the paths portable.

Two scenarios I’ve encountered:

A developer works on a project on both Windows and Linux (separate machines or VMs) to do acceptance testing on both platforms. He therefore keeps two “identical” setups of the project in both places for his development needs.^[1]
A team shares a common/expected dev setup in terms of directory layout (this may or may not be enforced by a monorepo). Developers in the team use multiple platforms, but the same configuration data.

I think relative paths being non-portable would be an issue. But perhaps tooling would resolve this? I know some tools will handle POSIX-style paths (or at least / separated strings… not always clear if the / → \ conversion is full path handling) on Windows by converting them. I don’t want to work with the assumption that the paths don’t need to be portable, but I am okay with the idea that the portability concerns can be left out of the spec if we are confident that tools can handle it.

My initial gut-reaction to the question of what targets for paths should be supported is

supported distribution files (sdist and wheel today, plus any new thing that might be introduced)
source directories

However, I want to think about that more.
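To make that gut reaction concrete, here is a minimal sketch of how a tool might classify a local path target. Nothing here is settled by the proposal: the function name `classify_target` is hypothetical, and the rule (directory means source tree, `.whl` means wheel, `.tar.gz` means sdist) is only an assumption about what the written-down semantics could look like.

```python
from pathlib import Path


def classify_target(path: Path) -> str:
    """Guess which kind of install target a local path refers to.

    Hypothetical helper: the thread has not agreed on these rules.
    """
    if path.is_dir():
        # Assumption: any directory is treated as a source tree.
        return "source tree"
    if path.name.endswith(".whl"):
        return "wheel"
    if path.name.endswith(".tar.gz"):
        return "sdist"
    # Any new distribution format would need a new branch here.
    raise ValueError(f"unsupported path target: {path}")


print(classify_target(Path("dist/cool_gem-0.0.1-py3-none-any.whl")))
print(classify_target(Path("dist/cool_gem-0.0.1.tar.gz")))
```

Writing the rule as an explicit chain of cases is one way to "write them down and not assume": a format that does not exist yet fails loudly instead of being guessed at.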

I don’t have this workflow myself, but I have a colleague who does. I can ask him for info or details if we have specific questions, but I think this description of his workflow is sufficient for our needs. ↩︎

sirosen · November 22, 2023, 5:48pm

I’m a little worried that expanding the Rationale section to be complete in this respect would make it unreadable. Is there precedent for having an appendix in a PEP for this sort of thing, so we can write up each use-case thoroughly?

I can’t recall seeing a PEP with an appendix, but maybe I’m just forgetting and it happens all the time.

pf_moore · November 22, 2023, 5:54pm

It’s certainly possible to write filenames portably (as you say, /-separated files should do it as long as you avoid Windows reserved names/characters). So anyone who wants to, can. All I’m saying is don’t mandate it.
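As a rough illustration of "anyone who wants to, can", the sketch below takes a /-separated relative path and renders it for both POSIX and Windows using `pathlib`. The helper names `split_portable` and `render_for_both` are hypothetical, and the check only covers Windows reserved characters, not reserved names such as `CON` or `NUL`.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Characters that cannot appear in a Windows file name (backslash included,
# so that the input is forced to use "/" as its only separator).
WINDOWS_RESERVED_CHARS = set('<>:"\\|?*')


def split_portable(spec: str) -> tuple[str, ...]:
    """Split a /-separated relative path into its components."""
    bad = WINDOWS_RESERVED_CHARS & set(spec)
    if bad:
        raise ValueError(f"not portable, contains {sorted(bad)}: {spec!r}")
    path = PurePosixPath(spec)
    if path.is_absolute():
        raise ValueError(f"not a relative path: {spec!r}")
    return path.parts


def render_for_both(spec: str) -> tuple[str, str]:
    """Return the path as a (POSIX, Windows) pair of strings."""
    parts = split_portable(spec)
    return str(PurePosixPath(*parts)), str(PureWindowsPath(*parts))


print(render_for_both("../shared/wheels"))
```

A tool that accepted any `pathlib.Path`-compatible string would simply skip `split_portable`; the portable form is then a convention users opt into rather than a rule the spec enforces.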

pf_moore · November 22, 2023, 5:56pm

I’m happy for such use case descriptions to be posted here. I think you’re right that adding them to the PEP would be too much, but links to standalone posts here (or something like that) would be useful.

hugovk · November 22, 2023, 5:59pm

Yes, some PEPs have appendices: https://github.com/search?q=repo%3Apython%2Fpeps%20appendix&type=code

jeanas · November 22, 2023, 6:10pm

TBH, I have the opposite opinion. I much prefer having to read one structured and coherent document with all the details, even if long, over lots of Discourse posts that may overlap. For a precedent, PEP 668 had a relatively long section on identifying use cases.

ofek · November 22, 2023, 6:44pm

I think you’re referring to PEP 639 which defines licenses and I think was many many pages long before being trimmed down.

sirosen · November 22, 2023, 7:10pm

I’ve just done some more reading and tinkering with Gemfile and Gemspec, for some prior art from Ruby. I plan to put this into an appendix for prior art. Please read the below as a rough draft of a part of the PEP.

I’m not sure that I’ll include a full section on npm/NodeJS. There is no official support for multiple dependency groups but there is devDependencies as a separate section. There’s clear community interest out there for such a thing, and their core team seems receptive to the idea, but there’s still no official or mainstream implementation.

Ruby & Ruby Gems

Ruby projects may or may not be intended to produce packages (“gems”) in the ruby ecosystem. In fact, the expectation is that most users of the language do not want to produce gems and have no interest in producing their own packages. Many tutorials do not touch on how to produce packages, and the toolchain never requires user code to be packaged for supported use-cases.

Ruby splits requirement specification into two separate files.

Gemfile: a dedicated file which only supports requirement data in the form of dependency groups
<package>.gemspec: a dedicated file for declaring package (gem) metadata

The bundler tool, providing the bundle command, is the primary interface for using Gemfile data.

The gem tool is responsible for building gems from .gemspec data, via the gem build command.

Gemfiles & bundle

A Gemfile is a ruby file containing gem directives enclosed in any number of group declarations. gem directives may also be used outside of the group declaration, in which case they form an implicitly unnamed group of dependencies.

For example, the following Gemfile lists rails as a project dependency. All other dependencies are listed under groups:

source 'https://rubygems.org'

gem 'rails'

group :test do
  gem 'rspec'
end

group :lint do
  gem 'rubocop'
end

group :docs do
  gem 'kramdown'
  gem 'nokogiri'
end

If a user executes bundle install with these data, all groups are installed.
Users can deselect groups by creating or modifying a bundler config in .bundle/config, either manually or via the CLI. For example, bundle config set --local without 'lint:docs'.

It is not possible, with the above data, to exclude the top-level use of the 'rails' gem or to refer to that implicit grouping by name.
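The selection behaviour described above can be modelled in a few lines of Python. This is only a sketch of the semantics as described in this post, not of bundler’s implementation; the `GEMFILE` mapping and the `gems_to_install` function are hypothetical names, and the unnamed top-level group is represented by the key `None`.

```python
from typing import Optional

# Model of the example Gemfile: group name -> gems.
# The key None stands for the implicit, unnamed top-level group.
GEMFILE: dict[Optional[str], list[str]] = {
    None: ["rails"],
    "test": ["rspec"],
    "lint": ["rubocop"],
    "docs": ["kramdown", "nokogiri"],
}


def gems_to_install(gemfile, without=frozenset()):
    """Return the gems selected when the groups in `without` are deselected."""
    selected = []
    for group, gems in gemfile.items():
        # Only named groups can be deselected; the unnamed group has no
        # name to refer to, so it is always installed.
        if group is not None and group in without:
            continue
        selected.extend(gems)
    return selected


print(gems_to_install(GEMFILE))
print(gems_to_install(GEMFILE, without={"lint", "docs"}))
```

The second call mirrors `bundle config set --local without 'lint:docs'`: everything is opt-out, and `rails` cannot be excluded.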

gemspec and packaged dependency data

A gemspec file is a ruby file containing a Gem::Specification instance declaration.

Only two fields in a Gem::Specification pertain to package dependency data.
These are add_development_dependency and add_runtime_dependency.
A Gem::Specification object also provides methods for adding dependencies dynamically, including add_dependency (which adds a runtime dependency).

Here is a variant of the current rails.gemspec file at time of writing^[1], with many fields removed or shortened to simplify:

version = '7.1.2'

Gem::Specification.new do |s|
  s.platform    = Gem::Platform::RUBY
  s.name        = "rails"
  s.version     = version
  s.summary     = "Full-stack web application framework."

  s.license = "MIT"
  s.author   = "David Heinemeier Hansson"

  s.files = ["README.md", "MIT-LICENSE"]

  # shortened from the real 'rails' project
  s.add_dependency "activesupport", version
  s.add_dependency "activerecord",  version
  s.add_dependency "actionmailer",  version
  s.add_dependency "activestorage", version
  s.add_dependency "railties",      version
end

Note that there is no use of add_development_dependency. Some other mainstream, major packages (e.g. rubocop) do not use development dependencies in their gems.

Other projects do use this feature. For example, kramdown ^[2] does make use of development dependencies, containing the following specification in its Rakefile:

      s.add_dependency "rexml"
      s.add_development_dependency 'minitest', '~> 5.0'
      s.add_development_dependency 'rouge', '~> 3.0', '>= 3.26.0'
      s.add_development_dependency 'stringex', '~> 1.5.1'

The purpose of development dependencies is only to declare an implicit group, as part of the .gemspec, which can then be used by bundler.
See details on the gemspec directive in Gemfiles: Bundler: gemfile

The integration between .gemspec development dependencies and Gemfile/bundle usage is best understood via an example.

gemspec development dependency example

Consider the following simple project in the form of a Gemfile and .gemspec.
The cool-gem.gemspec file:

Gem::Specification.new do |s|
  s.author = 'Stephen Rosen'
  s.name = 'cool-gem'
  s.version = '0.0.1'
  s.summary = 'A very cool gem that does cool stuff'
  s.license = 'MIT'

  s.files = []

  s.add_dependency 'rails'
  s.add_development_dependency 'kramdown'
end

and the Gemfile:

source 'https://rubygems.org'

gemspec

The gemspec directive in Gemfile declares a dependency on the local package, cool-gem, defined in the locally available cool-gem.gemspec file.
It also implicitly adds all development dependencies to a dependency group named development.

Therefore, in this case, the gemspec directive is equivalent to the following Gemfile content:

gem 'cool-gem', :path => '.'

group :development do
  gem 'kramdown'
end

Lessons from the Ruby Model for Python Dependency Groups

??? TODO ???

(I haven’t really drawn conclusions yet, but surely we will be able to draw some?)

If I include this in the PEP, are there copyright implications or other issues of ownership to concern us? rails is MIT licensed. ↩︎
Also MIT licensed. ↩︎

brettcannon · November 23, 2023, 1:15am

Because it’s an “if”, not something that always happens.

No, I would say we expand the definitions if needed. My point wasn’t to throw out standards, just that having the standard specify how data is to be interpreted in regards to other data in the file could be considered constraining on tools.

I feel like phrasing it that way is a bit too negative. I think what you’re trying to say is, “anything new should at least subsume one of the other approaches”. This phrasing triggers things like xkcd: Standards which I find disheartening when we’re trying to improve the situation.

That’s honestly your call as the PEP author as to whether you need to care.

My takeaway is Gemfiles are just like project.dependencies and project.optional-dependencies, but the install tools give flexibility on how to process the relationships of the groups (including the anonymous/default one). Gemspecs are a lot like package.json from Node where a dependency is either a runtime dependency or a dev dependency and that’s it; just two buckets.

sirosen · November 23, 2023, 2:11am

I think your interpretation of the Ruby situation is correct, but I’d also point out that Gemfile dependency groups are never published.
I don’t really know what it all “means” for us, other than that I’m a little envious of the new developer experience in Ruby with Gemfiles.
I’d like to get a similarly smooth experience for new users listing dependencies in pyproject.toml .

I take the local relative path issue pretty seriously. PDM is telling us directly that this is an important use-case. If the spec can cover this case, it goes a long way towards removing another tool specific behavior for PDM.
The other non-standard thing I see, looking at PDM docs, is the ability to specify editable installs vs non-editable.

I’m strongly considering the merits of bringing back an object format for these data. But I don’t at all regret removing the one I had included.

My current thought – which is too vague to write up in great detail – is to have an object spec with fields like “include” for including other dependency groups, “editable” for controlling editable installs, and “path” for the path to a local repo or built artifact. And some rules or schema for what combinations of those are valid.
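To make the vague idea slightly less vague, here is a minimal sketch of how “include” could be resolved. The field names “include”, “editable” and “path” come from the paragraph above; everything else (the `resolve_group` function, the shape of the `groups` mapping, the cycle rule) is an assumption for illustration, not a proposed spec.

```python
def resolve_group(groups, name, _seen=()):
    """Flatten one dependency group, expanding {"include": ...} entries.

    Strings are kept as requirement strings; objects with "path" and
    "editable" are passed through untouched for the installer to interpret.
    """
    if name in _seen:
        chain = " -> ".join(_seen + (name,))
        raise ValueError(f"cyclic include: {chain}")
    resolved = []
    for entry in groups[name]:
        if isinstance(entry, dict) and "include" in entry:
            # Assumption: an include object carries no other fields.
            resolved.extend(resolve_group(groups, entry["include"], _seen + (name,)))
        else:
            resolved.append(entry)
    return resolved


groups = {
    "test": ["pytest"],
    "dev": [
        {"include": "test"},
        {"path": "./libs/common", "editable": True},
        "ruff",
    ],
}
print(resolve_group(groups, "dev"))
```

The “rules or schema for what combinations are valid” would live where the assumption comment is: for example, rejecting an object that sets both “include” and “path”.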

I think I need to add sections on the Ruby tools and PDM and Poetry. Not considering the features of these tools carefully would be a mistake, IMO.

jamestwebber · November 23, 2023, 3:02am

I know you said you are planning on it, but the spec for Cargo.toml is also worth exploring in depth. There are a lot of wrinkles there–maybe too much for this PEP, but a good argument to allow for future flexibility.

It allows for relative paths^[1] and VCS repositories, as well as extras (called features) and dev and build dependencies (separate named tables, which are not extras). There’s also a way to specify an alternative registry. One difference is that there is always a version specified for every dependency.

I think they just use posix paths and cargo converts them? ↩︎

ofek · November 23, 2023, 5:18am

I would definitely echo what James says here and check out Cargo. The ability to define relative paths is not at all what you want but rather you want the concept of workspaces, trust me

And workspaces is an entirely different behemoth… I plan to implement this in the spring in Hatch and we’ll see how folks like it then hopefully we can standardize.

kknechtel · November 23, 2023, 8:58am

With respect, I feel like this mindset prevents us from ever actually improving anything.

Large improvements get ruled out because they require a featureful system that is deemed too complex.

Smaller improvements get ruled out because every change supposedly incurs a cost that isn’t outweighed by the improvement. Either the old ways are deprecated and people complain about a too-fast deprecation cycle (in 2023 that might sound like “we just migrated to pyproject.toml and you’re changing it again?”; later down the road people will still make excuses - I know this because Stack Overflow received questions tagged python-2.5 this year); or they aren’t and people complain about having too many ways to do something.

The existence of %-style formatting, or string.Template, didn’t prevent the development of str.format, or the refinement offered by f-strings. The benefits there are small enough that, for example, nobody is champing at the bit to retrofit logging to a .format-based interface (and, again, risk annoying users with very old code); yet they were added.

I would love to get some clarity on this point. Are the words “requirement” and “dependency”, as used in PyPA documentation, data specifications etc., intended to be anything other than synonyms? If so, what is the intended distinction? If not, why?

pf_moore · November 23, 2023, 9:11am

A couple of people have made similar comments. I apologise, my post was a lot more negative than I intended. My intention was not so much to argue against the proposal, but rather to simply note things that might affect the design.

For the record I strongly support making incremental improvements like this one.

frostming · November 23, 2023, 11:07am

In a sense, it is indeed true that supporting Workspaces on top of PEP 508 now is almost impossible.
We need a standard for more flexible definitions of dependency specifications, instead of inventing a new one for every tool.

sinoroc · November 23, 2023, 6:34pm

I read things like “editable”, “relative”, and “VCS” dependencies. I wonder if there is a risk of falling again into the trap of abstract vs. concrete dependencies. If I am contributing to a project I do not necessarily want the project to dictate to me how I should install dependencies in my own dev environment. Maybe I do not like editable dependencies, maybe I want to lay out my dependencies differently on my file system, maybe I have my own forks of dependencies using a different VCS.

I am not against having standard notations for those. But maybe these do not belong in pyproject.toml, but rather in some file that is local to my own dev environment only (see “overrides”). Or maybe this should be reserved for monorepos or something like that.

Apologies if I misunderstood things and there is no need to worry about this.

brettcannon · November 23, 2023, 6:48pm

I agree and view it as a prerequisite of whatever we come up with.

PEP 621 was originally going to have a more object-oriented structure for defining dependencies as pushed for by Poetry, but in the name of ease of transitioning we decided not to go that far. But I don’t think that means we can’t expand beyond the default PEP 508 string representation and add on an object representation for when more control is warranted.

jamestwebber · November 23, 2023, 6:51pm

Hmm, I can see both sides of this, though. Some projects might say “bring-your-own dev environment!” and others could reasonably say “contributors should develop this way, it’s a pain to deal with anything else”. One example is projects that vendor dependencies and/or use submodules.

Maybe you want to do things differently and that’s fine most of the time, but if you want to contribute to Project X you should do things their way. I think that should be up to them, not you.

brettcannon · November 23, 2023, 7:41pm

The quick summary for specifying dependencies in Cargo:

Each dependency is listed as a key in a [dependencies] table
Simple use case is specifying the version requirement as a string, e.g. requests = "2.31.0" (as James points out, you must specify a version, although "*" is allowed as a wildcard but rarely used thanks to Rust being a compiled language and being able to scope dependencies per project)
You specify the registry to use in Cargo.toml directly if it isn’t crates.io
You can specify a git repository via a git key; regex = { git = "https://github.com/rust-lang/regex.git" }
You can specify by path; hello_utils = { path = "hello_utils" }
Specifying version always refers to the version from a registry. If it is specified on a table with a git or path key, it represents the version of that dependency to get from the registry; that way any code on a registry can get access to all of its dependencies, while developing locally lets you specify by path or repo
They support the equivalent of platform markers
They have development dependencies via [dev-dependencies]; these do not need to exist on a registry if you upload your project
Rust lets you customize builds via a build.rs file, and [build-dependencies] is like our build-system.requires
Their version of extras is features, but a bit more powerful, e.g. you can specify default features as a project, and then your consumers can choose to ignore those default features if they want while still specifying specific features
There’s a whole other set of abilities around overriding dependencies via patching
And then there are workspaces which are sort of like a monorepo set up for your various projects that are somehow related to each other (this ties back into the whole specifying a path and a version thing as it lets you automatically pull your local version of one of your dependencies while developing but still push to a registry and have it use the published version of that same dependency)

I think the key thing here is that Cargo.toml offers an experience somewhat similar to ours: a simple name/version default, like our dependency specifier strings in an array, plus a table representation for when the defaults don’t work. The Poetry folks pushed for a table format for PEP 621, but the decision at the time was that it would be easier to migrate to [project] if we stuck with dependency specifier strings. But there’s also nothing really saying we couldn’t support both approaches.

This is where Stephen’s point about “considering the merits of bringing back a table format for these data” potentially makes sense. Taking project.dependencies as an example, you could keep what we currently have as the simple approach (especially for beginners who just want to say dependencies = ["requests>=2.31.0"]). But for more control you can move to a [project.dependencies] table where you can get more specific like Cargo.toml with each key representing a distribution and the value being a table. That table would have potential keys like:

version for a version specifier string; if we thought this would be the common case of just wanting this, specifying just the version specifier string as the value to the project name key could be allowed, although we may want to consider some grammar changes to allow for things like * for any version or inferring == when just a number is specified like 0.23.*
markers for an environment markers string (or we break out all the possible environment markers into their own key like python_version = ">=3.8")
path if you’re pointing at a directory (or, more specifically, this could be broken down into pyproject, sdist, and wheel)
git, etc. for any VCS that’s supported
extras for an array of desired extras

Effectively we “explode” dependency specifiers into a TOML table format which allows for fine-grained control when that’s needed (as well as potentially providing a path towards freeing ourselves a bit from dependency specifiers and their domain-specific language being hard to expand/change).

As an example:

[project.dependencies]
requests = {version = ">=2.31.0", python_version = "<3.10"}
trio = ">=0.23.1"

And for project.optional-dependencies, you do something similar: make it a table where each sub-table is a dependency group:

There’s a specifiers/dependencies/deps sub-table that matches what project.dependencies allows for (naming this is a bit annoying due to “dependencies” already being in the name)
dev flags whether the dependency group is a “public” extra or a “private” thing for development (no comment on the default value)
standalone could flag whether project.dependencies should be implicitly included or not (no comment on the default value)
group_deps array of other dependency groups to consider as dependencies

It would look something like:

[project.optional-dependencies.test]
dev = true
standalone = false

[project.optional-dependencies.test.specifiers]
pytest = {version = ">=6.0.0"}
pytest-xdist = ">3.5.0"

So that is what I think things would look like if we took inspiration from Cargo.toml while keeping what we already support around.