Pyproject.toml optional dependencies redundancy aka DRY extras

In the current PEP 621 state, there is no way to specify an optional dependency that occurs in several groups, without repeating yourself.

Let’s consider an imaginary package called example. This package has three sets of optional dependencies, specified in its pyproject.toml file as follows:

[project.optional-dependencies]
test = [
  "pytest < 5.0.0",
  "pytest-cov[all]"
]
lint = [
  "black",
  "flake8"
]
ci = [
  "pytest < 5.0.0",
  "pytest-cov[all]",
  "black",
  "flake8"
]

As you can see, some dependencies are repeated between the groups, which violates the DRY principle and cannot easily be resolved at the project level. The only current workaround is to ask users to specify multiple extras when installing the package (to assemble all the dependencies they need), which is not optimal.

Poetry solves this problem partially by adding an optional flag to entries in the main list of project dependencies. The flag marks a dependency as one that is not installed by default, but only when it is pulled in through optional dependencies (called extras in the Poetry configuration). Such a dependency is declared once in the main list, is referenced only by name inside the extras, and has all of its constraints copied over from the main list under the hood. This way, the constraints for any optional dependency need to be specified only once.

This solves the issue only partially, as the name of the dependency still has to be listed in each set of optional dependencies that requires it (which becomes a more obvious problem when a bigger list of dependencies is shared between sets). It also introduces another question: how should a dependency marked as optional be treated if no set of optional dependencies uses it?
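To make the mechanism above concrete, here is a rough Python model of it. This is only a sketch of the idea as described in this post, not Poetry's actual implementation: the dictionary layout, the function names `expand_poetry_style` and `orphaned_optionals`, and the unused `mypy` entry are all assumptions made up for illustration.

```python
# Simplified model: constraints live once in the main list, extras hold names only.
# The data layout and helper names here are hypothetical, not Poetry's API.
dependencies = {
    "pytest": {"version": "<5.0.0", "optional": True},
    "pytest-cov": {"version": "", "optional": True},
    "black": {"version": "", "optional": True},
    "flake8": {"version": "", "optional": True},
    "mypy": {"version": "", "optional": True},  # hypothetical: used by no extra
}

extras = {
    "test": ["pytest", "pytest-cov"],
    "lint": ["black", "flake8"],
    "ci": ["pytest", "pytest-cov", "black", "flake8"],
}


def expand_poetry_style(dependencies, extras):
    """Copy each constraint from the main list into the extras that name it."""
    expanded = {}
    for extra, names in extras.items():
        expanded[extra] = [
            f"{name} {dependencies[name]['version']}".strip() for name in names
        ]
    return expanded


def orphaned_optionals(dependencies, extras):
    """List optional dependencies that no extra refers to."""
    used = {name for names in extras.values() for name in names}
    return sorted(
        name
        for name, spec in dependencies.items()
        if spec.get("optional") and name not in used
    )


expanded = expand_poetry_style(dependencies, extras)
orphans = orphaned_optionals(dependencies, extras)
```

Note that the names in `ci` are still repeated even though the constraints are not, and that `orphans` ends up holding exactly the kind of dependency whose treatment is unclear.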

My proposal to fix the problem would be to allow inheritance between sets of optional dependencies by introducing some additional dependency syntax that is valid only in this single context. As an example, I’m assuming an entry that starts with the > character, followed by the name of another optional dependencies set. Given that, here is the above example again, this time using my proposal:

[project.optional-dependencies]
test = [
  "pytest < 5.0.0",
  "pytest-cov[all]"
]
lint = [
  "black",
  "flake8"
]
ci = [
  ">test",
  ">lint"
]
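For anyone wondering how a tool might process this, here is a minimal Python sketch of expanding the proposed `>name` entries into flat requirement lists. The `>` syntax is only the hypothetical one from this proposal, and the function name `expand_extras` and its error handling are assumptions of mine, not part of any existing tool.

```python
def expand_extras(extras):
    """Flatten hypothetical '>name' references into plain requirement lists."""

    def resolve(name, stack):
        # Refuse to follow a set that is already being expanded.
        if name in stack:
            raise ValueError("circular reference: " + " -> ".join(stack + (name,)))
        if name not in extras:
            raise KeyError(f"unknown optional dependency set: {name!r}")
        result = []
        for entry in extras[name]:
            if entry.startswith(">"):
                items = resolve(entry[1:].strip(), stack + (name,))
            else:
                items = [entry]
            # Keep first-seen order and drop duplicates.
            for item in items:
                if item not in result:
                    result.append(item)
        return result

    return {name: resolve(name, ()) for name in extras}


raw = {
    "test": ["pytest < 5.0.0", "pytest-cov[all]"],
    "lint": ["black", "flake8"],
    "ci": [">test", ">lint"],
}
flat = expand_extras(raw)
```

After expansion, `flat["ci"]` contains the same four requirements as the hand-written `ci` set from the first example, so the published metadata would not need to change.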

Additionally, there could be an option to declare a set of optional dependencies that can only be used from other sets. For example, a name starting with an underscore (_) would be treated as usable only in other sets and would not end up in the parsed requirements on its own. The choice of the underscore is arbitrary; whatever marker is chosen should be a character that is currently not allowed at the beginning of an optional dependencies group name (I haven’t checked whether the underscore is currently allowed).
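A small Python sketch of how such "private" sets could behave, assuming both the hypothetical `>name` reference syntax and the underscore convention. The function name `public_extras` and the package names in the sample data are made up for illustration.

```python
def public_extras(extras):
    """Expand '>name' references, then hide sets whose name starts with '_'."""

    def resolve(name, seen):
        if name in seen:
            raise ValueError(f"circular reference through {name!r}")
        out = []
        for entry in extras[name]:
            if entry.startswith(">"):
                expanded = resolve(entry[1:].strip(), seen | {name})
            else:
                expanded = [entry]
            for item in expanded:
                if item not in out:
                    out.append(item)
        return out

    # Private sets take part in expansion but are not exposed on their own.
    return {
        name: resolve(name, frozenset())
        for name in extras
        if not name.startswith("_")
    }


raw = {
    "_video-base": ["hypothetical-video-lib"],
    "mp4": [">_video-base", "hypothetical-mp4-codec"],
    "webm": [">_video-base", "hypothetical-webm-codec"],
}
visible = public_extras(raw)
```

Here `visible` only has the `mp4` and `webm` keys, each including the shared base requirement, while `_video-base` itself is not installable as an extra.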


Does depending on yourself work?

In package beaglevote

[project.optional-dependencies]
test = [
  "pytest < 5.0.0",
  "pytest-cov[all]"
]
lint = [
  "black",
  "flake8"
]
ci = [
  "beaglevote[test]",
  "beaglevote[lint]"
]

Why do you feel that isn’t optimal? You have separated the concerns of what you’re installing, so if you want testing and linting, why is it sub-optimal to install those things specifically versus a “CI”/“dev” roll-up?

What happens if you accidentally have a name clash with a package as well? The extra names win? Expect a warning?

And just an FYI, flit install has a --deps option which can take develop, which installs the test, doc, and dev extras.

This looks like something that will cause a circular dependency. I’m not sure how each tool handles that when it is encountered, and it’s undocumented at best. Still, it seems like a valid option too: instead of the additional special syntax from my proposal, this could be the solution, as long as it ends up in some specification as a documented and official feature.

Maybe not in this exact scenario, as it is aimed at the specific use case of a CI process. But in a very complicated package, where a single base of additional requirements is needed for several sets of features, you will need to educate end users to install a “base” for each of the extras they need. A good example may be a file processing library that requires one separate library for recognizing video files plus a specific library for each of the video codecs you want to support. You’ll need to teach users that they have to specify two separate extras, one of which is common to every video format. Additionally, this common extra may be useless on its own.

This is why I’m proposing an extension to the syntax described in PEP 508 that is currently considered invalid, so it won’t cause any name clash. As an alternative, the syntax described above by dholth may be used, which is a valid PEP 508 expression and shouldn’t cause any name clash either.

In the past it has always worked to depend on your own extras, and it should continue to work. You can think of them as lists of dependencies aliased by the string packagename[extra-name]. Since packagename is already in the set of dependencies to be resolved, mentioning it in its own dependencies doesn’t cause a cycle.
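The aliasing idea can be modelled with a short Python sketch. This is only an illustration of treating packagename[extra-name] as a name for a list of dependencies, not how pip or any other resolver is implemented; the function name, the regular expression, and the simplified name normalization are my own assumptions.

```python
import re

# Matches a bare "name[extra1, extra2]" requirement with no version or marker.
SELF_REF = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*\[([^\]]*)\]\s*$")


def normalize(name):
    """Compare project names case-insensitively, treating -, _ and . alike."""
    return re.sub(r"[-_.]+", "-", name).lower()


def flatten_self_references(package, extras):
    """Replace 'package[extra]' entries with the requirements of that extra."""

    def resolve(extra, seen):
        if extra in seen:
            # Already being expanded, so it contributes nothing new (no cycle).
            return []
        if extra not in extras:
            raise KeyError(f"{package} does not provide the extra {extra!r}")
        seen = seen | {extra}
        out = []
        for req in extras[extra]:
            match = SELF_REF.match(req)
            if match and normalize(match.group(1)) == normalize(package):
                names = [n.strip() for n in match.group(2).split(",") if n.strip()]
                items = [item for n in names for item in resolve(n, seen)]
            else:
                items = [req]
            for item in items:
                if item not in out:
                    out.append(item)
        return out

    return {extra: resolve(extra, frozenset()) for extra in extras}


extras = {
    "test": ["pytest < 5.0.0", "pytest-cov[all]"],
    "lint": ["black", "flake8"],
    "ci": ["beaglevote[test]", "beaglevote[lint]"],
}
flat = flatten_self_references("beaglevote", extras)
```

Because an extra that is already being expanded is simply skipped, two extras that refer to each other still terminate, which mirrors the argument above that a self-reference doesn’t cause a cycle.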

If I was inventing an alias for packagename in this arrangement I would use ".[extra-name, extra2-name]"

Circular dependencies are something Python packaging is explicitly designed to allow, so this works and should continue to work. But I agree with Brett: the duplication in your example is not undesirable. There are cases where de-duplication is desired (as you mentioned), but in every situation I’ve encountered, it was a sign that the project should be broken into smaller parts (so each extra basically becomes a separate package) instead of remaining a monolith in which each extra uses only a subset of the code. So I’m inclined to say this could be a good idea, but only if there is a concrete real-world example where it is required rather than merely wanted.


Good to know that self-referencing is something that should already work in most cases. I think it’s worth documenting explicitly, so package maintainers know they can just use it.

My idea comes mostly from Poetry’s capabilities in that regard. I don’t have any real-world example where de-duplication is a must-have. I’m also raising this in case Poetry decides to implement PEP 621, so that there is an option to replace the existing feature with something more or less equivalent.

I’m a huge fan of “combined” dependencies, and I’ve looked for them with "[x]" and ".[x]" syntax; it’s the reason almost all my pure-Python repositories have a setup.py - I am manually combining dependencies there. For an example, look at almost any repo in Scikit-HEP (regardless of whether I worked on it, see Packaging - Scikit-HEP), or at cibuildwheel’s setup.py (pypa/cibuildwheel on GitHub, at commit 31770e9ac7161f7df417e46f712cc3368e0a6aa6). If I can replace it with <packagename>[extra], this will remove the need for setup.py files in a lot of places for me!

The main reason for having “combined” dependencies is to add common names, like “dev”, which is “all the development dependencies”, and “all”, which is all the extras. The reason this is important: there is no way to do discovery from PyPI via pip. So if I want to set up a dev environment for a package, I have to go look in setup.py, setup.cfg, or pyproject.toml and find what dependencies are defined. It might be something like [test,plot,docs]. Having a “dev” extra is really handy if most packages (like most Scikit-HEP packages) provide it.
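Roughly, the manual combining I mean looks like the Python sketch below. The extra names come from the example above, but the requirement lists and the helper name `add_rollups` are made up for illustration; real projects differ in which extras they fold into “dev”.

```python
# Hypothetical per-topic extras; the package lists are placeholders.
extras = {
    "test": ["pytest", "pytest-cov"],
    "docs": ["sphinx"],
    "plot": ["matplotlib"],
}


def add_rollups(extras, dev_parts):
    """Return a copy of extras with combined 'dev' and 'all' entries added."""
    combined = dict(extras)
    # "dev" is the union of the chosen development-related extras.
    combined["dev"] = sorted({req for name in dev_parts for req in extras[name]})
    # "all" is the union of every extra that was defined by hand.
    combined["all"] = sorted({req for reqs in extras.values() for req in reqs})
    return combined


extras_require = add_rollups(extras, ["test", "docs"])
```

The resulting dictionary is the kind of value that gets passed as `extras_require` to `setup()`, which is the only reason the setup.py file needs to exist in those projects.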

Aliases could also be useful, if you rename an extra but need to provide the old name for compatibility, etc.

A shortcut syntax would still be nice, having some way to see what extras are available would be nice, and having a way to install only extras would be great. But I’m very happy if even just this packagename[extra] works 🙂


I tried this and it does not fully work. It does not pull the extra from the current package, but rather from the existing cached packages. So if you add a new extra (bin in my case) and then depend on it, pip starts checking a long list of older releases:

Collecting cibuildwheel[bin]
  Using cached cibuildwheel-1.11.1-py3-none-any.whl (1.5 MB)
WARNING: cibuildwheel 1.11.1 does not provide the extra 'bin'
  Using cached cibuildwheel-1.11.0-py3-none-any.whl (1.5 MB)
WARNING: cibuildwheel 1.11.0 does not provide the extra 'bin'
  Using cached cibuildwheel-1.10.0-py3-none-any.whl (1.5 MB)
WARNING: cibuildwheel 1.10.0 does not provide the extra 'bin'

And so on: it never resolves, it just keeps printing lines like these. If you’ve already published a version with this extra, then it will “work”, but it isn’t actually grabbing the current package’s requirements; it uses those of the last published package, so this isn’t usable.

This has probably got something to do with how pip merges <local path> with cibuildwheel (or not) rather than caching. If you request .[all] then pip doesn’t know <local path> == "cibuildwheel". If you build a wheel and install that, then it’ll probably work.

I can’t find a pip issue tracking what you discovered here, so I filed one: Pip doesn’t allow you to self-depend · Issue #10393 · pypa/pip · GitHub

If anyone here filed one before, I’ll close mine.