Extending PEP 508: support dependencies in environment markers

I would like to propose extending PEP 508 to also allow for dependencies to appear in environment markers.

An example syntax in pyproject.toml might look like:

dependencies = [
    "a",
    "b",
    "a<2;b<1",
]

In this example, the package depends on both A and B. However, if B is at version < 1, A must be at version < 2. This would allow package authors to correct dependency issues in historical releases or in libraries that they do not control.
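For contrast, here is a minimal sketch of what environment markers can already express versus what is being proposed (a and b are the hypothetical packages from above; today's PEP 508 markers such as python_version can only refer to the environment, not to other packages in the resolution):

dependencies = [
    "a<2; python_version < '3.12'",  # allowed today: condition on the environment
    "a<2;b<1",                       # proposed: condition on another dependency's version
]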

Another example might be:

dependencies = [
    "a",
    "a[b];a>1",
]

In this example, package A has an extra named B, but that extra was not added until after version 1. All versions of A work, but if A is at a version > 1, the extra B should also be installed. This would allow support for required dependencies that were later moved to optional extras.


I don’t think I understand how this comes up in the first place.

It seems like this is trying to get around broken dependencies in other packages? If there’s some version relationship between A and B it should be expressed in those packages and the installer solves the problem on its own. If this relationship is being induced by this package somehow…it shouldn’t be doing that?

In either case, I don’t think it should become easier to do this.


The first example situation is trying to get around broken dependencies in other packages. Yes, this should be fixed in package A or B. However, these packages might be unmaintained, and even if this is fixed, it won’t be fixed for older releases.

The second example situation has nothing to do with broken dependencies. For newer versions of A, you require the extra, but for older versions of A the extra did not exist. I'm not aware of any way to encode this without access to the dependency version in an environment marker.

Do you have a concrete example of this?

Seems like it should be fine to list only a[b] > 1, even though it means that we are excluding a <= 1 for no good reason.
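In pyproject.toml terms, that workaround would be (a sketch using the hypothetical a and b from above):

dependencies = [
    "a[b]>1",  # require the extra, at the cost of excluding a<=1
]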

But why not just say A >= new_version? What’s the benefit to allowing the older version?

It feels like allowing this enables convoluted, hard-to-resolve environments when the better solution is to be a little more strict with dependencies (or fix/vendor the broken ones)

Yes, that’s currently the only workaround.

Allowing a larger range of versions makes it easier to resolve complex environments containing many packages.

For a concrete example, my library (torchgeo) depends on torchvision. Torchvision 0.17.1+ adds an optional dependency on gdown. However, this dependency was not documented (I’m currently trying to add an extra for it). So unless I want my library to only support unreleased versions of torchvision, I need some way to say:

dependencies = [
    "torchvision",
    "gdown>=4.7.3;torchvision>=0.17.1",
]

or:

dependencies = [
    "torchvision",
    "torchvision[gdown];torchvision>=0.19"
]

Note that in the second option, torchvision 0.17.1–0.18 would remain forever broken, and 0.19 has not yet been released. Yes, I could just always depend on gdown regardless of whether or not torchvision uses or needs it, but I’m trying to find a way to describe when this dependency is actually needed.
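For completeness, the unconditional fallback mentioned above would simply be:

dependencies = [
    "torchvision",
    "gdown>=4.7.3",  # installed unconditionally, even for torchvision versions that never use it
]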


Or you could have torchvision < 0.17.1, and release an update when your PR is in the next release?
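That is, roughly (a sketch of the suggestion; the cap would be lifted once a torchvision release actually ships the extra):

dependencies = [
    "torchvision<0.17.1",  # stay below the releases with the undeclared gdown requirement, for now
]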

Is this a problem you’ve encountered a lot before, or is it just this one time? This feels like swatting a fly with a sledgehammer to me. You don’t need to modify the requirements format for everyone, you need to wait a month for the next release of torchvision. Even if everyone agreed to this today, it wouldn’t be available for use for far longer.

It would be difficult to pin to older versions of torchvision for now. The deep learning field moves very quickly, and users will likely want support for newer features.

It would also be difficult to drop support for almost all versions of torchvision immediately. CPython is maintained for 5 years, and SPEC 0 dictates 3 years of support for scientific software.

Both example situations I described are extremely common in the Python ecosystem. For example, once numpy 2 is released, the vast majority of packages that depend on numpy will have broken dependency constraints for older versions. Should I require numpy 2 in my package? What if another package doesn’t yet work with numpy 2?

Another example I didn’t mention is when an extra is renamed or removed.

Only supporting a single version of a package is unfortunately not a solution, and only makes resolving a complex environment more difficult (see Should I be pinning my dependencies? - #10 by sonotley and Should You Use Upper Bound Version Constraints?).

Other package managers have no trouble supporting this. In Spack, this would look like:

depends_on("py-gdown", when="^py-torchvision@0.17.1:")

I wouldn’t describe this feature as a sledgehammer, but as a powerful tool. Once dependencies can refer to each other, we can more accurately describe the intricate interplay between dependency constraints.


If it’s a powerful tool, it should be applicable in a fairly large range of circumstances (at least, my definition of a “powerful tool” tends to involve being a general idea with wide applicability - you may have a different view). Assuming that’s the case, it would be great to see examples of the various ways this could be helpful. Note that:

  1. Fixing bad project dependencies is, like it or not, not a compelling use case. Bad dependencies are a well-known issue in the packaging ecosystem, and the general consensus is that they should be fixed by fixing the packages, not by providing ways for users to hack around the problem. Maybe that's not the right long-term solution, but you'd need to change the community view first.
  2. Packaging standards change slowly. You won’t get far presenting an immediate problem and proposing a standards change to address it. You’ll need to present the underlying long-term fundamental issue that is causing this (and presumably future) problem - and explain why a standards change is a better solution than addressing the issue directly.
  3. You'll need to look at the issues this new standard might cause. Are dependency resolution algorithms able to handle this sort of situation, where dependencies refer to each other? Is it even possible to solve such systems efficiently in the general case? (Even our current dependency system, where the dependency graph gets new nodes added as resolution proceeds, is too complex for a number of traditional algorithms, like SAT solvers.)

Just to point it out, this can be fixed with metapackages. The idea: define a package C that pins the oldest permissible version of A as a dependency, requires the corresponding compatible versions of B, and does nothing else. Then iterate, releasing a new version of C for each version of A (either inventing a new version scheme, or perhaps just copying A's version numbers). Other packages that need A and B can then just require C.
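A minimal sketch of one such metapackage release, reusing the hypothetical names a, b, and c (the specific pins are illustrative):

[project]
name = "c"
version = "2.0"            # mirrors the A version this release pins
dependencies = [
    "a==2.0",
    "b>=1,<2",             # whichever B range is known to work with this A
]

Downstream projects then depend on "c" instead of listing A and B directly.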

A mechanism like this could be used, e.g., as a way to solve the usual GPU problem with selector meta-packages:

[project]
name = "mypackage"
dependencies = [
    "mypackage-cuda; select-gpu-cuda>0.0.0",
    "mypackage-rocm; select-gpu-rocm>0.0.0",
]

% pip install select-gpu-cuda  # I have a CUDA device
% pip install mypackage otherpackage # will install packages with CUDA support

This seems like a heavy-handed approach for something that should be done at the level of the installation. I think the proper solution is an installer constraints/overrides file.
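For reference, pip already supports a constraints file at install time; a rough sketch of how an end user (not the library) might apply one to the torchvision case above (the file name and pin are illustrative, and constraints can only restrict versions of packages that would be installed anyway, not add new requirements):

# constraints.txt, maintained by whoever performs the install
torchvision<0.17.1

% pip install -c constraints.txt torchgeo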


If it isn't possible to encode proper dependency constraints directly in pyproject.toml, I don't think an additional constraints file would be a valid solution for our thousands of users.
