Proposal: Intersect and Disjoint Operations for Python Version Specifiers

This is loosely a follow-up to “Handling of Pre-Releases when backtracking”, with a proposal that helps (but doesn’t solve) that situation and, in general, makes it easier for Python package resolvers to agree on the handling of multiple specifiers.

Prior Art

See dep-logic created by @frostming, and a discussion on the packaging GitHub (that includes @pf_moore and @dstufft talking about some of the difficulties). There are also internal implementations in poetry-core, uv, and probably others I’m not aware of.

Goal

The goal of this proposal is to extend the existing specification (without contradicting it) by defining that for two version specifiers S1 and S2, there is an agreed intersection specifier S3.

If no versions satisfy the intersection of S1 and S2, they are considered “disjoint”.

This extension can be used during resolution to determine disjointness (and reject incompatible specifiers) or to simplify many specifiers into a single specifier.

Non-Goal

This is not intended to define what a Python resolver should do with the information. For example a Python resolver may choose to apply this at every resolution “round” to determine if two requirements are disjoint, but they may also choose to only apply it on the initial user requirements, etc.

Current Problems

Currently there are certain version specifiers that produce an “obvious” outcome when you take their intersection, e.g. <=1.0.0 & <=2.0.0 = <=1.0.0.

However, the main difficulty arises the moment you start including pre-releases. Starting with a fairly non-controversial example, <=1.0.0a1 & <=2.0.0 is intuitively <=1.0.0a1. But if we were to look at the set of versions from each specifier, V1 from <=1.0.0a1 and V2 from <=2.0.0, and took the intersection V1 & V2, then the specifier that produces this new version set is <1.0.0.
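To make the version-set argument concrete, here is a small sketch. It assumes a toy version model (a release tuple plus an optional alpha number, not a PEP 440 parser), a made-up list of published versions, and the reading that a specifier only admits pre-releases when it names one itself. It also ignores the “only a pre-release satisfies the specifier” fallback. All names are hypothetical.

```python
# Toy model (an assumption, not PEP 440): a version is (release, alpha),
# where alpha is None for a final release or an int N for "aN".
def sort_key(version):
    release, alpha = version
    # An alpha sorts before the final release with the same release tuple.
    return (release, 0, alpha) if alpha is not None else (release, 1, 0)


def is_prerelease(version):
    return version[1] is not None


def at_most(candidates, limit, include_prereleases):
    """Versions matched by `<=limit`, dropping pre-releases unless requested."""
    return {
        v for v in candidates
        if sort_key(v) <= sort_key(limit)
        and (include_prereleases or not is_prerelease(v))
    }


# Hypothetical published versions: 0.9, 1.0.0a1, 1.0.0, 2.0.0a1, 2.0.0
candidates = {
    ((0, 9), None), ((1, 0, 0), 1), ((1, 0, 0), None),
    ((2, 0, 0), 1), ((2, 0, 0), None),
}

v1 = at_most(candidates, ((1, 0, 0), 1), include_prereleases=True)      # <=1.0.0a1
v2 = at_most(candidates, ((2, 0, 0), None), include_prereleases=False)  # <=2.0.0
both = v1 & v2  # only 0.9 survives, i.e. the final releases below 1.0.0
```

Under these assumptions the set intersection drops 1.0.0a1, which is why it corresponds to <1.0.0 rather than the intuitive <=1.0.0a1.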

Assuming the intuitive answer (<=1.0.0a1) is adopted and we don’t worry about the set of versions, there are still lots of other examples that are tricky:

  • Is <=1.0.0a1 & <=2.0.0 the same as <=2.0.0 & <=1.0.0a1?
  • Is <=1.0.0 & <=2.0.0a1 equal to <=1.0.0 or <=1.0.0,prereleases=True?

Proposal

I propose extending the specification so that specifiers can be combined to take their “intersection”, along with a test for whether they are “disjoint”. This would allow you to “simplify” a specifier set by taking the intersection of all of its specifiers to reduce it to a single specifier.

To guide the answers to all of the tricky questions above, I propose it is made clear that for two version specifiers S1 and S2, S1 & S2 = S2 & S1 (i.e. intersection of specifiers is commutative), and that if S1 or S2 implies or explicitly includes prereleases then S1 & S2 must include prereleases.

This would need to be expanded with lots of examples, but I think it is sufficient to answer all the questions about pre-releases. And then, when there are no questions about pre-releases, S1 & S2 is the specifier that produces the version set V1 & V2, where V1 is the version set from S1, and V2 is the version set from S2. Again, a full spec would need to include lots of examples.
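As a rough illustration of the two rules (commutativity, and pre-releases propagating through the intersection), here is a minimal sketch. It only models `<=` specifiers, uses a toy version parser that handles plain releases and `aN` alphas with no zero padding, and the `AtMost` class and `parse` helper are hypothetical names, not part of packaging, dep-logic, or any other library.

```python
import re
from dataclasses import dataclass


def parse(text):
    """Sort key for a toy version such as '1.0.0' or '1.0.0a1' (no zero padding)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)*)(?:a(\d+))?", text)
    if match is None:
        raise ValueError(f"unsupported toy version: {text!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    alpha = match.group(2)
    # An alpha sorts before the final release with the same release tuple.
    return (release, 0, int(alpha)) if alpha is not None else (release, 1, 0)


@dataclass(frozen=True)
class AtMost:
    """Toy model of a `<=limit` specifier plus a prereleases flag."""
    limit: str
    prereleases: bool = False

    def __post_init__(self):
        # Naming a pre-release in the specifier implies pre-releases.
        if parse(self.limit)[1] == 0:
            object.__setattr__(self, "prereleases", True)

    def __and__(self, other):
        # The tighter upper bound wins; the flag is set if either side sets it.
        limit = min(self.limit, other.limit, key=parse)
        return AtMost(limit, self.prereleases or other.prereleases)


a, b = AtMost("1.0.0a1"), AtMost("2.0.0")
same_either_way = (a & b) == (b & a)
tricky = AtMost("1.0.0") & AtMost("2.0.0a1")
```

With these rules, the second tricky question from “Current Problems” would come out as <=1.0.0 with pre-releases included, since one of the operands names a pre-release.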

Open Question

If S1 and S2 are disjoint, should the result of S1 & S2 resolve to a standardized representation (e.g., EmptySpecifier as dep-logic does) or be None? Or should it be left to implementations to decide?

Next Steps

If there is consensus on this being a standard, I would be happy to draft a PEP, but I’m not familiar with that process. I assume I would need to find a sponsor?

Get through the initial discussion first, and then write the PEP. You don’t need the sponsor until you (a) present it for approval, or (b) need to commit it to the peps repo. Neither of those is required to post drafts here (or on your own repo/fork of the peps repo) and discuss them.

What Steve said, but I think the key thing is that in addition to what you said in your post, you need to specify what your proposed semantics would be.

There’s no point opening the floor for discussion with nothing to start from - you’ll either get a deafening silence (most likely) or a huge, rambling debate that reaches no useful conclusions. Whereas if you say “we should have an intersect operation that does this”, you’ll either get generalised approval[1] or you’ll get some specific feedback on edge cases that you either didn’t cover or didn’t think through well enough. Either of which is a win.


  1. It’s a pretty niche topic, I suspect few people will be that bothered 🙂

I think the key thing is that in addition to what you said in your post, you need to specify what your proposed semantics would be.

Thanks for the feedback, I think this is ‘just’ expanding what I wrote in my proposal to concrete semantics, I think everything I wrote in the proposal section will logically fill out the full semantics. I’ll start working on a pre-PEP.

There’s no point opening the floor for discussion with nothing to start from - you’ll either get a deafening silence (most likely) or a huge, rambling debate that reaches no useful conclusions

I’m honestly interested if there is enough of a different viewpoint here to start a debate. While logically you can develop a completely different notion of an intersection, I think that all real world resolvers take this same viewpoint as I’ve outlined in my proposal.

Whereas if you say “we should have an intersect operation that does this”, you’ll either get generalised approval

Well, that is what my proposal section does… Perhaps I’ve written it in the wrong voice.

It’s a pretty niche topic, I suspect few people will be that bothered

Well, it’s fairly important to anyone who wants to write a Python package resolver and keep it consistent with other resolvers, but admittedly that’s probably only a handful of people.

Maybe. I think I was looking for something that looked more like code, and missed that it was actually more precise than I realised. My bad. Having said that, I think you should at least work through the problem cases you mentioned in the “Current problems” section, and probably add a number of other examples just to demonstrate how it would work in practice.

You could also do with stating explicitly whether the 3 implementations you mention follow the spec you’re proposing, or would they need changes to conform?

On the open question, I think you have to formally introduce the concept of a specifier that matches nothing at all. It doesn’t need a string representation, as no-one would ever need to type it into a requirement, but it does need to exist. And it should be what S₁ & S₂ returns if S₁ and S₂ are disjoint - we’d want the & operator to be type safe. How implementations represent such an “empty” specifier can be left for them to decide.
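One way to picture a type-safe `&` with an explicit “matches nothing” value is sketched below. It assumes a toy specifier that is just an inclusive range over release tuples; `Bounds` and `EMPTY` are hypothetical names, and how a real implementation represents the empty specifier is left open, as suggested above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Release = Tuple[int, ...]


@dataclass(frozen=True)
class Bounds:
    """Toy specifier: inclusive [lower, upper] over release tuples; None is unbounded."""
    lower: Optional[Release] = None
    upper: Optional[Release] = None
    empty: bool = False

    def __and__(self, other: "Bounds") -> "Bounds":
        # The result is always a Bounds, never None, so `&` chains safely.
        if self.empty or other.empty:
            return EMPTY
        lowers = [b for b in (self.lower, other.lower) if b is not None]
        uppers = [b for b in (self.upper, other.upper) if b is not None]
        lower = max(lowers) if lowers else None
        upper = min(uppers) if uppers else None
        if lower is not None and upper is not None and lower > upper:
            return EMPTY
        return Bounds(lower, upper)

    def is_disjoint(self, other: "Bounds") -> bool:
        return (self & other).empty


# The single "matches nothing" value; it needs no string representation.
EMPTY = Bounds(empty=True)

at_most_1 = Bounds(upper=(1, 0))
at_least_2 = Bounds(lower=(2, 0))
nothing = at_most_1 & at_least_2
```

Because `EMPTY & anything` is again `EMPTY`, a resolver could fold `&` over any number of specifiers and check for disjointness once at the end.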

This is very helpful feedback thanks

I think I was looking for something that looked more like code,

I purposefully avoided code because increasingly implementations are less likely to be written in Python, but it would make sense as an example to give people a concrete idea of what I’m talking about, if not as the actual spec.

Here’s what uv is doing for prereleases:

In uv, we parse the version specifiers with pep440_rs and then translate them to Ranges, a type made of segments of inclusive/exclusive/unbounded-ended intervals that we can apply all the usual set operations to. Ranges have a single canonical form that every transformation preserves.

In uv, we use a really simple heuristic for prereleases (Resolution | uv). One of the following conditions needs to be met:

  1. If the package is a direct dependency, and its version specifiers include a pre-release specifier (e.g., flask>=2.0.0rc1).
  2. If all published versions of a package are pre-releases.

There are also CLI flags as overrides. Despite this leaving out a lot of the heuristics we could be doing, we have gotten very few user reports about this.

We record if either of the above conditions is met, but we don’t do any transformations on versions based on it. We report to the user if there is a conflict while a prerelease version was not used, so they can add a prerelease specifier to the direct dep or set --prerelease allow if they want to.
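The heuristic described above can be sketched roughly as follows. This is not uv’s code (uv is written in Rust); the function name, parameters, and the boolean model of the CLI override are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def prereleases_allowed(
    is_direct_dependency: bool,
    specifier_mentions_prerelease: bool,
    published_is_prerelease: Sequence[bool],
    cli_override: Optional[bool] = None,
) -> bool:
    """Hypothetical sketch of the two conditions plus a CLI override."""
    # A CLI flag, if given, wins over the heuristic.
    if cli_override is not None:
        return cli_override
    # Condition 1: a direct dependency whose specifier names a pre-release,
    # e.g. flask>=2.0.0rc1.
    if is_direct_dependency and specifier_mentions_prerelease:
        return True
    # Condition 2: every published version of the package is a pre-release.
    if published_is_prerelease and all(published_is_prerelease):
        return True
    return False


direct = prereleases_allowed(True, True, [False, True])
transitive = prereleases_allowed(False, True, [False, True])
only_prereleases = prereleases_allowed(False, False, [True, True])
```

Note that in this sketch a pre-release specifier on a transitive dependency does not enable pre-releases on its own, which matches condition 1 being limited to direct dependencies.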

For the rest, I need to explain pubgrub decision making (badly, see PubGrub: Next-Generation Version Solving | by Natalie Weizenbaum | Medium and https://pubgrub-rs-guide.pages.dev/ for the real thing): In pubgrub, you select the package you want to try picking a version for next. You get a range that’s the obligations from the previous decisions minus previously discarded versions, because we don’t want to try things we already know don’t work. If a version is incompatible, pubgrub asks you to pick another version, until you say there is no version, at which point it backtracks. (This loop goes on until all packages have compatible decisions or we know there can be no solution.) In the version selection step, we usually pick the highest not-yet-tried version, and then go down until we’re all out of compatible versions.

Looking at the highest version in range, this can either be a prerelease or a stable release. If neither of the above two conditions is met and no CLI override is given, we skip over the prereleases. We only report stable releases to pubgrub, and we say we’re out of versions even if there would be prereleases in the range. We have some extra logic around simplifying ranges because pubgrub doesn’t know if there is another version between two versions (having tried 1.2.4 and 1.2.3, can a 1.2.3.1 exist?), but the real prerelease handling happens only in version selection. This way, we avoid dealing with prereleases in version set operations.
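A very rough sketch of that version selection step, under stated assumptions: candidates are modelled as `(release, is_prerelease)` pairs, the range is an arbitrary predicate, and `pick_version` is a hypothetical name. It is not pubgrub’s or uv’s API, only an illustration of skipping pre-releases at selection time instead of in the set operations.

```python
def pick_version(candidates, in_range, already_tried, allow_prereleases):
    """Return the highest untried candidate in range, or None if we are out."""
    # A pre-release sorts just below the final release with the same release tuple.
    ordered = sorted(candidates, key=lambda c: (c[0], not c[1]), reverse=True)
    for candidate in ordered:
        if candidate in already_tried or not in_range(candidate):
            continue
        # Pre-releases are skipped here; the range itself is left untouched.
        if candidate[1] and not allow_prereleases:
            continue
        return candidate
    return None


# Hypothetical package with versions 1.0 (final) and 2.0a1 (pre-release).
candidates = [((1, 0), False), ((2, 0), True)]
in_range = lambda c: c[0] >= (1, 0)

first = pick_version(candidates, in_range, set(), allow_prereleases=False)
second = pick_version(candidates, in_range, {first}, allow_prereleases=False)
with_pre = pick_version(candidates, in_range, set(), allow_prereleases=True)
```

Here `second` is `None`: the selector reports that it is out of versions even though a pre-release is still inside the range, which is the behaviour described above.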

Thanks for the detailed explanation of what uv does.

I think though uv has made some design choices that don’t work quite the same as other resolvers, for example if the user requires A==1.0.0, and A-1.0.0 requires B >= 2.0.0a0, and the set of versions of B available are {1.0.0, 2.0.0a1} my understanding is uv will report this as an impossible resolution?

I’m pretty sure this isn’t compliant with the standard. Specifically,

Pre-releases of any kind, including developmental releases, are implicitly excluded from all version specifiers, unless they are already present on the system, explicitly requested by the user, or if the only available version that satisfies the version specifier is a pre-release.

Note that the final possibility is if the only available version “that satisfies the version specifier” is a pre-release. So if foo 1.0 and foo 2.0a1 exist, the standard says that foo>1.0 should pick foo 2.0a1. Your rules don’t do that.

Maybe what you do is “good enough”, but if you’re not following the standards, you need to either say so, or propose an amendment to the standards based on your real-world experience.
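The reading of the standard described in this post can be sketched as a filter with a fallback. This is a toy model, not packaging’s implementation: versions are `(release, is_prerelease)` pairs, the comparison is a plain predicate on the release tuple, and `filter_versions` is a hypothetical name.

```python
def filter_versions(candidates, satisfies, prereleases=None):
    """Final releases that satisfy the specifier, else matching pre-releases."""
    matching = [c for c in candidates if satisfies(c)]
    if prereleases is True:  # explicitly requested by the user
        return matching
    finals = [c for c in matching if not c[1]]
    if prereleases is False:  # explicitly refused by the user
        return finals
    # Default: fall back to pre-releases only when no final release matches.
    return finals if finals else matching


# foo 1.0 and foo 2.0a1 exist; the requirement is foo>1.0.
foo = [((1, 0), False), ((2, 0), True)]
picked = filter_versions(foo, lambda c: c[0] > (1, 0))

# 0.9 and 1.0a1 exist; the requirement is <=2.0.0.
other = [((0, 9), False), ((1, 0), True)]
stable_only = filter_versions(other, lambda c: c[0] <= (2, 0))
```

Under this reading `picked` contains 2.0a1 because no final release satisfies foo>1.0, while `stable_only` contains just 0.9 because a final release is available.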

Note that the final possibility is if the only available version “that satisfies the version specifier” is a pre-release. So if foo 1.0 and foo 2.0a1 exist, the standard says that foo>1.0 should pick foo 2.0a1. Your rules don’t do that.

This was discussed at length in previous discussions on discuss, and pip and uv github issues trackers. There are multiple ways you can read the different prerelease lines in the specification.

uv matches packaging here:

>>> from packaging.specifiers import SpecifierSet
>>> list(SpecifierSet('>1.0').filter(["1.0", "2.0a1"]))
[]

Edit: I realized I might be using the wrong interface and Specifier (rather than SpecifierSet) does behave how this line is interpreted:

>>> from packaging.specifiers import Specifier
>>> list(Specifier('>1.0').filter(["1.0", "2.0a1"]))
['2.0a1']

Which is interesting, because pip definitely doesn’t. I am going to do some further investigation here.

Do you have references? And regardless, someone needs to propose an update to the spec so that the interpretation I just described is clearly excluded. I’ll do it myself if there’s a clear consensus.

Apologies if it turns out that I was involved in those discussions. I genuinely don’t recall anything specific about this.

That would be the obvious intersection, not a union.

Are you also interested in unions? If not, why not?

But if we were to look at the set of versions from each specifier, V1 from <=1.0.0a1 and V2 from <=2.0.0, and took the intersection V1 & V2 then the specifier that produces this new version set is <1.0.0

I do not think this is correct. Is your point intended to be that <=2.0.0 excluded pre-releases? But that is not what the specs say, inclusive comparisons do not exclude pre-releases.

And even <2.0.0 only excludes pre-releases of the specified version. So the intuitive <=1.0.0a1 is the correct intersection all along, no?

I think the correct values of unions and intersections are already implied by existing specifications, though - whether I am right or wrong about your examples - we can agree that there is room for more clarity. ie we do not need a standard to present any new definitions: but lots of examples would be welcome.

There are multiple ways you can read the different prerelease lines in the specification.

Well, I am about to demonstrate that this is true, because I cannot see how excluding 2.0a1 from >1.0 is justified. I do not see a way to read the specification to say that; but if you do then you do!

Do you have references? And regardless, someone needs to propose an update to the spec so that the interpretation I just described is clearly excluded. I’ll do it myself if there’s a clear consensus.

https://github.com/pypa/pip/issues/12469
https://discuss.python.org/t/handling-of-pre-releases-when-backtracking/40505/6
https://github.com/prefix-dev/rip/issues/118
https://github.com/astral-sh/uv/issues/1641#issuecomment-1951393681

I’m not sure there is a clear consensus because Poetry implements it according to how you just interpreted the spec (and how I’ve argued it reads in the past but I got very tired of making that case).

That would be the obvious intersection, not a union.

Thanks, copy and paste typo, I’ve fixed it.

Are you also interested in unions? If not, why not?

No, I’m not interested because you can’t specify unions in SpecifierSets, only intersections, i.e. you can’t make a Python packaging requirement with a union (e.g. <1 or >3).

I do not think this is correct. Is your point intended to be that <=2.0.0 excluded pre-releases? But that is not what the specs say, inclusive comparisons do not exclude pre-releases.

The terms inclusive / exclusive are only meant to refer to whether a specifier whose version is itself a prerelease also implies that prereleases are being considered, and the only exclusive operator is “!=”:

https://github.com/pypa/packaging/issues/776#issuecomment-1900515985
https://github.com/pypa/packaging/issues/788
https://github.com/pypa/packaging/pull/794

And even <2.0.0 only excludes pre-releases of the specified version. So the intuitive <=1.0.0a1 is the correct intersection all along, no?

That is not how the spec is interpreted by packaging, poetry, or uv.

I think the correct values of unions and intersections are already implied by existing specifications

The spec very much does not say what you should do in the face of two specifiers, as discussed in https://discuss.python.org/t/handling-of-pre-releases-when-backtracking/40505

And the nuances that are already brought up by Paul in https://discuss.python.org/t/proposal-intersect-and-disjoint-operations-for-python-version-specifiers/71888/9 and myself in https://discuss.python.org/t/proposal-intersect-and-disjoint-operations-for-python-version-specifiers/71888/8 show that resolvers do not take the same approach when implicitly intersecting specifiers.

Well, I am about to demonstrate that this is true, because I cannot see how excluding 2.0a1 from >1.0 is justified. I do not see a way to read the specification to say that; but if you do then you do!

Because of the following line in the spec:

Pre-releases of any kind, including developmental releases, are implicitly excluded from all version specifiers

That is not how the spec is interpreted by packaging, poetry, or uv.

You are mistaken, try this:

from poetry.core.constraints.version.parser import parse_constraint

v1 = parse_constraint("<2.0.0")
v2 = parse_constraint("<=1.0.0a1")
print(v1.intersect(v2))

outcome: <=1.0.0a1

Pre-releases of any kind, including developmental releases, are implicitly excluded from all version specifiers

I think that relying on this is a fairly tortured read: though again I emphasise that I agree that the clarity could be improved. Clearly the spec does not require that all version specifiers exclude all pre-releases. This section is more about the behaviour of installers interacting with available packages than it is about the intrinsic meaning of the version specifier.

You are mistaken, try this:

from poetry.core.constraints.version.parser import parse_constraint
v1 = parse_constraint("<2.0.0")
v2 = parse_constraint("<=1.0.0a1")
print(v1.intersect(v2))

outcome: <=1.0.0a1

Sorry, maybe I was misreading your post. The point I was making is that with the specifier “<2.0.0” you still won’t get the version “1.0.0a1” if there is a non-prerelease available even if it is smaller than “1.0.0a1” e.g. “0.9”.

You are correct that Poetry takes this same intuitive approach to intersecting specifiers, and I would like to formalize this intuitive approach.

Pre-releases of any kind, including developmental releases, are implicitly excluded from all version specifiers

I think that relying on this is a fairly tortured read: though again I emphasise that I agree that the clarity could be improved. Clearly the spec does not require that all version specifiers exclude all pre-releases. This section is more about the behaviour of installers interacting with available packages than it is about the intrinsic meaning of the version specifier.

You might be correct but the context I read it in is they are the first words in the subsection “Handling of pre-releases” in the section “Version specifiers” of the spec: https://packaging.python.org/en/latest/specifications/version-specifiers/#handling-of-pre-releases

And as I understand it this is what all implementations follow (I didn’t maintain any of these implementations though so please take it up with them or start a new thread if you don’t think they are following the spec).

I think that the way to make sense of the spec is to treat the version specifiers (and their intersections) as having some intrinsic meaning. Then, additionally, the handling of that meaning by a resolver - especially when it comes to pre-releases - has further rules.

eg again from poetry

from poetry.core.constraints.version.parser import parse_constraint
from poetry.core.constraints.version.version import Version


v1 = parse_constraint("<2.0.0")
ver = Version.parse("1.0.0a1")
print(v1.allows(ver))

outcome: True

but even so: if there are matching non-prereleases available then the pre-release should not be chosen.

My understanding is that uv is the outlier in its handling of pre-releases and is indeed not spec-compliant. I think they already know this, anyway I do not particularly care to raise an issue about it.

eg again from poetry

from poetry.core.constraints.version.parser import parse_constraint
from poetry.core.constraints.version.version import Version

v1 = parse_constraint("<2.0.0")
ver = Version.parse("1.0.0a1")
print(v1.allows(ver))

outcome: True

That wasn’t the example I gave; my example was about when a non-pre-release is available. You’re likely running into this part of the spec:

accept remotely available pre-releases for version specifiers where there is no final or post release that satisfies the version specifier

It’s the same for packaging:

>>> from packaging.specifiers import Specifier
>>> list(Specifier('<=2.0.0').filter(["1.0a1"]))
['1.0a1']

But if you add a candidate that is not a prerelease it will not select “1.0a1”:

>>> list(Specifier('<=2.0.0').filter(["0.9", "1.0a1"]))
['0.9']

This has also been discussed several times, please at least read the thread I linked originally.

My understanding is that uv is the outlier in its handling of pre-releases and is indeed not spec-compliant. I think they already know this, anyway I do not particularly care to raise an issue about it.

This is off topic, if you have a concrete example of uv not following the spec please do raise this on the uv issue tracker, I’ll be happy to look there.

@pf_moore FYI, I made a correction in https://discuss.python.org/t/proposal-intersect-and-disjoint-operations-for-python-version-specifiers/71888/10

I’ve made a packaging issue: https://github.com/pypa/packaging/issues/856

Depending on the answer of packaging maintainers it may be required to move pip away from SpecifierSet to Specifier where possible.

I think you have misunderstood my intention. I believe we agree on the expected behaviour. I am trying to provide a mental model in which that behaviour is consistent and relatively straightforward to understand - which I hope in the end will help your proposed document to contain correct and useful examples.

This is off topic, if you have a concrete example of uv not following the spec please do raise this on the uv issue tracker

There are several messages in this thread on the topic of how uv behaves and whether or not that is compliant, starting with your own - which also contains a relevant example. There have also been issues reported already to uv; “`uv pip install` fails to install package while `pip install` succeeds” (Issue #9078, astral-sh/uv on GitHub) is a recent real-world example. Personally I prefer not to provide astral with free labour.

From a quick skim, I think these links demonstrate more that the spec is unclear than that there are multiple readings that people are willing to defend. But I agree that it needs more discussion, rather than being a simple clarifying edit.

Nevertheless, I don’t think this proposal can proceed until we have clarity on the prerelease behaviour in the spec. After all, the definition of intersection is obvious[1] and uncontroversial apart from how it interacts with the prerelease rules. So you need to get the foundation stable before you build on it.

I’m happy to have the initial discussion about prereleases here, but ultimately I think it requires its own thread, for visibility if nothing else.

For what it’s worth, my main thoughts are as follows:

  1. It was a mistake to mix version comparison and prerelease rules. Version comparison itself is simple, and the prerelease rules are less about whether a prerelease version satisfies a comparison, and more about whether selections should “see” prereleases or not. I’d rather that a revised definition separates these two aspects cleanly.
  2. The basic rule should be that prereleases are ignored unless the user explicitly asks for them to be included. Again, that’s simple. The problem lies with the “do what I mean” exceptions.
  3. The fact that “>=2.0” doesn’t include 2.0a1 has proved confusing to people, and I can see why. But I think changing that should be kept firmly out of scope here, as it’s a much more disruptive change.

  1. barring a couple of edge cases - compatible release and arbitrary equality, which you need to add to the spec for intersection
