Proposal: Intersect and Disjoint Operations for Python Version Specifiers

Nevertheless, I don’t think this proposal can proceed until we have clarity on the prerelease behaviour in the spec. After all, the definition of intersection is obvious[1] and uncontroversial apart from how it interacts with the prerelease rules. So you need to get the foundation stable before you build on it.

I’m happy to have the initial discussion about prereleases here, but ultimately I think it requires its own thread, for visibility if nothing else.

I was quite careful to construct my proposal so that it didn’t rely on how prereleases are interpreted for a single specifier. The proposal tells you how to get from two specifiers to one specifier; it doesn’t tell you how to interpret a single specifier with respect to prerelease versions. That’s left to the rest of the existing spec, so I agree that how a single specifier interacts with prereleases is off-topic here and should be taken to another thread.

If you would like to start that discussion, I would be happy to participate if I felt my voice was helpful. But I’ve tried multiple times to point out that the spec is unclear on prereleases, and each time it hasn’t yielded constructive outcomes; it seems I am missing some important community communication skill or something.


My instinct is that it’s not possible to do that successfully. I’m happy if you want to develop your proposal into a PEP on the assumption that it is possible, but if you do, I will try to break it with edge cases, and I won’t accept it[1] unless it’s proved robust. IMO, the last thing we want is to make the confusion around pre-releases worse than it currently is.

I guess I might have to :slightly_frowning_face: I don’t think we can leave things as they are indefinitely - especially as we now have multiple implementations of the relevant specs and tools that rely on them, we need more clarity.

Pointing out that it’s unclear seems to just result in people saying “yep, I guess so, but I think it means X”. But there’s no obvious route to consensus with that. I think that what’s needed is mostly just for someone to say “this is what the rules should be” in a way that is clear, and dare anyone to object :slightly_smiling_face: It’s much easier to debate an actual proposal than a broad statement that “something needs to be done”. Of course, I could easily be proved wrong…


  1. assuming I’m PEP delegate


My instinct is that it’s not possible to do that successfully. I’m happy if you want to develop your proposal into a PEP on the assumption that it is possible, but if you do, I will try to break it with edge cases, and I won’t accept it[1] unless it’s proved robust. IMO, the last thing we want is to make the confusion around pre-releases worse than it currently is.

Great, I wouldn’t want it to be accepted if it had breaking edge cases. This discussion, while I still feel it’s not strictly related to my proposal, has been invaluable for thinking about how to word a PEP.

Pointing out that it’s unclear seems to just result in people saying “yep, I guess so, but I think it means X”. But there’s no obvious route to consensus with that. I think that what’s needed is mostly just for someone to say “this is what the rules should be” in a way that is clear, and dare anyone to object :slightly_smiling_face: It’s much easier to debate an actual proposal than a broad statement that “something needs to be done”. Of course, I could easily be proved wrong…

Yes, I guess I came in assuming there would be an existing group of people actively working on the spec and willing to change things in the face of issues. I’m more used to reporting problems with software than with specifications, so my expectations are probably misaligned.

FWIW, I’m happy to help if you have questions on the PEP process. As I’m likely to be PEP delegate, I’d prefer not to also be the sponsor for the PEP, but that doesn’t mean I can’t help if you need it. In return, when I have a draft of the pre-release handling proposal, would you be willing to give it an initial review before I post it?

By the way, having just done a deep dive into the specifier spec, I’m much clearer on where the ambiguities lie, and I think they are probably not going to impact your proposal here much. What will impact your proposal is the rules for how < is handled, which are not unclear at all, but which have some nasty mathematical properties that you’ll need to consider[1]. I think you’re alright (I just deleted a long example that I thought would fail, because I realised it was fine!) but you should check things through carefully.


  1. Specifically, just because a version satisfies < 1.0a1 does not mean it satisfies < 2.0, even though 2.0 is greater than 1.0a1
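The footnote's point can be checked with a toy model. This is a simplified stand-in, not the `packaging` library: `parse_version` (hypothetical) handles only `N.N` releases with an optional `aN`/`bN`/`rcN` suffix, and `contains_lt` (also hypothetical) applies the default rule that a specifier admits pre-releases only when it names one itself, which matches the `Specifier.contains` output shown later in this thread.

```python
import re

# Hypothetical toy subset of PEP 440: N(.N)* with an optional aN/bN/rcN suffix.
_VERSION_RE = re.compile(r"(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+))?")
_PHASE = {"a": 0, "b": 1, "rc": 2}


def parse_version(text):
    """Return a sortable key (release, pre); finals sort after their pre-releases."""
    m = _VERSION_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"unsupported version: {text!r}")
    release = [int(part) for part in m.group(1).split(".")]
    while len(release) > 1 and release[-1] == 0:  # treat 1.0 and 1 as equal
        release.pop()
    pre = (0, _PHASE[m.group(2)], int(m.group(3))) if m.group(2) else (1, 0, 0)
    return (tuple(release), pre)


def is_pre(key):
    return key[1][0] == 0


def contains_lt(bound, candidate):
    """Hypothetical check for '<bound' with default pre-release handling."""
    b, c = parse_version(bound), parse_version(candidate)
    if is_pre(c) and not is_pre(b):
        # Implicit exclusion: '<bound' names no pre-release, so none are admitted.
        # This also covers the exclusive-'<' rule for pre-releases of bound itself.
        return False
    return c < b


print(parse_version("2.0") > parse_version("1.0a1"))  # True
print(contains_lt("1.0a1", "0.9a1"))                  # True
print(contains_lt("2.0", "0.9a1"))                    # False
```

So 0.9a1 satisfies the tighter bound but not the looser one: the "opt-in" comes from the bound, not from where the candidate sits in the ordering.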

when I have a draft of the pre-release handling proposal, would you be willing to give it an initial review before I post it?

Very happy to help.

Specifically, just because a version satisfies < 1.0a1 does not mean it satisfies < 2.0, even though 2.0 is greater than 1.0a1

Oh, I know, and it’s closely tied to the first problem I outlined.

Without significantly breaking existing behavior, I think any clarification needs to separate the concept of specifiers from that of version sets, and to recognize that there are two version sets a specifier can act on: the final+post version set, and the final+post+prerelease version set. A specifier then acts on the final+post version set by default, but if some clearly outlined condition is met (such as the specifier including a version that only exists in the final+post+prerelease version set), it acts on the final+post+prerelease version set. What’s important, and why I feel my proposal is needed, is that this is not normal mathematics: the intersection of specifiers does not always equal the intersection of the version sets they represent. But I’m happy to follow up with you directly.
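A minimal sketch of the two-version-set idea, under loudly stated assumptions: a toy version model (no post, dev, or local segments, so "final+post" collapses to "final"), a hypothetical rule that a group of specifiers acts on the full set whenever any member names a pre-release, and made-up helper names (`acts_on`, `select`, `compare`). It shows the per-specifier version sets intersecting to something different from what the combined specifiers select.

```python
import re

# Hypothetical toy helpers (assumption), not the packaging library.
_VERSION_RE = re.compile(r"(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+))?")
_PHASE = {"a": 0, "b": 1, "rc": 2}
_SPEC_RE = re.compile(r"(<=|>=|==|!=|<|>)\s*(\S+)")


def parse_version(text):
    m = _VERSION_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"unsupported version: {text!r}")
    release = [int(part) for part in m.group(1).split(".")]
    while len(release) > 1 and release[-1] == 0:
        release.pop()
    pre = (0, _PHASE[m.group(2)], int(m.group(3))) if m.group(2) else (1, 0, 0)
    return (tuple(release), pre)


def is_pre(key):
    return key[1][0] == 0


def split_spec(spec):
    m = _SPEC_RE.fullmatch(spec.strip())
    if m is None:
        raise ValueError(f"unsupported specifier: {spec!r}")
    return m.group(1), parse_version(m.group(2))


def compare(op, bound, cand):
    """Ordered comparison plus the exclusive '<V' pre-release rule."""
    if op == "<":
        if is_pre(cand) and cand[0] == bound[0] and not is_pre(bound):
            return False
        return cand < bound
    if op == ">":
        return cand > bound  # no post/local segments in this toy model
    return {"<=": cand <= bound, ">=": cand >= bound,
            "==": cand == bound, "!=": cand != bound}[op]


def acts_on(specs, universe):
    """Hypothetical rule: full set if any specifier names a pre-release, else finals."""
    use_full = any(is_pre(split_spec(s)[1]) for s in specs)
    return [v for v in universe if use_full or not is_pre(parse_version(v))]


def select(specs, universe):
    """Versions from the chosen version set that satisfy every specifier."""
    return {v for v in acts_on(specs, universe)
            if all(compare(*split_spec(s), parse_version(v)) for s in specs)}


UNIVERSE = ["1.0a1", "1.0", "1.5a1", "1.5", "2.0"]
per_spec = select(["<2.0"], UNIVERSE) & select([">=1.0a1"], UNIVERSE)
combined = select(["<2.0", ">=1.0a1"], UNIVERSE)
print(sorted(per_spec, key=parse_version))  # ['1.0', '1.5']
print(sorted(combined, key=parse_version))  # ['1.0a1', '1.0', '1.5a1', '1.5']
```

Under this rule, intersecting the two version sets loses 1.0a1 and 1.5a1, while evaluating the specifiers together keeps them.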


I look at it the other way round. There are two different operations specifiers participate in. One is “does this version match this specifier?” This one is clearly and unambiguously defined in the spec. The second is “what versions from this list match the specifier?” (a filtering operation). This is the one with the badly-worded special cases, and even then, only if the filtered set contains both pre-release and final versions. In that case, and that case alone, “did the user specify --pre” applies. I haven’t finished working through the details, but I’m pretty sure those are the key details - and worded that way, they aren’t ambiguous (or hard to explain).

As the filtering operation works as “first, filter the set using the matching rule, then decide what part of the resulting set to return”, the two operations are consistent and the special case rules are nicely limited in their impact.

Of course, there’s now the problem of deciding which operation is the appropriate one in any given situation, but that’s an application problem, not a specification one :slightly_smiling_face:
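The two-step description can be sketched as follows, assuming a toy version model (only `N.N` releases and `aN`/`bN`/`rcN` pre-releases) and hypothetical names `match` and `filter_versions`; `pre` stands in for `--pre`. Matching applies only the comparison rules, and the pre-release decision is made afterwards on the matched set, and only when that set mixes pre-releases and finals.

```python
import re

# Hypothetical toy helpers (assumption), not the packaging library.
_VERSION_RE = re.compile(r"(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+))?")
_PHASE = {"a": 0, "b": 1, "rc": 2}
_SPEC_RE = re.compile(r"(<=|>=|==|!=|<|>)\s*(\S+)")


def parse_version(text):
    m = _VERSION_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"unsupported version: {text!r}")
    release = [int(part) for part in m.group(1).split(".")]
    while len(release) > 1 and release[-1] == 0:
        release.pop()
    pre = (0, _PHASE[m.group(2)], int(m.group(3))) if m.group(2) else (1, 0, 0)
    return (tuple(release), pre)


def is_pre(key):
    return key[1][0] == 0


def split_spec(spec):
    m = _SPEC_RE.fullmatch(spec.strip())
    if m is None:
        raise ValueError(f"unsupported specifier: {spec!r}")
    return m.group(1), parse_version(m.group(2))


def compare(op, bound, cand):
    """Ordered comparison plus the exclusive '<V' pre-release rule."""
    if op == "<":
        if is_pre(cand) and cand[0] == bound[0] and not is_pre(bound):
            return False
        return cand < bound
    if op == ">":
        return cand > bound
    return {"<=": cand <= bound, ">=": cand >= bound,
            "==": cand == bound, "!=": cand != bound}[op]


def match(spec, version):
    """Step 1: the matching rule alone, with no pre-release heuristics."""
    return compare(*split_spec(spec), parse_version(version))


def filter_versions(spec, versions, pre=False):
    """Step 2: decide what part of the matched set to return."""
    matched = [v for v in versions if match(spec, v)]
    finals = [v for v in matched if not is_pre(parse_version(v))]
    # Only a mixed set of pre-releases and finals consults the --pre flag.
    return matched if pre or not finals else finals


VERSIONS = ["0.9", "1.0a1", "1.0", "1.1a1"]
print(filter_versions("<1.1", VERSIONS))            # ['0.9', '1.0']
print(filter_versions("<1.1", VERSIONS, pre=True))  # ['0.9', '1.0a1', '1.0']
print(filter_versions(">1.0", VERSIONS))            # ['1.1a1']
```

The last call shows the "only pre-releases matched" case: with no final to prefer, the pre-release is returned even without `--pre`.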

Oh, very definitely!

IMO the mistake here is thinking of specifiers as representing sets. They don’t, they represent operations on sets of versions. Thinking that way makes things a lot clearer (IMO).

As the filtering operation works as “first, filter the set using the matching rule, then decide what part of the resulting set to return”, the two operations are consistent and the special case rules are nicely limited in their impact.

My concern with this approach is how you extend it to multiple specifiers: either in my example, where you want to simplify (or intersect) two specifiers into one; or in the case of packaging, where you have a SpecifierSet; or in a Python package resolver that must decide if “0.9a1” is valid given the two specifiers “<1.0” and “<2.0a1”.

Unless the spec includes a way to reduce two specifiers to one specifier (a.k.a. my proposal), Packaging, Poetry, and uv can all disagree on how to handle this and still claim they are following the spec.
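To make the possible disagreement concrete, here is a sketch of two readings a tool could adopt for 0.9a1 against `<1.0` and `<2.0a1`, on a toy version model. Both readings and the helper name `allows` are illustrative assumptions, not taken from any of the named tools: reading A lets each specifier opt into pre-releases only if it names one itself; reading B lets a pre-release named anywhere in the group opt the whole group in. The last call checks the naive simplification to `<1.0`.

```python
import re

# Hypothetical toy helpers (assumption), not the packaging library.
_VERSION_RE = re.compile(r"(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+))?")
_PHASE = {"a": 0, "b": 1, "rc": 2}
_SPEC_RE = re.compile(r"(<=|>=|==|!=|<|>)\s*(\S+)")


def parse_version(text):
    m = _VERSION_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"unsupported version: {text!r}")
    release = [int(part) for part in m.group(1).split(".")]
    while len(release) > 1 and release[-1] == 0:
        release.pop()
    pre = (0, _PHASE[m.group(2)], int(m.group(3))) if m.group(2) else (1, 0, 0)
    return (tuple(release), pre)


def is_pre(key):
    return key[1][0] == 0


def split_spec(spec):
    m = _SPEC_RE.fullmatch(spec.strip())
    if m is None:
        raise ValueError(f"unsupported specifier: {spec!r}")
    return m.group(1), parse_version(m.group(2))


def compare(op, bound, cand):
    """Ordered comparison plus the exclusive '<V' pre-release rule."""
    if op == "<":
        if is_pre(cand) and cand[0] == bound[0] and not is_pre(bound):
            return False
        return cand < bound
    if op == ">":
        return cand > bound
    return {"<=": cand <= bound, ">=": cand >= bound,
            "==": cand == bound, "!=": cand != bound}[op]


def allows(specs, candidate, whole_group_opt_in):
    v = parse_version(candidate)
    group_opt_in = any(is_pre(split_spec(s)[1]) for s in specs)
    for spec in specs:
        op, bound = split_spec(spec)
        # Reading B uses the group-wide opt-in; reading A only this specifier's.
        opt_in = group_opt_in if whole_group_opt_in else is_pre(bound)
        if is_pre(v) and not opt_in:
            return False  # implicit pre-release exclusion
        if not compare(op, bound, v):
            return False
    return True


SPECS = ["<1.0", "<2.0a1"]
print(allows(SPECS, "0.9a1", whole_group_opt_in=False))    # False (reading A)
print(allows(SPECS, "0.9a1", whole_group_opt_in=True))     # True (reading B)
print(allows(["<1.0"], "0.9a1", whole_group_opt_in=True))  # False (naive "<1.0")
```

Under reading B, `<1.0` on its own is not an equivalent simplification of the pair, which is exactly the kind of gap a reduction rule would have to pin down.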

IMO the mistake here is thinking of specifiers as representing sets. They don’t, they represent operations on sets of versions. Thinking that way makes things a lot clearer (IMO).

Agreed. If I write up a PEP, I’m going to drop the terms “Intersect” and “Disjoint”: the less we think of specifiers in terms of the version sets they normally represent, and the more we think of them as operations applied to sets, the easier it will be to define them consistently.


Noted. I think the steps as I described them still apply, but I’ve yet to test that. I’ll make sure to add it to my list of things to do.

That’s a “match” operation, not a “filter” one, and it doesn’t hit any of the ambiguous cases in the spec (IMO). So it’s straightforward - 0.9a1 doesn’t match <1.0 because 1.0 is a final version (pre-releases never match). It does match <2.0a1 because 2.0a1 is a pre-release version. False and True is False, so it doesn’t match.

You can’t say “but the specifier mentions a pre-release, so we include pre-releases” because this isn’t a single specifier, it’s a combination of two. Combinations are introduced in the “dependency specifiers” spec, which is unfortunately heavy on syntax but light on semantics - but combinations are clearly intended to be interpreted as requiring every individual part to match (a logical “and” operation).

I choose to interpret the filter operation on a specifier set as “match using the specifier set, then apply the pre-release heuristics” rather than “filter using the individual specifiers and then combine” for the practical reason that there’s no well-defined rule for combining two sets of versions, which may have been built using different pre-release matching rules. I agree that’s a debatable choice, but defining rules for merging two version sets requires defining what a “version set with a specific pre-release rule applied” even is, and that’s very much more in the territory of a new spec, rather than a clarification. I’ll note it in the “rejected ideas” section of my proposal, though.
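The two orders of operation can diverge. Here is a sketch on a toy version model with hypothetical names, using the heuristic described earlier in the thread (drop pre-releases only when the matched set also contains finals and `--pre` was not given). When the only version satisfying both specifiers is a pre-release, matching first keeps it, while filtering with each specifier separately loses it, because `<2.0` on its own has a final to fall back on.

```python
import re

# Hypothetical toy helpers (assumption), not the packaging library.
_VERSION_RE = re.compile(r"(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+))?")
_PHASE = {"a": 0, "b": 1, "rc": 2}
_SPEC_RE = re.compile(r"(<=|>=|==|!=|<|>)\s*(\S+)")


def parse_version(text):
    m = _VERSION_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"unsupported version: {text!r}")
    release = [int(part) for part in m.group(1).split(".")]
    while len(release) > 1 and release[-1] == 0:
        release.pop()
    pre = (0, _PHASE[m.group(2)], int(m.group(3))) if m.group(2) else (1, 0, 0)
    return (tuple(release), pre)


def is_pre(key):
    return key[1][0] == 0


def split_spec(spec):
    m = _SPEC_RE.fullmatch(spec.strip())
    if m is None:
        raise ValueError(f"unsupported specifier: {spec!r}")
    return m.group(1), parse_version(m.group(2))


def compare(op, bound, cand):
    """Ordered comparison plus the exclusive '<V' pre-release rule."""
    if op == "<":
        if is_pre(cand) and cand[0] == bound[0] and not is_pre(bound):
            return False
        return cand < bound
    if op == ">":
        return cand > bound
    return {"<=": cand <= bound, ">=": cand >= bound,
            "==": cand == bound, "!=": cand != bound}[op]


def match(specs, version):
    v = parse_version(version)
    return all(compare(*split_spec(s), v) for s in specs)


def prerelease_heuristic(matched, pre):
    finals = [v for v in matched if not is_pre(parse_version(v))]
    return matched if pre or not finals else finals


def match_then_heuristic(specs, versions, pre=False):
    """Strategy A: match against the whole set, then apply the heuristic once."""
    return prerelease_heuristic([v for v in versions if match(specs, v)], pre)


def filter_each_then_combine(specs, versions, pre=False):
    """Strategy B: filter with each specifier separately, then intersect."""
    kept = set(versions)
    for spec in specs:
        kept &= set(match_then_heuristic([spec], versions, pre))
    return [v for v in versions if v in kept]


SPECS, VERSIONS = [">=1.0", "<2.0"], ["0.5", "1.5a1"]
print(match_then_heuristic(SPECS, VERSIONS))      # ['1.5a1']
print(filter_each_then_combine(SPECS, VERSIONS))  # []
```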

Sorry - this is the “version specifier semantics” discussion that we intended to make into a new thread. I haven’t finished my proposal yet so I’m not ready to start that thread. But the discussion is useful anyway, so hopefully you don’t mind it happening here.

It’s quite possible given the way this discussion is heading that specifier intersection will fall naturally out of the clarified pre-release rules. IMO, that would be a good thing, because it demonstrates that the whole thing is nicely self-consistent :slightly_smiling_face:

That’s a “match” operation, not a “filter” one, and it doesn’t hit any of the ambiguous cases in the spec (IMO). So it’s straightforward - 0.9a1 doesn’t match <1.0 because 1.0 is a final version (pre-releases never match). It does match <2.0a1 because 2.0a1 is a pre-release version. False and True is False, so it doesn’t match.

Apologies, I was light on my description, I meant this in the context of a filter operation where there are other non-prerelease versions available. I agree with your description for single version matching, as you say it is currently unambiguous, so by default it’s not what I’m thinking about.

I choose to interpret the filter operation on a specifier set as “match using the specifier set, then apply the pre-release heuristics” rather than “filter using the individual specifiers and then combine” for the practical reason that there’s no well-defined rule for combining two sets of versions

I am concerned that will either have breaking edge cases or big backwards incompatibilities, but I am happy to look over the final proposal and try to find them, rather than muse on incomplete descriptions here.

I have been mostly nodding along but I do not agree with this bit. Or perhaps I have your filter and match terminology backwards.

Exclusive ordered comparisons “specifically exclude pre-releases, post-releases, and local versions of the specified version.” (emphasis mine).

Since 0.9 is not “the specified version” this clause is not fired, and 0.9a1 does match <1.0 (though it will almost always be filtered out).

Where did I go wrong?

Maybe it is unclear how to group the sub-phrase “pre-releases, post-releases, and local versions of the specified version”? My reading is that “of the specified version” applies to all three of pre-releases, post-releases and local versions. Perhaps you read it as applying specifically only to local versions?

Edit: there is more text which I think supports my reading

The exclusive ordered comparison <V MUST NOT allow a pre-release of the specified version unless the specified version is itself a pre-release

(emphasis again mine)
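Read that way, the rule only concerns pre-releases that share the bound's release segment. Here is a sketch on a toy version model with hypothetical helper names; treating "pre-release of the specified version" as "same release segment, with a pre-release suffix" is an assumption, since the spec does not define "specified version". This is the matching rule alone, with no implicit pre-release exclusion layered on top.

```python
import re

# Hypothetical toy helpers (assumption), not the packaging library.
_VERSION_RE = re.compile(r"(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+))?")
_PHASE = {"a": 0, "b": 1, "rc": 2}


def parse_version(text):
    m = _VERSION_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"unsupported version: {text!r}")
    release = [int(part) for part in m.group(1).split(".")]
    while len(release) > 1 and release[-1] == 0:  # 1 and 1.0 share a release
        release.pop()
    pre = (0, _PHASE[m.group(2)], int(m.group(3))) if m.group(2) else (1, 0, 0)
    return (tuple(release), pre)


def is_pre(key):
    return key[1][0] == 0


def lt_exclusive(bound, candidate):
    """'<V': ordinary '<', minus pre-releases of V itself unless V is one."""
    b, c = parse_version(bound), parse_version(candidate)
    if is_pre(c) and c[0] == b[0] and not is_pre(b):
        return False
    return c < b


for bound, cand in [("1.0", "0.9a1"), ("1.0", "1.0a1"), ("1", "1.0rc1"), ("1.0a2", "1.0a1")]:
    print(f"{cand} < {bound}: {lt_exclusive(bound, cand)}")
# 0.9a1 < 1.0: True
# 1.0a1 < 1.0: False
# 1.0rc1 < 1: False
# 1.0a1 < 1.0a2: True
```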


Whoops, no, that’s simply me having misread the spec. Apologies for that - the definition is still complex, even if it’s not (IMO) ambiguous. I’ll make sure to get it right in my actual proposal when I complete it.

Thanks for the correction!


Exclusive ordered comparisons “specifically exclude pre-releases, post-releases, and local versions of the specified version.” (emphasis mine).

Whoops, no, that’s simply me having misread the spec. Apologies for that - the definition is still complex, even if it’s not (IMO) ambiguous. I’ll make sure to get it right in my actual proposal when I complete it.

I agree that this is difficult to parse and I am looking forward to a rewrite.

But to be clear about what is currently written: it says that <V must not allow a pre-release of the “specified version” (presumably the “specified version” is the non-pre, non-post part of the version, but that’s never defined). It says nothing about which pre-releases of other versions it should include; that is covered by other parts of the spec.

So, as an example, it does not imply that “0.9a1” should be selected by “<1”; it could be, but that line doesn’t cover it. It does imply that “1.0a1” should not be selected by “<1”, I think even when “--pre” is passed or prereleases=True is set explicitly, otherwise the whole line is pointless.

Apologies if this post is redundant and that was clear to everyone.

Correct. And I’d say that because it doesn’t say anything explicit, that implies that 0.9a1 does match <1.

[Edit: Hit “send” too soon :slightly_frowning_face:]

It doesn’t just imply that, it states it. To put it in my preferred terminology, 1.0a1 does not match <1. (It also does not match >1, or ==1, so specifiers are weird, mathematically, but we knew that…)

When used to filter a set of versions, the above applies without qualification. But once the set of matching versions is determined (a set which won’t include 1.0a1), the question of whether other pre-releases are included depends on whether the user specified --pre (or some equivalent).

It brought up a good point, so thanks for mentioning it.
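The "weird mathematically" point can be seen directly: 1.0a1 sorts below 1, yet matches none of `<1`, `>1` or `==1`. Here is a sketch using a toy version model and a hypothetical `compare(op, bound, candidate)` that implements plain ordered comparison plus the exclusive-`<` rule. The spec's `>` exclusions for post-releases and local versions are left out because the toy model has no such segments.

```python
import re

# Hypothetical toy helpers (assumption), not the packaging library.
_VERSION_RE = re.compile(r"(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+))?")
_PHASE = {"a": 0, "b": 1, "rc": 2}


def parse_version(text):
    m = _VERSION_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"unsupported version: {text!r}")
    release = [int(part) for part in m.group(1).split(".")]
    while len(release) > 1 and release[-1] == 0:
        release.pop()
    pre = (0, _PHASE[m.group(2)], int(m.group(3))) if m.group(2) else (1, 0, 0)
    return (tuple(release), pre)


def is_pre(key):
    return key[1][0] == 0


def compare(op, bound, cand):
    """Ordered comparison plus the exclusive '<V' pre-release rule."""
    if op == "<":
        if is_pre(cand) and cand[0] == bound[0] and not is_pre(bound):
            return False
        return cand < bound
    if op == ">":
        return cand > bound
    return {"<=": cand <= bound, ">=": cand >= bound,
            "==": cand == bound, "!=": cand != bound}[op]


v, one = parse_version("1.0a1"), parse_version("1")
print(v < one)  # True: 1.0a1 orders below 1
for op in ("<", ">", "=="):
    print(f"1.0a1 {op}1: {compare(op, one, v)}")
# 1.0a1 <1: False
# 1.0a1 >1: False
# 1.0a1 ==1: False
```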


One more thing before I forget: the spec author intended that “!=” never implies a prerelease, see https://github.com/pypa/packaging/issues/776#issuecomment-1900515985

And this is tested for in packaging: https://github.com/pypa/packaging/pull/794
And implemented in uv: https://github.com/astral-sh/uv/pull/7974

So it would be good to mention that explicitly.
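For the write-up, this exception could be stated as a small rule. Here is a sketch with a hypothetical `names_prerelease` helper and a toy regex for pre-release suffixes (only `aN`/`bN`/`rcN`, no dev releases). A pre-release named by any operator other than `!=` counts as opting in; one named in `!=` does not.

```python
import re

# Hypothetical toy patterns (assumption), not the packaging library.
_SPEC_RE = re.compile(r"(<=|>=|==|!=|<|>)\s*(\S+)")
_PRE_RE = re.compile(r"\d(a|b|rc)\d+$")  # pre-release suffix on an N.N version


def names_prerelease(specs):
    """Does any specifier, other than '!=', name a pre-release version?"""
    for spec in specs:
        m = _SPEC_RE.fullmatch(spec.strip())
        if m is None:
            raise ValueError(f"unsupported specifier: {spec!r}")
        op, version = m.groups()
        if op != "!=" and _PRE_RE.search(version):
            return True
    return False


print(names_prerelease(["!=2.0a1"]))           # False
print(names_prerelease([">=1.0", "!=2.0a1"]))  # False
print(names_prerelease([">=2.0a1"]))           # True
```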

Correct. And I’d say that because it doesn’t say anything explicit, that implies that 0.9a1 does match <1.

Given how complicated it is to interpret this topic, I would prefer that we didn’t have to infer things from what isn’t written. And I’m pretty sure that would be a breaking change relative to the canonical implementation of the spec:

>>> from packaging.specifiers import Specifier
>>> Specifier('<1').contains("0.9a1")
False
>>> Specifier('<1a1').contains("0.9a1")
True

Which I think is derived from this line which is explicit:

Pre-releases of any kind, including developmental releases, are implicitly excluded from all version specifiers, unless they are already present on the system, explicitly requested by the user, or if the only available version that satisfies the version specifier is a pre-release.

Unless I’m misunderstanding the matching/filtering context again (which is highly possible, I find this very confusing and I’m not even sure what the value of matching is outside filtering).
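Taken literally, that sentence describes a decision over a set of candidates rather than a property of one version, which fits the filtering view. Here is a sketch of one possible reading, with hypothetical names and a toy pre-release detector. How "already present on the system" should be interpreted is itself an assumption here: an installed pre-release is simply kept.

```python
import re

_PRE_RE = re.compile(r"\d(a|b|rc)\d+$")  # toy pre-release detector (assumption)


def is_prerelease(version):
    return _PRE_RE.search(version) is not None


def apply_implicit_exclusion(satisfying, installed=frozenset(), user_requested=False):
    """Hypothetical reading of the quoted rule, applied to already-matched versions.

    Pre-releases are dropped unless (a) already installed, (b) explicitly
    requested by the user, or (c) no final release satisfies the specifier.
    """
    finals = [v for v in satisfying if not is_prerelease(v)]
    if user_requested or not finals:
        return list(satisfying)
    return [v for v in satisfying if not is_prerelease(v) or v in installed]


SATISFYING = ["0.9a1", "0.9", "1.0rc1"]
print(apply_implicit_exclusion(SATISFYING))                        # ['0.9']
print(apply_implicit_exclusion(SATISFYING, installed={"1.0rc1"}))  # ['0.9', '1.0rc1']
print(apply_implicit_exclusion(SATISFYING, user_requested=True))   # ['0.9a1', '0.9', '1.0rc1']
print(apply_implicit_exclusion(["2.0a1"]))                         # ['2.0a1']
```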


I don’t think that’s directly relevant here, but the reason is subtle (and potentially controversial, so I’m going to explain it here and see if I get shot down :slightly_smiling_face:)

The discussion you link to is about the fact that merely mentioning a pre-release version in a specifier is enough to qualify as “the user explicitly requested pre-releases”. But that is, in my view, an extremely liberal interpretation of “explicitly requesting” something, and it’s definitely not covered in the spec. And it could easily have been - a simple example could have been given stating that “>=1.0a1 is an example of the user explicitly requesting pre-releases”.

I’m on very shaky ground in saying that, though, as it’s well-known that mentioning a pre-release is the same as specifying --pre. So I think I need to add something explicit to cover that. But it’s technically a change to the spec, and it makes certain specifier sets unreducible (for example <1.0, <2.0a1). So I want to think about how I document it.

No, you simply spotted the part of the spec that I thought I remembered, but couldn’t find when trying to analyze this case. I went with what I could find, which led me to the wrong conclusion. Although if we take “Pre-releases of any kind, including developmental releases, are implicitly excluded from all version specifiers” as applying here, it makes me wonder what the point is of the whole comment about pre-releases in the “Exclusive ordered comparisons” section :slightly_frowning_face:

Well, resolvelib needs to check if a candidate matches a specifier. That’s the obvious example.

Filtering is how pip gets a list of candidate versions from the index (the finder). That’s where we include (or don’t) pre-releases. Matching is what the resolver uses to confirm if a candidate matches a newly discovered constraint.
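The division of labour can be sketched as two functions. This is a toy model with hypothetical names, not pip's actual finder or resolvelib provider code. `finder` filters an index listing and makes the pre-release decision once. `match` re-checks a chosen candidate against a newly discovered constraint with no pre-release heuristic, on the assumption that the finder has already made that call.

```python
import re

# Hypothetical toy helpers (assumption), not pip or the packaging library.
_VERSION_RE = re.compile(r"(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+))?")
_PHASE = {"a": 0, "b": 1, "rc": 2}
_SPEC_RE = re.compile(r"(<=|>=|==|!=|<|>)\s*(\S+)")


def parse_version(text):
    m = _VERSION_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"unsupported version: {text!r}")
    release = [int(part) for part in m.group(1).split(".")]
    while len(release) > 1 and release[-1] == 0:
        release.pop()
    pre = (0, _PHASE[m.group(2)], int(m.group(3))) if m.group(2) else (1, 0, 0)
    return (tuple(release), pre)


def is_pre(key):
    return key[1][0] == 0


def split_spec(spec):
    m = _SPEC_RE.fullmatch(spec.strip())
    if m is None:
        raise ValueError(f"unsupported specifier: {spec!r}")
    return m.group(1), parse_version(m.group(2))


def compare(op, bound, cand):
    """Ordered comparison plus the exclusive '<V' pre-release rule."""
    if op == "<":
        if is_pre(cand) and cand[0] == bound[0] and not is_pre(bound):
            return False
        return cand < bound
    if op == ">":
        return cand > bound
    return {"<=": cand <= bound, ">=": cand >= bound,
            "==": cand == bound, "!=": cand != bound}[op]


def match(specs, version):
    """Resolver side: comparison rules only, no pre-release heuristics."""
    v = parse_version(version)
    return all(compare(*split_spec(s), v) for s in specs)


def finder(index_versions, specs, pre=False):
    """Finder side: filter the index, dropping pre-releases if finals remain."""
    matched = [v for v in index_versions if match(specs, v)]
    finals = [v for v in matched if not is_pre(parse_version(v))]
    return matched if pre or not finals else finals


candidates = finder(["1.0", "1.1", "2.0a1"], [">=1.0"])
print(candidates)                                     # ['1.0', '1.1']
# Later the resolver discovers a new constraint and re-checks each candidate.
print([c for c in candidates if match(["<1.1"], c)])  # ['1.0']
```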

But that is, in my view, an extremely liberal interpretation of “explicitly requesting” something, and it’s definitely not covered in the spec. And it could easily have been - a simple example could have been given stating that “>=1.0a1 is an example of the user explicitly requesting pre-releases”.

Well, one thing I would say is user workflows rely on this interpretation, e.g.: https://discuss.python.org/t/handling-of-pre-releases-when-backtracking/40505/24. So this would be a breaking change in more than a nuanced technical way.

There is an argument that this line explicitly supports the current interpretation:

Allowing pre-releases that are earlier than, but not equal to a specific pre-release may be accomplished by using <V.rc1 or similar.

As discussed here: https://github.com/pypa/packaging/issues/788
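Here is a sketch of why that line reads as support for the "naming a pre-release opts in" interpretation, using the normalized spelling `2.0rc1` for `<V.rc1`. On a toy version model, with a hypothetical `allows` that applies that opt-in, `<2.0rc1` admits the earlier 2.0 pre-releases but not 2.0rc1 itself, while `<2.0` admits none of them.

```python
import re

# Hypothetical toy helpers (assumption), not the packaging library.
_VERSION_RE = re.compile(r"(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+))?")
_PHASE = {"a": 0, "b": 1, "rc": 2}
_SPEC_RE = re.compile(r"(<=|>=|==|!=|<|>)\s*(\S+)")


def parse_version(text):
    m = _VERSION_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"unsupported version: {text!r}")
    release = [int(part) for part in m.group(1).split(".")]
    while len(release) > 1 and release[-1] == 0:
        release.pop()
    pre = (0, _PHASE[m.group(2)], int(m.group(3))) if m.group(2) else (1, 0, 0)
    return (tuple(release), pre)


def is_pre(key):
    return key[1][0] == 0


def split_spec(spec):
    m = _SPEC_RE.fullmatch(spec.strip())
    if m is None:
        raise ValueError(f"unsupported specifier: {spec!r}")
    return m.group(1), parse_version(m.group(2))


def compare(op, bound, cand):
    """Ordered comparison plus the exclusive '<V' pre-release rule."""
    if op == "<":
        if is_pre(cand) and cand[0] == bound[0] and not is_pre(bound):
            return False
        return cand < bound
    if op == ">":
        return cand > bound
    return {"<=": cand <= bound, ">=": cand >= bound,
            "==": cand == bound, "!=": cand != bound}[op]


def allows(spec, version):
    op, bound = split_spec(spec)
    v = parse_version(version)
    if is_pre(v) and not is_pre(bound):
        return False  # implicit exclusion: the specifier names no pre-release
    return compare(op, bound, v)


AVAILABLE = ["1.9", "2.0a1", "2.0b2", "2.0rc1", "2.0rc2", "2.0"]
print([v for v in AVAILABLE if allows("<2.0rc1", v)])  # ['1.9', '2.0a1', '2.0b2']
print([v for v in AVAILABLE if allows("<2.0", v)])     # ['1.9']
```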

Well, resolvelib needs to check if a candidate matches a specifier. That’s the obvious example.

Filtering is how pip gets a list of candidate versions from the index (the finder). That’s where we include (or don’t) pre-releases. Matching is what the resolver uses to confirm if a candidate matches a newly discovered constraint.

Yeah, and this might be a good example of why matching semantics are not useful.

So first of all by “resolvelib” we’re actually talking about the pip resolvelib provider (as an aside resolvelib proper is super generic and has no concept of prereleases, or even versions or specifiers, it asks the provider, that given a bunch of constraints, “are there any candidates?”).

Within the pip resolvelib provider, every time it uses the match semantics (i.e. Specifier.contains) it has to pass prereleases=True, and rely on the filtering semantics during collection to get the right version candidates.

Actually, everywhere pip uses Specifier.contains it uses prereleases=True, probably for the same reason.

This does seem to me that match semantics needs to be “fixed”, or removed, and filter semantics need to be preserved. But perhaps I am drawing the wrong conclusions.

It implicitly supports that interpretation, but it’s extremely annoying that there is no explicit statement. It’s just another part of the fact that the whole “Handling prereleases” section is written in the form of a bunch of usage examples, rather than a proper specification. Sigh - hindsight is so much easier, isn’t it? :slightly_smiling_face:

See, in my interpretation, prereleases=True should be irrelevant to matching. And the fact that pip always specifies it implies to me that matching with prereleases=False is the useless operation, which is why my formulation basically removes it.

Nope, that’s basically what I’m saying. But I’m coming to the conclusion that like it or not, this could be a breaking change. It may only break code that’s never used in practice (as you discovered, it looks like contains(prereleases=False) may be unused in practice, and if so, its behaviour is irrelevant) but it still means tools would need fixing.

On the other hand, I do still think that it should be possible to simply describe the status quo better. Whether match/filter is a reasonable way of doing so is still up for debate, but if it isn’t, then the packaging API is going to be just as difficult to define cleanly as the spec itself…
