Clarify that arbitrary equality is supposed to be case insensitive

dimbleby · November 16, 2025, 8:04pm

I do not agree that the spec today is unclear. I agree with the view that the proposal is not a mere clarification, but a real change.

notatallshaw · November 16, 2025, 8:08pm

If the current spec is clear, then please can you clarify for me in the very first sentence of “Arbitrary equality”:

Arbitrary equality comparisons are simple string equality operations which do not take into account any of the semantic information such as zero padding or local versions.

What is “semantic information” and, more importantly, what is not “semantic information”?
What are “simple string equality operations”? (In particular noting the plural: operations)

Because I have not been able to fully reconcile my understanding of these phrases.

dimbleby · November 16, 2025, 8:16pm

I feel some danger of heading down a rabbit hole here, so let me make a couple of caveats first

There is a sense in which, if you say that you find the spec unclear - then you are automatically right that it is unclear, no matter what I think! So I start by acknowledging that
The parts that you now describe as unclear are not altered by your proposed change, so feel like red herrings for current purposes

Nevertheless…

It seems to me clear that the spec is distinguishing between all the meaningful version information that it has previously described (major, minor, local, and all the rest); and “just a string”.

notatallshaw · November 16, 2025, 8:27pm

Well, the reason it’s relevant to this discussion is that if we all agree that case sensitivity is not meaningful version information (semantic information), then we don’t have to change the spec: because case is not semantic information, the arbitrary equality operator can be insensitive to it.

But if we don’t all agree with that interpretation then the spec is unclear, and therefore clarifying it would help.

Now, I’m not using this as an argument for why the spec should be clarified to be case sensitive or insensitive (I’ve already made that argument in detail earlier in the thread), rather just as a counter to your point that it should not be clarified because it is already clear.

dimbleby · November 16, 2025, 8:43pm

This was not my point. My position is that what you are proposing is not a clarification but a change

I would expect to be in favour of a clarification. You have identified two phrases that you find not clear - perhaps you would like to propose clarifications to them?

notatallshaw · November 16, 2025, 9:04pm

I disagree that this is a change, and you did state that the spec is not unclear.

My clarification, and its purpose, have already been proposed and explained in detail.

If you are -1 on it, then whether the clarification can still be added will be up to the standards process. I’m not going to engage in large open-ended discussions that do not relate to this proposal; you’re welcome to start your own thread for a new proposal.

dimbleby · November 16, 2025, 9:08pm

Imagine the hypothetical organisation that has managed to upload versions “foobar” and “FooBar” of a package to its local repository.

My reading is that as of today a spec-compliant tool could distinguish those versions, and that this would not be the case if your proposal were accepted. Therefore the proposal is a change, rather than a clarification.

Is that not your reading too? If not, perhaps this is at the root of our confusion.
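
To make the two readings concrete, here is a minimal sketch in plain Python. Both helper names are hypothetical and neither is taken from packaging or any other tool; the case-insensitive variant simply lowercases both sides, which is an assumption about what "case insensitive" would mean for ASCII input.

```python
def arbitrary_equal_strict(spec_version: str, candidate: str) -> bool:
    """Case-sensitive reading: plain string equality (hypothetical helper)."""
    return spec_version == candidate


def arbitrary_equal_insensitive(spec_version: str, candidate: str) -> bool:
    """Case-insensitive reading: compare after lowercasing (hypothetical helper)."""
    return spec_version.lower() == candidate.lower()


# Under the strict reading the two uploads stay distinct ...
print(arbitrary_equal_strict("foobar", "FooBar"))       # False
# ... while under the insensitive reading they collapse into one version.
print(arbitrary_equal_insensitive("foobar", "FooBar"))  # True
```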

notatallshaw · November 16, 2025, 9:16pm

No, because a spec-compliant tool would need to adhere to the following sentence:

This operator is special and acts as an escape hatch to allow someone using a tool which implements this specification to still install a legacy version which is otherwise incompatible with this specification.

The operator’s intent is to match legacy versions, not to match against binary identity, and legacy version parsers used case insensitive matching (at least for ASCII).

Let me flip the question: are you arguing that packaging is not currently spec compliant?

>>> from packaging.specifiers import Specifier
>>> Specifier('===1.0+foobar').contains('1.0+FooBar')
True

dimbleby · November 16, 2025, 9:19pm

yes, packaging is not currently spec compliant

Perhaps another thought experiment. Supposing someone from that hypothetical organisation turns up with a bug report saying that they want to distinguish “foobar” and “FooBar”; and citing the spec in favour of this.

I think the likely outcome - in this unlikely scenario! - would be a bug fix in packaging, not a change in the spec.

notatallshaw · November 16, 2025, 9:20pm

As a maintainer of packaging I would be a strong -1 on that until the spec was clarified.

We would not know what the impact of breaking backwards compatibility would be for other users of packaging.

pf_moore · November 16, 2025, 10:47pm

I’m going to invoke my authority as PEP delegate for packaging interoperability specs, and state officially that:

This is a change to the spec, although only a minor one. I don’t believe there is any point in debating this fact - it’s a side issue that only affects whether the change is a “clarification” or a “textual change”, and regardless of which it is, I’m happy to approve it. So there’s no practical difference, and we can simply agree to disagree, if necessary.
Implementing arbitrary equality is non-negotiable for any tool or library that wants to say they implement PEP 440. Whatever we think of it, it’s part of the spec.

It’s quite reasonable to argue that arbitrary equality should be case insensitive, because that matches all of the other version comparison rules in PEP 440, and because all of the legacy version comparison implementations that date back to when PEP 440 was written were case insensitive. And indeed, that’s the most compelling argument for me that we should accept this change.

Equally, the fact that packaging has implemented case insensitive arbitrary equality for many years is strong evidence that either case insensitivity is better aligned with user requirements, or that the question is irrelevant in real-world cases.

Nothing has been brought up in this sub-thread that suggests that the correct action to take here is to clarify^[1] that the spec does mean case sensitive comparison, and demand that packaging either change (breaking backward compatibility) or accept that they don’t implement the spec correctly.

@dimbleby if you have any evidence that changing the spec to require case insensitivity will break actual use cases, please provide details. Or if you know of any tools that implement their own version comparison and use case sensitive arbitrary equality (so that the proposed change would mean they needed to change their code), let us know which tools they are.

If we can’t find any examples of actual code or workflows that will break if arbitrary equality is made case insensitive, then I will be approving the change. If we can find such cases, I’ll make a decision on what is best for the ecosystem overall^[2]. But I honestly don’t think this is a significant enough change to warrant wasting people’s time and energy on hypothetical consequences, or rulebook lawyering.

Anyone who wants to remove arbitrary equality from the version comparison specs, or to make it optional, should start a new thread and be prepared to write a PEP. It’s not relevant to this discussion, and I’d appreciate it if we could keep things focused here.

[1] Even if you think the current spec is unambiguous, it’s objectively clear that it can be interpreted otherwise, and that should be fixed.
[2] And spoiler, it’s quite likely that I’ll still choose case insensitivity.

dimbleby · November 16, 2025, 10:55pm

uv implementation looks case sensitive to me. Hard to tell, maybe it is only supported for non-legacy versions and maybe to_string() is normalizing. (Which would make this implementation non-compliant anyway, according to me)

I have no idea how things work for pdm, conda, pixi, any other tools I either forgot or do not know about

(the burden of proof seems the wrong way round here)

notatallshaw · November 16, 2025, 11:13pm

They are normalized when they are first parsed as version objects, e.g. local segment is lower cased: uv/crates/uv-pep440/src/version.rs at bf99f0a1956b484c360baeaed787c4c1d44b3ec5 · astral-sh/uv · GitHub

You can validate this yourself by adding a test to version_specifier.rs:

#[test]
fn test_arbitrary_equality_case_sensitivity() {
    assert!(
        VersionSpecifier::from_str("=== 1.0+local")
            .unwrap()
            .contains(&Version::from_str("1.0+LOCAL").unwrap())
    );

    assert!(
        VersionSpecifier::from_str("=== 1.0a1")
            .unwrap()
            .contains(&Version::from_str("1.0A1").unwrap())
    );

    assert!(
        VersionSpecifier::from_str("=== 1.0a1.post2.dev3+local")
            .unwrap()
            .contains(&Version::from_str("1.0A1.POST2.DEV3+LOCAL").unwrap())
    );
}

And to run:

$ cargo test -p uv-pep440 test_arbitrary_equality_case_sensitivity 
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.12s
     Running unittests src/lib.rs (target/debug/deps/uv_pep440-6312b73b34ba63da)

running 1 test
test version_specifier::tests::test_arbitrary_equality_case_sensitivity ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 49 filtered out; finished in 0.00s

pf_moore · November 16, 2025, 11:20pm

This isn’t about proof. It’s about helping me to judge what’s best for the ecosystem when I make my decision on the change.

My reasoning is that case insensitivity (preserving the behaviour packaging has implemented for years) is the best way forward, but I want to give anyone who will be impacted by the change a chance to flag that in order for me to factor their use case into the cost/benefit calculation.

Yes, it’s not ideal to have to rely on “if you don’t speak up, then tough” - but the alternative (as we’ve learned over the years) is to end up paralysed, unable to make any change for fear of breaking something.

dimbleby · November 16, 2025, 11:24pm

Cool, thanks

I think that this is strictly a mistake - per my reading, version normalisation rules do not apply when dealing with arbitrary equality (how else to be consistent with “do not take into account … zero padding”?)

Would you want to change that too? Something like: if a version parses according to the rules set out elsewhere in this specification, then the normalization rules do apply during arbitrary equality checks?
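
As a rough illustration of the rule floated above (not something the spec or any tool is known to implement), a sketch in plain Python might look like this. The `normalize` and `arbitrary_equal` helpers are hypothetical, and the pattern is deliberately simplified to dotted release numbers plus an optional local segment; the real version grammar is far richer. The fallback to plain string equality is just one of the readings discussed in this thread.

```python
import re
from typing import Optional, Tuple

# Simplified, hypothetical pattern: "N(.N)*" with an optional "+local" part.
_SIMPLE_VERSION = re.compile(
    r"(?P<release>\d+(?:\.\d+)*)(?:\+(?P<local>[a-z0-9.]+))?",
    re.IGNORECASE,
)


def normalize(version: str) -> Optional[Tuple[Tuple[int, ...], str]]:
    """Return a normalized (release, local) pair, or None if unparseable."""
    match = _SIMPLE_VERSION.fullmatch(version.strip())
    if match is None:
        return None
    # int() drops zero padding, so "01" and "1" compare equal.
    release = tuple(int(part) for part in match.group("release").split("."))
    # Local segments are lowercased, mirroring the normalization discussed above.
    local = (match.group("local") or "").lower()
    return release, local


def arbitrary_equal(spec_version: str, candidate: str) -> bool:
    left, right = normalize(spec_version), normalize(candidate)
    if left is not None and right is not None:
        # Both sides parse: compare their normalized forms.
        return left == right
    # At least one side does not parse: fall back to plain string equality.
    return spec_version == candidate


print(arbitrary_equal("1.1", "1.01"))             # True
print(arbitrary_equal("1.0+local", "1.0+LOCAL"))  # True
print(arbitrary_equal("foobar", "FooBar"))        # False
```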

notatallshaw · November 16, 2025, 11:27pm

I am only concerned with case insensitivity, as it’s the only sticking point I have in attempting to fully implement arbitrary equality in packaging, due to its history in packaging and setuptools, and the wording in the specification. Whatever the outcome of this proposal, it will inform how I proceed with the implementation.

I am not interested in broadening this proposal; if you would like to push for an alternative proposal, please do that on a different thread.

dimbleby · November 16, 2025, 11:33pm

>>> Specifier("===1.1").contains("1.01")
True

Do you consider this a bug, or something that will eventually want a spec clarification?

Better to do it all in one go I would think

notatallshaw · November 16, 2025, 11:51pm

Again, I am not going to be drawn into expanding this proposal.

If you are concerned about packaging bugs, please report them to https://github.com/pypa/packaging/issues. I am working my way through arbitrary equality bugs and issues in packaging at the moment, and there are many, e.g. Fix arbitrary equality intersection preservation in `SpecifierSet` by notatallshaw · Pull Request #951 · pypa/packaging · GitHub.

If you would like to propose this be clarified feel free to start a separate discussion.

dimbleby · November 17, 2025, 12:00am

That’s good, I think.
I would just ask you to consider whether there are any other specifics of packaging behaviour that you would wish to insert into the spec (for similar reasons as this case insensitivity one).

A spec being a moving target is not ideal; I do think it would be better to make such changes in one batch than piecemeal.

notatallshaw · November 17, 2025, 12:43am

Not currently. But issues do come up over time:

The discussion Are Developmental releases a type of pre-release? resulted in the spec being clarified: Clarify that dev releases are considered pre-releases when handling them by brettcannon · Pull Request #1857 · pypa/packaging.python.org · GitHub.

The discussion Proposal: Intersect and Disjoint Operations for Python Version Specifiers eventually led me to update packaging: PEP 440 handling of prereleases for `Specifier.contains`, `SpecifierSet.contains`, and `SpecifierSet.filter` by notatallshaw · Pull Request #897 · pypa/packaging · GitHub.

The Python packaging specification is massive, with a rich history, and full of edge cases.