I do not agree that the spec today is unclear. I agree with the view that the proposal is not a mere clarification, but a real change.
If the current spec is clear, then please can you clarify for me in the very first sentence of “Arbitrary equality”:
Arbitrary equality comparisons are simple string equality operations which do not take into account any of the semantic information such as zero padding or local versions.
- What is “semantic information” and, more importantly, what is not “semantic information”?
- What are “simple string equality operations”? (In particular noting the plural: operations)
Because I have not been able to fully reconcile my understanding on these phrases.
I feel some danger of heading down a rabbit hole here, so let me make a couple of caveats first
- There is a sense in which, if you say that you find the spec unclear - then you are automatically right that it is unclear, no matter what I think! So I start by acknowledging that
- The parts that you now describe as unclear are not altered by your proposed change, so feel like red herrings for current purposes
Nevertheless…
It seems to me clear that the spec is distinguishing between all the meaningful version information that it has previously described (major, minor, local, and all the rest); and “just a string”.
Well, the reason it’s relevant to this discussion is that if we all agree that case sensitivity is not meaningful version information (semantic information), then we don’t have to change the spec, because case, not being semantic information, is something the arbitrary equality operator can be insensitive to.
But if we don’t all agree with that interpretation then the spec is unclear, and therefore clarifying it would help.
Now, I’m not using this as an argument for why the spec should be clarified to be case sensitive or insensitive (I’ve already made that argument in detail earlier in the thread), rather just as a counter to your point that it should not be clarified because it is already clear.
This was not my point. My position is that what you are proposing is not a clarification but a change
I would expect to be in favour of a clarification. You have identified two phrases that you find not clear - perhaps you would like to propose clarifications to them?
I disagree this is a change, and you did state the spec was already not unclear.
My clarification, and purpose, is already proposed and explained in detail.
If you are -1 on it then it will be up to the standards process whether or not it is acceptable to add the clarification. I’m not going to engage in large open discussions that do not relate to this proposal; you’re welcome to start your own thread on a new proposal.
Imagine the hypothetical organisation that has managed to upload versions “foobar” and “FooBar” of a package to its local repository.
My reading is that as of today a spec-compliant tool could distinguish those versions, and that this would not be the case if your proposal were accepted. Therefore the proposal is a change, rather than a clarification.
Is that not your reading too? If not, perhaps this is at the root of our confusion.
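To make the two readings concrete, here is a minimal sketch. It is not taken from packaging or any other tool, and both function names are hypothetical. It only compares raw version strings, once exactly and once after lower-casing both sides.

```python
def strict_arbitrary_equal(candidate: str, wanted: str) -> bool:
    """Reading 1: plain string equality, so case is significant."""
    return candidate == wanted


def caseless_arbitrary_equal(candidate: str, wanted: str) -> bool:
    """Reading 2: string equality after lower-casing both sides."""
    return candidate.lower() == wanted.lower()


# The two hypothetical uploads from the thought experiment.
versions = ["foobar", "FooBar"]

# Under reading 1, "===foobar" selects only one of the two uploads.
strict_matches = [v for v in versions if strict_arbitrary_equal(v, "foobar")]

# Under reading 2, the same specifier selects both uploads.
caseless_matches = [v for v in versions if caseless_arbitrary_equal(v, "foobar")]

print(strict_matches)
print(caseless_matches)
```

Under the first reading a tool can tell the two uploads apart; under the second it cannot, which is the behavioural difference being debated.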
No, because a spec compliant tool would need to adhere to the following sentence:
This operator is special and acts as an escape hatch to allow someone using a tool which implements this specification to still install a legacy version which is otherwise incompatible with this specification.
The operator’s intent is to match legacy versions, not to match against binary identity, and legacy version parsers used case insensitive matching (at least for ASCII).
Let me flip the question: are you arguing that packaging is not currently spec compliant?
>>> from packaging.specifiers import *
>>> Specifier('===1.0+foobar').contains('1.0+FooBar')
True
yes, packaging is not currently spec compliant
Perhaps another thought experiment. Suppose someone from that hypothetical organisation turns up with a bug report saying that they want to distinguish “foobar” and “FooBar”, and citing the spec in favour of this.
I think the likely outcome - in this unlikely scenario! - would be a bug fix in packaging, not a change in the spec.
As a maintainer of packaging I would be a strong -1 on that until the spec was clarified.
We would not know what the impact of breaking backwards compatibility would be for other users of packaging.
I’m going to invoke my authority as PEP delegate for packaging interoperability specs, and state officially that:
- This is a change to the spec, although only a minor one. I don’t believe there is any point in debating this fact - it’s a side issue that only affects whether the change is a “clarification” or a “textual change”, and regardless of which it is, I’m happy to approve it. So there’s no practical difference, and we can simply agree to disagree, if necessary.
- Implementing arbitrary equality is non-negotiable for any tool or library that wants to say they implement PEP 440. Whatever we think of it, it’s part of the spec.
It’s quite reasonable to argue that arbitrary equality should be case insensitive, because that matches all of the other version comparison rules in PEP 440, and because all of the legacy version comparison implementations that date back to when PEP 440 was written were case insensitive. And indeed, that’s the most compelling argument for me that we should accept this change.
Equally, the fact that packaging has implemented case insensitive arbitrary equality for many years is strong evidence that either case insensitivity is better aligned with user requirements, or that the question is irrelevant in real-world cases.
Nothing has been brought up in this sub-thread that suggests that the correct action to take here is to clarify[1] that the spec does mean case sensitive comparison, and demand that packaging either change (breaking backward compatibility) or accept that they don’t implement the spec correctly.
@dimbleby if you have any evidence that changing the spec to require case insensitivity will break actual use cases, please provide details. Or if you know of any tools that implement their own version comparison and use case sensitive arbitrary equality (so that the proposed change would mean they needed to change their code), let us know which tools they are.
If we can’t find any examples of actual code or workflows that will break if arbitrary equality is made case insensitive, then I will be approving the change. If we can find such cases, I’ll make a decision on what is best for the ecosystem overall[2]. But I honestly don’t think this is a significant enough change to warrant wasting people’s time and energy on hypothetical consequences, or rulebook lawyering.
Anyone who wants to remove arbitrary equality from the version comparison specs, or to make it optional, should start a new thread and be prepared to write a PEP. It’s not relevant to this discussion, and I’d appreciate it if we could keep things focused here.
uv implementation looks case sensitive to me. Hard to tell, maybe it is only supported for non-legacy versions and maybe to_string() is normalizing. (Which would make this implementation non-compliant anyway, according to me)
I have no idea how things work for pdm, conda, pixi, any other tools I either forgot or do not know about
(the burden of proof seems the wrong way round here)
They are normalized when they are first parsed as version objects, e.g. local segment is lower cased: uv/crates/uv-pep440/src/version.rs at bf99f0a1956b484c360baeaed787c4c1d44b3ec5 · astral-sh/uv · GitHub
You can validate yourself by adding tests in the version_specifier.rs
#[test]
fn test_arbitrary_equality_case_sensitivity() {
    assert!(
        VersionSpecifier::from_str("=== 1.0+local")
            .unwrap()
            .contains(&Version::from_str("1.0+LOCAL").unwrap())
    );
    assert!(
        VersionSpecifier::from_str("=== 1.0a1")
            .unwrap()
            .contains(&Version::from_str("1.0A1").unwrap())
    );
    assert!(
        VersionSpecifier::from_str("=== 1.0a1.post2.dev3+local")
            .unwrap()
            .contains(&Version::from_str("1.0A1.POST2.DEV3+LOCAL").unwrap())
    );
}
And to run:
$ cargo test -p uv-pep440 test_arbitrary_equality_case_sensitivity
Finished `test` profile [unoptimized + debuginfo] target(s) in 0.12s
Running unittests src/lib.rs (target/debug/deps/uv_pep440-6312b73b34ba63da)
running 1 test
test version_specifier::tests::test_arbitrary_equality_case_sensitivity ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 49 filtered out; finished in 0.00s
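For readers more at home in Python, the effect of normalising at parse time can be sketched as follows. This is an illustrative model only, not uv's actual code: the `ParsedVersion` class and the `arbitrary_equal` helper are hypothetical, and the only normalisation modelled is lower-casing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedVersion:
    """Hypothetical parsed version: public part plus optional local segment."""

    public: str
    local: str

    @classmethod
    def parse(cls, text: str) -> "ParsedVersion":
        # Split off the local segment (everything after the first "+"),
        # then lower-case both halves. The original spelling is discarded.
        public, _, local = text.strip().partition("+")
        return cls(public=public.lower(), local=local.lower())


def arbitrary_equal(candidate: str, wanted: str) -> bool:
    """Compare two version strings only after both have been parsed."""
    return ParsedVersion.parse(candidate) == ParsedVersion.parse(wanted)


print(arbitrary_equal("1.0+LOCAL", "1.0+local"))
print(arbitrary_equal("1.0A1", "1.0a1"))
print(arbitrary_equal("1.0+other", "1.0+local"))
```

Because the comparison only ever sees the parsed objects, the original casing is no longer available by the time arbitrary equality is evaluated, so a case-sensitive comparison is not possible in this design.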
This isn’t about proof. It’s about helping me to judge what’s best for the ecosystem when I make my decision on the change.
My reasoning is that case insensitivity (preserving the behaviour packaging has implemented for years) is the best way forward, but I want to give anyone who will be impacted by the change a chance to flag that in order for me to factor their use case into the cost/benefit calculation.
Yes, it’s not ideal to have to rely on “if you don’t speak up, then tough” - but the alternative (as we’ve learned over the years) is to end up paralysed, unable to make any change for fear of breaking something.
Cool, thanks
I think that this is strictly a mistake - per my reading, version normalisation rules do not apply when dealing with arbitrary equality (how else to be consistent with “do not take into account … zero padding”?)
Would you want to change that too? Something like: if a version parses according to the rules set out elsewhere in this specification, then the normalization rules do apply during arbitrary equality checks?
I am only concerned with case insensitivity, as it’s the only sticking point I have in attempting to fully implement arbitrary equality in packaging, due to its history in packaging and setuptools, and the wording in the specification. Whatever the outcome of this proposal, it will inform how I proceed with the implementation.
I am not interested in broadening this proposal, if you would like to push for an alternative proposal please do that on a different thread.
>>> Specifier("===1.1").contains("1.01")
True
Do you consider this a bug, or something that will eventually want a spec clarification?
Better to do it all in one go I would think
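The zero-padding question can be illustrated with a minimal sketch. This is not packaging's implementation: all three function names are hypothetical, and only plain dotted numeric releases are handled (no pre-releases, local versions, or trailing-zero rules).

```python
def release_tuple(version: str) -> tuple[int, ...]:
    """Parse a dotted numeric release such as "1.01" into (1, 1)."""
    return tuple(int(part) for part in version.split("."))


def normalised_equal(candidate: str, wanted: str) -> bool:
    """Compare after integer parsing, so zero padding is ignored."""
    return release_tuple(candidate) == release_tuple(wanted)


def string_equal(candidate: str, wanted: str) -> bool:
    """Compare the raw strings, so zero padding is significant."""
    return candidate == wanted


print(normalised_equal("1.01", "1.1"))
print(string_equal("1.01", "1.1"))
```

The first comparison treats "1.01" and "1.1" as the same version, which corresponds to the result shown above; the second keeps them distinct, which is how I read “do not take into account … zero padding”.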
Again, I am not going to be drawn into expanding this proposal.
If you are concerned about packaging bugs, please report them to https://github.com/pypa/packaging/issues. I am working my way through arbitrary equality bugs and issues in packaging at the moment; there are many, e.g. Fix arbitrary equality intersection preservation in `SpecifierSet` by notatallshaw · Pull Request #951 · pypa/packaging · GitHub.
If you would like to propose this be clarified feel free to start a separate discussion.
That’s good, I think.
I would just ask you to consider whether there are any other specifics of packaging behaviour that you would wish to insert into the spec (for similar reasons as this case insensitivity one).
A spec being a moving target is not ideal, I do think it would be better to make such changes in one batch than piecemeal.
Not currently. But issues do come up over time:
The discussion Are Developmental releases a type of pre-release? resulted in the spec being clarified: Clarify that dev releases are considered pre-releases when handling them by brettcannon · Pull Request #1857 · pypa/packaging.python.org · GitHub.
The discussion Proposal: Intersect and Disjoint Operations for Python Version Specifiers eventually led me to update packaging: PEP 440 handling of prereleases for `Specifier.contains`, `SpecifierSet.contains`, and `SpecifierSet.filter` by notatallshaw · Pull Request #897 · pypa/packaging · GitHub.
The Python packaging specification is massive, with a rich history, and full of edge cases.