Clarify that arbitrary equality is supposed to be case insensitive

To be clear, I don’t think it’s worth doing a whole PEP process to deprecate this operator; the “heavily discouraged” note in the spec already conveys what we need to tell users. I got pinged on this change because we maintain a PEP 440 implementation, and found this thread with several participants and 40+ comments, including procedural and Unicode discussions. It seemed we were building very involved solutions for something that was discouraged from its inception, hence my comment. I’m definitely not opposed to making the minimal text change to document what packaging and uv already do anyway.

4 Likes

FWIW, I think most of those comments were an unfortunate waste of time. I gained very little by reading them and lost almost all interest in this thread when it diverged from trying to fix one sentence of spec language.

To me, the spec ambiguously, weakly indicates that the comparisons should be case sensitive. But packaging and uv lowercase names. And we have >15 years of practice which aligns with that as well.

I think the solution is simple, so long as we don’t get lost in the weeds again. Improve the spec to be less ambiguous and affirm the tool choice to lowercase things. I proposed a wording which does this, Paul refined it, and nobody seems to have any practical objection to it. We’re now holding in case someone shows up to tell us that we’ve missed something.

7 Likes

To be pedantic, you still need to be clear about what you mean by “lowercase” when dealing with Unicode because of the Turkish “I”.

The Turkish alphabet has the uppercase/lowercase pairs I/ı (“LATIN CAPITAL LETTER I” / “LATIN SMALL LETTER DOTLESS I”) and İ/i (“LATIN CAPITAL LETTER I WITH DOT ABOVE” / “LATIN SMALL LETTER I”), whereas virtually everyone else has the uppercase/lowercase pair I/i (“LATIN CAPITAL LETTER I” / “LATIN SMALL LETTER I”), which is what Python’s lowercase method does.
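As a quick illustration, here is how Python's built-in `str.lower` and `str.upper` treat these characters. This is only a sketch of the locale-independent built-ins, not something the spec prescribes; the variable names are made up for the example.

```python
# Python's case mappings are locale-independent, so Turkish rules are never applied.
dotted_capital_i = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE
dotless_small_i = "\u0131"   # LATIN SMALL LETTER DOTLESS I

ascii_lower = "I".lower()                # the common I/i pair
dotted_lower = dotted_capital_i.lower()  # "i" followed by U+0307 COMBINING DOT ABOVE
dotless_upper = dotless_small_i.upper()  # maps to ASCII "I"

print(ascii_lower == "i")
print(dotted_lower == "i\u0307", len(dotted_lower))
print(dotless_upper == "I")
```

Note that lowercasing the dotted capital yields two code points, so it never compares equal to a plain ASCII "i".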

1 Like

I understand the thread got a little long, but Unicode collation was the first discussion and has already been decided: Clarify that arbitrary equality is supposed to be case insensitive - #10 by sirosen

2 Likes

I agree. I have split the whole deprecation topic into its own topic.

I also want to remind people you can flag a post as off-topic to help keep things in line.

As well, I will now be actively working as a moderator to keep this topic focused, so I will be hiding posts that stray (although I always prefer others do it). That means no more posts here about deprecations, process around standards (short of what @pf_moore says in terms of what’s required for the change to be accepted), or adding any other clarifications to any spec. This is entirely about case sensitivity or not, and what that directly entails, as stated in the opening post.

3 Likes

While it’s been a while, I don’t recall intentionally excluding the arbitrary equality check from the ASCII case insensitivity specification described in the section on version normalisation.

Instead, the intent of the feature was to provide comparable capabilities to the old permissive version string parsing in setuptools, which implemented case-insensitive comparisons:

>>> from pkg_resources import parse_version
>>> parse_version("foo") == parse_version("Foo")
True

(^^^ is from setuptools 64.0.3 running on Python 3.10. The latest versions of setuptools don’t fall back to the legacy parsing by default anymore)

And reading it again now, I don’t see our wording on arbitrary equality as excluding the structural normalisation of the letter case (emphasis added):

… which do not take into account any of the semantic information such as zero padding or local versions.

My suggested wording addition to clarify that (and given the pkg_resources.parse_version heritage, I do consider it a bug fix/clarification rather than a change in the intent of the spec) would be to add a sentence like “Arbitrary equality implementations MUST convert ASCII letters to lowercase prior to the string comparison, and MAY convert other cased Unicode characters to lowercase.”
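A minimal sketch of what the MUST part of that sentence could look like in Python. The helper name `arbitrary_equal` and the translation table are hypothetical, chosen for illustration; this is not how packaging or uv implement it.

```python
import string

# Translation table covering only A-Z -> a-z; every other character is left untouched.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def arbitrary_equal(candidate: str, spec: str) -> bool:
    """Hypothetical helper: plain string comparison after lowercasing ASCII letters."""
    return candidate.translate(_ASCII_LOWER) == spec.translate(_ASCII_LOWER)


print(arbitrary_equal("Foo", "foo"))   # differs only in ASCII case
print(arbitrary_equal("1.0", "1.00"))  # zero padding is not normalised
```

The second call shows that no semantic normalisation happens: only letter case is folded before the comparison.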

3 Likes

Thanks for clarifying and confirming my understanding of the intent. I also caught that the word “semantic” may already imply case insensitivity, but as you can see from several people reading the spec and coming to the opposite conclusion, it is not clear.

That said, I don’t think there’s any need to prescribe the method for matching ASCII characters case-insensitively (for example, if someone does some clever binary masking check because it’s more efficient, that wouldn’t be incorrect?), and I don’t want to define what it means to lowercase a Unicode character, as seems to be the ask if we’re to define Unicode behavior in any way at all.

I am happy to take the MUST from your suggestion though.
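To make the "binary masking" idea concrete, here is one way such a check might look. It is a sketch under the assumption that only ASCII letters are folded; `masked_equal` and `_is_ascii_letter` are hypothetical names, and no claim is made that this is faster in Python.

```python
def _is_ascii_letter(code: int) -> bool:
    # True for A-Z (0x41-0x5A) and a-z (0x61-0x7A).
    return 0x41 <= code <= 0x5A or 0x61 <= code <= 0x7A


def masked_equal(left: str, right: str) -> bool:
    """Hypothetical helper: ASCII case-insensitive equality without lowercasing."""
    if len(left) != len(right):
        return False
    for a, b in zip(map(ord, left), map(ord, right)):
        if a == b:
            continue
        # ASCII upper/lower pairs differ only in bit 0x20; the letter check
        # rejects non-letter pairs such as "@" and "`" that also differ by 0x20.
        if (a ^ b) == 0x20 and _is_ascii_letter(a):
            continue
        return False
    return True


print(masked_equal("Foo", "fOO"))
print(masked_equal("@", "`"))
```

Any approach that gives the same answers for ASCII letters would satisfy the requirement equally well.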

2 Likes

The reason for the MAY part of the suggestion is so that str.lower remains an acceptable mechanism for implementing the case insensitivity, while emphasising that implementations need not do that. Using uppercase non-ASCII letters in the version is just a generally poor idea all around.
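A small sketch of why the MAY matters: an ASCII-only fold and `str.lower` agree on ASCII input but can disagree once non-ASCII cased letters appear. The function names and sample strings are hypothetical, picked only to show the difference.

```python
import string

_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_only_key(version: str) -> str:
    # Folds A-Z only, which is the part the MUST covers.
    return version.translate(_ASCII_LOWER)


def full_lower_key(version: str) -> str:
    # str.lower also folds other cased Unicode characters (the MAY part).
    return version.lower()


samples = [
    ("1.0.DEV", "1.0.dev"),        # ASCII only: both strategies agree
    ("1.0+\u00c9", "1.0+\u00e9"),  # É vs é: the strategies differ
]
results = [
    (ascii_only_key(a) == ascii_only_key(b), full_lower_key(a) == full_lower_key(b))
    for a, b in samples
]
print(results)
```

Under the suggested wording both strategies would be conforming, which is why relying on non-ASCII case folding in a version string is not portable.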

1 Like

I’ve raised the PR on the PPO repo: Specify arbitrary equality case insensitivity. by notatallshaw · Pull Request #1959 · pypa/packaging.python.org · GitHub

I’ve done my best to merge Stephen’s, Paul’s, and Alyssa’s text while not breaking any of the agreed upon concerns.

2 Likes