Helping people with "broken" sorting

dg-pb · September 12, 2025, 3:52pm

Incorporated that as well. Hopefully did not break anything else.

Inconsistency: A==B and B==C, but not C==A
    A @ 0 = [@ 0] x
    B @ 1 = [@ 1] y
    C @ 2 = [@ 2] z

tim.one · September 12, 2025, 5:24pm

Franklinvp:

tim.one:
>>> sorted([{1, 5}, {1}, {3, 4}])
[{1}, {3, 4}, {1, 5}]
What do you think it “must” produce? That, or something else?
They used stable as part of the assumptions of their statement. Under that assumption, and with respect to each input list, their statement (the entire original post, not just the quoted part) is correct. {1, 5} and {3, 4} being incomparable, should have stayed in the original relative position.

Which I explicitly acknowledged in the post you’re replying to:

and went on to explain why that’s a non-starter. It’s not, and never has been, Python’s notion of “stiability”. Which relies on preserving the relative order of objects that compare __eq__, which Pyrthon’s sort cannot know for sure, but can only infer from not x < y and not y < x. The notion you’re using has never been met by Python, and I think it very unlikely it ever will be.

We’re only using a relatively small subset of the n^2 possible __lt__ compares (else sorting would be unbearably slow quadratic time at best), so just can’t detect all pairs that are incomparable. When we do detect an incomparable pair (in the sense of neither being < than the other), that’s taken as implying equality. Which is true under a total order, and from A > B == C it’s fine to conclude that [B, C, A] is sorted (which is what happens in the example you echoed at the start). A and C are never compared to each other directly in that example.

Sure, and I also showed that’s what pypy still returns, and explained in detail why CPython and pypy return different results.

While different people have expanded the originally stated goal to different degrees. the original goal of flagging cases where results may change across Python’s releases requires verifying that there’s a total order.. The example I gave is one where the result did change across releases, and set.__lt__ is a strict partial order. Which proves that checking just for the latter is insufficient to meet the most modest goal on the table.

You may want to argue that Python’s sort is itself broken because it doesn’t promise to preserve the order of elements for which none of x < y, y < x, and x == y is true, but Python’s sort has no way to detect that “even in theory”, because it promises not to invoke __eq__. Inferring the last from the failure of the first two is what it does, and always has done (well, before “rich comparisons”, all earlier versions of Python’s sort went through the earlier 3-outcome, all-purpose, __cmp__ comparison, under which it wasn’t possible to violate trichotomy: the result of a compare was always exactly one of {less, equal, greater)).

tim.one · September 12, 2025, 6:17pm

Believe it or not, I’ve even seen failures of

assert (x == y) == (x == y)

User-defined functions may not even return the same results when doing the same comparison on the same objects.

I don’t think they ever intend to, but “stuff happens”. For example, in a “sophisticated” app with complicated comparison functions, it may be that, between compares, another thread mutates the state of one of the comparands. While CPython holds the GIL in the C part of sorting, calling user-defined comparison functions can invoke arbitrary Python code, which releases the GIL for the duration.

Expanding on the theme of possibly supplying “a bunch” of independent checkers,

assert (v1 == v2) == (v2 == v1)

is one of the 3 properties that needs to be checked to verity that __eq__ on its own implements an equivalence relation.

Does everyone need that? Probably not. It’s unclear to me whether Python’s sort ever relies on that - but it may.

tim.one · September 13, 2025, 1:42am

My primary goal is to flag cases where results may change across Python releases or implementations. Toward that end, establishing a total order is necessary, In the absence of trichotomy, total order doesn’t hold. That’s not “my issue” : it’s an issue for the user if that’s something they care about. Validation will never be mandatory;

Then they simply shouldn’t use a “total order” checker.

Which I’m also in favor of, going beyond the original post: I’d like to develop a suite of independent checkers, so users you could verify just the things they do care about. For example, I’ve sometimes wanted an automated way to (just!) check that my __eq__ defines an equivalence relation, by throwing mountains of randomly generated cases at it. I certainly like and value “proofs” - but they sometimes miss realities to overly simplified assumptions, and usually not consciously so.

Another collection of independent checkers would serve to validate “total order based on just looking at __lt__ and __eq__”, which collection a convenience function/method would package for easy invocation.

This gets a little more complicated when users then have something they think will be a stable sort, but is using a type like described above in their own comparison function, because the answers for the user to fix this may involve things like a user implementation looking like:
    def __lt__(self, other):
        return (self.x, id(self.x), self.y) < (other.x, id(other.x), other.y)
to get stability if whatever x is here falls into that category

annnd… despite that being an edge case I’ve seen and dealt with before, the first thing I wrote for it when not giving this my full focus was wrong if the user wanted a comparion using (x, y). I messed up. did you spot it? Below is fixed. has to be grouped so that id is only a tiebreaker on that element.
    def __lt__(self, other):
        return ((self.x, id(self.x)), self.y) < ((other.x, id(other.x)), other.y)

I’d call that broken, but not in a way we can detect. Compares relying in any way on the relative order of id()'s is 100% reliable within a single run, but may very well return the opposite result on the next run, even if the code to construct the objects does exactly the same stuff in both runs. But 100%-consistent-within-a-single-run may be all they actually care about. I certainly don’t want to second-guess them!

Verifying “total order” is the only thing the original post here cares about. People violating that try to sort at their own risk. If there are other things they care about, that’s fine, and I’d like to help them too. E.g., given a transitivity checker that’s not “too clever”, it’s trivial to pass - as an argument - the relation whose transitivity is being checked - operator.lt, operator.ge, operator.eq, operator.is_, a custom user-defined relation, … I just wrote one of those in a few minutes total, “perfectly general”. But also “horribly slow” .

tim.one · September 13, 2025, 7:14am

A seeming impossibility, for those who think sorting “should” preserve the original order of (truly) incomparable elements. Consider the list

[{1, 2}, {3}, {1}]

Call that [A, B, C].

A and B are incomparable, so A must precede B.

B and C are incomparable, so B must precede C.

C < A, so C must precede A.

But A must precede B must precede C must precede A isn’t possible (it’s obviously a cycle),. Any linear order must violate at least one of those.

Python, whether before or after the latest changes, violates the last, leaving the list unchanged. As far as Python goes, not B < A and not C < B means the list is already sorted.

The selection sort I mentioned before would move C to the start, violating “B must precede C”.

Although I’m finding different definitions of these terms, that apparently don’t agree in key respects. All sources agree that “is a proper subset of” is a strict weak ordering. But most sources claim that incomparability is transitive in such an ordering. But above A is incomparable to B, and B is incomparable to C, but A is comparable to C. Incomparability is not transitive.

I’m not skipping sleep again tonight to wrestle with ghosts .

Stefan2 · September 13, 2025, 8:38am

How does it make a difference? With some strange x and y values? Your “only a tiebreaker on that element” explanation makes no sense to me.

mdickinson · September 13, 2025, 9:35am

It’s true.

we already know < for sets meets the criteria for a strict partial order

Agreed, but not relevant here.

[{1, 5}, {1}, {3, 4}]

< on that collection does not give a strict weak ordering: transitivity of the incomparability relation fails. {1, 5} is incomparable with {3, 4} and {3, 4} is incomparable with {1}, but {1, 5} is not incomparable with {1} . So my “if you know < defines a strict weak ordering on a collection of Python objects” premise isn’t satisfied in this case, and the check I implemented will fail.

“strict partial order” and “strict weak ordering” are not the same thing.

I’ll drop out of this thread from this point on, I don’t think my participation is productive in the way I’d hoped it would be.

tim.one · September 13, 2025, 6:10pm

But turns out it is, although in a way I didn’t realize before your reply here. I have indeed been confusing your “weak” for “partial”. You have my apologies for that, but I can’t roll back the clock ,

See above. That’s obvious to me too now. Thanks! When incomparability is in fact transitive, which is not just a property of the relation, but also of the specific set of values it’s being applied to, to Python’s eyes all incomparable elements will appear to be pairwise equal, and so stability among them will be preserved by all <-only stable sorting algorithms.

Bingo! Why didn’t you just say so at the start?

Your input is always very welcome to me! Your posts in this topic have been highly valued by all involved.

This suggests another addition to my hypothetical collection of independent checkers: distinct ones for “strict weak” and “total”. I almost always intend to implement total orderings for purposes of sorting, and “strict weak” isn’t strong enough to ensure that.

tim.one · September 13, 2025, 9:01pm

For those not following details, this has been resolved. It’s precisely because “is incomparable to” is not transitive on this specific data that Mark’s check for “strict weak order” would not pass.

So nobody was actually claiming that Python’s sort “should” define a unique result for this specific input. It can deliver different results under different stable <-only sorting algorithms, and demonstrably does.

tim.one · September 14, 2025, 1:41am

Plugging a hole here, now that I finally understand what @mdickinson was saying at the start:
:

verify that random inputs (lists of sets) that do pass (@Stefan2’s latest “LT./EQ” version of) Mark’s “strict weak ordering” test deliver the same sorted result regardless of whether sorted by Python or by my radically different stable <-only selection sort.
verify that the “strict weak” test passed if and only if “incomparable” was transitive on the input list

The latter was trivial because I already wrote an all-purpose transitivity checker, and just passed in a one-liner implementation of “is incomparable to”. Sample output from that:

Transitivity failed at indices 6, 7, 2:
    A!<>B and B!<>C but not A!<>C
    A at 6 = {21, 3, 29, 0, 5, 27}
    B at 7 = {17, 12, 8, 16, 22}
    C at 2 = {29, 3, 27, 5}

!<> is just a name I made up for “is incomparable to”.

A minor surprise was that if inputs weren’t just short lists of sets from a small universe, it was rare to find lists of sets that passed the “strict weak” test. So this did a lot less actual sort-checking than I anticipated - it was mostly throwing away inputs that didn’t pass the test.

So I’m happy enough so far (no other surprises or failures), but would be happier still to generate millions of test cases that make it through to the very end. “strict” is very restrictive. But I’ll pursue that in silence .I should perhaps boost the minimum set size so that < is less common.

dg-pb · September 14, 2025, 4:42am

Removed repeated compares from @Stefan2’s function:

validate_sort_faster_edit code

def validate_sort_faster_edit(iterable, *, assume_sorted=False):
    """Returns True if data follows strict weak ordering else False"""
    if not assume_sorted:
        iterable = sorted(iterable)
    lt, eq = [], []
    it = iter(iterable)
    x = next(it)
    if x < x:
        return False
    eq.append(x)
    for x in it:
        if x < x:
            return False

        last = eq.pop()
        last_eq = last < x
        if last_eq:
            if x < last:
                return False
            lt.extend(eq)
            eq.clear()

        elif x < last:
            return False

        for y in lt:
            if not y < x or x < y:
                return False

        if last_eq:
            lt.append(last)
        else:
            for y in eq:
                if y < x or x < y:
                    return False
            eq.append(last)

        eq.append(x)
    return True

Optimal number of comparisons for already sorted data is exactly n^2. validate_sort_faster_edit matches this.

tim.one · September 14, 2025, 6:14am

Revisiting an old one in light of recent epiphanies;

The function I actually want for myself would return False, because I almost always want to ensure I defined a total order.

But @mdickinson is right that Python’s sorts will always leave that list alone. In fact, it will always leave any of the 6 permutations of that list alone.

There is no canonical “right answer”, but that’s not what I was originally asking about. I was asking about whether a new Python release could change what it returns,

And it won’t.. Because for this specific data, “incomparability is transitive”. From not x < y and not y < x, which is satisfied by all pairs taken from that specific data, Python infers x == y. Which isn’t true of Python’s __eq__ applied to them, but that doesn’t matter. Python “believes” they’re all equal because it can’t know any better (it can only consult __lt__)..

So Python’s sort must preserve their original relative order, to satisfy its own notion of “stability”. As far Python can tell, they’re all equal to each other, so it can’t change the order at all.

For other specific lists of sets, incomparability may not be transitive, and then his function will return False, and then Python’s sorts may indeed change the result they deliver. Like

[{1, 2}, {3}, {1}]

Python believes A==B and B==C and C < A for those, but currently leaves it alone (it currently sees only that not B < A and not C < B, and from those 2 compares alone concludes “it’s already sorted”). But that’s a detail of the current implementation. There’s nothing to stop a future implementation from looking at C < A first, and moving C before A.

That kind of paradox can’t arise in cases where incomparability is transitive.

tim.one · September 14, 2025, 6:48am

Done. That worked out well, and I killed the job after over 90 million randomized test inputs made it “to the very end”. No failures. Python’s current sort produced the same permutation on those as my novel stable insertion sort (with a very different structure) on every test case that passed the “strict weak order” test, which were all and only the test cases in which incomparability was transitive.

I’ll add that for test cases that didn’t pass the strict weak order test, different permutations were often produced across the two sorting implementations,

tim.one · September 15, 2025, 6:03am

Is that true? Mark’s initial writeup convinced me it was only necessary to check all index pairs with i \lt j, and that’s n(n-1)/2 to check. His first POC code actually checked n^2:

  return all(
        bool(x < y) == (ranks[i][0] < ranks[j][0])
        for i, x in enumerate(a)
        for j, y in enumerate(a)

but @Stefan2 cut that back. closer to n^2/2, with later code.

The code has gotten increasingly obscured, although I still found @Stefan2’s EQ/LT version quite easy to follow.. It’s not like this is going to be the core of an inner loop - it’s typically going to be far slower than sorting regardless, and hard to imagine it would become routinely used. The factor of 2 was worth getting, but I’m happy to stop with that.

Stefan2 · September 15, 2025, 7:52am

Did I? I think I stayed a little over n^2. I consider about n^2/2 pairs, but use two comparisons for each pair.

dg-pb · September 15, 2025, 10:36am

I think @mdickinson’s function can be brought to exactly n^2 by:

validate_sort

def validate_sort(a):
    ranks = []
    eq_index = 0
    x = None
    for i, y in enumerate(a):
        if i > 0 and x < y:
            eq_index += 1
        ranks.append(eq_index)
        x = y

    return all(
        (x < y) == (ranks[i] < ranks[j])
        for i, x in enumerate(a)
        for j, y in enumerate(a)
        if i - 1 != j
    )

Not 100% sure it is correct. Would be useful to run through @tim.one’s 1001 test cases.

franklinvp · September 15, 2025, 1:35pm

It depends on who you ask. For some the domain of the relation is part of what the relation is (the full name being “a relation R on X”), for some it isn’t (“the relation is the set of pairs that are related”), just like for some the co-domain of a function is part of the function and for some it isn’t, for some a function can be a partial function, for some a function knows its source/domain. In mathematical practice both non-equivalent definitions are used, and even some books start with one and then later inadvertently make claims that are only true for the other.

Probably it is good to make it clear whether the input is not just __lt__ (and think you mentioned __eq__), but also the list.

gerardw · September 15, 2025, 3:30pm

Fix the documentation. It’s both incomplete and wrong.

Add strict weak ordering to the glossary.

Correct the false statment While defining an __lt__() method will suffice for sorting in the built-in Functions. Add note to documentation that __lt__ must implemenet strict weak ordering for the Python sort methods to work correctly.

tim.one · September 15, 2025, 6:28pm

Good catch! I was indeed counting the number of pairs rather than the number of comparisons. The amount by which the number of pairs exceeded n^2/2 is of no practical importance to me. “Close enough” that it’s not worth any additional complexity to reduce.

tim.one · September 15, 2025, 6:39pm

Yup, it can be subtle, and inconsistent even within a single source. Worse, “Is a proper subset of” is a “strict partial order”, period. But is a “strict weak order” only with respect to a specific collection of sets. The meaning of “strict” is very different in those two contexts.