Helping people with "broken" sorting

mdickinson · September 11, 2025, 7:08am

Just to be clear, ensuring that a future <-based sort can’t produce a different result is exactly what the code I posted achieves (and was intended to achieve). I posted in a hurry and didn’t do a good job of spelling that out - sorry.

Briefly, the logic is:

if you know < defines a strict weak ordering on a collection of Python objects, then there is only one possible valid stable sort result for any given input list containing those objects, so it doesn’t matter how much the internal algorithm is tweaked in the future - so long as it continues to sample only < for its comparisons, it has no choice but to produce that same result
the code I posted exactly establishes that < does define a strict weak ordering (and as a by-product of that, establishes transitivity of <)

It’s trivial to extend this to also check that the user’s implementations of <= and == also agree with the strict weak ordering, if that’s desired.

dg-pb · September 11, 2025, 2:04pm

I like initial validating_insertionsort function.
It keeps possibilities open without restricting any of the features.
I would keep it in that form and just make use of it in validation-only function.

class _CVObj:
    def __init__(self, i, v):
        self.i = i
        self.v = v
    def __eq__(self, other): return self.v == other.v
    def __lt__(self, other): return self.v < other.v
    def __repr__(self):      return f'[@ initial {self.i}] {self.v!r}'


def validate_ordering(seq, *, raise_on_fail=False):
    arr = [_CVObj(i, v) for i, v in enumerate(seq)]
    try:
        validating_insertionsort(arr)
    except AssertionError as e:
        if raise_on_fail:
            raise
        return str(e)

Previous example with https://gist.github.com/dg-pb/355ccb4905e8c520a8bf68344f8aeaf4

AssertionError: Cycle detected: A<B<C and C<A
    A @ 0 = [@ initial 0] 1
    B @ 1 = [@ initial 1] 2
    C @ 2 = [@ initial 2] 3

tim.one · September 11, 2025, 6:09pm

I in turn haven’t been clear enough about what I’m after. “Result changed!” isn’t the only complaint I’m trying to address, although it’s the most serious one ("backward incompatible!"), and the only one I mentioned at the start.

And that’s the rub: your notion of “stable” is not typically theirs. You (strict weak ordering) consider two distinct elements to be “already in order” even if they’re not comparable via < in either direction. Users typically don’t. That’s “an academic distinction” to them. They typically expect sorting to deliver the same result regardless of how the input list is permuted, modulo runs of elements that all compare __eq__;. That’s the concrete Python spelling of “equal”, not a strict weak ordering’s notion of “equivalent”.

To be very clear, people get confused too by that

x = [{1}, {2}]
print(sorted(x) == sorted(reversed(x)))

prints False. Of course that example is so very minimal they can generally (not always!) figure it out, but in real-life cases it can be far more obscured that trichotomy (which they’re generally assuming, whether they know it or not) doesn’t hold.

It’s just historical accident that Python’s sort sticks to just <. The intended semantics were always that sorting is well-defined if and only if total ordering obtains, and that, when it does, the output is unique regardless of how the input is permuted, modulo runs of equal (__eq__) elements.

When total ordering does obtain, all of Python’s sorts, even the earlier unstable ones, produced the same result list from every permutation of the input list, where “the same” means

assert all(map(operator.eq,
               sorted(original_list),
               sorted(any_permutation_of_that_list)))

passes, and that the result is canonical even across all sorting implementations.

The introduction of stable sorting had no effect on that, neither restricting sorting to using just <. The introduction of stability allowed making stronger guarantees, but which can’t be expressed without using the is relation.

What I’m actually after also needs to establish trichotomy. As I mentioned earlier, Python’s sort assumes that not x < y and not y < x is equivalent to x == y, but doesn’t check for that because it promises to invoke only __lt__. Verification code isn’t so restricted.

Later code in this topic is even verifying things like that x == x is true.

For the subset of users who are happy with what strict weak ordering on its own guarantees, your code is fine! But I’m guessing that’s a small subset. From all I’ve seen, people are far more caught by consequences of failing to supply a total ordering, and that’s not just about changes across releases.

Of course sorting doesn’t require a total ordering, and never will. “Verification” is optional, for those who do want the stronger guarantees a total order can make. Which I believe is most people, most of the time.

And It would be possible to offer different kinds of verification for different kinds of orders, but “mission creep” often kills the original goal.

tim.one · September 11, 2025, 6:44pm

Although I’ve been taking a different approach: different class methods to check different properties, with no concern at all for speed, The most complicated code I have is producing error messages, which have a uniform structure because they’re all created by the same fancy method. The checking code is as dead obvious as is possible; e.g., my transitivity checker is cubic time, Sample outputs:

Trichotomy failed at indices 30, 11:
    none of A<B, A==B, and B<A are true
    A at 30 = {-1}
    B at 11 = {0, 1, 2, 3, 4}
Trichotomy failed at indices 30, 12:
    none of A<B, A==B, and B<A are true
    A at 30 = {-1}
    B at 12 = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 1 ... [66 chars elided]

Transitivity failed at indices 0, 1, 2:
    A<=B and B<=C but not A<=C
    A at 0 = 1
    B at 1 = 2
    C at 2 = 3
Transitivity failed at indices 1, 2, 0:
    A<=B and B<=C but not A<=C
    A at 1 = 2
    B at 2 = 3
    C at 0 = 1

Not meant for production use, but to keep it as easy as possible to create a suite of independet checkers that could be pasted together in different ways for different purposes. I’m still in the prototype stage.

Here’s the code, which may well still suffer from various typos:

code

class Checker:
    def __init__(self, a, *, reprlen=40):
        self.a = a
        self. reprlen = reprlen
        self.nerrors = 0

    def _err(self, tag, msg, ixs):
        ixlist = ", ".join(map(str, ixs))
        pieces = [tag, " failed at "]
        if len(ixs) > 1:
            pieces.append("indices ")
        else:
            pieces.append("index ")
        pieces.append(ixlist)
        pieces.append(':')
        print("".join(pieces))
        print(f"    {msg}")
        for name, ix in zip("ABCD", ixs):
            x = self.a[ix]
            r = repr(x)
            if len(r) > self.reprlen:
                excess = len(r) - self.reprlen
                r = r[: self.reprlen] + f" ... [{excess} chars elided]"
            print(f"    {name} at {ix} = {r}")
        self.nerrors += 1

    def check_reflexivw(self):
        for i, x in enumerate(self.a):
            if not x == x:
                sekf,_err("Reflexivity", "not A==A", [i])

    def check_trichotomy(self):
        from itertools import compress
        for i, x in enumerate(self.a):
            for j, y in enumerate(self.a):
                raw = [x < y, x == y, y < x]
                s = sum(raw)
                if s == 1:
                    continue
                if not s:
                    msg = "none of A<B, A==B, and B<A are true"
                else:
                    pieces = compress(['A<V', 'A==B', 'B<A'], raw)
                    msg = " and ".join(pieces) + " are true"
                self._err("Trichotomy", msg, [i, j])

    def check_transitivity(self):
        def le(x, y):
            return x < y or x == y

        for i, x in enumerate(self.a):
            for j, y in enumerate(self.a):
                if not le(x, y):
                    continue
                for k, z in enumerate(self.a):
                    if le(y, z):
                        if not le(x, z):
                            self._err("Transitivity",
                                      "A<=B and B<=C but not A<=C",
                                      [i, j, k])

Stefan2 · September 11, 2025, 9:04pm

@dg-pb I think you have a bug, reporting “Cycle detected: A<B<C and C<A” when A<B is false:

A, B = [], []
class C:
    def __lt__(self, other):
        return other is A
    def __eq__(self, other):
        return other in [A, B]
    def __gt__(self, other):
        return False
C = C()
print(validate_ordering([A, B, C]))
print(f'{A<B = }')

Output:

Cycle detected: A<B<C and C<A
	A @ 0 = [@ initial 0] []
	B @ 1 = [@ initial 1] []
	C @ 2 = [@ initial 2] <__main__.C object at 0x75c20f52ae40>
A<B = False

Wombat · September 11, 2025, 10:54pm

Would this cover all cases?

def validate_sort(a: list):
    classed = []
    eq_class = 0
    for x in a:
        if classed and classed[-1][0] < x:
            eq_class += 1
        classed.append((x, eq_class))
    
    for x, xc in classed:
        for y, yc in classed:
            eq = (x == y)
            lt = (x < y)
            gt = (x > y)
            assert lt + eq + gt == 1
            assert (x != y) == (not eq)
            assert (x <= y) == (lt or eq)
            assert (x >= y) == (gt or eq)
            assert lt == (xc < yc)

dg-pb · September 11, 2025, 11:00pm

Stefan Pochmann:

A, B = [], []
class C:
    def __lt__(self, other):
        return other is A
    def __eq__(self, other):
        return other in [A, B]
    def __gt__(self, other):
        return False
C = C()
print(validate_ordering([A, B, C]))
print(f'{A<B = }')

Thank you. It needs to report A<=B<=C instead.
Or rather not B < A and not C < B

tim.one · September 12, 2025, 12:01am

For what purpose(s)? As I said, this isn’t a business I want to get into, or spend time on. That you added a check for trichotomy certainly goes a long way toward satisfying what “just folks” expect, but that’s as far as I want to go on this sidetrack.

One thing that seems unlikely here: that it verifies __eq__ all by itself implements an equivalence relation (x == x; x == y implies y == x; and transitive). If not, it could go a way toward that end by adding

                assert eq == (xc == yc)

but now I’m slipping into a pit I was determined to stay out of .

tim.one · September 12, 2025, 12:18am

From the code I last posted:

Trichotomy failed at indices 0, 2:
    A==B and B<A are true
    A at 0 = []
    B at 2 = <__main__.C object at 0x00000207E2D12900 ... [1 chars elided]
Trichotomy failed at indices 2, 0:
    A<B and A==B are true
    A at 2 = <__main__.C object at 0x00000207E2D12900 ... [1 chars elided]
    B at 0 = []

Unfortunately for clarity, the names A and B in that output have nothing to with how @Stefan2 is using the same names.

Anyway, it doesn’t find any transitivity violations, just trichotomy failures.

tim.one · September 12, 2025, 4:03am

Actually, I’m not sure that’s true. LATER: it’s not true .

Here’s a list, and we already know <for sets meets the criteria for a strict partial order:

[{1, 5}, {1}, {3, 4}]

and

>>> sorted([{1, 5}, {1}, {3, 4}])
[{1}, {3, 4}, {1, 5}]

What do you think it “must” produce? That, or something else?

As far as Python is concerned, that is sorted, because a[1] < a[0] and a[2] < a[1] both return False. You may want to claim it violates stability, because it changes the relative order of the incomparable elements {1, 5} and ``{3, 4}`, but Python never intended to preserve the order of incomparable elements, and its sort has no code whatsoever to detect such a case. “Less than or not less than” are the only things it ever asks, and as few of those as it can get away with.

Now picture instead a selection sort sticking only to <. First it finds the minimum of a[:]. It starts with a[0],. {1, 5}. Then it asks whether a[1] is smaller. Yes! The new min is {1}. Then it asks whether a[2] is smaller. No. In fact they’re not comparable, but it doesn’t know that, just that a[2] < min returned False.

So it swaps the first two elements, changing the list to

[{1}, {1, 5}, {3, 4}]

Then it looks to find the minimum of a[1:]. Is a[2] < a[1]? No. So a[1] is already the minimum remaining, and the entire sort is done.

But it’s a different result than the current sort gave.

I didn’t out-think this, I just tried throwing random stuff at the current sort and an excruciatingly slow stable <-only selection sort, and was flooded with input lists that gave different results.

Why does Python produce the result it does produce? I really don’t know - slogging into the details is exceedingly tedious, and it’s always considered the lack of a total order to be a garbage-in garbage-out case. I suspect the recent change to liberalize the internal definition of “descending run” (which the example here starts with) to “non-increasing run” accounts for it. pypy, which does not yet have that change, matches the selection sort’s result.

In fact, that’s it. First

{1, 5}, {1}

is recognized as a strictly decreasing run. Then it tries to extend it for so long as it doesn’t increase. Neither of {3, 4} and {1} is < than the other, so they “must be equal”. The sort reverses them in-place, so that their original order will be restored when the entire slice is reversed at the end. Now we’re left with

[{1, 5}, {3, 4}, {1}]

and then the entire list is reversed in-place to give the final

[{1}, {3, 4}, {1, 5}]

Still just GIGO to me, though. It’s a non-decreasing order, and it did preserve the original order of the only two elements ({1} and {3, 4}) that showed a hint of being equal, given the subset of possible < compares it tried.

EDIT: this got pretty involved. Let me bottom-line the result: call

 [{1, 5}, {1}, {3, 4}]

 [A, B, C]

for convenience.

Python currently considers that as meeting A > B == C (the last one due to that neither B nor C is less than the other). Therefore [B, C, A] must be sorted (and preserves the order of B and C).

pypy consders that as starting with a strictly decreasing run of 2 elements, and reverses that in-place to give [B, A, C]. Then C is put into place with a binary insertion sort, whose details I didn’t bother to go into at all, but suffice it to say that C < x never returns true in this example so, it leaves C at the end.

The stable <-only selection sort ended up with the same result as pypy, but via a different chain of logic.

Not mentioned before: while pypy reflects an earlier version of CPython’s sorting code, there are plenty of cases in which it too produces different results than the selection sort. I’ll leave it to you to figure out one :

input:          [{4, 1}, {5}, {1}] 
cpyhon:         [{1, 4}, {5}, {1}]  
pypy:           [{4, 1}, {5}, {1}] 
selection sort: [{1}, {4, 1}, {5}]

All sticking to <, 2 different results. BTW, while the output looks different, CPython and pypy didn’t mutate the list at all (the order in which a set’s elements are displayed isn’t defined, and {4, 1} and {1, 4} are the same set).

Stefan2 · September 12, 2025, 7:18am

@dg-pb Something your code (with your updated gist) doesn’t catch: validate_ordering([x, y, z])) finds no issue for my case with x < z == y == x.

class X:
    def __lt__(self, other):
        return self is x and other is z
    def __eq__(self, other):
        return not self < other

x, y, z = X(), X(), X()
print(validate_ordering([x, y, z]))
print(x < z == y == x)

Output (Attempt This Online!):

None
True

Stefan2 · September 12, 2025, 7:26am

I think you miss it because when inserting y you find that x==y, but you don’t remember it. So when you then insert z, you find x<z and y==z and see nothing wrong. Mark’s way catches it, by remembering all “eq classes” and then checking against those.

tim.one · September 12, 2025, 7:33am

Output from my code:

Trichotomy failed at indices 2, 0:
    A==B and B<A are true
    A at 2 = <__main__.X object at 0x000001CBEED00050 ... [1 chars elided]
    B at 0 = <__main__.X object at 0x000001CBEEC12A50 ... [1 chars elided]

A there is your z (index 2), and B your x (index 0). So it’s complaining that both z==x and x<z are true.

This is why I want dirt-simple code at the start. “Premature optimization is the root of all evil” , and far as I’m concerned we’re only now just converging on what we’re actually trying to achieve.

Stefan2 · September 12, 2025, 9:03am

Yeah, I very much like the simplicity and clarity of Mark’s “all((x < y) is (xc < yc) for all pairs)”. I also wrote a faster version (3.5 times faster than my posted simplified version in my benchmark) but I didn’t like it.

that faster version in case someone's interested

It keeps the current eq class in one list and all prior (supposedly smaller) elements in another.

def validate_sort_faster(a: list) -> bool:
    LT, EQ = [], []
    for x in a:
        if EQ and EQ[-1] < x:
            LT += EQ
            EQ.clear()
        EQ.append(x)
        for lt in LT:
            if not lt < x or x < lt:
                return False
        for eq in EQ:
            if eq < x or x < eq:
                return False
    return True

Attempt This Online!

dg-pb · September 12, 2025, 12:01pm

Thanks for the issue.

But this is not the problem.
Or at least not the simplest way to mark it.

The check that is missing is:

assert (v1 == v2) == (v2 == v1)

Never seen this practice. Has anyone?

Now:

Non commutative equality. A == B and B == A differ
    A @ 2 = [@ 2] z
    B @ 0 = [@ 0] x

dg-pb · September 12, 2025, 12:56pm

@Stefan2 , one for yours:

class Y:
    def __init__(self, v):
        self.v = v
    def __lt__(self, other):
        return self.v < other.v
    def __eq__(self, other):
        return self is not other

x, y, z = Y(1), Y(2), Y(3)

print(validate_sort([x, y, z]))   # True

franklinvp · September 12, 2025, 1:11pm

They used stable as part of the assumptions of their statement. Under that assumption, and with respect to each input list, their statement (the entire original post, not just the quoted part) is correct. {1, 5} and {3, 4} being incomparable, should have stayed in the original relative position. In the version of Python that I used they did.

>>> L = [{1,5}, {1}, {3, 4}]
>>> sorted(L)
[{1}, {1, 5}, {3, 4}]

I remain uncertain of a unique goal, not because each person hasn’t explained what they want, but because there are small details like this that have an impact on what needs to be tested.

Stefan2 · September 12, 2025, 1:45pm

I’d say it is, since that’s the reason I wrote that case

If I fix __eq__, your updated gist still doesn’t find it:

class X:
    def __lt__(self, other):
        return self is x and other is z
    def __eq__(self, other):
        return not self < other and not other < self

x, y, z = X(), X(), X()
print(validate_ordering([x, y, z]))
print(x < z == y == x)

Attempt This Online!):

None
True

mikeshardmind · September 12, 2025, 1:45pm

you may run into some issues in the usefulness of trichotomy with real-world code and other parts of the data model here. Mutable comparable objects may intentionally leave __eq__ as object.__eq__ due to an interaction with __hash__, and rely on the fact that a partial order is always correct and a stable order isn’t needed.

In that regard, any generic comparison validator would have to be able to validate only the specific properties the author cares about.

This gets a little more complicated when users then have something they think will be a stable sort, but is using a type like described above in their own comparison function, because the answers for the user to fix this may involve things like a user implementation looking like:

    def __lt__(self, other):
        return (self.x, id(self.x), self.y) < (other.x, id(other.x), other.y)

to get stability if whatever x is here falls into that category.

mikeshardmind · September 12, 2025, 1:54pm

annnd… despite that being an edge case I’ve seen and dealt with before, the first thing I wrote for it when not giving this my full focus was wrong if the user wanted a comparion using (x, y). I messed up. did you spot it? Below is fixed. has to be grouped so that id is only a tiebreaker on that element.

    def __lt__(self, other):
        return ((self.x, id(self.x)), self.y) < ((other.x, id(other.x)), other.y)

Things like this definitely point towards some validation function being great to have, I just want to be sure we’re careful in considering some interactions here with other concerns.