A "sorted" that returns a generator (mine implemented with quicksort)

this might be a terrible idea but i had it, so here it is.

quicksort is essentially a DFS whereas mergesort is essentially a BFS.

the use cases might be few and far between, and might already be served plenty well by heapq, but i guess there could be cases where a user wants to iterate over sorted items using a generator.

compare:

for item in sorted(items):
  work_on(item)

for item in sorted_generator(items):
  work_on(item)

the first creates a list by sorting every value, whereas the second just produces them one at a time.
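to make the idea concrete, here's a naive sketch of the shape of the thing (not the repo code - first-element pivot and recursion, so don't use it for real):

```python
def sorted_generator(items):
    # naive sketch: first-element pivot, recursive, two passes per level.
    # fine for showing the idea, not robust against adversarial or
    # already-sorted input.
    items = list(items)          # don't mutate the caller's data
    if len(items) <= 1:
        yield from items
        return
    pivot = items[0]
    rest = items[1:]
    yield from sorted_generator([x for x in rest if x < pivot])
    yield pivot
    yield from sorted_generator([x for x in rest if not (x < pivot)])
```

the point is that the left partition can start yielding before the right partition has been looked at.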

here is a proof of concept: GitHub - boniest/quicksort_generator: returns a generator of sorted values using basic quicksort.

3 Likes

If you need to produce one item at a time, try to use a priority queue.
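For example, a sketch built on heapq (function name is mine, just illustrative):

```python
import heapq

def iter_sorted(items):
    # Build a heap in O(n); each pop costs O(log n), so yielding just
    # the first k items costs O(n + k log n) - cheaper than a full sort
    # when k is much smaller than n.  Note: not a stable order.
    heap = list(items)
    heapq.heapify(heap)
    while heap:
        yield heapq.heappop(heap)
```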

1 Like

It seems that your code example assumes that items is a mutable sequence, not an iterator. But for lists you can just use list.sort for in-place sorting.

2 Likes

It’s a fine idea! Although I only recall it coming up once before in some 30+ years :smile:.

heapq.nsmallest() is the closest it gets now. That’s efficient, but you need to know in advance how many items you want. Otherwise a recursive generator built on partitioning is “a natural” approach.
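For instance:

```python
import heapq

data = [17, 3, 9, 1, 12, 5]
# Efficient, but you have to pick k up front.
print(heapq.nsmallest(3, data))  # [1, 3, 5]
```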

One caution: I got into the business of writing Python’s standard sorts over 3 decades ago, because platform C implementations of qsort() proved to be endlessly problematic. Some weren’t thread-safe, some weren’t even reentrant, and almost all (a) could be tricked into segfaulting by supplying an inconsistent comparison function, and (b) were easily provoked into quadratic-time disasters.

So, enchanting as it is, plain quicksort isn’t suitable for a standard library. The newer pdqsort is worth looking at. It’s a quicksort at heart, but:

  • Has an extremely clever way to take advantage of masses of equal elements, turning them into exceptionally good cases.

  • Incorporates tricks to “break up” patterns that tend to throw other implementations into quadratic-time land.

  • Like introsort, falls back to using heapsort if partitions prove to be, despite its efforts, “too unbalanced”.

But that’s all “fine tuning” (albeit long-winded). The basic idea is good!

13 Likes

thanks for the feedback everyone.

Also, I think that such a sort would need to be stable so as to deliver results identical to sorted.

2 Likes

It’s easy to make a quicksort stable if you don’t stress about space. Here’s a version of the OP’s idea that:

  • Does not mutate the input.
  • Accepts any iterable.
  • Is stable.
  • Is “pretty darn robust” against quadratic-time cases. This is done by using what the literature calls “ninther” pivot selection: pick a random sample of 9 elements, put them in 3 groups of 3 each, find the median of each group, then use the median of those 3 medians as the pivot.
  • Doesn’t use recursion, so even if a horrid case comes up, it won’t blow the stack.
  • Actually benefits from many duplicate values.

This is barely tested and not tuned at all - but it’s “almost obviously correct” :wink:

EDIT: note that the standard Python sorts only use __lt__ (<) comparisons. This one also uses == and is. This is a way to ensure (even in the face of inconsistent comparison results) that partitioning always makes some progress, and greatly speeds cases with many duplicate values.

Code
def medianof3(x, y, z):
    # Return the median of three values using at most 3 comparisons.
    if y < x:
        x, y = y, x
    if z < y:
        return x if z < x else z
    else:
        return y

def isorted(xs):
    from random import sample

    # Explicit stack of (all_equal?, sublist) pairs replaces recursion,
    # so a pathological input can't blow the call stack.
    stack = [(False, list(xs))]
    while stack:
        alleq, xs = stack.pop()
        if alleq:
            yield from xs
            continue
        if len(xs) < 20:
            # Defer to sorted() for small slices (this also keeps the
            # sample(xs, 9) call below legal).
            yield from sorted(xs)
            continue
        lt = []
        eq = []
        gt = []
        # "Ninther" pivot: median of medians of 3 random groups of 3.
        pivs = sample(xs, 9)
        pivot = medianof3(medianof3(*pivs[0:3]),
                          medianof3(*pivs[3:6]),
                          medianof3(*pivs[6:9]))
        for x in xs:
            if x < pivot:
                target = lt
            elif x is pivot or x == pivot:
                # The identity check puts the pivot object itself in eq
                # without an __eq__ call.
                target = eq
            else:
                target = gt
            target.append(x)
        # Push in reverse yield order: lt is popped (and emitted) first.
        stack.append((False, gt))
        stack.append((True, eq))
        stack.append((False, lt))

from random import getrandbits
xs = [getrandbits(15) for _ in range(1_000_000)]
assert sorted(xs) == list(isorted(xs))
6 Likes

Is this a typo, or are you intentionally falling back on the default sorted() implementation here? You didn’t mention hybrid sorting in the description or I’d assume that this was fully intentional.

2 Likes

Deferring to sorted() for small slices is intentional, partly for speed (and the cutoff should almost certainly be larger than 20 for this purpose), and partly to ensure that the call to sample(xs, 9) doesn’t blow up. Every variant of quicksort in real life defers to something different for small slices (usually insertion sort).

2 Likes

Thought it might be. Wasn’t sure though, since the two names were so alike [1].


  1. they still are, though years have rolled over their heads ↩︎

1 Like

As long as it does not hold more than one copy of the list, I guess it's all good.


What about the current algorithm behind sorted?
Could it easily be adapted into an iterator variant?
Is it a reasonable candidate for that?

1 Like

someone smarter than me will need to reply, but i think that since timsort is essentially mergesort, and since mergesort needs to go all the way to the final merge to locate the minimum value, a generator applied to the algorithm would not add much in terms of time saved.

but i could be way off.

2 Likes

The code I posted makes a copy at the start, but effectively makes another as part of top-level partitioning into disjoint lt, eq, and gt pieces. That could be ameliorated by copying into a collections.deque at first instead, and popping “from the left” one element at a time while building the partitions. But I don’t care enough to pursue it.
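A sketch of what that deque version's partitioning step could look like (hypothetical helper name, untuned):

```python
from collections import deque

def partition(xs, pivot):
    # xs is a deque; consume it from the left while building the three
    # partitions, so peak extra memory stays near one copy of the data
    # instead of two.
    lt, eq, gt = [], [], []
    while xs:
        x = xs.popleft()
        if x < pivot:
            lt.append(x)
        elif x is pivot or x == pivot:
            eq.append(x)
        else:
            gt.append(x)
    return lt, eq, gt
```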

A mostly trivial wrapper, like

def sorted(iterable, **kws):
    xs = list(iterable)
    xs.sort(**kws)
    return xs
Could it be adjusted to iterator variation easily?

Very easily. Just change the last line above from return to yield from. But doing so usefully is a different question, to which the answer is “no - not easily, or even with major effort”.

As @boniest said, our mergesort has no way to know what the smallest element is until the final merge starts. There’s scant useful incrementalism to be found.

A variant of heapsort could also work, by building a heap not from elements, but from (element, original_index) 2-tuples. Decorating with indices is needed to ensure stability among equal elements. The iterator then just pops from that heap, one at a time. Very easy to code.
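Sketched out (illustrative name, untuned):

```python
import heapq

def iter_sorted_stable(iterable):
    # Decorate each element with its original index; ties on the value
    # break on the index, so equal elements pop out in first-seen order
    # (stability).  The heap then yields one element at a time.
    heap = [(x, i) for i, x in enumerate(iterable)]
    heapq.heapify(heap)
    while heap:
        x, _ = heapq.heappop(heap)
        yield x
```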

But … wrapping things in 2-tuples adds significant memory burden (64 bytes each, including alignment padding), and heap operations have horrid data locality. Making a list copy only costs 8 bytes per element (to hold the pointer in the C array implementing the list). No new objects are needed (except for the single new list object).

1 Like

I really like the code you shared.
It is the fastest pure-Python sort code that I have seen.

Did a couple of adjustments to it:

  1. did not use sample - it was killing performance. Instead took equally (well, almost…) stratified points
  2. Added a check for already-sorted parts - small cost, but it looks good when an already-sorted input doesn’t take long to run :slight_smile:
Code
_EMPTY = object()


def quick_sort(arr):
    stack = [(False, arr)]
    append = stack.append
    result = []
    extend = result.extend
    while stack:
        is_sorted, arr = stack.pop()
        if is_sorted:
            extend(arr)
        elif (n := len(arr)) < 20:
            extend(sorted(arr))
        else:
            lt, eq, gt = [], [], []
            lappend = lt.append
            eappend = eq.append
            gappend = gt.append
            # pivot: median of 7 probes taken at fixed fractions of the
            # array - cheaper than random sampling, but deterministic
            pivot = sorted([
                arr[0],
                arr[n // 8],
                arr[n // 4],
                arr[n // 2],
                arr[n * 3 // 4],
                arr[n * 7 // 8],
                arr[-1]
            ])[3]
            # track whether lt/gt happen to come out already sorted
            x_lt = _EMPTY
            x_gt = _EMPTY
            lt_sorted = 1
            gt_sorted = 1
            for x in arr:
                if x < pivot:
                    if lt_sorted and x_lt is not _EMPTY and x < x_lt:
                        lt_sorted = 0
                    x_lt = x
                    lappend(x)
                elif x > pivot:
                    if gt_sorted and x_gt is not _EMPTY and x < x_gt:
                        gt_sorted = 0
                    x_gt = x
                    gappend(x)
                else:
                    eappend(x)
            append((gt_sorted, gt))
            append((True, eq))
            append((lt_sorted, lt))
    return result
2 Likes

Does this open your version up to O(n²) worst case runtime if given maliciously organized data?

5 Likes

If someone was aware of this, then yes, it could be exploited.

I guess a happy middle ground might be pre-storing random numbers to be reused.

Always assume that an attacker knows everything about your algorithm - including the nine random numbers that you reuse every time.

3 Likes

I meant draw random numbers at the beginning of each function call. Just re-use the same ones in all loops.

1 Like

Ahh, that would be harder to attack, though I wouldn’t bet my life on its safety.

3 Likes

There’s a reason variants of introsort [1] are considered best practice now. Quadratic-time cases just do happen, and can be disastrous when they do.

Even with all my experience in the area, I see I left a glaring infinite-time trap in the code I posted: if some yahoo creates a class with a __lt__() that always returns True, everything will end up in the lt partition, empty gt and eq partitions will be pushed on the stack, and it will go on indefinitely doing the same thing with the “new” lt partition.

So not really infinite time - it will eventually run out of RAM to hold the ever-growing stack :wink:.

Now that can be fixed easily by raising an exception if the eq partition is empty (with sane comparison functions, it must always contain at least the pivot). But there’s nothing to stop a sequence of bad picks for the “ninther” pivot selection. That’s just “unlikely”.
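A sketch of that guard, pulled out as a standalone partitioning step (hypothetical names, not the code I posted):

```python
class AlwaysLess:
    # Pathological: claims to be less than everything, including itself.
    def __lt__(self, other):
        return True

def partition_with_guard(xs, pivot):
    lt, eq, gt = [], [], []
    for x in xs:
        if x < pivot:
            lt.append(x)
        elif x is pivot or x == pivot:
            eq.append(x)
        else:
            gt.append(x)
    if not eq:
        # A sane comparison must put the pivot itself into eq, so an
        # empty eq proves the comparison function is inconsistent.
        raise ValueError("inconsistent comparison function")
    return lt, eq, gt
```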

There’s another kind of vulnerability I don’t take very seriously, but some researchers do: code so very malicious that it traps comparison attempts, looks at the state of the array, and with knowledge of your algorithm dynamically picks comparison results crafted to make your algorithm’s job as hard as possible.

In theory, any implementation of quicksort can be tricked into taking quadratic time (at best!) in the face of that. They “just” have to arrange to return the same result for every comparison with the pivot, so that partitioning accomplishes nothing useful. The more careful algorithms make a special case of the pivot, to guarantee that at least one element is eliminated by each partitioning pass.

There’s apparently nothing that doesn’t turn into a bottomless pit :wink:.


  1. monitor progress and fall back to a worst-case n log n method instead if progress is poor ↩︎

4 Likes