Allow `sorted` to take list for `key` argument

I very much agree that this would be convenient.
However, such a design has some costs:

  1. It needs a new permutation type, for which this would be the only use case
  2. The sort implementation/API gets more intricate
  3. It is not immediately obvious that sort does not do sorting. For example, some reviewer will end up scolding a junior coder for doing multiple O(n log n) sorts, when in reality they weren’t and were just relying on an immediately non-obvious implementation detail

I think something very similar can be achieved without the drawbacks above, while still being sufficiently close to what you are pointing at:

# Hypothetical new list method (pseudocode):
def list.reorder(self, order):
    ...

# Then your suggestion above becomes:
idxs = sorted(range(len(keys)), keys=keys)  # O(N log N)

# using the permutation for sorting without recomputing the order
values1.reorder(idxs)  # O(N)
values2.reorder(idxs)  # O(N)

This way, the design is more modular: instead of putting more strain on sort, we can develop a new method. And reordering in general has some scope to explore - the order input has 2 fundamental forms, equal-sized block reordering, etc.
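A minimal pure-Python sketch of what such a method could do (the subclass and names are illustrative only; a C implementation could avoid the temporary list):

class ReorderableList(list):
    def reorder(self, order):
        # After xs.reorder(ixs), xs[i] holds the old xs[ixs[i]].
        # Simple O(N) version that builds one temporary list.
        if len(order) != len(self):
            raise ValueError("order must have the same length as the list")
        self[:] = [self[i] for i in order]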

So I think this would be a very valid direction to take a look at, but I think it is best left as a separate endeavour.


For now I assume that mutation is ok.
And I have it implemented (using a keylist arg): GitHub - dg-pb/cpython at sort_keys

Lucky for me, heroic efforts have already been made and I could just copy+paste.
A few things that I thought about along the way:

  1. Branching on whether the reference to keylist is unique. In this case none of the “sick tricks” are needed. However, the complexity of branching itself didn’t seem justified, as the approach needed for the case where a reference is held outside handles it smoothly and efficiently.
  2. Actually making a copy of keylist. Rejected, as it might take twice as long as the sorting itself (in the case of an already-sorted list).
  3. Factoring out 2 macros, DISABLE_LIST & REENABLE_LIST. But re-enabling involves directives, and I couldn’t figure out a way to get more benefit than headache.

I will do a few more iterations to make sure all error-handling sequences are correct, and if there are no further objections at that point I will open a PR.


In the meantime, it is a very good time to raise any additional concerns.
Build the branch, see how it feels to use it (and if it doesn’t crash… ).


For me, the reason I wouldn’t mutate it isn’t that it would “hurt”, but that I’d accept any iterable, and it would be weird if it gets mutated sometimes (i.e., when it happens to be a list).

Yes to separate, and yes to interesting :slight_smile:

We could write a suitable reorder() function right now. It’s possible to do this in-place in linear time, by dynamically “chasing the cycles” in idxs, but offhand I think it requires len(idxs) extra bits to record which indices already hold their final content. This can require more than len(idxs) steps, but linear-time nonetheless.

Haven’t thought about whether idxs could be preprocessed in some way to cut the memory and/or time cost of applying it. If so, that might justify a distinct “permutation” type to absorb the preprocessing cost just once.

Leaving aside that Python use cases are conspicuous by their absence :wink:


From my experience, in-place re-ordering generally falls into 2 categories:

  1. The permutation structure is known and it can be exploited (e.g. perfect shuffle)
  2. The permutation structure is arbitrary or without clear solution

For (2), strictly in-place is hardly possible.
Using a bit array to track seen items is as good as I could get.
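A minimal sketch of that approach (illustrative names only; a bytearray stands in for a real bit array):

def reorder_with_flags(xs, ixs):
    # Reorder xs in place so that xs_new[i] == xs_old[ixs[i]],
    # chasing the cycles of the permutation and using one flag per
    # slot to remember which slots already hold their final content.
    n = len(xs)
    assert len(ixs) == n
    done = bytearray(n)          # n extra bytes; a bit array would use n bits
    for start in range(n):
        if done[start]:
            continue
        current = start
        saved = xs[start]
        while True:
            src = ixs[current]
            done[current] = 1
            if src == start:     # cycle closes
                xs[current] = saved
                break
            xs[current] = xs[src]
            current = src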


But I didn’t dig very deep; it would be interesting to see if there is more to it.
I think this deserves a separate thread. :slight_smile:


Well, thinking about it, “hyper-generalisation” does not necessarily seem like a bad idea.
Doing it would provide some convenience for sure.

But there is a performance trade-off which I am reluctant to give up.
I will just reiterate one of the key examples.

import random
from operator import itemgetter

def dict_sort(d, by_keys=True):
    # If mutates
    keys = list(d)
    values = list(d.values())
    if by_keys:
        values.sort(keylist=keys)
    else:
        keys.sort(keylist=values)
    # d.clear()
    # d.update(zip(keys, values))

def dict_sort2(d, by_keys=True):
    # If does not mutate
    keys = list(d)
    values = list(d.values())
    keylist = keys if by_keys else values
    # Note, wouldn’t need to wrap in list here
    idxs = sorted(range(len(d)), keylist=list(keylist))
    reorderer = itemgetter(*idxs)
    keys = reorderer(keys)
    values = reorderer(values)
    # d.clear()
    # d.update(zip(keys, values))

N = 100_000
keys = random.sample(range(N), N)   # Case 1. Worst
# keys = list(range(N))             # Case 2. Best
d = dict(zip(keys, range(N)))

So although this will have less and less impact as the number of lists to be reordered grows, the above is the next complexity level, which will be a more common case than, say, re-ordering 3 lists based on 1 keylist. And the performance benefit is non-trivial:

./python.exe -m timeit -s "$S" "dict_sort(d)"   # 23 |  2 ms
./python.exe -m timeit -s "$S" "dict_sort2(d)"  # 45 | 10 ms

In short, taking in the final object which gets mutated allows the user to take ALL theoretically available opportunities for optimal performance.
Anything else trades away performance in one place or another.

Now I appreciate that maximum performance is not the only measure.
But for this specific case, the cost of extracting it seems low enough that I am very much inclined to mine it properly.

I think leaving 2-5x performance boost (for the 2-list case above) behind without a very good reason would be unfortunate.

I think this is too shallow to merit a different topic :wink:

I’ve done this many times, and it’s not hard, but the details are delicate.

The basic idea is that any permutation is a product of disjoint cycles, and every cycle can be permuted in place using just one element of temp memory.

But this time I asked ChatGPT-5 for its help in working out a solution to something I never finished before: do it without using any temp memory beyond that single element.

And the bot cracked it! Not at first. At first it gave me “a solution” that didn’t actually work at all unless passed the identity permutation.

I explained that to it, and it thought for a very long time. For a bot. It usually replies as fast as I can blink after hitting ENTER, but this time it was threatening to take an entire minute :wink:.

The solution it came up with is really quite clever: it mutates the index list in place too as it goes along, and restores its original value in a loop at the end. It also relies on Python ints being unbounded, so that its “just add n” way of marking a slot as visited can’t blow up, and is reversible.

It’s a trick even @Stefan2 may approve of :wink:.

I asked it why it didn’t just negate the value instead, and it shamed me: “Because 0 == -0”. The student has become the master :wink:

def reorder_in_place(xs, ixs):
    n = len(xs)
    assert n == len(ixs)
    assert sorted(ixs) == list(range(n))

    # Mark visited by adding n; original value is recovered with
    # modulo n.
    for start in range(n):
        if ixs[start] >= n:  # already visited
            continue

        current = start
        start_val = xs[start]

        while True:
            next_ix = ixs[current]      # in range(n)
            ixs[current] = next_ix + n  # mark visited
            if ixs[next_ix] >= n:       # already visited => cycle closes
                xs[current] = start_val
                break
            else:
                xs[current] = xs[next_ix]
                current = next_ix

    # Restore ixs to its original permutation
    for i in range(n):
        ixs[i] %= n

So there you go. In place, essentially no extra memory, and by construction it is linear time (although that takes a tiny bit of easy proof, and assumes you comment out the asserts at the top).

Note: I’ve tested it on all permutations of lists of sizes from 1 through 10 inclusive.


Neat.

So that got sorted quickly. (no pun intended)

To fit (perfectly!) into Py_ssize_t, one can do idx = -idx - 1 instead.
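A sketch of that variant, adapting the code above to complement marking (illustrative only, not the code from the branch):

def reorder_in_place_complement(xs, ixs):
    # Same algorithm as above, but a slot is marked visited by storing
    # ~ix == -ix - 1 (always negative), so no stored value ever leaves
    # the original index range.
    n = len(xs)
    assert len(ixs) == n
    assert sorted(ixs) == list(range(n))

    for start in range(n):
        if ixs[start] < 0:           # already visited
            continue

        current = start
        start_val = xs[start]

        while True:
            next_ix = ixs[current]   # in range(n)
            ixs[current] = ~next_ix  # mark visited
            if ixs[next_ix] < 0:     # already visited => cycle closes
                xs[current] = start_val
                break
            else:
                xs[current] = xs[next_ix]
                current = next_ix

    # Restore ixs to its original permutation: ~~i == i
    for i in range(n):
        ixs[i] = ~ixs[i]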



Haha! Pun taken :wink:

I’ll note something else, though: I never would have written the code this way. It’s correct, but “sloppy” in a way that offends me. It’s fine to stop when “the next” slot has already been visited, but it’s much sharper to stop the first time we see the initial slot again. That’s the essence of what “cycle” means. It turns out they’re the same stopping criterion because “it’s a cycle”, but it’s crisper to me to make it explicit that it’s a “cycle” we’re looking for, not the seemingly weaker “saw the slot after it before”.

So here’s the code again with that subtle change:

def reorder_in_place(xs, ixs):
    n = len(xs)
    assert n == len(ixs)
    assert sorted(ixs) == list(range(n))

    # Mark visited by adding n; original value is recovered with
    # modulo n.
    for start in range(n):
        if ixs[start] >= n:  # already in a cycle
            continue

        current = start_ix = start
        start_val = xs[start]

        while True:
            next_ix = ixs[current]      # in range(n)
            assert next_ix < n
            ixs[current] = next_ix + n  # mark visited
            if next_ix == start_ix:     # closing the cycle
                xs[current] = start_val
                break
            else:
                xs[current] = xs[next_ix]
                current = next_ix

    # Restore ixs to its original permutation
    for i in range(n):
        ixs[i] %= n

EDIT: and for a micro-gain in speed, replace the tail end with

    assert all(i >= n for i in ixs)
    for i in range(n):
        ixs[i] -= n

That would work too. In context (listobject.c) it doesn’t matter, because we’re working with indices rather than byte addresses. A Py_ssize_t has some high-order bits to spare.


That’s when the list idea is your starting point. Like I said, I’ve been thinking of this for a while as well, but with a keys parameter accepting an iterable. Just like sorted already accepts any iterable. So from my perspective, this restriction to list is rather a hyper-specialization.

Btw it also just feels very unusual to require a list. Are there any built-in functions or standard library functions that accept a list but no other iterables? (Other than list methods, of course).

Both of those implementations look quite unnatural. Could you also measure this one? (For proper comparison, do include the d.clear and d.update, both because that’s actually part of the real job so the speed difference you measured actually matters less, and because we have different d.update arguments.)

def dict_sort3(d, by_keys=True):
    items = sorted(d.items(), key=itemgetter(0 if by_keys else 1))
    d.clear()
    d.update(items)

Yeah, I’ve done/seen that a few times, at least at LeetCode, maybe even for this task of applying a permutation. Though maybe with negative values (i.e., ~next_ix).

Maybe. Not if the caller has the index ints not just in that argument list but also somewhere else, in which case your new int objects can’t take the space of the original ones.

Setup
import random
from operator import itemgetter

def dict_sort(d, by_keys=True, convert_to_dict=0):
    # If mutates
    keys = list(d)
    values = list(d.values())
    if by_keys:
        values.sort(keylist=keys)
    else:
        keys.sort(keylist=values)
    # d.clear()
    # d.update(zip(keys, values))
    if convert_to_dict:
        dict(zip(keys, values))

def dict_sort2(d, by_keys=True, convert_to_dict=0):
    # If does not mutate
    keys = list(d)
    values = list(d.values())
    keylist = keys if by_keys else values
    # Note, wouldn’t need to wrap in list here
    idxs = sorted(range(len(d)), keylist=list(keylist))
    reorderer = itemgetter(*idxs)
    keys = reorderer(keys)
    values = reorderer(values)
    # d.clear()
    # d.update(zip(keys, values))
    if convert_to_dict:
        dict(zip(keys, values))

def dict_sort3(d, by_keys=True, convert_to_dict=0):
    items = sorted(d.items(), key=itemgetter(0 if by_keys else 1))
    # d.clear()
    # d.update(items)
    if convert_to_dict:
        dict(items)

N = 100_000
keys = random.sample(range(N), N)   # Case 1. Worst
# keys = list(range(N))               # Case 2. Best
d = dict(zip(keys, range(N)))

./python.exe -m timeit -s "$S" "dict_sort(d)"   # 23 |  2 ms
./python.exe -m timeit -s "$S" "dict_sort2(d)"  # 45 | 10 ms
./python.exe -m timeit -s "$S" "dict_sort3(d)"  # 41 |  8 ms

./python.exe -m timeit -s "$S" "dict_sort(d, 1)"    # 24 |  7 ms
./python.exe -m timeit -s "$S" "dict_sort2(d, 1)"   # 46 | 14 ms
./python.exe -m timeit -s "$S" "dict_sort3(d, 1)"   # 42 | 12 ms

I already posted on the issue and will post here once again for visibility (I’m not participating in this discussion as I don’t have time), but changes on builtins usually require a PEP. See PEP 618 – Add Optional Length-Checking To zip | peps.python.org for a precedent where we added the strict keyword.


Just to note, I don’t think that accepting an iterable is a bad idea.

The closest one is random.shuffle - it modifies a sequence in place.
It accepts any mutable sequence.
But it is pure Python - so it is a bit of a different story.
If it were coded in C, I think it would very likely be limited to list as well.
And it is not a method of list.

But how many of them are there, including list (bytearray / array) methods?
A few basic ones here and there.
But the pool of in-place sequence operations (especially more involved algorithms) is fairly limited.
In other words, I don’t think there is a clear protocol in this area to offer any strong guidance.
Proof being the absence of a `seqtools` (`alglib` / `seqlib`).

What I am trying to say is that I think it is unfair to say that none of the fruits are yellow when pretty much all the fruits are apples.

Thus, I am weighing what I have:

  1. Performance
  2. Flexibility
  3. Ease of use
  4. Completeness
  5. Simplicity of implementation

Performance gains are non-trivial, so they do carry a fair amount of weight.
Especially when the cost is simply needing to wrap the argument in a list, in the case of an arbitrary iterable or to prevent mutation.

This is regarding allowing any iterable.

Allowing any mutable sequence could be a good idea, but this would multiply the complexity, and I don’t think this case is a good place to start introducing generic sequence algorithms in C.

From the POV of list.sort it is an in-place operation.
And the algorithm resides in the list object, so I think this counter-POV is at least as valid.
There is no solution that satisfies all POVs here.

It is a method of list which accepts a list argument.
The method itself mutates the main list and also mutates the keylist argument.

I don’t think restricting to list is “hyper-specialization”.
Same way that I don’t actually think accepting any iterable is “hyper-generalisation”.
I am simply weighing pros and cons.

I’m not thinking about cost there. What I’m thinking is rather “Why the heck should I, the user, make my iterable a list? If your implementation needs a list, then make one yourself!”

I guess we just have different objectives. I’m coming from the perspective like that Stack Overflow question, where they just wanted to sort one thing by a corresponding other thing. Nothing about wanting to get that other thing sorted as well. Or like I think @tim.one put it earlier, the way I’ve always been thinking about it is simply a small variation of the key parameter, when instead of computing keys on the fly, you already have them somewhere.

Then maybe this would also be good:

from functools import partial

def dict_sort4(d, by_keys=True):
    items = sorted(d.items(), key=partial(next, iter(d if by_keys else d.values())))
    d.clear()
    d.update(items)

That’s again a hack, but the speed might be good, and the non-hack version should be even better:

def dict_sort5(d, by_keys=True):
    items = sorted(d.items(), keys=d if by_keys else d.values())
    d.clear()
    d.update(items)

I’m not equipped to implement that (I’m only on a phone), but I imagine this would rival your dict_sort’s speed.


I completely get it.
It’s just that doing it obstructs both performance and the other case, where both lists need to be sorted.

sorted is a wrapper which implements convenience on top of list.sort.

As already mentioned, we could diverge - take a list for list.sort and an iterable for sorted - but that is not an option, as sorted and list.sort have become interchangeable by now, and I think the wrath from users would be much bigger than the one from forcing them to type list(iterable).

So what is the user experience?

sorted(a, keylist=b)       # TypeError: 'list_iterator' object is not a list
sorted(a, keylist=list(b)) # ok

Result: 6 characters more than ideal (for this specific problem), but there are more problems in play and this is a compromise.

What else can be done?
All taken into account, what would you do? And why?

This is without converting back to dict in the end (the dict case is a specific application of a more general 2-list problem anyway):

./python.exe -m timeit -s "$S" "dict_sort(d)"   # 23 |  2 ms
./python.exe -m timeit -s "$S" "dict_sort2(d)"  # 45 | 10 ms
./python.exe -m timeit -s "$S" "dict_sort3(d)"  # 41 |  8 ms
./python.exe -m timeit -s "$S" "dict_sort4(d)"  # 42 |  9 ms
./python.exe -m timeit -s "$S" "dict_sort5(d)"  # 37 |  8 ms

The question that needs answering, though, is whether the use cases need either performance or sorting of the key list.

There are many variations of sorting that might be useful. As @tim.one has pointed out in a few cases now, lack of real-world use cases for most of them is an important guide to what’s worth implementing.

The use cases we have, as @Stefan2 pointed out, are mostly “I want to sort this sequence based on the values in this other one”. Performance isn’t an explicit requirement - no-one is ever going to object to performance, but there’s no indication that people will accept a bad (for their use case) UI just to get a bit of extra performance. Similarly, there’s no explicit demand that the key list doesn’t get mutated, but the problem statement suggests that people aren’t expecting it to be mutated.

We need more data to judge the correct trade-offs here.

Sorting just lists, and mutating the key list, seem like two plausible ways to squeeze the best performance out by deliberately restricting the problem domain. But if we lose too many use cases by restricting the problem domain that much, do we still have sufficient justification for the feature?


Yeah, I get that, that’s what I meant with us having different objectives. You have that other case in mind, which I had never thought of.

Probably what I’ve been doing already. Sit on the idea and do nothing, expecting it to not have a chance anyway :slight_smile:

It’s a real (?) use case that others talked about, though. Two of the three cases you showed in your first post as background. And the “third” one you showed didn’t talk about getting the keys list sorted, either. Have you seen cases where they did? (Maybe you showed some and I forgot.)

But thanks for the measurements anyway, even though all of mine suffer from building lots of tuples so it’s not really comparable.

With final conversion to dict:

./python.exe -m timeit -s "$S" "dict_sort(d, 1)"    # 23 |  7 ms
./python.exe -m timeit -s "$S" "dict_sort2(d, 1)"   # 46 | 14 ms
./python.exe -m timeit -s "$S" "dict_sort3(d, 1)"   # 42 | 12 ms
./python.exe -m timeit -s "$S" "dict_sort4(d, 1)"   # 42 |  9 ms
./python.exe -m timeit -s "$S" "dict_sort5(d, 1)"   # 37 |  7 ms

I have seen these. Mostly it goes along the lines of:

def foo():
    keys = []
    values = []
    for _ in something:        # `something`, `a`, `b` stand in for real data
        keys.append(a)
        values.append(b)
    values.sort(keylist=keys)  # proposed keylist argument
    return keys, values

Currently, I mostly collect both a and b into one list of pairs, then use key=itemgetter(0). But I would transform it into the above for a 2x gain (actually even more, as compared to the dict case it skips both the extraction of the input lists from the dict and the final transformation back to a dict).

Well, the thing is that performance is quite a heavy motivation for this.
If performance does not improve much then I, personally, would not bother, as everything can be done with what we have now in one way or another.

And there are several ways of doing each thing (as can be seen in the benchmarks comparing numerous alternatives), and none of them is a straight path to performance.
I keep forgetting which one to use to get the best performance, and in the end none of them are what they should be.

So although this does remove the need for seq.__getitem__ to do a half-decent argsort:

indices = sorted(range(len(keys)), keylist=list(keys))

The biggest weight is still performance here.


Yup, this works well, and IMO performance is exactly the weak spot of the current situation when the problem involves applying a key function to a whole list.

With this I don’t agree.

On the contrary, this API covers the whole domain with maximum performance by delegating one small step to the user.

If you need to do argsort, currently:

indices = sorted(range(len(keys)), key=keys.__getitem__)

Oh but wait, keys is an arbitrary iterable, maybe an iterator or an immutable sequence, so really:

indices = sorted(range(len(keys)), key=list(keys).__getitem__)

And after this performance-oriented upgrade one can do:

indices = sorted(range(len(keys)), keylist=list(keys))

So are we losing a use case?

I don’t think we are losing any use cases.
Some of them are just not as convenient as they would be if this were a “purely API oriented” extension.

But a “purely API oriented” extension for this highly performance-oriented part would hardly make it. The most probable response would be - “just make a wrapper for your use case”.

So, in a way, the fact that this also upgrades the API is a bonus.


I am gladly willing to spend more time on this, making sure there aren’t better paths forward.

But all taken into account, to me, this looks quite good as it is.
I mean, doubling sorting performance. (ok ok, only when a keyfunc is involved and not in all cases, but still… :slight_smile: )
People spend years inventing a 10% faster sorting algorithm in C++…

I will sit more on this as well. :slight_smile:

I appreciate that this isn’t ideal from all angles.
Maybe there are ways to make this better that I can not see yet.

Some follow-ups on permute-in-place:

  1. @Stefan2, yes, the Python code can require extra memory for the new int objects it creates, under CPython. But a C implementation would work with native machine ints instead. I believe PyPy also would (it dynamically switches to an invisible-to-us subclass of list specialized to “small enough” ints, which is a big reason for why it can be much faster and consume less RAM).

  2. We could end the practicality-vs-purity API argument by starting list.sort along these lines. I have no personal interest in catering to things that can be bashed into a list, but it would take less time to write code to just bash them when needed than to endure endless debate about it :wink:.

class list:
    ...
    def sort(self, /, *, key=None, keylist=None, reverse=False):
        n = len(self)
        if keylist is not None:
            if type(keylist) is not list:  # not a plain list object
                keylist = list(itertools.islice(keylist, n))
                # And because the "heroic efforts" needed to protect
                # against external mutation have a very cheap
                # _implementation_, the rest of the code should not
                # be complicated to try to "exploit" that this new
                # keylist is visible only to us. Keep it simple.
                # Neither is it worth any effort to avoid the expense
                # of islice when possible - if they're not passing in
                # a plain list to begin with, they don't care about
                # speed.

            if len(keylist) != n:
                raise ValueError(...)
  3. For multiple applications of an index permutation, preprocessing of the permutation would pay: a function to deduce the cycle structure just once, and then another function that can be used repeatedly to apply that result any number of times to reorder any number of lists (see the sketch after the code below). The cycle structure can be represented easily as, e.g., a tuple of tuples of ints. To apply the cycles, we could even spin up a number of threads equal to the number of cycles and permute them all in place simultaneously :wink:.

  4. One more iteration of the reorder-in-place prototype code. No change in logic, but renaming things and using enumerate() to improve clarity:

reorder_in_place code
def reorder_in_place(xs, ixs):
    n = len(xs)
    assert n == len(ixs)
    assert sorted(ixs) == list(range(n))

    # Mark visited by adding n; original value is recovered with
    # modulo n.
    for dst_ix, src_ix in enumerate(ixs):
        if src_ix >= n:  # already in a cycle
            continue

        start_ix = dst_ix
        start_val = xs[start_ix]

        while True:
            src_ix = ixs[dst_ix]      # in range(n)
            assert src_ix < n
            ixs[dst_ix] = src_ix + n  # mark visited
            if src_ix == start_ix:    # closing the cycle
                xs[dst_ix] = start_val
                break
            else:
                xs[dst_ix] = xs[src_ix]
                dst_ix = src_ix

    # Restore ixs to its original permutation
    assert all(i >= n for i in ixs)
    for i in range(n):
        ixs[i] -= n
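
Back to point 3, a minimal sketch of that preprocessing split (the names permutation_cycles and apply_cycles are illustrative only):

def permutation_cycles(ixs):
    # Decompose the index permutation ixs into a tuple of cycles
    # (tuples of ints). Each cycle lists positions in the order they
    # feed one another: position c[k] receives the value currently at
    # position c[k+1], and the last position receives the value saved
    # from c[0]. Fixed points are dropped since they need no work.
    n = len(ixs)
    seen = bytearray(n)
    cycles = []
    for start in range(n):
        if seen[start]:
            continue
        cycle = []
        current = start
        while not seen[current]:
            seen[current] = 1
            cycle.append(current)
            current = ixs[current]
        if len(cycle) > 1:
            cycles.append(tuple(cycle))
    return tuple(cycles)

def apply_cycles(xs, cycles):
    # Reorder xs in place so that xs_new[i] == xs_old[ixs[i]],
    # using the precomputed cycle structure.
    for cycle in cycles:
        saved = xs[cycle[0]]
        for dst, src in zip(cycle, cycle[1:]):
            xs[dst] = xs[src]
        xs[cycle[-1]] = saved

The decomposition is paid once; apply_cycles can then be run on any number of same-length lists, touching each list element at most once per call.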