Allow `sorted` to take list for `key` argument

dg-pb · November 30, 2025, 8:17pm

The implementation has addressed this.
For a mutable keylist, the implementation is complete.

tim.one · November 30, 2025, 8:25pm

Although shuffle’s docstring reads: “Shuffle list x in place, and return None.” Heh - original use cases persist forever in docs.

If it had been coded in C, yes, I believe it would have been restricted to lists. It’s far faster to do C-level indexing directly on a list’s underlying C array of PyObject * than to use the abstract sequence C APIs for indexing, and shuffling does a whole lot of indexing.

dg-pb · November 30, 2025, 8:37pm

This is what I am reconsidering.

Leave builtin sorted alone and just do the mutable version for list.sort(..., keylist=None), with the clear motivation of performance.
It did originate mostly from the angle of optimization, so leaving sorted out would reflect the intent better.

And leave sorted for later, when one of two paths can be taken:

1. If people get used to keylist and grow to like it, then propagate it as is: `sorted(…, keylist=None)`.
2. If they don’t, then sorted has already diverged to some degree (and in exactly the same way), so it might just as well get a differently named argument for a generic iterable:

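def sorted(iterable, *, key=None, keys=None, reverse=False):
    values = list(iterable)
    if keys is not None:
        # keylist= is the proposed list.sort() argument; passing a fresh
        # list(keys) means the caller's keys are never mutated.
        values.sort(keylist=list(keys), reverse=reverse)
    else:
        values.sort(key=key, reverse=reverse)
    return values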

I quite like (2). While list.sort can be thought of as an in-place method, sorted is a builtin convenience in which, by contrast, none of the arguments are mutated. To make that distinction visible, the argument names for an explicit key list and for an iterable of keys are different.
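For comparison, the non-mutating `sorted(..., keys=...)` behaviour can be roughly emulated in today's Python with decorate-sort-undecorate. This is a minimal sketch: `sorted_by_keys` is a hypothetical helper name, and raising ValueError on a length mismatch is an assumption, not part of the proposal. Using `itemgetter(0)` as the key means only the keys are compared, so the values never need to be orderable and ties keep their original order.

```python
from operator import itemgetter
from typing import Any, Iterable


def sorted_by_keys(iterable: Iterable[Any], keys: Iterable[Any], *, reverse: bool = False) -> list:
    """Hypothetical stand-in for the proposed sorted(iterable, keys=...).

    Neither argument is mutated: both are consumed into fresh lists.
    """
    values = list(iterable)
    keylist = list(keys)  # shallow copy of the keys
    if len(keylist) != len(values):
        # Assumption: the proposal's exact behaviour on mismatch is not specified here.
        raise ValueError("keys and iterable must have the same length")
    # Decorate-sort-undecorate; itemgetter(0) compares only the keys.
    pairs = sorted(zip(keylist, values), key=itemgetter(0), reverse=reverse)
    return [value for _, value in pairs]


names = ["carol", "alice", "bob"]
ages = [35, 30, 25]
print(sorted_by_keys(names, ages))  # ['bob', 'alice', 'carol']
```

The extra memory for the pairs is exactly the kind of overhead the proposed `keylist=` fast path is meant to avoid.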

So I am currently at:

def list.sort(self, *, key=None, keylist=None, reverse=False):
    ...

And leave sorted out of this (mainly performance-driven) proposal, while keeping both options open for sorted.

tim.one · November 30, 2025, 9:16pm

Having the signatures of .sort() and sorted() diverge is a dubious idea on the face of it. There’s another kind of compromise: copying a list is bound to be much faster than mapping even an identity function (lambda x: x) across the list, so always make an internal copy of a keylist. It’s the function call that has potentially unbounded expense, and the Big Savings come from a way to say “I already know the sort keys - don’t burn cycles to compute them”.

Then, e.g., even merely iterable (not just sequence) arguments are fine everywhere.

You lose the ability to do “two for one” mutating sorts, but that wasn’t attractive to me to begin with. Just a consequence of the most obvious implementation.

If Python people want argsort()-like functionality (an easy way to permute multiple lists in the same way), better to make that a distinct issue, and leave it out of this one. I never saw much value in making just the “exactly one additional list” case easy while leaving “more than just one additional” untouched.
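The argsort()-like alternative can already be sketched in pure Python: compute the sorting permutation once from a key column, then apply it to any number of lists. `argsort` and `permute` are hypothetical helper names (the first borrowed from numpy), not an existing stdlib API; the example table is in list-of-column-lists form.

```python
from typing import Any, Sequence


def argsort(keys: Sequence[Any], *, reverse: bool = False) -> list[int]:
    """Indices that would sort `keys` (stable, like list.sort())."""
    return sorted(range(len(keys)), key=keys.__getitem__, reverse=reverse)


def permute(seq: Sequence[Any], order: Sequence[int]) -> list[Any]:
    """Return the items of `seq` rearranged according to `order`."""
    return [seq[i] for i in order]


# A small table in list-of-column-lists form, to be sorted by the "age" column.
columns = {
    "name": ["carol", "alice", "bob"],
    "age": [35, 30, 25],
    "city": ["Oslo", "Lima", "Pune"],
}
order = argsort(columns["age"])  # computed once from the key column: [2, 1, 0]
sorted_columns = {name: permute(col, order) for name, col in columns.items()}
print(sorted_columns["name"])  # ['bob', 'alice', 'carol']
```

Once the permutation exists, any number of additional lists cost one linear pass each, which is the "more than just one additional list" case.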

In short: list.sort() and sorted() then both work exactly the same way, neither ever mutates a keylist argument (because both work on a shallow copy), and everything is uniform and easy to explain.

Ya, I’m annoyed too by the extra memory burden of making a shallow copy of a keylist. But just annoyed. That’s better than making tradeoffs I’d be more appalled by.

pf_moore · November 30, 2025, 9:18pm

I like the idea of restricting this to list.sort. When thinking about the signature, it occurred to me that the point about keylist getting mutated becomes visible, because we’d presumably document the effect on keylist? I’m thinking something like:

keylist: A list of values to be used as keys for the sort, one per list item. As well as the target list, the keylist will also be sorted as a result of the sort operation.

Or is the intention not to document what happens to keylist, but just to say that it may be mutated (in some unspecified way)?

Personally, I’d be much more comfortable if the effect was documented. I can’t quite say why that feels so much better, but I think that at least in part it’s because well-defined behaviour doesn’t feel as difficult to deal with.

tim.one · November 30, 2025, 9:29pm

It’s more the case that the keylist is sorted, and the target is permuted in the same way. The target just comes along passively for the ride. Any plausible implementation would do exactly that, so, ya, that the keylist will be sorted would be documented.

In a version that does mutate the keylist. Else the docs would say that the keylist is not mutated at all (although a shallow copy of it would be under the covers).
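The mutating semantics described here (the keylist is what actually gets sorted, and the target list is permuted the same way) can be emulated in today's Python, at the cost of an extra list of indices that a C-level implementation presumably would not need. A minimal sketch: `sort_with_keylist` is a hypothetical name, and raising ValueError on a length mismatch is an assumption.

```python
from typing import Any


def sort_with_keylist(values: list[Any], keylist: list[Any], *, reverse: bool = False) -> None:
    """Pure-Python emulation of the proposed values.sort(keylist=keylist).

    keylist is sorted in place and values is permuted in place the same way.
    Returns None, like list.sort().
    """
    if len(values) != len(keylist):
        raise ValueError("keylist must have one key per list element")
    # Stable order of positions by key; only the keys are ever compared.
    order = sorted(range(len(keylist)), key=keylist.__getitem__, reverse=reverse)
    # Slice assignment mutates the caller's list objects instead of rebinding names.
    keylist[:] = [keylist[i] for i in order]
    values[:] = [values[i] for i in order]


values = ["c", "a", "b"]
keys = [3, 1, 2]
sort_with_keylist(values, keys)
print(values, keys)  # ['a', 'b', 'c'] [1, 2, 3]
```

After the call, both lists are visibly changed, which is exactly the effect that would need to be documented.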

dg-pb · November 30, 2025, 9:43pm

Of course, I am in strong agreement that a keylist which gets mutated needs to be very clearly documented.

So let’s see. GitHub usage counts:

/\bsorted\(/ Language:Python - 6M
/\.sort\(/ Language:Python - 2.7M

Thus, I think @Stefan2’s and @pf_moore’s concern is very valid.
sorted is both the more commonly used top-level builtin and an entry point for newcomers.
Thus, making it do the (relatively) complex thing of mutating one of its keyword arguments in place is risky!

At the same time, copying a list can still be a relatively expensive operation in the context of a highly optimized sorting algorithm, especially in very favourable cases.

The following is starting to grow on me:

def sorted(iterable, /, *, keys=None, key=None, reverse=False):
def list.sort(self, /, *, keylist=None, key=None, reverse=False):

A new user comes along:

# I need to sort something
sorted(a)

# I need to get sort indices
sorted(range(len(a)), keys=a)

No issues, no pitfalls, no side-effects.

Then the user gets a bit more crafty and finds out that list.sort can be more efficient if they already have a list, but needs to understand that it works in place:

# I need to sort a list
a.sort()

# I need to sort indices based on keylist
indices.sort(keys=a)    # TypeError('no keys argument')

# Need to read the docs:
# there is no `keys` argument, but there is `keylist`;
# the name is different, and it makes clear
# that it needs to be a list which is mutated
# in parallel with the main list
indices.sort(keylist=a)

Furthermore, the different names, keys and keylist, are a constant reminder of the difference,
in the same way that sorted(iterable) versus list.sort reminds you that one makes a copy and the other modifies in place.

But is it that bad?

I think the parallel of this is fairly close to the already existing divergence of sorted and list.sort.

# Nothing mutates-------------
#           |                 \
sorted(iterable1, /, *, keys=iterable2)

# Everything mutates----------------
#           |                       \
list.sort(self_list, /, *, keylist=keys_list)

Or how they actually are being used:

# Nothing mutates---------
#           |             \
sorted(iterable1, keys=iterable2)

# Everything mutates-------
#     |                    \
value_list.sort(keylist=key_list)

dg-pb · November 30, 2025, 10:00pm

For in-place sorting, copying the list can double the runtime:

values = list(range(100_000))
%timeit values.sort()     # 370 µs
%timeit list(values)      # 340 µs
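A plain-Python version of this measurement, using the standard `timeit` module instead of IPython's `%timeit` magic, might look like the following. It is only a sketch: absolute numbers depend on the machine and the build, so no particular results are claimed. Because the list is already sorted (and stays sorted), every `values.sort()` call exercises list.sort()'s best, linear-time case.

```python
import timeit

values = list(range(100_000))  # already sorted: the best case for list.sort()

# Run each statement `number` times and report the mean per-call time in microseconds.
number = 200
sort_us = timeit.timeit(values.sort, number=number) / number * 1e6
copy_us = timeit.timeit(lambda: list(values), number=number) / number * 1e6

print(f"values.sort(): {sort_us:.0f} µs")
print(f"list(values):  {copy_us:.0f} µs")
```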

tim.one · November 30, 2025, 10:02pm

dg-pb:

I think the parallel of this is fairly close to the already existing divergence of sorted and list.sort.

# Nothing mutates-------------
#           |                 \
sorted(iterable1, /, *, keys=iterable2)

# Everything mutates----------------
#           |                       \
list.sort(self_list, /, *, keylist=keys_list)

I’m not sure I follow, but to the extent that I may be following: allow keys= in both to mean “this is an iterable of sort keys, and if it’s a mutable sequence you must not mutate it, most efficient is to pass a list”.

If it’s thought necessary to “optimize” beyond that, the .sort() method alone could grow an additional keylist= argument to mean “this is a list of sort keys, will be sorted in-place, and the list object will be permuted in-place in the same way”.

dg-pb · November 30, 2025, 10:04pm

sorted has keys: Iterable - it is a high-level builtin with no “side-effects”
list.sort has keylist: list - it is an in-place sorting method of the list object, where both the list and keylist are modified in place

tim.one · November 30, 2025, 10:11pm

Of course it can, but I don’t care. list.sort() is already giving users enormous speedups in such highly favorable cases. While I wouldn’t do so on purpose, I really wouldn’t care if Python ran them, say, 5x slower. They’d still be so very much faster than O(n log n) that I’d remain highly grateful.

tim.one · November 30, 2025, 10:16pm

That’s what I guessed, so doesn’t change what I said about it. There’s no reason to presume that “no side effects on the keys” isn’t also of interest to .sort() method users, giving keys= arguments to both lets them switch back and forth with no pain or surprises.

The desire for optimization extremes probably is confined to .sort() users, though (if they used sorted(), they don’t care about creating a new “just as big” list). Fine, give them a different keyword to use just as restrictive as necessary to achieve extreme optimization.

dg-pb · November 30, 2025, 10:16pm

This isn’t the main penalty. The main penalty is for the case where both values and keys need to be sorted. If keylist is also sorted in-place:

values.sort(keylist=keys)

This will be at least 2x faster than any alternative, even for O(n log n) sorts.

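# Setup code (passed to timeit via $S); keylist= requires the proposed patch
import random as rand
from operator import itemgetter
N = 100_000
keys = rand.sample(range(N), N)
values = list(range(N))

def foo1():
    # Two-for-one: sort copies of values and keys together in place
    list(values).sort(keylist=list(keys))

def foo2():
    # argsort-style: sort indices by the keys, then reorder both lists
    indices = list(range(N))
    indices.sort(keylist=list(keys))
    reorderer = itemgetter(*indices)
    reorderer(keys)
    reorderer(values)

./python.exe -m timeit -s "$S" "foo1()"     # 21 ms
./python.exe -m timeit -s "$S" "foo2()"     # 43 ms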

dg-pb · November 30, 2025, 10:33pm

Ok, so there is a common factor here:

list.sort(self, /, *, key=None, keylist=None, reverse=False):
    ...

A keylist argument, which gets mutated as part of the in-place sort.

This wouldn’t affect sorted.
And a sorted(..., keys: Iterable=None) extension can be left for later, with or without an extra argument to list.sort.

tim.one · November 30, 2025, 11:44pm

If this has to go through a PEP process (as @picnixz suggested), I don’t expect it can get accepted if it tries to change .sort() alone without also changing sorted(). Just a fact of life: “consistency” plays a huge role for them. Divergence without truly compelling cause is probably a non-starter.

Truly compelling use cases might do the trick. I don’t see them. numpy’s argsort() is precedent but is aimed at permuting any number of lists in the same way. You’re focused on the “two-for-one” case and no longer seem to care about “more-than-one additional” cases. You did care, though, for as long as it took to realize that Python isn’t Pike.

As a plain user, I just won’t be happy until I can write, for a dict d:

keys_ordered_by_values = sorted(d, keys=d.values())

I don’t want to use .sort() for this. I don’t want to materialize a (useless to me) list of insertion-order keys first. I don’t want to use key=d.__getitem__ instead. I’d be willing to spell it list(d.values()) instead if I must, but don’t really want to. And I couldn’t care less about the sorted list of values.

And I suspect that specific use case is one many people have had in real life. Using key=d.__getitem__ is likely the fastest way now, but seems beyond what most Python programmers already know (we here aren’t typical).
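For comparison with the proposed `sorted(d, keys=d.values())`, these are two ways to get a dict's keys ordered by their values in current Python. Neither relies on anything from the proposal; both are stable, so keys with equal values keep their insertion order.

```python
from operator import itemgetter

d = {"apple": 3, "banana": 1, "cherry": 2}

# Spelling 1: key= calls d.__getitem__ once per dict key to fetch its value.
by_value_1 = sorted(d, key=d.__getitem__)

# Spelling 2: the common items()-based idiom; itemgetter(1) selects the value,
# then the keys are pulled back out of the sorted (key, value) pairs.
by_value_2 = [k for k, _ in sorted(d.items(), key=itemgetter(1))]

print(by_value_1)  # ['banana', 'cherry', 'apple']
print(by_value_2)  # ['banana', 'cherry', 'apple']
```

The first spelling avoids building tuples; the second is the one most tutorials show.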

You’re focused on peak speed “too much” for the SC’s tastes, seemingly driving every decision now. I accepted that at first, but mostly because the simplicity of implementation was compelling in its own way.

But my own most plausible use cases mostly don’t care whether the list of keys gets visibly sorted too, and the ones that do are better suited to an argsort() approach (sort any number of lists in lockstep, such as sorting a 2D array in list-of-column-lists format by a specific column’s values - or in list-of-row-lists format by a specific row’s values).

Making tasks easier for non-elite Python programmers would score points with the SC.

dg-pb · December 1, 2025, 12:01am

That, as I understood, was largely conditional on the fact that it modifies the argspec of builtins.sorted.

Does it need a PEP if this is limited to list.sort?

Many methods of builtin types were added without a PEP.

If no, then this can be framed as initially intended - “fastpath of list.sort for power users and more performant backend to future API extensions related to sorting”.

Personally, I have come full circle to exactly what was initially proposed.
And I think it is best to not try to creep into builtins.sorted at the same time.
This is largely performance related and solution for optimal performance is unambiguous.

As long as it does not obstruct future extensions, I think this might be a good idea.
And the keylist argument will not clash with anything: it is clear by now that an Iterable input is the preferred option for sorted, so that argument will not be named .*list.* (as it will not be a list).
(Or, if it does end up being a list that is modified in place, there is also no problem: it can simply use the same name, since the semantics will be the same.)

Once this is available, all of the following:

def sorted(values, keys=None):
def argsort(keys: Iterable):
def sort_mapping(d: Mapping):
def parallel_sort(keys, *values):
...

have faster implementations regardless of whether these are:

- CPython extensions
- Wrappers written by library developers
- Wrappers written by end users

tim.one · December 1, 2025, 2:29am

I suggest spending more quality time with a chatbot. Here’s what GPT-5 told me, which seems spot-on in all respects:

ChatGPT-5:

The boundary between “built-in functions” and “methods of built-in types” isn’t crisply spelled out in the PEP process, but in practice the Steering Council has treated any user-visible change to the Python language or standard library API as requiring a PEP if it’s non-trivial.

Why list.sort() is in the gray zone

list is a built-in type, part of the language’s core data model.

Its methods (.sort, .append, etc.) are not “built-in functions” per se, but they are semantically equivalent in terms of impact: they define the behavior of one of Python’s most fundamental objects.

Adding an optional argument to list.sort() changes the public API of a core type. That’s exactly the kind of thing the Steering Council wants to see justified in a PEP, because it affects every Python user and every implementation (CPython, PyPy, etc.).

Bottom line

If you want to add a new optional argument to list.sort(), you almost certainly need a PEP. The Steering Council tends to err on the side of requiring one for any change to built-in types, because:

It ensures cross-implementation consistency.

It gives the community a chance to weigh in.

It avoids “silent accretion” of features that fragment the language.

Skipping the PEP would likely mean your patch gets rejected outright, not because the idea is bad, but because the process wasn’t followed.

We then had quite a long conversation about which kinds of things get accepted. We quickly reached agreement that “purely for speed in niche cases” doesn’t have much chance. But it very much liked my idea of trying to “sneak speed in” as a kind of Trojan Horse, under the guise of making things like getting a list of a dict’s keys sorted by their associated values easier & clearer for plain old users, not just faster.

keys_ordered_by_values = sorted(d, keylist=d.values())

is pretty much the “most Pythonic” spelling imaginable, using perfectly general mechanisms in more-than-less obvious ways, and using mostly self-descriptive names. But it’s not at all focused solely on speed.

Indeed you have, and there’s real integrity in that.

Which is why it likely can’t be sold to the SC in the absence of major Python users testifying it would make a material speed difference in their production code. Else, to them, it’s just another seemingly random complication that bloats the API without real demand, and makes the API harder to digest for everyone else.

Note that Python has a long history of not adding “for speed” knobs. For example, to this day, we have no way to tell lists or dicts how large we expect they may become, or any influence over their internal “over-allocation” strategies.

Adding the optional __length_hint__() method was informal, and in (undocumented!) use years before PEP 424 “introduced” it.

dg-pb · December 1, 2025, 3:13am

Thanks @tim.one

My offering here is optimization.

I thought the same API could apply to sorted and solve some convenience issues, but it seems it can’t.

And optimization alone doesn’t seem to be sufficient for stdlib inclusion.

Personally, I don’t have API pain points, at least not in function signatures / argument types.
As long as it is flexible enough, it is easy to adapt to the situation.
But if something is unreachable (in this case, performance), nothing can be done.

So unless someone else wants to show some initiative on the “ease of use” front, I think this is over.

tim.one · December 1, 2025, 5:22am

Up to you, of course, but I’d be fine with settling for a different compromise, which is in some sense circling back to my second position:

The function and the method retain identical signatures.
A single new optional keylist= keyword argument is added.
- Which acts identically across function and method.
- If it’s a legit list object, it’s mutated in place (sorted) too.
- If it’s not a legit list object, no keylist mutations are visible.
  - Instead a shallow copy is used, acting like list(islice(keylist, n)).
  - Which is needed. The implementation only knows how to sort lists, and only keys are sorted directly even if just using key= (in which case a keylist has always been constructed as an invisible implementation detail - which we’re proposing to expose when the keys are supplied by the user in a list).
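The non-list branch of this compromise, `list(islice(keylist, n))`, can be illustrated directly: whatever iterable is supplied, at most `n` keys (one per list element) are materialized into a private list, so even an unbounded iterator such as `itertools.count()` works and the caller never sees a mutation. This is a sketch under stated assumptions: `keys_for` is a hypothetical helper, treating only an exact `list` (not subclasses) as a "legit list object" is my guess, and raising ValueError when too few keys are supplied is not specified in the thread.

```python
from itertools import count, islice
from typing import Any, Iterable


def keys_for(values: list[Any], keylist: Iterable[Any]) -> list[Any]:
    """Return the key list a sort of `values` would work on (hypothetical helper).

    A real list is returned as-is (the proposal would sort it in place);
    any other iterable is consumed into a shallow copy of len(values) keys.
    """
    if type(keylist) is list:  # assumption: exact type check for a "legit list"
        keys = keylist
    else:
        keys = list(islice(keylist, len(values)))
    if len(keys) != len(values):
        raise ValueError("need exactly one key per list element")  # assumption
    return keys


values = ["c", "a", "b"]
print(keys_for(values, count(10)))  # [10, 11, 12]: an infinite iterator is truncated
print(keys_for(values, (3, 1, 2)))  # [3, 1, 2]: a tuple is copied into a new list
```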

You then get full speed because you’ll stick to passing a legit list.

Everyone else gets maximal convenience. At some price:

- In speed. Usually minor (few people will notice the cost of a cheap copy in an O(n log n) process).
- In theoretical purity. Some may be surprised if a list-object keylist is mutated, and especially so in the otherwise purely functional sorted().

BFD. They can learn to pass a copy instead if they care. “Practicality beats purity” in Python, and there’s nothing arbitrary about the tradeoffs made here.

But, in the end, what really “sells” a PEP is use cases.

Nineteendo · December 1, 2025, 7:56am

How about sorted(iterable, /, *iterables, key=None, reverse=False)? (On my graphing calculator that’s spelled as SortA(L1, L2, ...)) Then you could write:

sorted_values, keys_sorted_by_value = sorted(d.values(), d.keys())

Extending this to list.sort() seems less intuitive to me.