Although shuffleâs docstring: âShuffle list x in place, and return None.â Heh - original use cases persist forever in docs
If it had been coded in C, yes, I believe it would have been restricted to lists. Itâs far faster to do C-kevel indexing directly on a listâs underlying C array of PyObject * than to use the abstract sequence C APIs for indexing, and shuffling does a whole lot of indexing.
Leave builtin sorted alone and just do mutable version for list.sort(..., keylist=None).
With clear motivation of performance.
It did originate mostly from the angle optimization so leaving sorted out, would reflect the intent better.
And leave sorted for later, where 2 paths can be taken:
If people get used to keylist and grow to like it, then propagate it as it is `sorted(âŠ, keylist=None)
If they donât, then sorted has already diverged to some degree (and in exactly the same way), so just as well can do a differently named arg for generic iterable:
def sorted(iterable, *, key=None, keys=None, reverse=False):
if keys is not None:
keylist = list(keys)
values = list(iterable)
values.sort(keylist=keys, reverse=reverse)
return values
I quite like (2). While list.sort can be thought of as in-place method, sorted is a builtin convenience, where in comparison none of the arguments are mutated. And to make a distinction, argument names for explicit key-list / key-iterable of keys are different.
Having the signatures of .sort() and sorted() diverge is a dubious idea on the face of it. Thereâs another kind of compromise: copying a list is bound to be much faster than mapping even an identify function (lambda x: x) across the list, so always make an internal copy of a keylist. Itâs the function call with potentially unbounded expense, and the Big Savings come from a way to say âI already know the sort keys - donât burn cycles to compute themâ.
Then, e.g., even merely iterable (not just sequence) arguments are fine everywhere.
You lose the ability to do âtwo for oneâ mutating sorts, but that wasnât attractive to me to begin with. Just a consequence of the most obvious implementation.
If Python people wantargsort()-like functionality (an easy way to permute multiple lists in the same way), better to make that a distinct issue, and leave it out of this one. I never saw much value in making just the âexactly one additional listâ case easy while leaving âmore than just one additionalâ untouched.
In short: list..sort() and sorted() then both work exactly the same way, neither ever mutates a keylist argument (because both work on a shallow copy), and everything is uniform and easy to explain.
Ya, Iâm annoyed too by the extra memory burden of making a shallow copy of a keylist. But just annoyed. Thatâs better than making tradeoffs Iâd be more appalled by .
I like the idea of restricting this to list.sort. When thinking about the signature, it occurred to me that the point about keylist getting mutated becomes visible, because weâd presumably document the effect on keylist? Iâm thinking something like:
keylist: A list of values to be used as keys for the sort, one per list item. As well as the target list, the keylist will also be sorted as a result of the sort operation.
Or is the intention not to document what happens to keylist, but just to say that it may be mutated (in some unspecified way)?
Personally, Iâd be much more comfortable if the effect was documented. I canât quite say why that feels so much better, but I think that at least in part itâs because well-defined behaviour doesnât feel as difficult to deal with.
Itâs more the case that the keylist is sorted, and the target is permuted in the same way. The target just comes along passively for the ride. Any plausible implementation would do exactly that, so, ya, that the keylist will be sorted would be documented.
In a version that does mutate the keylist. Else the docs would say that the keylist is not mutated at all (although a shallow copy of it would be under the covers).
Of course, I am in strong agreement that keylist, which gets mutated needs to be very clearly documented.
So letâs see. Github use cases:
/\bsorted\(/ Language:Python - 6M
/\.sort\(/ Language:Python - 2.7M
Thus, I think @Stefan2âs and @pf_mooreâs concern is very valid. sorted is both more commonly used top level builtin function and is an entry point to newcomers.
Thus, making it do (relatively) complex thing of mutating one of its keyword arguments in-place is risky!
At the same time, copying a list can still be relatively expensive procedure in the context of highly optimized sorting algorithm, especially in very favourable cases.
# I need to sort something
sorted(a)
# I need to get sort indices
sorted(range(len(a)), keys=a)
No issues, no pitfalls, no side-effects.
Then the user gets a bit more crafty and finds out that list.sort can be more efficient if already have a list, but needs to understand that it is in-place:
# I need to sort a list
a.sort()
# I need to sort indices based on keylist
indices.sort(keys=a) # TypeError('no keys argument')
# Need to read the docs
# there is no `keys` argument, but there is `keylist`
# the argument is different and can be clearly seen
# that it needs to be a list which is mutated
# in parallel with the main list
indices.sort(keylist=a)
Further on, different names: keys and keylist is a continuous reminder of the difference.
In the same way: sorted(iterable and list.sort is a reminder that one makes a copy and the other modifies in place.
But is it that bad?
I think the parallel of this is fairly close to already existent divergence of sorted and list.sort.
Iâm not sure I follow, but to the extent that I may be following: allow keys= in both to mean âthis is an tierable of sort keys, and if itâs a mutable sequence you must not mutate it, most efficient is to pass a listâ.
If itâs thought necessary to âoptimizeâ beyond that, the .sort() method alone could grow an additional keylist= argument to mean âthis is a list of sort keys, will be sorted in-place, and the list object will be permuted in-place in the same wayâ.
Of course it can, but I donât care . list.sort() is already giving users enormous speedups in such highly favorable cases. While I wouldnât do so on purpose, I really wouldnât care it Python ran them, say 5x slower. Theyâd still be so very much faster than O(n log n) that Iâd remain highly grateful .
Thatâs what I guessed, so doesnât change what I said about it. Thereâs no reason to presume that âno side effects on the keysâ isnât also of interest to .sort() method users, giving keys= arguments to both lets them switch back and forth with no pain or surprises.
The desire for optimization extremes probably is confined to .sort() users, though (if they used sorted(), they donât care about creating a new âjust as bigâ list). Fine, give them a different keyword to use just as restrictive as necessary to achieve extreme optimization.
keylist argument which gets mutated as part of in-place sorting.
This wouldnât affect sorted.
And sorted(..., keys: Iterable=None) extension can be left for later either with or without extra argument to list.sort.
If this has to go through a PEP process (as @picnixz suggested), I donât expect it can get accepted if it tries to change .sort() alone without also changing sorted(). Just a fact of life: âconsistencyâ plays a huge role for them. Divergence without truly compelling cause is probably a non-starter.
Truly compelling use cases might do the trick. I donât see them. numpyâs argsort() is precedent but is aimed at permuting any number of lists in the same way. Youâre focused on the âtwo-for-oneâ case and no longer same to care about âmore-than-one additionalâ cases. You did care, though, for as long as it took to realize that Python isnât Pike .
As a plain user, I just wonât be happy until I can write, for a dict d:
I donât want to use .sort() for this. I donât want to materialize a (useless to me) list of insertion-order keys first. I donât want to use key=d.__getitem__ instead. Iâd be willing to spell it list(d.values()) instead if I must, but donât really want to. And I couldnât care less about the sorted list of values.
And I suspect that specific use case is one many people have had in real life. Using key=d__getitem__ is likely the fastest way now, but seems beyond what most Python programmers already know (we here arenât typical).
Youâre focused on peak speed âtoo muchâ for the SCâs tastes, seemingly driving every decision now. I accepted that at first, but mostly because the simplicity of implementation was compelling in its own way.
But my own most plausible use cases mostly donât care whether the list of keys gets visibly sorted too, and the ones that do are better suited to an argsort() approach (sort any number of lists in lockstep, such as sorting a 2D array in list-of-column-lists format by a specific columnâs values - or in list-of-row-lists format by a specific rowâs values).
Making tasks easer for non-elite Python programmers would score points with the SC.
That, as I understood, was largely conditional on the fact that it modifies argspec of builtins.sorted.
Does it need a PEP if this is limited to list.sort?
Many methods of builtin types were added without a PEP.
If no, then this can be framed as initially intended - âfastpath of list.sort for power users and more performant backend to future API extensions related to sortingâ.
Personally, I have come full circle to exactly what was initially proposed.
And I think it is best to not try to creep into builtins.sorted at the same time.
This is largely performance related and solution for optimal performance is unambiguous.
As long as it does not obstruct future extensions, I think this might be a good idea.
And keylist argument will not have any clashes as it is clear by now that Iterable input is a preferred option for sorted and such argument will not be .*list.* (as it will not be list).
(Or if it will be in-place modifiable list, then also no problem - can just use the same as it will be the same)
I suggest spending more quality time with a chatbot . Hereâs what GPT-5 told me. which seems spot-on in all respects:
We then had quite a long conversation about which kinds of things get accepted. We quickly reached agreement that âpurely for speed in niche casesâ doesnât have much chance. But it very much liked my idea of trying to âsneak speed inâ as a kind of Trojan Horse, under the guise of making things like getting a list of a dictâs keys sorted by their associated values easier & clearer for plain old users, not just faster.
is pretty much the âmost Pythonicâ spelling imaginable, using perfectly general mechanisms in more-than-less obvious ways, and using mostly self-descriptive names. But itâs not at all focused solely on speed.
Indeed you have, and thereâs real integrity in that.
Which is why it likely canât be sold to the SC in the absence of major Python users testifying it would make a material speed difference in their production code. Else, to them, itâs just another seemingly random complication that bloats the API without real demand, and makes the API harder to digest for everyone else.
Note that Python has a long history of not adding âfor speedâ knobs. For example, to this day, we have no way to tell lists or dicts how large we expect they may become, or any influence over their internal âover-allocationâ strategies
Adding the optional __length_hint__() method was informal, and in (undocumented!) use years before PEP 424 âintroducedâ it.
I thought same API could apply to sorted and solve some convenience issues, but it seems it canât.
And optimization alone doesnât seem to be sufficient for stdlib inclusion.
Personally, I donât have API pain points, at least not in function signatures / argument types.
As long as it is flexible enough it is easy to adapt to situation.
While if something is unreachable (in this case performance) - nothing can be done.
So unless someone else wants to show some initiative on âease of useâ front, I think this is over.
Up to you, of course, but Iâd be fine with settling for a different compromise, which is in some sense circling back to my second position:
The function and the method retain identical signatures.
A single new optional keylist= keyword argument is added.
Which acts identically across function and method.
If itâs a legit list object, itâs mutated in place (sorted) too.
If itâs not a legit list object, no keylist mutations are visible.
Instead a shallow copy is used, acting like list(islice(keylist, n)).
Which is needed. The implementation only knows how to sort lists, and only keys are sorted directly even if just using key= (in which case a keylist has always been constructed as an invisible implementation detail - which weâre proposing to expose when the keys are supplied by the user in a list).
You then get full speed because youâll stick to passing a legit list.
Everyone else gets maximal convenience. At some price:
In speed. Usually minor (few people will notice the cost of a cheap copy in an O(n log n) process).
In theoretical purity. Some may be surprised if a list-object keylist is mutated, and especially so in the otherwise purely functional sorted().
BFD. They can learn to pass a copy instead if they care. âPracticality beats purityâ in Python, and thereâs nothing arbitrary about the tradeoffs made here.
But, in the end, what really âsellsâ a PEP is use cases.
How about sorted(iterable, /, *iterables, key=None, reverse=False)? (On my graphing calculator thatâs spelled as SortA(L1, L2, ...)) Then you could write: