Allow `sorted` to take list for `key` argument

Even if this was a viable path for list.sort, I think making it separate from main sort is much better direction.

Reason being that the most performant path is a derivative operation as opposed to actually doing it in parallel. As doing it in parallel subjects all of more to O(nlogn) swaps.

For completeness:

def sort_many(key, *more, sort_key=True):
1 Like

No, this wouldn’t be a breaking change.
But it is not why this is a hard sell.
It is about purity.
And in this respect, there have been contracts here for very long time.

Are there any examples of this in Python, where in-place operation/method of some base type conditionally returns something?

To get back to the main topic. There are 2 viable options:

# keymode
def list.sort(self, /, *, key=None, reverse=None, keymode='keyfunc'):

# separate arg
def list.sort(self, /, *, key=None, keylist=None, keyiter=None, reverse=None):

Initial evaluation has identified both of above as valid candidates:

Aspect separate_arg_sort keymode_sort
Syntax for copy sort(keyiter=keys) sort(key=keys, keymode='iter')
Syntax for in-place sort(keylist=keys) sort(key=keys, keymode='inplace')
Implementation complexity :white_check_mark: Multiple param checks :white_check_mark: Mode enum check
Handles Iterables :white_check_mark: keyiter parameter :warning: With keymode='iter'
Handles inplace :white_check_mark: keylist parameter :warning: With keymode='inplace'
Default behavior :white_check_mark: Traditional sort :white_check_mark: Traditional sort
Learning Curve :white_check_mark: Parameter names explain intent :white_check_mark: Mode names are descriptive
Precedent in Python :warning: Uncommon pattern. :white_check_mark: Mode parameters are common
Robustness :white_check_mark: Unambiguous parameter types :white_check_mark: Unambiguous mode selection
API :warning: 2 (keyiter, keylist) :white_check_mark: 1 (keymode)
Extensibility :warning: More arguments needed :white_check_mark: Easy to add new modes
Type hints :white_check_mark: Clear per-parameter types :warning: key: Callable | Iterable | list

separate arg sort has advantage for user experience.

# keymode
indices = sorted(range(N), key=keys, keymode='iter')

# separate arg
indices = sorted(range(N), keyiter=keys)

Also, it doesn’t make use of existing key, leaving it with one type.
And other additional arguments are also of single type.

Its disadvantages are:

  1. It is a less common pattern.
  2. Further extensions in this dimension grows number of arguments linearly. Calling the function, however, would always involve only one. So the bloat is for argspec, but not using it.

(1) is real. Even if there is a similar case, I can not find it.
While keymode pattern is very common: open(mode), argparse.add_argument(action), numpy.number_of_functions_with_mode, …

(2) is less of an issue. list.sort has stayed unchanged for a very long time.
And if this materializes, there is little to no chance of extensions in the same dimension any time soon. I tried to think of what other kinds of keys could be added, but I could not find anything tangible. There are options, for example memory mapped file, but these aren’t suitable to be part of list.sort as exotic cases as such are naturally way out of its scope.


So, keymode is a safe option, which, if I had to make the safe choice right now, would be it.
However, although the pattern is uncommon, making a stretch for the sake of user experience is not alien phenomena in Python.

1 Like

And another possibility is to not add any new arguments. Instead the argument to key= could say it all.

  1. None (as now).
  2. Or a callable object (as now).
  3. Or 2-tuple, (object, mode_string).
  • Where the exact treatment of object depends on the value of mode_string.
  • And (callable, ‘call’) is equivalent to #2.

So, e.g.,

xs.sort(key=(ys, 'inplace'))

for your favorite case of 2-for-1 mutable list sorting.

I dare say I think there’s considerable “human factors” advantage in putting all the info about an argument in “a single syntactic unit”. No eyeball scanning of any of the rest of the argument list needed.

The API here may be unique in wanting to view a given argument in different ways depending on programmer intent, not discoverable from analyzing the argument object itself.

1 Like

Ok, this deserves a step back.
Removed a poll, etc from my previous post.


Ok, 3 candidates are different in these dimensions:

  1. Learning curve
  2. Terseness / simplicity on use
  3. argspec bloat
  4. typing complexity
  5. Precedent in Python
  6. Extensibility

What importance weighting would you assign on each of these from 0-5?

0 - not important at all
5 - utmost importance

Ok, so… These are the results of more granular evaluation:

# Importance: 0 - 5
weights = {
    '1. Learning curve': 5,
    '2. Terseness / Readability': 4,
    '3. argspec bloat': 3,
    '4. typing complexity': 2,
    '5. Precedent in Python': 5,
    '6. Extensibility': 1,
}

# Scores:
# 0 - hardly acceptable
# 3 - perfect
scores = {          #1  2  3  4  5  6
    'keymode':      [2, 2, 2, 2, 3, 3],
    'key_tuple':    [1, 1, 3, 1, 1, 3],
    'separate_arg': [3, 3, 1, 3, 1, 1]
}
def calculate(scores, weights)
def calculate(scores, weights):
    if type(weights) is dict:
        weights = list(weights.values())
    n_dim = len(weights)
    for sl in scores.values():
        assert len(sl) == n_dim
    divisor = 3 * sum(weights)
    return {
        name: round(sum(i * j for i, j in zip(score_list, weights)) / divisor, 3)
        for name, score_list in scores.items()
    }
print(calculate(scores, weights))
# key_tuple: 0.467
# keymode: 0.767
# separate_arg: 0.7

Tried to see how much stretch to importance weighting separate_arg would need to overtake:

weights_stretch_for_ux = {
    '1. Learning curve': 5,
    '2. Terseness / Readability': 5, # +1
    '3. argspec bloat': 3,
    '4. typing complexity': 2,
    '5. Precedent in Python': 4,     # -1
    '6. Extensibility': 1,
}
pprint(calculate(scores, weights_stretch_for_ux))
# key_tuple: 0.467
# keymode: 0.75
# separate_arg: 0.733

Even with 2 amendments, keymode remains the winner.
Setting extensibility importance to 0 finally makes separae_arg win.
So decent stretch is needed for this to change.


Winner as of now is keymode_sort.

Despite separate_arg being excellent in several dimensions of high importance, keymode is steady due to being well balanced in all areas.

Feel free to voice your concerns on both importance weightings and actual scoring.

1 Like

I disagree that separate, mutually exclusive arguments are an uncommon pattern in Python, certainly not to the point that separate_arg deserves a 1 in that regard.

Some examples in the stdlib include but are not limited to:

  • filename vs. stream vs. handlers for logging.basicConfig
  • input vs.stdin for subprocess.Popen
  • host and port vs. sock for asyncio.open_connection
  • object_hook vs. object_pairs_hook for json.load
  • capture_output vs. stdout and stderr for subprocess.Popen
  • mode=‘b’ vs. encoding, errors and newline for builtins.open
4 Likes

Oh, this is all highly debatable, but I don’t want a new time sink :wink:

For example, the real “learning curve” here is in communicating exactly what the new possibilities do, and why they can pay off. How they’re spelled is a trivial learning curve in comparison, for all of them. Nothing whatsoever is actually obvious at first glance to anyone except to a scheme’s proposer. I care much more about how easy it is to remember and use a scheme after it’s first learned. The effort to learn any of these spellings is akin to that needed to step over a line painted on a road. Approximately nothing.

For example, I actually don’t remember the precise names of the new optional arguments in separate_arg_sort I do remember ‘inplace’ and ‘iter’ as strings for the other two, although suspect ‘copy’ may be a better name than ‘iter’ (‘copy’ may more clearly contrast with ‘inplace’).

As for readability, they’re all roughly tied for me (nothing is obvious at first glance, but none can honestly be called baroque, or even mildly challenging).

As for terseness, separate_args clearly wins, but you ranked the other two in the wrong order:

xs.sort(key=ys, keymode='inplace')
xs.sort(key=(ys, 'inplace'))
xs.sort(keylist=ys)

Although nothing about the shortest one plainly says “in place”. Then again, nothing in the longer ones plainly says “must be a list”; OTOH those will “fail hard” if you don’t pass a list (raise an exception). If you pass a list to the shortest because the name contains “list”, and forget that it mutates, tough luck.

I didn’t work hard trying to think of other places where Python may decorate an argument with a qualifier, but it’s hardly novel. I guess my earlier int.to_bytes() example is a case of that, though.

Since we’re talking about sorting here, consider the Linux sort command line tool. -k is used to specify “field numbers” to sort on, and can be decorated with a large number of one-letter qualifiers to control how a field’s characters are to be treated (like ‘r’ to sort on the field in reverse order, ‘n’ to interpret the characters as numbers, ‘f’ to ignore case, on & on).

Nothing on the table here is even remotely novel in computer world at large. And thanks to @blhsing for pointing out some cases where “mutually exclusive” arguments are used already in Python.

I’ll add that ArgumentParser.add_mutually_exclusive_group() exists precisely to cater to this possibility.

There are places too where qualifiers modify the interpretation of a primary argument. Perhaps most obviously, re.compile()s primary argument is a string in regexp format, while the optional flags argument can radically change how the regexp behaves.

2 Likes

My favorite chatbot wants me to pass this on:

:rofl:

2 Likes

Many examples of high equivalence.
So what do you argue for? 2 or 3?
I mean, if builtin.open can be said to follow similar pattern, then ~2.5?

@tim.one ?

Well on this account, I think all 3 are largely similar.

Ok, so what is your suggested adjustment? 5 → 4? 5 → 3?

Readability and terseness are both packed into 1, so… I suggest:

key_tuple: 2
keymode: 2 (more verbose than `key_tuple`, but readability, I think, is better, as `(string, value)` is a bit unusual (doesn't read in either english or connection by argument names).
separate_arg 3

For now, I think it is best not to focus on actual argument/string names. Let’s assume that once we agree on best version for this, we will come up with good naming for either version.

Or we can adjust these as we go - happy to amend on every idea that is better than the current status.

Absolutely so.

But nothing would compel list.sort() to moving them all in lockstep at the same time; it just has to act “as if” it had done that, and, under the covers, use the same approach of computing the permutation’s cycle structure just once and then applying it one at a time.

Season to taste :wink: I was aiming to emulate Pike’s approach..

More important would be changing it to:

def sort_many(base, *more, sort_base=True, **kws):

to allow the first array to be sorted using any/all of sort’s keyword arguments (and the “key” argument has to be renamed then to avoid conflicting with sort’s key= argument).

The 1-liner argsort trick can’t work then. It wants key= for its own purposes.

I wormed around that, but it’s unsatisfying. And, believe it or not, this will get back to the real topic here :wink:

Best I could think of is a fancier argsort() building a temp list, list(zip(base, range(len(base)))). Then sort that with the full power of .sort(), and extract the index components of the decorated temp list.

One problem: if the user did specify key=, their function won’t at all be expecting a decorated tuple. So kws['key'] has to be replaced with a new function, which just passes on the first element of a pair to their original function.

All easy enough, but pretty horrid in memory bloat and bad for speed.

Which brings us back to the real topic here. Much better, if the PEP makes it in:

  • The fancy argsort() can apply the original key= function to the base list itself.
  • And then pass that list via key=(that_list, 'inplace'), using the original argsort trick to get a permuted index list directly. No decorated tuples needed.
  • Although note that there’s no semantic need for 'inplace` here - the code couldn’t care less how that list gets permuted. It’s done inplace to skip making a useless copy.

So that’s a semi-real use case: one that would simplify and speed an industrial-strength implementation of “many_sort()”, despite that nobody was aiming at that. It’s Good Sign™ when a proposed new gimmick finds a strong use during discussion.

1 Like

You got to the root of POV from performance perspective the way I see it.

The benchmarks are valid on their own.

But another way to look at it is: It exposes full performance capabilities of what sort algorithm can provide, eliminating friction in pretty much all the cases where one might think “but algorithm can do it, why can I not get that performance?”.

Combined with the fact that it also has potential to make code simpler by offering better patterns for things such as argsort (which is likely the 2nd most common sorting operation), it is the reason for this thread.

And this is another useful quality of inplace!

1 Like
# Importance: 0 - 5
weights = {
    '1. Learning curve': 3,     # 5 -> 3
    '2. Terseness / Readability': 4,
    '3. argspec bloat': 3,
    '4. typing complexity': 2,
    '5. Precedent in Python': 5,
    '6. Extensibility': 1,
}

# Scores:
# 0 - hardly acceptable
# 3 - perfect
scores = {          #1  2  3  4  5  6
    'key_tuple':    [1, 2, 3, 1, 1, 3],
    'keymode':      [2, 2, 2, 2, 3, 3],
    'separate_arg': [3, 3, 1, 3, 2, 1],  # precedent 1 -> 2
}

print(calculate(scores, weights))
  1. keymode: 0.778
  2. separate_arg: 0.759
  3. key_tuple: 0.556

However, both of the below lead to separate_arg overtaking:

  1. keymode readability: 2 → 1.5
  2. separate_arg precedent: 2 → 2.5

This is basically a statistical tie given the subjective nature of the scoring.

I can’t think of any.

None. Seriously, while assigning numbers to various things can help focus concentration and force consideration, I simply don’t believe it’s an “objective” way to weigh tradeoffs. When it comes to readability, the most important thing is deliberately being put off: the specific names picked.

For example, to this day I still mentally translate dict.setdefault() to dict.getorset(), because the latter says flat out what it does, while the former is A Mystery.. Names matter.

My mild preference for not adding any new arguments is that, in an ideal world, “the obvious” thing would be done: sort takes a key argument, which can be of different kinds, but fundamentally the same in purpose: the keys to use to sort a list. That’s primary to me.

And in that ideal world, there’s a very Pythonic way to distinguish them: by the class the key object is an instance of.. So, e.g.,

key=Callable(f)
key=CopyIterable(it)
key=InplaceList(L)

But such classes (trivial wrapper classes) don’t exist, and it would be nuts to add them to the built-ins too.

So I settle for something that gets the same effect with a plain old string tag instead. And, try as you may :wink: , I simply don’t believe that anyone finds the meaning of, say, (ys, ‘inplace’) inscrutable in this context. While nothing is obvious at first glance, the various spellings on the table are all trivial things to learn. There’s no real difference in this respect among any of them.

So my mild preference remains “one argument for one fundamental concept”.

How about one arg not tagged? If key is callable, use as now. If it’s a list, mutate it. Otherwise, iterate it. If you have a list and don’t want it mutated, call iter() on it.

See mixed_sort variation in Allow `sorted` to take list for `key` argument - #107 by dg-pb

In general, too unstable for builtin method.
What if I have Iterable, which is also callable?

Then you might say, “one could just wrap it in iter”, but then we go back to the initial version with only inplace list option, which is much simpler with the same solution: “just wrap it in list”.

Also suffers from accidental mutation.
Someone notices calls sorted(arg1, key=iterable) and just mindlessly puts in sorted(arg1, key=list) without realising that it is going to be mutated.

Not the worst thing in the world, but in the light of 2 current contenders, it is very unlikely to be a better choice.

If the doc tells the treatment/order of callable, list and iterable, and you don’t read it, it’s your fault.

Then we can go back to the very initial version:

def inplace_sort(self, /, *, key=None, keylist: list=None, reverse=None):

Which takes list that always gets mutated.

Which is simpler and more elegant.
And apply the same argument.

I disagree.