Add a dict.sort() method

aoeu · November 14, 2020, 5:07am

Python dictionaries are now ordered but there’s no way to reorder a dictionary in place, you have to create a new one:

from operator import itemgetter
my_dict = {2: "a", 1: "b"}
my_dict = dict(sorted(my_dict.items(), key=itemgetter(0)))

It would be nice if they could be reordered in place like lists, with the same interface.

The argument passed to the function passed in as the key= parameter to dict.sort() would be the key, since that’s how sorted(my_dict, key=some_func) behaves and in general that’s what you get when you treat a dict as an iterable. To sort by values you would do

my_dict.sort(key=my_dict.get)

cameron · November 14, 2020, 7:24am

Python dictionaries are now ordered but there’s no way to reorder a
dictionary in place, you have to create a new one:
from operator import itemgetter
my_dict = {2: "a", 1: "b"}
my_dict = dict(sorted(my_dict.items(), key=itemgetter(0)))
It would be nice if they could be reordered in place like lists, with the same interface.

That seems overly elaborate. What’s wrong with this?

my_dict = dict(sorted(my_dict.items()))

The argument passed to the function passed in as the key= parameter
to dict.sort() would be the key, since that’s how sorted(my_dict, key=some_func) behaves and in general that’s what you get when you
treat a dict as an iterable. To sort by values you would do

I’m curious,. When do you want to sort by other than the keys? Genuinely
curious, not saying this is in any way wrong - I’ve just never wanted to
do it myself.

Also, what’s the benefit of sorting a dict in place?

This seems simple enough to me enough to not warrant a special method on
dict.

Cheers,
Cameron Simpson cs@cskk.id.au

aoeu · November 15, 2020, 4:41pm

You’re right, nothing.

“How do I sort a dictionary by value” is the 15th most upvoted question under the [python] tag on Stack Overflow.

The same as sorting a list in place: saving on memory and having dict have a similar interface to list, since both are ordered it seems like an arbitrary limitation that one can be sorted in place and the other can’t.

methane · November 16, 2020, 5:50am

There is a big difference between list and dict. List is implemented by array. During sort, the list is accessible from the key function or __cmp__ method although the order during sort is implementation detail.

On the other hand, dict is implemented by hash table and array. key function or __cmp__ may access the dict during sort. So dict.sort() needs to keep hash table consistent with array during sort. It is difficult and sort performance may be horrible.

dict(sorted(my_dict.items())) will be much faster and safe compared to inplace dict sort.

uranusjr · November 16, 2020, 6:40am

If something must be added, maybe it would make sense to do something like OrderedDict.sorted_by_key() instead that basically does the same as the sorted() one-liner.

Marco_Sulla · May 31, 2021, 8:46pm

I agree, but the key is that dict(sorted(my_dict.items())) is Python.

Since moving an element of the items array and update the hashtable every time is an overkill, we can do the same of dict(sorted(my_dict.items())), but in C. I suppose it will be much faster. Instead of recreate the hashtable every time, we sort the items array, recreate a new dict from it and we will do a memcopy of the new hashtable over the old one.

methane · June 1, 2021, 12:33am

But all inner loops are implemented in C. We can remove only calling overhead.

If it justify implement new combined function, we need to add infinite combined functions. It doesn’t make sense. (e.g. str.lower_and_strip() instead of str.lower().strip()).

So we need more strong rationale for adding new method for builtin types.

ericvsmith · June 1, 2021, 1:09pm

Is there any known code where the performance difference would matter? Unless there’s a demonstrable and widespread need, there’s no reason to optimize this.

Marco_Sulla · June 1, 2021, 6:36pm

Well, I found very unintuitive to sort a dictionary using its items. It’s not something that comes into mind immediately.

I had the need to sort a dict one time or two. Less than lists, but not so sporadically.

ericvsmith · June 1, 2021, 6:57pm

But the question is: does the performance of the method presented here matter?

dict(sorted(my_dict.items()))

The bottom line: there’s already a way to sort dicts that uses the same methods Python uses for sorting and for creating dicts. The tools compose well. There’s no evidence that the performance of doing so is a problem. So, there’s no action here.

Marco_Sulla · June 2, 2021, 8:18am

I think yes, because items() is a view, not a list. sorted() implicitly transform it in a list.

On the other hand, a builtin implementation of sort() for dict can sort inplace the internal array of items DK_ENTRIES(mp->ma_keys). Then we can regenerate the hashtable with build_indices().

This will skip the creation of a new list and the creation of a new dict.

steven.daprano · June 2, 2021, 9:16am

I think you are missing Eric’s point here.

Nobody denies that we can make some performance improvement, even if
only a small one, by moving the operation into C instead of pure
Python.

The question is, when does it matter? Under what circumstances is this
proposal a bottleneck in your code?

If it takes 500 microseconds (say) to sort your dict in pure Python,
using the obvious method of composing the items(), sorted() and dict()
functions, and you can save 2 microseconds (say) by moving the code
into C, under what circumstances does that 2 microseconds matter?

Obviously I made up those numbers, and they will depend on the size of
the dict and the speed of your CPU. But the point still stands.

It isn’t that performance doesn’t matter at all. But we care more about
speeding up basic functionality that about moving arbitrary combinations
of operations into C just for the sake of making it a bit faster.

It seems that not very many people care about sorting dict items. And
the performance benefit may not be very much. It probably won’t be a
bottleneck in your code. If you really care about this issue, to the
point that you are willing to write a PEP, you should:

find evidence that many other people care about sorting dicts
in place;
show evidence that applications spend a significant amount of
time sorting dicts (the profile module will be your friend here);
and demonstrate that it is plausible that moving these operations
into C will speed that up by a non-trivial amount.

If you know C, possibly the best way to do the third one is to actually
implement it youself and demonstrate a speed up.

Remember that sorting is an O(N log N) operation, while copying items
and constructing a new dict is O(N), so for large dicts, the time will
be dominated by the sorting, not the copying.

Marco_Sulla · June 2, 2021, 9:55am

Steven, it seems you’re not confident with the internals of dictobject.c. If you read carefully my post above, you can see that using the internals you’ll be able to not create an intermediate list and a new dict, and I bet this is a huge improvement in speed. See DK_ENTRIES(mp->ma_keys) in dictobject.c.

The fact this method will be effectively useful to a good portion of users, it’s another matter and I already expressed my idea about this.

ericvsmith · June 2, 2021, 2:47pm

The internals of the dict implementation do not matter. If there’s no need for an optimization, the implementation is a moot point. And no need has been demonstrated.

And with that, I’m done commenting on this issue.

Marco_Sulla · June 2, 2021, 6:27pm

For who is already interested in the topic:

Google search: sort dict python

A lot of results, and the first StackOverflow question has an answer with 5724 votes. So I think people is interested in sorting dicts.

methane · June 3, 2021, 1:11am

It is difficult. See my this comment.

pylang · June 3, 2021, 2:26am

Taking a step back, in what applications would someone need to sort a dict?

Marco_Sulla · June 3, 2021, 5:00pm

As I posted here, a lot of people.

Why do you think so? I think we can sort dk_nentries inplace and then recreate the whole hashtable with build_indices(). Of course we need some sort of lock, but I suppose that also lists can’t be accessed during sorting.

cameron · June 3, 2021, 11:55pm

Inada-san made this clear in the comment he cited to you:

On the other hand, dict is implemented by hash table *and* array.  
`key` function or `__cmp__` may access the dict during sort. So 
`dict.sort()` needs to keep
hash table consistent with array during sort. It is difficult and 
sort performance may be horrible.

`dict(sorted(my_dict.items()))` will be much faster and safe 
compared to inplace dict sort.

Cheers,
Cameron Simpson cs@cskk.id.au

steven.daprano · June 4, 2021, 12:33am

Playing devil’s advocate, how is this different from sorting a list? The
key function or comparison methods could also access the list. The
solution to that is some shenanigans that makes the list appear empty
during the sort:

>>> mylist = [3, 0, 2, 1]
>>> mylist.sort(key=lambda item: print(mylist) or item)
[]
[]
[]
[]
>>> mylist
[0, 1, 2, 3]

Also, a technical point: there is no __cmp__ dunder any more, that has
been gone since Python 3.0.1 over a decade ago.

https://bugs.python.org/issue1717