It would be nice if they could be reordered in place like lists, with the same interface.
The argument passed to the function passed in as the key= parameter to dict.sort() would be the key, since that’s how sorted(my_dict, key=some_func) behaves and in general that’s what you get when you treat a dict as an iterable. To sort by values you would do
It would be nice if they could be reordered in place like lists, with the same interface.
That seems overly elaborate. What’s wrong with this?
my_dict = dict(sorted(my_dict.items()))
The argument passed to the function passed in as the key= parameter
to dict.sort() would be the key, since that’s how sorted(my_dict, key=some_func) behaves and in general that’s what you get when you
treat a dict as an iterable. To sort by values you would do
I’m curious,. When do you want to sort by other than the keys? Genuinely
curious, not saying this is in any way wrong - I’ve just never wanted to
do it myself.
Also, what’s the benefit of sorting a dict in place?
This seems simple enough to me enough to not warrant a special method on
dict.
The same as sorting a list in place: saving on memory and having dict have a similar interface to list, since both are ordered it seems like an arbitrary limitation that one can be sorted in place and the other can’t.
There is a big difference between list and dict. List is implemented by array. During sort, the list is accessible from the key function or __cmp__ method although the order during sort is implementation detail.
On the other hand, dict is implemented by hash table and array. key function or __cmp__ may access the dict during sort. So dict.sort() needs to keep hash table consistent with array during sort. It is difficult and sort performance may be horrible.
dict(sorted(my_dict.items())) will be much faster and safe compared to inplace dict sort.
If something must be added, maybe it would make sense to do something like OrderedDict.sorted_by_key() instead that basically does the same as the sorted() one-liner.
I agree, but the key is that dict(sorted(my_dict.items())) is Python.
Since moving an element of the items array and update the hashtable every time is an overkill, we can do the same of dict(sorted(my_dict.items())), but in C. I suppose it will be much faster. Instead of recreate the hashtable every time, we sort the items array, recreate a new dict from it and we will do a memcopy of the new hashtable over the old one.
But all inner loops are implemented in C. We can remove only calling overhead.
If it justify implement new combined function, we need to add infinite combined functions. It doesn’t make sense. (e.g. str.lower_and_strip() instead of str.lower().strip()).
So we need more strong rationale for adding new method for builtin types.
Is there any known code where the performance difference would matter? Unless there’s a demonstrable and widespread need, there’s no reason to optimize this.
But the question is: does the performance of the method presented here matter?
dict(sorted(my_dict.items()))
The bottom line: there’s already a way to sort dicts that uses the same methods Python uses for sorting and for creating dicts. The tools compose well. There’s no evidence that the performance of doing so is a problem. So, there’s no action here.
I think yes, because items() is a view, not a list. sorted() implicitly transform it in a list.
On the other hand, a builtin implementation of sort() for dict can sort inplace the internal array of items DK_ENTRIES(mp->ma_keys). Then we can regenerate the hashtable with build_indices().
This will skip the creation of a new list and the creation of a new dict.
Nobody denies that we can make some performance improvement, even if
only a small one, by moving the operation into C instead of pure
Python.
The question is, when does it matter? Under what circumstances is this
proposal a bottleneck in your code?
If it takes 500 microseconds (say) to sort your dict in pure Python,
using the obvious method of composing the items(), sorted() and dict()
functions, and you can save 2 microseconds (say) by moving the code
into C, under what circumstances does that 2 microseconds matter?
Obviously I made up those numbers, and they will depend on the size of
the dict and the speed of your CPU. But the point still stands.
It isn’t that performance doesn’t matter at all. But we care more about
speeding up basic functionality that about moving arbitrary combinations
of operations into C just for the sake of making it a bit faster.
It seems that not very many people care about sorting dict items. And
the performance benefit may not be very much. It probably won’t be a
bottleneck in your code. If you really care about this issue, to the
point that you are willing to write a PEP, you should:
find evidence that many other people care about sorting dicts
in place;
show evidence that applications spend a significant amount of
time sorting dicts (the profile module will be your friend here);
and demonstrate that it is plausible that moving these operations
into C will speed that up by a non-trivial amount.
If you know C, possibly the best way to do the third one is to actually
implement it youself and demonstrate a speed up.
Remember that sorting is an O(N log N) operation, while copying items
and constructing a new dict is O(N), so for large dicts, the time will
be dominated by the sorting, not the copying.
Steven, it seems you’re not confident with the internals of dictobject.c. If you read carefully my post above, you can see that using the internals you’ll be able to not create an intermediate list and a new dict, and I bet this is a huge improvement in speed. See DK_ENTRIES(mp->ma_keys) in dictobject.c.
The fact this method will be effectively useful to a good portion of users, it’s another matter and I already expressed my idea about this.
The internals of the dict implementation do not matter. If there’s no need for an optimization, the implementation is a moot point. And no need has been demonstrated.
Why do you think so? I think we can sort dk_nentries inplace and then recreate the whole hashtable with build_indices(). Of course we need some sort of lock, but I suppose that also lists can’t be accessed during sorting.
Inada-san made this clear in the comment he cited to you:
On the other hand, dict is implemented by hash table *and* array.
`key` function or `__cmp__` may access the dict during sort. So
`dict.sort()` needs to keep
hash table consistent with array during sort. It is difficult and
sort performance may be horrible.
`dict(sorted(my_dict.items()))` will be much faster and safe
compared to inplace dict sort.
Playing devil’s advocate, how is this different from sorting a list? The
key function or comparison methods could also access the list. The
solution to that is some shenanigans that makes the list appear empty
during the sort: