Sorting ordered dicts

I think it would be good to add a way of sorting OrderedDicts.

I read this topic which sought to add the .sort() method to builtin dicts. I don’t think it fits in with the “spirit” of the builtin dict, whose ordering is only a sidenote. And there are apparently numerous implementation and performance issues.

However, these objections don’t apply to collections.OrderedDict, whose ordering aspect comes first, judging by its section in the docs. I think it makes sense to add a .sort() method that reorders the entries in place, much as move_to_end already does for a single key. If a sorting algorithm fits well with the doubly linked list that (I think) OrderedDict uses, it could even be faster than the existing way, OrderedDict(sorted(od.items())).
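For reference, here is a minimal sketch of the existing idiom mentioned above. The sample keys and values are made up for illustration. It shows that the result is a new object and that the original OrderedDict keeps its old order.

```python
from collections import OrderedDict

# Hypothetical sample data, in insertion order.
od = OrderedDict([("pear", 3), ("apple", 1), ("fig", 2)])

# The existing idiom: build a brand-new OrderedDict from the sorted items.
sorted_od = OrderedDict(sorted(od.items()))

print(list(sorted_od))   # ['apple', 'fig', 'pear']
print(sorted_od is od)   # False: a second object was created
print(list(od))          # ['pear', 'apple', 'fig'], the original is untouched
```

Any other reference to od still sees the unsorted order, which is the gap an in-place method would close.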

Dict keys can be of any hashable type and are not always sortable. How would one sort a dict like {0: 1, 'a': 2, int: 3}? If the keys are sortable, you can just create a new sorted dict with a comprehension:

sorted_d = {key:d[key] for key in sorted(d.keys())}

By that argument, list should not have a sort method either.

>>> x = [0, 'a', int]
>>> x.sort()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'str' and 'int'

That just means you would need an appropriate key function to say how to compare two otherwise non-comparable values, not that the idea of sorting doesn’t apply.

Hypothetically,

d = OrderedDict({0: 1, 'a': 2, int: 3})
d.sort(key=some_appropriate_function)

Yes, with a good key function we can do wonders.

x = [0, 'a', int, 9, 3, True]
x.sort(key=lambda thing: str(thing))
print(x)

Output:

[0, 3, 9, <class 'int'>, True, 'a']

So, if we were to define a means of sorting an OrderedDict, we’d just need to be able to supply a good key function that would work for every item. An issue would be what should happen if the key function couldn’t do the job.

From the Zen of Python:

Errors should never pass silently.

So it might be best to raise an exception in that situation.

If no key function is given, then we would either need a default action, or to raise an exception. What do you all feel would be best for that occasion?

I would expect the exact same behaviour that sorted() and list.sort() use.

But OrderedDict can be sorted with the comprehension as given above, and I’m really not sure of the use-cases for this, so I’m dubious that a new method is needed.

Can you come up with an algorithm that would benefit from the method? Because it seems to me that you actually need a different data structure, e.g., sorted containers.

As a recent example, I have a list of entities which the player has affinities with (it’s in a game context), and I have a mapping storing the affinity for each entity.
In my case I frequently have to show the entities sorted by value. I’m currently using sorted(dic, key=dic.get) to get a list of them, but then I lose easy access to the affinity values from the sorted list of keys.
When I show it in a menu where you can manually change the value for each key, it helps UX if the items don’t change order as a result of these changes and stay put until you close the menu. So if I were using sorted containers (which seem like a great thing I could use in other contexts), I would have to copy the keys into a fixed-order list each time; otherwise they would bump one another around when their values change.
In my use case it would then be easiest to sort the dict when the menu is opened, or when it is similarly accessed.

Also, this is pure speculation on my part, since I don’t know much about sorting algorithms and container performance in Python, but maybe an in-place sort could be a (considerable?) performance improvement for OrderedDict. The current approach, whether with the snippet I used or @bverheg’s, generates the items, makes a list out of them, sorts that list, and rebuilds the entire doubly linked list from scratch each time you want to sort.

Otherwise yes, the error handling would be exactly the same as in the list sorting API. The default sort would be by keys, because then it’s dead easy to sort by value using d.get as the key function, and only slightly more complex to sort by items using lambda k: (k, d[k]).
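To make the proposed semantics concrete, here is a rough sketch. SortableOrderedDict and its sort() method are hypothetical (OrderedDict has no sort() today), and the body simply leans on sorted() and move_to_end rather than touching the internal links, so it says nothing about how a real implementation would perform.

```python
from collections import OrderedDict


class SortableOrderedDict(OrderedDict):
    """Hypothetical subclass; OrderedDict itself has no sort() method."""

    def sort(self, *, key=None, reverse=False):
        # Sort a snapshot of the keys exactly as sorted() would, so that
        # unorderable keys raise TypeError before anything is moved.
        for k in sorted(self, key=key, reverse=reverse):
            self.move_to_end(k)


d = SortableOrderedDict(b=2, c=1, a=3)

d.sort()                          # default: by key
by_key = list(d)                  # ['a', 'b', 'c']

d.sort(key=d.get)                 # by value
by_value = list(d)                # ['c', 'b', 'a']

d.sort(key=lambda k: (k, d[k]))   # by (key, value) item
by_item = list(d)                 # ['a', 'b', 'c']
```

Because the key function receives a dict key, d.get is all you need to sort by value, which is the reason for defaulting to keys.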

If you want to optimize for computational complexity, then I think you could use different data structures.

Keep the mapping: M from entity to value.
Add a separate list L of entities displayed.

Modifying an entity’s value is O(1); opening the display list is O(n log n), to copy from M to L and sort.

Alternatively, make M a sorteddict and keep L for display.
Modifying an entity’s value is O(log n); opening the display list is O(n), to copy from M to L.
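A minimal sketch of the first option (plain mapping M plus a display list L), using only builtins. The entity names, the values, and the open_menu helper are invented for illustration.

```python
# M: entity -> affinity (hypothetical sample data)
affinities = {"elves": 12, "dwarves": 30, "orcs": -5}


def open_menu(mapping):
    """Snapshot the entities, highest affinity first: O(n log n)."""
    return sorted(mapping, key=mapping.get, reverse=True)


# L: the fixed display order, taken once when the menu opens.
display = open_menu(affinities)    # ['dwarves', 'elves', 'orcs']

# Editing a value while the menu is open is O(1) and does not move any row.
affinities["orcs"] = 99
rows = [(name, affinities[name]) for name in display]
print(rows)   # [('dwarves', 30), ('elves', 12), ('orcs', 99)]
```

The rows only reorder the next time open_menu is called.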

Could you tell me the name of an algorithm that sorts a doubly linked list faster than creating a temporary list?
Can the algorithm support a key function efficiently? Or should OrderedDict.sort() take a cmp function?

In that case I still end up with data redundancy, since adding an entity to the dict or removing one isn’t reflected in the list, and vice versa. That is an issue I’m currently having: it forces me to keep the list relatively short-lived and to manage different containers holding similar data at the same time.

Also, though this is again speculation on my part, the first example sorts the same unordered list each time (with elements sometimes getting added or removed). I think there are sorting algorithms that waste very little time when the data is already mostly sorted. With an in-place sort, much less time would be lost if I show the menu twice and only one or two values changed and need to be reordered within a mostly sorted dict.

Read again: I said if a sorting algorithm can fit well with it. I have no idea whether or not this is the case. My point is that OrderedDict and dict are not implemented the same way, so something inefficient on one may work better on the other.
Also, why not create a temporary list? Can’t you create one, and reassign the begin and end pointers to the secondary list when the sorting is done?

I just wanted to confirm what “in-place” and “sorting algorithm can fit well with the double-linked list” in your first post mean.
I thought you meant sorting the doubly linked list itself in place, but I was wrong.

By in-place I mean that the dict (or OrderedDict) object stays the same: it’s mutated, instead of another instance being created and put in the same variable, as you currently have to do if you want a sorted dict.
That does not mean you can’t create a temporary linked list inside the dict instance to do the sorting behind the scenes. I don’t really care whether we do that or not; it’s not part of the public API anyway, so whatever favors performance is fine.

Something like this?

for key in sorted(od):
    od.move_to_end(key)

That would be one good way to do it, yes. But a method accessing the links themselves would be more performant: for example, instead of setting the end pointer n times, it would do so once.
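As a small check that the move_to_end loop above really is in-place in the sense discussed here, this sketch keeps a second reference to the same object and looks at it after sorting. The sample data is made up.

```python
from collections import OrderedDict

od = OrderedDict(b=2, c=1, a=3)   # hypothetical sample data
alias = od                        # a second reference to the same object

# Sort a snapshot of the keys, then move each key to the end in turn.
for key in sorted(od):
    od.move_to_end(key)

print(list(alias))   # ['a', 'b', 'c']
print(alias is od)   # True: the object was mutated, not replaced
```

The cost is one sort of the key snapshot plus n calls to move_to_end.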

I think this is the best approach. OrderedDict.sort() wouldn’t be faster than it. And OrderedDict is memory inefficient.

You don’t lose easy access to the values. You can get it by dic[key]. It’s easy enough.

Accessing the sorted values is not that easy: you have to iterate through the sorted list of keys and index the dict for each one. Compared to being able to sort the dict, it’s about the same inefficiency as doing

for i in range(len(l)):
    l[i]

instead of just iterating the list.
Not to mention my previous point about sorting methods that are faster on already sorted data, which may exist and be useful when sorting the dict in-place but can’t when sorting a copy of the keys.
That and the link being lost if you add/remove a key.

This overhead would be much smaller than that of OrderedDict.sort().
If you need to iterate over the values very frequently, you can use sorted(dic.items(), key=itemgetter(1)) instead.

No, that’s the opposite: yours is a list of the keys ordered by values, whereas I was talking about iterating the values ordered by keys.
How would repeatedly sorting something from scratch, instead of tweaking mostly-sorted data, and manipulating two containers instead of one, add less overhead than in-place sorting? That doesn’t make sense.

You’d be amazed how efficient timsort is at tweaking mostly-sorted data. Given that a dictionary remembers insertion order, you could fairly easily re-sort and return to dict form, and then it’s mostly sorted for next time.
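One rough way to see this is to count the comparisons the builtin sort makes on mostly-sorted versus shuffled input. This is only a sketch: the Counted wrapper and count_comparisons helper are made up for the experiment, and comparison counts are a proxy, not a timing.

```python
import random


class Counted:
    """Wraps a value and counts every '<' comparison made while sorting."""

    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value


def count_comparisons(values):
    """Return how many comparisons sorted() needs for these values."""
    Counted.comparisons = 0
    sorted(Counted(v) for v in values)
    return Counted.comparisons


n = 1000

# Sorted data with just two values out of place.
mostly_sorted = list(range(n))
mostly_sorted[10], mostly_sorted[900] = mostly_sorted[900], mostly_sorted[10]

# The same values in a fixed pseudo-random order.
shuffled = list(range(n))
random.Random(0).shuffle(shuffled)

print("mostly sorted:", count_comparisons(mostly_sorted))
print("shuffled:     ", count_comparisons(shuffled))
```

The mostly-sorted input should need far fewer comparisons, which is why re-sorting a dict that was sorted last time tends to be cheap.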

Then, dict(sorted(dic.items(), key=itemgetter(1))).
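Spelled out with its import, that one-liner might look like this. The sample data is hypothetical.

```python
from operator import itemgetter

dic = {"elves": 12, "dwarves": 30, "orcs": -5}   # hypothetical data

# Rebuild the dict with its items ordered by value (ascending).
# itemgetter(1) picks the value out of each (key, value) pair.
dic = dict(sorted(dic.items(), key=itemgetter(1)))

print(list(dic.items()))   # [('orcs', -5), ('elves', 12), ('dwarves', 30)]
```

Note that this rebinds the name dic to a new dict, so other references to the old object keep the old order.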

What you call “in-place” is not a true in-place sort. It is an out-of-place sort followed by updating the linked list based on the sort result.
A linked list is slow and memory inefficient. Dict is faster and more efficient because it is array-based.

If you don’t believe me, try implementing OrderedDict.sort().
