'insert', 'swap', 'get_by_index' for `OrderedDict`

Given the fact that OrderedDict is not very flexible. E.g. I can not swap 2 items, or insert new item in a selected place, I almost never use it anymore as simple dict covers 99% of use cases of orderedmapping (at least for me).

And for more complex use cases I end up writing my own objects. E.g. I can not use OrderedDict for implementation of SortedDict, HeapDict, QueueDict, because adjusting order with move_to_end is much slower than with alternative approaches.

Implementing swap, insert, get_by_index for OrderedDictcould result in its recovery.

Can anyone else relate to this? Also, would implementation of these be difficult?

1 Like

Instead of having a get_by_index method, I would very much like if both dict_items ,dict_keys, dict_values and odict_items, odict_keys, odict_values were subscriptable.

That’s perfectly okay. A SortedDict isn’t really a subclass of OrderedDict; it’s completely reasonable for it to be its own thing.

Wouldn’t get_by_index and the subscriptable views be inefficient?

Not really, no. I rarely if ever need features beyond what the standard dict provides. Very occasionally, the fact that dict preserves insertion order is convenient, but that’s about as far as I go. If I needed a more capable ordered container, I’d normally look on PyPI - the sortedcontainers library is supposed to be very good, for example.

As to implementation, I don’t know how complex an implementation would be. But there would likely be a lot of debate over what additional methods should be added (your list isn’t self-evidently correct, at least not to me), and making the implementation high performance could be tricky (my understanding is that OrderedDict is designed to be efficient, and existing users of the class won’t want to lose that efficiency for the sake of methods they don’t use).

But that’s as much as I know. If you want to take this idea further, it’ll probably need a PEP at some point, and in that PEP you’ll likely need to answer questions like:

  • Why add methods to OrderedDict leaving the user to implement the actual data types they want (SortedDict, HeapDict, QueueDict, …)? Would it not be better to provide the data types themselves?And if the answer is that we don’t know what types the user might need, how do we know the proposed methods are sufficient?
  • Why aren’t existing container types from PyPI sufficient? Why would a user implement their own by wrapping an OrderedDict rather than using an existing implementation?

Obviously, you don’t have to answer these now, but at some point before anything gets implemented these questions will need to be addressed.

2 Likes

I don’t know. I did manage to make dict with efficient sub-scriptable views, but it combines several containers to achieve so. To be more precise - a separate list to store keys, then implemented dict methods that manage that list at the same time. After implementing several different-feature dictionaries I noticed that what is common between them all is that they can be achieved more easily, efficiently and elegantly if OrderedDict was more flexible.

I copied a lot of SortedContainers approach to make above data structures. I also wrote my own SortedList / SortedDict trying to improve upon SortedContainers. I did manage to make slightly more efficient objects, but nothing major. But that is beside the point. The point is that currently SortedDict/QueueDict/… need to hold another object for keys to provide efficient insertion and ordering methods and if there was a way to efficiently improve OrderedDict in ways I have mentioned it would lead to memory and possibly computational improvements of such objects.

Also, this post is just to share my opinion. I already have implemented these objects to a satisfiable standard and this is purely to see what others think in case this is something that could be useful to more people and someone knows how to implement it quickly and nicely.

Just to make sure: you are aware that collevtions.OrderedDict is currently implement in pure python, right? You can check out the source code in collections/__init__.py. From what I can tell, _collections, the C level companion module currently does not provide an implementation in C for this data type (it does for a few others in the module)

No I wasn’t. :confused:

Somehow I was always sure it is implemented in C. Probably due to the fact that implementation is so efficient (as far as I remember my benchmarks).

Thanks for this, in this case my proposal doesn’t apply… Unless there is some magic that I can’t see, I doubt that structures I have mentioned can be improved any further in pure python - I have spent a fair bit of time trying to find most optimal solutions and wasn’t able to improve much on say SortedContainers.

OrderedDict is really an implementation of a circular doubly linked list with key-based mapping access, so a get_by_index method makes zero sense in terms of efficiency.

But I do support the OP’s idea of insert and swap methods since there are perfect use cases for them in queue management where a mapped doubly linked list is the most efficient data structure.

The insert method by the way should really be two methods, insert_before(ref_key, new_key, value) and insert_after(ref_key, new_key, value).

It will also satisfy recurring calls (like this, this and that) for a built-in data type of a doubly linked list on this forum, especially if we add two more methods next(key) and prev(key). Also nice to have an append_left (or append_first) method instead of having to assign by key and then move_to_end(key, last=False).

All of the methods mentioned above can be trivially implemented since the hard work has already been done.

OrderedDict’s usage has dwindled ever since dict became ordered so I share the OP’s sentiment that OrderedDict should justify its existence with broadened capabilities. The benefit-to-cost ratio in this case is just too good not to do it IMHO.

1 Like

While the _collections module does not implement OrderedDict itself, it does add OrderedDict’s C implementation in odictobject.c to the module’s namespace so that collections/__init__.py can import it.

Since collections/__init__.py will always try to use the C implementation when available, in order to run the pure Python version of OrderedDict in CPython, we would have to delete it from _collections’s namespace first:

import sys
sys.modules.pop('collections', None) # in case collections is pre-imported
import _collections
del _collections.OrderedDict
from collections import OrderedDict, _Link

After that, we can then subclass the Python version of OrderedDict to prototype some changes since the _OrderedDict__map attribute wouldn’t be otherwise available from the C implementation:

class InsertableOrderedDict(OrderedDict):
    def _insert_between(self, prev_link, next_link, key, value):
        self._OrderedDict__map[key] = link = prev_link.next = next_link.prev = _Link()
        link.prev, link.next, link.key = prev_link, next_link, key
        dict.__setitem__(self, key, value)

    def insert_before(self, ref_key, key, value):
        next_link = self._OrderedDict__map[ref_key]
        self._insert_between(next_link.prev, next_link, key, value)

    def insert_after(self, ref_key, key, value):
        prev_link = self._OrderedDict__map[ref_key]
        self._insert_between(prev_link, prev_link.next, key, value)

    def swap(self, key1, key2):
        link1, link2 = map(self._OrderedDict__map.__getitem__, (key1, key2))
        prev1, next1, prev2, next2 = link1.prev, link1.next, link2.prev, link2.next
        (link1.prev, link1.next, prev1.next, next1.prev,
         link2.prev, link2.next, prev2.next, prev2.prev) = (
            prev2, next2, link2, link2, prev1, next1, link1, link1
        )

so that:

d = InsertableOrderedDict(a=1, b=2, c=3)
a = d.copy()
a.insert_before('b', 'x', 0)
print(dict(a)) # outputs {'a': 1, 'x': 0, 'b': 2, 'c': 3}
b = d.copy()
b.insert_after('b', 'x', 0)
print(dict(b)) # outputs {'a': 1, 'b': 2, 'x': 0, 'c': 3}
d.swap('a', 'c')
print(dict(d)) # outputs {'c': 3, 'b': 2, 'a': 1}

Demo here

1 Like

This is not true; it’s implemented in C in cpython/Objects/odictobject.c at main · python/cpython · GitHub. The pure Python version is used only as a fallback if the C version isn’t available. I’m not sure what you’d even need to do when building CPython for that to happen, though.

1 Like

Yep, @blhsing already pointed this out. I am still confused by the structuring inside of _collectionsmodule.c since I would have expected more than one reference to odict/ordered in that file.

It seems that it could move things to a very desirable direction. Insertion complexity is O(1) as expected.

┃  5 repeats, 10 times    ┃
┣━━━━━━━━━━━━━━━━━━━━━━━━━┫
┃ Units: µs             0 ┃
┃           ┏━━━━━━━━━━━━━┫
┃       100 ┃    64 ±   9 ┃
┃      1000 ┃   508 ±  20 ┃
┃     10000 ┃  5254 ± 163 ┃
┃    100000 ┃ 55846 ± 348 ┃
┗━━━━━━━━━━━┻━━━━━━━━━━━━━┛

Sorted containers deviated from implementation 100 lines long to 2K lines “beast” only to address the issue that insertion time into list with more than 1000 elements starts dominating the time it takes to bisect.

And although it ensures O(log(n)) for large objects, it is outperformed by 100 lines implementation for objects with less than 1000 elements.

With these features of OrderedDict, the complexity of efficient implementations of similar objects would decrease 10s of times. And would most likely result in better performance too.

Edit: still thinking whether it would suffice for this.

1 Like

I thought about it and my initial over-optimistic view proved to be wrong. I still can not see any easy pathway to improve upon the issues I referred to previously with improvements suggested by @blhsing. For what I am looking at ability to index and subsequent bisection is important and OrderedDict is not suitable for that. (I am uncertain if what I am looking at is currently realistic at all.)

Having that said, it seems that improvements suggested by @blhsing seem to be low cost compared to potential benefits. So hopefully this thread is still beneficial.

Every data structure has its strengths and weaknesses, and it’s up to you to decide which task of your application needs optimization the most and choose a data structure most efficient at performing that task. There’s no data structure that’s good at everything.

If your application needs a mapping that can efficiently access items by indices, then SortedDict, whose keys are maintained in a bisect-managed list, would be your best choice, with which you can obtain a key-item pair by an index through its indexable items() view.

1 Like

This is what I am aiming to improve upon. Not necessarily in performance, but at least in implementing it more concisely/elegantly.