'insert', 'swap', 'get_by_index' for `OrderedDict`

dg-pb · May 1, 2024, 9:10pm

Given that OrderedDict is not very flexible (e.g. I cannot swap two items or insert a new item at a selected position), I almost never use it anymore, as a plain dict covers 99% of ordered-mapping use cases (at least for me).

For more complex use cases I end up writing my own objects. E.g. I cannot use OrderedDict to implement SortedDict, HeapDict or QueueDict, because adjusting the order with move_to_end is much slower than alternative approaches.
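For context, here is a minimal sketch of what "insert at a position" costs with today's public API (the helper name `insert_at` is hypothetical, not part of OrderedDict). The only reordering primitive is `move_to_end`, so every key after the insertion point has to be cycled to the end, which makes each insertion O(n).

```python
from collections import OrderedDict


def insert_at(od: OrderedDict, index: int, key, value) -> None:
    """Insert a new key at position `index` using only move_to_end (O(n)).

    Hypothetical helper for illustration; not part of the OrderedDict API.
    """
    if key in od:
        raise KeyError(f"{key!r} already present")
    # Snapshot the keys that must end up after the new key.
    tail = list(od)[index:]
    od[key] = value  # new keys are always appended at the end
    # Cycle the old tail behind the new key, preserving its relative order.
    for k in tail:
        od.move_to_end(k)


od = OrderedDict(a=1, b=2, c=3)
insert_at(od, 1, "x", 0)
print(list(od))  # ['a', 'x', 'b', 'c']
```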

Implementing swap, insert and get_by_index for OrderedDict could revive it.

Can anyone else relate to this? Also, would implementation of these be difficult?

ntessore · May 1, 2024, 9:56pm

Instead of having a get_by_index method, I would very much like it if dict_items, dict_keys, dict_values and odict_items, odict_keys, odict_values were all subscriptable.
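Until the views are subscriptable, the usual workaround is `itertools.islice`, which walks the mapping from the start, so the cost grows with the requested position instead of being O(1). A minimal sketch; the helper name `nth_item` is made up for illustration:

```python
from collections import OrderedDict
from itertools import islice


def nth_item(mapping, i):
    """Return the i-th (key, value) pair of an ordered mapping, O(i) time."""
    if i < 0:
        i += len(mapping)  # support negative indices like a sequence
    if not 0 <= i < len(mapping):
        raise IndexError("index out of range")
    return next(islice(mapping.items(), i, None))


d = {"a": 1, "b": 2, "c": 3}
od = OrderedDict(d)
print(nth_item(d, 1))    # ('b', 2)
print(nth_item(od, -1))  # ('c', 3)
```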

Rosuav · May 1, 2024, 10:02pm

That’s perfectly okay. A SortedDict isn’t really a subclass of OrderedDict; it’s completely reasonable for it to be its own thing.

Stefan2 · May 1, 2024, 10:37pm

Wouldn’t get_by_index and the subscriptable views be inefficient?

pf_moore · May 1, 2024, 10:39pm

Not really, no. I rarely if ever need features beyond what the standard dict provides. Very occasionally, the fact that dict preserves insertion order is convenient, but that’s about as far as I go. If I needed a more capable ordered container, I’d normally look on PyPI - the sortedcontainers library is supposed to be very good, for example.

As to implementation, I don’t know how complex an implementation would be. But there would likely be a lot of debate over what additional methods should be added (your list isn’t self-evidently correct, at least not to me), and making the implementation high performance could be tricky (my understanding is that OrderedDict is designed to be efficient, and existing users of the class won’t want to lose that efficiency for the sake of methods they don’t use).

But that’s as much as I know. If you want to take this idea further, it’ll probably need a PEP at some point, and in that PEP you’ll likely need to answer questions like:

Why add methods to OrderedDict, leaving the user to implement the actual data types they want (SortedDict, HeapDict, QueueDict, …)? Would it not be better to provide the data types themselves? And if the answer is that we don’t know what types the user might need, how do we know the proposed methods are sufficient?
Why aren’t existing container types from PyPI sufficient? Why would a user implement their own by wrapping an OrderedDict rather than using an existing implementation?

Obviously, you don’t have to answer these now, but at some point before anything gets implemented these questions will need to be addressed.

dg-pb · May 1, 2024, 11:15pm

I don’t know. I did manage to make a dict with efficient subscriptable views, but it combines several containers to do so. To be more precise: a separate list stores the keys, and I implemented the dict methods so that they manage that list at the same time. After implementing several dictionaries with different features, I noticed that what they all have in common is that they could be built more easily, efficiently and elegantly if OrderedDict were more flexible.
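A minimal sketch of the approach described above, assuming plain insertion order (no sorting): a dict for lookups plus a parallel list of keys for positional access. The class name `IndexedDict` and its `key_at`/`item_at` methods are hypothetical. Positional reads are O(1), but deletion is O(n) because the key must also be removed from the list.

```python
from collections.abc import MutableMapping


class IndexedDict(MutableMapping):
    """Ordered mapping with O(1) positional access (illustrative sketch)."""

    def __init__(self, *args, **kwargs):
        self._data = {}  # key -> value, for O(1) lookups
        self._keys = []  # keys in insertion order, for O(1) indexing
        self.update(*args, **kwargs)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        if key not in self._data:
            self._keys.append(key)
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]     # raises KeyError for unknown keys
        self._keys.remove(key)  # O(n) scan plus shift of later keys

    def __iter__(self):
        return iter(self._keys)

    def __len__(self):
        return len(self._data)

    def key_at(self, index):
        return self._keys[index]

    def item_at(self, index):
        key = self._keys[index]
        return key, self._data[key]


d = IndexedDict(a=1, b=2, c=3)
del d["a"]
print(d.item_at(0))  # ('b', 2)
```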

I copied a lot of the SortedContainers approach to make the above data structures. I also wrote my own SortedList / SortedDict trying to improve upon SortedContainers. I did manage to make slightly more efficient objects, but nothing major. But that is beside the point. The point is that SortedDict/QueueDict/… currently need to hold another object for keys to provide efficient insertion and ordering methods, and if OrderedDict could be efficiently improved in the ways I mentioned, it would lead to memory and possibly computational improvements for such objects.

Also, this post is just to share my opinion. I have already implemented these objects to a satisfactory standard; this is purely to see what others think, in case this could be useful to more people and someone knows how to implement it quickly and nicely.

MegaIng · May 1, 2024, 11:26pm

Just to make sure: you are aware that collections.OrderedDict is currently implemented in pure Python, right? You can check out the source code in collections/__init__.py. From what I can tell, _collections, the C-level companion module, currently does not provide a C implementation of this data type (it does for a few others in the module).

dg-pb · May 1, 2024, 11:35pm

No I wasn’t.

Somehow I was always sure it was implemented in C, probably because the implementation is so efficient (as far as I remember from my benchmarks).

Thanks for this; in that case my proposal doesn’t apply. Unless there is some magic I can’t see, I doubt the structures I mentioned can be improved any further in pure Python: I have spent a fair bit of time trying to find the most optimal solutions and wasn’t able to improve much on, say, SortedContainers.

blhsing · May 2, 2024, 1:42am

OrderedDict is really an implementation of a circular doubly linked list with key-based mapping access, so a get_by_index method makes zero sense in terms of efficiency.

But I do support the OP’s idea of insert and swap methods since there are perfect use cases for them in queue management where a mapped doubly linked list is the most efficient data structure.

The insert method by the way should really be two methods, insert_before(ref_key, new_key, value) and insert_after(ref_key, new_key, value).

It would also satisfy the recurring calls on this forum for a built-in doubly linked list data type, especially if we add two more methods, next(key) and prev(key). It would also be nice to have an append_left (or append_first) method instead of having to assign by key and then call move_to_end(key, last=False).
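With today's API, the "append at the front" case mentioned above takes two calls. A minimal sketch; `append_first` is a hypothetical helper name, not an existing method. Both steps are constant-time, so the request is about convenience rather than speed.

```python
from collections import OrderedDict


def append_first(od: OrderedDict, key, value) -> None:
    """Set od[key] = value and make key the first entry (hypothetical helper).

    If key already exists, its value is updated and it is moved to the front.
    """
    od[key] = value                  # a new key lands at the end...
    od.move_to_end(key, last=False)  # ...so move it to the front


od = OrderedDict(b=2, c=3)
append_first(od, "a", 1)
print(list(od))  # ['a', 'b', 'c']
```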

All of the methods mentioned above can be trivially implemented since the hard work has already been done.

OrderedDict’s usage has dwindled ever since dict became ordered so I share the OP’s sentiment that OrderedDict should justify its existence with broadened capabilities. The benefit-to-cost ratio in this case is just too good not to do it IMHO.

blhsing · May 2, 2024, 7:01am

While the _collections module does not implement OrderedDict itself, it does add OrderedDict’s C implementation in odictobject.c to the module’s namespace so that collections/__init__.py can import it.

Since collections/__init__.py will always try to use the C implementation when available, in order to run the pure Python version of OrderedDict in CPython, we would have to delete it from _collections’s namespace first:

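```python
import sys
sys.modules.pop('collections', None)  # in case collections is pre-imported
import _collections
del _collections.OrderedDict  # hide the C type so collections falls back to Python
from collections import OrderedDict, _Link  # _Link is the pure-Python list node
```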

After that, we can then subclass the Python version of OrderedDict to prototype some changes since the _OrderedDict__map attribute wouldn’t be otherwise available from the C implementation:

```python
class InsertableOrderedDict(OrderedDict):
    def _insert_between(self, prev_link, next_link, key, value):
        if key in self:  # a second link for an existing key would corrupt the order
            raise KeyError(f'{key!r} is already present')
        self._OrderedDict__map[key] = link = prev_link.next = next_link.prev = _Link()
        link.prev, link.next, link.key = prev_link, next_link, key
        dict.__setitem__(self, key, value)

    def insert_before(self, ref_key, key, value):
        next_link = self._OrderedDict__map[ref_key]
        self._insert_between(next_link.prev, next_link, key, value)

    def insert_after(self, ref_key, key, value):
        prev_link = self._OrderedDict__map[ref_key]
        self._insert_between(prev_link, prev_link.next, key, value)

    def swap(self, key1, key2):
        # Exchange the keys stored in the two links (and their map entries)
        # instead of rewiring prev/next pointers: still O(1), and it also
        # works when the two links are adjacent in the list.
        map_ = self._OrderedDict__map
        link1, link2 = map_[key1], map_[key2]
        link1.key, link2.key = key2, key1
        map_[key1], map_[key2] = link2, link1
```

so that:

```python
d = InsertableOrderedDict(a=1, b=2, c=3)
a = d.copy()
a.insert_before('b', 'x', 0)
print(dict(a)) # outputs {'a': 1, 'x': 0, 'b': 2, 'c': 3}
b = d.copy()
b.insert_after('b', 'x', 0)
print(dict(b)) # outputs {'a': 1, 'b': 2, 'x': 0, 'c': 3}
d.swap('a', 'c')
print(dict(d)) # outputs {'c': 3, 'b': 2, 'a': 1}
```


Jelle · May 2, 2024, 1:22pm

This is not true; it’s implemented in C, in Objects/odictobject.c in the python/cpython repository on GitHub. The pure Python version is used only as a fallback if the C version isn’t available. I’m not sure what you’d even need to do when building CPython for that to happen, though.

MegaIng · May 2, 2024, 1:28pm

Yep, @blhsing already pointed this out. I am still confused by the structuring inside of _collectionsmodule.c since I would have expected more than one reference to odict/ordered in that file.

dg-pb · May 2, 2024, 3:04pm

It seems this could move things in a very desirable direction. Insertion complexity is O(1), as expected.

┃  5 repeats, 10 times    ┃
┣━━━━━━━━━━━━━━━━━━━━━━━━━┫
┃ Units: µs             0 ┃
┃           ┏━━━━━━━━━━━━━┫
┃       100 ┃    64 ±   9 ┃
┃      1000 ┃   508 ±  20 ┃
┃     10000 ┃  5254 ± 163 ┃
┃    100000 ┃ 55846 ± 348 ┃
┗━━━━━━━━━━━┻━━━━━━━━━━━━━┛

SortedContainers grew from a ~100-line implementation into a ~2K-line “beast” only to address the fact that, for lists with more than about 1000 elements, the time spent inserting into the list starts to dominate the time spent bisecting.

And although it guarantees O(log n) for large objects, it is outperformed by the 100-line implementation for objects with fewer than 1000 elements.

With these features in OrderedDict, efficient implementations of similar objects could be an order of magnitude simpler, and would most likely perform better too.

Edit: still thinking whether it would suffice for this.

dg-pb · May 3, 2024, 1:22pm

I thought about it, and my initial over-optimistic view proved to be wrong. I still cannot see any easy way to address the issues I referred to previously with the improvements suggested by @blhsing. For what I am looking at, the ability to index (and hence to bisect) is important, and OrderedDict is not suitable for that. (I am uncertain whether what I am looking at is currently realistic at all.)

That said, the improvements suggested by @blhsing seem low cost compared to their potential benefits, so hopefully this thread is still useful.

blhsing · May 6, 2024, 5:25am

Every data structure has its strengths and weaknesses, and it’s up to you to decide which task of your application needs optimization the most and choose a data structure most efficient at performing that task. There’s no data structure that’s good at everything.

If your application needs a mapping that can efficiently access items by indices, then SortedDict, whose keys are maintained in a bisect-managed list, would be your best choice, with which you can obtain a key-item pair by an index through its indexable items() view.
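A minimal sketch of the structure described above, assuming hashable, mutually comparable keys: a dict for lookups plus a key list kept sorted with the standard `bisect` module. This is not how sortedcontainers implements SortedDict (it splits keys into bounded sublists to cap insertion cost); `SimpleSortedDict`, `item_at` and `index` are illustrative names only. Bisection is O(log n), but each insert or delete still shifts the list, O(n).

```python
import bisect


class SimpleSortedDict:
    """Dict whose keys are kept in a bisect-managed sorted list (sketch)."""

    def __init__(self):
        self._data = {}  # key -> value
        self._keys = []  # sorted keys; inserts and deletes shift elements

    def __setitem__(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)  # O(log n) search + O(n) shift
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def __delitem__(self, key):
        del self._data[key]  # raises KeyError for unknown keys
        del self._keys[bisect.bisect_left(self._keys, key)]

    def __len__(self):
        return len(self._data)

    def __iter__(self):
        return iter(self._keys)

    def item_at(self, index):
        """Return the (key, value) pair at sorted position `index`."""
        key = self._keys[index]
        return key, self._data[key]

    def index(self, key):
        """Return the sorted position of an existing key via bisection."""
        if key not in self._data:
            raise KeyError(key)
        return bisect.bisect_left(self._keys, key)


sd = SimpleSortedDict()
for k in ["pear", "apple", "fig"]:
    sd[k] = len(k)
print(sd.item_at(0))     # ('apple', 5)
print(sd.index("pear"))  # 2
```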

dg-pb · May 6, 2024, 2:28pm

This is what I am aiming to improve upon. Not necessarily in performance, but at least in implementing it more concisely/elegantly.

keita · April 13, 2025, 2:47pm

I am considering adding new APIs to OrderedDict. In this discussion thread, ideas like insert_before, insert_after, and swap have been proposed. I have already attempted an implementation on CPython (and I am serious about pushing this forward):

github.com/oda/cpython

Pull request by oda: "No intention to merge: Add insert_before, insert_after, and swap to collections.OrderedDict." (ordereddict → main, opened 07:49PM, 12 Apr 25 UTC, +403 -1)

This PR updates OrderedDict’s C implementation, and adds insert_before, insert_after, and swap, which are discussed in the thread below. https://discuss.python.org/t/insert-swap-get-by-index-for-ordereddict/52375/5

```python
from collections import OrderedDict
d = OrderedDict({i:i for i in range(10)})
d = OrderedDict(a=1, b=2, c=3)
a = d.copy()
a.insert_before('b', 'x', 0)
print(dict(a))
b = d.copy()
b.insert_after('b', 'x', 0)
print(dict(b))
d.swap('a', 'c')
print(dict(d))
```

I would like to outline my thoughts here.

1. OrderedDict as a Hybrid Data Structure

OrderedDict can be seen as part of a broader category called hybrid data structures, which combine several data structures (in this case, a Doubly-Linked List and a Hash Map) to leverage the strengths of both:

Fast lookups (via Hash Map)
Ordered traversal (via Doubly-Linked List)

Comparisons with Other Languages:

C++: Boost’s multi_index container allows hybrid structures via templates.
→ Reference: Boost.MultiIndex
Java: LinkedHashMap combines a Hash Map and Doubly-Linked List, ideal for LRU caches.
Python: SortedDict (from sortedcontainers) combines a Hash Map (dict) with a sorted list of keys, which plays a role similar to a Tree Map.

Current Limitations:
While OrderedDict exists in Python’s standard library, its API is incomplete. Adding insert_before, insert_after, and swap would unlock the full potential of a hybrid data structure.

2. Why Hybrid Structures Matter

Use Cases:

SortedDict (O(log n) operations) is sufficient for ordered keys, but OrderedDict (O(1) for linked list operations) is better suited for scenarios like HTML attribute order preservation or LRU caches.
Technical interviews often restrict solutions to the standard library. For example:

“Design a data structure to track string counts and retrieve min/max counts efficiently.”

While SortedDict could solve this, OrderedDict with insert_before/insert_after would simplify the solution.
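To make the interview example concrete, here is a minimal sketch of one common O(1) design: a doubly linked list of count buckets plus a hash map from key to bucket. The operation names (`inc`, `dec`, `max_key`, `min_key`) and returning `""` when empty are assumptions borrowed from the usual statement of this exercise. The bucket list needs exactly the "insert a node next to an existing one" operation that insert_before/insert_after would provide, so today it has to be hand-rolled.

```python
class _Bucket:
    """Linked-list node: one count and the set of keys that have it."""
    __slots__ = ("count", "keys", "prev", "next")

    def __init__(self, count):
        self.count = count
        self.keys = set()
        self.prev = self.next = None


class AllOne:
    """Track string counts with O(1) inc/dec and min/max lookup (sketch)."""

    def __init__(self):
        self._head = _Bucket(0)  # sentinel before the smallest count
        self._tail = _Bucket(0)  # sentinel after the largest count
        self._head.next, self._tail.prev = self._tail, self._head
        self._where = {}         # key -> bucket currently holding it

    def _insert_after(self, node, count):
        new = _Bucket(count)
        new.prev, new.next = node, node.next
        node.next.prev = new
        node.next = new
        return new

    def _remove_if_empty(self, node):
        if not node.keys:
            node.prev.next, node.next.prev = node.next, node.prev

    def inc(self, key):
        cur = self._where.get(key, self._head)  # head stands for count 0
        nxt = cur.next
        if nxt is self._tail or nxt.count != cur.count + 1:
            nxt = self._insert_after(cur, cur.count + 1)
        nxt.keys.add(key)
        self._where[key] = nxt
        if cur is not self._head:
            cur.keys.discard(key)
            self._remove_if_empty(cur)

    def dec(self, key):
        cur = self._where[key]  # KeyError for unknown keys
        cur.keys.discard(key)
        if cur.count == 1:
            del self._where[key]  # count drops to 0: forget the key
        else:
            prv = cur.prev
            if prv is self._head or prv.count != cur.count - 1:
                prv = self._insert_after(prv, cur.count - 1)
            prv.keys.add(key)
            self._where[key] = prv
        self._remove_if_empty(cur)

    def max_key(self):
        return next(iter(self._tail.prev.keys), "")

    def min_key(self):
        return next(iter(self._head.next.keys), "")


counts = AllOne()
for word in ["a", "b", "b", "c", "c", "c"]:
    counts.inc(word)
print(counts.max_key(), counts.min_key())  # c a
for _ in range(3):
    counts.dec("c")
print(counts.max_key())  # b
```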

Original Motivation for OrderedDict:
The original rationale for OrderedDict focused on HTML attributes and LRU caches. Enhancing its hybrid capabilities would revive its relevance.
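For reference, the LRU-cache use case already works well with the existing API. A minimal sketch (the class and method names are illustrative) that relies only on `move_to_end` and `popitem(last=False)`, both constant-time in CPython's implementation:

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity least-recently-used cache built on OrderedDict."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()  # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used


cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes the most recently used entry
cache.put("c", 3)  # evicts "b"
print(list(cache._data))  # ['a', 'c']
```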

3. Proposed API Design

Current Implementation Attempt:

```python
# Insert 'x' with value 0 after key 'b'
b.insert_after('b', 'x', 0)
```

This syntax feels unintuitive due to parameter ordering.

Alternative Proposal:
Extend move_to_end() with an endpos parameter:

```python
# Insert 'key' after 'endpos'
od.move_to_end(key, last=True, endpos='target_key')
# Insert 'key' before 'endpos'
od.move_to_end(key, last=False, endpos='target_key')
```

This minimizes API changes while adding flexibility.
I am open to any ideas.

Additional Suggestion:
A next_key(key, reverse=False) method would fully expose the Doubly-Linked List’s capabilities.

4. Reviving OrderedDict

Current State:
Since regular dict became order-preserving, OrderedDict has lost much of its purpose beyond LRU caches.

Future Vision:
By embracing its role as a Doubly-Linked List + Hash Map hybrid, OrderedDict could become indispensable for:

Technical interview solutions (e.g., custom priority queues)
Systems requiring O(1) insertions/deletions at arbitrary positions

This would make Python’s standard library more competitive and widely applicable.

Conclusion:
Enhancing OrderedDict with these APIs aligns with Python’s philosophy of practicality and would fill a critical gap in its standard data structures.

dg-pb · April 14, 2025, 2:16am

What would this do exactly?

keita · April 14, 2025, 3:15pm

For example,

```python
d = OrderedDict({i:i for i in range(10)})
d.next_key(5) == 6
d.next_key(5, reverse=True) == 4
```

dg-pb · April 14, 2025, 3:43pm

```python
import collections
d = collections.OrderedDict(a=1, b=2, c=3, d=4)
# OrderedDict([('a', 1), ('b', 2), ('c', 3), ('d', 4)])
```

Consider the following operations, which together take advantage of most of the possibilities:

Get/pop 3rd item efficiently
Insert item into 3rd place
Insert item after key ‘b’
Swap item 2 with item 3
Swap key ‘b’ with key ‘c’
Swap key ‘b’ with item in 3rd place
Move key ‘a’ to 3rd place
Move key ‘a’ before ‘c’
Move key ‘a’ after ‘c’
Move item 1 after ‘c’

Anything missing?

How would each of these look with your proposal?