Add BiMap - a bidirectional dictionary to collections module

AvitalOhayon · December 2, 2024, 11:35pm

Python lacks a built-in bidirectional dictionary implementation.
While traditional dictionaries provide O(1) lookup by key, many use cases require efficient lookup by both key and value.
Currently, developers either maintain two synchronized dictionaries.
The collections module, which already provides specialized container datatypes, is the natural place to add this functionality.

This implementation follows the same concept as boost::bimap in C++ and BiMap in Scala.
While there are third-party Python packages providing similar functionality (like bidict), having this as part of the standard collections module would provide a standard, efficient solution for this use case.

Proposed Solution:

from collections import BiMap


# Create a bidirectional mapping for users
users = BiMap({
    1: "john_doe",
    2: "alice_smith",
    3: "bob_wilson"
})

# Key lookup (O(1))
users[1]                    # "john_doe"

# Value lookup (O(1))
users.get_by_value("john_doe")  # 1

# Adding a new user - automatically handles both directions
users[4] = "emma_brown"

# Check if username exists
"john_smith" in users.values    # True

# Check if ID exists
1 in users                      # True

# Get all usernames or IDs
users.values()                  # View of all usernames
users.keys()                    # View of all IDs

davidism · December 2, 2024, 11:47pm

A package for this already exists: https://bidict.readthedocs.io

AvitalOhayon · December 3, 2024, 12:25am

Yes, but it is an external package.
I want a standard solution for this common need without relying on external dependencies.

Stefan2 · December 3, 2024, 12:37am

How “common” is that need? I don’t recall ever needing it or seeing someone ask for it. And isn’t boost third-party, too?

AvitalOhayon · December 3, 2024, 12:54am

For me, it has been very useful in several projects.
Yes, Boost is also third-party

oscarbenjamin · December 3, 2024, 12:59am

Did you use the existing third party package in those projects?

Most projects will use some third party packages. The fact that third party packages can easily be installed means that there should be a strong argument for why something should be included in the stdlib. I doubt that there is a strong enough argument for BiMap which many people will never have used and which is also not hard to implement on top of dict anyway. For many purposes this is enough:

reverse_dict = {v: k for k, v in some_dict.items()}

blhsing · December 3, 2024, 6:34am

I personally use bidirectional dicts quite frequently, but have always simply worked around it by creating reverse mappings as @oscarbenjamin already pointed out.

A readily available bidirectional dict is something I would definitely take advantage of if it were in the standard library but is something I would not bother to add as an external dependency. Including it in the stdlib would be an easy code clutter reducer with a minimal effort IMHO.

A quick search of "v: k for k, v" in GitHub yields 102k results, showing that this is indeed a common pattern.

And here’s an example of where it can be used in CPython:

github.com

python/cpython/blob/dffb90911a585a0921664c8b1c229d0883e65ee7/Lib/_compat_pickle.py#L166


      
              'ProcessError',
              'TimeoutError',
          )
          
          for excname in MULTIPROCESSING_EXCEPTIONS:
              NAME_MAPPING[("multiprocessing", excname)] = ("multiprocessing.context", excname)
          
          # Same, but for 3.x to 2.x
          REVERSE_IMPORT_MAPPING = dict((v, k) for (k, v) in IMPORT_MAPPING.items())
          assert len(REVERSE_IMPORT_MAPPING) == len(IMPORT_MAPPING)
          REVERSE_NAME_MAPPING = dict((v, k) for (k, v) in NAME_MAPPING.items())
          assert len(REVERSE_NAME_MAPPING) == len(NAME_MAPPING)
          
          # Non-mutual mappings.
          
          IMPORT_MAPPING.update({
              'cPickle': 'pickle',
              '_elementtree': 'xml.etree.ElementTree',
              'FileDialog': 'tkinter.filedialog',
              'SimpleDialog': 'tkinter.simpledialog',
              'DocXMLRPCServer': 'xmlrpc.server',

blhsing · December 3, 2024, 6:46am

Avital Ohayon:

from collections import BiMap


# Create a bidirectional mapping for users
users = BiMap({
    1: "john_doe",
    2: "alice_smith",
    3: "bob_wilson"
})

# Key lookup (O(1))
users[1]                    # "john_doe"

# Value lookup (O(1))
users.get_by_value("john_doe")  # 1

# Adding a new user - automatically handles both directions
users[4] = "emma_brown"

# Check if username exists
"john_smith" in users.values    # True

# Check if ID exists
1 in users                      # True

# Get all usernames or IDs
users.values()                  # View of all usernames
users.keys()                    # View of all IDs

If the keys are all strings and the dict is not going to be altered after definition, you can currently make it an Enum instead (your example code does alter the dict after definition though so I know this won’t fully work for you):

from enum import IntEnum

class Users(IntEnum):
    john_doe = 1
    alice_smith = 2
    bob_wilson = 3

Users(1).name # 'john_doe'
Users['john_doe'] # 1
'john_doe' in Users.__members__ # True
1 in Users # True

Stefan2 · December 3, 2024, 9:41am

Right, I’ve done/seen that. But I think never with having to update them in sync afterwards, i.e., what the proposal would offer.

Is it? If I don’t misread it, the original and the reverse ones are not kept in sync afterwards, see the different updates in the “Non-mutual mappings” section.

AvitalOhayon · December 3, 2024, 11:30am

Right, it is easy to create a reversed_map using {v: k for k, v in original_map.items()},
but maintaining it is less convenient. You have to update both maps with every change.

I think {v: k for k, v in original_map.items()} is very useful because, in cases where a reversed map is needed temporarily, developers often prefer to use this one-liner instead of adding a third-party dependency.
If this functionality were included in the standard library, it would be more convenient.

It also feels intuitive to include it in the collections module, which already contains many specialized dictionary types

picnixz · December 3, 2024, 12:22pm

Actually, Scala’s BiMap is not a native type but a 3rd-party type (from Guava or Apache probably). AFAIK, other languages such as Rust, TypeScript or Perl do not support bidirectional mappings natively. So, I’m really not in favor of adding this pattern to the standard library.

Note that in order to have a bidirectional map, we need to restrict to hashable values (if you want O(1) lookup) and handle duplicated values as well. I suspect we’ll likely have just two internal mappings that do the trick and I don’t think it’s more efficient than having two independent dicts that you’ll update simultaneously (you can wrap that in a Mapping-interface with 2 internal dict attributes for instance).

It also feels intuitive to include it in the collections module, which already contains many specialized dictionary types

While they contain specialized dictionary types, they are here for efficiency and are mostly legacy content. So I doubt we’ll want to have more data structures like this.

If you want to add such type, I think a PEP is likely needed. Or we can have some recipe page for that (we have recipes for itertools, but having a recipe for collections could perhaps be a good start).

chepner · December 3, 2024, 12:46pm

If it belonged in the standard library at all, it would be included in the collections module. The argument for inclusion in the standard library at all has become harder to make since the Internet, and particularly PyPI, has become more widely available since the early days of Python.

Nineteendo · December 3, 2024, 1:17pm

Something like this? (adapted from chatGPT)

class BidirectionalDict(UserDict):
    def __init__(self, *args, **kwargs):
        self.inverse_data = {}
        super().__init__(*args, **kwargs)

    def __setitem__(self, key, value):
        if key in self.data:
            if value == (old_value := self.data[key]):
                return

            del self.inverse_data[old_value]

        if value in self.inverse_data:
            raise ValueError("Duplicate value")

        self.data[key] = value
        self.inverse_data[value] = key

    def __delitem__(self, key):
        value = self.data[key]
        del self.data[key]
        del self.inverse_data[value]

    @property
    def inverse(self):
        inverse_bidict = BidirectionalDict()
        inverse_bidict.data = self.inverse_data
        inverse_bidict.inverse_data = self.data
        return inverse_bidict

Edit: preserve ordering of inverse dict

picnixz · December 3, 2024, 1:22pm

Essentially. Possibly with a covariant return for inverse but that’s the idea. We already include a lot of one-liners for itertools and deliberately don’t want to add them to itertools itself so we can do the same for collections.

blhsing · December 4, 2024, 3:38am

Ah you’re right. That isn’t a good example then.

A better example may be logging.addLevelName, which keeps a reverse mapping in sync:

github.com

python/cpython/blob/7f882c88cfda486947974cb82c20a1ae7047edfc/Lib/logging/init.py#L163


      
                  return result
              return "Level %s" % level
          
          def addLevelName(level, levelName):
              """
              Associate 'levelName' with 'level'.
          
              This is used when converting levels to text during message formatting.
              """
              with _lock:
                  _levelToName[level] = levelName
                  _nameToLevel[levelName] = level
          
          if hasattr(sys, "_getframe"):
              currentframe = lambda: sys._getframe(1)
          else: #pragma: no cover
              def currentframe():
                  """Return the frame object for the caller's stack frame."""
                  try:
                      raise Exception
                  except Exception as exc:

jab · December 11, 2024, 1:34am

I’m the author of bidict. Just saw this thread[1] and figured I’d chime in.

Here is the full list of “missing bidicts in the standard library” that I know about. To summarize:

logging._levelToName
dis.opmap / dis.opnames
html.entities.name2codepoint / html.entities.codepoint2name

I’d welcome anyone who’s started writing their own code that implements a bidirectional mapping to read my code, try it out, and let me know how well it works for you.

-Josh

[1] Please feel free to @-mention me in the future! Wouldn’t want to miss any threads here that mention bidict.

Nineteendo · December 11, 2024, 6:39am

How complete is the implementation from above?

rrolls · December 12, 2024, 2:18pm

I’d personally love to see a bidict available in builtins, never mind just the standard library - I’d consider it as important and as universally-relevant as dict, list and so on. I use the {v: k for k, v in d.items()} construct in numerous places. Usually my use cases are for read-only data but there have been places where I’ve defined a class with some basic functionality for a pair of dicts that can be mutated and kept in sync, so I’d think it’s worth having mutating functionality in there and perhaps leaving the concept of a frozenbidict to be filled in by a third-party package like is currently the case for frozendict. _{(Strictly speaking, I’d actually argue that frozendict and frozenbidict ought to be in the stdlib too, but having bidict in stdlib is far more important IMO.)}

Lucas_Malor · December 12, 2024, 8:59pm

A question: a bidirectional map needs two hash tables necessarily?

Wombat · December 15, 2024, 11:45pm

There isn’t any way around this if you want efficient lookup in either direction.

If there were a better solution, then printed foreign language dictionaries wouldn’t have both an English-to-French section and a French-to-English section in the same book.