Add BiMap - a bidirectional dictionary to collections module

dg-pb · December 16, 2024, 2:22am

Good question.

It can be done with 1 hash table:

class BiMap(dict):
    _r = None

    def __init__(self, inv=False):
        self.inv = inv

    @property
    def r(self):
        if self._r is None:
            self._r = BiMap(not self.inv)
            self._r.update(self)
            self._r._r = self
        return self._r

    def __setitem__(self, key, value):
        full_key = (self.inv, key)
        full_value = (not self.inv, value)
        super().__setitem__(full_key, value)
        super().__setitem__(full_value, key)

    def __getitem__(self, key):
        full_key = (self.inv, key)
        return super().__getitem__(full_key)


if __name__ == '__main__':
    d = BiMap()
    d['a'] = 1
    print(d['a'])
    print(d.r[1])
    print(d.r.r['a'])
    print(d.r.r.r[1])

I am sure one can come with other ways, custom hashtables, etc…

But given standard practice is to use 2, I don’t think any of possibilities offer much beyond additional complexity. If you find anything interesting, please share.

Paddy3118 · December 16, 2024, 3:30pm

Ideally. If keys + values are distinct then you could store the reverse mapping in the same directory, but I tend not to:

vals = 'ABCDEFGHIJ'
keys = list(range(len(vals)))
# I do this
kvdict = {k: v for k, v in zip(keys, vals)}
vkdict = {v: k for k, v in zip(keys, vals)}
# I suppose one could use this:
bidict = {}
for k, v in zip(keys, vals):
    bidict[k] = v
    bidict[v] = k

jab · December 16, 2024, 4:04pm

See Introduction - bidict for some reasons you might not want to do it this way.

Lucas_Malor · December 16, 2024, 5:16pm

Thanks for the replies. I’m not thinking about a Python implementation. In Python, I think you need two dicts necessarily. But in CPython? Can one hashtable be used safely to store both keys and values?

jamestwebber · December 16, 2024, 5:30pm

You could implement the version that @dg-pb described in some other language, but I don’t think this gets you anything valuable.

dg-pb · December 16, 2024, 5:32pm

You can think of Python code as pseudocode. It largely applies to everything that can be done in CPython.

In CPython dict usage is pretty much the same.

There are some paths to go deeper, but that would be a rabbit hole and large amount of work with most likely little value. (Have a look at dict implementation to have a sense of work it would need to implement something custom without re-using existing dict).

Nineteendo · December 16, 2024, 6:19pm

I created an issue to add my recipe from above to the documentation: Add BidirectionalDict recipe to `collections` · Issue #128001 · python/cpython · GitHub

We’ll see how that goes. It’s simple, maintains insertion order and checks for duplicates.

xitop · December 17, 2024, 8:17am

Q: What is the rationale for not allowing duplicate values in bijective mappings? Usually a new item overwrites the previous one. (I understand that the key previously associated with the value must be removed)

picnixz · December 17, 2024, 10:01am

What is the rationale for not allowing duplicate values in bijective mappings

If you have an implementation based on two dicts, you don’t want have the key->val dict containing key1->VAL and key2->VAL while the other contains VAL->key1 or (exclusively) VAL->key2.

Now, OOP-wise, I think a bidict should expose a “read” view in terms of “value to key” and a “read-write” interface for “key to value”. In other words, you should not be able to insert a new entry using its value. Sorry for being unclear when I said “no duplicated values”.

zhangyx · December 17, 2024, 8:42pm

IMO BiMap is a completely symmetric data structure, I see no reason why the inverse mapping needs to be read-only.

picnixz · December 17, 2024, 8:57pm

One reason is: what would x.keys() be after a call x[k] = v? is v the key or is it k? For instance if we consider the loggingLevels example, using loggingLevels[name] = value and then loggingLevels.keys() should yield the level names, but not the values. If we were to allow values to be mapped to keys, then this would be via a separate method.

I wouldn’t mind this separate method but for a first iteration, I’d prefer sticking with something simple. Unfortunately, we rejected adding a recipe so I think users will need to use 3rd-party libs.

xitop · December 18, 2024, 10:23am

My expectation is:

fwd = bidict(a=1, b=2, c=3)
rev = fwd.reverse()
assert type(fwd) is type(rev) is bidict
assert rev.reverse() is fwd
rev[4] = 'd'
assert fwd.keys() == rev.values() == {'a', 'b', 'c', 'd'}
assert fwd.values() == rev.keys() == {1, 2, 3, 4}

And I’d like to add an example related to my question about duplicates.
What should be the outcome and why?

bi = bidict(a=1, b=2, c=3)
bi['a'] = 3

# option 1:
assert bi == {'a': 3, 'b':2}  # 'c':3 is auto-deleted

# option 2: Some sort of  <Duplicate_value_exception> raised

Nineteendo · December 18, 2024, 11:22am

See: Basic Usage - bidict

This could result in surprises or problems down the line.
Errors should never pass silently. Unless explicitly silenced.

If you want to overwrite the key, you can do this instead:

bi.reverse[3] = "a"

xitop · December 18, 2024, 1:00pm

Thanks for the link. Well, a duplicate value error is what most people here want and expect. The implication is that updating requires special attention. Imagine bi = bidict(a=1, b=2, ...) that should be modified to bidict(a=2, b=1, ...).

bi['a'] = 2 # error (clash with 'b':2)
bi.reverse[2] = 'a' # error (clash with 1:'a')

The linked article shows a forced update, etc. but the added complexity and again several options how to deal with it persuaded me that the proper place for a bi-dictionary implementation is a separate module in PyPi.

A frozen bidict - which does not have any problems with duplicities and updates - is still quite a helpful data structure. But we have that as stated immediately at the discussion’s beginning:

{v: k for k, v in mydict.items()}

Nineteendo · December 18, 2024, 1:07pm

@jab, would a recipe be useful to add in the documentation of bidict?

dg-pb · December 18, 2024, 2:14pm

This is how I solved this one for myself.

I have 2 objects:

OneToOne
ManyToMany

OneToOne is strictly Bijection - Wikipedia. Duplicate keys/values raise an error.

ManyToMany doesn’t have this restriction. It is a graph-like in a sense that one key can lead to many values. And its operations are edge-like.

Both above have forward and inverse mappings read & write.

Paddy3118 · December 19, 2024, 10:44am

I’d guess delete operations in the many to many case would need some thought?

In the wikipedia example of the many-to-many, bidirectional mapping between books and authors, you can add books with many authors and authors with many books. On deleting books, you could update Authors to eventually have no books. You could then delete the author, but you would also loose the information that they are an Author. I guess if you are allowed to add books with no authors and Authors without books then you need to be able to keep Authors without books on deletion of their last book, and vice-versa. You would presumably be able to add Authors to existing books and books to existing Authors, incrementally too?

I don’t need it now, so I’ve no code. But it is interesting…

dg-pb · December 19, 2024, 1:11pm

I think more simple way to think about it is:

You have 2 sets of vertices: A & B
And you can have edges from a in A to b in B

Bottom right:

The rest is features and implementation detail.