In another thread (Scientific Utilities - Contributor & Development Discussion - Scientific Python), I was encouraged to submit some of my ideas to the Python discussion forum, so that’s what I’m doing. This post is about my thoughts on extending Python’s dictionary.
I have a strong opinion that dictionaries in Python should have key-based set operations. I have a baseline implementation called SetDict in my ubelt library: SetDict - ubelt.util_dict module — UBelt 1.3.2 documentation
I spent a lot of time writing the docs, so I’ll just paste that description:
"""
A dictionary subclass where all set operations are defined.
All of the set operations are defined in a key-wise fashion, that is, each one
behaves like the corresponding operation on the sets of keys. Value conflicts are handled
with left-most priority (default for ``intersection`` and ``difference``),
right-most priority (default for ``union`` and ``symmetric_difference``),
or via a custom ``merge`` callable similar to [RubyMerge]_.
The set operations are:
* union (or the ``|`` operator) combines multiple dictionaries into
one. This is nearly identical to the update operation. Rightmost
values take priority.
* intersection (or the ``&`` operator). Takes the items from the
first dictionary that share keys with the following dictionaries
(or lists or sets of keys). Leftmost values take priority.
* difference (or the ``-`` operator). Takes only items from the first
dictionary that do not share keys with following dictionaries.
Leftmost values take priority.
* symmetric_difference (or the ``^`` operator). Takes the items
from all dictionaries where the key appears an odd number of times.
Rightmost values take priority.
Note:
The reason rightmost values take priority in union /
symmetric_difference and leftmost values take priority in intersection
/ difference is:
1. intersection / difference are for removing keys, i.e. they are used
to select values in the first (main) dictionary based on whether their
keys appear in some other dictionary (or set or list of keys), whereas
2. union is for adding keys, i.e. it is basically a non-mutating
dict.update, so the new (rightmost) values clobber the old ones.
3. symmetric_difference is somewhat strange. I don't have a great
argument for it, but it seemed easier to implement this way and it
does seem closer to a union than it is to a difference. Perhaps
unpaired union might have been a better name for this, but take
that up with the set theorists.
Also, union / symmetric_difference do not make sense if the arguments on
the right are lists/sets (there are no values to take from them), whereas
difference / intersection do.
Note:
The SetDict class only defines key-wise set operations. Value-wise or
item-wise operations are not supported because values are, in general,
not hashable. A heavier extension would be needed for that.
TODO:
- [ ] implement merge callables so the user can specify how to resolve
value conflicts / combine values.
References:
.. [RubyMerge] https://ruby-doc.org/core-2.7.0/Hash.html#method-i-merge
CommandLine:
xdoctest -m ubelt.util_dict SetDict
Example:
>>> import ubelt as ub
>>> a = ub.SetDict({'A': 'Aa', 'B': 'Ba', 'D': 'Da'})
>>> b = ub.SetDict({'A': 'Ab', 'B': 'Bb', 'C': 'Cb', })
>>> print(a.union(b))
>>> print(a.intersection(b))
>>> print(a.difference(b))
>>> print(a.symmetric_difference(b))
{'A': 'Ab', 'B': 'Bb', 'D': 'Da', 'C': 'Cb'}
{'A': 'Aa', 'B': 'Ba'}
{'D': 'Da'}
{'D': 'Da', 'C': 'Cb'}
>>> print(a | b) # union
>>> print(a & b) # intersection
>>> print(a - b) # difference
>>> print(a ^ b) # symmetric_difference
{'A': 'Ab', 'B': 'Bb', 'D': 'Da', 'C': 'Cb'}
{'A': 'Aa', 'B': 'Ba'}
{'D': 'Da'}
{'D': 'Da', 'C': 'Cb'}
"""
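For readers who want to try these semantics without installing ubelt, here is a minimal, self-contained sketch of a dict subclass with the key-wise behavior the docstring describes. It is not ubelt's actual implementation: the class name `KeySetDict` is hypothetical, and the optional `merge(old, new)` argument to `union` is an assumed signature for the TODO item (Ruby's `Hash#merge` block additionally receives the key).

```python
from typing import Any, Callable, Optional


class KeySetDict(dict):
    """Hypothetical dict whose set operators act on keys (not ubelt's code)."""

    def union(self, *others, merge: Optional[Callable[[Any, Any], Any]] = None):
        # Rightmost values win, like chained dict.update calls, unless the
        # hypothetical merge(old, new) callable combines values of shared keys.
        result = KeySetDict(self)
        for other in others:
            for key, value in other.items():
                if merge is not None and key in result:
                    result[key] = merge(result[key], value)
                else:
                    result[key] = value
        return result

    def intersection(self, *others):
        # Keep items of self whose key is in every other argument. Others
        # may be dicts or any iterable of keys (list, set, ...).
        key_sets = [set(other) for other in others]
        return KeySetDict(
            (key, value) for key, value in self.items()
            if all(key in keys for keys in key_sets)
        )

    def difference(self, *others):
        # Keep items of self whose key is in none of the other arguments.
        removed = set().union(*others)
        return KeySetDict(
            (key, value) for key, value in self.items() if key not in removed
        )

    def symmetric_difference(self, *others):
        # Keep keys that occur an odd number of times across all dicts;
        # later (rightmost) dicts overwrite values from earlier ones.
        counts = {}
        for mapping in (self, *others):
            for key in mapping:
                counts[key] = counts.get(key, 0) + 1
        result = KeySetDict()
        for mapping in (self, *others):
            for key, value in mapping.items():
                if counts[key] % 2 == 1:
                    result[key] = value
        return result

    # Binary operators delegate to the named methods (one right operand).
    __or__ = union
    __and__ = intersection
    __sub__ = difference
    __xor__ = symmetric_difference


a = KeySetDict({'A': 'Aa', 'B': 'Ba', 'D': 'Da'})
b = KeySetDict({'A': 'Ab', 'B': 'Bb', 'C': 'Cb'})
print(a | b)      # {'A': 'Ab', 'B': 'Bb', 'D': 'Da', 'C': 'Cb'}
print(a & b)      # {'A': 'Aa', 'B': 'Ba'}
print(a - b)      # {'D': 'Da'}
print(a ^ b)      # {'D': 'Da', 'C': 'Cb'}
print(a & ['A'])  # {'A': 'Aa'}
print(a.union(b, merge=lambda old, new: old + new))
# {'A': 'AaAb', 'B': 'BaBb', 'D': 'Da', 'C': 'Cb'}
```

The four operator results match the docstring example above. Overriding `__or__` means `a | b` returns a `KeySetDict` rather than the plain `dict` that the built-in operator would produce, so results can be chained with the other operators.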
Python dicts already have the union “|” operator (added in Python 3.9 by PEP 584). It would be really nice to get the rest of the set operators. Set notation is so concise and expressive. It hurts to type {k: v for k, v in d.items() if k in s} when I could just type d & s. Is it possible to get a PEP started for this?
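To make the comparison with today's Python concrete, here is what plain dicts can already do. The variable names are illustrative only. Keys views accept the set operators, but they return a set of keys rather than a dict, so a comprehension is still needed to keep the values. Items views give item-wise operations, which is where the hashability caveat from the Note shows up.

```python
d = {'A': 'Aa', 'B': 'Ba', 'D': 'Da'}
s = {'A', 'C'}

# Keys views already support set operators, but they return a plain set
# of keys, so the values are lost.
common_keys = d.keys() & s
print(common_keys)  # {'A'}

# Getting a dict back currently takes a comprehension; iterating over d
# preserves d's insertion order.
isect = {k: v for k, v in d.items() if k in s}
diff = {k: v for k, v in d.items() if k not in s}
print(isect)  # {'A': 'Aa'}
print(diff)   # {'B': 'Ba', 'D': 'Da'}

# Items views support item-wise set operations; the result is a set of
# (key, value) tuples, so matching items must have hashable values.
other = {'A': 'Aa', 'B': 'Bb'}
shared_items = d.items() & other.items()
print(shared_items)  # {('A', 'Aa')}

# Here the shared item ('A', [1]) cannot be put into a set.
try:
    {'A': [1]}.items() & {'A': [1]}.items()
    item_error = None
except TypeError as exc:
    item_error = exc
print(type(item_error).__name__)  # TypeError
```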