Add Difference Operators To Dict

Hello community :slight_smile:
Following the previous discussion that I started, I gave the idea more thought and reached a conclusion: I would like to propose adding the - and -= operators to Python’s dict.

I’m Looking for Sponsorship

This proposal is currently seeking a sponsor from among the Python core developers. If you think the proposed idea merits further consideration and development into a full PEP, please reach out. I am prepared to refine this proposal with more detailed specifications, rationales, and comprehensive use cases upon obtaining sponsorship.

Abstract

This PEP proposes the addition of two new operators, - and -= to Python’s dictionary type. This enhancement follows the introduction of union operators (| and |=) for dictionaries in PEP 584. The subtraction operator (-) will enable straightforward and efficient removal of keys from a dictionary, producing a new dictionary as a result, while the in-place subtraction operator (-=) will modify the original dictionary. These operators are proposed to work only between dictionaries initially, simplifying both implementation and conceptual overhead.

Motivation

Following the successful adoption of PEP 584, which introduced the | and |= operators for dictionaries, there exists a natural progression to enhance dictionary manipulations further by introducing subtraction operators - and -=. This addition will provide a clear, concise syntax for creating new dictionaries by removing specific keys, a common task currently handled by less direct methods that often involve loops or comprehensions. The proposed operators are intuitive and align with Python’s overarching design philosophy of readability and simplicity in common operations.

Additionally, in previous discussions, the symmetric difference operation (^) for dictionaries was considered. However, it seems that the use case for such an operator was limited and not as broadly useful as subtraction operations, which are more commonly required in practice (from a quick search I found dozens of use cases for the subtraction operators in open-source code; some of those examples can be found below).

Rationale

The rationale for introducing - and -= for dictionaries is to provide dictionary users with intuitive and efficient tools for key removal, akin to those available for sets. This PEP takes the position that this behavior is simple, obvious, usually the behavior we want, and should be the default behavior for dicts.
At first glance, - seems more useful than any of the other possible set-style operations.

Note: it is debatable whether we should extend all the set operations to dicts. This is out of scope for the PEP that I wish to propose, and for the sake of consistency and conciseness it will not be included in this PEP.

Specification

The proposed syntax and behavior for the new operators are as follows:

  • d1 - d2: Returns a new dictionary that includes only the keys from d1 that are not present in d2. The values in the resulting dictionary are from d1.
    I.e. d1 - d2 == {k: v for k, v in d1.items() if k not in d2}. Example:

    d1 = {"a": 1, "b": 2, "c": 3}
    d2 = {"b": 4}
    d3 = d1 - d2  # d3 is {"a": 1, "c": 3}
    
  • d1 -= d2: Modifies d1 in place, removing keys that appear in d2. Example:

    d1 = {"a": 1, "b": 2, "c": 3}
    d2 = {"b": 4}
    d1 -= d2  # d1 is now {"a": 1, "c": 3}
    

While there are compelling use cases for extending these operations to allow d1 - set1, this proposal initially limits the operations to dictionary operands to maintain focus and simplicity. Future expansions could explore interactions between dictionaries and sets, as well as further extensions to the | operator introduced in PEP 584.
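The semantics specified above can be approximated today with a dict subclass. This is only a sketch under stated assumptions: the class name SubDict is hypothetical, and returning NotImplemented for non-dict operands simply mirrors the dict-only restriction of this proposal, not any existing behavior.

```python
class SubDict(dict):
    """Hypothetical dict subclass sketching the proposed - and -= semantics."""

    def __sub__(self, other):
        # Only dict operands are supported, as in the proposal.
        if not isinstance(other, dict):
            return NotImplemented
        # Keep the items of self whose keys are absent from other;
        # the values stored in other are ignored.
        return SubDict((k, v) for k, v in self.items() if k not in other)

    def __isub__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        if other is self:
            # d -= d would otherwise mutate the dict while iterating over it.
            self.clear()
            return self
        # Iterate over the right-hand side and drop matching keys in place.
        for k in other:
            self.pop(k, None)
        return self


d1 = SubDict({"a": 1, "b": 2, "c": 3})
d2 = {"b": 4}

d3 = d1 - d2
print(d3)  # {'a': 1, 'c': 3}

d1 -= d2
print(d1)  # {'a': 1, 'c': 3}
```

With this sketch, an expression such as d1 - {"a"} raises TypeError, because a set is not a dict and no reflected operation handles it.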

Code Examples and Use Cases

All the code examples here are from packages installed on my computer under the python3.10/site-packages.

numpy/distutils/ccompiler_opt.py
Before:

            feature.update({
                k:v for k,v in cfeature.items() if k not in feature
            })

After:

            feature |= cfeature - feature

matplotlib/axes/_axes.py
Before:

        return self.plot(
            *args, **{k: v for k, v in kwargs.items() if k not in d})

After:

        return self.plot(*args, **(kwargs - d))

sqlalchemy/cyextension/collections.pyx
Before:

        else:
            other = {cy_id(obj): obj for obj in iterable}
        result._members = {k: v for k, v in self._members.items() if k not in other}

After:

        else:
            other = {cy_id(obj): obj for obj in iterable}
        result._members = self._members - other

mypy/solve.py
Before:

        originals.update({v.id: v for v in c.extra_tvars if v.id not in originals})

After:

        originals |= {v.id: v for v in c.extra_tvars} - originals

setuptools/dist.py
Before:

        metadata_only = set(self._DISTUTILS_UNSUPPORTED_METADATA)
        metadata_only -= {"install_requires", "extras_require"}
        dist_attrs = {k: v for k, v in attrs.items() if k not in metadata_only}

After (this one is an example of a dict - set operation, which is debatable and not included in the PEP that I want to propose):

        metadata_only = set(self._DISTUTILS_UNSUPPORTED_METADATA)
        metadata_only -= {"install_requires", "extras_require"}
        dist_attrs = attrs - metadata_only

Notes

  • To be consistent with the existing |= operator, we should consider including a dict.difference method. I’m not sure if we should do so, and I think that is a good point for discussion.

Thanks to everyone in advance! :slight_smile:

I strongly disagree that an operator that completely ignores the values on the RHS in all cases can be considered “natural”. Why not only remove if the values match, for instance?

Given this unnatural ambiguity in the definition, this should be a function with a clear name, not an operator.

The question you raise is important and the answer is not self-evident. At the same time, a very similar discussion already took place during the work on PEP 584, and for consistency with that PEP this is indeed a natural extension.

Also, as shown in the use cases, the useful case is to replace the dict comprehension that contains the if k not in d2.
In addition, the behavior you are referring to already exists using the d1.items() - d2.items() operation.

To sum up, matching by the key-value pair is less useful, less clear, and less consistent with the | operator. But I agree that this conclusion is not obvious.
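To make the contrast concrete, here is a small sketch of the key-value matching that items views already provide, next to the key-only matching proposed here. The variable names are only illustrative, and note that the items-view difference requires the values to be hashable.

```python
d1 = {"a": 1, "b": 2, "c": 3}
d2 = {"b": 4, "c": 3}

# Pair matching: an item is removed only if both key and value match.
# The view difference yields a set of (key, value) tuples.
by_pair = dict(d1.items() - d2.items())

# Key matching (the semantics proposed for d1 - d2): values in d2 are ignored.
by_key = {k: v for k, v in d1.items() if k not in d2}

print(by_pair == {"a": 1, "b": 2})  # True
print(by_key == {"a": 1})           # True
```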

Have you looked for prior art? What other languages support subtraction on their mapping types, and what semantics do they follow?

The | operator does not always ignore the values on the RHS, so it cannot be argued that this would be consistent or a “natural extension”.

Not every operator needs be defined on every type.

The dict comprehension is perfectly clear as-is.

PEP 584

Please avoid posting pictures of text.

Yes, I found some similar operations in Scala and Ruby:

  • Scala’s collections include Map, which allows subtractive operations via methods like --, which removes a specified set of keys from the map:
    val map1 = Map("a" -> 1, "b" -> 2, "c" -> 3)
    val map2 = Map("b" -> 2)
    val result = map1 -- map2.keys  // Map("a" -> 1, "c" -> 3)
    
  • Ruby allows the subtraction of one hash from another using the except method (recent versions) where keys from one hash are removed from another. It’s not operator-based but is a method call which achieves the same result:
    h1 = {a: 1, b: 2, c: 3}
    h2 = {b: 2}
    h3 = h1.except(*h2.keys)  # {a: 1, c: 3}
    

There are languages that do not implement such behavior at all, for example JavaScript.

Also, it is important to note that Python itself supports such operations via libraries. For example, pandas offers mechanisms to perform operations that could be conceptually similar (e.g., filtering rows/columns from DataFrames, which are dictionary-like).

Both of the examples you give from other languages allow subtraction of a set of keys, not of another dictionary.

Not a dict on the RHS and not the Scala subtraction operator (-).

As you say, not an operator.

Done via a method, not an operator, which strongly suggests we should not make this an operator either.

I suppose that the natural analogue of such an operation in Python would be to subtract an iterable of keys:

d -= ['foo', 'bar']

A dict is then just one example of an iterable of keys.
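A minimal sketch of that idea as a plain function, assuming missing keys should be ignored silently (the thread does not settle that point). The name discard_keys is hypothetical.

```python
def discard_keys(d, keys):
    """Hypothetical helper: remove every key in `keys` from `d`, in place."""
    # Materialize first so that passing d itself (or a view of d) is safe.
    for k in list(keys):
        d.pop(k, None)  # missing keys are silently ignored
    return d


d = {"foo": 1, "bar": 2, "baz": 3}
discard_keys(d, ["foo", "bar", "missing"])
print(d)  # {'baz': 3}

# A dict on the right-hand side works too, because iterating a dict yields its keys.
e = {"x": 1, "y": 2}
discard_keys(e, {"y": "ignored value"})
print(e)  # {'x': 1}
```

The cost is proportional to the number of keys to remove, since each removal is a hash lookup on d.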

That’s right, what I am proposing is different from what exists in other languages. I only brought up the closest similar cases.

There might be advantages to restricting it to sets only, as they already provide much of the functionality. And dict.keys() returns a set-like view that cannot be mutated directly:

from typing import Protocol, runtime_checkable

@runtime_checkable
class SetProtocol(Protocol):
    def __len__(self): ...
    def __iter__(self): ...
    def __contains__(self): ...
    def __le__(self): ...
    def __lt__(self): ...
    def __gt__(self): ...
    def __ge__(self): ...
    def __eq__(self): ...
    def __and__(self): ...
    def isdisjoint(self): ...
    def __or__(self): ...
    def __ror__(self): ...
    def __sub__(self): ...
    def __rsub__(self): ...
    def __xor__(self): ...
    def __rxor__(self): ...

print(isinstance({}.keys(), SetProtocol))  # True

Also, then it makes sense that values are ignored.

Nobody has (as far as I can see) justified this being the - operator though. I can’t think of another case of using the additive ring operators between two types of fundamentally different shape like this. Why not just add a popall method?

I’m -1 on dict - dict (because the values in the 2nd dict are ignored). I’m -1 on dict - iterable because it’s a performance trap (key in iterable takes O(n) time, and if the iterable is a set or a dict, you can easily get O(1) with custom code). I’m -1 on dict - set_like because requiring the RHS to be set-like feels like an arbitrary restriction.

The problem is that dict - things_to_remove is intended to be the “one obvious way” to do this operation, so making it less intuitive, or less efficient, than the alternatives is a bad choice. And the best approach is highly dependent on the actual objects involved.

Maybe a better design would be to have a method that constructs a new dict by selecting a subset of keys from an existing dict, and then this operation would be d.select(d.keys() - other_dict.keys()). But at that point, you’re so close to {k:d[k] for k in d.keys() - other_dict.keys()} (or simply {k:d[k] for k in d if k not in other_dict}) that it’s hardly worth the effort…
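For illustration, a sketch of what such a method might look like as a free function, compared with the plain comprehension. The name select and its signature are assumptions, not an existing dict API. One difference worth noting: d.keys() - other_dict.keys() returns a plain set, so the select variant does not preserve insertion order, while the filtering comprehension does.

```python
def select(d, keys):
    """Hypothetical helper: new dict containing only the given keys of d."""
    return {k: d[k] for k in keys}


d = {"a": 1, "b": 2, "c": 3, "d": 4}
other_dict = {"b": None, "d": None}

# The keys-view difference is a set, so iteration order here is arbitrary.
via_select = select(d, d.keys() - other_dict.keys())

# Filtering while iterating d preserves d's insertion order.
via_filter = {k: d[k] for k in d if k not in other_dict}

print(via_select == via_filter)  # True (dict equality ignores order)
print(list(via_filter))          # ['a', 'c']
```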

Surely the algorithm would iterate over the RHS and use the O(1) lookup in the LHS dict? Copy all then remove for -.

How do you iterate over “things to exclude” and generate the keys that you don’t want to exclude? That’s why “select” can be more efficient than “popall”…

Edit: Did I miss the second sentence on first read, or did you edit? Copy all then remove is better, but it means copying everything at the start, which could be a lot.
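The two strategies being compared can be sketched side by side. Both function names are hypothetical, and this is only meant to show where the work goes, not to benchmark anything.

```python
def subtract_by_copy(d, other):
    """Copy everything, then pop each key found in other."""
    result = dict(d)      # copies all of d up front
    for k in other:       # one hash lookup on result per key in other
        result.pop(k, None)
    return result


def subtract_by_filter(d, other):
    """Build the result from surviving keys; relies on fast `k in other`."""
    return {k: v for k, v in d.items() if k not in other}


d = {i: str(i) for i in range(10)}
other = {3: None, 7: None, 42: None}

assert subtract_by_copy(d, other) == subtract_by_filter(d, other)
print(sorted(subtract_by_copy(d, other)))  # [0, 1, 2, 4, 5, 6, 8, 9]
```

The copy variant only iterates over the right-hand side, so it works with any iterable of keys without the O(n) membership test, at the price of copying all of d first. The filter variant never copies entries it will discard, but it degrades badly if other is a list or another container with slow membership tests.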

It was edited. It seems like it doesn’t register for the first minutes.

sigh That’s one of the biggest downsides of Discourse compared to email: people can edit their posts.