Add symmetric difference to collections.Counter

I recently noticed that collections.Counter objects support asymmetric difference, union, and intersection, but they don’t support symmetric difference.

So, given these Counter objects:

>>> from collections import Counter
>>> c = Counter("ababcabcd")
>>> d = Counter("bananabab")
>>> c
Counter({'a': 3, 'b': 3, 'c': 2, 'd': 1})
>>> d
Counter({'a': 4, 'b': 3, 'n': 2})

This works:

>>> c | d
Counter({'a': 4, 'b': 3, 'c': 2, 'n': 2, 'd': 1})
>>> c & d
Counter({'a': 3, 'b': 3})
>>> (c | d) - (c & d)
Counter({'c': 2, 'n': 2, 'a': 1, 'd': 1})

But this does not:

>>> c ^ d
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for ^: 'Counter' and 'Counter'

I would expect c ^ d to be the same as (c | d) - (c & d), so implementing it should be fairly simple.
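For anyone on a Python version where Counter has no ^, here is a minimal sketch of that expectation as a plain helper. The name symmetric_difference is my own placeholder, not an existing Counter method.

```python
from collections import Counter


def symmetric_difference(a, b):
    """Hypothetical helper: multiset symmetric difference of two Counters.

    Built only from operators Counter already supports:
    union (|), intersection (&) and subtraction (-).
    """
    return (a | b) - (a & b)


c = Counter("ababcabcd")
d = Counter("bananabab")
result = symmetric_difference(c, d)
print(result)  # Counter({'c': 2, 'n': 2, 'a': 1, 'd': 1})
```

Because - drops counts that reach zero, 'b' (3 in both) disappears from the result.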

This was discovered for the purpose of solving a programming exercise (finding the difference between the letters in two strings, ignoring order). I do imagine that this may be a useful operation for Counter to support though.


cc @rhettinger

In principle this seems reasonable to me, though Raymond is the domain expert here.


Is this a real world need? ISTM that programming exercises by design pick tasks that people don’t normally do and for which solutions don’t already exist. Presumably, that is why lists don’t have a method for Longest Increasing Subsequence or other common toy problems.

The issue with symmetric_difference for multisets is that it is hard to interpret the result. When you see p ^ q during a code review, do you think, “elementwise difference between the maximum and minimum”? Does that task arise often enough to warrant inclusion in the standard library? Do we want people to have to take the time to learn a method they will likely never use and which is difficult to interpret? My opinion is that this is best left as a programming exercise.
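To make the "elementwise difference between the maximum and minimum" reading concrete, here is a rough sketch. The function name max_minus_min is made up for illustration, and it assumes all counts are non-negative, since the multiset operators discard zero and negative counts.

```python
from collections import Counter


def max_minus_min(p, q):
    """Hypothetical helper: per-element max count minus min count.

    Assumes non-negative counts. Elements whose counts are equal
    (difference 0) are left out, as the Counter operators do.
    """
    keys = p.keys() | q.keys()
    diffs = {k: max(p[k], q[k]) - min(p[k], q[k]) for k in keys}
    return Counter({k: n for k, n in diffs.items() if n > 0})


p = Counter("ababcabcd")
q = Counter("bananabab")

# Same result as composing the existing operators.
same = max_minus_min(p, q) == (p | q) - (p & q)
print(same)  # True
```

For non-negative counts this is just abs(p[k] - q[k]) for every element k.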


Stack Overflow question with information from Tim/Raymond:

Why is there no symmetric difference for collections.Counter?


From a readability point of view, I’m inclined to agree with @Wombat: I would mentally parse (c | d) - (c & d) more easily than the direct c ^ d.

I don’t actually object to adding it, though - it’s a well defined operation that’s valid for both sets and multisets, and we already support it for the former.


Yes :wink:. People familiar with the multiset concept, and the concept of ^ for regular sets, don’t have trouble making “the most obvious” generalization.

That’s the rub: no. At least not that I’ve ever seen, and I’m so old I’ve seen everything :wink:.


I have another concern with having named this operation Counter.__xor__. People expect ^ to be an associative operation, as it is for integers and sets, yet now

>>> x = Counter(a=1)
>>> y = Counter(a=10)
>>> z = Counter(a=100)
>>> (x ^ y) ^ z
Counter({'a': 91})
>>> x ^ (y ^ z)
Counter({'a': 89})

There’s a similar but different way to generalize set.__xor__ that behaves more like ^ and would be associative, namely the one that takes the bitwise XOR of each count:

Counter({item: x[item] ^ y[item] for item in x | y if x[item] != y[item]})
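As a quick check of the associativity claim, here is that expression wrapped in a function and run on the same three counters. The name xor_counts is only a placeholder for this sketch; nothing like it exists on Counter.

```python
from collections import Counter


def xor_counts(x, y):
    """Hypothetical helper: bitwise XOR of the counts, element by element.

    Uses the expression from the post; elements with equal counts
    XOR to 0 and are left out.
    """
    return Counter({item: x[item] ^ y[item] for item in x | y if x[item] != y[item]})


x = Counter(a=1)
y = Counter(a=10)
z = Counter(a=100)

left = xor_counts(xor_counts(x, y), z)
right = xor_counts(x, xor_counts(y, z))
print(left == right)  # True

# With counts of only 0 or 1 it matches set symmetric difference.
as_sets = xor_counts(Counter("ab"), Counter("bc"))
print(sorted(as_sets))  # ['a', 'c']
```

Both groupings agree here because integer XOR is itself associative, so the grouping cannot change any element's final count.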

I don’t propose implementing that operation at this time. But can we consider renaming the subtraction-based operation to something other than __xor__ for Python 3.15, such as Counter.symmetric_difference or Counter.absolute_difference?

The operators |, & and ^ are left-associative, as are +, -, * and /.

For some types, such as matrices, * isn’t associative, so ^ not being associative for some types shouldn’t be forbidden.


Matrix multiplication is associative. There are of course specialized mathematical structures with non-associative multiplication, like the octonions, but these are not things an ordinary programmer will ever see, let alone part of the standard library. I don’t suggest that they should be forbidden, just that they’re unexpected.

Also, we made an entirely new @ operator for matrix multiplication, precisely to distinguish it from element-wise multiplication, which satisfies more of the properties one expects of *. I argue that we should similarly rename a confusingly-named operation to distinguish it from element-wise XOR, which satisfies more of the properties one expects of ^.

(This is in addition to the arguments already made above that ^ is not an intuitive name for a subtraction-based operation. While I understand the abstract appeal of generalizing ^ for sets, element-wise XOR also generalizes ^ for sets in an abstractly cleaner way, so I still think practical intuitiveness is the relevant concern.)

Add option “nothing”.

There should be an option for “add both” like a set.

Hopefully final attitude poll:

  • a ^ b
  • a.symmetric_difference(b)
  • Both
  • Yes method, different name
  • No method

-1 for making any change to what we have now. Maybe just note the non-associativity in the docs.

  • Any definition of __xor__ other than what we have now will be surprising to users. All of the participants in the earlier discussion (including the OP) expected the current definition, rather than something new invented solely to enforce associativity.

  • If we ever had a symmetric_difference method that disagreed with __xor__, that would be its own source of confusion.

  • A rename wouldn’t fit in the current API where all the multiset operations are operators (which contrasts with named methods that allow negative and zero counts). So a rename would break what little harmony already exists in the API.

Note, the associativity concern is only a theoretical issue. The __xor__ operation mainly exists for completeness and will be rarely used. If x ^ y will be rare, then x ^ y ^ z will be almost non-existent.


Apparently, they don’t. The independently designed and pre-existing multiset project on PyPI does the same as what is in 3.15 now:

>>> from multiset import Multiset
>>> x = Multiset({'a': 1})
>>> y = Multiset({'a': 10})
>>> z = Multiset({'a': 100})
>>> (x ^ y) ^ z
Multiset({'a': 91})
>>> x ^ (y ^ z)
Multiset({'a': 89})

That project has been around since 2016 and has been actively maintained. So, the __xor__ in Py3.15 can be considered an established practice.
