I recently noticed that collections.Counter objects support asymmetric difference, union, and intersection, but they don’t support symmetric difference.
So, given these Counter objects:
>>> from collections import Counter
>>> c = Counter("ababcabcd")
>>> d = Counter("bananabab")
>>> c
Counter({'a': 3, 'b': 3, 'c': 2, 'd': 1})
>>> d
Counter({'a': 4, 'b': 3, 'n': 2})
This works:
>>> c | d
Counter({'a': 4, 'b': 3, 'c': 2, 'n': 2, 'd': 1})
>>> c & d
Counter({'a': 3, 'b': 3})
>>> (c | d) - (c & d)
Counter({'c': 2, 'n': 2, 'a': 1, 'd': 1})
But this does not:
>>> c ^ d
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for ^: 'Counter' and 'Counter'
I would expect c ^ d to be the same as (c | d) - (c & d), so implementing it should be fairly simple.
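Until something like that exists, the expression above can be wrapped in a small helper. This is only a sketch: the name symmetric_difference is made up for illustration and is not part of collections.Counter.

```python
from collections import Counter


def symmetric_difference(p, q):
    """Hypothetical helper: elementwise maximum minus elementwise minimum.

    Counter subtraction drops keys whose result is zero or negative,
    so elements with equal counts on both sides disappear.
    """
    return (p | q) - (p & q)


c = Counter("ababcabcd")
d = Counter("bananabab")
result = symmetric_difference(c, d)
print(result)
```

With the two counters from the example, 'b' has the same count on both sides and so does not appear in the result.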
I ran into this while solving a programming exercise (finding the difference between the letters in two strings, ignoring order). I do imagine that this may be a useful operation for Counter to support, though.
Is this a real-world need? ISTM that programming exercises by design pick tasks that people don’t normally do and for which solutions don’t already exist. Presumably, that is why lists don’t have a method for Longest Increasing Subsequence or other common toy problems.
The issue with symmetric_difference for multisets is that it is hard to interpret the result. When you see p ^ q during a code review, do you think, “elementwise difference between the maximum and minimum”? Does that task arise often enough to warrant inclusion in the standard library? Do we want people to have to take the time to learn a method they will likely never use and which is difficult to interpret? My opinion is that this is best left as a programming exercise.
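To make the "elementwise difference between the maximum and minimum" reading concrete: for counters with non-negative counts, it amounts to the absolute difference of the counts for each element. The sketch below checks that equivalence on the counters from the original post. The function name abs_difference is hypothetical, and the equivalence is only assumed to hold when no count is negative.

```python
from collections import Counter


def abs_difference(p, q):
    """Hypothetical helper: per-element absolute difference of counts.

    Assumes all counts are non-negative. Elements with equal counts
    are omitted, mirroring how Counter subtraction drops zeros.
    """
    return Counter({
        k: abs(p[k] - q[k])
        for k in p.keys() | q.keys()
        if p[k] != q[k]
    })


c = Counter("ababcabcd")
d = Counter("bananabab")

via_abs = abs_difference(c, d)
via_max_min = (c | d) - (c & d)
print(via_abs == via_max_min)
```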
From a readability point of view, I’m inclined to agree with @Wombat: I would mentally parse (c | d) - (c & d) more easily than the direct c ^ d.
I don’t actually object to adding it, though - it’s a well defined operation that’s valid for both sets and multisets, and we already support it for the former.
I have another concern about naming this operation Counter.__xor__. People expect ^ to be an associative operation, as it is for integers and sets, yet now it is not:
>>> x = Counter(a=1)
>>> y = Counter(a=10)
>>> z = Counter(a=100)
>>> (x ^ y) ^ z
Counter({'a': 91})
>>> x ^ (y ^ z)
Counter({'a': 89})
There’s a similar but different way to generalize set.__xor__ that behaves more like ^ and would be associative, namely the one that takes the bitwise XOR of each count:
Counter({item: x[item] ^ y[item] for item in x | y if x[item] != y[item]})
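The two generalizations can be compared side by side on the x, y, z example. This is a sketch for illustration only: max_minus_min and elementwise_xor are hypothetical helper names, not Counter methods, and both assume non-negative counts.

```python
from collections import Counter


def max_minus_min(p, q):
    # Subtraction-based definition: (p | q) - (p & q).
    return (p | q) - (p & q)


def elementwise_xor(p, q):
    # Bitwise XOR of each count; equal counts XOR to zero and are omitted.
    return Counter({
        item: p[item] ^ q[item]
        for item in p | q
        if p[item] != q[item]
    })


x = Counter(a=1)
y = Counter(a=10)
z = Counter(a=100)

# Subtraction-based: grouping changes the answer (91 versus 89).
left = max_minus_min(max_minus_min(x, y), z)
right = max_minus_min(x, max_minus_min(y, z))

# Elementwise XOR: both groupings give the same count, 1 ^ 10 ^ 100 == 111.
xor_left = elementwise_xor(elementwise_xor(x, y), z)
xor_right = elementwise_xor(x, elementwise_xor(y, z))

print(left, right)
print(xor_left, xor_right)
```

Both helpers agree with set.__xor__ when every count is 0 or 1, which is the sense in which each one generalizes the set operation.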
I don’t propose implementing that operation at this time. But can we consider renaming the subtraction-based operation to something other than __xor__ for Python 3.15, such as Counter.symmetric_difference or Counter.absolute_difference?
Matrix multiplication is associative. There are of course specialized mathematical structures with non-associative multiplication, like the octonions, but these are not things an ordinary programmer will ever see, let alone part of the standard library. I don’t suggest that they should be forbidden, just that they’re unexpected.
Also, we made an entirely new @ operator for matrix multiplication, precisely to distinguish it from element-wise multiplication, which satisfies more of the properties one expects of *. I argue that we should similarly rename a confusingly-named operation to distinguish it from element-wise XOR, which satisfies more of the properties one expects of ^.
(This is in addition to the arguments already made above that ^ is not an intuitive name for a subtraction-based operation. While I understand the abstract appeal of generalizing ^ for sets, element-wise XOR also generalizes ^ for sets in an abstractly cleaner way, so I still think practical intuitiveness is the relevant concern.)
-1 for making any change to what we have now. Maybe just note the non-associativity in the docs.
Any definition of __xor__ other than what we have now will be surprising to users. All of the participants in the earlier discussion (including the OP) expected the current definition, rather than something new invented solely to enforce associativity.
If we ever had a symmetric_difference method that disagreed with __xor__, that would be its own source of confusion.
A rename wouldn’t fit in the current API where all the multiset operations are operators (which contrasts with named methods that allow negative and zero counts). So a rename would break what little harmony already exists in the API.
Note that the associativity concern is only a theoretical issue. The __xor__ operation mainly exists for completeness and will rarely be used. If x ^ y will be rare, then x ^ y ^ z will be almost non-existent.