From a high-level point of view, the | operator and the .union method are very similar in the set context.
{1,2,3} | {4,5,6} == ({1,2,3}).union({4,5,6})
The .union method, however, is “more lenient”, as it accepts any iterable.
The | operator instead raises a TypeError if I try to join a set with any “non-set” iterable.
Is there a reason why | is strict instead?
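The difference described in the question can be reproduced with a short sketch. The variable names here are illustrative and not from the original post.

```python
a = {1, 2, 3}

# .union accepts any iterable, here a list.
merged = a.union([4, 5, 6])

# The | operator only accepts another set (or frozenset).
error = None
try:
    a | [4, 5, 6]
except TypeError as exc:
    error = str(exc)

# With a set on both sides, | and .union agree.
both_sets = a | {4, 5, 6}
print(merged == both_sets, error)
```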
Probably a type safety intent. The .union etc methods are for adding (or
removing, whatever) various elements from the set - those elements might
be in various forms so accepting any iterable is both feasible and
convenient to the user.
However, an expression with sets:
set1 | set2
has more predictable behaviour if you’re sure all the operands are sets.
For one thing, if they’re both sets then the above is commutative:
set1 | set2 == set2 | set1
If set.__or__ accepted non-sets then that wouldn’t hold, as you’d be
using set2.__or__ in the second expression. If that’s a different type
then you’ll get a different operation entirely. With a pure iterable
(maybe a range()) you’ll get a loud TypeError showing the issue,
but supposing it were a list (well, some collection accepting |)? It
might quietly produce another list, not the outcome you might hope for.
Better to enforce tighter constraints, and leave operations which do
“conversions” (such as this iterable->set of elements situation) to
named methods whose behaviour is more overt.
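To make the commutativity point concrete, here is a small sketch. Both classes are hypothetical, invented only for illustration: LenientSet is a set whose | accepts any iterable (real sets do not), and Bag is a list-like collection that defines | as concatenation.

```python
class LenientSet(set):
    """Hypothetical: a set whose | accepts any iterable."""

    def __or__(self, other):
        return LenientSet(self.union(other))


class Bag(list):
    """Hypothetical: a list-like collection where | concatenates."""

    def __or__(self, other):
        return Bag(list(self) + list(other))


a = LenientSet({1, 2})
b = Bag([2, 3])

left = a | b   # uses LenientSet.__or__: a set of unique elements
right = b | a  # uses Bag.__or__: a list, duplicates kept

print(type(left).__name__, type(right).__name__)
print(left == right)
```

The two expressions quietly give different types and different contents, which is the kind of surprise a strict | avoids.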
I discovered this behavior because I wanted to create a set of integers from a list of ranges, and hoped to write something like
idxs = set()
for r in ranges:
    idxs |= r
This is better written as:
idxs.update(r)
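As a complete sketch of that loop, with a made-up list of ranges (the sample data is an assumption, not from the original post). Since .union also accepts several iterables at once, the whole thing can be done in one call if a new set is acceptable.

```python
# Sample input, invented for illustration.
ranges = [range(0, 3), range(5, 8), range(2, 6)]

# In-place accumulation, one range at a time.
idxs = set()
for r in ranges:
    idxs.update(r)

# Alternative: .union takes any number of iterables.
same = set().union(*ranges)

print(sorted(idxs))
```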
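Note here that .update modifies idxs itself. For sets, |= also updates
in place (it calls set.__ior__), but like | it only accepts another set,
so it raises a TypeError for a range. It is .union() (and plain |) that
makes a new set. Slower! More memory!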
>>> s=set()
>>> s0 = s
>>> s.update((1,2,3))
>>> s
{1, 2, 3}
>>> s is s0
True
>>> s2=s.union((4,5,6))
>>> s2
{1, 2, 3, 4, 5, 6}
>>> s2 is s0
False
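For comparison, the same identity check can be applied to the operators. This is a minimal sketch with illustrative names (t, t0, t2): with a set on the right-hand side, |= updates the existing set object, plain | builds a new one, and |= with a non-set iterable such as a range is rejected.

```python
t = {1, 2, 3}
t0 = t

t |= {4, 5, 6}             # set.__ior__ updates t in place
same_object = t is t0

t2 = t | {7}               # set.__or__ returns a new set
new_object = t2 is not t0

try:
    t |= range(3)          # non-set operand
except TypeError:
    rejected = True
else:
    rejected = False

print(same_object, new_object, rejected)
```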
Cheers,
Cameron Simpson cs@cskk.id.au