Make set.union and set.intersection convert a non-set self implicitly

While I understand that set.union and set.intersection are instance methods, I wonder what the difficulties would be in making them convert a non-set self to a set first when they are called as class methods.

Suppose we have a list of lists and want to obtain their common items:

lists = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]

Wouldn’t this be more intuitive and readable:

set.intersection(*lists) # currently raises TypeError

than the currently accepted boilerplate of either converting the first list to a set and slicing the rest:

set(lists[0]).intersection(*lists[1:])

or mapping all lists to sets first:

set.intersection(*map(set, lists))
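The difference is easy to check in an interpreter. This sketch (assuming CPython 3; the exact error wording varies between versions) confirms that the direct call is rejected while both workarounds give {3}.

```python
lists = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]

# The built-in method descriptor checks that its first argument is a set,
# so passing a list in the "self" position is rejected.
try:
    set.intersection(*lists)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print("TypeError:", error_message)

# Workaround 1: convert only the first list, pass the rest unchanged.
first_then_rest = set(lists[0]).intersection(*lists[1:])

# Workaround 2: convert every list to a set before calling.
all_mapped = set.intersection(*map(set, lists))

print(first_then_rest, all_mapped)  # {3} {3}
```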

The pattern I know is set().union(...) which doesn’t look too different from the “ideal” set.union(...).

(It does technically construct an extra, unneeded set object, but compared to the map option it’s very resource-light.)

The typical use case is to obtain the set union of a list of lists, so to use the set().union(...) approach one would first have to extract the first item from the list and slice the rest to a different list or call itertools.islice for them. Hardly “resource light” IMO.

Updated my original post with a better example for clarification of the use case.

Instead use: set().union(*lists).

Good alternative indeed, but doesn’t solve the problem for set.intersection. Updated my original post with set.intersection as an example then.
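The reason the set() trick works for union but not for intersection is that the empty set is the identity element for union but the absorbing element for intersection. A quick check:

```python
lists = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]

# Union with the empty set changes nothing, so starting from set()
# just yields the union of all the lists.
union_all = set().union(*lists)

# Intersection with the empty set is always empty, so the same trick
# can never produce the common items.
intersection_all = set().intersection(*lists)

print(union_all)         # {1, 2, 3, 4, 5}
print(intersection_all)  # set()
```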


Give it a try with a class and methods called from the class. Can you write one that does something like what you’re asking? Moved here from ideas because coming to a better understanding of that is needed first. It’s also not clear that any of the existing approaches are actually bad as you suggest, or couldn’t be improved without changes to Python, which is also appropriate for the help category.

Of course it can be done with a set subclass:

class Set(set):
    def intersection(self, *others):
        if not isinstance(self, set):
            self = set(self)
        return set.intersection(self, *others)  # call the built-in directly; self.intersection would recurse when self is already a Set

lists = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(Set.intersection(*lists)) # outputs {3}

But why not make such an intuitive behavior built-in? What exactly could be the downsides?

Of course you can write a function/method that behaves like that:

class Set:
    def intersection(*args):
        if not args:
            return Set()
        if not isinstance(args[0], Set):
            self = Set(args[0])  # convert a non-Set first argument
        else:
            self = args[0]  # already a Set: use it as-is
        args = args[1:]
        # ... normal implementation

This isn’t a technical problem (unless CPython’s implementation has some weird limitations); it would definitely be possible to support the syntax, probably with minimal changes to the actual implementation.
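A runnable version of that sketch might look like the following. It assumes Set subclasses the built-in set, so the "normal implementation" can simply delegate to set.intersection, and that a call with no arguments returns an empty Set. Both are illustrative choices, not anything CPython currently does.

```python
class Set(set):
    # Hypothetical hybrid: no explicit self parameter, so the method accepts
    # whatever arrives first, whether called on an instance or on the class.
    def intersection(*args):
        if not args:
            # Assumption: no arguments at all gives an empty Set.
            return Set()
        first, others = args[0], args[1:]
        # Convert a non-set first argument instead of rejecting it.
        base = first if isinstance(first, set) else set(first)
        # Delegate to the built-in implementation and wrap the result.
        return Set(set.intersection(base, *others))


lists = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(Set.intersection(*lists))                  # Set({3})
print(Set([1, 2, 3]).intersection([2, 3], [3]))  # Set({3})
print(Set.intersection())                        # Set()
```

Because the function has no named self, the instance call and the class call both land in args, which is what lets one definition serve both forms.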

For union, there is a simpler syntax, as I described above. For intersection, I don’t think there is. Whether the current syntax is bad, I’m honestly not too sure.

Would it pass static typing? Does anything else behave this way?

Without any type annotations, my code above would get the following complaint from mypy:

main.py:8: error: Argument 1 to "intersection" has incompatible type "*list[list[int]]"; expected "__main__.Set"  [arg-type]

But would get a pass with proper annotations:

from typing import Iterable, Any

class Set(set):
    def intersection(self: set | Iterable[Any], *others: Iterable[Any]):
        if not isinstance(self, set):
            self = set(self)
        return set.intersection(self, *others)  # call the built-in directly; self.intersection would recurse when self is already a Set

lists = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(Set.intersection(*lists))

So I don’t think type checking would be an issue, as long as typeshed updates its stubs for the change as usual.

I don’t recall anything else in Python’s standard library that currently supports this behavior, but then I don’t think any other type has a use case strong enough to warrant it.

With set, however, the use case above is common and intuitive enough to allow this behavior IMHO.


I cannot think of any precedent for what you are requesting. It violates the basic idea of an instance method. Anyway, give a list of lists:

lists = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
set(lists.pop()).intersection(*lists)
# {3}

Popping modifies the original list, not an ideal alternative.

EDIT:
Inspired by your idea of popping, use iter perhaps:

lists = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
iter_lists = iter(lists)
set(next(iter_lists)).intersection(*iter_lists)
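The practical difference between the two approaches is whether the caller’s list survives. This sketch works on a copy for the pop() version so both can run side by side:

```python
lists = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]

# pop() works, but it removes the last list from the data it is given.
popped_copy = [list(x) for x in lists]
via_pop = set(popped_copy.pop()).intersection(*popped_copy)

# iter()/next() consumes an iterator instead, leaving the list intact.
iter_lists = iter(lists)
via_iter = set(next(iter_lists)).intersection(*iter_lists)

print(via_pop, len(popped_copy))  # {3} 2
print(via_iter, len(lists))       # {3} 3
```

Note that with an empty list of lists, next(iter_lists) raises StopIteration; passing a default such as next(iter_lists, ()) would make the result an empty set instead.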

I agree that it is in theory weird to allow self to be of a different type, but in this particular case it just makes this particular aspect of usage so much more intuitive and readable that it may just warrant an exception to the rules.

The bottom line is that yes, there are workarounds. But I suppose we can agree that the code would look objectively cleaner and friendlier if such usage were allowed.

Instead, this could be framed as “a class and instance method that share the same name”. Yes, this would AFAIK be a unique case in the Python stdlib, although I have seen methods that behave like that, and I think I have also implemented one such method myself (in a variation of set, no less).

I think that rather than making union and intersection both class and instance methods, we could add two new classmethods, from_intersection and from_union, that do what the OP wants.


That would be a good solution indeed. It would also then support an empty list of lists, whereas a hybrid class/instance method cannot.

While we’re at it make the two class methods accept an iterable of iterables instead of a variable number of iterables so we don’t have to use the unpacking operator when calling it.
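A minimal sketch of those two classmethods, written as a set subclass since the real set type has no such constructors. The names come from the suggestion in this thread; the behavior for an empty input (returning an empty set) is an assumption, since there is no universal set to return instead.

```python
from typing import Any, Iterable

_MISSING = object()  # sentinel: distinguishes "no first item" from any real value


class Set(set):
    # Hypothetical alternate constructors named after the from_union /
    # from_intersection suggestion. Each takes ONE iterable of iterables,
    # so callers need no * unpacking.

    @classmethod
    def from_union(cls, iterables: Iterable[Iterable[Any]]) -> "Set":
        # The empty set is the identity for union, so this handles any
        # number of inputs, including none.
        return cls(set().union(*iterables))

    @classmethod
    def from_intersection(cls, iterables: Iterable[Iterable[Any]]) -> "Set":
        it = iter(iterables)
        first = next(it, _MISSING)
        if first is _MISSING:
            # Assumption: no inputs gives an empty set.
            return cls()
        # Convert only the first input; the rest are consumed from the iterator.
        return cls(set(first).intersection(*it))


lists = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(Set.from_intersection(lists))  # Set({3})
print(Set.from_union(lists))         # Set({1, 2, 3, 4, 5})
print(Set.from_intersection([]))     # Set()
```

Taking a single iterable also means a generator of lists can be passed directly, without materializing it first.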

The hybrid could still take an empty list of arguments without problems.

I am personally not a huge fan of the naming convention from_<operation>; normally it would be from_<source object>. But I guess that is something one can get used to. I also don’t have a better suggestion for names, except the hybrid.

Hmm, true. I was thinking more in terms of typing an instance method where self could be missing, but I now realize that we can either @typing.overload it with distinctly different signatures, or type self as Optional[set | Iterable[Any]] with a default value of None.
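At runtime, the default-None idea could look like this sketch. It is a hypothetical set subclass, and whether a particular type checker accepts a default value on self is not something this example verifies.

```python
from typing import Any, Iterable, Optional


class Set(set):
    # Hypothetical hybrid where "self" may be omitted entirely: it defaults
    # to None, following the Optional[...] idea.
    def intersection(
        self: Optional[Iterable[Any]] = None, *others: Iterable[Any]
    ) -> set:
        if self is None:
            # Called as Set.intersection() with no arguments at all.
            return set()
        base = self if isinstance(self, set) else set(self)
        # Call the built-in directly to avoid recursing into this override.
        return set.intersection(base, *others)


lists = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(Set.intersection(*lists))       # {3}
print(Set([1, 2]).intersection([2]))  # {2}
print(Set.intersection())             # set()
```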

And yes the naming of the dedicated class methods would be awkward because the best names are already taken as instance methods, so yeah a hybrid solution would still be nice.

The only reason set has this limitation is because it’s a built-in. Ordinary methods in user-defined classes don’t care whether the first argument is an instance of the class:

>>> class Example:
...     def join(self, *values):
...         return self, *values
... 
>>> Example().join(1, 2, 3)
(<__main__.Example object at 0x7f8a1e3dab20>, 1, 2, 3)
>>> Example.join(1, 2, 3)
(1, 2, 3)
>>> Example.join(Example(), 1, 2, 3)
(<__main__.Example object at 0x7f8a1e3dab20>, 1, 2, 3)

Just as they don’t care whether that argument is assigned to a separate parameter named self (although this example also allows for an empty argument list when calling it from the class):

>>> class Example:
...     def join(*values):
...         return values
... 
>>> Example().join(1, 2, 3)
(<__main__.Example object at 0x7f8a1e3dab20>, 1, 2, 3)
>>> Example.join(1, 2, 3)
(1, 2, 3)
>>> Example.join(Example(), 1, 2, 3)
(<__main__.Example object at 0x7f8a1e3dab20>, 1, 2, 3)

(This can’t be done with @classmethod or @staticmethod, since those discard any knowledge of the object, if any, on which the method was called.)

So, theoretically, almost everything behaves this way; it’s just a question of whether it’s useful, which boils down to the method caring about self but not about its internal interface. set.union and set.intersection only care that self is iterable, because they create a new object.

How about set.union_of() and set.intersection_of()?
