Make set.union and set.intersection convert a non-set self implicitly

MegaIng · December 19, 2023, 8:29am

Definitely better and acceptable.

Interestingly, if these are added as class/static methods, it might be a good idea to prevent access to those from an instance to reduce the chance of confusing the two by accident.
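A minimal sketch of the "class-only" idea. It assumes a hypothetical `classonlymethod` descriptor and a hypothetical `union_of` name; neither exists in the standard library. The descriptor behaves like `classmethod` when it is looked up on the class, and raises `AttributeError` when it is looked up on an instance.

```python
class classonlymethod(classmethod):
    """Hypothetical descriptor: a classmethod that cannot be reached from an instance."""

    def __get__(self, instance, owner=None):
        if instance is not None:
            # Looked up as obj.name: refuse, so it cannot be confused with obj.union()
            raise AttributeError(f"{self.__func__.__name__} is only available on the class")
        # Looked up as Cls.name: behave like an ordinary classmethod
        return super().__get__(instance, owner)


class MySet(set):
    @classonlymethod
    def union_of(cls, *iterables):
        # Hypothetical name: build a new instance of cls from any number of iterables
        result = cls()
        result.update(*iterables)
        return result
```

With this, `MySet.union_of([1, 2], (2, 3))` returns a `MySet` containing 1, 2 and 3, while `MySet().union_of` raises `AttributeError`. The normal instance method `union` is unaffected.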

I am unsure whether taking (*iterables) or (iterables) would be better.

ajoino · December 19, 2023, 8:33am

My point was mostly about potentially adding classmethods to cover this usage; the names were just placeholders. I’m still not sure that such classmethods are necessary. I rarely see or use sets in that way, so I think it’s probably better to roll your own intersection_of_sets() and union_of_sets() functions.

blhsing · December 19, 2023, 8:44am

Judging by the number of upvotes this StackOverflow question and this question and their accepted answers have received, I would say there’s enough demand to justify a separate class method that accepts either an iterable of iterables or a variable number of iterables.

MegaIng · December 19, 2023, 8:50am

Zero arguments for intersection is actually not a good idea; that shouldn’t be allowed. Mathematically it should return “the entire universe/alphabet we are (implicitly) talking about”, which in Python’s case would be “all possible hashable PyObjects”, which is … quite a big set.
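A minimal sketch of helpers that take a single iterable of iterables (the `(iterables)` form) and refuse the zero-input intersection. The names `intersection_of_all` and `union_of_all` are hypothetical. Union has a natural empty result, `set()`. Intersection has no representable identity, so the helper raises instead of guessing.

```python
def intersection_of_all(iterables):
    """Hypothetical helper: intersect an iterable of iterables.

    Refuses zero inputs, because the mathematically correct answer
    (the universal set) cannot be represented by a Python set.
    """
    it = iter(iterables)
    try:
        first = next(it)
    except StopIteration:
        raise ValueError("intersection of zero iterables is undefined") from None
    result = set(first)
    for other in it:
        if not result:
            break  # already empty; nothing further can be removed
        result.intersection_update(other)
    return result


def union_of_all(iterables):
    """Hypothetical helper: union of an iterable of iterables; empty input gives set()."""
    result = set()
    for other in iterables:
        result.update(other)
    return result
```

Because both helpers consume their argument lazily, they also accept a generator of iterables without unpacking it into a call.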

blhsing · December 19, 2023, 8:56am

Ah you’re right indeed. Removed my link to the SO question where people ask about a solution for a zero set intersection.

rhettinger · December 20, 2023, 7:42pm

If I understand the problem statement correctly, you’re not happy with the current one liners:

big_cup = set().union(*data)
big_cap = set(data[0]).intersection(*data[1:])

and you want them rolled into single calls:

big_cup = set.union(*data)
big_cap = set.intersection(*data)
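For reference, a quick check of the current behaviour (the sample `data` is made up for illustration). The existing one-liners accept any iterables. The unbound-method spelling already works today, but only when the first element is a set. Otherwise it raises `TypeError`.

```python
data = [[1, 2, 3], {2, 3, 4}, (3, 5)]

# Today's one-liners: work for any iterables
big_cup = set().union(*data)
big_cap = set(data[0]).intersection(*data[1:])

# The proposed spelling fails today because data[0] is a list, not a set
try:
    set.union(*data)
except TypeError as exc:
    print("set.union(*data) fails:", exc)

# ...but it succeeds once the first element is a set
print(set.union(set(data[0]), *data[1:]) == big_cup)
```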

and don’t want to write helper functions:

def union_of_iterables(*data):
    return set().union(*data)

def intersection_of_iterables(first, /, *rest):
    return set(first).intersection(*rest)

Some questions come to mind:

Would people who subclass set() and frozenset() need to alter their code? Is this even possible if the subclass constructor takes an extra argument such as a type converter?

Would collections.abc.Set and collections.abc.MutableSet change as well?

Why not set.difference() as well? That would save:
set(data[0]).difference(*data[1:])

How about set.isdisjoint()? Should it also accept an iterable of iterables?

Should dict.get() accept a list of key/value pairs for the first argument?
That way users can skip the step of creating a dictionary:
dict(iterable).get(key, default)

Should tuple.count() accept a generic iterable so that users won’t need to write:
tuple(iterable).count(obj)?

Personally, I don’t think there is enough of a value add to warrant having an unexpected and incongruous API that isn’t in harmony with its surroundings. Also, I don’t really like the idea of churning an API for something that comes up so rarely — it’s likely that most users wouldn’t use this even once in their careers.

That said, there are implicit type conversions I do like. I still support the suggestion to have str.join() automatically convert its inputs to strings. That would address a common task, making it both cleaner and faster than existing solutions. That matches what print() already does, and it would be a harmonious change.
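To make the str.join() comparison concrete, here is the explicit conversion that the suggestion would remove, next to print(), which already applies str() to each argument. The sample values are illustrative only.

```python
items = [1, 2.5, None, "x"]

# Today: str.join() rejects non-str items with TypeError...
try:
    ", ".join(items)
except TypeError as exc:
    print("join fails:", exc)

# ...so callers convert explicitly
joined = ", ".join(map(str, items))
print(joined)

# print() already converts each argument with str()
print(*items, sep=", ")
```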

MegaIng · December 20, 2023, 9:02pm

At least I am happy with the union oneliner, not so much with the intersection oneliner. IMO it would be nice to have symmetry here.

I would say subclasses can already implement this if they want (mostly: getting access to the cls to support further subclassing would require a custom descriptor). Therefore this should ideally only be supported directly on the set class, not on any subclasses. Not sure if that is feasible.
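A sketch of the custom-descriptor route, assuming a hypothetical `hybridmethod` descriptor and a hypothetical `UnionSet` subclass. The descriptor binds the function to the class when it is looked up on the class, and to the instance otherwise. `UnionSet.union(a, b)` then treats every argument as an input, and `s.union(b)` keeps today's meaning. The sketch also assumes that the subclass constructor accepts a single iterable, which is exactly the assumption questioned above for subclasses with extra constructor arguments.

```python
import types


class hybridmethod:
    """Hypothetical descriptor: binds to the class when looked up on the class,
    and to the instance when looked up on an instance."""

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner=None):
        target = owner if instance is None else instance
        return types.MethodType(self.func, target)


class UnionSet(set):
    @hybridmethod
    def union(this, *iterables):
        if isinstance(this, type):
            # UnionSet.union(a, b, ...): every argument is an input iterable
            result = this()
        else:
            # s.union(b, ...): start from a copy of s, keeping the subclass type
            result = type(this)(this)
        result.update(*iterables)
        return result
```

Because the class is passed through, `Sub.union(...)` on a further subclass `Sub` returns a `Sub` instance. The built-in `set.union` returns a plain `set` even for subclasses.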

No. This is a convenience function for the builtin set, not a property of set-like classes.

difference isn’t symmetric in all arguments, union and intersection are.
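A small check of the symmetry point, using arbitrary sample sets. Reordering the arguments does not change union or intersection, but it does change difference, because difference subtracts every later argument from the first one.

```python
a, b, c = {1, 2, 3}, {2}, {2, 3, 4}

# union and intersection give the same result for any argument order
print(set.union(a, b, c) == set.union(c, a, b))
print(set.intersection(a, b, c) == set.intersection(c, b, a))

# difference depends on which set comes first: a - b - c versus c - a - b
print(set.difference(a, b, c))
print(set.difference(c, a, b))
```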

Hm, interesting idea. I don’t think this is as common as the other functions, but sure, would be an idea.

set.union and set.intersection can be seen as extended constructors. Not so much for dict.get (or set.isdisjoint).

A function that does this could be added to itertools. But it doesn’t construct a tuple, so why would it be bound to the tuple class?

This isn’t an implicit type conversion, as you can see from the one-liners: one of them does no conversion, and the other only converts the first argument.

In contrast to that suggestion, here there is no ambiguity as to what should happen.

This is probably true for quite a few edge case features in CPython.

The syntax set.union(*args) and set.intersection(*args) both already work as long as the first argument is of type set. I wouldn’t exactly call slightly extending the power of this syntax “unexpected”. It being unique among builtins (and probably the entire stdlib, although I haven’t checked) is a bit of a drawback, I agree.

Not sure what you mean by “harmony with its surroundings”? Do you just mean that no other set methods behave like this?

rhettinger · December 20, 2023, 9:48pm

If parallel construction is something you care about, it is not hard to write set.union(*map(set, data)) and set.intersection(*map(set, data)). Those both work today and are easy to understand.
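A quick check of the parallel spelling with illustrative data. It works today for any non-empty `data`. One caveat: with an empty `data`, the unbound call receives no first argument and raises `TypeError`, whereas `set().union(*data)` returns an empty set.

```python
data = [[1, 2, 3], (2, 3), range(2, 6)]

# Convert every input to a set so the unbound methods always get a set first
big_cup = set.union(*map(set, data))
big_cap = set.intersection(*map(set, data))
print(big_cup, big_cap)

# Caveat: with no inputs there is no first argument at all
try:
    set.union(*map(set, []))
except TypeError as exc:
    print("empty input:", exc)
```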

Also, these are unimportant cases. They don’t come up much, and they aren’t hard to handle with existing tooling. If this were something people actually needed and cared about, we would long ago have seen the two helper functions appearing in production code. But we haven’t.

Not sure what you mean by “harmony with its surroundings”? Do you just mean that no other set methods behave like this?

If you look at the API for lists, sets, dicts, tuples, and other core containers, nothing else works this way. It is weird and unexpected, just another special case to learn and remember for something that you would likely never use. Tim said “special cases aren’t special enough …” but it would have been more direct to say that for API design, “resist that urge to do anything weird”. IMO converting some set instance methods to classmethods falls in that category. No one expects that and no one would be able to predict which methods on which classes auto-coerce their first input to an instance of that class. That is well outside the norm for Python.

MegaIng · December 20, 2023, 10:19pm

In math they definitely are symmetric. If we had a universal set, this would even be obvious for python sets.

I don’t understand the point here? sum isn’t related to min and max. The correct counterpart would be product, which we don’t have builtin in python. But if we did, I am sure you would agree that a default of 1 would make sense, just like we have a default of 0 for sum.
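The identity-element argument can be checked directly. There is no product builtin, but math.prod() has been in the standard library since Python 3.8 and starts from 1 by default, just as sum() starts from 0. The empty set plays the same role for union. Intersection has no identity that a Python set can represent.

```python
import math

empty_sum = sum([])            # identity for +
empty_product = math.prod([])  # identity for * (start=1 by default)
empty_union = set().union()    # identity for union
print(empty_sum, empty_product, empty_union)
```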

Yes, I would say primarily because for none of those there is a similar constructor operation that would make sense. But yeah, introducing a special case for this is probably not necessary.

gcewing · December 20, 2023, 10:42pm

If we had the concept of a “negative set” that specifies all the items that are not in the set, then an empty negative set would serve as a “set of everything” for this purpose.

Rosuav · December 20, 2023, 10:46pm

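class NegativeSet(set):
    """Stores the items that are NOT in the set."""
    def add(self, obj):
        super().discard(obj)  # adding an item means no longer excluding it
    def discard(self, obj):
        super().add(obj)  # discarding an item means excluding it
    def __contains__(self, obj):
        return not super().__contains__(obj)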

Doesn’t seem too hard, if you want to play around with it.
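NegativeSet above does not override the set operators, so `&` still acts on the stored items. The following sketch is a separate, hypothetical `CoSet` class (not part of the thread or the stdlib). It only implements intersection, and it shows how an empty complement set could serve as the identity for intersection, so that zero inputs yield "everything".

```python
import functools
import operator


class CoSet:
    """Hypothetical complement set: everything except the items in `excluded`."""

    def __init__(self, excluded=()):
        self.excluded = frozenset(excluded)

    def __contains__(self, obj):
        return obj not in self.excluded

    def __and__(self, other):
        if isinstance(other, CoSet):
            # complement(A) & complement(B) == complement(A | B)
            return CoSet(self.excluded | other.excluded)
        # "everything except X" & S == S - X, an ordinary set
        return set(other) - self.excluded

    # set.__and__ returns NotImplemented for a CoSet, so Python falls back to this
    __rand__ = __and__


def intersect_all(iterables):
    # CoSet() is the identity for intersection, so zero inputs are fine
    return functools.reduce(operator.and_, map(set, iterables), CoSet())
```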

Stefan2 · December 24, 2023, 11:08am

Not actually suggesting this, but the verb forms intersect and unite aren’t taken yet (could be extended like unite_iters or unite_iterables).

Sounds like operator.countOf …
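The operator module does already cover the tuple.count() case. operator.countOf(a, b) returns the number of occurrences of b in a and accepts any iterable, including a one-shot generator, so there is no need to build a tuple first. The sample data below is arbitrary.

```python
import operator

# Count occurrences in a generator without materializing it as a tuple
residues = (n * n % 5 for n in range(10))
ones = operator.countOf(residues, 1)
print(ones)
```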