Add proper/strict kwarg to set.issubset/issuperset

Numerlor · July 12, 2022, 11:36pm

It’s currently impossible to check for a proper subset or superset without using the < and > operators respectively, which have arguable readability. Adding the ability to do it through the existing methods with a kwarg could make some code clearer, while making it consistent with the symmetry that the other methods and operators on the builtin sets have.

steven.daprano · July 13, 2022, 1:04am

It’s not impossible to check for a proper subset or superset.

# Existing spellings:
a < b
a.issubset(b) and a != b

# Proposal:
a.issubset(b, proper=True)
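For comparison, here is a minimal sketch of the two existing spellings wrapped as plain functions. The names is_proper_subset and is_proper_superset are hypothetical; nothing like them exists in the standard library.

```python
def is_proper_subset(a: set, b: set) -> bool:
    """Hypothetical helper: every element of a is in b, and b has something extra."""
    # Equivalent to the operator form `a < b` when both arguments are sets.
    return a.issubset(b) and a != b


def is_proper_superset(a: set, b: set) -> bool:
    """Hypothetical helper: the mirror image, equivalent to `a > b` for sets."""
    return a.issuperset(b) and a != b


print(is_proper_subset({1, 2}, {1, 2, 3}))  # True
print(is_proper_subset({1, 2}, {1, 2}))     # False: equal sets are not proper subsets
print(is_proper_superset({1, 2, 3}, {1}))   # True
```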

Gouvernathor · July 13, 2022, 12:35pm

That’s true, but I would agree that providing a single-method-call equivalent of the __lt__ operator would be good and would make things more consistent. It could also save computation time, possibly significantly, compared to the given example, which calls both issubset and __ne__.

As for the kwarg, I’d call it strict, as in zip.

storchaka · July 13, 2022, 1:20pm

If you need a single function call, use operator.lt() or just the __lt__() method.

No, it will not save computation time, because the only way to implement a strict subset check is to check for subset and for not equality.
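A short sketch of the operator-module spelling: operator.lt(a, b) is the function form of a < b (a proper subset test for sets) and operator.gt(a, b) the function form of a > b (a proper superset test). The functools.partial filter is only an illustration of passing it where a callable is expected.

```python
import operator
from functools import partial

a = {1, 2}
b = {1, 2, 3}

print(operator.lt(a, b))       # True: a is a proper subset of b
print(operator.gt(b, a))       # True: b is a proper superset of a
print(operator.lt(a, set(a)))  # False: equal sets are not proper subsets

# The function form can be passed wherever a callable is expected.
# partial(operator.lt, a)(c) computes a < c, i.e. "c is a proper superset of a".
candidates = [{1}, {1, 2}, {1, 2, 3}, {1, 2, 4, 5}]
print(list(filter(partial(operator.lt, a), candidates)))  # [{1, 2, 3}, {1, 2, 4, 5}]
```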

Gouvernathor · July 13, 2022, 2:26pm

If dunder access were meant to be used this liberally, there wouldn’t be a vars builtin (or it wouldn’t take an optional argument), nor the one-argument form of type.
And importing a module is hardly a simple solution, much less one yielding readability to the operation, as the OP was asking for.

brettcannon · July 13, 2022, 6:05pm

While we can argue over the perceived difficulty of using an import statement, I personally don’t think checking for a proper superset or subset of a set occurs often enough across the Python ecosystem to warrant adding dedicated methods for it when there’s already an operator that works appropriately. The “arguable readability” view of the OP is a subjective opinion. And because it’s a one-liner, as Steven showed, and not a complex one, it doesn’t need a method to avoid some pitfall in writing it out as needed or in creating your own function to do the same thing.

Numerlor · July 13, 2022, 6:29pm

I think it’s inarguable that set_obj.issubset(other, proper=True) is clearer than the operators to readers who don’t use them regularly, and the methods are already there; they would just be extended.

a.issubset(b) and a != b works but can lead to unnecessary comparisons, so the alternative would be len(a) < len(b) and a.issubset(b), which, while relatively clear about what it does, also introduces two more calls and needs more parsing by the reader to be understood.
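The length-first variant can be sketched as follows (the function name is hypothetical). It is correct because a proper subset must be strictly smaller than its superset, and a subset that is strictly smaller cannot be equal, so no separate a != b test is needed.

```python
def is_proper_subset_len_first(a: set, b: set) -> bool:
    """Hypothetical helper: proper-subset test that checks sizes before contents."""
    # O(1) size test first: if a is not strictly smaller than b, it cannot be a proper subset.
    # Only when that passes do we pay for the element-by-element subset check.
    return len(a) < len(b) and a.issubset(b)


print(is_proper_subset_len_first({1, 2}, {1, 2, 3}))  # True
print(is_proper_subset_len_first({1, 2}, {1, 2}))     # False, decided by the size test alone
print(is_proper_subset_len_first({1, 4}, {1, 2, 3}))  # False, decided by issubset
```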

ajoino · July 13, 2022, 6:45pm

I think the < operator is very clear if you’re used to working with sets in math (though I will concede that the mathematical operator is less pointy).

steven.daprano · July 13, 2022, 10:13pm

I’m still just neutral on the idea, but as a data point, sets in Swift have four distinct methods:

isSubset
isSuperset
isStrictSubset
isStrictSuperset

I’m not a big fan of boolean flag arguments that select between two different behaviours, but neither do I think this functionality deserves four methods rather than two. So maybe the boolean flag is the lesser evil? I’m genuinely on the fence on that, and reserve the right to come down on one side or the other.

If I were to pick the colour of the bikeshed, I’d prefer “Proper” over “Strict” – the term “proper …” is far more common in mathematics, and “strict” sounds like it will raise an exception if some condition is not met.
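To make the flag-versus-separate-methods trade-off concrete, here is a sketch of what a keyword-only proper flag could look like, written as a hypothetical set subclass. FlagSet and its proper parameter only illustrate the proposal; the built-in set.issubset and set.issuperset take no such argument.

```python
class FlagSet(set):
    """Hypothetical set subclass sketching a keyword-only `proper` flag."""

    @staticmethod
    def _as_set(other):
        # Like the built-in methods, accept any iterable by converting it to a set.
        return other if isinstance(other, (set, frozenset)) else set(other)

    def issubset(self, other, /, *, proper=False):
        other = self._as_set(other)
        return self < other if proper else self <= other

    def issuperset(self, other, /, *, proper=False):
        other = self._as_set(other)
        return self > other if proper else self >= other


print(FlagSet("ab").issubset("abc", proper=True))    # True
print(FlagSet("abc").issubset("abc", proper=True))   # False
print(FlagSet("abc").issubset("abc"))                # True
print(FlagSet("abc").issuperset("ab", proper=True))  # True
```

Making the flag keyword-only (after the `*`) forces callers to write proper=True explicitly, so the call site stays self-describing.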

Melendowski · July 14, 2022, 12:23am

Looking at the “⊂ and ⊃ symbols” section of Subset - Wikipedia, I can partially agree with you, but I don’t think the operator overloading is explicit enough in this case. Depending on what book you read or how your professor teaches you, you could learn one pairing rather than the other. By contrast, using the bitwise operators for set operations is straightforward and intuitive: those set operations are defined in set-builder notation using logical operators, so there’s no ambiguity.

petersuter · July 14, 2022, 5:29am

I don’t think using operators like < is that confusing or unreadable or unclear. The documentation seems to explain the meaning acceptably.

It also already uses the terms “proper subset” and “proper superset”. So to me ispropersubset and ispropersuperset seem like the obvious methods to add, if anything were to be added.

The documentation does kind of raise the question: Why do all other operators offer an equivalent method but not this one?

(Except the in operator, which in math would be ∈ and maybe called “is element of” I guess, and in other programming languages is often set.contains(element). Python has operator.contains. Python can’t and shouldn’t try to exactly mirror math notation or other programming languages in all aspects of course.)

Dutcho · July 14, 2022, 12:59pm

If (and I’m pretty neutral) this goes the method-with-proper/strict-flag way, I’d suggest adding the same option to issubclass() as well, if only for orthogonality reasons.
Just as the following would be True (and yes, issubclass() doesn’t need a set argument):

set('ab').issubset('abc', proper=True) and not set('abc').issubset('abc', proper=True) and set('abc').issubset('abc')

so would this:

issubclass(bool, int, proper=True) and not issubclass(int, int, proper=True) and issubclass(int, int)

After all, types are also a kind of set (although not Python sets).

PS
If this results in multiple methods like ispropersubset(), I do not propose to introduce an extra built-in ispropersubclass()
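Today the issubclass half of that example can be approximated with a small helper. This is a hypothetical sketch, not a proposed built-in; it only handles a single class as the second argument, whereas issubclass() also accepts a tuple of classes.

```python
def is_proper_subclass(cls: type, base: type) -> bool:
    """Hypothetical helper: cls is a subclass of base but not base itself."""
    # issubclass(C, C) is True for any class C, so the identical class is excluded explicitly.
    return cls is not base and issubclass(cls, base)


print(is_proper_subclass(bool, int))  # True
print(is_proper_subclass(int, int))   # False
print(issubclass(int, int))           # True: the non-proper check includes int itself
```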

EpicWink · July 14, 2022, 10:58pm

I personally don’t like any binary operation being an instance method (corollary: I hate the ThreeJS API). I will always use the operators (e.g. <) when available, otherwise simple functions (e.g. a hypothetical setslib.is_proper_subset(subset, superset), or even the existing set.issubset(subset, superset)).

I think instance methods should relate directly to the instance itself, not to the domain the instance is part of.

hauntsaninja · July 18, 2022, 12:43am

If you don’t want to use < because you don’t like those operators, and don’t want to use operator.lt because you don’t like functions… a != b and a.issubset(b) is the way to go. It’s explicit, doesn’t add any terminology, and is fewer characters than the methods proposed.

What unnecessary comparisons? Note that a != b does a length check: cpython/Objects/setobject.c at 07aeb7405ea42729b95ecae225f1d96a4aea5121 · python/cpython · GitHub
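A simplified Python model of the idea behind the linked C code: set equality compares sizes before contents, so a != b is settled in O(1) whenever the sizes differ. The function names are hypothetical, and the model leaves out any other shortcuts the C implementation may take.

```python
def sets_equal(a: set, b: set) -> bool:
    """Simplified model of equality between two sets."""
    # Sets of different sizes can never be equal: decided without looking at elements.
    if len(a) != len(b):
        return False
    # Same size: a equals b exactly when every element of a is also in b.
    return a.issubset(b)


def sets_not_equal(a: set, b: set) -> bool:
    return not sets_equal(a, b)


print(sets_not_equal({1, 2}, {1, 2, 3}))  # True, from the size check alone
print(sets_not_equal({1, 2}, {2, 1}))     # False, after the subset check
```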

Gouvernathor · August 20, 2022, 3:23am

These operations being available as instance methods have an advantage that’s been overlooked here, I believe: they accept any iterable and apply set semantics to it for the comparison.
{3, 4}.issubset([3, 4]) works, when {3, 4} <= [3, 4] doesn’t.

I’m not sure that’s a very common use case, but the overhead of having to manually convert the lists (in this example) to sets would be annoying. And the conversion would still be necessary with the earlier recommendation of a != b, since that relies on operators.
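A quick check of this difference, using only built-in behaviour: the method converts its argument with set semantics, while the comparison operators only work between sets. set.__le__ returns NotImplemented for a list, and since list has no comparison with set either, Python raises TypeError.

```python
a = {3, 4}

# The method accepts any iterable; order and duplicates in it do not matter.
print(a.issubset([3, 4]))     # True
print(a.issubset((4, 3, 3)))  # True

# The operator's underlying method declines non-set operands...
print(a.__le__([3, 4]) is NotImplemented)  # True

# ...so the operator form raises TypeError.
try:
    result = a <= [3, 4]
except TypeError as exc:
    print("TypeError:", exc)
```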

ajoino · August 20, 2022, 6:54am

Remember that the < operator calls the set.__lt__ method, which could also accept any iterable if so desired. Also, I’m not convinced by the argument that writing a < set(b) is more annoying than having some method magically transform iterables into sets (it would probably involve a C equivalent of calling set()). I think the explicit conversion makes the code nicer.
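The explicit-conversion style can be sketched like this (the helper name is hypothetical). Set semantics apply after the conversion, so duplicates in the iterable do not make it "larger" for the proper-subset test.

```python
def is_proper_subset_of_iterable(a: set, iterable) -> bool:
    """Hypothetical helper: proper-subset test against any iterable."""
    # Materialise the iterable once; after that, the ordinary set operator applies.
    return a < set(iterable)


print(is_proper_subset_of_iterable({1, 2}, [1, 2, 3]))          # True
print(is_proper_subset_of_iterable({1, 2}, [1, 2, 2, 2]))       # False: set([1, 2, 2, 2]) == {1, 2}
print(is_proper_subset_of_iterable({"a"}, (c for c in "abc")))  # True: generators work too
```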

storchaka · August 20, 2022, 7:22am

True, the methods accept arbitrary iterables, and in some cases they can avoid converting an argument to an intermediate set. For example:

>>> {0}.issuperset(iter(range(10**12)))
False
>>> {0}.issubset(iter(range(10**12)))
True

But {-1}.issubset(iter(range(10**12))) will take a long time to complete.
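The early exits can be modelled in pure Python. This is a sketch of the idea only, not CPython's C implementation, and the helper names are hypothetical. It also shows why the {-1} case is slow: -1 never appears, so the whole iterable must be consumed before False is known.

```python
from collections.abc import Iterable
from itertools import count


def issuperset_lazy(s: set, iterable: Iterable) -> bool:
    """Sketch: True if s contains every item of iterable, stopping at the first miss."""
    for item in iterable:
        if item not in s:
            return False  # one missing item settles it; the rest is never read
    return True


def issubset_lazy(s: set, iterable: Iterable) -> bool:
    """Sketch: True if every element of s occurs in iterable, stopping once all are seen."""
    remaining = set(s)  # copy, so the caller's set is not modified
    if not remaining:
        return True  # the empty set is a subset of anything
    for item in iterable:
        remaining.discard(item)
        if not remaining:
            return True  # all elements of s found; the rest is never read
    return False  # iterable exhausted while some elements of s were never seen


# itertools.count() is infinite, yet the first two calls return promptly.
print(issuperset_lazy({0}, count()))   # False: stops at 1
print(issubset_lazy({0}, count()))     # True: stops at 0
print(issubset_lazy({-1}, range(10)))  # False, only after consuming all ten items
```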

Gouvernathor · August 20, 2022, 10:24am

There is no guarantee anywhere that set.__lt__ accepts non-sets as its second argument, and I think adding that guarantee would be a mistake. The < operator should keep its strict-type behavior, and I don’t see how making __lt__ behave your way wouldn’t break this.
Also, again, you’re not supposed to call dunders directly, ever. That can’t be the advised way. If we have the issubset method at all, it’s because we don’t want people to call the dunders directly.

ajoino · August 20, 2022, 10:29am

I probably wasn’t very clear, sorry.

I never argued for any changes to __lt__; I just meant that it could be changed to support arbitrary iterables. My main point is that I don’t find writing set(...) (to convert an arbitrary iterable to a set) annoying; I find it makes the intention of the code clear.