It’s currently impossible to check for a proper subset or proper superset without using the < and > operators respectively, which have arguable readability. Adding the ability to do it through the existing methods with a kwarg could make some code clearer while keeping it consistent with the symmetry that other methods and operators on the builtin sets have.
It’s not impossible to check for a proper subset or superset.
a < b
a.issubset(b) and a != b
# Proposal:
a.issubset(b, proper=True)
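For illustration, the behaviour of the proposed keyword can be approximated today with a small helper. This is only a sketch: the free function issubset and its proper parameter are hypothetical names, not an existing API.

```python
def issubset(a, b, *, proper=False):
    """Hypothetical free-function version of the proposed a.issubset(b, proper=True)."""
    # Convert both arguments so that any iterable is accepted,
    # mirroring how set.issubset treats its argument.
    a, b = set(a), set(b)
    if proper:
        return a < b   # proper subset: a <= b and a != b
    return a <= b


print(issubset({1, 2}, {1, 2, 3}, proper=True))  # True
print(issubset({1, 2}, {1, 2}, proper=True))     # False
print(issubset({1, 2}, {1, 2}))                  # True
```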
That’s true, but I would agree that providing a single function call equivalent for the __lt__ operator would be good and make it more consistent. It could also save computation time, possibly significantly, compared to the given example which calls issubset and __ne__.
As for the kwarg, I’d call it strict, as for zip.
If you need a single function call, use operator.lt() or just the method __lt__().
No, it will not save computation time, because the only way to implement a strict subset check is to check for subset and for inequality.
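As a quick check of the spellings mentioned so far, the following sketch compares the operator, operator.lt() and the two-step method form on example sets (the variable names are only illustrative):

```python
import operator

a, b = {1, 2}, {1, 2, 3}

# All three spellings perform the same proper-subset test.
print(a < b)                      # True
print(operator.lt(a, b))          # True
print(a.issubset(b) and a != b)   # True

# A set is never a proper subset of itself.
print(operator.lt(b, b))          # False
```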
If dunder access was supposed to be done liberally like this, there wouldn’t be the vars builtin (or it wouldn’t take an optional argument), or the type one-argument function.
And importing a module is hardly a simple solution, much less one yielding readability to the operation, as the OP was asking for.
While we can argue over the perceived difficulty of using an import statement, I personally don’t think checking for a proper superset or subset of a set occurs often enough across the Python ecosystem to warrant adding dedicated methods for it when there’s already an operator that works appropriately. The “arguable readability” view of the OP is a subjective opinion. And because it’s a simple one-liner, as Steven showed, it also doesn’t necessitate a method to avoid some pitfall in writing this out as needed or in creating your own function to do the same thing.
I think it’s inarguable that set_obj.issubset(other, proper=True) is clearer than the operators to readers who don’t use them regularly, and the methods are already there; they would just be extended.
a.issubset(b) and a != b works but can lead to unnecessary comparisons being done, so the alternative would be len(a) < len(b) and a.issubset(b), which, while relatively clear about what it does, also introduces two more calls and needs more parsing by the reader to be understood.
I think the < operator is very clear if you’re used to working with sets in math (though I will concede that the mathematical operator is less pointy).
I’m still just neutral on the idea, but as a data point, sets in Swift have four distinct methods:
- isSubset
- isSuperset
- isStrictSubset
- isStrictSuperset
I’m not a big fan of boolean flag arguments that select between two different behaviours, but neither do I think this functionality deserves four methods rather than two. So maybe the boolean flag is the lesser evil? I’m genuinely on the fence on that, and reserve the right to come down on one side or the other.
If I were to pick the colour of the bikeshed, I’d prefer “Proper” over “Strict” – the term “proper …” is far more common in mathematics, and “strict” sounds like it will raise an exception if some condition is not met.
Looking at the section "⊂ and ⊃ symbols" of Subset - Wikipedia, I can agree with you partially, but I don’t think the operator overloading is explicit enough in this case. Depending on what book you read or how your professor teaches you, you could learn one pairing over the other. By contrast, using the bitwise operators to mean set operations is straightforward and intuitive: the set operations are defined in set-builder notation using logical operators, so there’s no ambiguity.
I don’t think using operators like < is that confusing or unreadable or unclear. The documentation seems to explain the meaning acceptably.
It also already uses the terms “proper subset” and “proper superset”. So to me ispropersubset and ispropersuperset seem like the obvious methods to add, if anything were to be added.
The documentation does kind of raise the question: Why do all other operators offer an equivalent method but not this one?
(Except the in operator, which in math would be ∈ and maybe called “is element of” I guess, and in other programming languages is often set.contains(element). Python has operator.contains. Python can’t and shouldn’t try to exactly mirror math notation or other programming languages in all aspects of course.)
If (and I’m pretty neutral) this goes the method-with-proper/strict-flag way, I’d suggest adding the same option to issubclass() as well, if only for orthogonality reasons.
The following (and yes, issubclass() doesn’t need a set argument) would be True:
set('ab').issubset('abc', proper=True) and not set('abc').issubset('abc', proper=True) and set('abc').issubset('abc')
and so would this:
issubclass(bool, int, proper=True) and not issubclass(int, int, proper=True) and issubclass(int, int)
After all, types are also a kind of set (although not Python sets).
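The suggested issubclass() flag does not exist today, but its intended meaning can be sketched with a helper. The name is_proper_subclass is hypothetical, and the sketch assumes the second argument is a single class rather than a tuple of classes.

```python
def is_proper_subclass(cls, classinfo):
    """Hypothetical stand-in for issubclass(cls, classinfo, proper=True)."""
    # Assumes classinfo is a single class, not a tuple of classes.
    return issubclass(cls, classinfo) and cls is not classinfo


print(is_proper_subclass(bool, int))  # True
print(is_proper_subclass(int, int))   # False
print(issubclass(int, int))           # True
```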
PS
If this results in multiple methods like ispropersubset(), I do not propose to introduce an extra built-in ispropersubclass().
I personally don’t like any binary operation being an instance method (corollary: I hate the ThreeJS API). I will always use the operators (e.g. <), when available, otherwise simple functions (e.g. a hypothetical setslib.is_proper_subset(subset, superset), or even the existing set.issubset(subset, superset)).
I think instance methods should be related directly to the instance itself, not to the domain that the instance is part of.
If you don’t want to use < because you don’t like those operators, and don’t want to use operator.lt because you don’t like functions… a != b and a.issubset(b) is the way to go. It’s explicit, doesn’t add any terminology, and is fewer characters than the methods proposed.
What unnecessary comparisons? Note that a != b does a length check: cpython/Objects/setobject.c at 07aeb7405ea42729b95ecae225f1d96a4aea5121 · python/cpython · GitHub
These operations being available as instance methods have an advantage that’s been overlooked here, I believe: they accept any iterable, and apply set semantics to it for comparisons.
{3, 4}.issubset([3, 4]) works, when {3, 4} <= [3, 4] doesn’t.
I’m not sure that’s a very common use case, but the overhead of having to manually convert the lists (in this example) to sets would be annoying. And necessary, given the previous recommendation to do a != b, which means still relying on operators.
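The difference between the method and the operator can be seen in a short sketch (the variable name s is only illustrative):

```python
s = {3, 4}

# The method accepts any iterable and compares with set semantics.
print(s.issubset([3, 4]))   # True

# The operator only works between sets, so a list raises TypeError.
try:
    s <= [3, 4]
except TypeError as exc:
    print("TypeError:", exc)

# Converting explicitly makes the operator usable again.
print(s <= set([3, 4]))     # True
```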
Remember that the < operator calls the set.__lt__ method, which could also accept any iterable if so desired. Also, I’m not convinced by the argument that writing a < set(b) is more annoying than some method magically (it would probably involve a C equivalent of calling set()) transforming iterables to sets. I think it makes the code nicer.
True, the methods accept arbitrary iterables, and in some cases they can avoid converting an argument to an intermediate set. For example:
>>> {0}.issuperset(iter(range(10**12)))
False
>>> {0}.issubset(iter(range(10**12)))
True
But {-1}.issubset(iter(range(10**12))) will take a long time to complete.
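To see why one call can finish early and the other cannot, here is a pure-Python sketch of a subset test that stops consuming the iterable once every element has been found. The function issubset_lazy and the counting generator are hypothetical, and this is not necessarily how CPython implements set.issubset.

```python
def issubset_lazy(s, iterable):
    """Return True if every element of set s appears in iterable.

    Hypothetical sketch: stops consuming the iterable as soon as
    all elements of s have been seen.
    """
    missing = set(s)
    if not missing:
        return True
    for item in iterable:
        missing.discard(item)
        if not missing:
            return True   # everything found, no need to read further
    return False          # iterable exhausted with elements still missing


consumed = []

def counting(n):
    """Yield 0..n-1 and record each value that is actually consumed."""
    for i in range(n):
        consumed.append(i)
        yield i


print(issubset_lazy({0}, counting(10**6)))   # True
print(len(consumed))                         # 1
consumed.clear()
print(issubset_lazy({-1}, counting(1000)))   # False
print(len(consumed))                         # 1000
```

When the element is missing, the whole iterable has to be read before the answer is known, which is the slow case described above.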
There is no guarantee anywhere that set.__lt__ accepts non-sets as a second argument, and I think adding that guarantee would be a mistake. The < operator should keep its strict-type behavior, and I don’t see how making __lt__ behave your way wouldn’t break this.
Also, again, you’re not supposed to call dunders directly, ever. That can’t be the advised way. If we have the issubset method at all, it’s because we don’t want people to call the dunders directly.
I probably wasn’t very clear, sorry.
I never argued for any changes to __lt__; I just meant that it could be changed to support arbitrary iterables. My main point is that I don’t think writing set(...) (to convert an arbitrary iterable to a set) is annoying; I find it makes the intention of the code clear.