The current itertools documentation includes an all_equal recipe that returns False only if at least one item in the given iterable differs from the others:
from itertools import groupby

def all_equal(iterable):
    "Returns True if all the elements are equal to each other."
    g = groupby(iterable)
    return next(g, True) and not next(g, False)
The recipe, although concise, is slightly inefficient: next(g, True) always returns a truthy value (either the default True or a (key, group) tuple), so the expression can't short-circuit when the given iterable is empty, which leads to an unnecessary second call to next.
This slight shortcoming can be remedied with the following implementation instead:
from itertools import groupby

def all_equal(iterable):
    "Returns False only if any element differs from the others."
    g = groupby(iterable)
    return not (next(g, False) and next(g, False))
Note that the docstring above is arguably also clearer for an empty iterable.
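To make the extra call concrete, here is a small instrumented sketch. CountingIterator is a hypothetical helper (not part of itertools) that wraps the groupby object and counts how many times next() is called on it; current_recipe and proposed_recipe are illustrative names for the two expressions above.

```python
from itertools import groupby


class CountingIterator:
    """Hypothetical helper: wraps an iterator and counts calls to next()."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self.calls = 0

    def __iter__(self):
        return self

    def __next__(self):
        self.calls += 1
        # StopIteration propagates, so next(g, default) returns the default.
        return next(self._it)


def current_recipe(iterable):
    g = CountingIterator(groupby(iterable))
    result = next(g, True) and not next(g, False)
    return result, g.calls


def proposed_recipe(iterable):
    g = CountingIterator(groupby(iterable))
    result = not (next(g, False) and next(g, False))
    return result, g.calls


print(current_recipe([]))   # (True, 2): both next() calls run
print(proposed_recipe([]))  # (True, 1): short-circuits after the first call
```

For non-empty inputs both versions make the same two calls; only the empty case differs.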
I suggested that too, but Raymond declined. For me it's mainly a clarity issue (I find return x and y misleading if x is always true); the slight, rare inefficiency is secondary.
Ah, cool. I didn't notice that you had already pointed it out in an SO answer (mostly because it ranked low with 0 votes) and also opened an issue for it. Love your use of any, too. I've added a comment on your open issue in support of your proposal4:
from itertools import groupby

def proposal4(iterable):
    g = groupby(iterable)
    return not (any(g) and any(g))
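Why proposal4 works may not be obvious at first: any() stops at the first truthy item, and every item groupby yields is a (key, group) tuple, which is always truthy, so each any(g) call consumes at most one group. A quick sketch to illustrate (the demo strings are arbitrary):

```python
from itertools import groupby


def proposal4(iterable):
    g = groupby(iterable)
    # Each item of g is a non-empty (key, group) tuple, hence truthy,
    # so each any(g) call consumes at most one group and then stops.
    return not (any(g) and any(g))


g = groupby("aabbb")
print(any(g))      # True, after consuming only the 'a' group
print(next(g)[0])  # b: the second group is still available

print(proposal4(""), proposal4("aaa"), proposal4("aab"))  # True True False
```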
I just added some more thoughts, solutions, and benchmarks to the GitHub issue. Among them is a new solution, my favorite for readability (and it was also faster):
from itertools import groupby

def all_equal(iterable):
    groups = groupby(iterable)
    for first in groups:
        for second in groups:
            return False
    return True
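Since the thread now has several equivalent variants, one quick sanity check is to compare them all against a brute-force reference on every short input. This is a sketch: the function names below are illustrative, and the check only covers sequences of length 0 to 4 over a two-letter alphabet.

```python
from itertools import groupby, product


def reference(iterable):
    # Brute force: materialize the input and compare every item to the first.
    items = list(iterable)
    return all(x == items[0] for x in items)


def current_recipe(iterable):
    g = groupby(iterable)
    return next(g, True) and not next(g, False)


def next_recipe(iterable):
    g = groupby(iterable)
    return not (next(g, False) and next(g, False))


def any_recipe(iterable):
    g = groupby(iterable)
    return not (any(g) and any(g))


def for_recipe(iterable):
    groups = groupby(iterable)
    for first in groups:
        for second in groups:
            return False
    return True


variants = [current_recipe, next_recipe, any_recipe, for_recipe]

# Exhaustively compare on all sequences of length 0..4 over a 2-letter alphabet.
for n in range(5):
    for seq in product("ab", repeat=n):
        expected = reference(seq)
        for f in variants:
            # iter(seq) checks that each variant also works on one-shot iterators.
            assert f(iter(seq)) == expected, (f.__name__, seq)

print("all variants agree")
```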
Yeah, like I said in the issue: “I already know this isn’t everyone’s cup of tea”. But I do suspect this is a familiarity issue. Way back when I first saw a list comprehension with two for clauses, I was puzzled. But it soon became trivial. I've now been using for statements like that for a while, and I find them perfectly fine. In contrast, as I just added in the issue: with next(g, True) and not next(g, False), I have to carefully decipher the expression step by step every time, despite being familiar with it. I can’t simply read it.
Another disadvantage of len(set(iterable)) == 1 is that it’s wrong: it returns False for an empty iterable, so it should use <= 1. It also always goes through the whole iterable (instead of stopping early) and builds a possibly large set. For an itertools recipe, I’d be offended.
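A short sketch of the points above, using illustrative function names: the == 1 form gives the wrong answer for an empty input, and any set-based version also needs hashable items, unlike the groupby-based recipes, which rely only on equality tests.

```python
def all_equal_set_eq(iterable):
    # The form criticized above: an empty input gives len(set()) == 0 -> False.
    return len(set(iterable)) == 1


def all_equal_set_le(iterable):
    # The corrected comparison. It still consumes the whole input, builds a
    # set, and requires every item to be hashable.
    return len(set(iterable)) <= 1


print(all_equal_set_eq([]), all_equal_set_le([]))  # False True

try:
    all_equal_set_le([[1], [1]])  # lists are unhashable
except TypeError as exc:
    print("TypeError:", exc)
```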
Moving this to the Help category, since it doesn’t seem to fit in the “Ideas” category and seems to be more about answering the question “What is the best way to check if all items are equal?” The linked issue indicates that the core dev is not interested in continuously tweaking the current recipe. If you want to teach others about this, getting it into the core docs isn’t a requirement; summarizing the different approaches would make a great blog post.
Looks like you are, both here and on Stack Overflow, at least in combination with groupby. I guess I rarely use EAFP, and I’ve pretty much replaced try-next-except with for statements in my code.
Thanks, it certainly didn’t belong in Ideas. As this thread centers on a proposed improvement to the clarity of Python’s documentation, I’ve moved it to the Documentation category, where we’ve been having such discussions. Given the significant renewed community interest, and given that the primary reason for closing at the time was the stated personal preference of one particular core dev, I’ve gone ahead and reopened the issue for further discussion.
I did too. For me, it’s the solution that requires the least “extra knowledge”:
- the side effect on the iterator is obvious because “next” is named for the side effect
- there's no need to think about truthiness, because we never look at the result of next
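The code being described isn't quoted in this part of the thread, so the following is only a guess at the shape of a try-next-except variant (the name all_equal_eafp is hypothetical). It calls next only for its side effect and never looks at the result.

```python
from itertools import groupby


def all_equal_eafp(iterable):
    # Hypothetical EAFP sketch: next() is used only to advance the iterator.
    g = groupby(iterable)
    try:
        next(g)  # first group (exists unless the input is empty)
        next(g)  # second group (exists only if some element differs)
    except StopIteration:
        return True  # fewer than two groups: all elements are equal
    return False
```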
But sadly it’s also by far the least efficient option. And if I were aiming for pure readability, I don’t think I’d reach for itertools in this case at all; I’d fetch the first element and use any to compare it with the other elements. So I didn’t end up proposing it.
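A minimal sketch of that itertools-free approach, assuming a module-level sentinel object to detect an empty input (all_equal_any and _MISSING are illustrative names):

```python
_MISSING = object()  # sentinel distinguishing "no first element" from any real value


def all_equal_any(iterable):
    it = iter(iterable)
    first = next(it, _MISSING)
    if first is _MISSING:
        return True  # an empty input counts as all equal
    # Compare the remaining elements with the first; any() stops at the first mismatch.
    return not any(x != first for x in it)
```

Like the groupby versions, this works on one-shot iterators and stops early, but it compares every element against the first rather than grouping adjacent runs.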
It may just be a personal preference, but I would like to avoid the double negative not False in favor of just True. If that costs a slight inefficiency in the single case of an empty input, I’m fine with that. Also, I prefer the positively worded docstring, which matches the style used in the built-in all() function.
If the slight inefficiency bugs you, consider submitting a PR to the more-itertools project. Those tools are more about being used (where speed matters) than being read (where topic focus matters). I’ve done this myself for convolve: the beautiful version in the docs isn’t as fast as what I submitted to more-itertools, with sliding_window inlined and the unneeded tuple conversion removed.
To the other respondents in the thread: yes, there are many ways to implement all_equal(). We had a nice Twitter thread competition on the subject, and I summarized some of the results in a Stack Overflow answer. For a list input, my favorite was t.count(t[0]) == len(t). For the purposes of the itertools recipes, though, the groupby() variant is preferred because 1) it teaches you something about groupby, which is the least obvious itertool, 2) it works with iterator inputs, 3) it is memory efficient, 4) it relies only on equality tests rather than hashing or sorting, 5) it runs at C speed, 6) it doesn’t use auxiliary memory or a counter, 7) it demonstrates a functional style characteristic of itertools, and 8) it has an early-out. Mostly, though, I like that it gets to the heart of what groupby is all about, which is lazily chunking groups of equal values (much like the Unix uniq command-line tool). That is actually the only reason the all_equal() recipe was included.
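Two small illustrations of points from this post. The first shows groupby() chunking consecutive runs of equal values, the uniq-like behavior the recipe relies on. The second is the list-only count variant; the bare t.count(t[0]) == len(t) raises IndexError for an empty list, so this sketch adds a guard (all_equal_list is an illustrative name).

```python
from itertools import groupby

# groupby() lazily chunks *consecutive* runs of equal values, like uniq.
runs = [(key, len(list(group))) for key, group in groupby("aaabccaa")]
print(runs)  # [('a', 3), ('b', 1), ('c', 2), ('a', 2)]


def all_equal_list(t):
    # List-only variant; `not t` guards against t[0] failing on an empty list.
    return not t or t.count(t[0]) == len(t)
```

Note that 'a' appears twice in runs because groupby only merges adjacent equal values; all_equal just asks whether there is at most one run.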
Thanks, Raymond. I agree that personal preference makes sense here, as this is in the Itertools Recipes section of the docs rather than in the documentation of the module itself.
@CAM-Gerlach Since all_equal is a recipe and not part of the module source code, we can leave this documentation as is.
I don’t quite see your point here, since the current recipe also relies on a double negative, not next(g, False). The proposed alternative simply moves where the double negative is evaluated while making the first call to next more meaningful.
In English, this says that if the number of equality groups is less than 2, then the inputs are all equal. That is clearer than the current conjunction of next() calls, but it still shows off a core capability of groupby(), which was specifically designed to find runs of equal values.
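The code being discussed isn't quoted in this excerpt. Judging from the description here and the mention of an islice call in the reply, it is presumably something along these lines (a sketch, not the exact proposal):

```python
from itertools import count, groupby, islice


def all_equal(iterable):
    # groupby() yields one (key, group) pair per run of equal values.
    # islice() takes at most two of them, so the scan stops at the first
    # element that differs instead of running to the end of the input.
    return len(list(islice(groupby(iterable), 2))) <= 1


print(all_equal("aaaa"), all_equal("aaab"))  # True False
print(all_equal(count()))  # False: terminates even on an infinite input
```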
Generally I like it, and I like the underlying reasoning. However, while the islice call makes sense for efficiency reasons, I wonder whether it's too distracting here for pedagogical purposes. (It would also be nice if itertools could count the elements in a lazy iterator without needing to create a temporary list first.)
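On that last point: itertools has no such function, but counting can be done lazily with sum() over a generator expression. Below, count_items is a hypothetical helper name; the third-party more-itertools package provides ilen() for the same purpose.

```python
from itertools import groupby, islice


def count_items(iterable):
    # Hypothetical helper: counts items one at a time without building a list.
    return sum(1 for _ in iterable)


def all_equal(iterable):
    # Count the (at most two) groups without materializing them in a list.
    return count_items(islice(groupby(iterable), 2)) <= 1
```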