Improving the all_equal recipe in the itertools docs

blhsing · February 20, 2024, 3:48am

The current documentation of itertools includes an all_equal recipe that would return False only if there is at least one item in the given iterable that is different from the other items:

from itertools import groupby

def all_equal(iterable):
    "Returns True if all the elements are equal to each other."
    g = groupby(iterable)
    return next(g, True) and not next(g, False)

The recipe, although concise, is slightly inefficient: next(g, True) always returns a truthy value (a (key, group) pair, or the default True when the iterable is empty), so the and expression can never short-circuit on its first operand. For an empty iterable, this leads to an unnecessary second call to next.
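A quick way to confirm that the left operand is always truthy is to look at what next(g, True) returns in both cases. This is a minimal illustrative sketch; the variable name first is arbitrary.

```python
from itertools import groupby

# For a non-empty iterable, next() yields a (key, group_iterator) pair.
first = next(groupby("aab"), True)
print(first[0])     # the key of the first run: 'a'
print(bool(first))  # a non-empty tuple is always truthy

# For an empty iterable, the default True is returned instead.
print(next(groupby([]), True))

# Either way the left operand of `and` is truthy, so the second
# next(g, False) call always runs, even for empty input.
```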

This slight shortcoming can be remedied with the following implementation instead:

from itertools import groupby

def all_equal(iterable):
    "Returns False only if any element differs from the others."
    g = groupby(iterable)
    return not (next(g, False) and next(g, False))

Note that the docstring above may also be considered clearer in the case of an empty iterable.
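To check that the rewrite changes only evaluation order and not results, the two versions can be compared on a handful of edge cases. This is a sketch; the names all_equal_current and all_equal_proposed are hypothetical labels for the two implementations above, and the case list is illustrative rather than exhaustive.

```python
from itertools import groupby

def all_equal_current(iterable):
    "Returns True if all the elements are equal to each other."
    g = groupby(iterable)
    return next(g, True) and not next(g, False)

def all_equal_proposed(iterable):
    "Returns False only if any element differs from the others."
    g = groupby(iterable)
    return not (next(g, False) and next(g, False))

# Empty input, a single run, runs that recur, values that are equal under ==
# but of different types, and unhashable elements (groupby only needs ==).
cases = ["", "a", "aaa", "aab", "aba", [1, 1.0, True], [[1], [1]], [[1], [2]]]
for case in cases:
    assert all_equal_current(case) == all_equal_proposed(case), case
```

Both versions return a plain bool in every case; they differ only in whether the second next call is made for empty input.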

Stefan2 · February 20, 2024, 7:47am

I suggested that too, but Raymond declined. For me it's mainly a clarity issue (I find return x and y misleading if x is always true); the slight, rare inefficiency is secondary.

GitHub Issue

Stack Overflow answer

blhsing · February 20, 2024, 8:03am

Ah, cool. I hadn't noticed that you had already pointed this out in an SO answer (mostly because it ranked low with 0 votes) and also opened an issue for it. I love your use of any, too. I've added a comment to your open issue in support of your proposal4:

from itertools import groupby

def proposal4(iterable):
    g = groupby(iterable)
    return not (any(g) and any(g))
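The any trick works because every item groupby yields is a non-empty (key, group) tuple and therefore truthy, so each any(g) call consumes at most one group. A small sketch showing that the iterator resumes at the next group afterwards:

```python
from itertools import groupby

def proposal4(iterable):
    g = groupby(iterable)
    # Each any(g) returns True after consuming exactly one group,
    # or False if g has no groups left.
    return not (any(g) and any(g))

g = groupby("aabbc")
print(any(g))      # True: consumes only the 'a' group
print(next(g)[0])  # 'b': the next remaining group's key
```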

Stefan2 · February 23, 2024, 6:03pm

I just added some more thoughts/solutions/benchmarks in the GitHub issue. Among all that is a new solution, my favorite for readability (and it was faster):

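from itertools import groupby

def all_equal(iterable):
    groups = groupby(iterable)
    for first in groups:       # entered only if there is a first group
        for second in groups:  # entered only if there is a second group
            return False
    return True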

Rosuav · February 23, 2024, 6:34pm

I just have to say, you have a peculiar definition of “readability”. This is a nested loop that does no looping whatsoever.

jamestwebber · February 23, 2024, 6:52pm

I think the most readable version of this is len(set(iterable)) == 1.

Granted that only works for hashable items, while the itertools recipe supports anything that works with ==.

The issue itself has been closed for almost 2 years, so I think this thread is more about code golf than anything else.

Stefan2 · February 23, 2024, 7:13pm

Yeah, like I said in the issue: “I already know this isn’t everyone’s cup of tea”. But I do suspect this is a familiarity issue. Way back when I first saw a list comprehension with two for clauses, I was puzzled, but it soon became trivial. I’ve now been using for statements like that for a while and find them perfectly fine. In contrast, as I just added in the issue: with next(g, True) and not next(g, False), I have to carefully decipher it step by step every time, despite being familiar with it. I can’t simply read it.

Stefan2 · February 23, 2024, 7:20pm

Another disadvantage of len(set(iterable)) == 1 is that it’s wrong for an empty iterable: it should use <= 1. It also always goes through the whole iterable (instead of stopping early) and builds a possibly large set. For an itertools recipe, I’d be offended.
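For comparison, here is a sketch of the set-based check with the <= 1 correction. The name all_equal_set is a hypothetical label; note that it still consumes the whole input and only works for hashable elements.

```python
def all_equal_set(iterable):
    # Hashable elements only. `<= 1` treats an empty iterable as all-equal.
    # No early exit: the whole iterable is consumed into a set first.
    return len(set(iterable)) <= 1

print(len(set([])) == 1)  # False: the `== 1` form rejects empty input
print(all_equal_set([]))  # True

try:
    all_equal_set([[1], [1]])  # lists are unhashable
except TypeError as exc:
    print("unhashable:", exc)
```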

I could not disagree more.

davidism · February 23, 2024, 7:44pm

Moving this to the Help category, since it doesn’t seem to fit in the “Ideas” category and seems to be more about answering the question “what is the best way to check if all items are equal?” The linked issue indicates that the core dev is not interested in continually tweaking the current recipe. If you want to teach others about this, getting it into the core docs isn’t a requirement; summarizing the different approaches would make a great blog post.

kknechtel · February 23, 2024, 10:54pm

Am I the only one reaching for EAFP here?

from itertools import groupby

def all_equal(iterable):
    g = groupby(iterable)
    try:
        next(g)
        next(g)
        return False
    except StopIteration: # at most one group
        return True

Stefan2 · February 24, 2024, 12:48am

Looks like you are, both here and at Stack Overflow, at least in combination with groupby. I guess I rarely use EAFP, and I’ve pretty much replaced try-next-except with for-statements in my coding.

CAM-Gerlach · February 24, 2024, 6:45pm

Thanks, it certainly didn’t belong in Ideas. As this thread centers on a proposed improvement to the clarity of Python’s documentation, I’ve moved it to the Documentation category where we’ve been having such discussions. Given the significant renewed community interest and the primary close reason at the time being the stated personal preference of one particular core dev, I’ve gone ahead and re-opened the issue for further discussion.

alicederyn · February 25, 2024, 1:15pm

I did too. For me, it’s the solution that requires the least “extra knowledge”:

the side effect on the iterator is obvious because “next” is named for the side effect
no need to think about truthiness because we never look at the result of next

But sadly it’s also by far the least efficient option. And if I were aiming for pure readability, I don’t think I’d reach for itertools in this case at all: I’d fetch the first element and use any to compare it with the other elements. So I didn’t end up proposing it.
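The “fetch the first element, then compare with the rest” idea might look like the following. This is a sketch, not code from the thread; all_equal_any is a hypothetical name, and it uses != for the comparison, whereas groupby tests keys with ==.

```python
from itertools import count

def all_equal_any(iterable):
    it = iter(iterable)
    sentinel = object()             # distinguishes "empty" from any real value
    first = next(it, sentinel)
    if first is sentinel:
        return True                 # empty iterable: vacuously all equal
    # any() stops at the first element that differs from `first`.
    return not any(x != first for x in it)

print(all_equal_any(count()))  # False: stops at the second element of an infinite iterator
```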

davidism · February 26, 2024, 4:20pm

29 posts were merged into an existing topic: A high-performance solution to the “are all elements of an iterable equal” problem

rhettinger · February 25, 2024, 2:53pm

It may just be a personal preference, but I would like to avoid the double negative not False in favor of just True. If that costs a slight inefficiency in the single case of an empty input, I’m fine with that. Also, I prefer the positively worded docstring which matches the style used in the builtin all() function.

If the slight inefficiency bugs you, consider submitting a PR to the more-itertools project. Those tools are more about being used (where speed matters) than being read (where topic focus matters). I’ve done this myself for convolve, where the beautiful version in the docs isn’t as fast as what I submitted to more-itertools, with sliding_window inlined and the unneeded tuple conversion removed.

To the other respondents in the thread: yes, there are many ways to implement all_equal(). We had a nice Twitter thread competition on the subject, and I summarized some of the results in a StackOverflow answer. For a list input, my favorite was t.count(t[0]) == len(t). For the purposes of the itertools recipes, though, the groupby() variant is preferred because it 1) teaches you something about groupby, which is the least obvious itertool, 2) works with iterator inputs, 3) is memory efficient, 4) relies only on equality tests rather than hashing or sorting, 5) runs at C speed, 6) doesn’t use auxiliary memory or a counter, 7) demonstrates a functional style characteristic of itertools, and 8) has an early-out. Mostly, though, I like that it gets to the heart of what groupby is all about, which is lazily chunking groups of equal values (much like the Unix uniq command-line tool). That is actually the only reason the all_equal() recipe was included.
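Two small illustrations of the points above, as a sketch: first, groupby chunking consecutive runs much like uniq (here uniq -c, with counts); second, the list-only count idiom. The wrapper name all_equal_list is hypothetical, and its empty-list guard is an addition, since t[0] raises IndexError on an empty list.

```python
from itertools import groupby

# groupby lazily chunks consecutive runs of equal values, like `uniq -c`.
runs = [(key, len(list(group))) for key, group in groupby("aaabccaa")]
print(runs)  # [('a', 3), ('b', 1), ('c', 2), ('a', 2)]

def all_equal_list(t):
    # List-only idiom t.count(t[0]) == len(t), guarded for the empty list.
    return not t or t.count(t[0]) == len(t)
```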

willingc · February 25, 2024, 7:58pm

Thanks Raymond. I agree that personal preference makes sense here as this is in the Itertools Recipes section of the docs instead of documentation of the module itself.

@CAM-Gerlach Since all_equal is a recipe and not part of the module source code, we can leave this documentation as is.

blhsing · February 26, 2024, 3:25am

I don’t quite see your point here since the current recipe also relies on a double negative, not next(g, False). The proposed alternative simply moves where the double negative is evaluated while making the first call to next more meaningful.

I can agree with that preference.

rhettinger · February 26, 2024, 8:12pm

I’m thinking of changing all_equal() to:

from itertools import groupby, islice
def all_equal(iterable):
    return len(list(islice(groupby(iterable), 2))) < 2

In English, this says that if the number of equality groups is less than 2, then the inputs are all equal. That is clearer than the current conjunction of next() calls, but it still shows off a core capability of groupby(), which was specifically designed to find runs of equal values.
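A usage check of the islice version above: because islice stops after two groups, the temporary list never holds more than two (key, group) pairs, and the early-out still works on an infinite iterator such as itertools.count().

```python
from itertools import count, groupby, islice

def all_equal(iterable):
    # At most two (key, group) pairs are ever materialized in the list.
    return len(list(islice(groupby(iterable), 2))) < 2

print(all_equal([]), all_equal("aaaa"), all_equal("aaab"))  # True True False
print(all_equal(count()))  # False: stops after the groups for 0 and 1
```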

kknechtel · February 26, 2024, 9:24pm

Generally I like it and I like the underlying reasoning. However, while the islice call makes sense for efficiency reasons, I wonder if it isn’t too distracting here for pedagogical purposes. (It would also be nice if itertools could count the elements in a lazy iterator, without needing to create a temporary list first.)
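itertools has no function for counting the items of a lazy iterator, but the common idiom sum(1 for _ in ...) avoids the temporary list. A sketch under that assumption; count_groups_upto is a hypothetical helper, not part of itertools.

```python
from itertools import groupby, islice

def count_groups_upto(iterable, limit=2):
    # Counts at most `limit` equality groups without building a list.
    return sum(1 for _ in islice(groupby(iterable), limit))

def all_equal(iterable):
    return count_groups_upto(iterable) < 2
```

Whether this reads more clearly than len(list(...)) is debatable; it trades the temporary list for a generator expression.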

Stefan2 · February 27, 2024, 1:08am

How about …

from itertools import groupby, pairwise  # pairwise: Python 3.10+
def all_equal(iterable):
    return not any(pairwise(groupby(iterable)))