Improving the all_equal recipe in itertools doc

blhsing · February 27, 2024, 1:43am

Although it is trivial to count the elements in a lazy iterator, the commonly used idiom using sum just isn’t very immediately readable to those who aren’t familiar with the idiom:

from itertools import groupby, islice

def all_equal(iterable):
    return sum(1 for _ in islice(groupby(iterable), 2)) < 2

So yeah it would be nice to have an itertools function just for counting, and to consume an iterator cheaply.
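For illustration, here is one way such a counting helper could look. This is only a sketch: the name `ilen` is borrowed from a suggestion later in this thread and is not an itertools function, and the `deque`/`zip` trick is just one possible way to consume an iterator without a Python-level loop.

```python
from collections import deque
from itertools import count


def ilen(iterable):
    """Count the items in an iterable, consuming it (hypothetical helper)."""
    counter = count()
    # zip() pulls one item from `iterable`, then one from `counter`, per step;
    # deque(maxlen=0) discards every pair without storing anything.
    deque(zip(iterable, counter), maxlen=0)
    # `counter` advanced once per item, so its next value is the item count.
    return next(counter)


letters = ilen(iter("abc"))
nothing = ilen(iter([]))
```

The plainer spelling `sum(1 for _ in iterable)` gives the same count; the version above just avoids running a generator expression for every item.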

blhsing · February 27, 2024, 1:57am

Easily read as “that there is not any pair of equality groups”. +1

alicederyn · February 27, 2024, 7:00am

Also as a mild annoyance in this case, it doesn’t short-circuit any more.

Stefan2 · February 27, 2024, 7:07am

Ben had said that for a moment too. Is the islice(..., 2) so easily overlooked?
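A small check makes the short-circuiting visible. This is a sketch that assumes the corrected `sum`/`islice` version of `all_equal` from the first post; feeding it an infinite `count()` is just a convenient way to show that it stops after the second group.

```python
from itertools import count, groupby, islice


def all_equal(iterable):
    # islice() caps the number of groups fetched at two, so sum() never
    # iterates past the second group.
    return sum(1 for _ in islice(groupby(iterable), 2)) < 2


source = count()             # infinite stream 0, 1, 2, ...
result = all_equal(source)   # returns instead of looping forever
leftover = next(source)      # first item all_equal did not read
```

Without the `islice(..., 2)`, the same call on `count()` would never return.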

blhsing · February 27, 2024, 7:19am

Maybe @alicederyn meant to say that it doesn’t short-circuit anymore when given an empty iterable, like my original argument in the first post.

alicederyn · February 27, 2024, 7:35am

Apparently yes, it’s easily overlooked. Oops. I guess “isn’t very immediately readable to those who aren’t familiar with the idiom” can be extended! Sorry…

alicederyn · February 27, 2024, 7:40am

I am a fan of not having to repeat the 2 inside islice too

Though as an example, I suppose the islice version shows off a more general tool that can be used elsewhere more easily than “any pairwise”, which only works for “are there two or more”.

Agreed, I’ve wanted this in the past. ilen?

Stefan2 · February 27, 2024, 7:56am

Meh. It’s shown in six recipes already. Enough!

Even like this already, in the very first recipe:

def take(n, iterable):
    "Return first n items of the iterable as a list."
    return list(islice(iterable, n))

So … actually:

def all_equal(iterable):
    return len(take(2, groupby(iterable))) < 2

Stefan2 · February 27, 2024, 8:16am

That’s what more-itertools calls it.

And there’s an old issue where it was rejected.

nedbat · February 27, 2024, 1:56pm

The length and variety of this thread highlight an important point: itertools is full of powerful tools that can be combined in many ways, many of which are not obvious at first. As the recipes section says, “The primary purpose of the itertools recipes is educational.”

There are many good points being made here about the pros and cons of each approach, and the behavior of the primitives being used. Since all_equal is a recipe in the docs, not an implementation, why do we need to choose just one? We could expand the recipes from a single code block to readable prose that explains what’s happening in each, to make it more fully pedagogical.

BTW: The recipes section also says, “The recipes also give ideas about ways that the tools can be combined — for example, how compress() and range() can work together,” but compress isn’t mentioned in any of the recipes, so there’s some editing to be done. It looks like we lost the compress/range combination when sieve was updated.
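To show the kind of combination that sentence refers to, here is a minimal sketch. It is not the old sieve recipe, just an illustration with a made-up `flags` array: `compress()` filters `range()` so that only the positions of set flags come out.

```python
from itertools import compress

# Hypothetical data: flags[i] == 1 marks index i as selected.
flags = bytearray([0, 0, 1, 1, 0, 1, 0, 1])

# compress() keeps the items of range() whose matching selector is truthy,
# which yields the indices of the set flags.
marked = list(compress(range(len(flags)), flags))
```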

rhettinger · February 27, 2024, 5:37pm

FWIW, this is a canonical use of islice. It says, “fetch no more than two groups.”

It is similar to the standard idiom for sequences: preview = data[:10]
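Side by side, the analogy might look like this (a sketch; `data` and `stream` are placeholder names):

```python
from itertools import count, islice

data = list(range(100))
preview = data[:10]                       # sequence: first ten items

stream = count()                          # iterator: no len(), no slicing
lazy_preview = list(islice(stream, 10))   # fetch no more than ten items
```

The slice copies from a list that stays intact, while `islice` consumes the ten items it fetches from the stream.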

Both groupby() and islice() are being used in the most direct, canonical, and least clever way. It is what we want people to learn.

A core problem being solved is that (aside from Tim Peters, Ben, and Stefan) no one is born knowing how to manipulate iterator streams with an iterator algebra in a functional style. Working through these examples teaches that style of thinking (and a few patterns). In my courses, I’ve had people work through how each example works and have found that it confers Jedi-like mastery of the itertools.

kknechtel · February 27, 2024, 6:04pm

I agree, but it’s an extra step in the process that isn’t required to solve the problem - it’s just enabling short-circuiting. Some of the other approaches don’t make this feel like a separate step, while still accomplishing short-circuiting. But either way, examples that purely chain together function calls (including all / any) are probably a better illustration of the power of itertools, than examples that have to rely on boolean operators to combine results. Yes, even though any/all could be described as generalizations of or/and.

I feel like CS courses used to give a better background for this kind of thing. (For example, by expecting students to become familiar with pipelines in Unix commands, and accomplish useful things with them, following “the Unix way”.) But yes, having examples like this is excellent pedagogically.

I wonder if it wouldn’t be better to show multiple examples for all_equal. That “preferably only one obvious way” thing doesn’t seem to work out as often as one might like.

pylang · February 28, 2024, 9:04am

I think having more than one canonical way to do pythonic things is fun to play with, but perhaps not something worth propagating in the docs. Sometimes too many options are confusing and leave one wanting to see fewer.