A high-performance solution to the "are all elements of an iterable equal" problem

alicederyn · February 25, 2024, 6:09pm

As above, assumes slice

alicederyn · February 25, 2024, 6:14pm

See the GitHub issue linked in the second post: itertools: clearer all_equal recipe · Issue #92628 · python/cpython · GitHub

kknechtel · February 26, 2024, 10:35am

from itertools import islice

def map_adjacent_pairs(func, seq):
     return map(func, seq, islice(seq, 1, None))

hypernova · February 26, 2024, 1:21pm

Why is this different to

map(func, seq[1:], seq[:-1])

?

pf_moore · February 26, 2024, 1:26pm

It doesn’t require indexability. This has been mentioned a number of times already.

hypernova · February 26, 2024, 1:45pm

Sure… but this is totally irrelevant to the point being made. I’m not sure why this keeps coming back up?

The problem with using seq[1:] and seq:[:-1] is not the fact these speak in terms of indices, but the fact that seq has to be referred to twice.

effigies · February 26, 2024, 2:03pm

I think you just want itertools.starmap:

import itertools
import operator

def all_equal(it):
    return all(itertools.starmap(operator.eq, itertools.pairwise(it)))

kknechtel · February 26, 2024, 2:15pm

It is for the sorts of people who want to use these tools in the first place. The point of map is that it’s lazy. If I knew I had indexable, slicable sources then I’d know that laziness didn’t get me any benefits, so I’d use a list comprehension instead (or in the context of all/any, perhaps a generator expression - since this allows simpler syntax).

Fundamentally, yes, because you’re fundamentally considering two sequences of values that are based on seq: the values that you want to be on the left-hand side of the equality comparisons being done, and the values on the right-hand side.

But if you write something like my map_adjacent_pairs, or any of the other examples offered so far, then it’s reusable, and you only have that duplication in one place.

It’s possible to build the same thing with itertools.starmap and itertools.pairwise, too, as @effigies shows. But that just pushes the double reference to the input sequence into the itertools.pairwise implementation. Similarly, if something functionally equivalent to map_binary (my map_adjacent_pairs is intended to implement the interface you appear to have in mind) were implemented in the language, the double reference would just get pushed into the underlying C code.

But all of this is still missing the point. Like @rhettinger said: there are many ways to implement the “are all elements equal?” functionality, with varying trade-offs in terms of performance, elegance etc. But this thread is in the Documentation section, and it’s about the specific example shown in a specific place in the documentation. And the purpose of that example is not to be the most performant or most elegant way to solve the problem. The purpose is to showcase how a specific tool can facilitate solving the problem.

MegaIng · February 26, 2024, 3:04pm

Yes, but @hypernova made a different thread, not in the documentation section, (i believe in ideas?) asking for a new function to be added and used all_equals as a motivation. IMO the moderators wrongly merged this, leading to this confusion.

hypernova · February 26, 2024, 3:40pm

The starmap + pairwise solution is also quite slow, because it calls a lot of Python functions in sequence - on the other hand I very much like it, because of how Pythonic it is.

This is why I started a thread suggesting adding a pairwise version of map. The alternative might be going one step further and implementing an all_equal function in C directly.

The advantage of a pairwise map is that you could use it with any binary function.

@MegaIng I did make another thread in the ideas section, which was totally unrelated to this. I did not know this thread existed. Someone moved my thread into this one, which is really a shame because now if anyone were interested in contributing to that discussion, they won’t find it.

I was really asking for feedback on the suggestion. Like I said, I am happy to do the work on it. But I don’t want to invest time into it if it isn’t a feature which would be well received by the community. Hence I was doing my “market research”…

Stefan2 · February 26, 2024, 4:29pm

I asked about it and the two topics got separated again. To be fair, both this topic’s title “A high-performance solution to the “are all elements of an iterable equal” problem” and its content rather look like the opposite of what you say. Like the “all equal” problem was really the subject, and that more general “map pairwise” function mentioned near the end seemed like an afterthought.

hypernova · February 26, 2024, 5:19pm

I keep getting emails about my posts being flagged and removed by mods. Why is this? I’ve checked the rules and haven’t broken any…

alicederyn · February 26, 2024, 5:22pm

Not just referred to twice but iterated twice, right? I assume that’s what you are highlighting? This would be a base requirement for me too: iterables should not be assumed to be reusable.

hypernova · February 26, 2024, 5:39pm

Yeah, exactly - it just doesn’t really “sit right” - with any programming language.

Generally speaking good functions have interfaces which work like this:

You pass in whatever pieces of data are required for the operation (only once!) and a result is returned, ideally without side effects.

I know the suggestion has been to wrap this “uglyness” up into a function and hide it somewhere. But my argument is really as simple as “the fact we had something to hide suggests we are missing the right tool for the job”. That’s it, really. The performance gains to be had imo are secondary, although some others may feel this is also a compelling argument.

sirosen · February 26, 2024, 6:04pm

Now that it’s split from the other thread, I take it that this is a suggestion for a new stdlib itertools function?

Is there an implementation of this in more-itertools? If not, why not try adding it there first?

It’s always easier to talk about something concrete, and lots of this thread feels like it’s dancing around the issue that the most efficient implementation will vary based on what you know about the input.

For example, if it’s a sorted^[1] sequence with a min length of 1, you can just do seq[0] == seq[-1]. That’s an absurd suggestion for a generic check! But it’s obviously more or less the best you can do for that narrow case.
IMO, that’s the key issue: the more you know about the input, the better you can make the implementation. How often do you need a highly optimized solution but don’t know anything other than “it’s iterable”?

in a domain in which sorting relates to equality in “the usual way” ↩︎

alicederyn · February 26, 2024, 7:14pm

If it’s a set, you can just check the length!

Topic		Replies	Views
`itertools.all_equal` Ideas	25	1244	March 10, 2024
Improving the all_equal recipe in itertools doc Documentation documentation	32	1209	February 28, 2024
all() function not working as it should. Python Help help	5	326	December 21, 2023
The map() function or the list comprehension? Python Help	5	2558	September 13, 2023
Map not working Python Help	7	923	December 19, 2023

A high-performance solution to the "are all elements of an iterable equal" problem

Related Topics