A high-performance solution to the "are all elements of an iterable equal" problem

As above, assumes slice

See the GitHub issue linked in the second post: itertools: clearer all_equal recipe · Issue #92628 · python/cpython · GitHub

from itertools import islice

def map_adjacent_pairs(func, seq):
     return map(func, seq, islice(seq, 1, None))
1 Like

Why is this different to

map(func, seq[1:], seq[:-1])

?

It doesn’t require indexability. This has been mentioned a number of times already.

Sure… but this is totally irrelevant to the point being made. I’m not sure why this keeps coming back up?

The problem with using seq[1:] and seq:[:-1] is not the fact these speak in terms of indices, but the fact that seq has to be referred to twice.

I think you just want itertools.starmap:

import itertools
import operator

def all_equal(it):
    return all(itertools.starmap(operator.eq, itertools.pairwise(it)))

It is for the sorts of people who want to use these tools in the first place. The point of map is that it’s lazy. If I knew I had indexable, slicable sources then I’d know that laziness didn’t get me any benefits, so I’d use a list comprehension instead (or in the context of all/any, perhaps a generator expression - since this allows simpler syntax).

Fundamentally, yes, because you’re fundamentally considering two sequences of values that are based on seq: the values that you want to be on the left-hand side of the equality comparisons being done, and the values on the right-hand side.

But if you write something like my map_adjacent_pairs, or any of the other examples offered so far, then it’s reusable, and you only have that duplication in one place.

It’s possible to build the same thing with itertools.starmap and itertools.pairwise, too, as @effigies shows. But that just pushes the double reference to the input sequence into the itertools.pairwise implementation. Similarly, if something functionally equivalent to map_binary (my map_adjacent_pairs is intended to implement the interface you appear to have in mind) were implemented in the language, the double reference would just get pushed into the underlying C code.

But all of this is still missing the point. Like @rhettinger said: there are many ways to implement the “are all elements equal?” functionality, with varying trade-offs in terms of performance, elegance etc. But this thread is in the Documentation section, and it’s about the specific example shown in a specific place in the documentation. And the purpose of that example is not to be the most performant or most elegant way to solve the problem. The purpose is to showcase how a specific tool can facilitate solving the problem.

2 Likes

Yes, but @hypernova made a different thread, not in the documentation section, (i believe in ideas?) asking for a new function to be added and used all_equals as a motivation. IMO the moderators wrongly merged this, leading to this confusion.

1 Like

The starmap + pairwise solution is also quite slow, because it calls a lot of Python functions in sequence - on the other hand I very much like it, because of how Pythonic it is.

This is why I started a thread suggesting adding a pairwise version of map. The alternative might be going one step further and implementing an all_equal function in C directly.

The advantage of a pairwise map is that you could use it with any binary function.

@MegaIng I did make another thread in the ideas section, which was totally unrelated to this. I did not know this thread existed. Someone moved my thread into this one, which is really a shame because now if anyone were interested in contributing to that discussion, they won’t find it.

I was really asking for feedback on the suggestion. Like I said, I am happy to do the work on it. But I don’t want to invest time into it if it isn’t a feature which would be well received by the community. Hence I was doing my “market research”…

I asked about it and the two topics got separated again. To be fair, both this topic’s title “A high-performance solution to the “are all elements of an iterable equal” problem” and its content rather look like the opposite of what you say. Like the “all equal” problem was really the subject, and that more general “map pairwise” function mentioned near the end seemed like an afterthought.

I keep getting emails about my posts being flagged and removed by mods. Why is this? I’ve checked the rules and haven’t broken any…

Not just referred to twice but iterated twice, right? I assume that’s what you are highlighting? This would be a base requirement for me too: iterables should not be assumed to be reusable.

Yeah, exactly - it just doesn’t really “sit right” - with any programming language.

Generally speaking good functions have interfaces which work like this:

You pass in whatever pieces of data are required for the operation (only once!) and a result is returned, ideally without side effects.

I know the suggestion has been to wrap this “uglyness” up into a function and hide it somewhere. But my argument is really as simple as “the fact we had something to hide suggests we are missing the right tool for the job”. That’s it, really. The performance gains to be had imo are secondary, although some others may feel this is also a compelling argument.

Now that it’s split from the other thread, I take it that this is a suggestion for a new stdlib itertools function?


Is there an implementation of this in more-itertools? If not, why not try adding it there first?

It’s always easier to talk about something concrete, and lots of this thread feels like it’s dancing around the issue that the most efficient implementation will vary based on what you know about the input.

For example, if it’s a sorted[1] sequence with a min length of 1, you can just do seq[0] == seq[-1]. That’s an absurd suggestion for a generic check! But it’s obviously more or less the best you can do for that narrow case.
IMO, that’s the key issue: the more you know about the input, the better you can make the implementation. How often do you need a highly optimized solution but don’t know anything other than “it’s iterable”?


  1. in a domain in which sorting relates to equality in “the usual way” ↩︎

2 Likes

If it’s a set, you can just check the length!

1 Like