Optionally include pair of last and first element of iterator in `itertools.pairwise`

DERAlfons · January 17, 2024, 2:40am

Proposal:

I propose to add an optional argument to the pairwise function in the itertools module. When this argument is set to True, the iterator returned by pairwise should include a pair of the last and the first element of the input iterator (if the input iterator is finite).
I also suggest circular as the name for this argument.

Example:

>>> list(pairwise(['A', 'B', 'C', 'D'], circular = True))
[('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'A')]

Why?

I found this extension of pairwise to be useful when working with geometrical shapes, specifically polygons. When corners is the (ordered) list of corners of a polygon, pairwise(corners, circular = True) can be used to iterate over the edges of that polygon.

There are likely some other cases in which this can be useful, for example creating secret santa pairs or something of that kind, or more generally when working with circular linked list data structures.

Reference implementation:

from itertools import tee, chain

def pairwise(iterable, circular = False):
    a, b = tee(iterable)
    first = next(b, None)
    return zip(a, chain(b, [first] if circular else []))

Edge cases:

>>> list(pairwise(['A'], circular = True))
[('A', 'A')]

>>> list(pairwise([], circular = True))
[]

AA-Turner · January 17, 2024, 5:22am

cc: @rhettinger

ntessore · January 17, 2024, 8:28am

As an alternative, there could be a generic circulator()^[1] itertool that traverses any iterable, repeating the first element at the end. The proposed functionality is then pairwise(circulator(...)).

name inspired by CGAL ↩︎

erlendaasland · January 17, 2024, 8:36am

Sounds like this is easily implemented in user code. You can also suggest it as an enhancement of more-itertools.

jamestwebber · January 17, 2024, 6:35pm

Specfically you can do this with islice, pairwise and cycle:

>>> list(islice(pairwise(cycle(['a', 'b', 'c', 'd', 'e'])), 5))
[('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'e'), ('e', 'a')]

Stefan2 · January 17, 2024, 7:53pm

Only if you know the length. And that’s very wasteful memory-wise, storing all elements instead of just the first.

jamestwebber · January 17, 2024, 8:00pm

It doesn’t store anything extra, it cycles. But yes you need to know the length. If it doesn’t have a length there’s no way to circularize.

I suppose if you have an arbitrary iterator that you know will end at some point, you could save memory by never storing the whole thing. But that is pretty niche? And not one of the examples in the op.

effigies · January 17, 2024, 8:02pm

def circular(iterator):
    iterator = iter(iterator)
    first = next(iterator)
    yield first
    yield from iterator
    yield first

>>> from itertools import pairwise
>>> list(pairwise(circular("abcde")))
[('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'e'), ('e', 'a')]

(To be clear, this was just a response to the “If it doesn’t have a length, there’s no way to circularize.” though reading through again, I suppose the exercise was not to write a new function.)

DERAlfons · January 17, 2024, 8:07pm

Yes, this is very easily implemented in user code, see the reference implementation in my original post.
I just think, that it would be a natural, generally useful and non-obtrusive extension of a function, which is already in the standard library.

Regarding more-itertools:
The pairwise function was added to the standard library in Python 3.10. more-itertools had a pairwise function before that, but starting with Python 3.10 it just re-exports the function from the standard library. I would not like to make a change to only the more-itertools version of the function. Also, if a change is made to pairwise in the standard library, i would suggest to make the same change to pairwise in more-itertools, for use in legacy Python versions.

Furthermore, i would also suggest similar extensions to the functions triplewise and sliding_window in more-itertools

DERAlfons · January 17, 2024, 8:12pm

It is not niche at all. It actually is one of the main reasons to even use an iterator instead of just a list or tuple. The most well known example is probably the new behavior of the range function.

Rosuav · January 17, 2024, 8:13pm

But storing is how cycle works. When you call itertools.cycle on something, it makes a list to store the items in, appends them all as it finds them, and then repeatedly returns values from that list. For this specific use-case, it’s overkill to save them all just to re-yield the first.

jamestwebber · January 17, 2024, 8:18pm

I wasn’t referring to the concept of iterators as niche, I was referring to the concept of “circular pairwise iterator” as niche.

In the various examples above, the input was a sequence. Does cycle save a new copy in that case? Reading the source, it looks like it does, which is kind of a shame…

jamestwebber · January 17, 2024, 8:20pm

If iterator is itertools.repeat(...) this isn’t gonna circularize I should have said that “the length must exist” or something like that.

Rosuav · January 17, 2024, 8:24pm

Yes. The itertools functions/objects/tools/whatever you call them never assume that something is a sequence. Even if it IS a sequence, it would be extremely surprising if mutating after the first pass could sometimes affect it and sometimes not - imagine if itertools.cycle(x) behaved differently from the way itertools.cycle(iter(x)) did (since the latter isn’t receiving a sequence). Much cleaner to treat everything the same way and just iterate over it once.

Stefan2 · January 17, 2024, 8:49pm

That’ll crash if there are no elements. I might do one of these:

def circular(iterator):
    iterator = iter(iterator)
    for first in iterator:
        yield first
        yield from iterator
        yield first

def circular(iterator):
    iterator = iter(iterator)
    first = [*islice(iterator, 1)]
    return chain(first, iterator, first)

Stefan2 · January 17, 2024, 8:53pm

For windowed, this has already been suggested.

Rosuav · January 17, 2024, 9:07pm

Stefan:

That’ll crash if there are no elements.

I’d be inclined to take the simpler and clearer route then: if there’s no first element, yield nothing at all.

def circular(iterator):
    iterator = iter(iterator)
    try: first = next(iterator)
    except StopIteration: return
    yield first
    yield from iterator
    yield first

Stefan2 · January 17, 2024, 9:10pm

I’d say that’s what every correct implementation does (unless raising some error is decided to be the correct thing to do, but I don’t think so)

Rosuav · January 17, 2024, 10:01pm

Agreed, but what I mean is, the simplest expression of that isn’t “iterate over the iterator, and for every element in it, yield everything from the iterator”. Now, that’s definitely equivalent for a well-behaved iterator, but I wouldn’t call it a simple way to write that. What I provided definitely expresses that concept.

And there ARE broken iterators out there. Files, for example. Here’s a fully-worked example that behaves very oddly with the for-loop version:

import tempfile

def circular(iterator):
    iterator = iter(iterator)
    for first in iterator:
        yield first
        yield from iterator
        yield first

with tempfile.NamedTemporaryFile(delete_on_close=False) as temp:
    print(temp.name)
    temp.write(b"First!\n")
    temp.close()
    with open(temp.name) as f:
        for line in circular(f):
            print(line.rstrip())
            new = input("Enter line to append, or blank to skip: ")
            if new:
                with open(temp.name, "a") as f: print(new, file=f)
    print("-- All done --")

Various combinations of input here will result in surprising behaviour. For example, if you skip the first prompt, you get back “First!” again (implying that we’re now done - that was the circularization), but if you respond with text to the followup prompt, you’ll get that line back twice more.

Sure, you can dismiss files as “broken”, but it’s not like this is some pathological example that I had to craft from scratch.

Stefan2 · January 17, 2024, 11:00pm

That’s a well-known concept

(Since nobody was willing/able to explain how the above is “offensive, abusive, to be hateful conduct or a violation of our community guidelines” and I disagree and a moderator had left it alone after their edit, I’m assuming it’s a mistake and the only edit I’ll do to unhide it is to add this note.)