Optionally include pair of last and first element of iterator in `itertools.pairwise`

Proposal:

I propose to add an optional argument to the pairwise function in the itertools module. When this argument is set to True, the iterator returned by pairwise should include a pair of the last and the first element of the input iterator (if the input iterator is finite).
I also suggest circular as the name for this argument.

Example:

>>> list(pairwise(['A', 'B', 'C', 'D'], circular = True))
[('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'A')]

Why?

I found this extension of pairwise to be useful when working with geometrical shapes, specifically polygons. When corners is the (ordered) list of corners of a polygon, pairwise(corners, circular = True) can be used to iterate over the edges of that polygon.

There are likely some other cases in which this can be useful, for example creating secret santa pairs or something of that kind, or more generally when working with circular linked list data structures.

Reference implementation:

from itertools import tee, chain

def pairwise(iterable, circular = False):
    a, b = tee(iterable)
    first = next(b, None)
    return zip(a, chain(b, [first] if circular else []))

Edge cases:

>>> list(pairwise(['A'], circular = True))
[('A', 'A')]
>>> list(pairwise([], circular = True))
[]
3 Likes

cc: @rhettinger

As an alternative, there could be a generic circulator()[1] itertool that traverses any iterable, repeating the first element at the end. The proposed functionality is then pairwise(circulator(...)).


  1. name inspired by CGAL ↩︎

Sounds like this is easily implemented in user code. You can also suggest it as an enhancement of more-itertools.

1 Like

Specfically you can do this with islice, pairwise and cycle:

>>> list(islice(pairwise(cycle(['a', 'b', 'c', 'd', 'e'])), 5))
[('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'e'), ('e', 'a')]

Only if you know the length. And that’s very wasteful memory-wise, storing all elements instead of just the first.

1 Like

It doesn’t store anything extra, it cycles. But yes you need to know the length. If it doesn’t have a length there’s no way to circularize.

I suppose if you have an arbitrary iterator that you know will end at some point, you could save memory by never storing the whole thing. But that is pretty niche? And not one of the examples in the op.

def circular(iterator):
    iterator = iter(iterator)
    first = next(iterator)
    yield first
    yield from iterator
    yield first
>>> from itertools import pairwise
>>> list(pairwise(circular("abcde")))
[('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'e'), ('e', 'a')]

(To be clear, this was just a response to the “If it doesn’t have a length, there’s no way to circularize.” though reading through again, I suppose the exercise was not to write a new function.)

2 Likes

Yes, this is very easily implemented in user code, see the reference implementation in my original post.
I just think, that it would be a natural, generally useful and non-obtrusive extension of a function, which is already in the standard library.

Regarding more-itertools:
The pairwise function was added to the standard library in Python 3.10. more-itertools had a pairwise function before that, but starting with Python 3.10 it just re-exports the function from the standard library. I would not like to make a change to only the more-itertools version of the function. Also, if a change is made to pairwise in the standard library, i would suggest to make the same change to pairwise in more-itertools, for use in legacy Python versions.

Furthermore, i would also suggest similar extensions to the functions triplewise and sliding_window in more-itertools

It is not niche at all. It actually is one of the main reasons to even use an iterator instead of just a list or tuple. The most well known example is probably the new behavior of the range function.

But storing is how cycle works. When you call itertools.cycle on something, it makes a list to store the items in, appends them all as it finds them, and then repeatedly returns values from that list. For this specific use-case, it’s overkill to save them all just to re-yield the first.

1 Like

I wasn’t referring to the concept of iterators as niche, I was referring to the concept of “circular pairwise iterator” as niche.

In the various examples above, the input was a sequence. Does cycle save a new copy in that case? Reading the source, it looks like it does, which is kind of a shame…

If iterator is itertools.repeat(...) this isn’t gonna circularize :wink: I should have said that “the length must exist” or something like that.

1 Like

Yes. The itertools functions/objects/tools/whatever you call them never assume that something is a sequence. Even if it IS a sequence, it would be extremely surprising if mutating after the first pass could sometimes affect it and sometimes not - imagine if itertools.cycle(x) behaved differently from the way itertools.cycle(iter(x)) did (since the latter isn’t receiving a sequence). Much cleaner to treat everything the same way and just iterate over it once.

1 Like

That’ll crash if there are no elements. I might do one of these:

def circular(iterator):
    iterator = iter(iterator)
    for first in iterator:
        yield first
        yield from iterator
        yield first
def circular(iterator):
    iterator = iter(iterator)
    first = [*islice(iterator, 1)]
    return chain(first, iterator, first)

For windowed, this has already been suggested.

1 Like

I’d be inclined to take the simpler and clearer route then: if there’s no first element, yield nothing at all.

def circular(iterator):
    iterator = iter(iterator)
    try: first = next(iterator)
    except StopIteration: return
    yield first
    yield from iterator
    yield first

I’d say that’s what every correct implementation does :slight_smile: (unless raising some error is decided to be the correct thing to do, but I don’t think so)

Agreed, but what I mean is, the simplest expression of that isn’t “iterate over the iterator, and for every element in it, yield everything from the iterator”. Now, that’s definitely equivalent for a well-behaved iterator, but I wouldn’t call it a simple way to write that. What I provided definitely expresses that concept.

And there ARE broken iterators out there. Files, for example. Here’s a fully-worked example that behaves very oddly with the for-loop version:

import tempfile

def circular(iterator):
    iterator = iter(iterator)
    for first in iterator:
        yield first
        yield from iterator
        yield first

with tempfile.NamedTemporaryFile(delete_on_close=False) as temp:
    print(temp.name)
    temp.write(b"First!\n")
    temp.close()
    with open(temp.name) as f:
        for line in circular(f):
            print(line.rstrip())
            new = input("Enter line to append, or blank to skip: ")
            if new:
                with open(temp.name, "a") as f: print(new, file=f)
    print("-- All done --")

Various combinations of input here will result in surprising behaviour. For example, if you skip the first prompt, you get back “First!” again (implying that we’re now done - that was the circularization), but if you respond with text to the followup prompt, you’ll get that line back twice more.

Sure, you can dismiss files as “broken”, but it’s not like this is some pathological example that I had to craft from scratch. :slight_smile:

1 Like

That’s a well-known concept :slight_smile:

(Since nobody was willing/able to explain how the above is “offensive, abusive, to be hateful conduct or a violation of our community guidelines” and I disagree and a moderator had left it alone after their edit, I’m assuming it’s a mistake and the only edit I’ll do to unhide it is to add this note.)