An optional __hasnext__ method for iterators

zach · May 1, 2022, 1:51am

Proposal:

I propose that iterators have an optional __hasnext__ method.
The truthiness for iterators would be equivalent to calling iterVar.__hasnext__(). For example, if iterVar: would be equivalent to if iterVar.__hasnext__():

Notes:

TBD: Async iterators should be considered. (i.e. __ahasnext__)
TBD: Should testing the truthiness of an iterator that lacks __hasnext__ raise an exception? Maybe an exception isn’t backwards compatible enough.

Use Cases:

First use case example:
Consider the pure-Python implementation of mean in the standard library’s statistics module (cpython/statistics.py at 3.10, python/cpython on GitHub).

# Old version
def mean(data):
    if iter(data) is data:
        data = list(data)
    n = len(data)
    if n < 1:
        raise StatisticsError('mean requires at least one data point')
    T, total, count = _sum(data)
    assert count == n
    return _convert(total / n, T)

# New version
def mean(data):
    if not iter(data):
        raise StatisticsError('mean requires at least one data point')
    T, total, count = _sum(data)
    return _convert(total / count, T)

The new version reads better, and it’s more space efficient because data = list(data) is no longer present. Specifically, the new version doesn’t store a giant list of the iterator’s items.
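For comparison, the same space saving is available without a new protocol if the emptiness check moves after the loop. A minimal sketch, assuming plain numeric inputs; streaming_mean is a hypothetical name, and it skips the exact-arithmetic handling that _sum and _convert provide:

```python
from statistics import StatisticsError


def streaming_mean(data):
    """Average any iterable in one pass without building a list."""
    total = 0
    count = 0
    for value in data:
        total += value
        count += 1
    # The emptiness check happens after consuming the data, so no
    # look-ahead on the iterator is needed.
    if count == 0:
        raise StatisticsError("mean requires at least one data point")
    return total / count
```

The trade-off is that the error for empty input is raised only after the iterable has been consumed, which costs nothing here because an empty iterable has nothing to consume.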

Second use case example:
Test if a generator is empty.

myList = []
if myList:
    print("A")

myGenerator = (i for i in [])
if myGenerator:
    print("B")

Currently, this code prints “B” and not “A”. But why print “B”? I think neither should be printed.

Adding a __hasnext__ method to generators would fix this problem.

yield could allow the user to expose whether or not a next item exists. e.g. A second argument to yield would be a function that indicates this.

def gen():
    for i in [1, 2, 3, 4, 5]:
        yield(i, lambda: i < 5)

sweeneyde · May 1, 2022, 5:08pm

If you need this capability, you can always make your own wrapper, for example:

# untested, use at your own risk
class CachedIterator:
    __slots__ = ["_it", "_next", "_sentinel"]

    def __init__(self, it):
        self._it = iter(it)
        self._next = self._sentinel = object()

    def has_next(self):
        if self._next is not self._sentinel:
            return True
        try:
            self._next = next(self._it)
        except StopIteration:
            return False
        else:
            return True

    def __next__(self):
        if self._next is not self._sentinel:
            result = self._next
            self._next = self._sentinel
            return result
        else:
            return next(self._it)

Note that a fast, side-effect-free __hasnext__ for generators would be impossible in general: the only way to figure out if there’s more data is to try to compute more data:

>>> gen = (x for x in itertools.count() if is_odd_perfect_number(x))
>>> gen.__hasnext__() # ???

Or consider side effects:

def genfunc():
    yield 1
    if it_is_tuesday():
        cursor.execute("DROP TABLE students;")
        yield 2

>>> gen = genfunc()
>>> next(gen)
1
>>> gen.__hasnext__() # ???

This is also part of the broader story of LBYL-vs-EAFP (see glossary). People used to idioms in other programming languages are often resistant to EAFP style, but in Python, it’s fine and normal and common to do try:/except StopIteration.
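As a sketch of the EAFP style mentioned above: ask for the first item and handle StopIteration, rather than testing for emptiness first. The helper name first_or_default is hypothetical.

```python
def first_or_default(iterable, default=None):
    """Return the first item of an iterable, or default if it is empty."""
    iterator = iter(iterable)
    try:
        return next(iterator)
    except StopIteration:
        # The iterator was empty; nothing was consumed.
        return default


first_even = first_or_default(n for n in [1, 3, 4, 5] if n % 2 == 0)
nothing = first_or_default((n for n in []), default="empty")
```

The built-in next(iterator, default) covers the same case in one call; the explicit try/except is spelled out here only to show the idiom.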

In my mind, it’s nice to have only one iteration protocol (going back 20 years to PEP 234), and for it to have only two methods (__iter__ and __next__). That makes it easy to understand that the basic answer to “when is the next thing computed” is always “when I call its __next__ method”. Explicitly wrapping with something like a CachedIterator is useful sometimes, but IMO that behavior should always be a result of explicit wrapping, since that caching would be extra (mental or performance) overhead everywhere else.
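A sketch of what such explicit wrapping might look like when truthiness is the goal. The class name Peekable and its attributes are hypothetical, not an existing API. Testing the wrapper for truth computes one item ahead, so any side effects of the underlying iterator happen at that point:

```python
class Peekable:
    """Iterator wrapper whose truthiness reports whether an item remains."""

    _EMPTY = object()  # marker meaning "no item cached"

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._cached = self._EMPTY

    def __iter__(self):
        return self

    def __next__(self):
        if self._cached is not self._EMPTY:
            value, self._cached = self._cached, self._EMPTY
            return value
        return next(self._it)

    def __bool__(self):
        if self._cached is self._EMPTY:
            try:
                # Look ahead by one item and keep it for the next __next__.
                self._cached = next(self._it)
            except StopIteration:
                return False
        return True


items = Peekable(x * x for x in range(3))
squares = list(items) if items else []
```

Because the look-ahead is opt-in, code that never wraps its iterators keeps the plain two-method protocol and pays no caching cost.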

zach · May 1, 2022, 7:25pm

Thanks for your message. Perhaps you saw an old version of my post? I quickly removed the cached generator after I posted it, due to several issues.

Can you take another look at the post’s “second use case”? It explains gen.__hasnext__ via an optional argument for yield. No caching is necessary, and there’s no question as to “what is computed where”.

Regarding your other points, I’d like to tie them back to a consistency difference in truthiness. Specifically, bool(mySequence) will check len(mySequence) != 0, but bool(myIterator) is always True. Not ideal. In a perfect world, bool(myIterator) would evaluate __hasnext__ and also raise an exception if __hasnext__ is not defined.

I wonder if this is simply about things behaving as expected, and not LBYL-vs-EAFP.

Regarding PEP 234, maybe the name __hasnext__ isn’t necessary if __bool__ can be used instead.

sweeneyde · May 1, 2022, 8:42pm

Since yield is a keyword and not a function, yield(i, lambda: i < 5) is already valid syntax that yields a tuple, so changing its meaning would break things:

>>> def f(i):
...     yield(i, lambda: i < 5)
...
>>> gen = f(2)
>>> next(gen)
(2, <function f.<locals>.<lambda> at 0x000001B826BEBBA0>)

Changing the implementation of __bool__ is also a breaking change: someone might be checking if generator: to mean if generator is not None:, and that use-case would break as well. That’s just the default __bool__ behavior for objects: everything is truthy except when len(...) == 0 or special cases like None or 0 or 0.0, etc. Theoretically, generators could be changed, but it would take a long deprecation period and I’m not sure it’s worth it. I’d rather stick to using a wrapper class when needed.
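A small illustration of the default rules described above. The class names are made up for the example:

```python
class EmptyContainer:
    """Defines __len__ only, so truthiness falls back to len(...) != 0."""

    def __len__(self):
        return 0


class PlainObject:
    """Defines neither __bool__ nor __len__, so instances are always truthy."""


def maybe_generator(make_one):
    # Mirrors the "if generator:" pattern used as a None check.
    return (i for i in []) if make_one else None


truthiness = {
    "EmptyContainer": bool(EmptyContainer()),
    "PlainObject": bool(PlainObject()),
    "empty generator": bool(maybe_generator(True)),
    "None": bool(maybe_generator(False)),
}
```

Under current semantics the empty generator and None land on opposite sides of the test, which is the behavior that existing if generator: checks rely on.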

Somewhat related is PEP 424, but those semantics are very loose.
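PEP 424 is exposed as operator.length_hint. A short sketch of why the hint is too loose to act as an emptiness test; the values for the list iterator reflect CPython’s behavior:

```python
import operator

# List iterators implement __length_hint__, so the hint tracks the
# number of remaining items.
numbers = iter([1, 2, 3])
hint_before = operator.length_hint(numbers)
next(numbers)
hint_after = operator.length_hint(numbers)

# Generators provide no hint, so the default (0) comes back even
# though this generator still has three items to produce.
gen = (x for x in [1, 2, 3])
gen_hint = operator.length_hint(gen)
gen_items = list(gen)
```

A hint of 0 therefore cannot be read as “exhausted”; it may only mean that no estimate is available.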

zach · May 2, 2022, 12:54am

Hi Steven, it seems my original post was misplaced; please read the post here: https://discuss.python.org/t/an-optional-hasnext-method-for-iterators/15389

zach · May 2, 2022, 4:31am

Also Dennis, thanks for your reply. I think it wraps things up. In other words, from my perspective:

The current state of truthiness for iterators is not ideal when considering that None, 0, 0.0, 0j, Decimal(0), Fraction(0, 1), (), [], {}, set(), range(0) are all correctly falsy.
EAFP over LBYL goes a long way, and this discrepancy is mostly avoidable in practice.
