Why does calling next on exhausted iterators not raise an exception

fakuivan · August 24, 2020, 2:13am

The iterator protocol states that an iterator implementation that does anything other than raise StopIteration once exhausted should be considered broken. I understand that this is reasonable if you want to prevent iterators from “coming back to life”, but it limits what an implementation may do when next is called on an already exhausted iterator. It’s a common mistake for beginners to forget to tee an iterator before using it twice in their code, and this restriction makes it “not standard” to implement protections against these kinds of errors. (By exhausted iterators I mean iterators that have raised StopIteration at least once in their lifetime.)

Here’s an implementation of such a wrapper:

https://gist.github.com/fakuivan/d089b1d982fca17b8287fc56a59529de/e477069215d87641664fef8d298807c349f8d3bb

reuse_guard.py

#!/usr/bin/env python3.8
from typing import TypeVar, Iterator

"""
https://docs.python.org/3/library/stdtypes.html#iterator-types
This is considered broken by the iterator protocol, however I think
that what's considered broken is to continue to _yield values_, where
with this we emphasize the fact that if ``StopIteration`` is raised
once, the iterator _should not be used_ further. Those are two
different things.

[Preview truncated; the full file is at the gist link above.]

It avoids errors like the following:

In [48]: iterator = reuse_guard(iter((1, 2, 3, 4)))

In [49]: list(iterator)
Out[49]: [1, 2, 3, 4]

In [50]: list(iterator)
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-47-456650faec86> in __next__(self)
     19         try:
---> 20             return next(self._iterator)
     21         except StopIteration as e:

StopIteration:

During handling of the above exception, another exception occurred:

IteratorExhaustedError                    Traceback (most recent call last)
<ipython-input-50-5070d0fe4365> in <module>
----> 1 list(iterator)

<ipython-input-47-456650faec86> in __next__(self)
     21         except StopIteration as e:
     22             if self._iterated:
---> 23                 raise IteratorExhaustedError(
     24                     "This iterator has already reached its end")
     25             self._iterated = True

IteratorExhaustedError: This iterator has already reached its end
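Since the gist preview above is cut off, here is a minimal sketch of what such a guard might look like. It is reconstructed only from the lines visible in the traceback (the try/except around `next(self._iterator)`, the `_iterated` flag, and the `IteratorExhaustedError` message). The exception's base class, the constructor, `__iter__`, and the re-raise after setting the flag are assumptions, not the gist's actual code.

```python
from typing import Iterator, TypeVar

T = TypeVar("T")


class IteratorExhaustedError(Exception):
    """Raised when next() is called again after the iterator has finished.

    Assumption: the base class is a guess; the gist may use a different one.
    """


class reuse_guard(Iterator[T]):
    """Wrap an iterator and complain loudly if it is used after exhaustion."""

    def __init__(self, iterator: Iterator[T]) -> None:
        # Assumed constructor: only the attribute names come from the traceback.
        self._iterator = iterator
        self._iterated = False

    def __iter__(self) -> "reuse_guard[T]":
        return self

    def __next__(self) -> T:
        try:
            return next(self._iterator)
        except StopIteration:
            if self._iterated:
                # Second (or later) StopIteration: treat it as a reuse bug.
                raise IteratorExhaustedError(
                    "This iterator has already reached its end")
            self._iterated = True
            # First StopIteration: end the iteration normally (assumed).
            raise
```

With this sketch, the first `list(iterator)` ends normally and the second one raises `IteratorExhaustedError`, which is the behaviour shown in the session above.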

Should something like this be allowed or even added to Python? Why is this not the default behaviour?

aeros · August 24, 2020, 5:57pm

Generally speaking, iterators are intended to be iterated over only once. After that first StopIteration occurs, the iterator has served its purpose and is finished. In most cases where someone wants to use the same iterator twice, they should be using a list or another iterable container instead. There are of course some exceptions, but this is the most common scenario in my experience (even more so in the situations beginners run into).
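As a small illustration of the point above (the variable names are made up for this example), a second pass over an exhausted iterator silently produces nothing, while a list can be traversed as often as needed:

```python
numbers = iter((1, 2, 3, 4))    # a one-shot iterator

first_pass = list(numbers)      # consumes every item
second_pass = list(numbers)     # already exhausted, so this is empty

# A list is a reusable container: each traversal starts from the beginning.
values = [1, 2, 3, 4]
total = sum(values)
largest = max(values)
```

Here `second_pass` ends up as an empty list rather than raising an error, which is exactly the silent behaviour the original post is worried about.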

This seems to be trying to solve an issue that only exists because of an incorrect assumption from beginners about how iterators are intended to be used, so I think that misunderstanding should be addressed rather than trying to implement protections against the mistake. It’s perfectly fine for beginners to make that mistake as long as they’re able to look up what they did wrong rather easily (typically by searching what StopIteration means and learning how iterators work).

To me, this doesn’t seem worth the cost of changing existing iterators that rely on StopIteration always being raised once exhausted, or of changing what API users can expect when dealing with iterators. I’m just not seeing a practical use case here that adequately justifies a change to a fundamental protocol.

fakuivan · August 24, 2020, 11:41pm

I gave the beginner case as an example; I did not say that this only occurs because of misconceptions held by beginners.

Fair enough. I still don’t see why the reuse_guard class I posted should be considered a broken iterator implementation. Shouldn’t the spec say something along the lines of “iterators should not continue to try to yield values once StopIteration has been raised”?

guido · August 25, 2020, 1:34am

Because that would constrain other uses of iterators to refrain from calling next() on an exhausted iterator. While you’re right that newbies (and others too!) often do that by mistake, there are also legitimate situations where this behavior is useful so that some other wrapper doesn’t have to keep extra state to record whether the wrapped iterator is exhausted. The current specification makes clear that such a wrapper can just call next() and if it raises StopIteration the iterator was exhausted.
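To make that concrete, here is a rough sketch of the kind of wrapper this describes. The `round_robin` helper is hypothetical and written just for this illustration. It keeps no record of which inputs are finished; it simply calls next() on every input each round and relies on exhausted ones raising StopIteration again.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def round_robin(*iterables: Iterable[T]) -> Iterator[T]:
    """Yield one item from each input in turn until all are exhausted."""
    iterators = [iter(it) for it in iterables]
    while True:
        produced = False
        for iterator in iterators:
            try:
                # May be called on an iterator that already finished;
                # the protocol guarantees it raises StopIteration again.
                item = next(iterator)
            except StopIteration:
                continue
            produced = True
            yield item
        if not produced:
            # A full round with no items means every input is exhausted.
            return


merged = list(round_robin("abc", "d", "ef"))
```

If the inputs were wrapped in something like the reuse_guard from the first post, the second poll of a finished input would raise a different exception, and this helper would need extra bookkeeping to avoid it.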