Deprecate "old-style iteration protocol"?

Please remember that not all Python code is open source or publicly available.

The best we can do is find a lower bound on classes that use the sequence protocol.

It may or may not be representative of proprietary code, but there’s grep.app for searching GitHub, and searches can be limited to Python files. Of course, with any of this you have to be able to form a query that captures what you’re looking for, which isn’t entirely obvious here.

If it’s any solace, type checkers don’t recognise old-style iterables as compatible with Iterable. And in all the years of mypy, this missing support has only come up a couple of times, especially once people mostly stopped using Python 2.

I’m aware of one popular library (torch.utils.data.Dataset) that relies on old-style iteration to provide iterability. I tried to get them to use __iter__, but they claimed some users had use cases where a Dataset wasn’t actually iterable. Of course, most code assumes that it is iterable (including plenty of code in torch), so I wasn’t sympathetic to that concern. But I didn’t feel like arguing the point :slight_smile:
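
For anyone who hasn’t run into it: the old protocol makes a class iterable through __getitem__ alone. A minimal sketch (Squares is illustrative, not from torch):

    class Squares:
        # Iterable via the old sequence protocol: no __iter__ defined.
        def __getitem__(self, index):
            if index >= 5:
                raise IndexError(index)
            return index * index

    # A for-loop falls back to calling __getitem__ with 0, 1, 2, ...
    # until IndexError is raised:
    print(list(Squares()))  # [0, 1, 4, 9, 16]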

3 Likes

Put me on team “deprecate old-style iteration protocol”, especially now that Python 2 is fully retired. We would never have added this if we had started with __iter__.

For those projects that depend on the old-style iterator, is copy-pasting this enough to fix them? Edit: Nope, see below

    def __iter__(self):
        from itertools import count
        for i in count():
            yield self[i]

You would also have to put the loop in a try/except block and turn IndexError into StopIteration, I think, so it has to be a bit longer.

If the current behavior stays, then opting out of it is much easier (__iter__ = None, as explained above). So the current behavior is more convenient for users too.
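
To spell that out: the data model documents that setting __iter__ to None makes iter() raise TypeError instead of falling back to __getitem__. A minimal sketch (NotIterable is an illustrative name):

    class NotIterable:
        __iter__ = None  # blocks both iter() and the __getitem__ fallback

        def __getitem__(self, index):
            if index >= 3:
                raise IndexError(index)
            return index

    # iter(NotIterable()) now raises TypeError, while indexing
    # still works: NotIterable()[0] == 0.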

I think the boilerplate is this, because since Python 3.7 (PEP 479) generators are supposed to return instead of raising StopIteration.

    def __iter__(self):
        from itertools import count
        try:
            for i in count():  # 0, 1, 2, ... just like the old protocol
                yield self[i]
        except IndexError:
            # IndexError means end-of-sequence here; return rather than
            # raising StopIteration (PEP 479).
            return

Rather, I see the current implementation of the iteration protocol as unnecessarily complex, unexpected, and confusing. I think that deprecating and later removing the old protocol would help make Python more accessible.

2 Likes

I don’t think it is really unexpected that

    for x in foo:
        do_something(x)

is similar to

    i = 0
    while True:
        x = foo[i]
        do_something(x)
        i += 1

with the extra details about IndexError stopping the iteration instead of being raised.
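
Spelled out with that detail included, the fallback behaves roughly like:

    i = 0
    while True:
        try:
            x = foo[i]
        except IndexError:
            break  # the IndexError is swallowed and simply ends the loop
        do_something(x)
        i += 1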

In fact, many newcomers coming from C-like languages write the index-based equivalent (for i in range(len(foo))), and only later learn that Python has a nicer way of doing it.

The stuff about __iter__ is an extra layer on top of that, to customise iteration when you can’t or don’t want to use indexing.

They’re not broken, so they don’t need fixing.

I don’t understand the desire many people have to break other people’s code and make more work for them, especially when the feature they want to remove doesn’t affect them personally.

It’s not whether the fix is four lines or four hundred lines; it’s that code that works in Python 3.0 through 3.11 suddenly breaks when you try to run it in 3.whatever, and the person trying to run the code has to work out why, then fix it. If they are even capable of that (maybe they are using a sourceless, byte-code-only app, or maybe they’re an end user with no programming skill).

Legacy code that works is not broken, and we should only break it if we have a really good reason.

The time to have removed this, if it needed removal, was in 3.0, when we removed or changed a bunch of other things for aesthetic reasons (e.g. old-style classes). We didn’t remove it then. That should tell us something.

1 Like

I have never, not once, seen a beginner ask a question about iteration in Python that showed confusion about the existence of the old sequence protocol, and I have spent a lot of time helping beginners on various forums.

Or if I have, it was so long ago, and so minor, that I have completely forgotten it.

But I have seen a lot of people, beginners and experienced coders alike, including some true Pythonista gurus, get confused about the iterator protocol and what it takes for an object to be an iterator (as opposed to what it takes for an object to be iterable).

Even without the sequence protocol, the iterator protocol is complex:

  • Objects with __iter__ and __next__ methods are iterators.
  • The __iter__ method must return self.
  • Objects with only an __iter__ method which doesn’t return self are very common, but they aren’t iterators and don’t seem to have a name apart from “iterable”.
  • But “iterable” also includes iterators.
  • If the __next__ method raises StopIteration, it must forever afterwards raise StopIteration. Otherwise it is officially broken.
  • People think that range() objects are iterators; they are not.

Compared to that, the sequence protocol is simple and straightforward! :wink:

1 Like

Lemme clarify a bit.

  • Objects with an __iter__ method are iterable. This method should return an iterator.
  • Objects with __iter__ and __next__ methods, where __iter__ returns self, are iterators.
  • If the __next__ method raises StopIteration, it must forever afterwards raise StopIteration. Otherwise it is officially broken. But broken iterators do happen.

(And your comment about range objects is part of that distinction: a range object is iterable, but it is not an iterator.)
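
A quick interpreter session showing the distinction:

>>> r = range(3)
>>> next(r)  # not an iterator: range has no __next__
Traceback (most recent call last):
  ...
TypeError: 'range' object is not an iterator
>>> it = iter(r)  # iter() returns a separate range_iterator object
>>> next(it)
0
>>> iter(it) is it  # an iterator's __iter__ returns itself
True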

2 Likes

It’s not really “broken” if an iterator provides a way to rewind, advance, or otherwise modify the iteration. It just has to be used responsibly. For example, an io file object is an iterator over the lines in a file that also supports seek() to the beginning or end, or to a byte offset (or an opaque tell() value if it’s text I/O). To demonstrate:

>>> import io, os
>>> f = io.StringIO('1\n2\n3\n')
>>> next(f)
'1\n'
>>> offset = f.tell()
>>> list(f)
['2\n', '3\n']
>>> f.seek(offset)
2
>>> list(f)
['2\n', '3\n']

>>> f.seek(0)
0
>>> next(f)
'1\n'
>>> f.seek(0, os.SEEK_END)
6
>>> list(f)
[]

2 Likes

According to the documentation, it’s still broken. Broken things can still be useful, but you can expect bizarre behaviour from them around their brokenness.

https://docs.python.org/3/glossary.html#term-iterator

I know of the “deemed broken” wording, but I don’t like that phrasing. I think an iterator is only strictly broken when __next__() doesn’t continue to raise StopIteration if nothing else has intentionally modified the iteration state. I’d have no misgivings if the docs stated that such cases are “undefined behavior” in the iteration protocol. For example, a dependent iterator probably won’t or can’t reset its state appropriately for a source iterator that has been resurrected like this. The contract is that once an iterator raises StopIteration, its consumer(s) can throw it away as exhausted. It’s a simple use once and discard mentality. Anything more complex requires coupling between the producer and consumer.

3 Likes

Well, okay. Change the wording from “broken” to “undefined behaviour”. Actually, that would be quite entertaining - it’ll set Steve D’Aprano off on one of his rants.

But either way, an external user of an iterator can’t know whether anything has modified the iteration state, so the idea that an iterator can be exhausted and then have more data is independent of any call to seek() etc. As I understand it, file objects have been broken in this way basically forever, and it hasn’t stopped them from being useful; but people shouldn’t be surprised if code like this fails:

    def mutate(it):
        for thing in it:
            yield thing.upper()

    with open("somefile") as f:
        lines = mutate(f)
        for line in lines:
            print(line)
        f.seek(0)
        for line in lines:
            print(line)  # prints nothing: the generator is already exhausted

If you know how the file object works, you can see a potential fix: just reinitialize the mutator each time. But the mutator isn’t required to cope with broken iterators, and I don’t think that it’s a problem to call the file object broken in this way.
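
Concretely, the fix looks something like this (a sketch; each call to mutate() builds a fresh generator over the rewound file):

    with open("somefile") as f:
        for line in mutate(f):
            print(line)
        f.seek(0)
        for line in mutate(f):  # a new generator over the rewound file
            print(line)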

Why not simply say that if an iterator raises StopIteration, clients are allowed to assume that it will always raise StopIteration on future calls to __next__()? That captures the key point here, while allowing other behaviour (iterators can offer a “reset” mechanism, and callers don’t have to make the assumption if they know better).

That’s pretty much what the docs say already:

Once an iterator’s __next__() method raises StopIteration, it must continue to do so on subsequent calls. Implementations that do not obey this property are deemed broken.

1 Like

My point is that the docs say that the iterator is “broken” if it violates that assumption. My version avoids making that judgement, and simply notes that clients can assume what happens next without checking. It’s of little consequence in terms of how people write code, but it might stop some of the arguments about whether something is a “proper” iterator in cases where it makes no practical difference.

But I’m not about to make a PR for the docs, so I don’t actually care that much.

If it walks like a duck, and quacks like a duck, it’s probably a duck. Even if it occasionally honks when you’re not looking :slightly_smiling_face:

2 Likes

A duck that honks is a Citroën 2CV :wink:

1 Like

Clients don’t need permission to make other assumptions about iterators after they’ve raised StopIteration. If you want to iterate over an iterator like this, there are no Python Police to stop you (although your peers may laugh at you behind your back):

    it = iter(some_iterable)
    for i in range(100):
        for obj in it:
            process(obj)

For most iterators, the last 99 attempts to iterate over it will be empty loops, but you never know when an exhausted iterator will suddenly recover and stop being exhausted. Right?
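
And yet with file objects, that is exactly what can happen:

>>> import io
>>> f = io.StringIO('spam\n')
>>> list(f)
['spam\n']
>>> list(f)  # exhausted...
[]
>>> f.seek(0)
0
>>> list(f)  # ...until it suddenly recovers
['spam\n']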

The risk is actually the other way. Here is a legitimate idiom that will fail if the iterator suddenly unexhausts itself:

    words = iter(words)
    # Process words before "STOP" in one way, and words afterwards
    # in another way.
    for word in words:
        if word == "STOP":
            break
        process_before_stop(word)

    do_some_more_stuff()
    # Now process words after "STOP"
    for word in words:
        process_after_stop(word)

We should be able to assume that if the first loop exhausts the iterator (i.e. the sentinel “STOP” either doesn’t exist or is the very last word), the iterator will remain exhausted forever, and the second loop will do nothing.

If iterators can be reset, then we don’t know whether do_some_more_stuff() has reset the iterator and broken our expectation that it is exhausted.

And that is why, technically, file iterators are broken.

But then file I/O is a very grubby case. Errors can be transient; files can be modified by other processes even in the middle of a read. Reading a file is not idempotent: there is no guarantee that two reads of the same file from the same position will give the same data, even if you are reading from read-only media. Computing would be so much cleaner and simpler if there were no I/O :slight_smile:

Describing an iterator as “broken” is a provocative thing to say. But this is Python, and if you want to shoot yourself in the foot, you can. Broken things can be useful. If you want to give your iterators a reset mechanism, you can, but then don’t be surprised if that breaks people’s expectations about iteration.

1 Like