Deprecate "old-style iteration protocol"?

Note: types implementing __getitem__ but not __iter__ are NOT “assumed to be infinite lazy sequences” by default.

class FiniteGetItemExample:
    def __getitem__(self, key):
        if key in range(10):
            return key
        else:
            raise IndexError

list(FiniteGetItemExample())
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

About __getitem__ / __iter__ connection: many people coming from other languages (Java and its kin) code iteration unpythonically as a loop over getting items. The default implementation eases that path (no idea if that’s the reason it was included in paleo-Python).

Infinite lazy sequences are not the default. You have to give your class an __getitem__ method, and that method has to violate the expected behaviour of a finite sequence or mapping.

Your class does that, and the consequence is that it behaves as an infinite lazy sequence. If you don’t want it to behave as an infinite lazy sequence, then don’t program it to be an infinite lazy sequence.
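The class under discussion isn’t shown in this thread, but a minimal sketch of the kind of class being described (IdentityMap is a hypothetical name) behaves exactly this way:

```python
from itertools import islice

# Hypothetical sketch of the kind of class under discussion: __getitem__
# never raises IndexError, so iter()'s sequence-protocol fallback yields
# self[0], self[1], self[2], ... without end.
class IdentityMap:
    def __getitem__(self, key):
        return key

# list(IdentityMap()) would never terminate; islice caps the consumption.
print(list(islice(IdentityMap(), 5)))  # [0, 1, 2, 3, 4]
```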

The Principle of Least Astonishment properly applies to user interfaces, not APIs, and refers to what ordinary users expect, not what skilled programmers expect. A widget that looks like a push button should behave as a push button. A hyperlink stating that it takes you to a glossary of terms should actually go to a glossary of terms.

We can extend the principle to APIs, but then the problem becomes that we no longer have anything even remotely close to “an ordinary user”. Programmers differ far more in what they consider “astonishing”, and that astonishment changes greatly as they become more experienced.

You might be surprised that your “identity_map” class behaves as an infinite lazy sequence, but I wasn’t: I recognised it the moment I saw the __getitem__ method, and to me there is no surprise that it iterates forever.

To me, it would be astonishing if it didn’t iterate forever.

So the POLA is a very weak argument when it comes to software APIs. Surprising to whom?

Don’t just dismiss backwards compatibility like that. The sequence protocol goes back to Python 1.x days, it is much, much older than the iteration protocol, and the Python language takes backwards compatibility very, very seriously.

Given the choice between

  1. breaking some unknown number of third-party scripts and applications,

  2. and requiring people who don’t want their subscriptable classes to be iterable to add one extra line of code to their class

we’re going to choose 2 unless there is some really, really, big and important reason to break people’s code.

“Some people, who haven’t learned about the sequence protocol, might be surprised; other people just don’t like it” is not a big important reason.

You should. That means that any time you call list(obj) on some unknown iterator, you have no idea whether it is going to terminate or loop forever.

And depending on your OS and the version of Python, that could simply raise MemoryError after some indefinite time (which your program probably doesn’t handle), or in the worst case it could lock up your computer to the point it needs a hard reset, or cause the OOM Killer to start randomly killing processes.

I’ve had both happen to me, although fortunately not on production servers!

Given how easy it is to write iterators that run forever, and how useful it is, I’m actually surprised that there aren’t more problems in practice with list not terminating.
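One defensive pattern (a sketch, not the only option; bounded_list and its limit are hypothetical names, not stdlib) is to cap consumption with itertools.islice, so an accidentally infinite iterable fails loudly instead of exhausting memory:

```python
from itertools import count, islice

def bounded_list(iterable, limit=1_000_000):
    # Defensive sketch: consume at most `limit` items and fail loudly if
    # the cap is hit, instead of letting an accidentally infinite iterable
    # eat all available memory.
    items = list(islice(iterable, limit + 1))
    if len(items) > limit:
        raise OverflowError(f"iterable produced more than {limit} items")
    return items

print(bounded_list(range(3)))  # [0, 1, 2]
# bounded_list(count()) raises OverflowError instead of hanging.
```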

Correct. It is a method, not the method. Iteration in the Python data model corresponds to a pair of protocols, the iteration protocol and the sequence protocol.

No, that’s not what happens. Your “identity_dict” class has no __iter__ method.

What actually happens is that the iter() builtin accepts objects that follow the sequence protocol as well as the iteration protocol.

A side-effect of this is that any code that tests whether an object is iterable by looking only for __iter__ is wrong.
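A quick demonstration of why checking only for __iter__ is wrong, and a sketch of the more robust approach of asking iter() itself (Seq and is_iterable are illustrative names):

```python
class Seq:
    # Sequence protocol only: no __iter__ anywhere on the class.
    def __getitem__(self, i):
        if i < 3:
            return i
        raise IndexError

s = Seq()
print(hasattr(s, "__iter__"))  # False -- yet the object IS iterable
print(list(s))                 # [0, 1, 2]

# The robust test is to ask iter() itself:
def is_iterable(obj):
    try:
        iter(obj)
        return True
    except TypeError:
        return False

print(is_iterable(s))  # True
```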

Perhaps it was a mistake to unify sequence indexing and mapping subscripting with a single dunder method, but given that they both use the same syntax obj[x] it is hard to see how they could use different dunders. Because Python is dynamically typed, the compiler cannot tell whether obj[x] is a sequence that expects an index or a mapping that expects a key. So that’s a language limitation we have to live with.

(Although for code written using the C stable ABI, there actually is such a method to distinguish the two.)

Which doesn’t solve your problem right now. Right now, you still need to prevent your identity_dict object from being iterated. You can’t afford to wait until Python 3.15 or 3.16 or even some far distant 4.0 version in another 30 years.

So deprecation doesn’t help you, it just inconveniences those whose working code will start raising annoying warnings and then eventually stop working.

Deprecating working language features is not a step we take lightly. Obviously there have been changes to the language, but they are mostly additions, and relatively few subtractions. Some code, written for Python 1.4 or 1.5 and perhaps even older, is still capable of running under Python 3.10 or 3.11.


If you want something to be dict-like and NOT iterable, you can simply block iteration:

>>> class Identidict:
...     def __getitem__(self, key):
...             print(key)
...             return key
...     __iter__ = None
... 
>>> d = Identidict()
>>> next(iter(d))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'Identidict' object is not iterable

I think it was pretty clear what I meant. Types that implement __getitem__ are assumed to be “infinite lazy sequences” by default. If you want a finite sequence, you must raise IndexError. If you want a non-lazy sequence, you must define __len__. If you want the collection to be a non-sequence, you have to implement __iter__ or set it to None.

An object that provides access to its elements via a key or index doesn’t need to be a Sequence, so the fact that Python implicitly assumes so is wrong.

I am sorry, but this is just wrong. Quoting directly from the wikipedia article you linked:

In more abstract settings like an API, the expectation that function or method names intuitively match their behavior is another example.

Programmers are users. They are users of the API. “Skilled programmers” are just as human as “normal” users. If we follow your logic, we wouldn’t need error checking and intuitive APIs. “Skilled programmers” would just never make mistakes and would be able to instantly debug any problem in their code (even without useful error messages).

Yes, POLA is a subjective metric. That doesn’t make it a useless metric. This behaviour is a “gotcha”. Just because you are aware of this gotcha doesn’t make it any less surprising to people who haven’t stepped on this proverbial rake before.

Again, quoting directly from the wikipedia article you linked:

in particular a programmer should try to think of the behavior that will least surprise someone who uses the program, rather than that behavior that is natural from knowing the inner workings of the program.

The reason this behaviour is not surprising to you is that you are familiar with it; you know “the inner workings” of this API. Without this knowledge, this behaviour ceases to be “obvious”.

The way __getitem__ automatically implies a “default” sequence-like __iter__ implementation does not match the rest of the Python data model.

  1. You don’t magically get a __mul__ implementation for integers after defining __add__. You would be surprised if x * 3 automatically evaluated to x + x + x when x.__add__ existed but x.__mul__ did not, right?
  2. This behaviour is surprising, because it doesn’t intuitively follow from “defining a method that gets an item”.

Again, I think it’s pretty clear what I meant here. I meant that itertools.count() terminating or not is irrelevant to the problem described in my original post. Let’s not get sidetracked: this thread is about the fact that implementing only __getitem__ implicitly makes your collection a Sequence.

Yes, but this is a distinction without a difference as far as I am concerned. Again, I think my point is perfectly understandable here:

  1. The programmer didn’t create an __iter__ method
  2. Python saw that there is a __getitem__ method and assumed that the collection is a Sequence
  3. Code calling iter or iterating over the collection or checking for element membership behaves as if a “default” __iter__ implementation was present, due to faulty assumption (2)

Yes, hasattr(foo, "__iter__") is False, but at this point you seem to be nitpicking the details. If it acts like a duck, walks like a duck and quacks like a duck… These types act like there is a default __iter__ implementation.

As I mentioned in my original post, collections.abc.Iterable from the standard library is also “wrong” in this way (yes, it is a documented behaviour, but an inconsistent documented behaviour).

My problem right now is that (non-sequence) collection types that define __getitem__ and not __iter__ behave “surprisingly”. Often, this surprising behaviour results in infinitely hanging code, exceptions in unrelated places, and sometimes even silently incorrect code. DeprecationWarnings absolutely would make the situation better.

And some code written in Python 3.9 is not capable of running under Python 3.10. Also, Python 2 went EOL two years ago. People who still run code that was written for Python 1 and never updated are unlikely to be using Python 3.10 right now anyway.

I am not saying that we should “move fast and break things”, but let’s not pretend that any change that breaks backwards compatibility should be instantly rejected just because some Python 1 code might no longer work.


Two different people answered you the same way, so if you were misunderstood, what you meant wasn’t “pretty clear”.

But I don’t think you were misunderstood. I think you were, and still are, simply incorrect in your claim that classes with __getitem__ (but without __iter__) are “assumed to be infinite lazy sequences by default”.

The interpreter makes no assumption about your class being lazy or infinite. The fact that your class behaves as a lazy infinite sequence is because you programmed it to behave as a lazy infinite sequence (as well as an infinite mapping).

If the user of your identity class were to write this:


import random

obj = identity_dict()  # Your class.

key = random.random()
while True:
    try:
        print(obj[key])
    except LookupError:
        break
    key = random.random()

it would still loop forever and never terminate. And the nature of the identity dict is such that there is nothing you can do to prevent this infinite loop except tell your users “Don’t do that!” and make them aware that it is a lazy, infinite mapping.

At least with iteration (for-loops or list) you can disable that completely by setting the iter dunder to None.

It isn’t lazy and infinite “by default”, it is lazy and infinite because you programmed it to be that way.

Well yeah.

How else are you going to signal that the sequence (or mapping) doesn’t have an index/key if you don’t raise an exception?

For real collections (sequences or mappings) you have some actual data structure with a finite size, so this issue doesn’t come up. When you run out of data, you get a LookupError (IndexError or KeyError).

Your mapping doesn’t have actual data. It lazily simulates fake data, and does so without terminating. So what did you expect to happen if the caller repeatedly looks up indexes/keys over and over again?

If every key always succeeds then that implies that it is infinite and lazy.

If it was not your intention to write an infinite, lazy collection then it is your code that is buggy, not the language. If it was your intention, then congratulations, you succeeded, and the Python interpreter did exactly what you told it to do, which was to loop forever.

Right. That’s because iteration in Python can use two different protocols, with __iter__ taking precedence.

Your class is not considered a Sequence. isinstance(identity_dict(), collections.abc.Sequence) returns False.

But your class behaves sufficiently like a sequence in this regard, because you programmed it to behave like a sequence.

In other words, your class might not swim or fly like a sequence, but it quacks like a sequence. For a task like iteration that only requires quacking, your class might as well be a sequence.

This sort of duck-typing is built deep in the Python execution model. If you don’t want it, I’m afraid you are using the wrong language :frowning:

Dunders are used by the interpreter to implement certain behaviours. If you write the dunder, you are responsible for that behaviour. The interpreter isn’t assuming anything – you have explicitly written the dunder to cause your class to behave in the way that it then behaves.

As far as your issues with the POLA, we’re going to simply have to disagree on this one.

You wrote a class with a dunder used to define sequence behaviour, and then were surprised that your class behaved like a sequence. I was not.

Right. And that is why arguments from the POLA are very weak when it comes to APIs, beyond such obvious and trivial examples that (let’s say) a function called “print” should print.

APIs consist of much more than just function names. Knowledge of protocols can be surprising if you don’t know the protocol, but that is not a violation of POLA. That’s just lack of knowledge.

This is just wrong. __getitem__ does not imply a default __iter__ implementation.

This is the second time you have made that wrong claim about a default __iter__, please stop repeating that misinformation. Your class has no __iter__ method, the interpreter does not add one, and iteration using __iter__ is only one of two ways that iteration is defined in Python.

Just as the str() builtin falls back on __repr__ when __str__ doesn’t exist, and the not-equal operator falls back on __eq__, and most operators have a reversed __rop__ method. You cannot assume that operations in Python only use a single dunder.
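A small demonstration of those fallbacks (Point is an illustrative class):

```python
class Point:
    def __init__(self, x):
        self.x = x
    def __repr__(self):          # no __str__ defined anywhere
        return f"Point({self.x})"
    def __eq__(self, other):     # no __ne__ defined anywhere
        return isinstance(other, Point) and self.x == other.x
    def __hash__(self):
        return hash(self.x)

p = Point(1)
print(str(p))         # "Point(1)" -- str() fell back on __repr__
print(p != Point(2))  # True -- != derived by negating __eq__
print(3 * [0])        # [0, 0, 0] -- int.__mul__ declines, list.__rmul__ runs
```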

No, but then I am something of a mathematician of sorts, so it wouldn’t surprise me for multiplication to fall back on addition. That is the most natural thing in the world.

I beg to differ. It is not surprising, because it does intuitively follow from getting an item.

The most simple, natural, intuitive form of iteration is:

  • get the first item (index 0 in Python);

  • get the second item;

  • get the third item;

  • get the fourth item;

etc, halting when there are no more items.
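That procedure is essentially what the sequence-protocol fallback does. A rough sketch (old_style_iterate is a hypothetical name, not the interpreter’s actual implementation):

```python
def old_style_iterate(obj):
    # Ask for obj[0], obj[1], obj[2], ... until IndexError signals the end,
    # roughly mirroring the fallback behaviour of iter().
    i = 0
    while True:
        try:
            yield obj[i]
        except IndexError:
            return
        i += 1

print(list(old_style_iterate("abc")))  # ['a', 'b', 'c']
```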

It is a critical distinction with obvious consequences, not the least of which is that testing for iterability by looking only for an iter dunder is not sufficient.

Iteration is not controlled only by the presence of an iter dunder. If you thought it was, you were wrong. It is as simple as that.

You lacked knowledge about the design of Python. Your lack of knowledge doesn’t mean that the interpreter is wrong, or that we should break working code to bring Python back into line with your incorrect assumption about iteration. It just means you lacked knowledge.

Now you know better. Congratulations. You are a more knowledgeable Python programmer today than you were two days ago.

In a practical sense, it is highly unlikely that deprecation would make it into 3.11, so you shouldn’t expect it before 3.12. At which point it will likely be a silent deprecation. Unless you run Python with all warnings enabled, which hardly anyone does, you probably wouldn’t see the warning until 3.13 or 3.14.

Point of note: collections.Counter() behaves as if every possible element is in it, yet it isn’t infinitely iterable. This isn’t a problem, since it has a different definition of iterability, but it goes to show that synthesizing results in response to __getitem__ doesn’t mean they have to be iterated over.
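A quick demonstration of that point:

```python
from collections import Counter

c = Counter("aab")
print(c["z"])     # 0 -- any missing element reports a zero count
print(sorted(c))  # ['a', 'b'] -- but iteration only visits stored keys
```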

Steve, I know how you adore a good argument, but just because you can catch the OP on a few technicalities (like talking about an implicit __iter__ function, when really what’s under discussion is iterability in general - yes, congratulations, you found a technical error), don’t assume that the argument’s merits do not exist. It IS surprising that some forms of __getitem__ will make an object iterable and others will not. For instance, this one will not:

>>> class X:
...     def __getitem__(self, item):
...             if isinstance(item, float): return item
...             raise KeyError
... 

But this one will:

>>> class X:
...     def __getitem__(self, item):
...             if isinstance(item, int): return item
...             raise KeyError
... 

Yes, it’s documented. It doesn’t mean it won’t be surprising.

(And yes. Programmers most certainly ARE users, and the Principle of Least Astonishment absolutely DOES apply. I have had the unpleasant experience of working with a number of highly surprising APIs, and it is not something to wish on one’s worst enemy.)

(Unless your worst enemy is a self-aggrandized Wordpress “expert” who charges exorbitant rates for minimal work, in which case (a) they deserve everything that PHP can throw at them, and (b) you deserve a better enemy.)


Correct, this class is both lazy and infinite. However, this class is not a sequence. It is an infinite, lazy collection. You cannot derive the ability to iterate over a collection from only the ability to get a value from a key/index.

The __getitem__ method intuitively seems like a method that ought to provide only the ability to get a value from a key/index, but in practice it also provides the ability to iterate over the collection by assuming that it is a sequence (unless you explicitly opt out by setting __iter__ manually).

You can tell that it assumes it’s a sequence, because it iterates only over non-negative integers and checks for IndexError (instead of iterating over every possible value (which is impossible, of course) and checking for LookupError).
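You can observe this by logging every key the interpreter requests during iteration (Probe is an illustrative class):

```python
class Probe:
    # Record every key the interpreter asks for while iterating.
    def __init__(self):
        self.requested = []
    def __getitem__(self, key):
        self.requested.append(key)
        if key >= 3:
            raise IndexError
        return key

p = Probe()
list(p)
print(p.requested)  # [0, 1, 2, 3] -- only consecutive non-negative ints
```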

Right. And I am arguing that one of these two protocols ought to be deprecated. That would make the concept of “iterable” correspond 1-to-1 with the __iter__ method.

It behaves like a sequence despite not being a Sequence, because that’s what the current language semantics entail (aka the current data model semantics assume that this class is a (lowercase s) sequence). And I am arguing that these semantics are bad and unintuitive.

Iteration is not controlled only by the presence of an iter dunder. I knew that it wasn’t. I thought that it shouldn’t be. “This is how it currently is” is not a good argument against changing things.

I find your approach to this discussion needlessly condescending. My argument is “this is how it currently is, and I think that it is unintuitive and should be changed”, not “I don’t understand what is happening here, please help”.

See prior discussions regarding and related to this subject:


OK. This is a plausible suggestion. I suggest that we focus on this as the rest of the discussion is getting heated and frankly neither productive nor particularly interesting.

The fallback to __getitem__ with an index for iteration was from the days before the iterator protocol, and was included when that protocol was added for backward compatibility. However, it was not, and never has been, deprecated, so it is still a valid way of creating a class that you can iterate over (I’m not using the word “iterable” here because I don’t want to get sucked into the nitpicking debate).

It could be deprecated, but the benefit is small (it’s not that confusing, and there aren’t that many cases where it causes an issue, despite your comments - all your examples are based around a single class that you wrote; you haven’t provided any evidence that this is a widespread problem). And we have little or no evidence that the downside would be similarly small - no one has surveyed how many classes still rely on this fallback, or how widely used they are.

So if you really want to push for deprecation, I think you need to focus on those practical points - cost and benefit - rather than debating theory and opinion, which as far as I can tell is what this discussion is tending towards.

But frankly, I think it’s a waste of time. You can fix your class with a one-line addition (__iter__ = None). If others have this issue, they can do so too. The time spent already on this discussion is far greater than the time it would take to simply fix the problem in your class. If it bothers you that much, add a comment to the class explaining how you wish you didn’t have to do this, but you must in order to work with (at least) Python 3.11 and earlier, and there’s no sign yet that Python is likely to change…


Fair point. Do we have a representative collection of Python code that could be scraped for statistics like this? I could probably hack something together with BigQuery or the like, but I would prefer to avoid that if we already have some tooling for this. I think I remember seeing somebody gathering similar stats on the bug tracker/Discourse previously.

I actually already did that. :laughing:


Please remember that not all Python code is open source or publicly available.

The best we can do is find a lower bound on classes that use the sequence protocol.

It may or may not be representative of proprietary code, but there’s grep.app to search GitHub, which can be limited to Python files—of course, with any of this, you have to be able to form a query that will capture what you are looking for, which may not be entirely obvious here.

If it’s any solace, type checkers don’t recognise old-style iterables as compatible with Iterable. And in all the years of mypy, this missing support has only come up a couple of times, and even less once people mostly stopped using Python 2.

I’m aware of one popular library (torch.utils.data.Dataset) that relies on old style iteration for providing iterability. I tried to get them to use __iter__, but they claimed some users had use cases where it wasn’t actually an iterable. Of course, most code assumes that it is iterable (including plenty of code in torch), so I wasn’t sympathetic to that concern. But I didn’t feel like arguing the point :slight_smile:


Put me on team “deprecate old-style iteration protocol”, especially now that Python 2 is fully retired. We would not have added this today if we started with __iter__.

For those projects that depend on the old-style iterator, is copy-pasting this enough to fix them? Edit: Nope, see below

    def __iter__(self):
        from itertools import count
        for i in count():
            yield self[i]

You would also have to put the loop in a try: except: block and turn IndexError into StopIteration I think, so it has to be a bit longer.

If the current behavior stays then opting out of it is much easier (__iter__ = None as explained above). So current behavior is more convenient to the users too.

I think the boilerplate is this because generators are supposed to return instead of raise StopIteration since Python 3.7.

    def __iter__(self):
        from itertools import count
        try:
            for i in count():
                yield self[i]
        except IndexError:
            return
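Applied to a minimal class that previously relied on the fallback (OldStyle is a hypothetical example), that explicit __iter__ behaves the same way:

```python
from itertools import count

class OldStyle:
    # Sequence-protocol style __getitem__: finite, ends via IndexError.
    def __getitem__(self, i):
        if i < 3:
            return i * 10
        raise IndexError
    # The explicit __iter__ boilerplate from the post above, inlined here.
    def __iter__(self):
        try:
            for i in count():
                yield self[i]
        except IndexError:
            return

print(list(OldStyle()))  # [0, 10, 20]
```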

I rather see the current implementation of the iteration protocol as unnecessarily complex, unexpected and confusing. I think that deprecation and later removal of the old protocol would contribute to making Python more accessible.


I don’t think it is really unexpected that

for x in foo:
  do_something(x)

is similar to

i = 0
while True:
  x = foo[i]
  do_something(x)
  i += 1

with the extra detail that IndexError stops the iteration instead of being raised.

In fact, many newcomers coming from C-like languages write the while-loop equivalent (or for i in range(len(foo))) and use indexing, and they learn later that Python has a nicer way of doing it.

The stuff about __iter__ is an extra layer on top of that to customize it when you can’t / don’t want to use indexing for iteration.

They’re not broken, so don’t need fixing.

I don’t understand the desire many people have to break other people’s code and make more work for other people, especially when the feature they want to remove doesn’t affect them personally.

It’s not whether the fix is four lines or four hundred lines, but that code that works in Python 3.0 through 3.11 suddenly breaks when you try to run it in 3.whatever, and the person trying to run the code has to work out why, then fix it. If they are even capable of it (maybe they are using a sourceless, byte-code-only app, or maybe they’re an end user with no programming skill).

Legacy code that works is not broken, and we should only break it if we have a really good reason.

The time to have removed this, if it needed removal, was in 3.0, when we removed or changed a bunch of other things for aesthetic reasons (e.g. old-style classes). We didn’t remove it then. That should tell us something.


I have never, not once, seen a beginner ask a question about iteration in Python that was confused about the existence of the old sequence protocol, and I have spent a lot of time helping beginners on various forums.

Or if I have, it was so long ago, and so minor, that I have completely forgotten it.

But I have seen a lot of people, beginners and experienced coders alike, including some true Pythonista gurus, get confused about the iterator protocol and what it takes for an object to be an iterator (as opposed to what it takes for an object to be iterable).

Even without the sequence protocol, the iterator protocol is complex:

  • Objects with __iter__ and __next__ methods are iterators.
  • The __iter__ method must return self.
  • Objects with only an __iter__ method which doesn’t return self are very common, but they aren’t iterators and don’t seem to have a name apart from “iterable”.
  • But “iterable” also includes iterators.
  • If the __next__ method raises StopIteration, it must forever afterwards raise StopIteration. Otherwise it is officially broken.
  • People think that range() objects are iterators; they are not.
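A quick check of that last point:

```python
r = range(3)
print(hasattr(r, "__next__"))  # False -- range objects are not iterators
it = iter(r)
print(it is r)                 # False -- iterating produces a separate object
print(next(it), next(it))      # 0 1
print(list(r))                 # [0, 1, 2] -- r itself is a reusable iterable
```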

Compared to that, the sequence protocol is simple and straightforward! :wink:
