"List inception" leading to out-of-memory and no exception

I had this brilliant idea of adding an element to the list which is the list itself. A nested list. This behaves well:

❯ python -c 'l = [1,2,3]; l.append(l); print(l)'                                                                                                                               
[1, 2, 3, [...]]

~
❯ python -c 'l = [1,2,3]; l.extend(l); print(l)'
[1, 2, 3, 1, 2, 3]

But if you use an iterator instead it behaves so bad that it fills up your RAM, swap space and completely freezes your machine. The code:

❯ python -c 'l = [1,2,3]; i = iter(l); l.append(i); print(l)'  # This is OK                                                                                          
[1, 2, 3, <list_iterator object at 0x7fb6ec4a78d0>]

❯ python -c 'l = [1,2,3]; i = iter(l); l.extend(i)'  # Don't try this at home OOM! 
❯ python -c 'l = [1,2,3]; l += iter(l)'  # Don't try this either: OOM!

I wanted to ask if this is a known wart. Eitherways, it might need fixing.
I haven’t seen anyone make a list like this, but this is too easy to break. I am using Python 3.7.4 on Linux if it makes any difference. See screenshot below:

What would you expect to happen instead? There are many ways to consume unlimited memory in Python and this is just one of them. I’m not sure that this should be considered a bug.

Indeed one can argue that this should be the behaviour. I tried to mimick this in C++ and I see my RAM filling up too. With this code:

#include <iostream>
#include <list>


int main() {
  std::list<int> l;
  l = {1, 2, 3};

  for (auto i = l.begin(); i != l.end(); ++i){
    // std::cout << *i;
    l.push_back(*i);
  }
}

But Python is not C++. There are safety mechanisms in place in Python to avoid such behaviour. I can show how dictionaries behave:

In [18]: d = {'a': 1, 'b': 2}

In [19]: d.update({k*2: 0 for k in iter(d)})  # No problem

In [20]: d = {'a': 1, 'b': 2}

In [21]: for k, v in d.items():
    ...:     d[2 * k] = 2 * v   # Error raised. Good!
    ...:
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-21-e70d39418520> in <module>
----> 1 for k, v in d.items():
      2     d[2 * k] = 2 * v
      3

RuntimeError: dictionary changed size during iteration

In [22]: d = {'a': 1, 'b': 2}

In [23]: for i in iter(d):
    ...:     d[i*2] = 0   # Again, Error raised. Good!
    ...:
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-25-7745a173c361> in <module>
----> 1 for i in iter(d):
      2     d[i*2] = 0
      3

RuntimeError: dictionary changed size during iteration

Why can’t lists be safer?

In [24]: l = [1, 2, 3]

In [25]: for i in iter(l):
    ...:     l.append(i)  # Get ready for oom!
    ...:

As for your question, I would expect either a RuntimeError or the behaviour like extend method while not using an iterator.

While we could catch this very specific case and raise an error for it, there is no way to catch all possible “stupid things” that people may do.

2 Likes

Agreed, but I think that a patch which changes this scenario to instead raise a RuntimeError wouldn’t be a bad thing. Especially if it’s already been implemented for dicts.

It is not what is implemented for dicts.

For dicts you will get an exception when you try to iterate a dict when mutate it wish changing the internal layout. This is because the internal layout is not visible outside, and the behavior is not predictable. But the behavior of a list is highly predictable, and there is a code which add or remove elements from a list while iterating it. It is useful when you know what you do.

dict.update() does not check if its argument is an items view or an iterator of an item view of the same dict. There are no reasons to do this.

1 Like

You’re correct, I can now see that the behaviors are not at all equivalent. The external visibility plays a significant factor, which I had not adequately considered before. Thanks for the explanation.

I can see it being useful to resize a list while iterating over its elements (A), but I’m not certain that it would ever be necessary to extend a list using an iterator of the list being iterated.

A)

ls = [1, 2, 3]
for i in ls:
    if conditional:
        ls.extend(other_iterable) # other_iterable being any iterable besides iter(ls)

B)

ls = [1, 2, 3]
for i in ls:
    if conditional:
        ls.extend(iter(ls))

Can you explain or present a scenario where it is necessary to use something along the lines of B? I’m finding it quite difficult to imagine one. At least to me, this looks like something that could reasonably raise a RuntimeError.

Whether or not it’s a worthwhile time investment to implement this behavior is a different story, but I don’t think an error being raised here would be at all unreasonable.

@storchaka Thanks for clarifying dicts. Perhaps the pattern

l = [1, 2, 3]
for i in iter(l):
    l.append(i)
    if condition(i):
        break

has some use, and is better left alone. However I cannot think of any use case for l.extend(iter(l)) or l += iter(l).

I cannot think of any use for i = i + 0 but we don’t raise an error for that either…

But I get your point that this causes a crash, so catching it would be nice. However, I’m not at all sure it’s anything like as simple to catch as you think it is. I suggest that if you’re interested in this, you (or someone else who is motivated) need to provide an implementation of the behaviour you’re proposing. There’s still no guarantee that it would get accepted (the maintenance cost of the new code needs to be considered, as well) but it would at least demonstrate that it’s possible.

To be honest, though, I don’t think it’s worth bothering. I think this is the sort of thing that people will encounter once (at most), work out what they did wrong, and never do it again. So I don’t think it’s worth checking for. (It’s even arguable that working out what your error was teaches you some useful things about iterators and lazy evaluation that you wouldn’t otherwise have known, so having the error occur is actually a positive… :wink:)

1 Like

That was the main premise of what I was trying to argue, but I’m starting to realize that the implementation details would be quite complicated. What would be the process for determining that a list is being extended by an iterator of itself?

With situations like this, we have to do a cost-benefit analysis and determine if the mistake is made frequently enough for it to justify the resources required to make this situation raise an error. But, as you said:

However, if someone is motivated enough to cause this behavior to raise an error, perhaps an initial implementation could be created using ast for demonstration purposes. I’m not at all experienced with using it myself, so I’m not certain as to what the process would involve.

Did you mean “… so not having the occur is actually a positive”? Presumably if the error did occur, the author would not have needed to work out the examples. I’m assuming that was a typo, but I just wanted to make sure. (:

If so, I agree. Perhaps this would serve as good instructional content. Lazy evaluation can be incredibly convenient, but it’s a double-edged sword. Improper usage can easily lead to unexpected behavior.

How would you use a new feature? Currently you can not use l.extend(iter(l)).

An iterator of the list is just one particular case. What about reverse(l), filter(None, l), map(int, l), zip(l), islice(l, 0, None), and an infinite number of other cases?

And what about set, dict, deque, array and other collectionhs?

Yep, that’s the key question.

Agreed, and honestly I think that this discussion alone has cost more than any benefit that would be gained. I can’t imagine any non-toy code that would try to extend a list with an iterator of itself. I can’t even think of a plausible accident that would result in this (excluding the OP’s original case of playing with self-referential lists, for educational purposes, where he’s probably learned more from the current behaviour than he ever would have from a simple error).

I expect that extending with a general infinite iterator (like, for example, itertools.repeat(10)) would be orders of magnitude more likely, and there’s no way even in theory to error on that case.

I meant “having the infinite loop error occur”. But yes, I was hopelessly unclear - you got my meaning, though, luckily :slight_smile:

2 Likes

Good point, it would probably be impossible to address all of the indefinite loop scenarios, even if someone was fully dedicated towards doing so.

From a development perspective, I think that I learned a decent amount as well. I was a bit too eager initially to modify the existing behavior because it looked similar to the dict behavior at a surface level. I was also too fixated on whether or not the error would “make sense”, when in reality that’s only a small part of the consideration.

The feedback from you and @storchaka helped me to more deeply consider the cost of implementing the behavior relative to the benefit it would provide. I’m personally glad that this discussion occurred. Thanks for the insight. (:

1 Like

Cool, and thanks to @ashwinvis for a question which generated an interesting discussion!

1 Like

See also:
https://bugs.python.org/issue31815
https://bugs.python.org/issue14010

2 Likes