How safe/documented is it to mutate a list being iterated?

Gouvernathor · September 10, 2022, 12:25am

It seems from various trials on my part that removing or adding items from a list, while iterating over the list, is fine and doesn’t cause any problem. But how much is it so, and is it safe to assume that behavior will hold true for future (and possibly past) versions of Python ?

Consider the following code :

>>> liz = list(range(16))
>>> for el in liz:
...     liz.pop(2)
>>> liz
[0, 1, 10, 11, 12, 13, 14, 15]

This seems to indicate that it’s safe to remove items just before and just after the one being considered, as well as the one being considered itself. But one example is not the same as a certainty.

My use-case would be to iter through a list of nodes, and some nodes may add other node for future consideration (but in a specific order, not at the very end of the list). And we don’t want to lose formerly considered nodes.
So, the solution we’re considering is roughly this :

nodes_list.sort()
for i, node in enumerate(nodes_list):
    if node.adds_others:
        others = sorted(node.others())
        formers = nodes_list[i+1:]

        new_end = merge_sorted(formers, others)
        # zip two sorted lists into one sorted list
        # it's heapq.merge really

        nodes_list[i+1:] = new_end

We didn’t manage to crash this yet. Is it safe ?

Rosuav · September 10, 2022, 12:31am

Depends on your definition of “safe”. You should be able to be 100% confident that you will not segfault Python with code like this, for instance, so in that sense it is “safe”.

I think the code you have is probably going to work fine, but it might not be the cleanest way to describe it. My reading of it is that you’re trying to iterate through things, always hitting each element once (I assume that if node.others() returns a node you’ve already iterated over, you want to tag it again later), and since you’re merging sorted lists, it looks like you really want a priority queue. You mention heapq.merge, suggesting that you’re aware of the existence of heaps, but I can’t see what about this code means you can’t simply use an actual heap for it. Maybe I’m just not comprehending your code, but that itself might be a clue that something’s more complicated than it needs to be

Gouvernathor · September 10, 2022, 12:37am

I’m not sure what a heap is, and I didn’t write the code either, I just know in what context it’s supposed to fit in and what feature(s) it will bring - if it works.
Something you wondered about, and here is the answer, is that .others() won’t (ever) return a node that’s already been considered.

Also, one aspect I forgot to mention is that mutating sets while iterating through them often raises exceptions, and I’m fairly sure I’ve seen dicts do so too, my question is (roughly) whether that can happen in the case of lists.
My conception of “safe” is this : 1) it won’t raise an exception, 2) it won’t end iterating before all the nodes in the list have been considered. Including the case where the last node in the list adds other nodes at the end of the list.

Rosuav · September 10, 2022, 1:02am

Oh! Okay. In that case, lemme introduce you to heaps!

A heap in Python is built on top of a list, but instead of sorting the list completely, you kinda “half sort” it according to some rules (the details can be found in the heapq module’s docs). Importantly, the element at the start of the list is the one with the lowest value, just like it would be if you sorted the list.

Instead of sorting node.others() and then merging it with the remainder of the list, what you’d do is simply add the others into the heap. This can be done fairly efficiently.

And instead of iterating over the list in strict order, you repeatedly ask the heapq module to pop the first element off the list; it’ll then rebalance things so that popping the first element off is always reasonably fast.

Ultimately, what you get is a heap containing all the things you haven’t looked at yet. (There’s no record of all the ones you have looked at; if you want that, push them into a list as you process them.) Everything runs in O(log n) time, which is generally going to be fast enough for what you need.

The biggest advantage of this switch is that you no longer need to wonder whether the iteration and mutation will raise an exception, because the iteration logic basically looks like this:

while nodes_list:
    node = heapq.heappop(nodes_list)
    if node.adds_others:
        for new_node in node.others():
            heapq.heappush(nodes_list, new_node)
    # process the node as normal

This is now completely normal and well-understood, and is most definitely safe. Your two (or three) concerns are assured:

It won’t raise an exception. Each step is individually safe (heappop will raise if the heap is empty, but it’s only called after the while loop has checked that there’s still stuff to process), so you can be confident that this is safe.
It won’t end iterating before all nodes in the list have been considered. Safe because adding to a heap correctly means the loop will continue.
Including where the last node adds others. This is where the order of checks becomes important; the while loop only checks for content in the list after any potential nodes have been added to the heap. So the list might become empty between the heappop and the subsequent heappush calls, but that won’t end the loop.

A heap is a cool data structure that looks, to me, like someone went “wow, keeping a list sorted is just WAY too much trouble and I can’t be bothered”. It’s a lazy approach to sorting that just does the barest minimum to get by, and then lets tomorrow take care of itself. In real-world usage, it turns out that laziness is actually really practical!

Gouvernathor · September 10, 2022, 1:08am

Ok, thanks very much ! We need to keep a list of nodes in consideration order after iteration is over, but I understand how to do that from your code and we’ll try to adapt it to our context.

Rosuav · September 10, 2022, 1:19am

Cool, should be easy enough to add them to a list as they’re processed; adding onto the end of a list is fast (amortized constant time, if you’re curious).

steven.daprano · September 10, 2022, 10:34am

[quote]
Also, one aspect I forgot to mention is that mutating sets while
iterating through them often raises exceptions, and I’m fairly sure I’ve
seen dicts do so too, my question is (roughly) whether that can happen
in the case of lists.

In general, it is not “safe” to mutate data structures (in the sense of adding or deleting elements) while iterating over them. Unless you are very careful, you might miss elements, or iterate over the same element twice.

It should be safe to mutate them in the sense of modifying or replacing the element.

# Not safe.
for i, obj in enumerate(mylist):
    if obj is None:
        del mylist[i]

# Safe.
for i, obj in enumerate(mylist):
    if obj is None:
        mylist[i] = "Hello"

Dict iteration specifically checks to see if the length of the dict changes, and if it does, it will raise an exception. Set iteration does the same thing (I think). But that doesn’t notice if you delete an item and then add another item in its place.

Lists have no such protection, because (with care!) there are safe algorithms for mutating the list as you iterate over it. You just have to be more careful while doing so.

steven.daprano · September 10, 2022, 11:00am

By the way, Discourse does not recognise “py3” as a language for code blocks. You have to use “python”.

Oh heavens no!!! You’ve either been very lucky, very clever, or just haven’t noticed the problems as they occur.

L = [1, 1, 2, 1, 3, 4]
# Remove all 1's from the list.
for i, x in enumerate(L):
    print("inspecting element", x, "at position", i)
    if x == 1: del L[i]

print(L)

and the output is:

inspecting element 1 at position 0
inspecting element 2 at position 1
inspecting element 1 at position 2
inspecting element 4 at position 3
[1, 2, 3, 4]

Your code with the nodes inserting new nodes seems okay to me. I think it should be okay because you don’t delete any elements at or before the current position, you only insert new elements ahead of the current position. So iteration will continue without skipping anything, or ending early.

But to quote Donald Knuth:

“Beware of bugs in the above code; I have only proved it correct, not tried it.” - Donald Knuth.

I would want many, many, carefully-thought unit tests to verify the behaviour, carefully checking all the corner cases I can think of.

dstromberg · September 22, 2022, 3:42am

When I’m going to iterate over a datastructure that I also want to mutate in the same loop, I iterate over a copy.deepcopy(ds), and modify ds.

There are many heaps available for Python: