Inconsistent/undocumented list extension behaviour

This feels like either a documentation issue or an actual bug; I’m not sure which.

Suppose you try list.extend(iterable) but iterable raises an exception. What should the list now contain?

There are two obvious and maybe even equally reasonable options:

  • The list is unchanged
  • Items yielded before the exception occurred are added

The documentation is silent on which occurs. What it does say, however, is that list.extend is “equivalent to a[len(a):] = iterable”.

Well…

def gen():
    yield from range(100,105)
    raise RuntimeError('whoops')

try:
    a = list(range(5))
    a.extend(gen())
except RuntimeError:
    pass
finally:
    print(f'{a=}')

try:
    b = list(range(5))
    b[len(b):] = gen()
except RuntimeError:
    pass
finally:
    print(f'{b=}')

…prints:

a=[0, 1, 2, 3, 4, 100, 101, 102, 103, 104]
b=[0, 1, 2, 3, 4]

…so they’re not precisely equivalent. Should they be?

3 Likes

If you need the list-unchanged behaviour, you can always use a.extend(list(gen())): list() raises before extend() is ever called, so the list stays untouched.
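To spell that out (a small sketch using the generator from the original post): materializing the generator with list() consumes it fully, so the exception propagates before extend() is called.

```python
def gen():
    yield from range(100, 105)
    raise RuntimeError('whoops')

a = list(range(5))
try:
    # list(gen()) raises here, before a.extend() is ever called,
    # so a is left unchanged.
    a.extend(list(gen()))
except RuntimeError:
    pass
print(a)  # [0, 1, 2, 3, 4]
```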

IMO a result of a failed operation should remain undefined - as a general principle.

1 Like

Which operation failed? Not the call to gen: it successfully returns a generator object. That leaves a particular call to next on that generator object, which means that list.extend and list.__setitem__ are doing something differently even if no exception is ever raised. I think it’s fair to say that “something” has to be documented, so that other implementations of Python define them consistently.

To be clear, I think both should have the same behavior, though it’s not clear which behavior is “correct”. I can see arguments for adding none and adding what can be added.

I think in general it is dangerous to use “equivalent to” in the docs, precisely because of these kinds of places where the results are different. The equivalence is never precise, and the differences are not explained. The itertools page often says “roughly equivalent to” which I guess is better because it alludes to the idea that it is not exact, though it does not explain the gaps.

In this case, the page is in the tutorial, and IMO uses a more exotic thing to explain a simpler thing. I would simply remove the “equivalent to” sentences.

9 Likes

I’m inclined to agree. Especially since, having slept on it, I’ve realised the difference can be exhibited even without the generator raising:

def gen(fn):
    yield from range(100,105)
    fn()
    yield from range(200,205)

a = list(range(5))
a.extend(gen(lambda: print(f'{a=}')))
# a=[0, 1, 2, 3, 4, 100, 101, 102, 103, 104]

b = list(range(5))
b[len(b):] = gen(lambda: print(f'{b=}'))
# b=[0, 1, 2, 3, 4]

The discrepancy arises because gen() is being fully evaluated before assignment in the second case, while it’s being iterated over item by item in the first case.

So in the “raise exception” version, nothing has been added at the time of the exception. In the “print midway” example, the list has yet to be modified at the time of printing, but I expect the final list to be the same as a.

Undefined is a relative term, though. Some possible outcomes, in decreasing order of reasonableness:

  • [0, 1, 2, 3, 4] or [0, 1, 2, 3, 4, 100, 101, 102, 103, 104]
  • [0, 1, 2, 3, 4, 100, 101, 102]
  • []
  • [0, 1, 2, 3, 4, 'hello']
  • Program crashes
  • Python erases your home directory

Where does one draw the line?

I think the current behavior is a pretty good place to draw it.

It seems like an implementation detail that a[len(a):] = ... will unpack the RHS into a sequence before assignment. That detail could be changed in the future, for performance purposes or some other reason–or more generic optimizations might change it if they can recognize what’s happening. But it’s not something that should be special-cased just to maintain this small detail of equivalence, because that would make future optimization harder.

2 Likes

I mean the single (logical) line of code where an exception has occurred and interrupted the normal execution order. An exception usually means a failure and what was interrupted has failed.

I think the reason might have to do with the fact that when assigning to an extended slice (one with a step), the iterable on the right-hand side must contain the same number of items as the slice it is replacing.
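That length requirement is easy to demonstrate (a quick sketch; the exact error wording may vary across versions): an extended slice has to know the size of the right-hand side before it can assign anything, which is one plausible reason the RHS is materialized up front.

```python
b = list(range(6))

# An extended slice (step != 1) must receive exactly as many items
# as it selects, so the RHS length must be known before assignment.
b[::2] = [10, 20, 30]      # replaces positions 0, 2, 4
print(b)                   # [10, 1, 20, 3, 30, 5]

try:
    b[::2] = [10, 20]      # wrong length for an extended slice
except ValueError:
    print('wrong length rejected')
```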

2 Likes

Both methods are equivalent in the sense that under normal conditions, they both append each item from the iterable to the list. What’s not normal in your case is the raised exception.

Using extend:

  • extend() processes the generator elements one by one.
  • The gen() function yields values 100, 101, 102, 103, 104 before raising the RuntimeError.
  • Even though the generator raises an exception after yielding all values, extend() still adds all the yielded values to a before the exception occurs.
  • The exception happens after all the values have been added, so the list a becomes [0, 1, 2, 3, 4, 100, 101, 102, 103, 104].

Using slice assignment:

  • Slice assignment b[len(b):] = gen() attempts to evaluate the entire generator first before performing any changes to the list.
  • When the generator yields values 100, 101, 102, 103, 104, it works fine.
  • However, as the generator finishes yielding all values and hits the raise RuntimeError('whoops') line, an exception is raised before the slice assignment is completed.
  • Since the slice assignment isn’t completed (because of the exception), the list b is not modified at all.

This issue has been discussed previously.

Is it worth explaining the behaviors of both extend and __setitem__ here?

I doubt the evaluation order will change.

Notice that while evaluating an assignment, the right-hand side is evaluated before the left-hand side.

Of course, but I don’t think that is relevant here: gen(...) evaluates to a generator, not a sequence. The assignment b[len(b):] = gen(...) converts that generator to a sequence before adding it to b. It’s not about the general evaluation order; it’s about how an assignment to a slice interprets a generator object.

1 Like

Yes, this is consistent with the syntax being used and is an implementation detail of list.__setitem__.

b[len(b):] = iterator is deliberately requesting an iterator, and it will iterate over the gen() object, similar to how a for ... in loop processes gen().

Since the iterator is on the right-hand side (RHS), it will be evaluated before the left-hand side (LHS) according to the evaluation order rule.

It may look confusing at first.
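Observably, the extend path behaves like appending in a loop. Roughly like this sketch (what extend appears to do from the outside, not CPython’s actual C implementation):

```python
def extend_like(lst, iterable):
    # Append items one at a time, so items yielded before an
    # exception stay in the list.
    for item in iterable:
        lst.append(item)

def gen():
    yield from range(100, 105)
    raise RuntimeError('whoops')

a = list(range(5))
try:
    extend_like(a, gen())
except RuntimeError:
    pass
print(a)  # [0, 1, 2, 3, 4, 100, 101, 102, 103, 104]
```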

This was the point I was making in the first place–it’s an implementation detail. That means it could change, and it doesn’t make sense to force the two versions in the OP to act precisely the same in pathological cases.

2 Likes

So in both cases, you advocate essentially a transactional approach, where neither a.extend(...) nor b[len(b):] = ... actually commits any changes until the entire iterable is successfully exhausted?

Like I said, I don’t know that it’s preferable to do it one way or the other, or even that both need to behave the same, but I don’t think there is any reason it should be undefined, in the sense that each Python implementation is free to choose whatever behavior it likes.

1 Like

Some languages, like ALGOL 68, use two terms for things that are not precisely specified. Perhaps “undefined” and “unspecified”, but I don’t remember. One means “anything can happen, up to and including loss of your home directory”; the other means “the resulting value is not specified, but more serious evil outcomes, like a program crash, will not happen”.
In this case, the latter answer seems right, i.e., program execution continues (with the raised exception) but we’re not going to constrain what value ends up in your list.

No, I do not. Let me try to explain my opinion again:

Given a generator like the one in the initial post, no expression consuming it can be evaluated past the point where it raises:

newset = myset.union(gen())  # will not assign new value

We are now discussing only whether a side effect should be precisely defined, e.g. here:

mylist.extend(gen())    # original question
myset.update(gen())     # very similar
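For instance, in current CPython the set.update case shows the same partial-mutation side effect as list.extend (a quick check, not a documented guarantee):

```python
def gen():
    yield from range(100, 105)
    raise RuntimeError('whoops')

s = set(range(5))
try:
    # Items yielded before the exception end up in the set.
    s.update(gen())
except RuntimeError:
    pass
print(sorted(s))  # [0, 1, 2, 3, 4, 100, 101, 102, 103, 104]
```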

There might be expressions that return a real result (not just None) and have a side effect at the same time.

I have doubts that:

  • if a result is impossible to get, should we insist that the side effect of list.extend be precisely defined/documented?
  • if yes, will it also be defined/documented for set.update and all other possible uses (in the stdlib or in general)?
  • if yes, will the rules be consistent?

And that’s why I feel it’s better not to go further down that road.

There might be expressions that return a real result (not just None) and have a side effect at the same time.

I have doubts that:

  • if a result is impossible to get, should we insist that the side effect of list.extend be precisely defined/documented?
  • if yes, will it also be defined/documented for set.update and all other possible uses (in the stdlib or in general)?

The list being changed in list.extend() is not a side effect; it is the intended result of the method call.

I would note that this is why “Functional programming languages generally emphasize immutability”.

But Python does not. Note that tuples don’t have an extend() method.

There might be expressions returning a real result (not just None)

And this is why the mutating methods on the Python built-in objects all return None.

Yes, you can bury that mutating method call inside a more complex expression (or a function, for that matter) that does return something, hiding that “side effect”. I would argue that that’s also why functions with side effects are often discouraged.

To use this example:

def bad_idea(some_data):
    a_global_list.extend(some_data)

if you passed an iterator that raised before completion, then the function would raise, but a_global_list would still be altered. And that could be a mess.

But the answer is not that extend() should be atomic, but that you shouldn’t write functions with side effects like that.
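One side-effect-free alternative (a sketch; better_idea is a hypothetical name): build and return a new list, so a failing iterator leaves no partial state behind.

```python
def better_idea(existing, some_data):
    # Build and return a new list; if some_data raises midway,
    # the caller's original list is untouched.
    return [*existing, *some_data]

def gen():
    yield from range(3)
    raise RuntimeError('whoops')

original = [0, 1, 2]
try:
    combined = better_idea(original, gen())
except RuntimeError:
    pass
print(original)  # [0, 1, 2] -- unchanged
```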

NOTE: I’m a bit curious if this would be different if Python was re-designed now:

Early Python was “all about sequences”: slicing, extend(), all sorts of things were designed around sequences.

Iterators were introduced later, and plugging iterators (in particular lazy iterators) into a sequence-focused language does create a few places of impedance mismatch.

Maybe extend() would have been implemented differently if iterators had been the initial use case.

But anyway, as you point out, this issue could affect all of the mutating methods in all classes that take iterables.

1 Like