Inconsistent sequence docs (and perhaps behavior)

kg583 · February 21, 2024, 5:24am

There is a table in the docs that outlines the basic operations on sequence types.

In the table s is an instance of a mutable sequence type, t is any iterable object and x is an arbitrary object that meets any type and value restrictions imposed by s (for example, bytearray only accepts integers that meet the value restriction 0 <= x <= 255).

We turn our attention to the row on .extend():


`s.extend(t)` or `s += t` | extends s with the contents of t (for the most part the same as `s[len(s):len(s)] = t`)

Now, s.extend(t) and s += t are not necessarily literally the same, but they should do roughly the same thing: modify s by putting all the things in t at the end.

However, we run into a discrepancy:

test = bytearray([1, 2, 3])
test.extend([1, 2, 3])

print(test)
# bytearray(b'\x01\x02\x03\x01\x02\x03')

test = bytearray([1, 2, 3])
test += [1, 2, 3]
# TypeError: can't concat list to bytearray

This error is raised in every version of Python I could check (at least as far back as 2.7), so I presume the behavior is “intended”. Thus, I would suggest updating the docs to reflect this.

But, given what the docs do say, this behavior seems definitely not intended. Not only should .extend and += do essentially the same thing, each item in [1, 2, 3] is an acceptable member of a bytearray, which the leading paragraph very clearly delineates as acceptable!

I’d be interested if there’s any “deliberate” reason for this TypeError that someone could provide, or if it’s just a holdover from Python 2. Whatever the case, something should be updated, with the docs being the easier option but the behavior IMO needing attention regardless.

steve.dower · February 21, 2024, 1:35pm

The reason would seem to be that bytearray + list is not an allowed operation, but bytearray.extend(iterable_of_ints) is.

Ultimately, the fact that a += b is more like a = a + b than it is like a.extend(b) is what’s not obvious.

Proposing a change to the docs to treat .extend and += separately would probably be fine.
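A minimal sketch of that point, using only built-in types: on bytearray, `+=` fails in the same way `+` does, while `.extend` accepts the list. The names `buf`, `plus_error` and `iadd_error` are just illustrative.

```python
buf = bytearray([1, 2, 3])

# extend() iterates over its argument, so a list of small ints is accepted.
buf.extend([4, 5])
assert buf == bytearray([1, 2, 3, 4, 5])

plus_error = None
iadd_error = None

# + wants a bytes-like right-hand operand, so a list is rejected.
try:
    buf + [6, 7]
except TypeError as exc:
    plus_error = exc

# += is rejected the same way, and buf is left untouched.
try:
    buf += [6, 7]
except TypeError as exc:
    iadd_error = exc

assert isinstance(plus_error, TypeError)
assert isinstance(iadd_error, TypeError)
assert buf == bytearray([1, 2, 3, 4, 5])

# With a bytes-like operand, += works.
buf += bytes([6, 7])
assert buf == bytearray([1, 2, 3, 4, 5, 6, 7])
```

So a practical workaround today is either `.extend(...)` or wrapping the right-hand side in `bytes(...)` before using `+=`.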

MegaIng · February 21, 2024, 2:50pm

This isn’t true for list:

>>> l = [1,2,3]
>>> l += [4,5]
>>> l += (6,7)
>>> l += (i for i in range(8, 10))

are all valid, and the stdlib MutableSequence ABC implements __iadd__ by mapping it to extend:

https://github.com/python/cpython/blob/5f7df88821347c5f44fc4e2c691e83a60a6c6cd5/Lib/_collections_abc.py#L1167-L1169

    def __iadd__(self, values):
        self.extend(values)
        return self

bytearray doesn’t follow the normal pattern for MutSeq, and should IMO be fixed.

tjreedy · February 21, 2024, 8:44pm

I suspect that the different implementation for bytearray may be due to the type and value limitation of the objects added.

steve.dower · February 21, 2024, 8:57pm

Right, bytearray came from bytes and str, which is why it follows their style here (try "abc" + ["d", "e", "f"]), and does not derive/draw inspiration from MutableSequence.^[1]
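For reference, a small sketch of that shared style: str, bytes and bytearray all refuse to concatenate with a list, and accept only operands of a closely related type. The helper `concat_error` is just a name made up for this example.

```python
def concat_error(left, right):
    """Return the TypeError raised by left + right, or None if it succeeds."""
    try:
        left + right
    except TypeError as exc:
        return exc
    return None


# All three string-like types reject a list on the right-hand side.
assert concat_error("abc", ["d", "e", "f"]) is not None
assert concat_error(b"abc", [100, 101, 102]) is not None
assert concat_error(bytearray(b"abc"), [100, 101, 102]) is not None

# Closely compatible operands are fine.
assert concat_error("abc", "def") is None
assert concat_error(bytearray(b"abc"), b"def") is None
```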

So the questions are:

Can we clarify the documentation to make it more obvious that bytearray only supports closely-compatible concatenation in + and +=?

and

Can we change bytearray’s behaviour to allow concatenation with more types?

The answer to the first one is “yes, with a PR, and we can backport to all versions so the docs are clearer.”

The answer to the second is “maybe, probably needs a PEP or at least a proper design.” Similar recent changes/proposals have had PEPs (the dict union operator, and PEP 467 is still ongoing).

^[1] Most of the ABCs are inspired by the concrete implementations, so don’t be too surprised when they don’t match up with every single type consistently.

MegaIng · February 21, 2024, 9:02pm

No, the question is only “can we make bytearray’s __iadd__ behave like the MutableSequence ABC says it should behave”.

[1, 2] + (1, 2) doesn’t work either, after all. This isn’t about general concatenation^[1]; it’s just that for mutable sequences a += b isn’t described as a = a + b but as a.extend(b). This is similar to a + b == b + a being true for numeric types (outside of precision loss) but of course not for sequences, despite their using the same operator.

^[1] In fact, the Sequence ABC doesn’t require __add__ to be supported.
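A quick sketch of the distinction drawn above, using list: `+` with a tuple fails, `+=` with the same tuple succeeds and mutates in place, and spelling it out as `a = a + b` creates a new object. The variable names are only illustrative.

```python
items = [1, 2]
alias = items  # second reference to the same list object

# list + tuple is not a supported concatenation.
try:
    items + (3, 4)
    plus_failed = False
except TypeError:
    plus_failed = True
assert plus_failed

# list += tuple behaves like items.extend((3, 4)): same object, mutated in place.
items += (3, 4)
assert items is alias
assert alias == [1, 2, 3, 4]

# items = items + [...] builds a new list and rebinds the name instead.
items = items + [5]
assert items is not alias
assert alias == [1, 2, 3, 4]
assert items == [1, 2, 3, 4, 5]
```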

jamestwebber · February 21, 2024, 9:03pm

The docs currently state that bytearrays support the mutable sequence operations, so I think some clarification/correction is necessary until the behavior changes (if ever).

ncoghlan · October 20, 2024, 2:00am

bytearray concatenation works via the buffer protocol, which lists don’t implement.

It’s semantically distinct from extending the array with new values (which does use iteration).

Having two rows and noting that in-place concatenation is usually the same operation as extension by iteration makes sense to me.
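As a rough illustration of the two paths (a sketch, not a description of the C implementation): in-place concatenation accepts objects that expose a buffer, such as bytes, bytearray and memoryview, while `.extend` also accepts plain iterables of ints. Trying to build a `memoryview` from a list is used here only as a convenient way to show that list does not expose a buffer.

```python
buf = bytearray([1, 2])

# In-place concatenation with buffer-providing objects.
buf += bytes([3])
buf += bytearray([4])
buf += memoryview(bytes([5]))
assert buf == bytearray([1, 2, 3, 4, 5])

# A list does not implement the buffer protocol, so memoryview() rejects it.
try:
    memoryview([6, 7])
    list_has_buffer = True
except TypeError:
    list_has_buffer = False
assert not list_has_buffer

# Extension by iteration works for a list and for a generator.
buf.extend([6, 7])
buf.extend(n for n in range(8, 10))
assert buf == bytearray(range(1, 10))
```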