Sum: fast path for objects that support the buffer protocol

sum(b"ABC") returns 198. In the background it converts b"A", b"B" and b"C" to C integers, then makes it Python int objects then gets their corresponding C integer value and then adds that to the intermediate C value.

This is a very roundabout way with too many conversions.
It should be possible to make sum support the buffer protocol, query the element type, and thus let Python do very fast calculations on array.array objects and memoryviews.
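
sum already accepts these types today; the proposal is only about making it faster:

>>> import array
>>> sum(array.array('d', [1.5, 2.5]))
4.0
>>> sum(memoryview(b"ABC"))
198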

This is getting a little into numpy’s territory, but I feel that for an operation as common as sum a speed-up is warranted and should not take too much effort to implement. I am willing to implement it, if I am not the only one who considers it a useful feature.

EDIT: Correct typo

The buffer protocol doesn’t imply that iterating over something gives you a list of bytes, so I don’t think it makes sense to special-case it.

The buffer protocol allows requesting the format.
If the format is any of the type specifiers for an integer, float or double type, that could warrant a fast path. For bytes the format is actually “B” for uint8_t, but bytes was just an arbitrary example. It would be very nice for any array.array type.
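
For example, the format can be read off a memoryview:

>>> memoryview(b"ABC").format
'B'
>>> import array
>>> memoryview(array.array('d', [1.0, 2.0])).format
'd'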

1 Like

While you’re at it, similar improvements should be possible for min, max, any and all this way: any and all can check multiple small integral values at once by type punning; min and max can bail out if they encounter the extreme values for the type. (There are some bit-masking techniques to allow rejecting multiple small integral values at once with min and max, too, but the overhead might not be worth it.)
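
Roughly, the any/all idea in pure-Python terms (the real fast path would of course live in C; this only sketches the chunked check):

>>> data = bytes(1024)  # 1024 zero bytes
>>> any(data)
False
>>> # check 8-byte words at a time instead of single bytes
>>> any(int.from_bytes(data[i:i+8], "little") for i in range(0, len(data), 8))
False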

(You might also be interested in my corresponding previous (not well received) proposal, to add dunder hooks so that other types can specify these kinds of optimizations - or implement those methods for e.g. non-iterable container abstractions, such as an interval on the real number line. Of course, that latter doesn’t have a sum, but you get the idea.)

On my machine, this gives 294.

➜ python
Python 3.11.3 (main, Jun 27 2023, 06:53:21) [GCC 13.1.1 20230429] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> sum(b'abc')
294
2 Likes

Thanks. I made a typo; sum(b"ABC") returns 198. I will correct it.

Out of curiosity, would you keep the original types and just overflow if it comes to that, or what would your actual casts be?
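
To illustrate the question: today sum promotes everything to a Python int, so nothing wraps; a hypothetical fast path that accumulated in the element’s own C type would overflow it:

>>> import array
>>> a = array.array('b', [127, 1])  # signed 8-bit elements
>>> sum(a)                          # today: promoted to Python int, no wrap-around
128

If the accumulator were a C int8_t, 127 + 1 would overflow the type.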

No, that optimization would be incorrect in general: supporting the buffer protocol does not imply the type is iterable, so sum() ought not to work on all buffers:

>>> import pickle
>>> import collections.abc
>>> pickle.PickleBuffer(b"abc")
<pickle.PickleBuffer object at 0x100f58440>
>>> sum(pickle.PickleBuffer(b"abc"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'pickle.PickleBuffer' object is not iterable
>>> isinstance(pickle.PickleBuffer(b"abc"), collections.abc.Buffer)
True

Some buffers might even support iteration and yield something other than byte values.

So your proposed optimization is a behavior change.

1 Like

An infamous example is mmap.mmap(). It supports the buffer protocol, but iterating it produces bytes objects of length 1.

>>> import mmap
>>> m = mmap.mmap(-1, 1024)
>>> next(iter(m))
b'\x00'
3 Likes

I also considered this, but thought such instances would be rare. Now that they turn out to exist even in the stdlib, I think a fast path for sum is indeed not a very good idea.

Would it make more sense to add a “.sum()” method to array.array and memoryview instead?

When you start by adding a “.sum()” method to array.array and memoryview, you continue by adding “.min()”, “.max()”, “.all()”, “.any()”, “.prod()”, “.mean()”, “.stdev()”, and you do not stop until you find yourself adding “.write_as_gzipped_csv_and_upload_to_aws()”.

2 Likes

I have also looked at the actual array.array code. Every method is duplicated for each machine type, because that is how the hardware works: different instructions are needed for each type. In addition, I wrote a small mock-up C function to test how big the gains would potentially be, and found that the array may not be C-contiguous, which is something else to take into account. So it would be quite code-intensive, just for .sum() alone.
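
For instance, a plain slice already gives a non-contiguous view that such a fast path would have to handle:

>>> import array
>>> m = memoryview(array.array('i', range(10)))[::2]
>>> m.c_contiguous
False
>>> sum(m)
20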

And then you would still have something that is just a poor-man’s version of numpy’s functionality. So definitely not worth the effort.

Thanks everyone for the feedback. That saved me quite a lot of coding time. I hope the feedback did not take too much of your time.

2 Likes