Similar to NumPy slice assignment and C++ std::fill, it would be useful to be able to fill a list or bytearray object with a constant value/object. For example:
some_list[:] = 0
or:
some_list.fill(None, startpos, endpos)
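For illustration, here is a minimal sketch of the proposed semantics for lists, written with plain slice assignment (the helper name fill_list and its signature are hypothetical, not an existing API):

```python
def fill_list(lst, value, start=0, end=None):
    # Hypothetical helper approximating the proposed fill() semantics.
    # Note: this still builds a temporary list, which is exactly the
    # allocation overhead a native fill could avoid.
    if end is None:
        end = len(lst)
    lst[start:end] = [value] * (end - start)

data = [1, 2, 3, 4, 5]
fill_list(data, 0, 1, 4)
# data is now [1, 0, 0, 0, 5]
```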
The fastest method seems to be to just re-allocate the object, but that is not always desirable/convenient.
I’m personally trying to convert some Python code to C++ using Shedskin, which doesn’t use reference counting, so re-allocations are undesirable. Using std::fill is also way faster than using a loop to update a list/bytearray element-by-element.
It may be that this is a somewhat niche problem, although as mentioned, NumPy and C++ at least do support this.
There is the somewhat obscure some_bytearray[:] = b'\x00' * len(some_bytearray). This isn’t ideal if the byte array is very large, but should be faster than other currently possible solutions.
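A quick check that this idiom rewrites the contents in place rather than rebinding the name (the temporary bytes object is still allocated, though):

```python
buf = bytearray(b"hello world")
buf_id = id(buf)

buf[:] = b"\x00" * len(buf)  # allocate a temporary bytes, then copy into buf

assert buf == bytearray(11)  # contents are now all zero bytes
assert id(buf) == buf_id     # same bytearray object, filled in place
```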
I’d point to full assignment / content swap as likely one of the fastest paths. The bytes repeat code is fairly efficient.
In 3.15 this should hopefully be even faster if the base object is a bytearray. In 3.14 the bytes object that b'\0' * 100 constructs gets memcpy’d into the bytearray storage, but in 3.15, if the bytes object is uniquely referenced, the bytearray can just “take” it and save an allocation + copy. In my perf testing the allocation + copy is over 10% of the runtime for that case (bytes repeat is quite quick).
Fundamentally, doing an allocation, filling it, copying over the contents, then getting rid of the allocated memory (refcounting or GC’ing) is not as fast as doing a single memset. In fact it should be at least twice as slow, I think?
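From pure Python, the closest thing to that single memset is going through ctypes. A sketch (note that from_buffer temporarily exports the bytearray’s buffer, so it can’t be resized until the export is released):

```python
import ctypes

buf = bytearray(b"abcdef")

# memset the bytearray's buffer directly: no temporary bytes object,
# no second copy -- just one pass over the memory.
c_buf = (ctypes.c_char * len(buf)).from_buffer(buf)
ctypes.memset(c_buf, 0, len(buf))
del c_buf  # release the buffer export so buf can be resized again

assert buf == bytearray(6)
```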
This may only become visible when comparing against an efficient C/C++ implementation, not against other (even slower) Python methods.
Also, considering (future) JIT compilation and/or non-CPython implementations (which may not use refcounting), it’s nice to have a path available that does not involve object allocations, since allocation often becomes the bottleneck once the code itself runs faster.
Not with the changes I’m working on, which will hopefully be in 3.15 :). Already I’ve updated bytearray to internally store a bytes. With that, operations can be optimized (ex. GH-141862 for += / iconcat) so that when they are passed a uniquely referenced bytes, they just replace the bytearray storage with no memcpy or deallocation (if you’re familiar with C++, very similar to std::move).
More Details / Minutiae (what's needed for a single memset)
In the interpreter there’s work to make sure uninitialized memory isn’t exposed (aside: the C API doesn’t guarantee this). That means when an object is allocated, its memory is filled with a value (ex. bytearray.__init__(count)). So to get a single-memset block of bytes with a specific fill, the fill needs to be provided simultaneously with the count. b'\0' * count provides the fill as well as the count, and goes through an efficient implementation (_PyBytes_Repeat).
Definitely could do something like a bytearray(count=1024, fill=b'\xCD'), but fast assignment from a uniquely referenced bytes adds less complexity in my view, as it just makes existing operations/code more efficient.
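A pure-Python stand-in for that hypothetical constructor (bytearray(count=..., fill=...) does not actually exist; make_filled_bytearray below is just a sketch that leans on the bytes-repeat path):

```python
def make_filled_bytearray(count, fill=b"\x00"):
    # Hypothetical stand-in for a bytearray(count=..., fill=...) API.
    # fill * count goes through the efficient bytes-repeat code; with the
    # unique-reference optimization discussed above, the bytearray could
    # then take ownership of that buffer without a second copy.
    if len(fill) != 1:
        raise ValueError("fill must be a single byte")
    return bytearray(fill * count)

ba = make_filled_bytearray(1024, b"\xCD")
# ba is 1024 bytes, all 0xCD
```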
Thank you for your work on CPython, and for the explanation! It won’t really help alternative Python implementations though, which is why I was bringing this up. So in the case of Shedskin, which doesn’t even use refcounting, I also need to do some special-case detection to make it run fast.