List/bytearray fill with single value/object

Similar to NumPy slice assignment and C++ std::fill, for example, it would be useful to be able to fill a list or bytearray object with a constant value/object. For example:

some_list[:] = 0

or:

some_list.fill(None, startpos, endpos)

The fastest method seems to be to just re-allocate the object, but that is not always desirable/convenient.

I’m personally trying to convert some Python code to C++ using Shedskin, which doesn’t use reference counting, so re-allocations are undesirable. Using std::fill is also far faster than using a loop to update a list/bytearray element by element.

It may be that this is somewhat of a niche problem, although, as mentioned, NumPy and C++ at least also support this.
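For reference, the proposed semantics in pure Python might look like the following (signature taken from the examples above; the point of the proposal is that a C implementation could often reduce this loop to a single memset):

```python
def fill(seq, value, start=0, stop=None):
    """Reference semantics for the proposed fill method.

    A C implementation could do this as one memset for byte buffers;
    this pure-Python version is only a specification of the behavior.
    """
    if stop is None:
        stop = len(seq)
    for i in range(start, stop):
        seq[i] = value

buf = bytearray(5)
fill(buf, 0xFF, 1, 4)
print(buf)  # bytearray(b'\x00\xff\xff\xff\x00')
```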

2 Likes

There is the somewhat obscure some_bytearray[:] = b'\x00' * len(some_bytearray). This isn’t ideal if the byte array is very large, but should be faster than other currently possible solutions.
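For concreteness, that idiom in runnable form (the buffer contents are my own example):

```python
some_bytearray = bytearray(b"\x01\x02\x03\x04")

# Overwrite the whole buffer in place; the temporary bytes object
# b"\x00" * len(...) is allocated, copied in, then discarded.
some_bytearray[:] = b"\x00" * len(some_bytearray)

print(some_bytearray)  # bytearray(b'\x00\x00\x00\x00')
```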

1 Like

Specifically for bytearray and zero (b'\x00'), there’s

some_bytearray[:] = bytes(len(some_bytearray))

Didn’t measure so don’t know relative speed.

Corrective edit: that actually also works for a regular list.
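A quick demonstration of why this works for a list too: bytes(n) iterates as n zero integers, so slice assignment fills the list with ints:

```python
some_list = [1, 2, 3]
some_list[:] = bytes(len(some_list))  # bytes(3) iterates as 0, 0, 0
print(some_list)  # [0, 0, 0]

some_bytearray = bytearray(b"abc")
some_bytearray[:] = bytes(len(some_bytearray))
print(some_bytearray)  # bytearray(b'\x00\x00\x00')
```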

+1 from me, but I’m also C++ dev.

The “one-- and preferably only one --obvious way to do it”:

for i in range(len(some_list)):
    some_list[i] = 0

:+1: to full assign / content swap as likely one of the fastest paths. The bytes repeat code is fairly efficient.

In 3.15 this should hopefully be even faster if the base object is a bytearray. In 3.14 the bytes object from b'\0' * 100 gets memcpy’d into the bytearray storage, but in 3.15, if the bytes object is uniquely referenced, the bytearray can just “take” it and save an allocation + copy. In my perf testing the allocation + copy is over 10% of the runtime for that case (bytes repeat is quite quick).

fundamentally, doing an allocation, filling it, copying over the contents, and then getting rid of the allocated memory (via refcounting or GC) is not as fast as doing a single memset.. in fact it should be at least twice as slow, I think?

this may only become visible when comparing against an efficient C/C++ implementation, rather than against other (even slower) Python methods.. :slight_smile:

also, considering (future) JIT compilation and/or non-CPython implementations (which may not use refcounting), it’s nice to have a path available that doesn’t involve object allocations, since allocation often becomes the bottleneck once the code itself runs faster.

Not with the changes I’m working on, which will hopefully be in 3.15 :). Already I’ve updated bytearray to internally store a bytes. With that, operations can be optimized (ex. GH-141862 for += (iconcat)) so that when they are passed a bytes that is uniquely referenced, they just replace the bytearray storage with no memcpy or deallocation (if you’re familiar with C++, very similar to std::move).

More Details / Minutiae (what's needed for a single memset)

In the interpreter there’s work to make sure uninitialized memory isn’t exposed (aside: the C API doesn’t guarantee this). That means when an object is allocated, its memory is filled with a value (ex. bytearray.__init__(count)). So to get a single-memset block of bytes with a specific fill, the fill needs to be provided at the same time as the count. b'\0' * count provides both the fill and the count, and goes through an efficient implementation (_PyBytes_Repeat).

Definitely could do something like bytearray(count=1024, fill=b'\xCD'), but fast assignment from a uniquely referenced bytes adds less complexity in my view, as it just makes existing operations/code more efficient :slightly_smiling_face:

thank you for your work on CPython, and for the explanation! it won’t really help alternative Python implementations though, which is why I was bringing this up.. :slight_smile: so in the case of Shedskin, which doesn’t even use refcounting, I also need to do some special-case detection to make it run fast.

2 Likes

Since Python 3.14 you can efficiently 0-fill an existing bytearray by resizing it to 0 and then resizing it back, without memcpy:

def zero_fill(a):
    size = len(a)
    a.resize(0)
    a.resize(size)
2 Likes

I think this could be quite a good place for a more general sequence protocol (optional) extension:

# For numeric types:
seq.resize(size, fill=0)

# For PyObject containers
seq.resize(size, fill=None)
3 Likes

I would assume fill is meant to fill new elements only; bytearray.resize zero-memsets only the newly grown fragment.

Wouldn’t this be more clear?

seq.fill(None, resize=size)

Yeah, that was the idea, following the bytearray example. But yeah, this does feel like trying to do more than it should, and it doesn’t do it well.

Well, in this case, we already have resizing protocol:

del list[slice]
seq.extent

So it would probably be better to separate concerns and simply add seq.fill(None, startpos, endpos) as suggested by the OP.

seq[slice] = 0 is ambiguous. It works for NumPy, but what if I want to fill a list with lists?
seq[slice] = [0] - this already has a meaning, which is different.

So I think if there is enough need, seq.fill(None, startpos, endpos) might be a good candidate.
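To make the ambiguity concrete: today the right-hand side of a slice assignment is treated as an iterable of replacement elements, not as a fill value:

```python
seq = [1, 2, 3, 4, 5]

# Replaces the three elements seq[1:4] with the single element 0,
# shrinking the list -- not a fill.
seq[1:4] = [0]
print(seq)  # [1, 0, 5]
```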


Also, to add: if this ever happens, I would really like startpos, endpos from the beginning.
Things like this allow writing actually performant code.
And they are quite unambiguous and easy to add.

If they are not added from the beginning, then people get used to not doing it, as an optimal path does not exist, and either make peace with something good enough or use third-party stuff.
And it is hard to add later, as everyone is sort of happy.

This doesn’t apply to everything, but at least the operations that already perform well could be made a bit more useful. E.g.:
a) Performant solutions due to significant effort - list.sort
b) Things that are easy to implement efficiently (e.g. only does array swaps) - list.reverse
c) Or something in the middle - list.rotate (maybe one day…)

2 Likes

Thanks, I fully agree! Btw, I apologise for starting a new thread for this. I just came across this one:

What would be the next step for this? Probably a PEP would be required..?

btw, I now use global list/bytes objects pre-filled with zeros, and slice-assign those over the objects that need to be zeroed out. that’s simple and easy to optimize for any implementation (and doesn’t require 3.14/3.15). still, it would be nicer to have a fill method.. :’-)
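A minimal sketch of that workaround (names and the size cap are mine; it assumes buffers never exceed the pre-allocated size):

```python
# Pre-allocated zero sources, sized to the largest buffer we expect.
_MAX = 1 << 16
_ZERO_BYTES = bytes(_MAX)   # immutable, safe to share
_ZERO_LIST = [0] * _MAX

def zero_out(buf):
    """Overwrite buf in place with zeros via slice assignment."""
    n = len(buf)
    if isinstance(buf, bytearray):
        buf[:] = _ZERO_BYTES[:n]
    else:
        buf[:] = _ZERO_LIST[:n]

ba = bytearray(b"\xff" * 8)
zero_out(ba)
print(ba)  # bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x00')
```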

I suppose you meant to write seq = list[slice]? And then calling the extend method.

There are already several ways to do this. list(reversed(my_list)) would be the most readable of them, whilst my_list[::-1] would be the shortest.

I don’t really think that a list.reverse is necessary, especially because the question of adding this to other sequences will be raised (probably even unsorted ones?!).

Edited the post. Meant to write:

del list[slice]   # truncate
seq.extend        # extend

Combination of these is pretty much resize
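Put together as a helper (a sketch; itertools.repeat supplies the fill values):

```python
from itertools import repeat

def resize(seq, size, fill=None):
    """Resize a mutable sequence in place, padding new slots with fill."""
    n = len(seq)
    if size < n:
        del seq[size:]                       # truncate
    else:
        seq.extend(repeat(fill, size - n))   # extend

a = [1, 2, 3]
resize(a, 5, fill=0)
print(a)  # [1, 2, 3, 0, 0]
resize(a, 2)
print(a)  # [1, 2]
```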

There is list.reverse. Let’s not remove it :slight_smile:

And rotate is not reverse. Think deque.rotate, but for sequences instead of linked lists (where the operation is trivial in comparison). Here is a good overview of the different approaches: GitHub - scandum/rotate: A collection of array rotation algorithms.

As I said - maybe one day.
As it is not a necessity (and might never be, TBH).
The easiest solution is to call reverse 3 times,
while the fastest one can be ~2x faster than that.
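For reference, the three-reversal rotation in pure Python (only a sketch of the shape of the algorithm: slice assignment copies, so unlike a C implementation this is not constant-memory in-place):

```python
def rotate_left(seq, k):
    """Rotate seq left by k positions using the classic three reversals."""
    n = len(seq)
    if n == 0:
        return
    k %= n
    seq[:k] = reversed(seq[:k])   # reverse the first k elements
    seq[k:] = reversed(seq[k:])   # reverse the rest
    seq.reverse()                 # reverse the whole sequence

a = [1, 2, 3, 4, 5]
rotate_left(a, 2)
print(a)  # [3, 4, 5, 1, 2]
```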

Also, if list.reverse had start / stop arguments, then the 3 × reverse rotate solution would also benefit from this extension.

This is why I think these can be quite important in certain places - they propagate elegantly to derivative solutions, which in turn end up being both memory-efficient and performant.

Thus, I take the opportunity to stress this where appropriate.

It makes life easier to know that these arguments can be expected in all appropriate places.
And it can be difficult to add them later, as people sort out something good enough for themselves and don’t care much about the stream of people continuously facing the same sharp edge.

E.g. re.match vs re.compile('').match - Top-level functions of re.module should support the string indexing pos/endpos arguments · Issue #113304 · python/cpython · GitHub.

  1. At first I was annoyed and confused why these aren’t consistent.
  2. Then I got used to compiling when I need these.
  3. By now I just compile pretty much everything to enjoy smooth sailing.
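For concreteness, the compiled-pattern methods already take pos and endpos, which is what the linked issue asks to expose on the module-level functions too:

```python
import re

pat = re.compile(r"\d+")
s = "abc123def456"

# Search only within [3, 6); no need to slice (and copy) the string.
print(pat.search(s, 3, 6).group())  # 123

# match() anchors at pos rather than at the start of the string.
print(pat.match(s, 3).group())      # 123
```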

Oh yeah, forgot about that…

Yeah, adding arguments should be trivial in that matter.

It might actually be too late to add more arguments to such functions, but it’s worth giving it a try anyway, no?

The code you shared does make more sense with del, now that I think about it.

1 Like

I am mildly surprised that it is so much faster than using itertools.repeat.

import itertools
global_fill = [0] * 100_000
seq = [1] * 100_000
n = len(seq)
%timeit seq[:] = [0] * n                # 220 µs
%timeit seq[:] = global_fill[:n]        # 266 µs
%timeit seq[:] = itertools.repeat(0, n) # 435 µs

However, it seems to be slightly slower than the current standard solution, which does not require keeping a global list. Not to mention that you might need many such global lists, one per fill value.


I don’t think any of the current solutions can beat potential seq.fill.

%timeit fill(seq, 0)               #  90 µs

There is one possible alternative which would not bloat the sequence method list (and would instead bloat the builtin (or some other low-level) namespace :)), while keeping the same performance and supporting start / stop implicitly:

class Scalar:
    def __init__(self, value):
        self.value = value

seq[slice] = Scalar(0)

# E.g.:
a = [1] * 10
a[2:9] = Scalar([])
print(a)
# [1, 1, [], [], [], [], [], [], [], 1]

This could be a fairly useful primitive in sequence space in general.