Similar to NumPy slice assignment and C++ std::fill, for example, it would be useful to be able to fill a list or bytearray object with a constant value/object. For example:
some_list[:] = 0
or:
some_list.fill(None, startpos, endpos)
The fastest method currently seems to be to just re-allocate the object, but that is not always desirable or convenient.
I’m personally trying to convert some Python code to C++ using Shedskin, which doesn’t use reference counting, so re-allocations are undesirable. Using std::fill is also way faster than using a loop to update a list/bytearray element-by-element.
It may be the case that this is somewhat of a niche problem, although as mentioned Numpy/C++ at least also support this.
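For what it’s worth, a pure-Python sketch of the proposed semantics (the name `fill` and the `startpos`/`endpos` parameters are the suggestion from this post, not an existing API):

```python
def fill(seq, value, startpos=0, endpos=None):
    """Overwrite seq[startpos:endpos] with value, in place (no reallocation)."""
    if endpos is None:
        endpos = len(seq)
    for i in range(startpos, endpos):
        seq[i] = value
```

For a bytearray, `value` would be an int in `range(256)`. A builtin version could of course skip the element-by-element loop and do a single memset-style pass.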
There is the somewhat obscure some_bytearray[:] = b'\x00' * len(some_bytearray). This isn’t ideal if the byte array is very large, but should be faster than other currently possible solutions.
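For illustration, the trick in action (the buffer contents here are arbitrary):

```python
buf = bytearray(b"hello world")
# allocates a temporary zero-filled bytes, then copies it over the storage
buf[:] = b"\x00" * len(buf)
assert buf == bytearray(11)   # all zeros, same length, same object
```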
Full assignment / content swap is likely one of the fastest paths. The bytes repeat code is fairly efficient.
In 3.15 this should hopefully be even faster if the base object is a bytearray. In 3.14 the bytes object that b'\0' * 100 constructs gets memcpy’d into the bytearray storage, but in 3.15, if the bytes object is uniquely referenced, the bytearray can just “take” it and save an allocation + copy. In my perf testing the allocation + copy is over 10% of the runtime for this case (bytes repeat is quite quick).
Fundamentally, doing an allocation, filling it, copying over the contents, and then getting rid of the allocated memory (refcounting or GC’ing) is not as fast as doing a single memset. In fact it should be at least twice as slow, I think?
This may only become visible once comparing it to an efficient C/C++ implementation, not to other (even slower) Python methods.
Also, considering (future) JIT compilation and/or non-CPython implementations (which may not use refcounting), it’s nice to have a path available that does not involve object allocations, since allocation often becomes the bottleneck once the code itself runs faster.
Not with the changes I’m working on, which will hopefully be in 3.15 :). I’ve already updated bytearray to internally store a bytes. With that, operations can be optimized (ex. GH-141862 for += (iconcat)) so that when they are passed a bytes that is uniquely referenced, they just replace the bytearray storage with no memcpy or deallocation (if you’re familiar with C++, this is very similar to std::move).
More Details / Minutiae (what’s needed for a single memset)
In the interpreter there’s work to make sure uninitialized memory isn’t exposed (aside: the C API doesn’t guarantee this). That means when an object is allocated, its memory is filled with a value (ex. bytearray.__init__(count)). So to get a single-memset block of bytes with a specific fill, the fill needs to be provided simultaneously with the count. b'\0' * count provides the fill as well as the count, which goes into an efficient implementation (_PyBytes_Repeat).
Definitely could do something like bytearray(count=1024, fill=b'\xCD'), but fast assignment of a uniquely referenced bytes adds less complexity in my view, as it just makes existing operations/code more efficient.
Thank you for your work on CPython, and for the explanation! It won’t really help alternative Python implementations though, which is why I was bringing this up. So in the case of Shedskin, which doesn’t even use refcounting, I also need to do some special-case detection to make it run fast.
Yeah, that was the idea, following the bytearray example. But this does feel like trying to do more than it should, and it doesn’t do it well.
Well, in this case, we already have a resizing protocol:
del list[slice]
seq.extend
So it would probably be better to separate concerns and simply add seq.fill(None, startpos, endpos) as suggested by the OP.
seq[slice] = 0 is ambiguous. It works for NumPy, but what if I want to fill a list with lists? seq[slice] = [0] already has a meaning, which is different.
So I think if there is enough need, seq.fill(None, startpos, endpos) might be a good candidate.
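The ambiguity in question, spelled out (small example list chosen for illustration):

```python
lst = [1, 2, 3, 4]
lst[1:3] = [0]            # existing meaning: replace the slice with [0]
assert lst == [1, 0, 4]   # the list shrank -- no broadcasting happened
```

NumPy’s `arr[1:3] = 0` broadcasts the scalar over the slice instead, which is exactly the behavior a list cannot adopt without breaking the existing semantics above.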
Also, to add, if this ever happens, I would really like startpos, endpos from the beginning.
Things like this allow making actually performant stuff.
And these are quite unambiguous and easy to add.
If they are not added from the beginning, then people get used to not doing it, as an optimal path does not exist. They either make peace with something good enough or use 3rd party stuff.
And it is hard to add later as everyone is sort of happy.
This doesn’t apply to everything, but at least operations with good performance could be made a bit more useful. E.g.:
a) Performant solutions due to significant effort - list.sort
b) Things that are easy to implement efficiently (e.g. only does array swaps) - list.reverse
c) Or something in the middle - list.rotate (maybe one day…)
btw, I now use global list/bytes objects pre-filled with zeros, and slice-assign those over the objects that need to be zeroed out. That’s simple and easy to optimize for any implementation (and doesn’t require 3.14/3.15). Still, it would be nicer to have a fill method.. :’-)
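A sketch of that workaround, with chunking so a single modest-sized zero block can clear buffers of any length (the block size and helper name are arbitrary choices here):

```python
ZEROS = bytes(4096)  # one shared, pre-allocated zero block

def zero_fill(buf: bytearray) -> None:
    """Overwrite buf with zeros via slice assignment, chunk by chunk."""
    n = len(buf)
    for start in range(0, n, len(ZEROS)):
        end = min(start + len(ZEROS), n)
        buf[start:end] = ZEROS[: end - start]
```

As noted downthread, a separate pre-filled block would be needed per fill value, which is one reason this is less pleasant than a real fill method.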
I suppose you meant to write seq = list[slice]? And then call the extend method.
There are already several ways to do this. list(reversed(my_list)) would be the most readable of them, whilst my_list[::-1] would be the shortest.
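Both of those options behave the same way, for the record (neither touches the original list):

```python
my_list = [1, 2, 3]
assert list(reversed(my_list)) == [3, 2, 1]   # most readable
assert my_list[::-1] == [3, 2, 1]             # shortest
assert my_list == [1, 2, 3]                   # both return a NEW list
```

The in-place my_list.reverse() is the allocation-free variant, which is the relevant one for the performance discussion here.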
I don’t really think that a list.reverse is necessary, especially because the question of adding this to other sequences will be raised (probably even unsorted ones?!).
As I said - maybe one day.
As it is not a necessity (and might never be, TBH).
The easiest solution is to call reverse 3 times.
While the fastest one can be ~2x faster than that.
Also, if list.reverse had start / stop arguments, then the rotate solution with 3 x reverse would also benefit from this extension.
This is why I think these can be quite important in certain places - they propagate elegantly to derivative solutions, which in turn end up being both memory efficient and performant.
Thus, I take the opportunity to stress this where appropriate.
It makes life easier to know that these arguments can be expected in all appropriate places.
And it can be difficult to add them later, as people sort out something good enough for themselves and don’t care much about the stream of people continuously facing the same sharp edge.
However, it seems to be slightly slower than the current standard solution, which does not require keeping a global list. Not to mention that you might need one such global list for each fill value.
I don’t think any of the current solutions can beat potential seq.fill.
%timeit fill(seq, 0) # 90 µs
There is one possible alternative, which would not bloat the sequence method list (and would instead bloat the builtins (or some other low-level) namespace :)), while keeping the same performance and having start / stop implicitly: