Hi! In “Objects / listobject.c”, list_resize(…) function is defined (which “over-allocates proportional to the list size, making room for additional growth”).
The body of the function includes the formula …
new_allocated = ((size_t)newsize + (newsize >> 3) + 6) & ~(size_t)3;
… and the “growth pattern”:
* The growth pattern is: 0, 4, 8, 16, 24, 32, 40, 52, 64, 76, ...
I thought that it would be interesting to write a program which would test these two (interconnected) things, …
import sys
MAX_APPENDINGS = 130
W = 5 # column width
WORD_SIZE = 8 # bytes
FACTOR = 4 # for testing alignment
a = []
HEADER_SIZE = prev_size = sys.getsizeof(a)
n_cells = 0
print('Theory | Actual')
for i in range(MAX_APPENDINGS):
a.append(0)
if (size := sys.getsizeof(a)) != prev_size:
print(f'{' ' * W if not i or theor == n_cells
else theor:{W}}{n_cells:{W}}')
theor = n_cells + (n_cells >> 3) + 6 & ~3
n_cells = (size - HEADER_SIZE) // WORD_SIZE
prev_size = size
if theor % FACTOR or n_cells % FACTOR:
raise Exception(
f'Some numbers are not multiples of {FACTOR}')
… and here is its output:
Theory | Actual
0
4
8
12 16
24
32
40
48 52
64
76
88 92
108
124 128
In this table, left column, “Theory”, shows results of applying a formula, and the right one, “Actual”, — the result of the immediate working of the program. Numbers in “Theory” are printed only in cases when they do not coincide with the numbers in “Actual”.
So as we can see,
— on the one hand, the first ten numbers in “Actual” coincide with the sequence quoted above (“growth pattern”), which demonstrates that in certain respects our program works correctly, but
— on another hand, although “Theory” often coincides with “Actual”, sometimes it does not. Why?
Since in those cases when the numbers do not coincide, we see that Actual [i] == Theory [i] + 4, the first version which came to my mind is that this is related to the alignment, but when I looked at the numbers more carefully, it turned out not to be the case (although I’m not sure about that), because:
— all the numbers are multiples of 4, but not all of them are multiples of 8. This is tested in the last 3 lines of the program: if FACTOR == 4, exception is not generated, but if FACTOR == 8, exception is generated;
— if our hypothesis is correct, we should expect that the “Actual” numbers are better “aligned” than the “Theory” numbers, — in other words, we expect that in those cases when Actual [i] != Theory [i], Actual [i] is multiple of 8 whereas Theory [i] is not. But this is not always the case. Sometimes this is the case [in the cases of (12, 6) and (124, 128)], but some this is not the case [in the cases of (48, 52) and (88, 92)].
Finally, here are some comments on the code: size and prevsize are sizes of the list measured in bytes, whereas n_cells and theor are, respectively, actual and theoretical numbers of cells. Each cell consists of WORD_SIZE bytes. The list object consists of the header of the constant size (HEADER_SIZE) and of the dynamic part consisting of n_cells cells. n_cells is not the same as the len() of list, because resizing of list involves **over-**allocation of memory, so as to make these resizings occurring seldom. The last several sentences reflect my understanding of the internals of the lists in Python, which may be imprecise, but upon which the program is based.
So to repeat, the question is that I don’t fully understand how the formula words, or, in other words, why the numbers from the “Theory” column of the table do not always coincide with the ones from the “Actual”.