Testing formula in list_resize(…)

Hi! In “Objects / listobject.c”, list_resize(…) function is defined (which “over-allocates proportional to the list size, making room for additional growth”).

The body of the function includes the formula

new_allocated = ((size_t)newsize + (newsize >> 3) + 6) & ~(size_t)3;

… and the “growth pattern”:

* The growth pattern is: 0, 4, 8, 16, 24, 32, 40, 52, 64, 76, ...

I thought that it would be interesting to write a program which would test these two (interconnected) things, …

import sys

MAX_APPENDINGS = 130
W = 5                   # column width
WORD_SIZE = 8           # bytes
FACTOR = 4              # for testing alignment

a = []
HEADER_SIZE = prev_size = sys.getsizeof(a)
n_cells = 0

print('Theory | Actual')

for i in range(MAX_APPENDINGS):
    a.append(0)
    if (size := sys.getsizeof(a)) != prev_size:
        print(f'{' ' * W if not i or theor == n_cells
                    else theor:{W}}{n_cells:{W}}')
        theor = n_cells + (n_cells >> 3) + 6 & ~3
        n_cells = (size - HEADER_SIZE) // WORD_SIZE
        prev_size = size
        if theor % FACTOR or n_cells % FACTOR:
            raise Exception(
                f'Some numbers are not multiples of {FACTOR}')

… and here is its output:

Theory | Actual
         0
         4
         8
   12   16
        24
        32
        40
   48   52
        64
        76
   88   92
       108
  124  128

In this table, left column, “Theory”, shows results of applying a formula, and the right one, “Actual”, — the result of the immediate working of the program. Numbers in “Theory” are printed only in cases when they do not coincide with the numbers in “Actual”.

So as we can see,

— on the one hand, the first ten numbers in “Actual” coincide with the sequence quoted above (“growth pattern”), which demonstrates that in certain respects our program works correctly, but

— on another hand, although “Theory” often coincides with “Actual”, sometimes it does not. Why?

Since in those cases when the numbers do not coincide, we see that Actual [i] == Theory [i] + 4, the first version which came to my mind is that this is related to the alignment, but when I looked at the numbers more carefully, it turned out not to be the case (although I’m not sure about that), because:

— all the numbers are multiples of 4, but not all of them are multiples of 8. This is tested in the last 3 lines of the program: if FACTOR == 4, exception is not generated, but if FACTOR == 8, exception is generated;

— if our hypothesis is correct, we should expect that the “Actual” numbers are better “aligned” than the “Theory” numbers, — in other words, we expect that in those cases when Actual [i] != Theory [i], Actual [i] is multiple of 8 whereas Theory [i] is not. But this is not always the case. Sometimes this is the case [in the cases of (12, 6) and (124, 128)], but some this is not the case [in the cases of (48, 52) and (88, 92)].

Finally, here are some comments on the code: size and prevsize are sizes of the list measured in bytes, whereas n_cells and theor are, respectively, actual and theoretical numbers of cells. Each cell consists of WORD_SIZE bytes. The list object consists of the header of the constant size (HEADER_SIZE) and of the dynamic part consisting of n_cells cells. n_cells is not the same as the len() of list, because resizing of list involves **over-**allocation of memory, so as to make these resizings occurring seldom. The last several sentences reflect my understanding of the internals of the lists in Python, which may be imprecise, but upon which the program is based.

So to repeat, the question is that I don’t fully understand how the formula words, or, in other words, why the numbers from the “Theory” column of the table do not always coincide with the ones from the “Actual”.

n_cells isn’t newsize. Use len(a):

theor = len(a) + (len(a) >> 3) + 6 & ~3
2 Likes

Thank you, Stefan; indeed, now the program works correctly.

… however, I think that your reply can be slightly improved:

since at those moments when theor is being computed, len(a) == n_cells + 1 (but, of course, this is not true at the other moments), we can replace, in the formula proposed by you, len(a) with n_cells + 1, and then we can simplify the resulting formula in the following way:

taking into account that n_cells is always even [not to say that it is multiple of 4 (which is achieved by … & ~3)], we get …

n_cells + 1 >> 3 == n_cells >> 3

… and, of course,

n_cells + 1 + (…) + 6 … == n_cells + (…) + 7 …

So the final formula is:

theor = n_cells + (n_cells >> 3) + 7 & ~3

… which differs from the original one just by replacement of 6 with 7.

This formula is slightly better that the one proposed by you for the following two reasons: 1) it computes slightly faster, 2) it is simpler from the mathematical point of view: we can analyze the behavior of list_resize(…) by means of a simple recurrent formula, without the need to actually construct any list objects, and so, for instance, we get the “growth pattern” in the following simple way:

n = 0
for _ in range(9):
    print(n := n + (n+1 >> 3) + 7 & ~3,
        end = ' ')

I think that’s worse, not better. Speed doesn’t matter here, and clarity suffers. You now made two modifications to the actual formula that cancel out, and you had to write some proof that they do cancel out.

I agree. If our purpose is just to test something experimentally, then it’s better to refrain from subsequent mathematical manipulations, because, even if they really simplify something, they may contain errors. So from this perspective, my previous message was off-topic, sorry.