Question about asdl_seq memory allocation in CPython

I was looking at CPython’s parser implementation and found something puzzling in the ASDL sequence constructor macro.

Here’s the macro in pycore_asdl.h:

#define GENERATE_ASDL_SEQ_CONSTRUCTOR(NAME, TYPE) \
asdl_ ## NAME ## _seq *_Py_asdl_ ## NAME ## _seq_new(Py_ssize_t size, PyArena *arena) \
{ \
    asdl_ ## NAME ## _seq *seq = NULL; \
    size_t n; \
    /* check size is sane */ \
    if (size < 0 || \
        (size && (((size_t)size - 1) > (SIZE_MAX / sizeof(void *))))) { \
        PyErr_NoMemory(); \
        return NULL; \
    } \
    n = (size ? (sizeof(TYPE *) * (size - 1)) : 0); \
    /* check if size can be added safely */ \
    if (n > SIZE_MAX - sizeof(asdl_ ## NAME ## _seq)) { \
        PyErr_NoMemory(); \
        return NULL; \
    } \
    n += sizeof(asdl_ ## NAME ## _seq); \
    seq = (asdl_ ## NAME ## _seq *)_PyArena_Malloc(arena, n); \
    if (!seq) { \
        PyErr_NoMemory(); \
        return NULL; \
    } \
    memset(seq, 0, n); \
    seq->size = size; \
    seq->elements = (void**)seq->typed_elements; \
    return seq; \
}

// ...
#define asdl_seq_GET_UNTYPED(S, I) _Py_RVALUE((S)->elements[(I)])
#define asdl_seq_GET(S, I) _Py_RVALUE((S)->typed_elements[(I)])
#define asdl_seq_LEN(S) _Py_RVALUE(((S) == NULL ? 0 : (S)->size))

And here are the structure definitions, taking asdl_int_seq as an example:

// pycore_asdl.h
typedef struct {
    _ASDL_SEQ_HEAD
    int typed_elements[1];
} asdl_int_seq;

// asdl.c
GENERATE_ASDL_SEQ_CONSTRUCTOR(int, int);

My understanding:

  1. The typed_elements[1] field marks the start of variable-length storage. This looks like the old "struct hack" variant of what C99 calls a flexible array member (a true flexible array member would be declared typed_elements[]).
  2. The macro allocates space for the struct header plus additional elements. Since the header already contains one element, only size - 1 extra elements need to be allocated.
  3. When accessing elements via asdl_seq_GET, we get the actual value of type TYPE, not a pointer. So for example:
asdl_int_seq *s;
int x = asdl_seq_GET(s, 0);

What confuses me is sizeof(TYPE *) in the line n = (size ? (sizeof(TYPE *) * (size - 1)) : 0);. Since we're storing actual values of type TYPE (not pointers to TYPE), shouldn't this be sizeof(TYPE) * (size - 1)?

The asdl_int_seq struct contains _ASDL_SEQ_HEAD followed by 1 int element.

The allocation code calculates the size for size int elements to be sizeof(asdl_int_seq) + (size - 1) * sizeof(int).

That is exactly what I thought. But the expanded macro (substituting NAME = int, TYPE = int) computes:

n = (size ? (sizeof(int *) * (size - 1)) : 0);
So instead of sizeof(asdl_int_seq) + (size-1)*sizeof(int), the allocated size is sizeof(asdl_int_seq) + (size-1)*sizeof(int*), which I think would not be the same in general.

Ah, I see what you mean.

For int it happens to be mostly harmless: on mainstream platforms sizeof(int *) >= sizeof(int), so the macro merely over-allocates. But for a hypothetical element type wider than a pointer, sizeof(TYPE *) * (size - 1) would under-allocate.