Allocating memory for ints with the C API

Hello dear people,

I’m trying to write a C extension for Python and I can’t seem to understand how to allocate memory for a PyObject* of type PyLong_Type.

I have this function where I allocate a new node of a dictionary (not the builtin) data structure.

// Allocate a new node given the types of the key and value.
static SSDictNode *SSDictNode_new(PyTypeObject *key_type,
                                  PyTypeObject *value_type) {
  SSDictNode *self = (SSDictNode *)PyMem_RawMalloc(sizeof(SSDictNode));
  if (self == NULL) {
    // Memory allocation request failed, do something.
  }
  self->key = _PyObject_New(key_type);
  self->key_hash = -1; // init value, a real hash could never be -1
                       // because the hashing function PyObject_Hash
                       // returns -1 on failure
  self->value = _PyObject_New(value_type);

  return self;
}

In the above example I’m using _PyObject_New, which I’m not sure if I’m supposed to, but I’ve also tried with key_type->tp_alloc(key_type, 0) following this example. In both cases, the function _PyObject_InitVar is called and I see from the source code that there is an assertion assert(typeobj != &PyLong_Type) that prevents the allocation of int types.

static inline void
_PyObject_InitVar(PyVarObject *op, PyTypeObject *typeobj, Py_ssize_t size)
{
    assert(op != NULL);
    assert(typeobj != &PyLong_Type);
    _PyObject_Init((PyObject *)op, typeobj);
    Py_SET_SIZE(op, size);
}

The whole flow works if I use strings instead of ints when calling from Python.

TLDR: How am I supposed to allocate memory for Python ints, i.e. PyLong_Type with the C API?

You usually create Python int objects with PyLong_FromSsize_t(Py_ssize_t v) and similar.

You can’t really separate the memory allocation from the rest of the initialization – for one, the memory size needed for a particular value is an internal implementation detail.
So, yes, use a function like PyLong_FromLong, PyLong_FromString or PyLong_FromNativeBytes.

Thank you for the response.

I managed to get things through by implementing the below. I allocated the int as a reference to the zero int object, then on initialization I decref the zero int object and allocate the given int.

// Allocate a new node given the types of the key and value.
static SSDictNode *SSDictNode_new(PyTypeObject *key_type,
                                  PyTypeObject *value_type) {
  SSDictNode *self = (SSDictNode *)PyMem_RawMalloc(sizeof(SSDictNode));
  if (self == NULL) {
    PyErr_SetString(PyExc_MemoryError,
                    "failed to allocate memory for SSDict node");
    return NULL;
  }

  self->key = _allocate_pyobject(key_type);
  self->key_hash = -1; // init value, a real hash could never be -1
                       // because the function PyObject_Hash
                       // returns -1 on failure
  self->value = _allocate_pyobject(value_type);

  return self;
}

Where _allocate_pyobject is:

// Allocate an object of the given type.
PyObject *_allocate_pyobject(PyTypeObject *type) {
  if (type == &PyLong_Type) {
    // Allocation of ints requires their value to be known. Use this temp
    // value now and later on (on initialization) decref it and allocate
    // for the new value.
    PyObject *temp_value = PyLong_FromLong(0);

    return PyLong_FromSsize_t(PyNumber_AsSsize_t(temp_value, NULL));
  }

  return type->tp_alloc(type, 0);
}

So on initialization of the structure (SSDictNode) that holds the int:

// Initialize the values of the node. Increments the reference to the key 
// and value.
static void SSDictNode__init__(SSDictNode *self, PyObject *key,
                               PyObject *value) {
  Py_hash_t hash = PyObject_Hash(key);

  if (hash == -1) {
    PyObject_HashNotImplemented(key);
    return;
  }

  if (Py_TYPE(key) == &PyLong_Type) {
    // Integer values are not actually allocated on calling
    // SSDictNode_new__ because you have to know the actual value.
    // Remove (decref) the temporary value set on allocation and set the
    // new actual value.
    Py_DECREF(self->key);
    self->key = PyLong_FromSsize_t(PyNumber_AsSsize_t(key, NULL));
  }
  *(self->key) = *key;
  Py_INCREF(key);

  self->key_hash = hash;

  if (Py_TYPE(value) == &PyLong_Type) {
    Py_DECREF(self->value);
    self->value = PyLong_FromSsize_t(PyNumber_AsSsize_t(value, NULL));
  }
  *(self->value) = *value;
  Py_INCREF(value);
}

Why are you trying to allocate objects manually? That really can’t be done safely in general, since it’s up to a type as to how large it needs to be, and how it needs to be initialised. For booleans and None it’s outright not allowed to make new instances. Also you’re trying to copy the contents of a PyObject, that’s also very not safe. That’s only going to copy the header, you don’t know the full size of the object, or if it would be valid to do a direct copy like that.

What you should be doing is having the objects be created outside this function, then passing them in. If you want the members to always be valid, combine new/init into one function, so that you can allocate and immediately assign the correct values. If you really need a copy to guard against mutation, the only robust solution would be to import and call copy.copy. Even then there’s still plenty of objects that cannot be copied, for example open files.


Thanks @TeamSpen210 for the clarifications. I now understand that I actually do not have to allocate anything (Python will handle that), merely store the pointers to the PyObjects.


// Allocate a new, empty node. The key and value are set later by
// SSDictNode__init__.
SSDictNode *SSDictNode_new(void) {
  SSDictNode *self = (SSDictNode *)PyMem_RawMalloc(sizeof(SSDictNode));
  if (self == NULL) {
    PyErr_SetString(PyExc_MemoryError,
                    "failed to allocate memory for SSDict node");
    return NULL;
  }

  self->key = NULL;
  self->key_hash = -1; // init value, a real hash could never be -1 because the
                       // hashing function PyObject_Hash returns -1 on failure
  self->value = NULL;

  return self;
}

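// Initialize the values of the node. Takes a new strong reference to the key
// and value. On failure an exception is set and the node is left unchanged.
void SSDictNode__init__(SSDictNode *self, PyObject *key, PyObject *value) {
  Py_hash_t hash = PyObject_Hash(key);

  if (hash == -1) {
    // PyObject_Hash has already set the exception (e.g. TypeError for an
    // unhashable key), so there is nothing more to raise here.
    return;
  }

  // The node keeps these pointers, so it must own a reference to each.
  Py_INCREF(key);
  Py_INCREF(value);
  self->key = key;
  self->key_hash = hash;
  self->value = value;
}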