Define module constants in PyModuleDef

Over the past few releases the module system has seen great improvements. Kudos to everybody who was involved in the design and implementation of PyModuleDef and multiphase module implementation.

There is one issue, which has not been addressed yet and that has annoying me for years: module level constants. The module API has helpers like PyModule_AddStringConstant and PyModule_AddStringMacro, but they are tedious to use. A common problem is also the lack of error checking. Some stdlib modules donā€™t verify the return values yet.

I came up with a proof of concept for a simpler and more readable approach. The idea is inspired by structmember. Module constants are declared in a new PyModuleConstants_Def array. The array is part of PyModuleDef and initialized by the module initialization framework. The PoC defines a couple of types (int, str, None, boolean, float, function call) and helper macros.

typedef struct PyModuleConstants_Def {
    const char *name;
    int type;
    union {
        const char *m_str;
        long m_long;
        double m_double;
        PyObject* (*m_call)(void);
    } value;
} PyModuleConstants_Def;

posix module example:

static PyModuleConstants_Def _posix_constants[] = {
#ifdef F_OK
    PyMC_LongMacro(F_OK),
#endif
    ...
    {NULL, 0},
};

math module example:

static PyObject*
m_inf_o(void)
{
    return PyFloat_FromDouble(m_inf());
}

static PyModuleConstants_Def math_constants[] = {
    PyMC_Double("pi", Py_MATH_PI),
    PyMC_Double("e", Py_MATH_E),
    PyMC_Double("tau", Py_MATH_TAU),  // 2pi
    PyMC_Call("inf", m_inf_o),
#if !defined(PY_NO_SHORT_FLOAT_REPR) || defined(Py_NAN)
    PyMC_Call("nan", m_nan_o),
#endif
    {NULL, 0},
};

What do you think about the approach?

3 Likes

I like it. Some modules expose myriads constant. You need to write

if (PyModule_AddIntConstant(m, "P_WAIT", _P_WAIT)) return -1;
if (PyModule_AddIntConstant(m, "P_NOWAIT", _P_NOWAIT)) return -1;
if (PyModule_AddIntConstant(m, "P_NOWAITO", _P_NOWAITO)) return -1;

Or use macros and define variations of the following macros in every file:

#define ADD_INT(name, value) \
    do { \
        if (PyModule_AddIntConstant(m, name, value) < 0) { \
            return -1; \
        } \
    } while (0)

In some file you need return -1;, in others Py_DECREF(m); return NULL;, and in others goto error;.

With this proposition we will have standard macros available in every file and do not bother error handling.

Some attributes will still be added manually using PyModule_AddObject() or better its fixed variant PyModule_Add() (https://bugs.python.org/issue42327).

1 Like

I identified two more common patterns in module initialization: types and exceptions. My PR contains two helper functions:

PyTypeObject *
PyModule_AddNewTypeFromSpec(PyObject *module, PyType_Spec *spec,
                            PyObject *base)
PyObject *
PyModule_AddNewException(PyObject *module, const char *name, const char *doc,
                         PyObject *base, PyObject *dict)

PyModule_AddNewTypeFromSpec() is PyType_FromModuleAndSpec() + PyModule_AddType(). It also handles non-tuple bases like PyErr_NewException(). In most cases the base argument is either NULL or a single-item tuple. The extra check gets rid of tuple packing in the caller.

PyModule_AddNewException() is PyErr_NewExceptionWithDoc() + PyModule_AddObjectRef() with correct short name. PyErr_NewException*() takes a qualified name but PyModule_Add*() expects a short name.

Example code:

    _hashlibstate *state = get_hashlib_state(module);

    state->EVPtype = PyModule_AddNewTypeFromSpec(
        module, &EVPtype_spec, NULL);
    return state->EVPtype == NULL ? -1 : 0;
    PySSLErrorObject = (PyObject *)PyModule_AddNewTypeFromSpec(
        m, &sslerror_type_spec, PyExc_OSError);
   ...
    PySSLZeroReturnErrorObject = PyModule_AddNewException(
        m, "ssl.SSLZeroReturnError", SSLZeroReturnError_doc,
        PySSLErrorObject, NULL);

Thanks for taking this one! Generally looks OK. But:

  • PyModuleDef is part of the stable ABI; we canā€™t just add a new member to it and expect itā€™ll be NULL if unset. Please add a PyModule_AddConstants(PyObject *module, PyModuleConstants_Def *consts) instead. (Extending/versioning PyModuleDef is a can of worms weā€™ll have to open eventually, but I donā€™t think itā€™s necessary right now.)
  • Could we use e.g. PyModuleConst_ rather than PyModuleConstants_/PyMC_?

This week, my CPython day unfortunately falls on a national holiday. Next week I should have time to review more thoroughly.

Thanks Petr!

I have renamed new API functions and constants to use PyModuleConst_ prefix.

I see, stable ABI makes this tricky. How can we safely handle new struct members then? Or should I exclude the new feature from the stable ABI for now?

Add a PyModule_AddConstants function. It will be useful even if/when PyModuleDef is extended ā€“ we already have functions like PyModule_AddFunctions that let you add only specific parts of the PyModuleDef.
I donā€™t see a good way of excluding the feature from the stable ABI.

To ā€žextendā€œ PyModuleDef, IMO weā€™ll need to add a new struct with a version number + some macro(s) to comfortably/automatically fill that version number, and new function(s) that use that new struct.

I have another idea. There is no need to extend the PyModuleDef struct. We could also use PyModuleDef_Slot with a new slot type Py_mod_constants. This avoids an ABI mismatches.

My PoC PR now contains PyModuleConst_ prefix, the requested PyModule_AddConstants function, and Py_mod_constants instead of PyModuleDef.m_constants.

1 Like

Please donā€™t. PyModuleDef_Slot is for functions. There are a few non-function slots, but in the long run Iā€™d like to deprecate them. Mixing data pointers and function pointers is not valid standard C.

You canā€™t deprecate them until we deprecate the entire limited API, at which point it can all change anyway.

Until then, weā€™re stuck with only supporting platforms that can mix data and function pointers, so we may as well use it.

1 Like

I think I can :ā€Œ) Expect a PEP by end of year. Sorry Iā€™m slow ā€“ not only is there a lot of parts to think about, but also writing in English takes me a lot of time.
The main idea is that it is the ABI that is stable; pieces of the limited API could be deprecated as long as the ABI still works. But of course, the devil is in the details.

Oh, and in case youā€™re interested, I do have a brainstorming repository for my PEP-to-be. Issues #9 and #14 there are relevant.

I mean, in as much as you can simply label it deprecated, sure. But that doesnā€™t change that you canā€™t do anything with how it behaves. So since weā€™re stuck with it, why make the rest of the API unnecessarily complicated?

Hm. I donā€™t think it makes the rest of the API necessarily complicated.
If you disagree, add the slot ā€“ itā€™s not my place to forbid that. Iā€™m just telling you is that it doesnā€™t fit in my long-term plans; if I stick around long enough, Iā€™ll want to remove it again :ā€Œ)

Letā€™s postpone the decision. The new APIs are useful and convenient without the slot. I have updated the PR yesterday and removed the slot.

Thatā€™s totally fine, I want to remove them too (before weā€™ve added them) :slight_smile:
But weā€™re not going to be able to until we break the API completely, so I donā€™t see why weā€™d make things more complicated in the meantime (in this case, ā€œmore complicatedā€ mean ā€œsome data objects provided one way, others provided in another way, even though they look like the same kind of thingā€).

ā€œMore complicatedā€ == one additional one-line function per module

I can live with that trade-off if it makes Petr happy. :heart:

Actually 3-4 lines with error handling. But I am fine with this.

No error handling required:

static int
math_init_constants(PyObject *module)
{
    return PyModule_AddConstants(module, math_constants);
}
    {Py_mod_exec, math_init_constants},

Accepting both a tuple and a single class without packing it into a 1-tuple in PyType_FromModuleAndSpec() is a good idea. Opened https://bugs.python.org/issue42423 for it.

1 Like

Iā€™m sure there are other viewpoints, but I generally consider the ergonomics of the C API less important than correctness. For ease of use, Iā€™d rather have people switch to Cython :ā€)
The API added here is good mainly because it encourages more correct code: it eliminates lots of boilerplate error checking, which is often missing/wrong in the stdlib.