Define module constants in PyModuleDef

tiran · November 14, 2020, 4:09pm

Over the past few releases the module system has seen great improvements. Kudos to everybody who was involved in the design and implementation of PyModuleDef and multiphase module implementation.

There is one issue, which has not been addressed yet and that has annoying me for years: module level constants. The module API has helpers like PyModule_AddStringConstant and PyModule_AddStringMacro, but they are tedious to use. A common problem is also the lack of error checking. Some stdlib modules don’t verify the return values yet.

I came up with a proof of concept for a simpler and more readable approach. The idea is inspired by structmember. Module constants are declared in a new PyModuleConstants_Def array. The array is part of PyModuleDef and initialized by the module initialization framework. The PoC defines a couple of types (int, str, None, boolean, float, function call) and helper macros.

typedef struct PyModuleConstants_Def {
    const char *name;
    int type;
    union {
        const char *m_str;
        long m_long;
        double m_double;
        PyObject* (*m_call)(void);
    } value;
} PyModuleConstants_Def;

posix module example:

static PyModuleConstants_Def _posix_constants[] = {
#ifdef F_OK
    PyMC_LongMacro(F_OK),
#endif
    ...
    {NULL, 0},
};

math module example:

static PyObject*
m_inf_o(void)
{
    return PyFloat_FromDouble(m_inf());
}

static PyModuleConstants_Def math_constants[] = {
    PyMC_Double("pi", Py_MATH_PI),
    PyMC_Double("e", Py_MATH_E),
    PyMC_Double("tau", Py_MATH_TAU),  // 2pi
    PyMC_Call("inf", m_inf_o),
#if !defined(PY_NO_SHORT_FLOAT_REPR) || defined(Py_NAN)
    PyMC_Call("nan", m_nan_o),
#endif
    {NULL, 0},
};

What do you think about the approach?

storchaka · November 14, 2020, 8:30pm

I like it. Some modules expose myriads constant. You need to write

if (PyModule_AddIntConstant(m, "P_WAIT", _P_WAIT)) return -1;
if (PyModule_AddIntConstant(m, "P_NOWAIT", _P_NOWAIT)) return -1;
if (PyModule_AddIntConstant(m, "P_NOWAITO", _P_NOWAITO)) return -1;

Or use macros and define variations of the following macros in every file:

#define ADD_INT(name, value) \
    do { \
        if (PyModule_AddIntConstant(m, name, value) < 0) { \
            return -1; \
        } \
    } while (0)

In some file you need return -1;, in others Py_DECREF(m); return NULL;, and in others goto error;.

With this proposition we will have standard macros available in every file and do not bother error handling.

Some attributes will still be added manually using PyModule_AddObject() or better its fixed variant PyModule_Add() (https://bugs.python.org/issue42327).

tiran · November 15, 2020, 10:03pm

I identified two more common patterns in module initialization: types and exceptions. My PR contains two helper functions:

PyTypeObject *
PyModule_AddNewTypeFromSpec(PyObject *module, PyType_Spec *spec,
                            PyObject *base)
PyObject *
PyModule_AddNewException(PyObject *module, const char *name, const char *doc,
                         PyObject *base, PyObject *dict)

PyModule_AddNewTypeFromSpec() is PyType_FromModuleAndSpec() + PyModule_AddType(). It also handles non-tuple bases like PyErr_NewException(). In most cases the base argument is either NULL or a single-item tuple. The extra check gets rid of tuple packing in the caller.

PyModule_AddNewException() is PyErr_NewExceptionWithDoc() + PyModule_AddObjectRef() with correct short name. PyErr_NewException*() takes a qualified name but PyModule_Add*() expects a short name.

Example code:

    _hashlibstate *state = get_hashlib_state(module);

    state->EVPtype = PyModule_AddNewTypeFromSpec(
        module, &EVPtype_spec, NULL);
    return state->EVPtype == NULL ? -1 : 0;

    PySSLErrorObject = (PyObject *)PyModule_AddNewTypeFromSpec(
        m, &sslerror_type_spec, PyExc_OSError);
   ...
    PySSLZeroReturnErrorObject = PyModule_AddNewException(
        m, "ssl.SSLZeroReturnError", SSLZeroReturnError_doc,
        PySSLErrorObject, NULL);

encukou · November 16, 2020, 9:15am

Thanks for taking this one! Generally looks OK. But:

PyModuleDef is part of the stable ABI; we can’t just add a new member to it and expect it’ll be NULL if unset. Please add a PyModule_AddConstants(PyObject *module, PyModuleConstants_Def *consts) instead. (Extending/versioning PyModuleDef is a can of worms we’ll have to open eventually, but I don’t think it’s necessary right now.)
Could we use e.g. PyModuleConst_ rather than PyModuleConstants_/PyMC_?

This week, my CPython day unfortunately falls on a national holiday. Next week I should have time to review more thoroughly.

tiran · November 16, 2020, 10:26am

Thanks Petr!

I have renamed new API functions and constants to use PyModuleConst_ prefix.

I see, stable ABI makes this tricky. How can we safely handle new struct members then? Or should I exclude the new feature from the stable ABI for now?

encukou · November 16, 2020, 10:46am

Add a PyModule_AddConstants function. It will be useful even if/when PyModuleDef is extended – we already have functions like PyModule_AddFunctions that let you add only specific parts of the PyModuleDef.
I don’t see a good way of excluding the feature from the stable ABI.

To „extend“ PyModuleDef, IMO we’ll need to add a new struct with a version number + some macro(s) to comfortably/automatically fill that version number, and new function(s) that use that new struct.

tiran · November 16, 2020, 11:10am

I have another idea. There is no need to extend the PyModuleDef struct. We could also use PyModuleDef_Slot with a new slot type Py_mod_constants. This avoids an ABI mismatches.

My PoC PR now contains PyModuleConst_ prefix, the requested PyModule_AddConstants function, and Py_mod_constants instead of PyModuleDef.m_constants.

encukou · November 16, 2020, 11:35am

Please don’t. PyModuleDef_Slot is for functions. There are a few non-function slots, but in the long run I’d like to deprecate them. Mixing data pointers and function pointers is not valid standard C.

steve.dower · November 16, 2020, 2:40pm

You can’t deprecate them until we deprecate the entire limited API, at which point it can all change anyway.

Until then, we’re stuck with only supporting platforms that can mix data and function pointers, so we may as well use it.

encukou · November 16, 2020, 4:10pm

I think I can :‌) Expect a PEP by end of year. Sorry I’m slow – not only is there a lot of parts to think about, but also writing in English takes me a lot of time.
The main idea is that it is the ABI that is stable; pieces of the limited API could be deprecated as long as the ABI still works. But of course, the devil is in the details.

encukou · November 16, 2020, 4:16pm

Oh, and in case you’re interested, I do have a brainstorming repository for my PEP-to-be. Issues #9 and #14 there are relevant.

steve.dower · November 16, 2020, 11:15pm

I mean, in as much as you can simply label it deprecated, sure. But that doesn’t change that you can’t do anything with how it behaves. So since we’re stuck with it, why make the rest of the API unnecessarily complicated?

encukou · November 17, 2020, 7:18pm

Hm. I don’t think it makes the rest of the API necessarily complicated.
If you disagree, add the slot – it’s not my place to forbid that. I’m just telling you is that it doesn’t fit in my long-term plans; if I stick around long enough, I’ll want to remove it again :‌)

tiran · November 17, 2020, 7:34pm

Let’s postpone the decision. The new APIs are useful and convenient without the slot. I have updated the PR yesterday and removed the slot.

steve.dower · November 17, 2020, 8:56pm

That’s totally fine, I want to remove them too (before we’ve added them)
But we’re not going to be able to until we break the API completely, so I don’t see why we’d make things more complicated in the meantime (in this case, “more complicated” mean “some data objects provided one way, others provided in another way, even though they look like the same kind of thing”).

tiran · November 18, 2020, 11:39am

“More complicated” == one additional one-line function per module

I can live with that trade-off if it makes Petr happy.

storchaka · November 18, 2020, 5:44pm

Actually 3-4 lines with error handling. But I am fine with this.

tiran · November 18, 2020, 6:01pm

No error handling required:

static int
math_init_constants(PyObject *module)
{
    return PyModule_AddConstants(module, math_constants);
}

    {Py_mod_exec, math_init_constants},

storchaka · November 21, 2020, 10:37am

Accepting both a tuple and a single class without packing it into a 1-tuple in PyType_FromModuleAndSpec() is a good idea. Opened https://bugs.python.org/issue42423 for it.

encukou · November 24, 2020, 4:14pm

I’m sure there are other viewpoints, but I generally consider the ergonomics of the C API less important than correctness. For ease of use, I’d rather have people switch to Cython :‍)
The API added here is good mainly because it encourages more correct code: it eliminates lots of boilerplate error checking, which is often missing/wrong in the stdlib.