C API design goal: hide implementation details, convert macros to regular functions, avoid memset() or errno in header files, etc

vstinner · March 24, 2023, 1:24pm

Hi,

I would like to know if you agree with me with the general guidelines: hide as many implementation details as possible in the C API header files to make the Python C API usable by the majority of users. I made many changes in last years, but I got backfire on some of them so I had to write PEP 670 (accepted) and PEP 674 (rejected) for example.

I’m working on the C API to hide “implementation details”. Here I would like to talk about changes to avoid using errno or <string.h> functions like memset() in the header files, but only used them in the implementation of regular functions. It’s related to PEP 670 which “advices” converting macros to static inline functions, or even regular functions.

Since Python 3.11, <Python.h> no longer includes the following header files in the limited C API version 3.11 or newer:

stdlib.h (ex: exit())
stdio.h (ex: FILE*)
errno.h (ex: errno)
string.h (ex: memcpy())

The intent is to make the C API to be used as easily as possible by the majority of use cases, to reduce the dependencing on an exact C/C++ compiler or on specific C/C++ compiler flags. And also to be able to use the whole C API in programming languages others than C and C++, like Rust.

In the past, the worst example for me was when atomic variables (“pyatomic.h”) were exposed as part of the C API: it causes many troubles when the C API was used in C++, or with some C compilers. The fix was to remove atomic variables from the C API (move them to the internal C API).

A recent concrete problem is related to my Py_CLEAR() macro fix to avoid the duplication of side effects: issue #98724. My fix uses memset() in the header files, whereas the limited C API no longer includes <string.h> on purpose.

I proposed a fix which implements the macro with a call to a regular function which calls the memset() function: PR #100121. Thomas Wouters proposed instead to revert the fix to keep the bug (duplication of side effects).

The Py_CLEAR() problem is very tricky since my first incorrect fix leaded to a strict aliasing bug which miscompiled Python and so caused crashes! It took me a while to dig into the machine code to fully understand this very tricky compiler issue. The problem is called type punning and is related to strict a aliasing.

It seems like other core devs like Guido van Rossum prefers to fix the bug (don’t duplicate side effects), but I don’t see a very clear consensus here, since some discussions were far from the issue, and not in the public space.

Victor

steve.dower · March 24, 2023, 1:36pm

This seems to me to be an easy “yes, of course”. As I understand it, it’s already a bug to use C library functions without including the correct header yourself (that is, by including a different C library header), and there’s no reason why Python.h should be assumed to provide any of them either.

Getting “clean” across all the different implementations is no trivial task, unfortunately. It’s entirely likely that we’re relying on some of these ourselves, and they’ll show up on platforms we don’t regularly test.

I lean towards a bit of planning and careful timing to do a complete cleanup - the fixes that users would have to do should apply just fine to earlier versions.

And yeah, that includes the Py_CLEAR change. Doubling side-effects is just embarrassing, it’s basic macro cleanliness that we messed up and should just fix.^[1]

I have no idea who messed it up, FWIW, so I don’t mean that as being directed at anyone in particular. Many of us had the chance to review and notice it and we didn’t, so by now it’s a collective mess-up ↩︎

encukou · March 27, 2023, 10:53am

I agree this is good general practice. New code should use functions rather than macros (and perhaps additionally static inline functions, but with an exported function available for other languages).

However, for existing code, I think we should only change it if there’s also a different reason to do it. Any change can have unforeseen consequences, and I don’t think the risks are worth it just for clean-up.

vstinner · March 30, 2023, 12:44pm

For Py_CLEAR() and Py_SETREF(), the “different reason” is that these macros have bugs: they have bad side effects. See Macro Py_CLEAR references argument two times. · Issue #98724 · python/cpython · GitHub for the rationale.

encukou · March 31, 2023, 11:42am

I don’t think the side effects are particularly bad – as a C programmer I’m not surprised if a macro evaluates its argument twice. It’s not ideal, sure, but I wouldn’t call it broken.

In this case, looks like the cure is worse than the disease. That wasn’t clear before you started fixing it, of course – back then there was just the risk of the unforeseen consequences.

storchaka · September 12, 2023, 9:03am

I disagree with converting macros to regular functions.

Not all macros can be converted to regular functions, for example you cannot convert Py_BEGIN_ALLOW_THREADS and Py_MAX to regular functions, for different reasons.

And Py_CLEAR is one of macros which should remain a macro, because while it is possible to make it a wrapper around an inline function, you can lose type safety and performance. You only creates problems which you need to fight.

Py_MAX has the same “bug” as Py_CLEAR, but I do not think that it is a bug.

vstinner · September 12, 2023, 11:18pm

When we wrote PEP 670, we were not aware of subtle strict aliasing and type punning issues when converting macros to functions (regular functions or static inline functions). I suggest updating PEP 670 to describe these issues and explain which macros should remain macros, even if they are affected by “Macro Pitfalls” issues, they are “convenient” to use.

The trend for the C API, especially in the limited C API, is to provide an alternative for programming languages which cannot use macros or static inline functions. For Py_CLEAR(), I don’t think that an alternative is need. It can be reimplemented in other programming languages (which cannot use the macro), no? I would prefer to offer Py_SetRef() and Py_Clear() functions, but strict aliasing and type punning make such API hard to design… If only all Python objects would be opaque PyObject*!

Some macros have corner cases like Py_CLEAR()/Py_SETREF(), Py_MAX(), Py_TOUPPER() and PyUnicode_KIND().

cc @erlendaasland