What to do with unsafe macros

Hi, everyone.

I created bpo-43502 as an attempt to remove the worst C macro pitfalls in the C API. For example, there are currently 88 macros that reuse their arguments under the Include/ directories. There are cases in the CPython code base where some of these are called with the increment operator, thus possibly executing the increment multiple times instead of once. One solution is to fix the callers. Another is to fix the macro, for example by implementing it fully or partially using static inline functions (or ordinary functions). The former is a quick fix, but does not fix the real problem. The latter fixes the real problem (the macro), but has multiple issues. Comments from @vstinner:

  • Backward incompatible C API changes may be introduced by mistake
  • New compiler warnings may be introduced (functions are typed, macros are not)

Comments from @pablogsal:

  • Macros transformed into function may or may not get inlined so there is a risk of affecting performance if we just transform them in bulk
  • Because the macro is defined as ‘static inline’ in the header file, then it can be inlined everywhere if the compiled decides to do it. It will be one copy symbol in the text segment on every compilation unit. Notice that ‘inline’ has a special meaning in C99 and it doesn’t only mean “inline this function”. It has visibility implications (see section 6.7.4 of the C99 standard).
  • There is risk to introducing a backwards-incompatible ABI change by mistake as Victor mentions.
  • This can affect also the link patterns of extension modules because these symbols may be visible in the resulting binaries when they were macro-level-inlined before. I cannot see any problem that can happen because of this but we should carefully consider it.

Comments from @rhettinger:
“Be careful about performance. We know for sure that macros will inline. In contrast, inline functions might or might not inline (there are rules for when it can be done). When applied across source files, inlining often only occurs with an LTO build — any other build can’t inline across object files. Historically, we’ve never had to depend on LTO, so that would be a regression for some users. We’ve already have a performance bug that needed to be reverted because of too aggressive conversion of macros to inline functions.”

Note: I do not propose getting rid of all macros, and I do not propose converting macros blindly; I only propose getting rid of obvious unsafe macros that are likely to be misused (for example, there’s an index argument) if the fix has no negative side-effects (see above).

Is it worth it pursuing this? If no, I’ll close the issue :slight_smile:

1 Like

I’m with Raymond on this one.

Macros should certainly be made safer (e.g. by adding extra () around arguments), where they aren’t already.

The “multiple arg use” situation doesn’t have to be a problem in practice, so I don’t think that this is a good argument for converting macros to functions across the board. It may be useful where using the argument multiple times is likely to have side effects.

Overall, use of macros always adds risk at the benefit of better performance; it’s a tradeoff. For inner loops, they are essential, since you cannot rely on compilers always inlining functions.

Finally, we should not forget that people using macros should be considered “consenting adults”. People who feel unsafe with macros should simply not use them.

3 Likes

Thanks Marc-André. Based on the feedback on bpo and your comment, I think I’ll close the issue. Incorrect usage of macros can be fixed in separate issues if needed.

1 Like

The idea that compilers may not be able to inline short “inline” functions is rather unsubstantiated. The one situation where this is true is when you compile without optimizations. :slight_smile: C++ performance depends heavily on inlining, so modern compilers are all able to inline aggressively at this point of their development. Yes, you can encounter situations where they don’t, and usually those are with very non-trivial functions.

Also I don’t understand where the LTO argument comes from. We’re talking about functions defined in header files, so the entire body of the function is visible at all times even without LTO.

3 Likes

The “inline” keyword only suggests to compilers that they may inline
a function. The decision is still at the compiler’s discretion. In
particular when optimizing code for space (e.g. for running in a container
or on embedded devices), the compiler may well decide not to inline,
in particular, when such inlined functions are nested.

What’s worse: some compilers don’t even tell you when they ignore
the “inline” keyword. E.g. MSVC defaults to not warning about this:

and Python even explicitly disables this warning unconditionally (for some
reason): cpython/pyport.h at master · python/cpython · GitHub

So detecting such failures to inline a function, which will cause
performance regressions, is hard.

With macros, the inlining is explicit and such regressions cannot
happen.

BTW: I do wonder why e.g. the inline code in object.h does not use
the macros from pyport.h.

1 Like

What’s important is not that they inline it at every call site, it’s that they inline it in places where it matters.

How is this a bad thing? Don’t ask the compiler to optimize for space if you don’t want it to.

But really, this is besides the point. The macros are of the form:

#define PyBytes_AS_STRING(op) (assert(PyBytes_Check(op)), (((PyBytesObject *)(op))->ob_sval))

In release mode (with assertions disabled), the inlined form is shorter than the non-inlined form (because of calling convention overhead). So when optimizing for size, the compiler has a good reason to inline this macro.

Neither clang, gcc nor MSVC will tell you that they didn’t inline a function. It’s not “worse”, it’s a feature. In most cases, it would be completely stupid to emit a warning when an inlinable function isn’t inlined.

2 Likes

Antoine, the idea behind using inline functions instead of macros is
to have the inline functions replace the macros and essentially
give the same compilation results as with macros – with the added
benefit of removing some of the pitfalls which macros have, such
as possible evaluating the arguments more than once or unintended
expansion which can result in malformed code.

What Raymond and I are saying is that they are not a perfect replacement,
since there are situations where the functions are actually called
and not inlined.

The keyword “inline” is only a hint, nothing more. It’s not an explicit
directive to inline the code as is using a macro.

By using inline function we no longer have control over how the code
is compiled, even though we want to have it inlined unconditionally –
otherwise we’d use the corresponding function call instead of the
macro.

It does work in many cases, but not always and especially in tight
loops or for very often used operations, it’s essential to keep
full control and not leave the decision to compilers.

1 Like

Sure. It would be necessary to run performance measurements to ensure that no significant regression can be observed. What’s not helpful is to oppose a change simply because it is not a “perfect replacement”.

It’s not essential at all. Lots of high-performance software is written that depends largely on the compiler’s willingness to inline small functions. CPython is probably on the “easy” side of the spectrum for compiler optimizers.

4 Likes

[pitrou] Antoine Pitrou https://discuss.python.org/u/pitrou pitrou CPython
core developer
March 20

malemburg:

What Raymond and I are saying is that they are not a perfect replacement,
since there are situations where the functions are actually called and not
inlined.

Sure. It would be necessary to run performance measurements to ensure that no
significant regression can be observed. What’s not helpful is to oppose a change
simply because it is not a “perfect replacement”.

But that’s not what I’m saying. I’m just outlining that it is not a
1-1 replacement and thus needs to be done carefully where it makes sense
and where macros can create problems.

I am opposing the blanket approach, which does not take this into
account.

malemburg:

especially in tight loops or for very often used operations, it’s essential
to keep full control and not leave the decision to compilers

It’s not essential at all. Lots of high-performance software is written that
depends largely on the compiler’s willingness to inline small functions. CPython
is probably on the “easy” side of the spectrum for compiler optimizers.

You are always thinking of code which is optimized for speed. There
are use cases, where space is important as well (such as the ones
I mentioned). In those cases, we still don’t want Python to turn into
a lame duck :slight_smile:

1 Like

I already answered to this :wink:

1 Like

Maybe I’m misinterpreting you, but I’m pretty sure I didn’t suggest a «blanket approach», neither on BPO, nor here. If it was mistaken for that, I’ll try to express myself clearer and less ambiguous in future post.

@pitrou, thanks for your input; I’m aligned with your arguments.

FYI: I’ve closed the issue as “won’t fix”.

I don’t understand the discussion about inlining. It’s now a very standard feature of C compilers and compilers provide many options to control inlining. Are you saying that C compilers are dumb and produce the less efficient code than finely tuned C code written manually? For me, this is FUD and I would like to see benchmarks numbers on real code.

We had a long discussion about the “always inline” hint on static inline functions. I looked at the machine code and how compilers decide to inline or not. In short, using “always inline” is a bad idea (or at least, it’s not needed).

Discussion: Issue 35059: Convert Py_INCREF() and PyObject_INIT() to inlined functions - Python tracker

Py_INCREF() and Py_DECREF() are critical for performance and were converted to static inline function in Python 3.8. There was no significant impact (good or bad) on performance according to the pyperformance benchmark suite.

IMO compilers know way better than me how to produce the most efficient x86-64 machine code. Deep inlining (multiple nested functions) makes the function larger and increases the number of variables, more variables means less available registers. Function size and variables are parameters which are very well understood by compilers which are very likely smarter than me :slight_smile:

Moreover, Python is built with PGO: the compiler uses real profiling data to take even smarter decisions. I don’t think that developers can usually guess these data.

Well, I tried to mark manually “hot code” using __attribute__((hot)) (_Py_HOT_FUNCTION macro), but I failed to produce code as efficient as GCC PGO: Issue 28618: Decorate hot functions using __attribute__((hot)) to optimize Python - Python tracker (my intent was to avoid slow PGO build and try to get reproducible performance).

Recently, I discovered that GCC is now able to emit a specialized flavor of PyTuple_New(n) for n=0 (.constprop suffix), I was impressed! There is also variants with .isra suffix:

Perform interprocedural scalar replacement of aggregates, removal of unused parameters and replacement of parameters passed by reference by parameters passed by value.

My advice is to forget all your assumptions and always run benchmarks. Only trust benchmarks.

By the way, read also this interesting article about the relationship between code size and performance (gcc -Os): Rethinking optimization for size [LWN.net] Smaller code does not always mean faster :wink:

3 Likes

Thanks for pitching in, @vstinner!

+1!

If someone else wants to push this, I’ll be happy to contribute. Maybe a PEP is the best approach :slight_smile:

My advice is to forget all your assumptions and always run benchmarks. Only trust benchmarks.

Those benchmarks should be executed as well in debug builds then. Macros will ALWAYS be inlined because that’s their nature, even if you compile with -O0, while inline will never get inlined in -O0. Is unclear what will happen with -Os or even -O1. Is also unclear how different versions of clang, gcc, xlc, cc, mscc will behave in these situations. With macros ALL OF THEM will inline.

Take into account that’s only true in general for Linux builds. PGO is more challenging on other platforms or even other compilers. That should not be used as a generalistic argument as we support many places where PGO doesn’t work that well or even doesn’t work at all.

1 Like

Why is -Os relevant? I thought we were talking about production builds for desktops/laptops. Embedded is a different world, where one may deliberately want to trade speed for size.

Regarding -O0: I understand why you would want to benchmark a debug build, but surely benchmarking a build that’s closer to what most people run must have some kind of value, no?

-Os can be a production build: one for which the system cares about the binary size. You still want that to be fast if possible.

Absolutely, I am claiming that we should benchmark both. Debugging some problem and finding that running your application is twice as slow is not a good experience either. Or running the test suite in debug mode, which all of us do constantly. I think that is important (although obviously much less important than a production build made to target speed,l

2 Likes

By the way, I recently proposed to build Python in debug mode on the Windows CI: bpo-43244: GitHub Action builds Python in debug mode on Windows by vstinner · Pull Request #24914 · python/cpython · GitHub But I abandoned the idea since it made tests 1.6 slower (12 min 19 sec => 19 min 41 sec), and the Windows CI (in release mode) is already one of the slowest CI.