Adding a CI Check to Detect New C Global Variables

tl;dr Are there any objections to adding a (required) CI check that helps us avoid accidentally adding any new C global variables? I expect failures to be very infrequent.

In my recent “Status Update about Interpreter Isolation” thread I mentioned how we have effectively eliminated nearly all mutable C global variables from CPython, which is exciting. However, globals remain a threat to interpreter isolation, so the trick now is how to keep us from adding more, especially accidentally.

A few years back I added Tools/c-analyzer to help us identify global variables. That tool can be used to help keep us from adding new unsupported globals. (While it isn’t perfect, and perhaps I’d do it differently if I did it again, we have the tool now and it does a good job.)

With c-analyzer tool as the basis, I have a PR up that adds a check to CI (as part of the existing “Check if generated files are up to date” check):

I wanted to check in here before merging it, since it directly impacts all contributors (even if very unlikely to do so).

As with the rest of the “Check if generated files are up to date” CI job, normally the globals check will pass and contributors won’t notice. Rarely, it might fail for one of the following reasons:

  • legitimate:
    • an unsupported global was added accidentally
    • an unsupported global was added on purpose and shouldn’t have been
    • an unsupported global was added on purpose and accidentally omitted from the ignore list
  • defects in the tool:
    • a supported global was added but the tool identified it as unsupported
    • the c-analyzer choked while parsing some large literal initializer
    • the c-analyzer couldn’t handle some esoteric C syntax that wasn’t in the code base before

Note that none of those cases are common. Since I added the tool in 2020, I’ve run it manually every few months and fixed all the failures. Each time there were only a handful of globals I had to fix, and most of those have been related to new static types or new array/struct literals that span thousands of characters (e.g. in generated code). I’ve only had to fix the tool’s parser a few times to adjust for C syntax corner cases. I expect that the CI check will fail for literally only a few PRs each month, if that, and each failure will be easy to sort out.

To help with failures, I’m adding a long explanatory text that gets printed to stdout when the check fails. (See the PR.) This should help contributors deal with those infrequent failures.

Anyway, I wanted to check in here, to see if there are any objections to adding the CI check. Again, it will help us avoid adding unsupported globals, which is otherwise easy to do accidentally. I expect failures to be rare.




It makes sense to me.

An excellent idea!

SGTM, I’ll approve the PR.

Thanks for the feedback, everyone. I’ll add the check today.