A fast variant of `PyType_GetModuleState`

For CPython internal usage, we’ve got _PyModule_GetState, which is a fast variant of PyModule_GetState, the module check in the latter is simply an assert in the former.

For PyType_GetModuleState, there are three ifs (two of them implicitly in PyType_GetModule):

  1. check that the given type is a heap type
  2. check that the given type has an associated module
  3. check that the result of PyType_GetModule is not NULL

For stdlib core extension modules, all of these conditions are always true (AFAIK). We could create a fast static inlined variant, _PyType_GetModuleState with a fast variant of PyType_GetModule inlined, where all three conditions are assert()ed. It would speed up a lot of heap type methods that need to access module state.

Unless there are objections, I’ll create an issue and a PR.

4 Likes

Issue opened as GH-101476, PR opened as GH-101477.

Uh, did you actually try to measure or make an informed estimate of the cost of the API you’re trying to replace?

From a quick read of your message, it seems like you’re worried about the cost of three boolean checks that will always predict true. Am I wrong?

1 Like

Also the cost of two regular C function calls. I’ll try to find a good way to benchmark it.

FTR, we have many similar fast variants for other APIs in the internal C API, often where branches are simply converted to asserts.

In the end it depends in which context the API is used. If it’s likely to be used in a tight loop, such as PyTuple_GET_ITEM, it makes sense to have an extremely trimmed down macro or inline function. But I’m not sure in which circumstances PyModule_GetState performance would really need to be trimmed down to the last nanosecond.

So, unless PyModule_GetState does something heavyweight such as a dict lookup (does it?), this doesn’t seem a very attractive proposition.

cc @gpshead for opinions.

1 Like

asyncio C implementation was recently converted to heap types and all and this function is called for almost everything. I would expect to have the best performance out of it, the entire point of that module is performance otherwise we have a pure python implementation on hand.

I would say as far as that such boolean checks for API misuse should not have been added in the first place, everyone has to pay for the price for such effective useless checks for API misuse.

Overall I am +1 and I have approved the PR.

1 Like

FYI, PyModule_GetState already has a fast variant (_PyModule_GetState). My proposal is to add a fast variant for the slightly more complex PyType_GetModuleState API.

1 Like

Ah, sorry, I mistyped my response indeed. The overall point still holds.

Did you try to benchmark the module before/after the conversion to check if something changed? Or are you just making intuitive assumptions about performance?

2 Likes

Last I benchmarked, there was a 3-4% slowdown. Microbenchmarks don’t tell the whole story here because of branch prediction and cache behaviour. Also if there wasn’t slowdown then _PyModule_GetState (which does less work) and exists already would not have been added in the first place. The slowdown might be more apparent in other modules but I don’t have bandwidth to investigate further.

Is it 3-4% on an applicative benchmark or on a micro-benchmark? If the latter, then it’s probably negligible.

Well, let’s not try to second guess past decisions without actually measuring things :slight_smile: CPython probably has had its share of dubiuous API decisions.

This was on a FastAPI server.

For more background, you may want to look at TODO list when PEP 573 "Module State Access from C Extension Methods" will be implemented · Issue #84318 · python/cpython · GitHub

1 Like

I took a look, but I don’t see any discussion of _PyModule_GetState there.

I believe Kumar linked to that issue/discussion not specifically because of _PyModule_GetState, but generally as it is an example of why optimising how to fetch the module state matters.

Regarding benchmarking, I can try to find time for that in a couple of weeks; I’m currently on the road with limited time for CPython stuff.

1 Like

For the sake of avoiding any confusion, I understand fully well that optimizing the module state matters. I was just expressing skepticism about the specific change that’s proposed here.

1 Like

FTR, Gregory responded on the PR: gh-101476: Add _PyType_GetModuleState by erlend-aasland · Pull Request #101477 · python/cpython · GitHub

1 Like