I want to record whether the functions in certain .py files are ever executed, without requiring user annotations. I’m working on a tool that tracks which code affected which produced data, with needs similar to, but not exactly like, a debugger or profiler. I have tried several approaches to this at the Python level, but none of them feel robust, and I’m wondering if the community has any input.
Attempt 0: use the existing tracing/profiling systems. They have high overhead because they monitor every call in the process and then make you do the filtering yourself afterwards, and I don’t want to monitor the vast majority of functions. Also, once a function has executed I can stop monitoring it entirely: I only need to know yes/no whether something ran, not how many times.
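A minimal sketch of the shape of the problem, using sys.settrace as a representative example (WATCHED_FILES and the recording set are placeholders I’m introducing here):

```python
import sys

WATCHED_FILES = {"/path/to/pipeline.py"}  # placeholder for the files I care about
executed = set()

def tracer(frame, event, arg):
    # Fires for every call in the whole process; the filtering happens
    # here, after the callback overhead has already been paid.
    if event == "call":
        code = frame.f_code
        if code.co_filename in WATCHED_FILES:
            executed.add(code.co_qualname)  # co_qualname needs 3.11+
    return None  # no line-level tracing needed

sys.settrace(tracer)
```

There is no way with this API to say “stop firing for this one function” after the first hit; the callback keeps running for everything.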
Attempt 1: install an import hook that, just for those modules, wraps every top-level function it finds (a sketch of the post-import flavor is below). This gets hairy quickly: you need to handle staticmethod, classmethod, property, functools.wraps, etc. If the hook is a post-import hook, then any function you see has already had its decorators applied, and since some of the functions I want to monitor may themselves be decorators, they’ve already executed by the time I see them. If it’s a pre-import hook, you need to modify the AST to insert the wrapper around every function definition, which is also hard: you have to spot lambdas that may be buried in the middle of complex expressions. And in general there is no good way to reach into a function object and wrap its nested functions, because nested functions lack a stable function object (they are recreated every time the outer function is called).
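The post-import flavor, reduced to the easy case (instrument_module is a hypothetical helper you’d call from the hook; everything on the list above slips through it):

```python
import functools
import types

executed = set()

def _record(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        executed.add(func.__qualname__)
        return func(*args, **kwargs)
    return wrapper

def instrument_module(module):
    # Naive pass over the module globals: only catches plain top-level
    # functions defined in this module. Descriptors (staticmethod,
    # classmethod, property), methods on classes, already-applied
    # decorators, and nested functions all need separate handling.
    for name, value in list(vars(module).items()):
        if isinstance(value, types.FunctionType) and value.__module__ == module.__name__:
            setattr(module, name, _record(value))
```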
Attempt 2: patch the bytecode of functions at runtime to contain a small preamble that records the fact they were called. The problems here are similar to attempt 1: you still have the general difficulty of finding all the functions that need patching. But CodeType objects are also immutable and can’t be patched in place; you can only reassign func_object.__code__ (sketch below). And nested functions only have a stable code object, not a stable function object, so there’s no single function object you can get away with patching.
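A toy version of the __code__ swap for a single module-level function (_hits, _orig_target, and _shim are names I’m introducing just for illustration):

```python
import types

_hits = set()

def target(x):
    return x + 1

# Keep the original behaviour reachable: the swapped-in code object will
# resolve _orig_target through target.__globals__ (the module namespace).
_orig_target = types.FunctionType(target.__code__, target.__globals__)

def _shim(x):
    _hits.add("target")  # record that the function ran
    return _orig_target(x)

# The code object is immutable, but the function's __code__ slot is
# writable, so swap in the shim's code (legal here because neither
# function has free variables).
target.__code__ = _shim.__code__

target(1)
assert "target" in _hits
```

This only works because target is a stable module-level function; for a nested function there is no lasting function object to do the assignment on.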
Attempt 3: maybe there is something for this in the C API? It turns out there is a new code ‘watcher’ API (PyCode_AddWatcher, added in 3.12), but it only notifies you about the creation and destruction of code objects in general, not the execution of specific ones. Still, maybe I could use it to detect creation and then patch the bytecode? But then you still hit the fact that code objects are immutable, and co_code on PyCodeObject is no longer public because of the adaptive interpreter; only PyCode_GetCode is exposed, with no setter. It is just C code, though, so I could break encapsulation and overwrite the bytes myself, but it’s not at all clear how to do that without breaking the adaptive interpreter, and that’s an area of the code that seems likely to churn as more optimization work lands.
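The immutability is visible from pure Python too: CodeType.replace() hands back a brand-new object, and the attributes themselves are read-only (small illustration):

```python
def f():
    return 1

# replace() builds a new code object; the original is untouched.
patched = f.__code__.replace(co_name="f_recorded")
assert patched is not f.__code__

try:
    f.__code__.co_code = b""  # direct mutation is refused
except AttributeError as exc:
    print(exc)  # code objects expose only read-only attributes
```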
Attempt 4: PEP 523 adds frame evaluators that give you more direct control over execution. But you can only have one (there’s a single field for it in the interpreter struct), so my shared object would break composability with any other system that wanted to install its own evaluator. Since my evaluator would be read-only and just delegate to the default one, it could be composable if the API supported chaining; I could have my evaluator wrap whichever one is currently set, but then my code is sensitive to load order. I also noticed that _PyInterpreterState_SetEvalFrameFunc calls _Py_Executors_InvalidateAll, which may disable the JIT/adaptive optimizations, and I would like to keep those benefits. And this would add overhead to every call, not just the ones I want to monitor.
Attempt 5: what if I just modified the interpreter directly? I could find where MAKE_FUNCTION is processed and make the bytecode what I want from the beginning. I’m having trouble finding the one location for that, though: it looks like there is the interpreter, tier 1, tier 2, and a good amount of generated code. If this is the way to go, tips on where to patch would be appreciated.