CPython 3.12, greenlet and tracing/profiling: How to not crash and get correct results

jamadden · September 6, 2023, 5:54pm

Greetings,

I’m a greenlet maintainer trying to make sure greenlet works correctly with Python 3.12, and I’m running into trouble when it comes to profiling. I understand that greenlet isn’t a supported use of CPython, but I was hoping someone might have some advice or suggestions to help avoid crashes and get correct results in the best possible way.

In summary, when there are suspended greenlets (whose frames are not “executing”), installing or removing profilers/tracers does not update the instrumentation state of those frames (because they’re not in the thread’s linked list of frames), leading to failures when they are later resumed. I’m hoping to find fast, correct ways to handle that, hopefully without getting into too much of the implementation of CPython.

I’ll provide an example of code that currently crashes (debug builds of) the interpreter due to assertion failures, describe what I think the situation is, and then discuss what I’ve looked at as potential solutions.

First, this code will crash the interpreter (greenlet == 3.0rc1, CPython 3.12rc2):

```python
import greenlet
import sys

def profile(frame, event, *stuff):
    print("\tProfile", frame.f_code, event, *stuff)

def g1_run():
    f = sys._getframe()
    print('In g1 with code  ', f.f_code, 'installing profiler')
    sys.setprofile(profile)
    g2.switch()
    return 42

def g2_run():
    f = sys._getframe()
    print('In g2 with code  ', f.f_code, 'removing profiler')
    sys.setprofile(None)
    g1.switch()

g1 = greenlet.greenlet(g1_run)
g2 = greenlet.greenlet(g2_run)
f = sys._getframe()
print('In main with code', f.f_code, 'starting greenlet')

# Start the first greenlet
g1.switch()
```

This code starts a greenlet (g1); that greenlet installs a profiler, and then switches context to a second greenlet (g2). That greenlet removes the profiler, and then switches back to the first greenlet; at this point, the interpreter crashes.

When run, it produces output like this (I’ve modified CPython’s instrumentation.c to print the lines starting with “CPython”):

```
In main with code <code object <module> at 0x1013f6560, file "/tmp/min_trace.py", line 1> starting greenlet
In g1 with code   <code object g1_run at 0x1011228e0, file "/tmp/min_trace.py", line 7> installing profiler
CPython: Instrumenting frame at 0x1012ac020 and code at 0x1011228e0
CPython: Calling instrumentation for code 0x1011228e0 ([code ver] 1 == [interp ver] 1?)
	Profile <code object g1_run at 0x1011228e0, file "/tmp/min_trace.py", line 7> c_call <built-in method switch of greenlet.greenlet object at 0x1014537f0>
CPython: Calling instrumentation for code 0x101122a70 ([code ver] 1 == [interp ver] 1?)
	Profile <code object g2_run at 0x101122a70, file "/tmp/min_trace.py", line 14> call None
CPython: Calling instrumentation for code 0x101122a70 ([code ver] 1 == [interp ver] 1?)
	Profile <code object g2_run at 0x101122a70, file "/tmp/min_trace.py", line 14> c_call <built-in function _getframe>
CPython: Calling instrumentation for code 0x101122a70 ([code ver] 1 == [interp ver] 1?)
	Profile <code object g2_run at 0x101122a70, file "/tmp/min_trace.py", line 14> c_return <built-in function _getframe>
CPython: Calling instrumentation for code 0x101122a70 ([code ver] 1 == [interp ver] 1?)
	Profile <code object g2_run at 0x101122a70, file "/tmp/min_trace.py", line 14> c_call <built-in function print>
In g2 with code   <code object g2_run at 0x101122a70, file "/tmp/min_trace.py", line 14> removing profiler
CPython: Calling instrumentation for code 0x101122a70 ([code ver] 1 == [interp ver] 1?)
	Profile <code object g2_run at 0x101122a70, file "/tmp/min_trace.py", line 14> c_return <built-in function print>
CPython: Calling instrumentation for code 0x101122a70 ([code ver] 1 == [interp ver] 1?)
	Profile <code object g2_run at 0x101122a70, file "/tmp/min_trace.py", line 14> c_call <built-in function setprofile>
CPython: Instrumenting frame at 0x1012b0020 and code at 0x101122a70
CPython: Calling instrumentation for code 0x101122a70 ([code ver] 2 == [interp ver] 2?)
CPython: Calling instrumentation for code 0x1011228e0 ([code ver] 1 == [interp ver] 2?)
Assertion failed: (code->_co_instrumentation_version == tstate->interp->monitoring_version), function call_instrumentation_vector, file instrumentation.c, line 967.
Fatal Python error: Aborted

Current thread 0x000000020511a080 (most recent call first):
  File "/tmp/min_trace.py", line 11 in g1_run
```

The assertion that fails is code->_co_instrumentation_version == tstate->interp->monitoring_version.

In Python 3.12, there is a new monitoring infrastructure specified by PEP 669, and sys.settrace and sys.setprofile are implemented using this infrastructure, which is based on modifying code objects. Under the covers, when a tracer or profiler is installed or removed, instrumentation.c winds up calling instrument_all_executing_code_objects(), which walks the list of thread states the interpreter knows about, and for each one of those, walks its list of frames, instrumenting each code object that’s executing in the thread.
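For readers who haven’t used PEP 669 directly, here is a minimal sketch of the public sys.monitoring API that settrace/setprofile are layered on in 3.12. It is independent of greenlet; `count_starts` is a hypothetical helper, and it assumes tool id 3 is not already in use.

```python
import sys

TOOL_ID = 3  # assumption: this tool id is not already claimed by another tool

def count_starts(func, *args):
    """Run func(*args) with a PEP 669 tool listening for PY_START events and
    return the names of the Python functions that started, in order."""
    mon = sys.monitoring
    started = []

    def on_start(code, instruction_offset):
        # PY_START callbacks receive the code object and the bytecode offset.
        started.append(code.co_name)

    mon.use_tool_id(TOOL_ID, "pep669-demo")
    try:
        mon.register_callback(TOOL_ID, mon.events.PY_START, on_start)
        # Turning on a global event changes the interpreter's monitoring
        # state; code objects get (re)instrumented to match it.
        mon.set_events(TOOL_ID, mon.events.PY_START)
        func(*args)
    finally:
        mon.set_events(TOOL_ID, 0)  # disable all global events for this tool
        mon.register_callback(TOOL_ID, mon.events.PY_START, None)
        mon.free_tool_id(TOOL_ID)
    return started
```

On 3.12+, if `outer` calls `inner` twice, `count_starts(outer)` should return `['outer', 'inner', 'inner']`.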

When g1 installs the profiler, the frame that g1 is executing (“In g1 with code <code object g1_run at 0x1011228e0…”) is found and instrumented (“CPython: Instrumenting … code at 0x1011228e0”).

We then switch to greenlet g2. Because g2 is independent of g1 and has its own stack, the switch manipulates the current thread’s list of frames: it makes the frame for g2 current (“In g2 with code <code object g2_run at 0x101122a70,…”) and unlinks the frame for g1 (because it is no longer running).

As a result, when g2 removes the profiler, the only active frame to be instrumented is g2’s (“CPython: Instrumenting … code at 0x101122a70”). Switching back to g1 makes current a frame whose code object wasn’t re-instrumented, so its _co_instrumentation_version wasn’t advanced in lock step with the interpreter’s monitoring_version (1 versus 2 in the last “Calling instrumentation” line). This leads to the assertion error.
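The version bookkeeping described above can be replayed in a deliberately simplified pure-Python model. This is not CPython or greenlet code: the class and method names are hypothetical, frames are collapsed to their code objects, and only the current greenlet’s frames count as “linked”. The version numbers line up with the [code ver]/[interp ver] values in the log.

```python
from dataclasses import dataclass, field

@dataclass
class CodeObject:
    name: str
    instrumentation_version: int = 0  # models code->_co_instrumentation_version

@dataclass
class Interpreter:
    monitoring_version: int = 0  # models interp->monitoring_version
    # Code of frames linked into the thread's frame list. A suspended
    # greenlet's frames have been unlinked, so they are absent here.
    linked: list = field(default_factory=list)

    def set_profile(self):
        """sys.setprofile(...): bump the version, re-instrument linked code only."""
        self.monitoring_version += 1
        for code in self.linked:
            code.instrumentation_version = self.monitoring_version

    def enter(self, code):
        """A frame starts: its RESUME brings the code up to date."""
        code.instrumentation_version = self.monitoring_version
        self.linked = [code]

    def switch_to(self, code):
        """A greenlet switch relinks a suspended frame without any RESUME."""
        self.linked = [code]

    def up_to_date(self, code):
        """The condition the failing debug-build assertion checks."""
        return code.instrumentation_version == self.monitoring_version

interp = Interpreter()
g1_code, g2_code = CodeObject("g1_run"), CodeObject("g2_run")
interp.enter(g1_code)      # g1.switch() starts g1_run
interp.set_profile()       # g1: sys.setprofile(profile)
interp.enter(g2_code)      # g2.switch() starts g2_run; g1's frame is unlinked
interp.set_profile()       # g2: sys.setprofile(None)
interp.switch_to(g1_code)  # g1.switch() resumes g1_run
print(g1_code.instrumentation_version, interp.monitoring_version)  # 1 2
print(interp.up_to_date(g1_code))  # False
```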

My first attempt to solve this was the obvious one: save and restore tstate->interp->monitoring_version and tstate->interp->monitors around greenlet switches. While this does prevent the crash in simple cases like this, it’s clearly not the right solution. For one thing, it gets incorrect results (what was supposed to be a global trace function effectively acts like a greenlet-specific trace function). More seriously, it’s unlikely to work in more complex scenarios.
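A toy model (plain Python with hypothetical names, not greenlet or CPython code) makes the “incorrect results” point concrete. Replaying the example’s switches with per-greenlet save/restore, g1 gets back the version its code was instrumented at, so the version check would pass, but it also gets its profiler back even though g2 removed it globally.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class InterpState:
    monitoring_version: int = 0        # stands in for interp->monitoring_version
    monitors: frozenset = frozenset()  # stands in for interp->monitors

@dataclass
class Greenlet:
    name: str
    saved: Optional[Tuple[int, frozenset]] = None

def switch(interp, current, target):
    """The rejected approach: stash the global monitoring state per greenlet."""
    current.saved = (interp.monitoring_version, interp.monitors)
    if target.saved is not None:  # a greenlet that never ran sees the live state
        interp.monitoring_version, interp.monitors = target.saved

def set_profile(interp, installed):
    """Models sys.setprofile(profile) / sys.setprofile(None)."""
    interp.monitoring_version += 1
    interp.monitors = frozenset({"profile"}) if installed else frozenset()

interp = InterpState()
main, g1, g2 = Greenlet("main"), Greenlet("g1"), Greenlet("g2")
switch(interp, main, g1)    # g1.switch() from the main greenlet
set_profile(interp, True)   # g1: sys.setprofile(profile)
switch(interp, g1, g2)      # g2.switch()
set_profile(interp, False)  # g2: sys.setprofile(None)
switch(interp, g2, g1)      # g1.switch(): g1's saved state is restored
print(interp.monitoring_version, interp.monitors)  # 1 frozenset({'profile'})
```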

I then searched for ways to force the frames to have their instrumentation updated on switching, and found that sys.monitoring.set_local_events() appears to do this. Because a non-running greenlet exposes its frame in the gr_frame attribute, we can modify g2_run to call it; this prevents the assertion error:

```python
sys.monitoring.use_tool_id(2, 'greenlet')  # pick a tool id.
def g2_run():
    f = sys._getframe()
    print('In g2 with code  ', f.f_code, 'removing profiler')
    sys.setprofile(None)
    # Touch g1's suspended code object so it gets re-instrumented.
    # 1 == sys.monitoring.events.PY_START
    sys.monitoring.set_local_events(2, g1.gr_frame.f_code, 1)
    g1.switch()
```

That’s not production quality:

If the call stack is more than one level deep, I’d need to do that for each f_back.
I’m destroying whatever local events may have already been set, and it’s a no-op if you attempt to set events that match what is already set. Thanks to get_local_events this can probably be dealt with.
I dislike using up a tool ID. The PEP says that ids 6 and 7 are reserved for settrace and setprofile, so maybe I could use those in some cases. But in general, I think this needs to work for all monitoring so I don’t think I can count on those IDs.

That said, I think that can be made to work. But I’m hoping there’s a better way, both for performance and to eliminate some of the issues mentioned.

Because sys.monitoring.set_local_events is only callable from Python, and because greenlet switching is performance sensitive, I’d only want to do this if I detected that the profiling state has changed. Accessing tstate->interp->monitoring_version appears to do that, but I’d really really rather not be touching fields of PyInterpreterState — that’s only accessible if you include internal/pycore_interp.h with the correct defines set, and even then, on some platforms, that’s rather hard to do (greenlet compiles as C++, and that header isn’t meant to be included from C++).

I had some hope from a line of code I saw in instrumentation.c suggesting that maybe setting _co_instrumentation_version = UINT64_MAX on switch (all the way down the stack) might cause re-instrumentation automatically, but that appears not to work. (And it also requires using private fields I’d prefer not to have to touch.)

With all that said, I guess my questions come down to:

Is there a better way to force instrumentation updates for non-running code objects from C besides going through the Python set_local_events API?
If not, is there a way to know that I need to do that without accessing fields of the supposed-to-be-opaque PyInterpreterState object?
If I wind up needing set_local_events, is it guaranteed to have the effects I need?

Thank you for any suggestions you can provide; they are appreciated.

AlexWaygood · September 6, 2023, 5:57pm

cc. @markshannon

markshannon · September 8, 2023, 6:13pm

I don’t think there is a way to solve this for 3.12 without delving into the internals of CPython.

For 3.13, adding an API for greenlets (and any other code that wants to swap stacks) is the way to go.
Could you take a look at “Add "unstable" frame stack api” (python/cpython issue #91371 on GitHub)?