Python `3.11` frame structure and various changes

Hi everyone,

I am the maintainer of yappi (sumerc/yappi/), a tracing profiler written in C, and of Blackfire. I have been relying on some weird, undocumented structures and APIs and have somehow kept it working for 10 years, but this 3.11 version was really, REALLY hard to support. Any help is really appreciated.

The version I am using is 3.11.0b5. My current problems are as follows:

  1. I have been using the frame->f_state field to check the generator state and detect whether a coroutine is running on the frame. Yappi supports asyncio wall-time profiling, so I use these states to measure coroutine enter/yield/exit.
    See: yappi/_yappi.c at 58c876b52740aa7120daa7446543d2f0928b9623 · sumerc/yappi · GitHub.

    I was using the following code:

    return (frame->f_state == FRAME_SUSPENDED);
    

    Now I am thinking of using something like the following:

    _PyFrame_GetGenerator(frame)->gi_frame_state ???
    

    But not sure if this is the correct way to do it.

  2. I have been using:

    const char *firstarg = PyStr_AS_CSTRING(PyTuple_GET_ITEM(cobj->co_varnames, 0));
    

    But now, co_varnames is gone. I have read some suggestions on this:

    either: (this seems to be slow)

     PyObject *co_varnames = PyObject_GetAttrString((PyObject *)cobj, "co_varnames");
    

    or, as I think Guido suggested, to use the internal API below:

    _PyCode_GetVarnames(..)
    

    But when I compile the extension with the internal header included, I get symbol errors. How can I link against these internal APIs from an extension? Any help on this?

  3. This is the most problematic one for me, as it causes a SIGSEGV without any clue. I have been calling Python functions from C code to retrieve some metadata depending on the library used (asyncio, greenlet, threading, etc.).

An example usage:

I am calling these callbacks from the C side as follows:

PyObject_CallFunctionObjArgs(cbk, args, NULL)

This kind of call now segfaults and I don’t have any clue why. Here is a backtrace:

(gdb) bt
#0  _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3873
#1  0x00005555557de1ea in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff7fed808, tstate=0x555555d1d1c8 <_PyRuntime+166312>) at ./Include/internal/pycore_ceval.h:73
#2  _PyEval_Vector (tstate=0x555555d1d1c8 <_PyRuntime+166312>, func=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=<optimized out>)
    at Python/ceval.c:6424
#3  0x00005555556b8184 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=0, args=0x7fffffffc230, callable=0x7ffff61649a0, tstate=0x555555d1d1c8 <_PyRuntime+166312>)
    at ./Include/internal/pycore_call.h:92
#4  object_vacall (tstate=0x555555d1d1c8 <_PyRuntime+166312>, base=base@entry=0x0, callable=0x7ffff61649a0, vargs=vargs@entry=0x7fffffffc290) at Objects/call.c:819
#5  0x00005555556bcf38 in PyObject_CallFunctionObjArgs (callable=<optimized out>) at Objects/call.c:925
#6  0x00007ffff6432fab in _call_funcobjargs (func=<optimized out>, args=args@entry=0x0) at yappi/_yappi.c:342
#7  0x00007ffff64342d3 in _current_context_name () at yappi/_yappi.c:358
#8  _yapp_callback (self=<optimized out>, frame=0x7ffff61c4e10, what=<optimized out>, arg=0x0) at yappi/_yappi.c:1212

Any help is really appreciated!

Thanks in advance,

I think you’ll need the help of @pablogsal or @vstinner with this.

Hello @pablogsal, @vstinner. I really need some help/suggestions on some of these issues if you have some time :slight_smile: Please ping me if anything is unclear about my question and I will try to clarify.

The really blocking issue for me right now is that I somehow cannot call a Python function from the C extension during profiling. In other words: a function foo is called, this triggers a call_enter event on the tracing profiler, and then inside the C extension I try to call a Python function. Maybe you have an idea why this happens on 3.11; all other Python versions, including 2.7, work just fine.

Unfortunately, I am currently very busy preparing the 3.11.0rc1 release so I am afraid I won’t be able to look at this in detail :S

The really blocking issue for me right now is that: I somehow cannot call a Python function from the C extension during profiling. In other words: a function foo is called, this triggers a call_enter event on the tracing profiler and then inside the C extension, I try to call a Python function.

What do you mean that you cannot call a function? What error does this trigger? Could you provide a reproducer with no external dependencies for this?

What do you mean that you cannot call a function? What error does this trigger? Could you provide a reproducer with no external dependencies for this?

It currently segfaults on every platform I tried (Mac+Linux). The Python version I was using is 3.11.0b5. The extension simply calls a Python function via PyObject_CallFunctionObjArgs in the context of a call_enter profiling event.

Below is the traceback I am getting. If it does not ring any bells, I could try to share a branch that reproduces the error easily. When you have time, of course.

(gdb) bt
#0  _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3873
#1  0x00005555557de1ea in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff7fed808, tstate=0x555555d1d1c8 <_PyRuntime+166312>) at ./Include/internal/pycore_ceval.h:73
#2  _PyEval_Vector (tstate=0x555555d1d1c8 <_PyRuntime+166312>, func=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=<optimized out>)
    at Python/ceval.c:6424
#3  0x00005555556b8184 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=0, args=0x7fffffffc230, callable=0x7ffff61649a0, tstate=0x555555d1d1c8 <_PyRuntime+166312>)
    at ./Include/internal/pycore_call.h:92
#4  object_vacall (tstate=0x555555d1d1c8 <_PyRuntime+166312>, base=base@entry=0x0, callable=0x7ffff61649a0, vargs=vargs@entry=0x7fffffffc290) at Objects/call.c:819
#5  0x00005555556bcf38 in PyObject_CallFunctionObjArgs (callable=<optimized out>) at Objects/call.c:925
#6  0x00007ffff6432fab in _call_funcobjargs (func=<optimized out>, args=args@entry=0x0) at yappi/_yappi.c:342
#7  0x00007ffff64342d3 in _current_context_name () at yappi/_yappi.c:358
#8  _yapp_callback (self=<optimized out>, frame=0x7ffff61c4e10, what=<optimized out>, arg=0x0) at yappi/_yappi.c:1212

I quickly tried to write a reproducer but I cannot reproduce the crash. I used this extension:

#define PY_SSIZE_T_CLEAN
#include <Python.h>

#include <assert.h>

static PyObject* the_callback = NULL;

int
PyTraceFunction(
        PyObject* obj,
        PyFrameObject* frame,
        int what,
        PyObject* arg)
{

    switch (what) {
        case PyTrace_CALL: {
            printf("Enter call\n");
            break;
        }
        case PyTrace_RETURN: {
            printf("Return call\n");
            break;
        }
        default:
            printf("Other event: %d\n", what);
            break;
    }


    if (the_callback) {
        PyObject* res = PyObject_CallNoArgs(the_callback);
        if (res == NULL) {
            return -1;  /* propagate an exception raised by the callback */
        }
        Py_DECREF(res);
    }

    return 0;
}

PyObject*
install_func(PyObject* self, PyObject* args)
{
    PyEval_SetProfile(PyTraceFunction, NULL);
    Py_RETURN_NONE;
}


PyObject*
set_callback(PyObject* self, PyObject* arg)
{
    Py_INCREF(arg);
    the_callback = arg;
    Py_RETURN_NONE;
}



static PyMethodDef methods[] = {
        {"install_func", install_func, METH_NOARGS, "install_func"},
        {"set_callback", set_callback, METH_O, "install_callback"},
        {NULL, NULL, 0, NULL},
};

static struct PyModuleDef moduledef = {PyModuleDef_HEAD_INIT, "native_ext", "", -1, methods};

PyMODINIT_FUNC
PyInit_native_ext(void)
{
    return PyModule_Create(&moduledef);
}

and this driver:

import native_ext

def callback():
    print("Callback done!")

native_ext.set_callback(callback)

def foo():
    bar()

def bar():
    baz()

def baz():
    ...

native_ext.install_func()

foo()

This correctly prints:

Enter call
Callback done!
Enter call
Callback done!
Enter call
Callback done!
Return call
Callback done!
Return call
Callback done!
Return call
Callback done!
Return call
Callback done!

I’m using 3.11.0b5. At this point, it would help a lot if you can give us a reproducer with no external dependencies (no branches or stuff that requires 3rd party libraries other than the standard library).

Hmm… This is already cool! I will use this to identify what is happening differently for me. I will update…

Thanks.

Nice, but please do this ASAP because the RC release of Python 3.11 is on Friday and we need some time to debug or fix this if it turns out to be a bug.

Ok. Hopefully tomorrow I will be working on this!

Hi @pablogsal, I have looked into the issue. First things first: there seems to be no issue that might block the release of Python whatsoever.

The issue for me was the following: I was manually modifying the ThreadState->c_profilefunc and use_tracing fields to enable the profiler. See this. Once I switched to PyEval_SetProfile(…), the segfaults went away, so I believe there is something more to be done to enable the profiler. The reason I cannot use PyEval_SetProfile() is that yappi profiles all threads, so I need to be able to set the profile function for every thread, as follows (pseudocode):

for (is = PyInterpreterState_Head(); is != NULL; is = PyInterpreterState_Next(is)) {
    for (ts = PyInterpreterState_ThreadHead(is); ts != NULL; ts = PyThreadState_Next(ts)) {
        PyEval_SetProfile(ts);  /* pseudocode: the per-thread call I would like to have */
    }
}

Any suggestions for this kind of usage? So far, I have been mimicking the implementation of PyEval_SetProfile, but maybe there is another approach I can use?

Any suggestions for this kind of usage? So far, I have been mimicking the implementation of PyEval_SetProfile, but maybe there is another approach I can use?

I’m adding a supported API to do exactly that in 3.12:

In 3.11 you can do something like the following (it uses some private APIs, so you are on your own if this breaks):


 void
 PyEval_SetProfileAllThreads(Py_tracefunc func, PyObject *arg)
 {
     PyThreadState *this_tstate = PyThreadState_Get();
     PyInterpreterState* interp = this_tstate->interp;
     PyThreadState* ts = PyInterpreterState_ThreadHead(interp);
     while (ts) {
         if (_PyEval_SetProfile(ts, func, arg) < 0) {
             _PyErr_WriteUnraisableMsg("in PyEval_SetProfileAllThreads", NULL);
         }
         ts = PyThreadState_Next(ts);
     }
 }

Cooool :slight_smile: Thanks for this! I have been doing this in an undocumented way for years…

@sumerc Could you open an issue, maybe at Issues · sumerc/yappi · GitHub (or point me to an existing issue)?
Should we be able to show that it is a CPython issue, then we can open a CPython issue.

In your code example

_PyFrame_GetGenerator(frame)->gi_frame_state

What is the type of frame?

Should we be able to show that it is a CPython issue, then we can open a CPython issue.

It seems not to be a Python issue. There was one missing line of code necessary to enable profiling of a ThreadState. See this fix here

In short:

I was using the code below to enable the profiler per thread state:

#if PY_VERSION_HEX < 0x030a00b1
    ts->use_tracing = 1;
#else
    ts->cframe->use_tracing = 1;
#endif
    ts->c_profilefunc = _yapp_callback;

and added the following line at the end:

    ts->cframe->use_tracing = 255;

And suddenly all my segfaults went away. This is the part I was missing from the implementation of PyEval_SetProfile(...) here.

In any case, if you like, you can reproduce the same error in Python `3.11` fixes by sumerc · Pull Request #107 · sumerc/yappi · GitHub by just removing the line ts->cframe->use_tracing = 255; and running the tests via python run_tests.py. That triggers an interesting segfault.

_PyFrame_GetGenerator(frame)->gi_frame_state

This is the frame I get in the profiler callback that I set via PyEval_SetProfile(...) as the Py_tracefunc. So, it is a PyFrameObject*.

I’m glad you’ve fixed it.

While this is all still fresh in your memory, would you mind briefly writing up where you need to delve into CPython internals? It would be really valuable. We are trying to flesh out the API for the sort of introspection you are doing.

By “delving into the internals”, I mean anywhere you need to access the fields of a struct, basically any foo->bar, rather than PyFoo_getBar(foo)

_PyFrame_GetGenerator() takes a _PyInterpreterFrame *, are you not getting a compilation error passing a PyFrameObject*?

_PyFrame_GetGenerator() takes a _PyInterpreterFrame *, are you not getting a compilation error passing a PyFrameObject*?

TBH, I have not tried using it yet. I just saw similar code in ceval.c and asked whether this is the right way to do it. See yappi/yappi/_yappi.c at 0fd2b5f9b2885f7fd518e6a42c8ee4ab461c5f96 · sumerc/yappi · GitHub. If you have any idea how to do it, I would love to hear :slight_smile:

While this is all still fresh in your memory, would you mind briefly writing up where you need to delve into CPython internals? It would be really valuable. We are trying to flesh out the API for the sort of introspection you are doing.

Sure, I would love to do that. I am writing down what I need and WHY I need it, because maybe someone can suggest a better way to do it.

1/
PyFrameObject->f_state: I use this to differentiate between an await and an exit. I need this distinction to measure the wall-time of a specific coroutine. E.g., I do not increment the function call count when the function yields for an await. And currently, I am not sure how to do it.

2/
I use frame->f_code->co_flags & CO_COROUTINE || frame->f_code->co_flags & CO_ITERABLE_COROUTINE || frame->f_code->co_flags & CO_ASYNC_GENERATOR to determine whether the running function is async or not.

3/
frame->f_code->co_varnames: I need this for the same reason as in the thread opened here: Get the class name if the function is a method. Using __qualname__ was not always possible before (2.7 times…), so I did not change the implementation. There was also another issue with using __qualname__ (maybe performance), but I honestly do not remember…

4/
frame->f_code->co_flags & CO_VARARGS and max_arg_count = frame->f_code->co_argcount; I use these to capture arguments of the profiled function (by name or by positional index). We use that in the Blackfire profiler EXTENSIVELY. We have a metric system where we collect data whenever a specific function call happens:

e.g., collect the argument query whenever the function name equals something:

db.execute(query='select * ...').

If the function is a built-in (C function) we read it from frame->f_valuestack, and if the function is a normal Python function we use: arg_name = PyTuple_GetItem(frame->f_code->co_varnames, int_arg_id-1);

5/
(_pyctx_t *)PyThreadState_GET()->context; I need this for async profiling. Internally, every context is held in a separate slot of a hash table where I aggregate measurements. From the profiler’s perspective every context has a separate call stack, so they need to be measured separately.

These are the things I remember for now.
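
For anyone who wants to experiment with points 2/ and 4/ without building an extension, the same fields are visible from pure Python. This is only a sketch using sys.setprofile, not how yappi or Blackfire do it in C; `execute`, `fetch`, and the `query` argument are made-up stand-ins for the db.execute example above.

```python
import inspect
import sys

# Flags that mark coroutine-like code objects (point 2/).
ASYNC_FLAGS = (inspect.CO_COROUTINE
               | inspect.CO_ITERABLE_COROUTINE
               | inspect.CO_ASYNC_GENERATOR)


def is_async_code(code):
    """Return True if the code object belongs to a coroutine-like function."""
    return bool(code.co_flags & ASYNC_FLAGS)


captured = []


def profiler(frame, event, arg):
    # Hypothetical metric (point 4/): record `query` on every call to `execute`.
    code = frame.f_code
    if event == "call" and code.co_name == "execute":
        nargs = code.co_argcount + code.co_kwonlyargcount
        arg_names = code.co_varnames[:nargs]  # argument names come first
        if "query" in arg_names:
            captured.append(frame.f_locals["query"])


def execute(query, timeout=None):  # stand-in for db.execute
    return None


async def fetch():  # only defined so that its flags can be inspected
    return None


sys.setprofile(profiler)
try:
    execute(query="select 1")
finally:
    sys.setprofile(None)
```

The slice of co_varnames is what maps a positional index to an argument name; a C implementation would have to obtain the same tuple some other way once the struct field is no longer directly accessible.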

One question, if you don’t mind: are you going to implement a new API to access these fields only, or are you also going to restrict access to them?

Thanks for the feedback.

I think we already have APIs for what you need, although some of them are very new.

Are you going to implement a new API to access these fields only or are you also going to restrict access to them as well?

It is very hard to restrict access to a field in C, but we will try to discourage it if possible.
The main thing is to make sure that there is a way to perform the operation using the API first.

I think we already have APIs for what you need, although some of them are very new.

Cool! Thanks.

It is very hard to restrict access to a field in C, but we will try to discourage it if possible.
The main thing is to make sure that there is a way to perform the operation using the API first.

Relieved to see there will always be a way to do these things. I asked because I have some small(ish) experience writing this argument-capturing feature for Ruby, which I mentioned in point 4 above. Ruby was very conservative about which internal data is accessible, and I remember we really struggled to introspect rb_control_frame_t (IIRC), which is similar to PyFrameObject. We found a slower way to do it in the end, but the interesting part is that people simply write projects that ship the internal type definitions themselves, even when the language hides them. Example: GitHub - mark-moseley/ruby_core_source: Retrieve ruby core source files. So, like you said, it IS hard to hide these things from the user, and if we end up implementing these kinds of tools anyway, why hide them in the first place? I always feel and say this everywhere: the playability, readability and documentation of the C API gave Python this much adoption, especially in the data world.

Again: I am not an expert on these kinds of implementations and the ideas above might be subjective and based on limited data :slight_smile: I just wanted to share a small anecdote/opinion while I have your attention.

Hi,

The frame.f_state member has been removed: it was replaced by an internal frame.f_frame.f_state member. If you consider this an important use case, please request a public API to access it. Until a new API is added (to Python 3.12), you can use the internal C API to access frame.f_frame.f_state.
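
As a Python-level illustration only (it does not touch the internal C structs named above), the inspect module reports the same suspended-versus-finished distinction that f_state was used for. `Suspend` and `ticker` are hypothetical names for this sketch:

```python
import inspect


class Suspend:
    """Hypothetical awaitable that yields once, forcing a suspension."""

    def __await__(self):
        yield


async def ticker():
    await Suspend()  # the coroutine is suspended here
    return "done"


coro = ticker()
states = [inspect.getcoroutinestate(coro)]      # before the first send
coro.send(None)                                 # runs up to the await
states.append(inspect.getcoroutinestate(coro))  # parked at the await
try:
    coro.send(None)                             # resumes and returns
except StopIteration as stop:
    result = stop.value
states.append(inspect.getcoroutinestate(coro))  # finished
```

A profiler that sees a return event while the coroutine is in the suspended state can treat it as a yield for an await rather than as a real exit, which is the distinction described earlier in the thread.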

The code.co_varnames member has been removed: PyCode_GetVarnames() was just added to Python 3.11 before Python 3.11rc1 was tagged. I added it to pythoncapi-compat for Python 3.10 and older: pythoncapi_compat.h API — pythoncapi_compat documentation
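
A rough Python-level sketch of what co_varnames is needed for in this thread, namely reading the first argument name to guess the class of a method. It is a heuristic that assumes the usual self/cls naming convention; `owner_class_name`, `Connection`, and `plain` are names invented for the example.

```python
import sys


def owner_class_name(frame):
    """Guess the class name when the frame looks like a method call.

    Heuristic only: relies on the first argument being named self or cls.
    """
    code = frame.f_code
    if code.co_argcount == 0:
        return None
    first = code.co_varnames[0]  # name of the first positional argument
    if first == "self":
        return type(frame.f_locals[first]).__name__
    if first == "cls":
        return frame.f_locals[first].__name__
    return None


class Connection:  # hypothetical class for the demo
    def execute(self):
        return owner_class_name(sys._getframe())


def plain(x):
    return owner_class_name(sys._getframe())
```

Calling Connection().execute() yields the class name, while plain(1) yields None because its first argument is not named self or cls.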

To set a profile function in all threads: this feature is being implemented in GH-93503: Add thread-specific APIs to set profiling and tracing functions in the C-API by pablogsal · Pull Request #93504 · python/cpython · GitHub, which adds PyEval_SetProfileAllThreads().
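
At the Python level, the same change should surface as threading.setprofile_all_threads() on Python 3.12 and newer. Below is a hedged sketch that falls back to threading.setprofile() plus sys.setprofile() on older versions, where threads that are already running are not covered; `install_profiler`, `work`, and `worker` are hypothetical names.

```python
import sys
import threading


def install_profiler(func):
    """Install func everywhere possible; return True if running threads are covered."""
    install_all = getattr(threading, "setprofile_all_threads", None)
    if install_all is not None:  # Python 3.12 and newer
        install_all(func)
        return True
    threading.setprofile(func)   # threads started from now on
    sys.setprofile(func)         # the calling thread
    return False


seen = []


def profiler(frame, event, arg):
    # Record which thread entered work().
    if event == "call" and frame.f_code.co_name == "work":
        seen.append(threading.current_thread().name)


def work():
    return None


started = threading.Event()
release = threading.Event()


def worker():
    started.set()
    release.wait()  # this thread is already running when the profiler is installed
    work()


t = threading.Thread(target=worker, name="worker")
t.start()
started.wait()
covers_running = install_profiler(profiler)
release.set()
t.join()
work()                  # call from the installing thread
install_profiler(None)  # uninstall again
```

On the fallback path only the installing thread shows up in `seen`, which is the limitation the all-threads API is meant to remove.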

ts->cframe->use_tracing = 1; please don’t modify this internal field directly (it changed multiple times!), but use the _PyEval_SetProfile() function, or even the internal _PyThreadState_UpdateTracingState() static inline function. For now, _PyEval_SetProfile() is private: bpo-35370: Add _PyEval_SetTrace() function (GH-18975) · python/cpython@309d7cc · GitHub. I proposed bpo-39947: Add PyThreadState_SetTrace() function by vstinner · Pull Request #29121 · python/cpython · GitHub to make it public, but I wasn’t sure whether the proposed API would fit profiler/debugger needs, so I closed my PR. The issue was discussed at: The DISPATCH() macro is not as efficient as it could be (move PyThreadState.use_tracing) · Issue #87926 · python/cpython · GitHub. If you think the function must be public, please open an issue to request it. But the previously discussed PyEval_SetProfileAllThreads() may fit your needs.

Good luck with porting yappi to Python 3.11 :slight_smile: Hopefully, the Python 3.11 API and internals should no longer change after the Python 3.11.0 final release, which is planned soon :wink:

Hi Victor,

Thanks for the precise explanation!

The only question I have is: I still could not find frame.f_frame.f_state as you suggested. I saw the following in the 3.11.a07 changelog:

Remove the f_state field from the _PyInterpreterFrame struct. Add the owner field to the _PyInterpreterFrame struct to make ownership explicit to simplify clearing and deallocing frames and generators.