Pure C structured traceback

Hi. I’ve been working on various open-source Python observability projects at Datadog, and one of the challenges I’ve encountered is getting a traceback in a minimally intrusive way. In particular, I’d love to be able to acquire structured stack traces in a signal-safe manner that involves no memory allocations or GIL releases. This would be useful for any tooling that runs inside a signal handler (e.g. a crashtracker) or for observability wrappers written in C/C++ that want to get backtraces without reentering the interpreter (e.g. a pure C++ implementation of a memory allocation profiler).

Python already has such a function, but it writes human-readable text to a file descriptor rather than filling a structured array: https://github.com/python/cpython/blob/main/Include/internal/pycore_traceback.h

extern void _Py_DumpTraceback(
    int fd,
    PyThreadState *tstate);

Would there be any objection to my adding a structured variant of this function that writes to a user-supplied array of frame-info structs?

/* Traceback frame info for signal-safe collection.
   This function is signal safe. */
#define Py_TRACEBACK_FRAME_FILENAME_MAX 256
#define Py_TRACEBACK_FRAME_NAME_MAX 256

typedef struct {
    char filename[Py_TRACEBACK_FRAME_FILENAME_MAX];
    int lineno;
    char name[Py_TRACEBACK_FRAME_NAME_MAX];
} PyTracebackFrameInfo;

/* Collect the traceback of a Python thread into an array of structs.
   Caller provides the array and max_frames. Returns the number of frames
   filled, or -1 if tstate is invalid/freed.

   This function is signal safe. No memory allocations or GIL releases.

   Export for _testinternalcapi. */
PyAPI_FUNC(int) _Py_GetTracebackFrames(
    PyThreadState *tstate,
    PyTracebackFrameInfo *frames,
    int max_frames);
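
From the caller's side, the proposed API might be used roughly as follows. This is only a sketch under assumptions: `fake_get_frames` is a hypothetical stand-in for `_Py_GetTracebackFrames` (it takes no `tstate` and returns a canned stack so the snippet compiles without CPython), and `FrameInfo`, `MAX_FRAMES`, `g_frames`, and `capture_into_static_buffer` are illustrative names, not part of any existing API. The point it shows is that storage is preallocated, so the capture path itself never allocates.

```c
#include <stddef.h>

#define FRAME_FILENAME_MAX 256
#define FRAME_NAME_MAX 256
#define MAX_FRAMES 64

/* Mirrors the layout of the proposed PyTracebackFrameInfo. */
typedef struct {
    char filename[FRAME_FILENAME_MAX];
    int lineno;
    char name[FRAME_NAME_MAX];
} FrameInfo;

/* Bounded copy that always NUL-terminates and makes no libc calls. */
static void copy_truncated(char *dst, size_t cap, const char *src)
{
    size_t i = 0;
    if (cap == 0) {
        return;
    }
    while (i + 1 < cap && src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
}

/* HYPOTHETICAL stand-in for the proposed _Py_GetTracebackFrames().
   It fills the caller's array from a canned stack and returns the number
   of frames written, or -1 on invalid input. */
int fake_get_frames(FrameInfo *frames, int max_frames)
{
    static const struct { const char *file; int line; const char *func; } stack[] = {
        {"app.py", 42, "handler"},
        {"server.py", 17, "dispatch"},
        {"main.py", 5, "<module>"},
    };
    const int total = (int)(sizeof stack / sizeof stack[0]);

    if (frames == NULL || max_frames < 0) {
        return -1;
    }
    int n = total < max_frames ? total : max_frames;
    for (int i = 0; i < n; i++) {
        copy_truncated(frames[i].filename, sizeof frames[i].filename, stack[i].file);
        frames[i].lineno = stack[i].line;
        copy_truncated(frames[i].name, sizeof frames[i].name, stack[i].func);
    }
    return n;
}

/* Static storage: 64 frames of 516 bytes is roughly 32 KiB, which is
   more than one would want to place on a small alternate signal stack. */
FrameInfo g_frames[MAX_FRAMES];

/* Capture into the preallocated buffer; no allocation on this path. */
int capture_into_static_buffer(void)
{
    return fake_get_frames(g_frames, MAX_FRAMES);
}
```

A single static buffer like this is not safe to share between concurrent handlers; a real caller would need one buffer per thread or some other ownership rule.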


FWIW, it’s not really signal-safe or thread-safe. If you call it from a signal handler or without the GIL / an attached thread state, it may crash, although most of the time it won’t.

Interesting. Should the header comment be updated? https://github.com/python/cpython/blob/main/Include/internal/pycore_traceback.h

There’s a proposal to export them publicly, which removes the internal comment and makes it clearer that it’s meant for cases where you’re already crashing:

Given that it reads interpreter data structures that may be partially modified, the function might produce incomplete output or it may even crash itself.

Along those lines, I think @alexmalyshev is interested in both types of use cases (crashes and observability).

The current implementation is probably “good enough” for crash handling.

For observability we can either:

  1. Try to harden _Py_DumpTraceback (and/or expose a more structured API).
  2. Encourage out-of-process capture along the lines of 3.15’s _remote_debugging implementation.

For (1), I think it’s going to be difficult to make it truly async-signal-safe. It’s not enough to limit C library calls to async-signal-safe functions. All the data we touch in PyThreadState has to be “lock-free atomic objects” or volatile sig_atomic_t, or meet other similar requirements. You also need a way to get the PyThreadState, which usually means PyGILState_GetThisThreadState. That may or may not be async-signal-safe depending on how Python is compiled.
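
For contrast, here is a minimal sketch of what a handler that stays within those rules looks like: it touches only a `volatile sig_atomic_t` flag and calls `write`, which POSIX lists as async-signal-safe. The names `install_handler` and `consume_dump_request` are made up for illustration, and the sketch deliberately does no stack walking, which is the part that is hard to make safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* The only shared state the handler touches. */
static volatile sig_atomic_t g_dump_requested = 0;

static void on_signal(int signo)
{
    static const char msg[] = "traceback dump requested\n";
    int saved_errno = errno;   /* write() may clobber errno */

    (void)signo;
    g_dump_requested = 1;
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)ignored;

    errno = saved_errno;
}

/* Install the handler for signo; returns 0 on success, -1 on error. */
int install_handler(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Called from normal (non-handler) code: returns 1 if a dump was
   requested since the last call, and clears the request. */
int consume_dump_request(void)
{
    if (g_dump_requested) {
        g_dump_requested = 0;
        return 1;
    }
    return 0;
}
```

The heavy work (walking frames, formatting) is deferred to normal code that polls `consume_dump_request`. That deferral is exactly what a crash handler cannot do, which is why the in-handler traceback remains best-effort.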

Thanks for the discussion - this is great.

  1. For crash-handling, we recognize that any operations are best-effort, as the crash may have corrupted internal state used by libc etc. Our crash-handler forks the process and does as much work as possible in a clean sidecar to minimize this, and in practice we’ve not seen double-crashes due to the use of _Py_DumpTraceback. A `_Py_DumpStructuredTraceback` would be even more useful to us (and hopefully others), since it’s always nicer/safer to start with structured input than to try to parse human-readable debug messages.

  2. The other use case is observability code (e.g. profilers) that wishes to get a stack trace without reentering the Python interpreter or releasing the GIL. For example, a memory profiler that attempts to get a backtrace risks having the Python interpreter allocate new memory while collecting that backtrace, which means we have to be careful about reentrancy (similar issues apply to a lock profiler). Having a pure C backtrace function that is documented not to allocate or operate on locks would make it easier to write correct and safe profiler implementations.
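
The reentrancy concern in (2) is usually handled today with a per-thread guard, sketched below. Everything here is hypothetical: `profiled_malloc` stands in for an allocator hook, and `record_sample` stands in for a backtrace collector that may itself allocate. With a traceback function documented as non-allocating, the guard would become a safety net for the stack-collection step rather than a hard requirement.

```c
#include <stddef.h>
#include <stdlib.h>

/* Per-thread state: set while the profiler itself is running. */
static _Thread_local int t_in_profiler = 0;
static _Thread_local unsigned long t_samples = 0;

void *profiled_malloc(size_t size);

/* Hypothetical collector. It allocates scratch memory through the same
   hook, which is the situation that causes unbounded recursion when
   there is no guard. */
static void record_sample(size_t size)
{
    (void)size;
    t_samples++;
    void *scratch = profiled_malloc(64);   /* nested, must not be sampled */
    free(scratch);
}

/* Hypothetical allocator hook with a reentrancy guard. */
void *profiled_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL && !t_in_profiler) {
        t_in_profiler = 1;      /* nested allocations skip sampling */
        record_sample(size);
        t_in_profiler = 0;
    }
    return p;
}

/* Number of allocations sampled on the calling thread. */
unsigned long samples_recorded(void)
{
    return t_samples;
}
```

Two outer allocations produce two samples; the scratch allocation inside `record_sample` is not counted because the guard is set while it runs.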

Assuming this is something that would be of interest to others in the community, how can I be most helpful in making it happen? I see @alexmalyshev is already working on related areas: I’m happy to assist if there is a way in which I would be useful.

Heya, thanks for opening the topic. From my end, I had stumbled upon a crash-handler / signal-handler in our codebase (currently 3.12) that was depending on _Py_DumpTraceback (which became internal in 3.13). Our use case is focused on catching crashes, dumping useful Python runtime info like the stack trace to a file, and then coming back to inspect it later. That is why making the existing API PyUnstable works out well for us. I get that it’s not fully async-signal-safe; that’s okay, since our crash-handler is best-effort.

I believe our crash-handler’s output has generally been looked at by humans directly without any other processing. This is probably going to be fed to LLMs more and more going forward, so as long as we get useful info into a file we’re happy. I think having the structured output would be nice for observability purposes, but I don’t have enough experience here to speak on the tradeoffs between the different approaches (_Py_DumpTraceback, out-of-process debugging, etc.).

Looking at the dev guide, it seems the next step would be to open an issue and a PR implementing a structured traceback API. Any objections to my doing so? This feels like a small enough change not to need a PEP, but I’m happy to create one if it’s warranted.