Add a High-Level Frame Hook API to CPython

Proposal: Add a High-Level Frame Hook API to CPython

Hi everyone,

I would like to start a discussion on a proposed higher-level alternative to the PEP 523 API for intercepting and modifying Python frames before they are evaluated.

This proposal was developed in collaboration with William Wen, PyTorch core dev.

Disclaimer: Due to posting limits on new accounts, I’ve collected all links in the following gist:

Motivation

This proposal is motivated by the needs of PyTorch’s symbolic interpreter TorchDynamo (Dynamo). Dynamo is a bytecode-to-bytecode transpiler that does not execute frames itself: it inspects a frame’s bytecode, performs optimizations, and generates new bytecode that the interpreter then executes.

Dynamo currently relies on the eval frame API (PEP 523) to intercept and capture frame execution. Although PEP 523 is sufficient for implementing Dynamo, it exposes many low-level CPython implementation details that we would rather not depend on so heavily. Supporting new Python versions has historically required substantial engineering effort, and changes to the internal frame evaluation mechanism have been a significant source of difficulty.

We documented the steps needed to upgrade Dynamo for each new Python version from 3.11 to 3.13 here:

  • Supporting Dynamo in Python 3.11 - NULL
  • Supporting Dynamo in Python 3.11 - CPython Frame Evaluation
  • Supporting Dynamo in Python 3.12
  • Torch.compile support for Python 3.13 completed

Some PEP 523-related issues we encountered included:

  • Formerly public APIs that are critical for our use case becoming private or being removed, for example frame creation/deletion and conversion of the fastlocals buffer to a dict
  • The eval frame hook being made responsible for freeing the frame argument

Part of our workarounds involved copy-pasting segments of CPython’s source code into Dynamo, which is not ideal.

To better address these issues, we propose adding a new high-level API to CPython that allows users to register frame hooks without needing to manage the low-level details of frame creation, management, and cleanup. This API would abstract away the complexities of CPython’s internal frame evaluation and provide a more stable interface for tools that need to intercept and customize frame execution.

Relationship to PEP 523

PEP 523 allows replacing the interpreter’s frame evaluation function. The replacement function then takes on full responsibility for the frame lifecycle. In practice, this requires detailed knowledge of CPython internals and access to internal APIs that are not exposed publicly.

The proposed API, on the other hand, aims to provide a structured way to register hooks that can modify the frame before it is executed. In this new model, a tool like torch.compile would only need to implement the hook function using public APIs, and the runtime would take care of calling the hooks and managing the frame lifecycle.

Proposal Overview

Registering a Frame Hook

This proposal introduces a new API that allows extensions to register a frame hook in a similar way to PEP 523. The frame hook function receives the frame about to be executed and may return a replacement code object:

typedef PyCodeObject* (*_PyFrameHookFunction)(struct _PyInterpreterFrame *);
PyUnstable_AddFrameHook(PyInterpreterState *interp, _PyFrameHookFunction hook_fn);
PyUnstable_RemoveFrameHook(PyInterpreterState *interp, _PyFrameHookFunction hook_fn);

This API allows users to register hooks that can modify a frame before it is executed without managing the frame lifecycle directly. The hooks would be called in the order they were registered, and each hook would have the opportunity to modify the frame before it is executed. This allows different tools to compose their behavior by registering independent hooks.
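To make the ordering and composition semantics concrete, here is a minimal pure-Python model of a hook chain. It is only an illustration under simplifying assumptions: the real proposal is a C API whose hooks receive a _PyInterpreterFrame, whereas this model passes code objects, and the FrameHookRegistry class is hypothetical.

```python
from __future__ import annotations

import types
from typing import Callable, Optional

# Hypothetical model: a hook takes a code object and returns a replacement
# code object, or None to leave it unchanged (the real hooks take a frame).
FrameHook = Callable[[types.CodeType], Optional[types.CodeType]]


class FrameHookRegistry:
    """Toy stand-in for PyUnstable_AddFrameHook / PyUnstable_RemoveFrameHook."""

    def __init__(self) -> None:
        self._hooks: list[FrameHook] = []

    def add(self, hook: FrameHook) -> None:
        self._hooks.append(hook)

    def remove(self, hook: FrameHook) -> None:
        self._hooks.remove(hook)

    def run(self, code: types.CodeType) -> types.CodeType:
        """Apply hooks in registration order; each sees the latest code."""
        for hook in self._hooks:
            new_code = hook(code)
            if new_code is not None:
                code = new_code
        return code
```

Because each hook sees the output of the previous one, two independently registered tools compose, and a hook that returns None is transparent to the rest of the chain.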

Example Usage

A simplified example of the frame hook function is shown below:

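import types

def frame_hook(frame: types.FrameType) -> types.CodeType | None:
    # should_optimize, has_torch_tensor_in_frame and dynamo.convert_frame
    # are placeholders for Dynamo's own logic
    if not should_optimize(frame):
        return None

    if has_torch_tensor_in_frame(frame):
        # Replace the frame's code with optimized bytecode
        optimized_code: types.CodeType = dynamo.convert_frame(frame)
        return optimized_code

    # Returning None leaves the frame's code unchanged
    return None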
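For experimenting without PyTorch, the following toy hook follows the same contract: it receives a code object (standing in for the frame's code) and either returns None or a replacement built with CodeType.replace(). The hook itself is hypothetical; it only rewrites co_consts, so the local, cell and free variable slots stay identical, matching the compatible-layout assumption described below.

```python
from __future__ import annotations

import types


def double_int_constants(code: types.CodeType) -> types.CodeType | None:
    """Toy hook: return a copy of `code` with every int constant doubled,
    or None if there is nothing to change."""
    new_consts = tuple(
        c * 2 if type(c) is int else c  # the type() check skips bools
        for c in code.co_consts
    )
    if new_consts == code.co_consts:
        return None  # leave the frame's code unchanged
    # Only co_consts changes, so the localsplus layout is untouched
    return code.replace(co_consts=new_consts)
```

Wrapping the result with types.FunctionType(new_code, func.__globals__) runs the transformed bytecode, which is roughly what the runtime would do with a hook's return value.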

Interpreter Integration (_PyEval_*)

Internally, the interpreter checks for registered frame hooks before evaluating a frame. A conceptual sketch of the proposed internal entry point (_PyEval_FrameHook) might look as follows:

def _PyEval_FrameHook(tstate, frame, throw_flag):
    for hook in tstate.frame_hooks:
        # NOTE: this disabled field may or may not be needed by torch.compile
        if hook.disabled:
            continue
        # Hook is a function that takes a frame and returns a new code object or None
        code = hook(frame)
        old_code = frame.f_code
        if code is not None and code != old_code:
            free_frame(frame)
            frame = new_frame_with_code(code)
        # else, hook did not modify the frame
    # 3.12+: the eval_frame function is responsible for freeing frame
    if tstate.interp.eval_frame is None:
        return _PyEval_EvalFrameDefault(tstate, frame, throw_flag)
    return tstate.interp.eval_frame(tstate, frame, throw_flag)

This code is written in Python for the sake of simplicity, but the actual implementation is in C.

The hook interface described here is intentionally minimal. A hook receives a frame about to be executed and may return a replacement code object. For the purposes of this initial proposal, we assume that the original and new code objects have compatible localsplus layouts. In other words, the number and ordering of local, cell, and free variable slots must match between the two code objects. This simplifies the implementation and avoids the need to remap frame locals. With future discussions, we can extend the proposal to relax this restriction.
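A tool could check this restriction before returning a replacement. The sketch below approximates "compatible localsplus layout" using public code object attributes; CPython's internal localsplus array is not exposed directly, so this comparison is an assumption that serves as a conservative proxy rather than the check the prototype performs.

```python
from __future__ import annotations

import types


def has_compatible_localsplus(old: types.CodeType, new: types.CodeType) -> bool:
    """Conservative proxy: same local, cell and free variable names in the
    same order, and the same argument shape."""
    return (
        old.co_varnames == new.co_varnames
        and old.co_cellvars == new.co_cellvars
        and old.co_freevars == new.co_freevars
        and old.co_argcount == new.co_argcount
        and old.co_posonlyargcount == new.co_posonlyargcount
        and old.co_kwonlyargcount == new.co_kwonlyargcount
    )
```

A hook that fails this check could simply return None and let the original code run.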

The _PyEval_EvalFrame function can be modified to check for a frame hook before evaluating the frame:

static inline PyObject*
_PyEval_EvalFrame(PyThreadState *tstate, _PyInterpreterFrame *frame, int throwflag)
{
    EVAL_CALL_STAT_INC(EVAL_CALL_TOTAL);
+
+   if (_PyInterpreterState_HasFrameHooks(tstate->interp) &&
+           tstate->interp->enable_frame_hooks) {
+       return _PyEval_FrameHook(tstate, frame, throwflag);
+   }
+
    if (tstate->interp->eval_frame == NULL) {
        return _PyEval_EvalFrameDefault(tstate, frame, throwflag);
    }
    return tstate->interp->eval_frame(tstate, frame, throwflag);
}

Implementation

A prototype of the frame hook API can be found at the following links:

  • CPython
  • PyTorch
  • Frame Hook Example

Please see the gist linked in the disclaimer above for the URLs.

Conclusion

The goal of this post is to gather feedback from the community on the proposed idea, its design, and its potential impact on CPython. We are open to suggestions and improvements to the proposed API and look forward to a constructive discussion.


I’m reluctant to add any new hooks for a few reasons:

  • We already have sys.monitoring which allows you to hook into almost any event
  • They can interact in ways that break assumptions about the way the VM acts
  • They can be bad for performance.

Since you can already hook into the start of function execution with the PY_START event, I don’t think you need a new hook. You will need to be able to replace the current frame with a frame for the new code object, though.
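For reference, the PY_START hook point already exists in sys.monitoring (Python 3.12+). The sketch below only observes which code objects start; it assumes the OPTIMIZER_ID tool slot is free, and, as noted above, there is currently no public way for the callback to swap in a different code object.

```python
from __future__ import annotations

import sys
import types


def record_py_start(func, *args):
    """Call func(*args) and return the names of code objects that fire PY_START.

    Requires Python 3.12+ (sys.monitoring). Observation only.
    """
    mon = sys.monitoring
    tool_id = mon.OPTIMIZER_ID  # assumes no other tool has claimed this ID
    started: list[str] = []

    def on_start(code: types.CodeType, instruction_offset: int):
        # Called at function entry with the code object about to run
        started.append(code.co_name)

    mon.use_tool_id(tool_id, "py-start-demo")
    try:
        mon.register_callback(tool_id, mon.events.PY_START, on_start)
        mon.set_events(tool_id, mon.events.PY_START)
        func(*args)
    finally:
        mon.set_events(tool_id, 0)  # no events
        mon.register_callback(tool_id, mon.events.PY_START, None)
        mon.free_tool_id(tool_id)
    return started
```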

What would the free_frame() and new_frame_with_code() functions you propose do, exactly?
Bear in mind that frames are not heap objects, they are part of a stack.

Hi Mark,

Thanks for your feedback on the proposal.

We already have sys.monitoring which allows you to hook into almost any event

I think the term “frame hook” may have caused some confusion, as it suggests a general-purpose mechanism similar to sys.monitoring. My understanding is that sys.monitoring is primarily an observational API intended for profiling, debugging, and tracing.

Our use case, on the other hand, requires the ability to intercept execution at function entry and replace the frame’s code object. In particular, Dynamo needs to:

  • inspect a frame before execution, and
  • replace its code object with a transformed version

This is why we rely on PEP 523 today.

That said, if sys.monitoring could support this kind of redirection, we would be very interested in exploring this approach instead of introducing a new API.

For additional context on how Dynamo works in practice, the following two posters illustrate the execution flow, how frames are intercepted at function entry, and how bytecode is transformed before execution:

I’d be happy to discuss any part of them in more detail if helpful.

They can be bad for performance.

That’s a valid concern. I think the relevant comparison is with PEP 523 rather than with sys.monitoring.

When no hooks are registered, the interpreter follows the existing fast path with only a minimal guard (null check), similar to how the eval_frame mechanism is handled today. We do not expect any meaningful overhead in the disabled case beyond that check.

When hooks are enabled, the overhead should be comparable to PEP 523. In practice, for Dynamo, the dominant cost is in the transformation itself rather than the hook dispatch.

What would the free_frame() and new_frame_with_code() functions you propose do, exactly? Bear in mind that frames are not heap objects, they are part of a stack.

The free_frame() and new_frame_with_code() calls in the earlier sketch were meant as a conceptual illustration of how the frame hook operates; they do not necessarily need to be exposed as part of the API.

In the current prototype, we do not treat frames as heap objects or expose explicit allocation/deallocation. Instead, when a hook returns a different code object, we construct a new _PyInterpreterFrame via the interpreter’s existing mechanisms (_PyFrame_PushUnchecked) and execute that frame.

So rather than mutating or freeing the current frame in place, we:

  • leave the original frame intact during hook processing,
  • transition execution to a new frame when needed, and
  • rely on the interpreter’s existing mechanisms (_PyEval_FrameClearAndPop) for frame lifetime and stack management.

A more realistic (though still simplified) sketch of the control flow, closer to the current prototype, would look like:

def _PyEval_FrameHook(tstate, frame, throw_flag):
    shadow = frame

    for hook in tstate.frame_hooks:
        if hook.disabled:
            continue

        code = hook(shadow)
        old_code = shadow.f_code

        if code is not None and code != old_code:
            if shadow is not frame:
                _PyEval_FrameClearAndPop(tstate, shadow)  # or free_frame
            # Conceptually create a new frame with the modified code object
            # Python 3.10 had a `PyFrame_New`
            shadow = new_frame_with_code(tstate, frame, code)

    # Evaluate either the original or transformed frame
    if shadow is frame:
        return eval_frame(tstate, frame, throw_flag)

    result = eval_frame(tstate, shadow, throw_flag)
    _PyEval_FrameClearAndPop(tstate, frame)
    return result
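Since frames live on a stack, the key property of this control flow is that every push and pop stays LIFO. The toy model below checks that property for the sketch above. Frame, ToyThreadState and the string "code" values are hypothetical stand-ins; as in the sketch, the evaluator is assumed to clear and pop the frame it runs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Frame:
    code: str  # stands in for a code object


@dataclass
class ToyThreadState:
    stack: list[Frame] = field(default_factory=list)
    hooks: list = field(default_factory=list)

    def push(self, code: str) -> Frame:
        frame = Frame(code)
        self.stack.append(frame)
        return frame

    def clear_and_pop(self, frame: Frame) -> None:
        # Frames live on a stack, so only the top frame may be popped
        assert self.stack and self.stack[-1] is frame, "non-LIFO pop"
        self.stack.pop()


def eval_frame(tstate: ToyThreadState, frame: Frame) -> str:
    # Like the evaluator in the sketch, this pops the frame it ran
    result = f"ran {frame.code}"
    tstate.clear_and_pop(frame)
    return result


def frame_hook_dispatch(tstate: ToyThreadState, frame: Frame) -> str:
    shadow = frame
    for hook in tstate.hooks:
        code = hook(shadow.code)
        if code is not None and code != shadow.code:
            if shadow is not frame:
                tstate.clear_and_pop(shadow)  # drop the previous shadow
            shadow = tstate.push(code)
    if shadow is frame:
        return eval_frame(tstate, frame)
    result = eval_frame(tstate, shadow)
    tstate.clear_and_pop(frame)  # the original frame is below the shadow
    return result
```

With two rewriting hooks, the first shadow is popped before the second is pushed, and the original frame is popped last, so the stack ends empty.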

You say that you need to:

  • inspect a frame before execution, and
  • replace its code object with a transformed version

which seems reasonable, but the API you ask for, _PyEval_FrameHook, is a general replacement for the interpreter.

What are your requirements?
Please be concise and precise, if possible.

Would this work?

Add a _PyEval_ReplaceCode(int depth, PyCodeObject *code, int instr_offset) function, which would replace the code object of the frame at the given depth.
The _PyEval_ReplaceCode function would only work if called from a sys.monitoring callback, as we already expect those callbacks to alter the VM state.

You could then register a PY_START callback to manage which functions you trace, replace, or ignore.
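A PY_START callback along these lines might look like the sketch below. It assumes a Python-level binding of the proposed _PyEval_ReplaceCode, passed in as replace_code; no such function exists in CPython today, and treating depth 0 as the starting frame is also an assumption. The real part is returning sys.monitoring.DISABLE, which turns off PY_START for that code location.

```python
from __future__ import annotations

import sys
import types
from typing import Callable, Optional

# Hypothetical Python-level stand-in for the proposed C function
# _PyEval_ReplaceCode(depth, code, instr_offset); it does not exist today.
ReplaceCode = Callable[[int, types.CodeType, int], None]
Transform = Callable[[types.CodeType], Optional[types.CodeType]]


def make_py_start_callback(transform: Transform, replace_code: ReplaceCode):
    """Build a PY_START callback that replaces or ignores each code object."""
    decisions: dict[types.CodeType, Optional[types.CodeType]] = {}

    def on_start(code: types.CodeType, instruction_offset: int):
        if code not in decisions:
            # Run the (expensive) transformation once per code object
            decisions[code] = transform(code)
        new_code = decisions[code]
        if new_code is None:
            # Ignore: ask sys.monitoring (3.12+) to stop reporting this location
            return sys.monitoring.DISABLE
        # Replace: depth 0 is assumed to mean the frame that is starting
        replace_code(0, new_code, instruction_offset)
        return None

    return on_start
```

The callback would be registered with sys.monitoring.register_callback() for events.PY_START; caching the decision keeps the transformation cost to once per code object.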

BTW, the link TorchDynamo Debugging Tools doesn’t allow access.

BTW, the link TorchDynamo Debugging Tools doesn’t allow access.

Let me know if this one works for you: TorchDynamo Debugging Tools

Would this work?

Add a _PyEval_ReplaceCode(int depth, PyCodeObject *code, int instr_offset) function, which would replace the code object of the frame at the given depth.
The _PyEval_ReplaceCode function would only work if called from a sys.monitoring callback, as we already expect those callbacks to alter the VM state.

You could then register a PY_START callback to manage which functions you trace, replace, or ignore.

Let me check this on a toy project and get back to you. What would depth be in this case?

Other than PyTorch Dynamo, who else would benefit from this? Are there other projects you have talked to where it would make having the Python core team maintain this worth it?

It at least loaded for me.

My guess is Mark meant “depth” as in “stack frame depth”, like what you would pass into sys._getframe().
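If depth follows sys._getframe() semantics, 0 is the currently executing frame and each increment moves one caller outward. A quick illustration (which frame would count as depth 0 inside a monitoring callback is up to the proposed API and is not assumed here):

```python
import sys


def names_at_depths(*depths: int) -> list:
    names = []
    for d in depths:
        # depth 0 is this function's own frame, 1 its caller, and so on
        names.append(sys._getframe(d).f_code.co_name)
    return names


def caller():
    return names_at_depths(0, 1)
```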


I can’t say for sure, but I am not aware of any other project using the eval frame API. The only one I knew of was PyDev.Debugger, but it migrated to sys.monitoring some time ago.

Meta’s CinderX uses PEP 523 for its JIT.