Python 3.11 includes an excellent change that splits CPython’s frame objects into two pieces: the full frame object (a refcounted Python object) used in all previous versions, and a new internal C struct that relies entirely on external lifecycle management. The pay-off is that most of the time the interpreter can avoid any refcounting overhead when managing the Python frame stack. It instead keeps track of that memory in other ways, and only hands responsibility off to a full frame object when relying on the frame-specific memory management becomes impractical (e.g. when it needs to invoke a tracing function).
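To make the hand-off idea concrete, here is a minimal toy model in C. Every name in it (`lazy_frame_data`, `lazy_frame_object`, `lazy_get_frame_object`, `lazy_call`) is hypothetical and invented for illustration; none of this is the real CPython code, and it ignores details such as the full object taking over ownership of the data when a frame outlives its call. It only shows the general pattern: the lightweight struct lives on the C stack, and a refcounted full object is allocated only when something (here, a `tracing` flag) asks for it.

```c
#include <assert.h>   /* only needed by callers that check the results */
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins: NOT the real CPython structs. */
typedef struct lazy_frame_object lazy_frame_object;

typedef struct lazy_frame_data {
    int lasti;                      /* index of the last executed instruction */
    lazy_frame_object *frame_obj;   /* NULL until a full object is requested */
} lazy_frame_data;

struct lazy_frame_object {
    int refcount;                   /* only the full object is refcounted */
    lazy_frame_data *fdata;         /* back-pointer to the lightweight data */
};

/* Counts full frame objects allocated so far (for illustration only). */
static int lazy_full_frames_created = 0;

/* Return the full frame object for fdata, creating it on first request. */
static lazy_frame_object *
lazy_get_frame_object(lazy_frame_data *fdata)
{
    if (fdata->frame_obj == NULL) {
        lazy_frame_object *f = malloc(sizeof(*f));
        if (f == NULL) {
            return NULL;
        }
        f->refcount = 1;
        f->fdata = fdata;
        fdata->frame_obj = f;
        lazy_full_frames_created++;
    }
    return fdata->frame_obj;
}

/* Simulate one call: only the tracing path pays for a full object. */
static int
lazy_call(int tracing)
{
    lazy_frame_data fdata = {0, NULL};  /* lives on the C stack */
    fdata.lasti = 42;
    if (tracing && lazy_get_frame_object(&fdata) == NULL) {
        return -1;                      /* allocation failure */
    }
    int result = fdata.lasti;
    free(fdata.frame_obj);              /* free(NULL) is a no-op */
    return result;
}
```

Calling `lazy_call(0)` never touches the allocator, while `lazy_call(1)` creates exactly one full object, which is the asymmetry the 3.11 change exploits.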
However, this split has introduced quite a bit of referential ambiguity into the code base, as I discovered last year when I attempted to sync the PEP 558 development branch with the main branch in the CPython repo and couldn’t readily figure out when the code was working with full frame objects and when it was working with the new frame data structs. When `frame->f_frame->f_code` is a valid thing to write, it indicates there is a problem with the local variable and struct field naming conventions being used.
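The ambiguity can be sketched with a pair of heavily simplified, hypothetical structs (the `old_*` type and function names below are invented for this example and are not the real CPython definitions; only the `f_frame` and `f_code` field names come from the expression above). Both functions take a parameter called `frame`, yet the two parameters have different types, so a line like `frame->f_code` read in isolation does not say which kind of frame it is touching.

```c
#include <assert.h>   /* only needed by callers that check the results */
#include <string.h>

/* Hypothetical, simplified stand-ins for the status quo naming. */
typedef struct old_code {
    const char *name;
} old_code;

typedef struct old_interpreter_frame {
    old_code *f_code;                /* "f_" prefix on the lightweight struct */
} old_interpreter_frame;

typedef struct old_frame_object {
    old_interpreter_frame *f_frame;  /* a "frame" field inside a frame */
} old_frame_object;

/* Here "frame" is the full frame object... */
static const char *
old_name_from_frame_object(old_frame_object *frame)
{
    return frame->f_frame->f_code->name;
}

/* ...and here "frame" is the lightweight struct. */
static const char *
old_name_from_interpreter_frame(old_interpreter_frame *frame)
{
    return frame->f_code->name;
}
```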
Rather than simply pushing through with that PEP 558 branch merge, I instead filed Issue 44800 on the Python tracker (“Code readability: rename InterpreterFrame to `_Py_framedata`”) and the associated PR, python/cpython Pull Request #27525 (“WIP bpo-44800: Rename `_PyInterpreterFrame` to `_Py_framedata`”), to suggest that we reconsider the names in use in order to make the code easier to work with. (Note: the Python 3.11 beta branch date is our last opportunity to refactor this code for readability, as any proposal after that date will be blocked by the desire to avoid complicating backports to the 3.11 maintenance branch.)
My first renaming idea was worse than the status quo (see the bpo ticket and the early PR comments for the details), so @markshannon quite reasonably rejected it outright. However, one of his objections to that initial proposal (“From the point of view of Python code, the frame object is the frame, not just a view of it.”) further convinced me that the existing names aren’t right, as calling the underlying C struct an “interpreter frame” also suggests that “interpreter frames” and “Python frames” are different things, rather than one simply being a data storage struct that avoids the overhead of allocating a full Python object.
The PR migrates the Python frame stack manipulation code to the following conventions (quoted from a block comment in `pycore_frame.h`):
/* Starting in CPython 3.11, CPython separates the frame state between the
 * full frame objects exposed by the Python and C runtime state introspection
 * APIs, and internal lighter weight frame data structs, which are simple C
 * structures owned by either the interpreter eval loop (while executing
 * ordinary functions), by a generator or coroutine object (for frames that
 * are able to be suspended), or by their corresponding full frame object (if
 * a state introspection API has been invoked and the full frame object has
 * taken responsibility for the lifecycle of the frame data storage).
 *
 * This split storage eliminates a lot of allocation and deallocation of full
 * Python objects during code execution, providing a significant speed gain
 * over the previous approach of using full Python objects for both
 * introspection and code execution.
 *
 * Field naming conventions:
 *
 *   * full frame object fields have an "f_*" prefix
 *   * frame data struct fields have no prefix
 *
 * Local variable and function argument naming conventions:
 *
 *   * "frame", "f", and "frameobj" are used for full frame objects
 *     * Exception: "current_frame" in the thread state cframe struct is a frame data struct
 *   * "fdata" is used for frame data structs
 *
 * Function/macro naming conventions:
 *
 *   * "PyFrame_*" functions accept a full frame object
 *   * "_PyFrame_*" functions accept either a full frame object or a frame
 *     data struct. Check the specific function signatures for details.
 *   * Other public C API functions that relate to frames only accept full
 *     frame objects
 *   * Other private C API functions that relate to frames may accept either a
 *     full frame object or a frame data struct. Check the specific function
 *     signatures for details
 *
 * Function return types:
 *   * Public C API functions will only ever return full frame objects
 *   * Private C API functions with an underscore prefix may return frame
 *     data structs instead
 */
Relative to the status quo, adopting that convention covers the following changes:
- `_PyInterpreterFrame` renamed to `_Py_framedata`
- dropping the `f_*` prefix from `f_code`, `f_builtins`, `f_globals`, `f_locals`, `f_func`, `f_state`, and `f_lasti` (the current code uses a convention where it keeps the `f_*` prefix if the field originally came from the full frame object, omitting it if the field is new or came from a ceval local variable)
- `_Py_framedata *` `frame` and `f` local variables and function parameters renamed to `fdata`
- generator/coroutine/async generator `*_iframe` fields renamed to `*_fdata` (and type fixed to be `void *`)
- full frame object’s `f_frame_data` field renamed to `f_owned_fdata` (and type fixed to be `_Py_framedata *`)
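As a rough illustration of how code reads under these conventions, here is a toy sketch with hypothetical, simplified structs (the `new_*` type and function names are invented for this example). The `fdata` parameter name and the unprefixed `code` field follow the list above; the `f_fdata` name used for the full frame object's pointer to its frame data is purely an assumption made for the sketch, since the list does not say what that field is called.

```c
#include <assert.h>   /* only needed by callers that check the results */
#include <string.h>

/* Hypothetical, simplified stand-ins using the proposed conventions. */
typedef struct new_code {
    const char *name;
} new_code;

typedef struct new_framedata {
    new_code *code;          /* frame data struct fields have no prefix */
} new_framedata;

typedef struct new_frame_object {
    new_framedata *f_fdata;  /* ASSUMED field name, for illustration only */
} new_frame_object;

/* "frame" always means a full frame object... */
static const char *
new_name_from_frame(new_frame_object *frame)
{
    return frame->f_fdata->code->name;
}

/* ...and "fdata" always means a frame data struct. */
static const char *
new_name_from_fdata(new_framedata *fdata)
{
    return fdata->code->name;
}
```

With this naming, a one-line diff hunk containing `fdata->code` or `frame->f_fdata` identifies the kind of object involved without needing the surrounding declarations.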
Earlier iterations of the PR also tried to disambiguate all the various `_PyFrame` internal APIs based on whether they accepted full frame objects or not. I dropped that from the latest iteration of the PR as changing the local variable names looks to be enough to make code snippets unambiguous, even when read in isolation in a diff rather than as part of the full file.