Python 3.11 includes an excellent change that splits CPython’s frame objects into two pieces: the full frame object (a refcounted Python object) used in all previous versions, and a new internal C struct that relies entirely on external lifecycle management. The payoff is that most of the time the interpreter can avoid any refcounting overhead when managing the Python frame stack: it keeps track of that memory in other ways, and only hands responsibility off to a full frame object when relying on the frame-specific memory management becomes impractical (e.g. when it needs to invoke a tracing function).
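To make the split concrete, here is a minimal toy model of the idea in plain C. Every name in it (`toy_fdata`, `toy_frameobj`, `toy_push`, `toy_get_frameobj`, `toy_pop`) is hypothetical and invented for illustration; it is not CPython's actual layout or API, just a sketch of "cheap struct by default, full object only on demand".

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: these types are hypothetical, not CPython's real structs. */
typedef struct toy_fdata toy_fdata;

/* Stand-in for the refcounted full frame object. */
typedef struct toy_frameobj {
    long refcnt;          /* models the object refcount */
    toy_fdata *f_fdata;   /* the frame data this object exposes */
} toy_frameobj;

/* Stand-in for the lightweight frame data struct. */
struct toy_fdata {
    toy_fdata *previous;      /* caller's frame data: plain pointer, no refcount */
    int lasti;                /* index of the last executed instruction */
    toy_frameobj *frame_obj;  /* NULL until something asks for a full frame */
};

static int toy_frameobjs_created = 0;

/* Pushing a call needs no object allocation and no refcount traffic. */
static void toy_push(toy_fdata *fdata, toy_fdata *caller) {
    fdata->previous = caller;
    fdata->lasti = -1;
    fdata->frame_obj = NULL;
}

/* Only introspection (e.g. a trace function) pays for a full frame object. */
static toy_frameobj *toy_get_frameobj(toy_fdata *fdata) {
    if (fdata->frame_obj == NULL) {
        toy_frameobj *frame = malloc(sizeof(*frame));
        assert(frame != NULL);
        frame->refcnt = 1;
        frame->f_fdata = fdata;
        fdata->frame_obj = frame;
        toy_frameobjs_created++;
    }
    return fdata->frame_obj;
}

/* Popping releases the full frame object only if one was ever created. */
static void toy_pop(toy_fdata *fdata) {
    if (fdata->frame_obj != NULL && --fdata->frame_obj->refcnt == 0) {
        free(fdata->frame_obj);
    }
    fdata->frame_obj = NULL;
}
```

In this sketch, a chain of calls that never triggers introspection leaves `toy_frameobjs_created` at zero, which is the saving the real change is after.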
However, this split has introduced quite a bit of referential ambiguity into the code base, as I discovered last year when I attempted to sync the PEP 558 development branch with the main branch in the CPython repo and couldn’t readily figure out when the code was working with full frame objects and when it was working with the new frame data structs. When `frame->f_frame->f_code` is a valid thing to write, it indicates there is a problem with the local variable and struct field naming conventions being used.
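A small toy model may help show why that expression is a warning sign. The types below are hypothetical simplifications, not CPython's real definitions, and the `f_fdata` pointer name on the "new" side is an assumption made for the sake of the example. The only point is to contrast an access chain where both levels look like "a frame" with one where the naming says which kind of thing each level is.

```c
#include <assert.h>

/* Toy model only: hypothetical types, not CPython's real layout. */
typedef struct toy_code { int co_firstlineno; } toy_code;

/* --- Status-quo style naming --- */
typedef struct old_iframe {
    toy_code *f_code;     /* "f_" prefix, although this is not a frame object */
} old_iframe;

typedef struct old_frameobj {
    old_iframe *f_frame;  /* a "frame" nested inside a frame */
} old_frameobj;

/* --- Naming along the lines proposed in this post --- */
typedef struct new_fdata {
    toy_code *code;       /* frame data struct fields: no prefix */
} new_fdata;

typedef struct new_frameobj {
    new_fdata *f_fdata;   /* full frame object fields: "f_" prefix (assumed name) */
} new_frameobj;

/* Reads as "frame's frame's code": which level is the Python object? */
static int old_first_line(old_frameobj *frame) {
    return frame->f_frame->f_code->co_firstlineno;
}

/* Reads as "frame object -> frame data -> code". */
static int new_first_line(new_frameobj *frame) {
    return frame->f_fdata->code->co_firstlineno;
}
```

Both functions do exactly the same work; only the second one can be read in isolation in a diff without checking the declared types.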
Rather than simply pushing through with that PEP 558 branch merge, I instead filed Issue 44800 (“Code readability: rename InterpreterFrame to `_Py_framedata`”) on the Python tracker, along with the associated PR, python/cpython Pull Request #27525 (“WIP bpo-44800: Rename `_PyInterpreterFrame` to `_Py_framedata`”), to suggest that we reconsider the names in use in order to make the code easier to work with. (Note: the Python 3.11 beta branch date is our last opportunity to refactor this code for readability, as any proposal after that date will be blocked by the desire to avoid complicating backports to the 3.11 maintenance branch.)
My first renaming idea was worse than the status quo (see the bpo ticket and the early PR comments for the details), so @markshannon quite reasonably rejected it outright. However, one of his objections to that initial proposal (“From the point of view of Python code, the frame object is the frame, not just a view of it.”) further convinced me that the existing names aren’t right, as calling the underlying C struct an “interpreter frame” also suggests that “interpreter frames” and “Python frames” are different things, rather than one simply being a data storage struct that avoids the overhead of allocating a full Python object.
The PR migrates the Python frame stack manipulation code to the following conventions (quoted from a block comment in
/* Starting in CPython 3.11, CPython separates the frame state between the
 * full frame objects exposed by the Python and C runtime state introspection
 * APIs, and internal lighter weight frame data structs, which are simple C
 * structures owned by either the interpreter eval loop (while executing
 * ordinary functions), by a generator or coroutine object (for frames that
 * are able to be suspended), or by their corresponding full frame object (if
 * a state instrospection API has been invoked and the full frame object has
 * taken responsibility for the lifecycle of the frame data storage).
 *
 * This split storage eliminates a lot of allocation and deallocation of full
 * Python objects during code execution, providing a significant speed gain
 * over the previous approach of using full Python objects for both
 * introspection and code execution.
 *
 * Field naming conventions:
 *
 *   * full frame object fields have an "f_*" prefix
 *   * frame data struct fields have no prefix
 *
 * Local variable and function argument naming conventions:
 *
 *   * "frame", "f", and "frameobj" are used for full frame objects
 *     * Exception: "current_frame" in the thread state cframe struct is a frame data struct
 *   * "fdata" is used for frame data structs
 *
 * Function/macro naming conventions:
 *
 *   * "PyFrame_*" functions accept a full frame object
 *   * "_PyFrame_*" functions accept either a full frame object or a frame
 *     data struct. Check the specific function signatures for details.
 *   * Other public C API functions that relate to frames only accept full
 *     frame objects
 *   * Other private C API functions that relate to frames may accept either a
 *     full frame object or a frame data struct. Check the specific function
 *     signatures for details
 *
 * Function return types:
 *   * Public C API functions will only ever return full frame objects
 *   * Private C API functions with an underscore prefix may return frame
 *     data structs instead
 */
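The ownership hand-off described in the first paragraph of that comment can be sketched with another toy model. Everything here is hypothetical (`toy_owner`, `toy_fdata`, `toy_frameobj`, `toy_take_ownership`, and the `f_fdata` pointer name are all invented for illustration), and the real interpreter does considerably more work at this point. The sketch only shows the basic move: when the original owner is about to release its storage but the full frame object is still referenced, the frame data is copied into storage owned by the frame object.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not CPython's real structs or functions. */
typedef enum {
    TOY_OWNED_BY_EVAL_LOOP,
    TOY_OWNED_BY_GENERATOR,
    TOY_OWNED_BY_FRAME_OBJECT
} toy_owner;

/* Frame data struct: fields carry no prefix. */
typedef struct toy_fdata {
    toy_owner owner;
    int lasti;
} toy_fdata;

/* Full frame object: fields carry an "f_" prefix. */
typedef struct toy_frameobj {
    toy_fdata *f_fdata;       /* where the frame data currently lives */
    toy_fdata f_owned_fdata;  /* storage used once this object takes over */
} toy_frameobj;

/* Move the frame data into the full frame object's own storage so that it
 * survives after the eval loop or generator releases the original copy. */
static void toy_take_ownership(toy_frameobj *frame) {
    toy_fdata *fdata = frame->f_fdata;
    assert(fdata->owner != TOY_OWNED_BY_FRAME_OBJECT);
    frame->f_owned_fdata = *fdata;
    frame->f_owned_fdata.owner = TOY_OWNED_BY_FRAME_OBJECT;
    frame->f_fdata = &frame->f_owned_fdata;
}
```

Note how the variable names follow the conventions from the comment: `frame` is always the full object, `fdata` is always the plain struct, so `frame->f_fdata` is unambiguous at a glance.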
Relative to the status quo, adopting that convention involves the following changes:
- dropping the `f_*` prefix from frame data struct fields such as `f_lasti` (the current code keeps the `f_*` prefix if the field originally came from the full frame object, and omits it if the field is new or came from a ceval local variable)
- `f` local variables and function parameters renamed to `fdata` where they refer to frame data structs
- generator/coroutine/async generator `*_iframe` fields renamed to `*_fdata` (with the field type fixed as well)
- full frame object’s `f_frame_data` field renamed to `f_owned_fdata` (with the field type fixed as well)
Earlier iterations of the PR also tried to disambiguate all the various `_PyFrame` internal APIs based on whether they accepted full frame objects or not. I dropped that from the latest iteration of the PR, as changing the local variable names looks to be enough to make code snippets unambiguous, even when read in isolation in a diff rather than as part of the full file.