Proposal: rename _PyInterpreterFrame struct as _Py_framedata

Python 3.11 includes an excellent change that splits CPython’s frame objects into two pieces: the full frame object (a refcounted Python object) used in all previous versions and a new internal C struct that relies entirely on external lifecycle management. The pay-off for the change is that most of the time the interpreter can avoid any refcounting overhead when managing the Python frame stack - it instead keeps track of that memory in other ways, and only hands responsibility off to a full frame object when relying on the frame specific memory management becomes impractical (e.g. when it needs to invoke a tracing function).

However, this split has introduced quite a bit of referential ambiguity into the code base, as I discovered last year when I attempted to sync the PEP 558 development branch with the main branch in the CPython repo and couldn’t readily figure out when the code was working with full frame objects and when it was working with the new frame data structs. When frame->f_frame->f_code is a valid thing to write, it indicates there is a problem with the local variable and struct field naming conventions being used.

Rather than simply pushing through with that PEP 558 branch merge, I instead filed Issue 44800: Code readability: rename InterpreterFrame to `_Py_framedata` - Python tracker and the associated PR at WIP bpo-44800: Rename `_PyInterpreterFrame` to `_Py_framedata` by ncoghlan · Pull Request #27525 · python/cpython · GitHub to suggest that we reconsider the names in use in order to make the code easier to work with. (Note: the Python 3.11 beta branch date is our last opportunity to refactor this code for readability, as any proposal after that date will be blocked by the desire to avoid complicating backports to the 3.11 maintenance branch)

My first renaming idea was worse than the status quo (see the bpo ticket and the early PR comments for the details), so @markshannon quite reasonably rejected it outright. However, one of his objections to that initial proposal (“From the point of view of Python code, the frame object is the frame, not just a view of it.”) further convinced me that the existing names aren’t right, as calling the underlying C struct an “interpreter frame” also suggests that “interpreter frames” and “Python frames” are different things, rather than one simply being a data storage struct that avoids the overhead of allocating a full Python object.

The PR migrates the Python frame stack manipulation code to the following conventions (quoted from a block comment in pycore_frame.h):

/* Starting in CPython 3.11, CPython separates the frame state between the
 * full frame objects exposed by the Python and C runtime state introspection
 * APIs, and internal lighter weight frame data structs, which are simple C
 * structures owned by either the interpreter eval loop (while executing
 * ordinary functions), by a generator or coroutine object (for frames that
 * are able to be suspended), or by their corresponding full frame object (if
 * a state instrospection API has been invoked and the full frame object has
 * taken responsibility for the lifecycle of the frame data storage).
 *
 * This split storage eliminates a lot of allocation and deallocation of full
 * Python objects during code execution, providing a significant speed gain
 * over the previous approach of using full Python objects for both
 * introspection and code execution.
 *
 * Field naming conventions:
 *
 *   * full frame object fields have an "f_*" prefix
 *   * frame data struct fields have no prefix
 *
 * Local variable and function argument naming conventions:
 *
 *   * "frame", "f", and "frameobj" are used for full frame objects
 *     * Exception: "current_frame" in the thread state cframe struct is a frame data struct
 *   * "fdata" is used for frame data structs
 *
 * Function/macro naming conventions:
 *
 *   * "PyFrame_*" functions accept a full frame object
 *   * "_PyFrame_*" functions accept either a full frame object or a frame
 *     data struct. Check the specific function signatures for details.
 *   * Other public C API functions that relate to frames only accept full
 *     frame objects
 *   * Other private C API functions that relate to frames may accept either a
 *     full frame object or a frame data struct. Check the specific function
 *     signatures for details
 * 
 * Function return types:
 *   * Public C API functions will only ever return full frame objects
 *   * Private C API functions with an underscore prefix may return frame
 *     data structs instead
 */

Relative to the status quo, adopting that convention covers the following changes:

  • _PyInterpreterFrame renamed to _Py_framedata
  • dropping the f_* prefix from f_code, f_builtins, f_globals, f_locals, f_func, f_state, and f_lasti (the current code uses a convention where it keeps the f_* prefix if the field originally came from the full frame object, omitting it if the field is new or came from a ceval local variable)
  • _Py_framedata * frame and f local variables and function parameters renamed to fdata
  • generator/coroutine/aync generator *_iframe fields renamed to *_fdata (and type fixed to be void *)
  • full frame object’s f_frame_data field renamed to f_owned_fdata (and type fixed to be _Py_framedata *)

Earlier iterations of the PR also tried to disambiguate all the various _PyFrame internal APIs based on whether they accepted full frame objects or not. I dropped that from the latest iteration of the PR as changing the local variable names looks to be enough to make code snippets unambiguous, even when read in isolation in a diff rather than as part of the full file.

2 Likes

Note: this follow up post is just some additional background info on why I find the status quo in the code base problematic, even though the current state was reached for sound reasons (as discussed in Core developer rules of thumb (incl. "What needs a PEP"?) the PRs introducing the frame data storage split intentionally minimised any name changes so that reviewers could focus on the functional changes. Hence struct fields following the naming conventions of their origins, local variables and function parameters keeping their old names even as their data types changed, etc).

When developing large C code bases, there are some useful rules of thumb that can be followed to help readers correctly orient themselves in the code base, even in the presence of 100+ line functions where variable declarations may be a long way from the code that uses those variables.

The relevant ones for this particular discussion that CPython has traditionally followed are:

  • use consistent naming conventions for local variables of a particular type (so the use of the conventional name provides a strong hint as to the type being used)
  • use dedicated prefixes for fields of particular structs (so use of the struct can be readily identified in the code even when a non-conventional variable name is used)
  • use consistent and distinct prefixes for functions and macros that are designed to work with particular data structures (the “C structs-as-objects” coding style)

Taking the “low impact” approach to the introduction of the separate frame data structs has resulted in a situation where all three of those techniques are no longer being applied:

  • frame and f may be either a full frame object or a frame data struct, it depends on whether the code is part of the optimised eval loop or not
  • some of the frame data struct fields still have the f_* prefix if they were originally defined on full frame objects
  • Many _PyFrame_ functions now actually work on the frame data structs rather than on full frame objects

The WIP PR attempts to at least restore the first two conventions where these structures are concerned. As noted in the initial post, previous iterations also changed the function names, but I decided in the latest iteration that there was enough ambiguity reduction without also making that change.

2 Likes

I particularly dislike _Py_framedata, “data” is one of those words that just makes names longer without adding meaning.
_PyInterpreterFrame is a data structure; thus it contains data.

1 Like

But as you yourself pointed out is problematic, the “interpreter frame” name gives the impression that interpreter frames are somehow distinct from full Python frame objects, when they’re really the same thing. It’s just that one of them is a passive C data struct with frame-related data in it, while the other is a full Python object with all the lifecycle management semantics that implies.

I’d actually be OK with using _Py_frame for the data type name - I agree the “data” suffix doesn’t add any real value there, and for quick orientation when reading code diffs, the most important bit of the PR is the frame vs fdata naming convention for variables and struct fields based on which type they’re pointing to.

1 Like

@markshannon I’m leaning towards scaling back the proposed change quite a bit from the current draft PR:

  1. Use _Py_frame as the revised low level struct name (eliminates the redundant data suffix)
  2. Leave the local variable and function parameter names alone (i.e. no fdata variables, stick with frame and f everywhere)
  3. Still drop the f_ prefix from the low level struct fields (as this becomes the main tool for quick orientation in code diffs)
  4. Keep the f_frame to f_fdata in the frame object struct (to eliminate the frame->f_frame oddity)
  5. Keep the f_frame_dataf_owned_fdata rename for the optional frame data ownership (to avoid ambiguity with the renamed f_fdata)
  6. Postpone the *_iframe*_fdata rename for generator objects (and friends). (If we do decide it makes sense to do that, it can be a separate PR, and I suspect an easier path would be to just redefine what the i stands for without renaming the field)
  7. Keep the “full frame object” vs “frame data struct” phrasing in the explanation of the difference between the two levels of data storage

I think that would still give us pretty good code orientation in diffs, as the object & struct usage seems to split pretty cleanly into two categories:

  • small functions where the declaration is visible anyway, so there’s no orientation problem in the first place
  • long functions with lots of field accesses, so restricting the f_ prefix to frame object fields allows anyone familiar with the convention to quickly tell whether a code snippet is working with the full frame object or directly with the frame data struct

With my current PR, even I now think there are too many no or low value changes to functions in the first category, so it obscures the potential benefits in the latter category.

1 Like

The scaled back PR has been posted: bpo-44800: rename _PyInterpreterFrame to _Py_frame by ncoghlan · Pull Request #31987 · python/cpython · GitHub

The updated PR also has a much cleaner commit history, as I started again from the main branch, and made each individual change (i.e. the type rename, and then each struct field rename) its own commit.

1 Like

I think the new PR gets the code to a pretty readable state:

  • the _Py_frame name helps make it clear that the new struct is an implementation detail of the existing Python frame concept, rather than introducing a new concept
  • _PyFrame functions take either PyFrameObject* or _Py_frame*, so the name still aligns with the type even when the function has shifted to working with the new struct
  • the removal of the f_ prefix from the _Py_frame fields that had it should be enough to solve the “quick orientation when resolving a merge conflict” problem that had me looking into this issue in the first place (my ultimate aim with this PR is to make it easier to sync the PEP 558 working branch with all the Python 3.11 performance improvements)

And for the local variable names, we can now say we tried renaming them, and decided the level of code churn was too high for the potential ambiguity reduction the change could have provided.

4 Likes

@vstinner pointed out on the PR that we really don’t want to inflict yet another interim API transition on the projects directly accessing these structs.

I realised there’s a macro hack we can use to keep code written for 3.11a6 compiling until a proper frame stack access API is available (presumably before 3.11b1).

The old field and type names aren’t used anywhere else, so the C preprocessor can be used to automatically translate them at compile time.

I wouldn’t want to permanently ship a hack like that, but it seems potentially reasonable as an interim trick until C API for the frame stack · Issue #309 · faster-cpython/ideas · GitHub is available.

@ncoghlan Can you show an example of how you would define and use such a macro?

It’s just as horrible as you were probably thinking (unilaterally aliasing the struct field name on the assumption it isn’t used anywhere else), and after sleeping on it, I decided I would be happier living with the 7 “misnamed” fields indefinitely than pushing to add something that messy to the C API headers.

Instead, I’m scaling back the proposal again to just:

  • rename the frame data struct to _Py_frame (but keep _PyInterpreterFrame as a compatibility alias to avoid breaking in-flight PRs and third party code trying to track the alpha releases)
  • adopt the convention that all future fields on that struct will omit the prefix
  • tidying up a bit of confusing code in typeobject.c that uses cframe to refer to the current frame, which used to be OK, but is dubious now that we have _PyCFrame as a concept

Leaving the struct field names alone and keeping the _PyInterpreterFrame alias around means that the disruption gets minimised, and the potential for future confusion gets contained, even if it won’t be as low as I had originally hoped to be able to get it.

Do you have a PR? It sounds like we should continue the discussion there. (FWIW I hadn’t thought about the macro at all, but I’’ take your word that it’s horrible. :slight_smile: To me, @markshannon is the decider on this, so it’s him you have to convince.

1 Like

Ah, right, I forgot to post the “just the struct name change” PR link: bpo-44800: unobtrusively rename _PyInterpreterFrame to _Py_frame by ncoghlan · Pull Request #32024 · python/cpython · GitHub

Ironically, @markshannon still doesn’t like the idea, when it was his critique of my first PR on the topic that convinced me that the current struct name is not as clear as it could be.

Oh, it’s hard to follow this discussion. It’s scattered in so many places:

Copy of my comment on Nick’s latest PR.

Honestly, I’m not convinced by the value of this large rename changes. I renamed _PyInterpreterFrame and _PyCFrame just before Python 3.11 alpha6 release which moves the PyFrameObject structure to the pycore_frame.h internal header file. The plan is now to updated 3rd party code projects to make them compatible with these changes (compatible with Python 3.11 alpha6): Issue 46836: [C API] Move PyFrameObject to the internal C API - Python tracker

If we change this internal C API once more time (Python 3.11 alpha7 or another release), it would require one more compatibility layer to the compatibility sandwich of projects like Cython, greenlet and gevent which currently require to access to these members until faster-cpython/ideas#309 is implemented.

IMO investing time on implementing faster-cpython/ideas#309 first would make more sense, since right now there are still too many 3rd party code which require to access PyFrameObject members.

Once 3rd party code will no longer consume this internal C API, we will have more freedom to change it without annoying users. As I wrote, core devs are directly impacted since these changes often break pyperformance: Issue 46836: [C API] Move PyFrameObject to the internal C API - Python tracker

and

I’m not saying that these changes should not be done. The current API can be enhanced as you do. The problem is really a practical problem impacting users right now.

and

PEP 558 is still a draft, no? Is it important to have an up to date implementation? There is also PEP 667. What’s the status of these PEPs?


What I did in Python 3.11 alpha6:

  • Move PyFrameObject to the internal C API to enforce usage of public C API, see bpo-46836 for the rationale
  • Fix the internal C API: add _Py prefix to names to avoid conflicts in 3rd party code including pycore_frame.h

My short term plan:

  • Update Cython, greenlet and gevent for Python 3.11: these projects became blocker issues for my Fedora team. We need to port these projects as soon as possible.

I don’t understand why Nick wants to hold my 3 PRs updating these projects. Python 3.11 alpha6 is already shipped. We can still change the internal C API later (see my previous comments), but right now, it’s really a blocker problem to not be able to use Cython. It’s a very common dependency.

I don’t understand why a draft PEP would prevent fixing 3rd party projects right now. I didn’t see any recent discussion about PEP 558 and PEP 667, I don’t expect them to be accepted soon, but maybe I missed something. None of these PEP was submitted to the SC: Issues · python/steering-council · GitHub


Compatibility code is a major pain in the ass. Please have a look at What’s New in Python 3.11: Python 3.11 already requires a lot of such compatibility spaghetti code. For example, PyThreadState_EnterTracing() requires 3 code paths:

  • Python <= 3.9
  • Python 3.10
  • Python >= 3.11

The pythoncapi_compat project makes this problem less painful for projects which can use it (some projects have license/copyright issues; I recently changed the project license to 0BSD). But this project is restricted to public C API. I propose to now focus on C API for the frame stack · Issue #309 · faster-cpython/ideas · GitHub : design public API to fit Cython, greenlet and gevent use cases. Once Python 3.11/3.12 will have public C API functions, pythoncapi_compat can provide a compatibility layer to older Python versions.

In short, I dislike the idea of adding anything (macros) to provide a backward compatibility for the internal C API. IMO it’s a bad solution, it will cause new issues that we didn’t have previously, and it will cause new issues in the future.


At least, it seems like the issues of using an internal C API in 3rd party code became better known, more people are aware of API issues and want to fix the problem. Now the question is more about what’s the best migration path to solve this problem. Also, practical issues (compatibility code, coordinate releases) are better known now.

By the way, about practical issues: the Cython change is arleady merged.

Since it became clear that any further non-functional changes to these struct definitions at this point would cause significant practical problems for at best arguable improvements in clarity, the final PR that will close bpo-44800 just adds documentation of the de facto naming conventions (and a bit of the history that drove them) to the internal C API header that defines the core frame structs: bpo-44800: Document internal frame naming conventions by ncoghlan · Pull Request #32281 · python/cpython · GitHub