Hello everyone,
I’ve been working on a third-party project to systematically document CPython’s reference counting semantics—including internal APIs—through automated analysis. So far, I’ve collected 1,534 entries covering a variety of functions. The analysis is largely automated, with an estimated accuracy of around 90%, though full manual verification is still ongoing.
Your feedback matters: whether it’s a comment or a suggestion, any form of engagement from the community would mean a lot to me and help improve this work.
Current Design & Format
The data is structured in JSON to facilitate processing, integration, and further tooling. Each entry includes the function name and a list of its reference semantics (for example “return new reference”, “stealing reference”, or “return borrowed reference”), with a parameter index where a semantic applies to a specific argument.
Example structure:
[
  {
    "function": "mocked_funcA",
    "semantics": [
      { "semantic": "return new reference" },
      { "semantic": "stealing reference", "stealing param": 0 },
      { "semantic": "stealing reference", "stealing param": 1 },
      { "semantic": "stealing reference", "stealing param": 2 }
    ]
  },
  {
    "function": "mocked_funcB",
    "semantics": [
      { "semantic": "return borrowed reference" }
    ]
  },
  {
    "function": "mocked_funcC",
    "semantics": [
      { "semantic": "return immortal reference" }
    ]
  }
]
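To give a feel for how a tool might consume this structure, here is a minimal Python sketch that indexes entries by function name and collects the stolen parameter indices. The helper names `index_by_function` and `stolen_params` are hypothetical and not part of the project, and the fallback from "function" to "name" is only an assumption based on the two snippets in this post using different keys.

```python
import json

# Inline copy of the mocked example; a real script would read the published file instead.
EXAMPLE = """
[
  {"function": "mocked_funcA",
   "semantics": [
     {"semantic": "return new reference"},
     {"semantic": "stealing reference", "stealing param": 0},
     {"semantic": "stealing reference", "stealing param": 1},
     {"semantic": "stealing reference", "stealing param": 2}
   ]},
  {"function": "mocked_funcB",
   "semantics": [{"semantic": "return borrowed reference"}]},
  {"function": "mocked_funcC",
   "semantics": [{"semantic": "return immortal reference"}]}
]
"""


def index_by_function(entries):
    """Map each function name to its list of semantics records (hypothetical helper)."""
    index = {}
    for entry in entries:
        # Assumption: accept either key, since the mocked example uses "function"
        # while the dataset sample uses "name".
        name = entry.get("function") or entry.get("name")
        index[name] = entry["semantics"]
    return index


def stolen_params(semantics):
    """Return the sorted parameter indices marked as stolen (hypothetical helper)."""
    return sorted(
        record["stealing param"]
        for record in semantics
        if record["semantic"] == "stealing reference"
    )


entries = json.loads(EXAMPLE)
index = index_by_function(entries)
print(stolen_params(index["mocked_funcA"]))  # [0, 1, 2]
print(stolen_params(index["mocked_funcB"]))  # []
```

Keeping each semantic as its own record makes this kind of filtering a one-liner, at the cost of repeating the "semantic" label for every stolen parameter.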
Sample from the current dataset:
{
  "name": "_PyDict_GetItemRef_KnownHash_LockHeld",
  "semantics": [
    {
      "semantic": "return a new reference via an output pointer parameter",
      "new ptr param": 3
    }
  ]
},
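As a rough illustration of what a checker could do with such an entry, the Python sketch below turns the semantics records into plain-language notes about ownership. The function `caller_obligations` is hypothetical, the wording of each note is my own reading of the labels rather than a definition from the dataset, and the parameter indices are assumed to be zero-based, which is what the sample seems to suggest.

```python
def caller_obligations(entry):
    """Describe ownership after a call, one note per semantics record (hypothetical helper)."""
    notes = []
    for record in entry["semantics"]:
        kind = record["semantic"]
        if kind == "return new reference":
            notes.append("caller owns the return value and must release it")
        elif kind == "return a new reference via an output pointer parameter":
            # Assumption: "new ptr param" is a zero-based parameter index.
            notes.append(
                f"caller owns any object stored through parameter {record['new ptr param']}"
            )
        elif kind == "stealing reference":
            notes.append(
                f"callee takes ownership of parameter {record['stealing param']}"
            )
        elif kind in ("return borrowed reference", "return immortal reference"):
            # Assumption: neither label requires the caller to release the return value.
            notes.append("caller does not own the return value")
        else:
            # Labels not shown in this post are reported rather than guessed at.
            notes.append(f"unrecognized semantic: {kind}")
    return notes


sample = {
    "name": "_PyDict_GetItemRef_KnownHash_LockHeld",
    "semantics": [
        {
            "semantic": "return a new reference via an output pointer parameter",
            "new ptr param": 3,
        }
    ],
}

for note in caller_obligations(sample):
    print(note)
```

Reporting unknown labels instead of raising keeps the sketch usable if the dataset contains semantics beyond the handful shown here.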
Purpose & Hope for Collaboration
This dataset aims to serve as a machine-readable reference for developers working with CPython’s C API, aiding in debugging, static analysis, and tooling development.
I would love for the community to:
- Review and discuss the approach and structure.
- Help validate entries, especially for edge cases or internal APIs.
- Consider whether something like this could be useful as a supplemental resource or possibly integrated into CPython’s documentation ecosystem in the future.
The full JSON file is available here:
CPython_PyAPI_FUNC_RF_Semantics
Looking forward to your thoughts, feedback, and hopefully a lively discussion!