Machine-readable Specification for Deprecated and Removed APIs of CPython

Overview

I propose creating a standardized, machine-readable specification for managing deprecated and removed CPython APIs. This specification aims to improve tooling support, such as autofixers, by providing structured metadata.

Motivation

Currently, the lifecycle of CPython APIs, especially deprecations and removals, is documented in a human-readable format. This approach makes automated tooling challenging to implement. A structured, machine-readable approach will streamline tooling and automation around API usage.
(The motivation came from the Language Summit talk by @itamaro.)

Proposal

Metadata Formats

Metadata will be maintained in two forms:

1. Source TSV File (deprecated_and_removed.tsv)

Stored at the root of the CPython repository, structured as follows:

  • package : Name of the module/package (e.g., "sys", "asyncio", "c-api")
  • api : Name of the API being deprecated or removed
  • api_type : Type of API (function, method, class, module, or c-api)
  • status : Status of API (deprecated or removed)
  • deprecated_at : Version when API was deprecated (nullable)
  • removed_at : Version when API was removed (nullable)
  • replacement : Recommended replacement API (optional)

Example TSV:

package	api	api_type	status	deprecated_at	removed_at	replacement
sys	getdefaultencoding	function	deprecated	3.12.0		
asyncio	Task.all_tasks	method	removed	3.8.0	3.10.0	asyncio.all_tasks
c-api	PyEval_CallObject	c-api	removed	3.9.0	3.12.0	PyObject_Call

2. Generated JSON File (deprecated_and_removed.json)

Automatically generated from the TSV file and made available via docs.python.org. Example JSON:

{
  "_version": "3.14.1",
  "_generated_at": "2025-05-15T00:00:00Z",
  "apis": [
    {
      "package": "sys",
      "api": "getdefaultencoding",
      "api_type": "function",
      "status": "deprecated",
      "deprecated_at": "3.12.0",
      "removed_at": null,
      "replacement": null
    },
    {
      "package": "asyncio",
      "api": "Task.all_tasks",
      "api_type": "method",
      "status": "removed",
      "deprecated_at": "3.8.0",
      "removed_at": "3.10.0",
      "replacement": "asyncio.all_tasks"
    },
    {
      "package": "c-api",
      "api": "PyEval_CallObject",
      "api_type": "c-api",
      "status": "removed",
      "deprecated_at": "3.9.0",
      "removed_at": "3.12.0",
      "replacement": "PyObject_Call"
    }
  ]
}

Generation and Distribution

  • A Python script (generate_api_metadata.py) will read the TSV file and produce the JSON file.
  • The generated JSON file will be hosted as a static resource on docs.python.org.
  • JSON schema validation ensures correctness.
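A minimal sketch of what generate_api_metadata.py could do, assuming the column layout shown in the example TSV. The function name tsv_to_metadata is hypothetical, and the _generated_at timestamp is left out to keep the output deterministic:

```python
import csv
import io
import json

# Columns that may be empty in the TSV; empty cells become null in the JSON.
NULLABLE = ("deprecated_at", "removed_at", "replacement")


def tsv_to_metadata(tsv_text, version):
    """Convert the TSV source text into a JSON-ready dictionary."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    apis = []
    for row in reader:
        for column in NULLABLE:
            # Short rows yield None for missing cells; treat both as null.
            row[column] = row.get(column) or None
        apis.append(row)
    return {"_version": version, "apis": apis}


SAMPLE = (
    "package\tapi\tapi_type\tstatus\tdeprecated_at\tremoved_at\treplacement\n"
    "sys\tgetdefaultencoding\tfunction\tdeprecated\t3.12.0\t\t\n"
    "asyncio\tTask.all_tasks\tmethod\tremoved\t3.8.0\t3.10.0\tasyncio.all_tasks\n"
)

metadata = tsv_to_metadata(SAMPLE, version="3.14.1")
print(json.dumps(metadata, indent=2))
```

A real script would read the file from the repository and validate the result against the JSON schema before writing it out.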

Integration with Sphinx Documentation

To include the JSON in Sphinx documentation:

Place the generated JSON file in the _static directory, and update the conf.py file:
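A minimal sketch of that conf.py change, assuming Sphinx's standard html_static_path option is what gets used here (the exact setting in CPython's configuration may differ):

```python
# conf.py (fragment)
# Assumption: deprecated_and_removed.json has been written into the
# documentation's _static directory before the build starts.
# Sphinx's HTML builder copies the contents of these directories into the
# _static directory of the output.
html_static_path = ["_static"]
```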

The file becomes available at a stable URL like:

https://docs.python.org/3.14/_static/deprecated_and_removed.json

Benefits

  • Enables automation tools to easily detect deprecated or removed APIs.
  • Provides clarity and transparency about CPython API lifecycles.
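As a rough illustration of the first point, a linter or autofixer could index the published entries and ask what state an API is in for a given target version. This is only a sketch: the helper names parse_version and status_for are hypothetical, and the data is a trimmed copy of the example JSON above.

```python
import json

# A trimmed copy of the example JSON above; a real tool would download the
# published file instead.
PUBLISHED = json.loads("""
{
  "apis": [
    {"package": "sys", "api": "getdefaultencoding", "api_type": "function",
     "status": "deprecated", "deprecated_at": "3.12.0", "removed_at": null,
     "replacement": null},
    {"package": "asyncio", "api": "Task.all_tasks", "api_type": "method",
     "status": "removed", "deprecated_at": "3.8.0", "removed_at": "3.10.0",
     "replacement": "asyncio.all_tasks"}
  ]
}
""")


def parse_version(text):
    """Turn "3.10" or "3.10.0" into (3, 10, 0) so versions compare numerically."""
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def status_for(entry, target):
    """Return "removed", "deprecated" or "ok" for a target Python version."""
    target = parse_version(target)
    if entry["removed_at"] and target >= parse_version(entry["removed_at"]):
        return "removed"
    if entry["deprecated_at"] and target >= parse_version(entry["deprecated_at"]):
        return "deprecated"
    return "ok"


index = {(e["package"], e["api"]): e for e in PUBLISHED["apis"]}
entry = index[("asyncio", "Task.all_tasks")]
print(status_for(entry, "3.9"), "->", entry["replacement"])
```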

Downside

  • The same information has to be written twice, once in the TSV file and once in the RST sources for docs.python.org, but I believe we can keep the two consistent through tooling or similar effort.

Next Steps

  • Feedback from the community regarding the approach and metadata structure.
  • Write a PEP if the idea looks good, and implement a prototype tooling script.
  • Integrate with the official Python documentation infrastructure.

Thank you for your feedback and ideas!

I would also like to mention people who may be interested in this idea. @vstinner and @hugovk

16 Likes

I suggest:

  • documenting the TSV format as internal & subject to change, so it's possible to extend/replace it in the future
  • documenting that we can add fields & enum values to the JSON schema in the future
  • putting the TSV in Misc/
3 Likes

The idea of making the API deprecation/removal schedule machine-readable is great, but I have concerns about the choice of TSV as the source file format:

  1. Tabs are harder to type in many IDEs.
  2. TSV files don't tend to be very human-readable because columns of different rows often appear misaligned when there are items with widths of different multiples of the tab width. And tab widths also vary with different viewer settings.
  3. Since none of the columns are expected to have spaces in the values, why not allow values to be separated by any number of spaces instead, so that the file can be easily edited with any IDE, be aligned properly and look consistent in any viewer? This format (known as SSV) is supported by csv.reader with the delimiter=' ', skipinitialspace=True options.
  4. JSON, when properly indented, is still fairly human-readable and editable. So why not maintain just a single JSON file instead?
  5. TOML may be a good machine-readable format to consider too, while being both arguably more human-editable and less prone to becoming human-unreadable than JSON.
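To make point 3 concrete, here is a small sketch of reading a space-aligned table with csv.reader. The "-" placeholder for empty cells is an assumption added for this example, since a run of spaces cannot encode a missing value on its own:

```python
import csv

# Hypothetical space-aligned variant of the example table. "-" stands in for
# an empty cell.
SSV = """\
package  api                 api_type  status      deprecated_at  removed_at  replacement
sys      getdefaultencoding  function  deprecated  3.12.0         -           -
asyncio  Task.all_tasks      method    removed     3.8.0          3.10.0      asyncio.all_tasks
"""

# Strip trailing spaces first: a trailing delimiter would add an empty field.
lines = [line.rstrip() for line in SSV.splitlines()]
header, *rows = csv.reader(lines, delimiter=" ", skipinitialspace=True)
records = [
    {key: (None if value == "-" else value) for key, value in zip(header, row)}
    for row in rows
]
print(records[1]["replacement"])
```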
4 Likes

I do recommend an internal "source" file (whose exact format doesn't really matter, as we can change it at any time) and a separate "published" file. If we ever add some info that can't fit in the "published" schema, we can introduce a new file, but keep generating the old one with limited information.

5 Likes

I think something like this would be useful.

A URL field would be useful, often the replacement isn't a single function or one-to-one and more guidance is needed on how to update.

There are also some deprecations which are more complex, like a parameter, or something used only in a certain way. Can we include those?

cc Pylint maintainer @Pierre-Sassoulas, who has expressed interest in some sort of deprecations API, and also Ruff maintainer @zanie.

See also this earlier proposal from @CAM-Gerlach, but the current proposal sounds like less work and easier to get done, and then iterated on.

5 Likes

The idea of making the API deprecation/removal schedule machine-readable is great, but I have concerns about the choice of TSV as the source file format:

I am open to using TOML as the source format. It might be easier to manage. I am not sure how comfortable core developers would be with it, but I will explore a TOML-based format that can be more flexible and extensible.

I do recommend an internal "source" file

I agree with having a separate internal source file and a different file for publishing. That separation makes a lot of sense.

A URL field would be useful, often the replacement isn't a single function or one-to-one and more guidance is needed on how to update.

If the URL is meant for a migration guide, I think migration_url is a good field name. I originally considered issue_url as well but removed it, thinking it might be too much for core developers. If needed, it can always be added back.

There are also some deprecations which are more complex, like a parameter, or something used only in a certain way. Can we include those?

There are also more complex cases, such as deprecations that involve specific parameters or context-dependent usage. We probably cannot cover every case from the beginning, but the format should allow for future expansion.

Input from linter maintainers would be really helpful in shaping the format.

3 Likes

Would something like this be useful when we consider enum members or parameters?

{
  "apis": [
    {
      "package": "sys",
      "name": "getdefaultencoding",
      "api_type": "function",
      "status": "deprecated",
      "migration_url": "https://docs.python.org/blah/blah", 
      "deprecated_at": "3.12",
      "removed_at": null,
      "replacement": null,
      "details": null
    },
    {
      "package": "<:package_name>",
      "name": "<:method or function>",
      "api_type": "parameter",
      "status": "deprecated",
      "migration_url": "https://docs.python.org/blah/blah",
      "deprecated_at": "3.14",
      "details": {
        "parameters":  ["<:name of param>"]
      }
    },
    {
      "package": "<:package_name>",
      "name": "<:enum name>",
      "api_type": "enum_member",
      "status": "deprecated",
      "migration_url": "https://docs.python.org/blah/blah",
      "deprecated_at": "3.13",
      "details": {
        "enum_members": ["<:enum member name>"]
      }
    }
  ]
}

And the TOML file would look something like this.

_version = "3.14.1"

[[api]]
package = "sys"
name = "getdefaultencoding"
api_type = "function"
status = "deprecated"
migration_url = "https://docs.python.org/blah/blah"
deprecated_at = "3.12"

[[api]]
package = "email.utils"
name = "localtime"
api_type = "parameter"
status = "deprecated"
migration_url = "https://docs.python.org/blah/blah"
deprecated_at = "3.12"
removed_at = "3.14"
details = { parameter_names = ["isdst"] }

[[api]]
package = "xxx"
name = "yyyy"
api_type = "enum_member"
migration_url = "https://docs.python.org/blah/blah"
status = "deprecated"
deprecated_at = "3.13"
details = { enum_members = ["FOO_BAR"] }
1 Like

Speaking as matplotlib core dev: Such infrastructure and tooling would also be helpful for packages in the ecosystem. It's good to start and try this out in CPython. But maybe keep the idea of broader application in the back of your head.

6 Likes

Thank you for pinging me Hugo.

There was a lot of nuance in the first thread that is not present here in the status (with only deprecated / removed). Soft deprecation was added to PEP 387 (PEP 387: Add Soft Deprecation section (#3182), python/peps@57b1d94 on GitHub). Some talked about "discouraged usage" (something "better", according to some, exists but the old thing is not going away, like for getopt), or "obsolete" (no longer recommended?).

Two deprecation warnings exist (DeprecationWarning and PendingDeprecationWarning), but I'm not sure that they are equivalent to soft/hard deprecation.

It's probably better to at least go for "soft deprecation" / "hard deprecation" / "removed" in the status, maybe more options.

Also, "replacement" should be "replacements" and allow for a list of values (see Formalize the concept of "soft deprecation" (don't schedule removal) in PEP 387 "Backwards Compatibility Policy" - #82 by brettcannon). I like the idea of having a URL to explain how to migrate when "replacements" is not easy to fill.

3 Likes

Would there be a way to represent if a parameter to a function is deprecated (vs the whole function being deprecated?)

Chiming in here after a great follow-up talk IRL with @corona10 about how we can combine the strengths of our respective proposals! My original detailed proposal and working prototype used and built upon the existing deprecated-removed directive in the docs as the canonical source of deprecation information, using it to output both machine-readable info in CSV and JSON (as proposed here) and human-readable deprecation information inline and in summary list form (What's New, Deprecations/Removals page, etc.), with support for deprecation specificity down to individual parameter names, types and values.

After talking through it with him, I've come around to Donghee's idea of having the single source of deprecation information be the TOML file. This requires more migration work up front and doesn't extend as directly to extracting other machine-readable info from the docs, but it is easier to maintain in the long term, avoids too-tight coupling of the source information with the physical structure and syntax of the docs, avoids the complexities involved in handling removed documentation, and makes for a simpler MVP.

Under the hood, this can leverage a greatly simplified version of what I've already spec'ed out and partially prototyped above, to read in the deprecation/removal information (just from the TOML rather than the more complex task of extracting it from the docs themselves) and output it in human-readable form in the docs and in machine-readable form as JSON and CSV, as part of our existing docs build rather than requiring new bespoke tooling and a CI job to create and maintain.

The originally proposed complex deprecation directives in the docs would be replaced by just .. deprecation:: <id> (modulo bikeshedding the name) where appropriate, where <id> references the deprecation ID in the TOML (or could be omitted for deprecations of an entire function/method/class/module, etc.), pulling the deprecation information and message from there. Meanwhile, the .. deprecations and .. removals directives would work as previously proposed and prototyped, outputting an (optionally filtered, e.g. by version deprecated/removed) list/table of deprecations and removals for use in What's New in Python, the dedicated Deprecations/Removals page some have requested, and elsewhere, but using a .

As a Sphinx extension, it would be straightforward to package into a form other projects (like @timhoffm's Matplotlib) could pick up and start using easily, and we can keep the core parsing and I/O logic in a separate module from the Sphinx-specific bits so down the road it could be adapted to other documentation systems and contexts if desired.

I'm working on a new, greatly cut-down and simplified draft of the input and output schema, as well as a new streamlined working prototype eliminating the complex directives and data extraction and just reading in from a TOML instead. I'll post both here as soon as they are ready to gather more feedback and iterate further.

1 Like

Can the information be extracted directly from source code or documentation? We have annotations for deprecations in both Python modules and C header files, although some extra work would be needed for removed APIs.

The proposed new TSV file would be a second or third place to document deprecations, which increases the risk that the various sources get out of sync.

For starters, here's a draft of the input schema, simplified from the previous worked-out design and retaining the various bits that address the requests here and other likely needs.

(YAML is more convenient to express the schema, though the actual input file is TOML):

Note: str* values can also be a list[str].

_schema_version: str ("0.1")

<domain, e.g. py, c, envvar, none>:
  module: str or list[str], required for py domain ("email.utils")
  name: str or list[str], optional if domain none or module specified ("SomeEnum.member")
  type: str enum, required if name specified ("function", "classmethod", "enum.Enum")
  arg: optional
    name: str or list[str], required if arg defined ("isdst")
    kind: str, optional ("positional") # Keyword, optional, default
    type: str or list[str], required if value defined ("bool", "collections.abc.Sequence")
    value: str or list[str], optional ("True")
  # Short free-text description of the specific behavior deprecated; false if the entire specified item is deprecated
  behavior: str, list[enumerated string] or false, required ("when using named placeholders", ["set", "del"])
  id: str, required if behavior ("named-placeholders")
  version:
    deprecated: str, required ("3.10")
    warned: str or false, optional, default same as version_deprecated ("3.12")
    # Version or "notplanned" (removal delayed indefinitely), "notscheduled" (removal planned, but no version scheduled yet), "tbd" (decision yet to be made)
    removed: str, required ("3.14")
  # Deprecation/removal category: correctness, security, safety, usability, obsolete, efficiency, alias, defunct, superseded, other
  reason: str enum, required ("security")
  # Short description of deprecation reason
  rationale: str, required if reason is other ("it is vulnerable to SQL injection attacks")
  description: str, optional ("...")  # Extended description in reStructuredText
  replace: optional
    name: str or array of str, optional ("dict", ["subprocess.run", "subprocess.Popen"])  # Name of the object/argument/type/value/module (per original `type`) to replace this with, if available
    description: str, optional ("")  # Prose description of replacement
    version: str, required if not thirdparty # Version replacement is available
    dropin: bool, optional default false  # True if replacement is fully dropin
    thirdparty: bool, optional default false  # True if replacement is third-party
  raises: str or false, required ("FutureWarning") # Warning/error currently raised
  moreinfo: str, optional  # Doc xref or URL for more information
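To give a feel for how a few of these rules could be enforced when the TOML is loaded, here is a minimal sketch. It covers only a subset of the draft (the version, reason, rationale and id rules), and check_entry and its messages are hypothetical names for illustration:

```python
REASONS = {
    "correctness", "security", "safety", "usability", "obsolete",
    "efficiency", "alias", "defunct", "superseded", "other",
}
# Non-version values the draft allows for version.removed.
REMOVAL_PLACEHOLDERS = {"notplanned", "notscheduled", "tbd"}


def looks_like_version(text):
    """Accept dotted numeric strings such as "3.14"."""
    return all(part.isdigit() for part in text.split("."))


def check_entry(entry):
    """Return a list of problems found in one deprecation entry."""
    problems = []
    version = entry.get("version", {})
    if "deprecated" not in version:
        problems.append("version.deprecated is required")
    removed = version.get("removed")
    if removed is None:
        problems.append("version.removed is required")
    elif removed not in REMOVAL_PLACEHOLDERS and not looks_like_version(removed):
        problems.append(f"version.removed has unexpected value {removed!r}")
    reason = entry.get("reason")
    if reason not in REASONS:
        problems.append(f"reason must be a listed category, got {reason!r}")
    if reason == "other" and not entry.get("rationale"):
        problems.append("rationale is required when reason is 'other'")
    if entry.get("behavior") and not entry.get("id"):
        problems.append("id is required when behavior is set")
    return problems


good = {
    "module": "sqlite3",
    "behavior": "when using named placeholders",
    "id": "named-placeholders",
    "version": {"deprecated": "3.12", "removed": "3.14"},
    "reason": "usability",
}
bad = {"behavior": "x", "version": {"deprecated": "3.12"}, "reason": "other"}
print(check_entry(good), check_entry(bad))
```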

Here are a couple of examples:

Sequence instead of dict for parameters in sqlite3.Cursor.execute* when using named placeholders

_schema_version = "0.1"

[[py]]
module = "sqlite3"
name = ["Cursor.execute", "Cursor.executemany"]
type = "method"
arg = {name="parameters", type="~collections.abc.Sequence"}
behavior = "when using named placeholders"
id = "named-placeholders"
version = {deprecated="3.12", removed="3.14"}
reason = "usability"
replace = {name="dict"}

Which you’d reference in the docs as

.. deprecation:: named-placeholders

and renders as:

Deprecated in version 3.12 (usability), to be removed in version 3.14: Passing an argument of type Sequence to the parameters parameter of the Cursor.execute() and Cursor.executemany() methods when using named placeholders. Pass an argument of type dict instead.

or for example

logging.warn alias

[[py]]
module = "logging"
name = "Logger.warn"
type = "method"
behavior = false
version = {deprecated="3.4", removed="notscheduled"}
reason = "alias"
replace = {name="~logging.Logger.warning", version="2.3", dropin=true}
raises = "DeprecationWarning"

Which renders as

Deprecated in version 3.4 (alias), removal not yet scheduled: The warn() method. Use the warning() method instead (added in version 2.3), which is a drop-in replacement.
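The leading part of both rendered notices follows mechanically from the version and reason keys. A small sketch of that step, where render_prefix is a hypothetical helper and the wording for "notplanned" and "tbd" is assumed rather than taken from the draft:

```python
REMOVAL_TEXT = {
    "notscheduled": "removal not yet scheduled",
    # The wording for the next two values is assumed, not taken from the draft.
    "notplanned": "removal not planned",
    "tbd": "removal to be decided",
}


def render_prefix(entry):
    """Build the "Deprecated in version ..." lead of a rendered notice."""
    version = entry["version"]
    removed = version["removed"]
    removal = REMOVAL_TEXT.get(removed, f"to be removed in version {removed}")
    return f"Deprecated in version {version['deprecated']} ({entry['reason']}), {removal}"


first = render_prefix(
    {"version": {"deprecated": "3.12", "removed": "3.14"}, "reason": "usability"}
)
second = render_prefix(
    {"version": {"deprecated": "3.4", "removed": "notscheduled"}, "reason": "alias"}
)
print(first)
print(second)
```

The sentence after the colon depends on the entry's type, arg, behavior and replace keys and would need more templates than fit in a short sketch.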

2 Likes

I shared that concern and that was my original detailed proposal from a while back that I described above and in the post Hugo linked, to keep the information in the deprecated-removed directives, have them register their data for output to a JSON and CSV on docs build, and extend the existing custom directive as needed to include additional desired metadata.

However, as described in recent posts it no longer applies to the current state of the proposal, which is to have a TOML file (not TSV) be the single source for deprecation information, and that information output in human-readable form in the docs and in machine-readable form as a JSON and CSV.

Keeping and adding to the information in the deprecation directives instead is still an option we could implement and I don't oppose it, but it seemed the general consensus was against it for the reasons I described above, e.g. needing to handle removed docs and other edge cases adding complexity, making updating deprecation info/adding new metadata more tedious, and coupling it too tightly with the RST/Sphinx format.

2 Likes

One concern with generating docs from a few templated fields: will we lose the ability to add any free-form text in the deprecations?

For example, the asyncio.get_event_loop() removal notes include three common use patterns with before and after code snippets. This is good.

We do need to improve our "how to upgrade" instructions, especially for non-trivial replacements lacking a drop-in replacement.

4 Likes

This is my concern with tying this too closely to documentation. The format also must clearly define what is in and out of scope (FutureWarning? Behavioural changes? Tweaks to the meaning of a parameter?), as the easy cases of wholesale removals are straightforward, but there is a long tail of possible edge-cases.

A

3 Likes

Given the new file, would we be able to add corresponding test cases that ensure the documented hard deprecations are actually reported correctly at runtime (and/or compile time for C APIs)?

A static scan that picks up when the source-of-truth file is out of sync with the source code would also help ensure it actually is the source of truth for this information.
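A sketch of what such a runtime check could look like, assuming each entry names the warning category it expects. Here old_api is a hypothetical stand-in for a listed API, and emits is a hypothetical helper, not an existing test utility:

```python
import builtins
import warnings


def old_api():
    """Hypothetical stand-in for an API listed as hard-deprecated."""
    warnings.warn("old_api() is deprecated", DeprecationWarning, stacklevel=2)
    return 42


def emits(func, category_name):
    """Return True if calling func() triggers a warning of the named category."""
    category = getattr(builtins, category_name)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # do not let filters hide the warning
        func()
    return any(issubclass(w.category, category) for w in caught)


print(emits(old_api, "DeprecationWarning"))
```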

2 Likes

Not at all, I just realized while falling asleep last night that I forgot to add a separate description field in the TOML schema, since that was just the body of the original RST directive that exists now. Added now; with ''' strings in TOML it can be arbitrary free-form RST just like docstrings (which can be rendered in the docs and stripped in the machine-readable output). Additionally, the moreinfo link/xref can point to more detailed migration guidance.

The combined proposal incorporating both the ideas here and my previous work handles all those cases explicitly, as well as any deprecation and removal/breaking change, including in the C API, env vars/entry points, and those not tied to any specific module/API at all (e.g. interpreter behavior).

Yup, that was exactly the plan, at least in my original design, and it is just as possible here, at least for the great majority of deprecations. The raises field specifies the warning or error that should be raised by invoking the deprecated API, passing the parameter/type/value, or triggering the deprecated behavior. This is easily checkable at least in the first case, and would be possible in the second case and likely most behavior cases too if we add an example or testcase field. And even without that, we could also check (as the deprecation decorator at runtime currently does) that APIs and parameters that are supposed to be removed in fact don't exist if the version is greater than or equal to beta 1 of the removal version.
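The existence check could be sketched roughly like this. The helper names are hypothetical, and for simplicity it compares only major.minor versions rather than the beta 1 cutoff mentioned above:

```python
import importlib
import sys


def api_exists(module_name, dotted_name):
    """Return True if module_name.dotted_name resolves in this interpreter."""
    try:
        obj = importlib.import_module(module_name)
    except ImportError:
        return False
    for part in dotted_name.split("."):
        if not hasattr(obj, part):
            return False
        obj = getattr(obj, part)
    return True


def removal_is_consistent(module_name, dotted_name, removed_at):
    """Check that an API is absent once the running version reaches removed_at."""
    removed = tuple(int(part) for part in removed_at.split(".")[:2])
    if sys.version_info[:2] < removed:
        return True  # not yet due for removal, nothing to check
    return not api_exists(module_name, dotted_name)


print(removal_is_consistent("asyncio", "Task.all_tasks", "3.10"))
```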

@CAM-Gerlach
The TOML spec looks fine to me,
except for the case where multiple parameters are deprecated and removed.
It would be great to have an example where several parameters of the same method are deprecated and removed on different lifecycles.

Anyway, let's write a PEP during the sprint.

1 Like