Machine-readable Specification for Deprecated and Removed APIs of CPython

Yes, if you mean deprecating multiple parameters in one deprecation: that was another piece, like the others suggested here, that was in my original proposal but that I elided from the initial draft for simplicity. I've added it back; essentially, module, name and arg now accept list[str], so a single entry can apply to multiple objects. Two distinct parameters of the same function deprecated/removed in separate deprecations on different timelines could simply be separate deprecation entries.

Totally! As part of that I also hope to rewrite the working prototype to read from the TOML (at least a core subset of the schema), construct the deprecation message in the docs and output the JSON/CSV.

Does this even need a PEP?

It will impact all core devs' working process and is a kind of governance matter, so I thought "yes".

We wouldn’t do a PEP for a new documentation construct (e.g. the recent versionadded:: next), and this is a new/different way of recording deprecations, rather than a change to how decisions regarding deprecations are made. I’d tend to suggest that this doesn’t require a PEP, given this argument. An Informational PEP could still be written, though.

As precedent, I don’t think there is any PEP regarding intersphinx mapping data, which seems pretty similar in spirit.

It would be nice to have an informational PEP or doc page at some point which covers both of these resources, and any other machine consumable documentation data.

Yeah, I’d prefer effort go into a proof-of-concept PR that we can see in practice, then refine, merge, and iterate.

Also not against an informational PEP later, but would prioritise living docs in the form of instructions in the devguide.

For C API: yes, but parsing C reliably is… tricky.[1]
IMO, the way is to have a tool that checks the list, but allows exceptions.


  1. To reliably detect deprecation of a Windows-only function, you would need to either preprocess with #define MS_WINDOWS and the MS libc headers (and support any compiler-specific magic those headers use), or use a specialized parser. The same goes for a pthreads-only function, etc. ↩︎
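As a rough illustration of the check-the-list-but-allow-exceptions idea, a naive text scan of the headers could compare Py_DEPRECATED markers against the declared list while skipping known platform-conditional symbols. Py_DEPRECATED and PyAPI_FUNC are the real CPython macros, but the header contents and symbol names below are made up:

```python
import re

# Fabricated header snippet for illustration only.
HEADER = """
Py_DEPRECATED(3.14) PyAPI_FUNC(int) PySpam_Ham(void);
#ifdef MS_WINDOWS
Py_DEPRECATED(3.14) PyAPI_FUNC(int) PySpam_WinOnly(void);
#endif
"""

# Symbols the machine-readable list declares as deprecated.
DECLARED = {"PySpam_Ham"}
# Platform-conditional symbols the naive scan cannot verify reliably.
EXCEPTIONS = {"PySpam_WinOnly"}

# Text-level scan: no C parsing, so preprocessor conditionals are ignored.
found = set(re.findall(
    r"Py_DEPRECATED\(\S+\)\s+PyAPI_FUNC\([^)]*\)\s+(\w+)", HEADER))
missing = found - DECLARED - EXCEPTIONS
print(sorted(missing))  # empty: the list plus exceptions cover everything
```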

If you don’t use versionadded:: next, it’ll be a bit more cumbersome for you to backport the patch. No big deal.
This proposal is for a reliable list, so it needs a required step in the deprecation process. It needs a PEP 387 update at least.

Not all deprecations can be described in this format. For example, there are many types of deprecations related to a parameter, not a function or method:

  • a parameter can be completely deprecated
  • passing an argument by keyword can be deprecated while it can still be passed as positional
  • passing an argument as positional can be deprecated while it can still be passed by keyword
  • an argument will be required in the future
  • the default value will change in the future (this is a case for FutureWarning)
  • passing specific types or values can be deprecated
  • passing specific combinations of arguments is deprecated
  • only specific combinations of arguments will be allowed in the future
  • the deprecation of a method or its arguments depends on the state of the object

For attributes, setting or deleting can be deprecated, while reading is okay.

Operators can be deprecated for specific combinations of arguments. Using some general function (like len(), round(), typing.get_origin(), etc) can be deprecated for specific objects.

If iterability of the object is deprecated, should it be specified as a deprecation of the __iter__() method of this object, or as a deprecation of arguments of a specific type in iter()?

FutureWarning is also a member of the party. DeprecationWarning is used if the end state is an error; FutureWarning is used if it is changed behavior. SyntaxWarning can be used if part of the syntax is deprecated.
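For readers less familiar with that distinction, a small sketch (the function and parameter names are invented):

```python
import warnings

def ham(eggs, *, strict=None):
    if strict is None:
        # The default will silently change: behavior change, so FutureWarning.
        warnings.warn("the default of 'strict' will change to True",
                      FutureWarning, stacklevel=2)
        strict = False
    return eggs

def spam(value, flag=None):
    if flag is not None:
        # The parameter is scheduled for removal: end state is an error,
        # so DeprecationWarning.
        warnings.warn("the 'flag' parameter is deprecated",
                      DeprecationWarning, stacklevel=2)
    return value

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ham(1)
    spam(2, flag=False)

print([w.category.__name__ for w in caught])
# ['FutureWarning', 'DeprecationWarning']
```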

Per the feedback here, @corona10 and I have settled on prioritizing a PR with a working prototype/MVP implementation, while keeping the format documentation in a HackMD for the moment, until it is decided where to put it (I'm partial to your suggestion of living docs in the devguide myself).

Thanks for pointing that out. To note, that PEP doesn’t actually mention adding a .. deprecated[-removed]:: directive with the appropriate arguments and message in the docs either, which is the step that this would be replacing. Once this is established, we could add a step there that links to the devguide doc on this :+1:

Thanks for bringing up these additional cases @storchaka ! Some of these, generally the most common ones, are already covered by the updated proposal above, combining Donghee’s ideas with my prior draft proposal on the topic. These include:

Of the remaining, we can add the ability to precisely specify all but one of them by adding a single subkey to arg with a handful of enumerated string values.

The argument/parameter kind changes:

…can all be accommodated by adding a single kind (or similar, modulo name) key to the arg table with the string enum values keyword, position, optional, and default, respectively, representing the deprecated kind.

For example, suppose spam.ham() deprecated passing eggs by position:

[[py]]
module = "spam"
name = ["ham"]
type = "function"
arg = {name="eggs", kind="position"}
version = {deprecated="3.14", removed="3.16"}
reason = "usability"

Likewise, these can be handled by adding a corresponding string value to arg.kind for each of the cases; I also considered the ability to specify arbitrary logical relationships with an AND/OR/NOT syntax, but that seems overly complex for diminishing returns and can be handled by the behavior flag.

For this, it's a bit overloaded, but we could, e.g., allow passing a list (array) to behavior with the enumerated string values get, set and del for type = "attribute". Or just fall back to behavior generally.

The one remaining item:

…is a characteristic example of where we’d use the behavior key that allows describing them as precisely as possible and describing the remaining specifics in free-text. Indeed, the latter closely match the cases where static tooling consuming this information would not be able to precisely determine much further than we’re already specifying.

That’s a good question; I’d say the former method since the deprecation is a concern of the object rather than the iter() function, and tooling will be looking for uses of the object rather than the iter() function specifically.

Thanks for the context! The warning can be specified via our schema above, and we could perhaps even use it to validate that the appropriate warning category is used, per the specified nature of the deprecation.
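Such a validation rule might be as simple as the following sketch; the inference rule and the role of the behavior key here are assumptions, not settled schema:

```python
def expected_category(entry):
    """Infer which warning category an entry implies (draft heuristic)."""
    version = entry.get("version", {})
    if "removed" in version:
        # End state is an error: the API goes away.
        return "DeprecationWarning"
    if entry.get("behavior"):
        # End state is changed behavior rather than removal.
        return "FutureWarning"
    return "DeprecationWarning"

entry = {"module": "spam", "name": ["ham"],
         "version": {"deprecated": "3.14", "removed": "3.16"}}
print(expected_category(entry))  # DeprecationWarning
```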

I’m excited about this proposal, and all the engagement on this discussion thread! Thanks for starting this @corona10, and thanks for connecting this with your previous proposal @CAM-Gerlach. I wish I had the opportunity to stay for more sprint days to collaborate with you on this!

Some thoughts:

  1. I suggest stress testing the proposal by applying it to a diverse collection of actual deprecations, removals, and other breaking changes from the last few releases. Naturally, the existing docs contain many of them, but we can also use the examples from my language summit talk as a fairly diverse and representative collection.
  2. I think a pretty strong motivation for making this information available in structured, machine-readable form is to make it possible for tooling to consume it. The discussion brings up the Sphinx use-case, which is great, but I would encourage that we also consider the linter / auto-fixer use-case as a first-class citizen. At least for those breaking changes that we think should be possible to lint for and maybe also auto-fix, can we stress test the proposal by actually demonstrating how a tool would consume the information? It could be an existing tool (like ruff, or fixit, or pyupgrade), or a prototype tool.
    • I don’t remember who brought it up at the language summit, but it was mentioned that something similar was done for numpy 1->2 migrations, so we could borrow ideas and compare notes.
  3. Can we include “ranking” of the breaking change as structured metadata? (e.g. whether the suggested auto-fix is a safe drop-in replacement, does it require new external dependencies, …)
  4. Potentially expanding on the “description” field, what do you think about a “guidance” / “prompt” field (naming bikeshed notwithstanding)? What I mean by this is “free text prose explanation of what is the change and how to go about addressing it in existing code, with before/after examples” - maybe an explanation one might write in a task given to a newcomer (or LLM) who is tasked with preparing a codebase for a python upgrade. I am happy to contribute prompts I used when leveraging LLMs for some of the migrations I ran internally.
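To make the linter / auto-fixer use-case from point 2 concrete, here is a deliberately naive sketch of a tool consuming such entries to flag deprecated calls. The entry follows the draft schema earlier in this thread; real linters resolve names far more carefully than this name-based match:

```python
import ast

# Entries as they might come out of the machine-readable list (draft schema).
ENTRIES = [{"module": "spam", "name": ["ham"], "type": "function",
            "version": {"deprecated": "3.14", "removed": "3.16"}}]

deprecated = {f"{e['module']}.{n}" for e in ENTRIES for n in e["name"]}

# Made-up user code to lint.
CODE = "import spam\nspam.ham(1)\nspam.eggs(2)\n"

findings = []
for node in ast.walk(ast.parse(CODE)):
    # Match simple `module.attr(...)` calls only; no import/alias tracking.
    if (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)):
        dotted = f"{node.func.value.id}.{node.func.attr}"
        if dotted in deprecated:
            findings.append((node.lineno, dotted))

print(findings)  # [(2, 'spam.ham')]
```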

I am also interested in continuing the conversation about potential process improvements around making breaking changes, using the “breaking change taxonomy” (and ranking) to inform the process, and estimating the potential blast radius of a breaking change before actually making it. I think these conversations are off-topic enough for this particular thread though, so I’ll start separate thread(s) when I get a chance. I expect these might lead to a PEP, or updates to PEP 387.

This gives me some pause, as above I think the prose documentation should still be distinct. As Serhiy’s short list of examples shows, there are many edge cases, and the flexibility to describe a change in written prose is very useful.

My fear, perhaps unwarranted, is that tools such as PyCharm will start using a machine-readable document such as this but lose the nuance, e.g. applying the strikethrough IDE formatting to a function where all that has changed is that (w.l.o.g.) a parameter is now scheduled to become keyword-only.

Looks good to me.

Thinking by analogy with linter rules and potential auto-fixes, shouldn't we have unique IDs per kind, something like type="arg-kwonly"? IMHO flat is better than nested here. Or what's the benefit of splitting the info?

As a Ruff maintainer, this sounds like an interesting proposal. We have the numpy2-deprecation lint rule and also a collection of apache-airflow deprecation rules that could serve as additional test cases for the new format.

One question we had: could this information be incorporated into a PEP-702-style decorator? That would likely be easier for our type checker and others, which could emit deprecation warnings when encountering the @deprecated decorator. But Ruff and other single-file static analysis tools might still benefit from the new, separate metadata.
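A sketch of how the metadata could feed a PEP-702-style decorator. warnings.deprecated is the real 3.13+ API; the entry-to-message mapping is an assumption, and the fallback shim is just for illustration on older versions:

```python
import functools
import warnings

try:
    from warnings import deprecated  # Python 3.13+ (PEP 702)
except ImportError:
    # Minimal runtime-only stand-in; lacks the static-typing integration.
    def deprecated(msg, *, category=DeprecationWarning):
        def wrap(fn):
            @functools.wraps(fn)
            def inner(*args, **kwargs):
                warnings.warn(msg, category, stacklevel=2)
                return fn(*args, **kwargs)
            return inner
        return wrap

# A draft-schema entry, as elsewhere in this thread.
ENTRY = {"module": "spam", "name": ["ham"],
         "version": {"deprecated": "3.14", "removed": "3.16"}}

msg = (f"{ENTRY['module']}.{ENTRY['name'][0]}() is deprecated since "
       f"{ENTRY['version']['deprecated']} and will be removed in "
       f"{ENTRY['version']['removed']}")

@deprecated(msg)
def ham(eggs):
    return eggs

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ham(1)

print(caught[0].category.__name__)  # DeprecationWarning
```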

Let me know if there’s any other information that would be helpful!
