PEP 692: Using TypedDict for more precise **kwargs typing

Zac-HD · July 24, 2022, 5:34pm

First of all, thank you for writing a PEP! It’s a lot of work and an important service to the community, either improving the language or helping us move on if the idea is rejected. In that spirit, here are my issues with the current proposal, in the hope that they contribute to a better PEP or SC decision.

- Why not use keyword-only arguments? They already provide a mechanism for named arguments of specific types, and show those locally without having to look at the definition of the TypedDict class in question.
- Very little motivation has been given. Yes, refactoring code that uses **kwargs for clarity and static typing requires some work, but I’m not convinced that this PEP actually improves the situation.
  - This PEP will also require a large amount of work by every (mostly volunteer) maintainer of a tool that deals with type annotations. This is already underway in static type checkers, but will also be required in enforcement tools such as beartype and typeguard, serialization libraries like Pydantic, testing tools including Hypothesis, etc., and is probably unsupportable in AST-based static analysis tools.
  - The linked thread notes that “in situations with many heterogeneous options listing all options in the signature could be verbose and will require rewriting some existing code”. If there are too many options to put in the function signature, I’d prefer options: OptionsDict (or some other config object) over **options: **OptionsDict.
- A few years ago I went through matplotlib converting **kwargs into explicit parameters, and found a pile of explicit bugs in the process where parameters would be silently dropped, overridden, or passed but go unused. Even with this PEP, using **kwargs makes it much harder to detect such problems, or for users to know what can be passed without unrealistically comprehensive documentation - and even then it’s harder to see when it goes out of date.
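A minimal sketch of the "silently dropped" failure mode described above. The functions plot_loose and plot_strict are hypothetical and are not taken from matplotlib; they only illustrate how a misspelled keyword is handled by each style.

```python
def plot_loose(x, **kwargs):
    # Hypothetical function: only 'color' is read, any other key is ignored.
    color = kwargs.get("color", "black")
    return f"plot {x} in {color}"


def plot_strict(x, *, color="black"):
    # Hypothetical function: keyword-only parameter, unknown names are rejected.
    return f"plot {x} in {color}"


# The misspelled 'colour' is accepted and silently dropped here...
loose_result = plot_loose(1, colour="red")

# ...while the explicit signature raises TypeError at call time.
try:
    plot_strict(1, colour="red")
    strict_error = None
except TypeError as exc:
    strict_error = exc
```

With the explicit signature the mistake surfaces immediately; with **kwargs the call succeeds and the default colour is used.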

Overall, I’d love to see more concrete motivating examples, but expect to oppose the PEP regardless on the basis that people should almost always use explicit keyword-only parameters or pass a config dict-or-object instead; and the remaining cases are insufficient to justify complicating the language for both learners and maintainers.

tmk · July 25, 2022, 4:24pm

I think it would at the very least be useful for type stubs for libraries like matplotlib. You don’t want to keep repeating all the valid kwargs for plot(), scatter(), barplot(), etc. (Because that’s the alternative if you want to make useful type stubs: listing all the kwargs explicitly for each function.)

franekmagiera · July 25, 2022, 9:42pm

That’s a fair point, whenever possible using explicit keyword-only arguments should be the default (as described in the Intended Usage section). However, there are some cases where using **kwargs has some advantages, for example when a function could accept optional keyword arguments that don’t have obvious defaults. Also, in cases of modules that contain a bunch of functions expecting the same keyword arguments, using **kwargs is much more concise, and with this PEP could be type hinted appropriately.
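As a rough illustration of the "optional keyword arguments without obvious defaults" case, here is a sketch that assumes the Unpack spelling mentioned later in this thread. RequestOptions and send are hypothetical names. The annotation is quoted so the snippet also runs on interpreters where typing.Unpack is not available.

```python
from typing import TYPE_CHECKING, TypedDict

if TYPE_CHECKING:
    # Assumption: Unpack is importable from typing (Python 3.11+).
    from typing import Unpack


class RequestOptions(TypedDict, total=False):
    # total=False means every key may be omitted by the caller.
    timeout: float
    retries: int


def send(url: str, **kwargs: "Unpack[RequestOptions]") -> dict:
    # Only the options that were actually passed end up in the request,
    # so no artificial default has to be invented for the others.
    request = {"url": url}
    request.update(kwargs)
    return request
```

A call such as send("https://example.org", timeout=2.5) yields a request with a timeout key and no retries key at all.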

I agree, it’s been a bit scarce. I tried to expand it in the latest merge request and give a bit more concrete examples for the use cases mentioned above.

Could you elaborate on what you mean by that? If I understand correctly AST-based static analysis tools (do you mean linters and type checkers?) should be able to support this PEP.

I think this approach is fine as well, but it has a small inconvenience as it would require creating a dictionary whenever a function is called.

Not an expert on linters/language servers but it feels like those issues could be resolved by such tools if the PEP gets accepted and gains some traction. Of course, as you’ve mentioned earlier it would require a large amount of work, but for me it feels like automating those is worth the effort - those bugs can be found during refactoring if someone actually sits down to refactor them, but it seems like it rarely works out this way (mypy issue 4441 has some examples of that linked).

To be honest, I don’t have that much experience, but I don’t think it would be unrealistically comprehensive and harder to notice when it goes out of date compared to using explicit keywords.

I agree, the valid use cases where **kwargs are significantly better than explicit keyword arguments are quite limited. But, still as mypy issue 4441 shows, the **kwargs pattern is widely used.

I still think this PEP could bring a lot of benefits to the Python community. If introducing the new syntax is considered not worth the effort for this use case, I think it would be beneficial to at least use just Unpack for the same purpose.

franekmagiera · July 25, 2022, 9:45pm

I’ve created a merge request with small enhancements based on this thread. As for the dunder name I decided to go with __typing_kwargs_unpack__, but not sure if that’s too verbose.

markshannon · August 4, 2022, 3:19pm

Am I missing something? This seems like a complicated and unnecessary way to do what we already can do.

The example given in the PEP:

class Movie(TypedDict):
    name: str
    year: int

def foo(**kwargs: Movie) -> None: ...

would be better in almost every way (brevity, clarity and performance, tool support, …) written as:

def foo(*, name: str, year: int) -> None: ...

Why add complexity to the language for this essentially useless feature?

If Movie is a singular thing, then it should be a single parameter.

def foo(movie: Movie) -> None: ...

sirosen · August 4, 2022, 4:33pm

for verbose names; they’re clear. However, I would prefer the suggestion for __typing_unpack__.

__typing_unpack__ doesn’t suggest any constraint on future usages of the method. Imagine a future language version finding new uses for this method:

class Foo(TypedDict): ...
class Bar(TypedDict): ...
# dict-like merge, like `x = {**y, **z}`
Baz: TypeAlias = typing.Unpack[**Foo, **Bar]

I’m not saying that the above is a good idea. My point is that there might be interesting usages for __typing_unpack__ in the future outside of the scope of this PEP.

There’s some discussion here about motivation. I think one important one is ease of use when wrapping an API. This shows up a lot when subclassing, but also is easily demonstrated with two functions:

def foo(*, a: int, b: int) -> int:
    return a + b

def bar(x: int, *, a: int, b: int) -> int:
    return foo(a=a, b=b) - x

a function like bar is often written as

def bar(x: int, **kwargs: int) -> int:
    return foo(**kwargs) - x

both for brevity and in order to support the addition of new arguments to foo, as in:

def foo(*, a: int, b: int, c: float) -> int:
    return a + b + int(c)

The only way to get accurate type information for bar is to repeat all of the parameters for foo and to update them whenever foo changes.
**kwargs: int | float, for example, would lose the known argument names in typing information, and the association between specific names and types.

What is wanted is for bar to be declared to take a set of passthrough parameters for foo, def bar(x: int, **kwargs: {ArgsOfFoo}) -> .... The language currently has no facility for doing this, even when foo and bar have the same author, and the PEP will address a portion of these use-cases.
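A sketch of how the passthrough case might look when foo and bar share an author, assuming the Unpack spelling discussed elsewhere in this thread. FooKwargs is a hypothetical TypedDict that has to be kept in sync with foo by hand, which is exactly the portion of the problem that remains unsolved.

```python
from typing import TYPE_CHECKING, TypedDict

if TYPE_CHECKING:
    # Assumption: Unpack is importable from typing (Python 3.11+).
    from typing import Unpack


class FooKwargs(TypedDict):
    # Hypothetical: mirrors foo's keyword-only parameters, maintained by hand.
    a: int
    b: int


def foo(*, a: int, b: int) -> int:
    return a + b


def bar(x: int, **kwargs: "Unpack[FooKwargs]") -> int:
    # The annotation is quoted so it is never evaluated at runtime.
    return foo(**kwargs) - x
```

Adding a parameter c to foo still means editing FooKwargs, but every wrapper annotated with it picks up the change in one place.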

I’m also interested in the case in which foo comes from a 3rd party library which does not provide the keyword arguments as a TypedDict, and bar wishes to describe itself as a wrapper with passthrough arguments. I can imagine this case being handled correctly at runtime today with inspect, but not in ways that type checkers could be expected to understand. Perhaps __typing_unpack__ would make it possible to do this, but I am not confident in that. The problem of generating a TypedDict for type checkers, from an existing function or method, may be better left as future work.
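For the runtime side of this, a minimal sketch using inspect.signature. Here foo stands in for a third-party function, and the checks in bar are an illustration only; a static type checker learns nothing from them.

```python
import inspect


def foo(*, a: int, b: int) -> int:
    # Stand-in for a function imported from a third-party library.
    return a + b


def bar(x: int, **kwargs) -> int:
    # Read foo's parameters at call time instead of repeating them here.
    params = inspect.signature(foo).parameters
    for name, value in kwargs.items():
        if name not in params:
            raise TypeError(f"unexpected keyword argument {name!r}")
        expected = params[name].annotation
        # Only plain classes can be checked with isinstance; anything else is skipped.
        if isinstance(expected, type) and not isinstance(value, expected):
            raise TypeError(f"{name!r} should be {expected.__name__}")
    return foo(**kwargs) - x
```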

pf_moore · August 4, 2022, 4:51pm

Surely that’s not the only way to get accurate type information. Inventing a syntax off the top of my head, we could have

def bar(x: int, **kwargs: __signature__(foo)) -> int:
    return foo(**kwargs) - x

where __signature__ is a special name recognised by type checkers (much like cast), which basically encodes a type “a dictionary that matches the signature of foo”. Sure, that would be a TypedDict under the hood, but the point here is that I don’t want to enumerate the TypedDict definition myself, I just want to be able to say “like foo”.
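To make the "TypedDict under the hood" idea concrete, here is a runtime-only sketch that derives one from a function's keyword-only parameters. The helper kwargs_typeddict is hypothetical, and static type checkers would not be expected to understand a TypedDict built this way; it only shows what information the proposed special form would need to capture.

```python
import inspect
from typing import TypedDict


def foo(*, a: int, b: int) -> int:
    return a + b


def kwargs_typeddict(func, name):
    # Hypothetical helper: collect the keyword-only parameters of func
    # and build a TypedDict from their annotations (functional syntax).
    params = inspect.signature(func).parameters
    fields = {
        p.name: p.annotation
        for p in params.values()
        if p.kind is inspect.Parameter.KEYWORD_ONLY
    }
    return TypedDict(name, fields)


FooKwargs = kwargs_typeddict(foo, "FooKwargs")
```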

sirosen · August 4, 2022, 4:56pm

To clarify, you’re not saying that __signature__ exists today, but that it could be added. (Please correct me if I’m wrong.) I meant that repeating all of the arguments is the only way to handle this today.

I would welcome the addition of a primitive that means the same thing as __signature__ above.

pf_moore · August 4, 2022, 8:18pm

Correct. More specifically, I’m agreeing with @markshannon that using a TypedDict to annotate **kwargs is almost always significantly worse than writing out the signature explicitly. You mentioned wrappers as a case when that wouldn’t be possible, and I’m saying that for that specific case I’d rather see a specific solution like __signature__ rather than use it as a justification for annotating **kwargs as proposed by the PEP.

Basically, I think a __signature__ style of solution will address the only real issue here better than PEP 692 does, and for any other situation, PEP 692 offers no benefit, and in fact tempts people to use a sub-optimal approach.

For background, there was a python-ideas subthread recently that was discussing wrappers and copying/adjusting function signatures - see here. That’s where my __signature__ thought came from - so there’s at least some interest in it from elsewhere.

sirosen · August 4, 2022, 8:54pm

FWIW, I also only see the wrapper use-case. I’m not sure why else it would be valuable to be able to annotate **kwargs with a TypedDict.

The current PEP Motivation section mentions codebases which want to get type annotations with minimal rewrites. IMO it would benefit from some motivating “hard to rewrite” examples to explain why this is important enough that the language should change.

Should a __signature__-like method return a TypedDict? That loses positional args. I think there’s a very common subclassing usage that looks something like this today:

from foolib import Foo

class Bar(Foo):
    def __init__(self, *args, x: int = 1, **kwargs):
        super().__init__(*args, **kwargs)
        self.x = x

It would be good to support this.

Relatedly: I don’t like the idea that a library which wants to support subclassing in concert with this feature may have to document that it provides two objects, Foo (a class) and FooKwargs (a TypedDict representing the keyword arguments of Foo). It would be better to have a way to refer to the signature of an existing callable.

EpicWink · August 4, 2022, 10:27pm

Currently there is no way to type-annotate keyword arguments with names which aren’t valid Python identifiers, eg foo(**{"a-b": 42}). Real world example: azure-sdk-for-python/_models.py at 6e29da99d6362cc217883005fb5f2ca6317c5ac8 · Azure/azure-sdk-for-python · GitHub
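A small sketch of the situation: such a call works at runtime, and the functional TypedDict syntax can at least describe a key that is not a valid identifier. HyphenKwargs is a hypothetical name, and whether a type checker would accept it for **kwargs under this PEP is not settled in this thread.

```python
from typing import TypedDict


def foo(**kwargs):
    # Keyword names only need to be strings at runtime.
    return kwargs


result = foo(**{"a-b": 42})

# The class syntax cannot spell "a-b" as a field, but the functional syntax can.
HyphenKwargs = TypedDict("HyphenKwargs", {"a-b": int})
```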

I’m not sure if this proposal will add support for type-annotating these. I’m also not sure if we want to support these.

sirosen · August 5, 2022, 2:51pm

I don’t think it would be supported, which is probably fine.

The Azure SDK case shows up on this forum in several threads when discussing invalid-identifier-kwargs, which suggests that it’s part of a very small set of such cases.

I’d vote against any special accommodation. If these usages happen to work and people want to color outside the lines, great! – but -1 on extra code to support it.

franekmagiera · August 5, 2022, 11:11pm

Thank you for all the comments.

The proposed feature is not useless. The main use cases are: easier type hints introduction for existing codebases that already use **kwargs, reducing code duplication and copy pasting for sets of functions that have the same signatures, and supporting functions that should facilitate optional keyword arguments that don’t have default values.

I agree that those use cases are not prevalent but they are there and in my opinion they have merit. In addition, mypy issue 4441 shows there is a demand for this feature.

That said, if the consensus after discussing this PEP is that this is an essentially useless feature, it seems that at least Python docs should be changed to clearly discourage the use of the **kwargs pattern (and some steps should be taken to remove **kwargs altogether if there is ever any release resembling “Python 4”). Unless that is the case, I think there should be a way to type **kwargs more precisely.

That’s a good point.

When it comes to the idea of introducing something like **kwargs: __signature__(func) that would indicate that **kwargs should have the same structure as func, there are some concerns that come to mind. Firstly, it is based on the idea that the only reasonable motivation for using **kwargs is reducing code duplication for functions with the same signatures. As mentioned above, there are also other use cases that in my opinion TypedDict fits better. Secondly, I am not convinced this idea is better and less complex than using TypedDict, especially taking other use cases into account. In addition, I am not convinced this solution is as elegant as it seems. There are a lot of questions that would need to be answered for that proposal, for example: what should happen if func itself uses positional args, *args and **kwargs; what should the callable assignment rules be; how should this behave if func is a method of a class…?

In my opinion, using TypedDict is the most natural choice for precise **kwargs typing - after all **kwargs is a dictionary. Also, TypedDict is already clearly specified. Another use case that **kwargs are good for is for functions that are often called with unpacked dictionaries but only use a certain subset of the dictionary fields. If there ever is any update to the TypedDict behaviour, like for example optional key support, covering a new use case will be easier than in case of __signature__(func), because it will already be clearly specified and the behaviour of kwargs inside the function body will already be clearly specified.

pf_moore · August 6, 2022, 8:28am

Wait. Re-reading the PEP, it says that def foo(**kwargs: T) means that every keyword argument must have type T? That’s weird. I would have expected the annotation to describe the type of kwargs, not of its values (so I’d type that as def foo(**kwargs: dict[str, T])). I’ve never used that syntax, and never even considered that would be the case - it seems wrong to me. When was that feature added, and is there a justification for it that I can read to understand why this is useful? I can’t even imagine it being a useful short form - in my experience kwargs are rarely all the same type.

With that extra context, PEP 692 feels to me like a workaround for a broken feature, and I’d rather see the original feature fixed than have it enshrined in the language by having a specialised syntax to do what should have been there in the first place.

mdrissi · August 6, 2022, 8:39am

From the beginning of the type system. It was specified in PEP 484, which gives this example:

def foo(*args: str, **kwds: int): ...

In the body of function foo, the type of variable args is deduced as Tuple[str, ...] and the type of variable kwds is Dict[str, int].
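A quick runtime check of that convention: the annotations attach to the individual values, while the variables themselves are a tuple and a dict.

```python
def foo(*args: str, **kwds: int):
    # args is a tuple of the positional values, kwds a dict of the keyword values.
    return type(args), type(kwds)


container_types = foo("a", "b", x=1, y=2)

# The stored annotations are the per-value types, not Tuple[...] or Dict[...].
stored_annotations = foo.__annotations__
```

This is why annotating **kwargs with a TypedDict under the existing rules would mean "every value is such a dict", and why the PEP needs a distinct spelling.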

It would be a major backwards-incompatible type system change to change that now. All of typeshed, other stubs, and type checkers currently follow that meaning of *args and **kwargs.

I think in practice for most apis being explicit and not using kwargs is preferable. It’s mainly for apis like matplotlib that forward kwargs a lot where I see this pep as useful.

pf_moore · August 6, 2022, 9:20am

That’s a real shame. There doesn’t seem to be much justification for that choice. I still think it sucks that we’re now considering new syntax to implement what probably should have been the meaning in the first place.

sirosen · August 9, 2022, 4:50pm

To clarify, I do not care about reducing code duplication – duplication is acceptable if that’s what your use-case calls for. I want to be able to import and wrap functions from a 3rd party source. i.e.

from foolib import foo_func

def mywrapper(*args, x=1, **kwargs):
    print(x)
    return foo_func(*args, **kwargs)

cannot be properly annotated today without changing its runtime semantics. That holds especially strongly if I want mywrapper to support multiple versions of foolib.

Duplicating all of the args to foo_func would be acceptable to me if it worked as a solution.

The motivation here is missing important detail. The PEP almost solves another related problem, but doesn’t quite. And the motivation seems centered around the notion, a priori, that being able to specify better types for **kwargs is important – it doesn’t address why you should be able to do this in the first place. Nor does it address why **kwargs deserves special treatment but *args does not.

As presented, there are these motivating use-cases:

easier type hints introduction for existing codebases that already use **kwargs
reducing code duplication and copy pasting for sets of functions that have the same signatures
supporting functions that should facilitate optional keyword arguments that don’t have default values

Each of these, however, has a relatively simple solution which does not require changes to the language. Respectively,

Stop using **kwargs. You’re already going to have to make changes in order to fully leverage type annotations. This is one of them.
Duplicate the code.
Use a sentinel value instead of omission. (I have my fingers crossed for PEP 661, Sentinel Values, to make this easier.)
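The third suggestion can be sketched with a plain module-level sentinel object. The names _MISSING and connect are hypothetical, and this uses the long-standing object() idiom rather than anything from PEP 661.

```python
# Unique marker meaning "the caller did not pass this argument".
_MISSING = object()


def connect(host, *, timeout=_MISSING):
    # Hypothetical function: 'timeout' has no natural default, and None is
    # a meaningful value, so omission is detected via the sentinel.
    options = {"host": host}
    if timeout is not _MISSING:
        options["timeout"] = timeout
    return options
```

Here connect("db") and connect("db", timeout=None) produce different results, which a None default could not express.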

You may feel that these are bad solutions, but I’d like to hear and understand why.

franekmagiera · August 9, 2022, 9:50pm

Thanks for the comment.

Do you mean the example you mentioned above? Could you explain that in more detail? It seems to me you could annotate this using the proposal from PEP 692.

I’m not sure I understood you correctly, but I agree that you wouldn’t be able to support multiple versions of foolib (if the API changes). But shouldn’t the dependencies be pinned to specific versions anyway? By the way, it seems that those kinds of API changes in dependencies can be easy to miss if they use **kwargs, cannot be annotated properly, and the maintainers don’t have the time to refactor the code.

The way I think about it is this: **kwargs is a Python feature that users can use. If it isn’t discouraged explicitly, then I think users should be able to annotate it properly. The current specification has a significant limitation (it supports only homogeneous **kwargs) that is not very realistic (as mypy issue 4441 shows). And the reasons that users should be able to do this in the first place are the same as for using static typing in general: catching bugs earlier (at type checking time, rather than runtime), preventing typos, making code easier to understand and maintain, etc. But I don’t think it makes sense to reiterate all that in every typing-related PEP.

I think those are fine, but I don’t think those are “one size fits all” solutions.

To me this seems like a very opinionated suggestion. From my point of view it is not very user friendly and as long as the use of **kwargs is not being explicitly discouraged, there should be a way to annotate them properly.
If the amount of duplication is small then this is fine, but in certain cases preventing duplication can really make a difference in how easy it is to read and manage the codebase (and prevent all copy pasting related mistakes).

pf_moore · August 9, 2022, 10:35pm

I couldn’t let this pass without comment. Absolutely not. If you’re an application, maybe, but if you’re writing a library you want to work with as many versions of your dependencies as possible. **kwargs is a great way to ensure that you aren’t tightly coupled to parts of the dependency’s API that you don’t care about - and adding coupling by requiring the caller to duplicate the callee’s API for the type annotation defeats that goal.

If this proposal encourages developers to tightly pin dependencies so they can match the annotations, it’s likely to introduce far more problems than it solves.

EpicWink · August 9, 2022, 11:02pm

How would you expect documentation generators (eg Sphinx autodoc) to display this typing? Having the TypedDict separate from the function would be cumbersome, especially with sorting.

I think I would prefer having the types inline with the function signature, and the best idea I have for that is:

def f(**kwargs: **{"bar": int, "spam": str})

But really, I consider explicit (ie not variadic) function parameters to be part of the function’s type, and typing those parameters to be an extension of that. I see kwargs as a shortcut that doesn’t type the function, which eschews typing for other functionality.

In terms of wrapping, perhaps we could add a wraps decorator in the functools module which type-checkers could use to infer variadic parameter types (perhaps with parameter type specialisation/overriding).
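For reference, functools.wraps already exists and carries the wrapped function's signature for runtime introspection, as this sketch shows. It does not address what static type checkers infer, which is the part the suggestion above would need; foo and bar are hypothetical.

```python
import functools
import inspect


def foo(*, a: int, b: int) -> int:
    return a + b


@functools.wraps(foo)
def bar(*args, **kwargs):
    # Variadic passthrough; wraps() records foo on bar.__wrapped__.
    return foo(*args, **kwargs)


# inspect.signature follows __wrapped__, so it reports foo's parameters.
runtime_parameters = list(inspect.signature(bar).parameters)
```

Documentation tools that rely on inspect.signature would therefore display foo's parameters for bar, even though the def line only shows *args and **kwargs.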