`dict.get()` return values when `Any` involved

tungol · December 11, 2024, 12:10am

I started looking at how dict.get() is defined in typeshed, and I fell down the rabbit hole of how it behaves with a value type of Any. I wanted to bring this discussion here to get broader feedback.

Given d: dict[str, Any], d["key"] is Any, and d.get("key") is Any | None.
But d.get("key", None) is Any, so it’s a bit confusing that being more
explicit gets you a less explicit return type.

However, this is consistent with the behavior of d.get("key", "value"), which
is also Any.

I can see the case for both d.get("key") and d.get("key", None) to return Any,
and I think it would be reasonable for both to return Any | None, but the
current state seems like a weird inconsistency.
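A minimal sketch of the three calls side by side may help. The types in the comments are the ones reported in this post for the current typeshed definition, not something the snippet itself verifies; at runtime the first two calls are indistinguishable.

```python
from typing import Any

d: dict[str, Any] = {}

# Static types below are as described in this post (current typeshed).
implicit = d.get("key")           # Any | None
explicit = d.get("key", None)     # Any
fallback = d.get("key", "value")  # Any

# At runtime, spelling out the None default changes nothing.
print(implicit is None, explicit is None, fallback)
```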

In typeshed, the current definition is:

class dict(MutableMapping[_KT, _VT]):
    @overload
    def get(self, key: _KT, /) -> _VT | None: ...
    @overload
    def get(self, key: _KT, default: _VT, /) -> _VT: ...
    @overload
    def get(self, key: _KT, default: _T, /) -> _VT | _T: ...

I tested out a new definition in typeshed of:

class dict(MutableMapping[_KT, _VT]):
    @overload
    def get(self, key: _KT, default: None = None, /) -> _VT | None: ...
    @overload
    def get(self, key: _KT, default: _VT, /) -> _VT: ...
    @overload
    def get(self, key: _KT, default: _T, /) -> _VT | _T: ...

This unifies the behavior of d.get("key") and d.get("key", None). It showed
a moderate amount of noise in mypy-primer, almost all of which is the result of
getting Any | None instead of Any.
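To experiment with the proposed overloads without patching typeshed, they can be copied onto a small stand-in class. This is only a sketch: SketchDict is a hypothetical name that exists nowhere in typeshed, and Optional/Union are used instead of the | spelling purely so the file also runs on older Python versions.

```python
from typing import Any, Dict, Generic, Optional, TypeVar, Union, overload

_KT = TypeVar("_KT")
_VT = TypeVar("_VT")
_T = TypeVar("_T")


class SketchDict(Generic[_KT, _VT]):
    """Hypothetical stand-in for dict, used only to try out the overloads."""

    def __init__(self) -> None:
        self._data: Dict[_KT, _VT] = {}

    def __setitem__(self, key: _KT, value: _VT) -> None:
        self._data[key] = value

    # Same three overloads as the proposed typeshed definition.
    @overload
    def get(self, key: _KT, default: None = None, /) -> Optional[_VT]: ...
    @overload
    def get(self, key: _KT, default: _VT, /) -> _VT: ...
    @overload
    def get(self, key: _KT, default: _T, /) -> Union[_VT, _T]: ...
    def get(self, key: Any, default: Any = None, /) -> Any:
        # Runtime behavior is just the built-in dict.get.
        return self._data.get(key, default)


sd: SketchDict[str, Any] = SketchDict()
sd["present"] = 1
print(sd.get("present"), sd.get("missing"), sd.get("missing", None))
```

Pointing a type checker at calls on sd should show whether get("missing") and get("missing", None) now resolve to the same overload.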

111 lines total from mypy-primer
25 of these are “note”
Of the 86 error lines:
- 77 are from Any | None instead of Any.
- 4 are converting an arg-type error to an assignment or return-value error
- 2 are errors from custom subclasses of dict where the new overload didn’t match
  (or didn’t match before and now did).
- 1 is a new error from returning Any in a situation that returned a concrete type before.
- 1 is a corner case specific to mypy’s proper types plugin. (I won’t discuss this one further here, but I did in this GitHub comment.)
- 1 is the result of a mypy bug.

Here are all the scenarios that change if this change is applied. Given this for setup:

from typing import Any

d_any: dict[str, Any] = {}
d_str: dict[str, str] = {}
any_value: Any = None
str_value = "value"
int_value = 1

d_any.get("key", None)

This is the big change. It currently returns Any, but would return Any | None
with this change.

result: str = d_any.get("key", None)

This is not currently an error. With this change, we get: error: Incompatible types in assignment (expression has type "Any | None", variable has type "str")

result: str = d_str.get("key", None)

This is currently an arg-type error: error: Argument 2 to "get" of "dict" has incompatible type "None"; expected "str"
With this change it becomes an assignment error instead: error: Incompatible types in assignment (expression has type "str | None", variable has type "str")

def test() -> str:
    return d_any.get("key", None)

This is currently a no-any-return error: error: Returning Any from function declared to return "str"
With this change it becomes a return-value error instead: error: Incompatible return value type (got "Any | None", expected "str")

def test() -> str:
    return d_str.get("key", None)

This is currently an arg-type error: error: Argument 2 to "get" of "dict" has incompatible type "None"; expected "str"
With this change it becomes a return-value error instead: error: Incompatible return value type (got "str | None", expected "str")

def test() -> str:
    return d_str.get("key", any_value)

This is not currently an error. With this change, it becomes a no-any-return error: error: Returning Any from function declared to return "str"

Pyright mostly agrees with this. The cases that change from one error to another
in mypy don’t change category in pyright, which is more consistent there. The major point of divergence is this:

d_str.get("key", any_value)

Which mypy says is Any and pyright says is str. In an unfortunate twist,
this change makes pyright say this is str | None. Mypy avoids this by evaluating all branches of the overload when Any is present, but it seems like pyright is handling it differently and taking the first match, giving a result that’s just wrong.

Putting aside that issue for now, what do people think? For d_any: dict[str, Any], do we like d_any.get("key") -> Any | None but d_any.get("key", None) -> Any? Would d_any.get("key", None) -> Any | None be better or worse? Should it maybe be that d_any.get("key") -> Any instead? I’m not well versed in type theory, so I can’t say what the theory-based answer would be.

mikeshardmind · December 11, 2024, 12:43am

Neither the before nor the after is quite right here. The lack of negation precludes this being typed correctly, but it should just be these two (just the first one unless we get negation):

def get(self, key: K, default: T = None) -> V | T: ...
def get(self, key: ~K, default: T = None) -> T: ...

And type checkers should solve T to None when .get is called without a default.

On the theory side, Any in a union does not reduce; see the prior theory workup here.
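As a rough illustration of why Any | None is not interchangeable with plain Any in practice, consider the sketch below. The function names are made up for this example, and the checker behavior described in the comments is what mypy and pyright are generally expected to do, not output captured from a run.

```python
from typing import Any, Optional


def shout_unchecked(value: Any) -> str:
    # Plain Any turns checking off: no diagnostic is expected here, even
    # though this raises AttributeError at runtime if value is None.
    return value.upper()


def shout_checked(value: Optional[Any]) -> str:
    # With Any | None the None member is kept, so a checker is expected
    # to want it handled before any attribute access.
    if value is None:
        return ""
    return value.upper()


print(shout_checked(None) == "", shout_checked("abc"))
```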

tungol · December 11, 2024, 2:50am

The last time the definition of dict.get() was changed was in issue python/typeshed#10293.

At that time, the definition was

@overload
def get(self, key: _KT, /) -> _VT | None: ...
@overload
def get(self, key: _KT, default: _VT | _T, /) -> _VT | _T: ...

The MR python/typeshed#10294 attempted this definition:

@overload
def get(self, key: _KT, /) -> _VT | None: ...
@overload
def get(self, key: _KT, default: _T, /) -> _VT | _T: ...

Which is closer to your first definition, but this was majorly disruptive, unfortunately,
and not just in the expected ways. I’d be surprised if much had changed there. The MR python/typeshed#10501 was accepted instead, which gave us the current definition.

The negation side is interesting. It’s true that we don’t have negation, but as overloads
are broadly first-match-wins, a failover might be possible. But I don’t think it would be
that useful versus the current behavior of it being an error to use a key that has the
wrong type. Is there a use case for accepting arbitrary objects as a possible key value
without checking if they’re the correct type first?

mikeshardmind · December 11, 2024, 2:54am

Yeah, it’s useful when doing transformative replacements. This will often take the form of replacement_map.get(thing, thing), where the items to be replaced can include types that are never to be replaced, so those types won’t be in the map.
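A small sketch of that pattern, with made-up data: the map only has str keys, but the items being transformed include other hashable objects that simply pass through. This works at runtime, while a checker using the current typeshed definition is expected to reject the call because item is object rather than str.

```python
# Hypothetical data for illustration only.
replacement_map: dict[str, str] = {"colour": "color", "grey": "gray"}
items: list[object] = ["colour", 3, "grey", None, "blue"]

# Anything not present in the map, including non-str items, is returned
# unchanged because the item itself is passed as the default.
result = [replacement_map.get(item, item) for item in items]  # checker: key is not str

print(result)
```

Note that this relies on every item being hashable; an unhashable item such as a list would raise TypeError at lookup time.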

I don’t have an answer for it being disruptive other than that it’s only going to get more disruptive to correct this the longer there’s an incorrect definition here. I don’t think we should ever be lying in the typeshed, and it should be okay for people to have code that isn’t fully typed.

Eneg · December 24, 2024, 7:31pm

Stephen Morton:

@overload
def get(self, key: _KT, /) -> _VT | None: ...
@overload
def get(self, key: _KT, default: _VT, /) -> _VT: ...
@overload
def get(self, key: _KT, default: _T, /) -> _VT | _T: ...

I imagine the problem stems from the 2nd overload. With _VT solved to Any, anything is assignable to default, matching it.
I wonder what’s the purpose of the 2nd overload anyway, isn’t it equivalent to the 3rd one if _T is _VT?

The signature of the entire function could just be

def get(self, key: _KT, default: _T = None, /) -> _VT | _T: ...

But I suppose the authors wanted to reduce the visual noise in the no-default case.