Inference of unspecified type parameters as Never

While trying to pass the conformance tests for Zuban, I have encountered a few fundamental issues, this is #3 of 4.

Zuban (like Mypy) infers Never for unspecified type parameters (see example below). This happens quite a lot. For example list() is inferred as list[Never] initially. Both type checkers add an error in foo = list() if there is no foo.append(...) that completes this partial type.

This behavior is however against the spec. I feel like this is unfortunate, because avoiding Any is more strict behavior.

This is the relevant part of the spec:

Otherwise, the default value for the type parameter (or Any, if no default is provided) is assumed. Example:

class Node[T]:
    x: T  # Instance attribute (see below)
    def __init__(self, label: T | None = None) -> None:
        ...

x = Node('')  # Inferred type is Node[str]
y = Node(0)   # Inferred type is Node[int]
z = Node()    # Inferred type is Node[Any]

There are two different ways we could handle this:

(1) Allow both ways

This would mean a small change to the spec: I’m not sure about a good wording, but I would probably say or Any/Never, if no default is assumed and mention in the comment below Node[Any]/Node[Never].

(2) Mandate Never/An error when assigning to a variable

This would avoid the case that z = Node() is inferred without an error. For Pyright this is as easy as enabling reportUnknownVariableType=true and Mypy/Zuban already do this.

z = Node()  # Pyright: Type of "z" is partially unknown

I’m not sure though how strict the wording should be here. I personally care most about the assignment case. I feel like this is the most typical case where people will start modifying attributes and therefore unsafety unfolds. Pyright infers Node[Unknown] even with reportUnknownVariableType=true on. I feel like that’s a fine choice as well, since the error is generated on assignment.

My opinion

I’m personally in favor of (2). I believe we should enforce strict behavior in the conformance tests as much as possible even if the defaults work in a non-strict way. Pyright does a lot of things to make the type system easier for its users, but this should not stop us from specifying stronger guarantees.

I will add a PR to the typing repository and change the conformance tests/spec, for wherever the consensus lands. I hope somebody will help me with good wording.

I would have written this as

class Node[T=None]:
    x: T
    def __init__(self, label: T | None = None) -> None: ...

If T cannot be resolved at construction time, then I probably forgot to add assign a default to T. As a stub author, I’d want to know this without having to construct a Node() myself.
So I’ll go with secret option no. 3 then; preventing this from happening in the first place :slight_smile:.

For example list() is inferred as list[Never] initially. Both type checkers add an error in foo = list() if there is no foo.append(…) that completes this partial type.

Instead of adding even more special casing to built-ins it would be better to introduce a mechanism that allows late binding of the generic type variable that is also transparent and available to end users and library authors.

A particular good example for this are scikit-learn style data transforms and more generally machine learning models. For a generic transform, the generic type may not be clear at initialization time, but only when we call model.fit(data). Likewise, in your example the true generic type of the list is only solved once we call list.append(item).

Defaults are not a good solution for this because you just end up having to default to Any most of the time.

Of course this would be better. But none of the type checkers support this currently and Pyright doesn’t even support inference with list.append. I think the end goal would be better generic inference, but we’re just not there yet, so I’m proposing as a first step to at least allow Never, which for Zuban makes what you want possible.

In general type inference around narrowing is just largely unspecified since Mypy and Pyright (unfortunately) chose such drastically different routes. However it might still make sense to eventually mandate generic type inference like you propose.

I would find it useful to hear the reasoning behind choosing Never here. Is there a basis for this in type theory, or was this a somewhat arbitrary choice made by mypy maintainers?

I don’t think the spec should accommodate both Any and Never here. We’ve been working toward the goal of removing such ambiguities from the spec.

I think there’s a good argument in favor of Any being the correct answer here. PEP 696 introduced type defaults for TypeVars. A TypeVar without a an explicit default falls back to Any as its default. When instantiating a generic class without explicit specialization, any unsolved TypeVars should take on their default type. Since the default type is implicitly Any, they should become Any. It seems inconsistent to me that it should take on the value of Never in some cases. If we want to accommodate an inconsistency here, there would need to be a strong justification and very clear rules for when this inconsistent behavior is applied.

Absent some stronger type-theory-based argument in favor of allowing Never here, I think the typing spec and conformance tests should remain as they are in this case.

I think this is a misguided assessment (that others have made as well) that ends up harming the ease of people using typing.

There’s no reason to avoid Any here other than the fact that you’re binding to a type you see used, rather than checking “Is there some type that is consistent with all uses” (the actual meaning of type checking Any is some unspecified type, not some impossible type and not the first type inferred from usage that may not encapsulate all valid usage).

If the user wants more strict enforcement than that, they can provide the type they want enforced as a specialization themselves.

1 Like

I think I’m starting to understand why you guys prefer Any. I agree that Never is a weird choice, but it basically allowed Zuban to implement an error on assignments like x = list() where the type parameters are not defined. So I’m getting more sympathetic to Any being the choice here.

However I really want to emphasize that I would like to standardize something like Pyright’s reportUnknownVariableType=true. Type checkers should have a way to report that the type params are not defined, even if that isn’t enabled by default. What do you think about that?

The reportUnknownVariableType check in pyright is based on its notion of Unknown. Pyright uses this to distinguish between an explicit and implicit Any. The typing spec currently makes no such distinction. I think it would be a heavy lift to introduce this to the spec and implement across all type checkers, especially mypy.

While the typing spec indicates which checks a conformant type checker must support, it doesn’t prevent a type checker from implementing additional checks. There are many checks that mypy and pyright implement, for example, that are not mentioned anywhere in the spec. Most of these are optional / strict-mode checks. The reportUnknownVariableType is an example of this. It’s not enabled by default in pyright, but it’s available for users who want to enable it. You could do something similar in Zuban if you want, and that wouldn’t require any changes to the spec or the conformance tests.

I’m -1 on changing the spec here.

  • Treating foo = list() as a partial type that is “completed” by a later append call is a nice feature (in fact, pyrefly does this!), but I don’t see how it follows that Never should be inferred when the type isn’t completed.
  • IMO erroring on foo = list() should remain an optional/strict mode check, since it breaks the gradual guarantee. I suppose we could add language to the spec about this being a recommended check, similar to this section on a strict mode for override checks, although I’m gently skeptical of how useful it is for the spec to make optional recommendations like this.
5 Likes

I changed the Never default to Any for Zuban. I think conformance test compatibility was one reason, another and more important one was flexibility to allow some more gradual types. I however still think that Mypy is correct in choosing Never.

Mypy calls it the UninhabitedType internally AFAIK, which is then represented as Never. I don’t see what’s wrong with choosing that from a type system perspective. I understand that choosing Any is also possible, since any Type would be allowed in the generic type. It seems however clear to me that choosing Any leads to various forms of unsafety, while I have seen Mypy’s Never protect users from this unsafety. I think my main pet peeve here is that from a practical perspective Mypy has protected me as a user a lot of times from working with unknown generics and Never is working very well there.

Maybe I’m just not deep enough in type theory here (having written a Python type checker, should count at least a bit :slight_smile:), but I wasn’t able to come up with a scenario where Never did not protect me against using an unknown generic. And that’s what Mypy wants to avoid: Unknown generics.

This check essentially already exists both in Mypy and Zuban. In both there are “specializations” of both Never and and Any. This is how these type checkers can report various issues (like using unannotated function returns). I don’t think the type checkers are very far apart from each other. Mypy simply raises the error code var-annotated for unknown type variables on assignment (e.g. a = Foo() is inferred as Foo[Never] but on assignment an error var-annotated is added and a is inferred as Foo[Any].

I personally think that it is very valuable to specify optional strict checks. I would personally like to add more of the strict Mypy/Pyright strict behavior to the spec. The default modes should not matter too much, I can understand that type checkers want to be welcoming to new users, but this should not stop us from standardizing valuable stricter parts. After all it would be nice if type checkers also behaved similar in more strict scenarios. foo = [] is very very common and specifying how type checkers should behave in that situation sounds very valuable to me (looking at list.append as well as untyped type variables).

Ideally, because python is gradually typed, this would only result in an error if there is no possible solution for an unknown generic which doesn’t error in the absence of a user specified type (checking for total consistent use for each instance of an untyped generic).

Many so called “strict mode” checks aren’t actually any safer from a type theory perspective, and are only taking a very strong stance that the user should have to specify everything. Other languages manage to do better. Even some older type checkers had better behavior here.

Here’s an example of what I mean:

x = []
x.append(1)  # type of x should still be gradual, treating x as list[int] isn't correct here.
x.append("apple")  # this shouldn't error. The user hasn't provided a type for enforcement, and this is still a use for which there is a possible type that is consistent with all use.
print(*x)  # still shouldn't error
sum(x)  # but now we can error.
1 Like

Fwiw, this is more or less a coincidence rather than a particular property of Never. What you’re observing is that if you have some variable that gets inferred as list[Never], then the vast majority of the time that people use that variable, they will get an error since their usage requires some type that is incompatible with list[Never]. But that isn’t because that is some special type that isn’t compatible with anything, it just so happens that most people write code where list[Never] isn’t compatible.

You could have largely the same error detecting ability if you said that unknown type variables are substituted with tuple[int, str, str, str, int, float, complex, str, int]. The vast majority of the time, that is gonna lead to type errors too.

On the other hand, there are many cases where the Never approach doesn’t actually cause visible errors. For example, consider this:

class Receiver[T]:
    def get(self) -> T: ...

my_receiver = Receiver()
some_int: int = my_receiver.get()
some_str: str = my_receiver.get()
some_float: float = my_receiver.get()

It is clear that this usage of my_receiver is not intended, but we won’t get an error since Never is compatible with every other type, so all these assignments are legal. Generally speaking, the issue is that the type variable is used covariantly here, so substituting Never leads to a type that subtypes every Receiver[...] type.

In general, there is no single (concrete) type that leads to the strict type checking behaviour. If you use Never then places where you need a “large enough” type error out because it’s the bottom type, if you use object places where you need a “small enough” raise errros because it’s the top type, if you use some arbitrary other type you get some mixed behaviour.

The only real way to guarantee errors on this kind of thing would be to introduce a new gradual type with the opposite semantics of Any. That is, the gradual type that is compatible with no types. From what I can tell that is largely how pyright’s Unknown behaves. It might be worth adding it (ideally with a more descriptive name) to the typing spec and having it give type checkers the option of whether they want to substitute undefined type variables with it or Any.

I don’t think you have checked how Mypy works in this case. Mypy adds a Need type annotation for "my_receiver" [var-annotated] in your example.

While checking I realized that Mypy does not create an unreachable warning for Receiver().get(). I was not aware of that (or more likely I forgot it again). There’s would obviously be ways to add an error around that in Mypy, but it’s definitely not complete.

I guess that makes my argument defending Mypy quite a bit weaker.

However I’m still in favor of adding checks for unknown generics to the typing spec.

Yes, but the choice to emit this diagnostic is unrelated to the choice to infer Never for the type variable. You can infer Any just as well and still emit this diagnostic. The point is that inferring Never is (in and of itself) pretty arbitrary, and only useful for catching errors in a sort of accidental way (in the sense that any random static type would emit more errors than Any.)

2 Likes