Inference on Literal Types

Dr-Irv · November 25, 2024, 8:26pm

Here’s another example, contrasting the behavior of mypy and pyright with Enum and Literal. From a typing perspective, my colleagues and I like to view Literal as a typing shorthand for creating an Enum class, but pyright makes this hard to do because of the type “widening” going on.

from enum import Enum
from typing import Literal


class MyEnumAB(Enum):
    A = 1
    B = 2


def enum_returnset(thelist: list[MyEnumAB]) -> set[MyEnumAB]:
    y = set([x for x in thelist])
    reveal_type(y)
    return y


MyLiteral = type[Literal[1, 2]]


def lit_returnset(thelist: list[MyLiteral]) -> set[MyLiteral]:
    y = set([x for x in thelist])
    reveal_type(y)
    return y

pyright output

enumadd.py
enumadd.py:12:17 - information: Type of "y" is "set[MyEnumAB]"
enumadd.py:21:17 - information: Type of "y" is "set[type[int]]"
enumadd.py:22:12 - error: Type "set[type[int]]" is not assignable to return type "set[MyLiteral]"
    "set[type[int]]" is not assignable to "set[MyLiteral]"
      Type parameter "_T@set" is invariant, but "type[int]" is not the same as "MyLiteral"
      Consider switching from "set" to "Container" which is covariant (reportReturnType)

mypy output:

enumadd.py:12: note: Revealed type is "builtins.set[enumadd.MyEnumAB]"
enumadd.py:21: note: Revealed type is "builtins.set[Union[type[Literal[1]], type[Literal[2]]]]"

mikeshardmind · November 25, 2024, 8:29pm

It definitely is. Let’s look at it:

b: B = C()
a = [b]
reveal_type(a)  # mypy list[B]  pyright list[C]

The developer intent here is creating an object that will be treated as B and must be consistent with B. This line is fine.

The next line is creating a list with that as its only element. mypy keeps the developer intent, and this is fine so long as it continues only being treated as list[B].

pyright ignores the developer intent and changes it to list[C]. This surprises the user and makes it incompatible with their own annotation.

To hammer it home:

def make_b() -> B:
    return C()

b: B = make_b()
a = [b]
reveal_type(a)  # mypy list[B]  pyright list[B]

You can’t argue that it is unsafe to prioritize developer intent for inference here; this is a valid statement of developer intent that is within the type system.

oscarbenjamin · November 25, 2024, 8:29pm

To me this seems like something that Literal was not really intended to do. It is clear from PEP 586 that the intended purpose of Literal was for @overload. A proper typing version of Enum should be able to support arbitrary types rather than being restricted to just a few elementary types.

Dr-Irv · November 25, 2024, 8:30pm

I agree with you, but if you look at the original example that I included at the top of this discussion, the natural way to write the code is to not include annotations on the loop variable x. I think pyright should infer that x in the call to myfunc(x) is of type Days.

One other note - this isn’t about mypy vs. pyright. This is about a behavior of pyright that I (and others) disagree with, where mypy is producing a more favorable behavior.

oscarbenjamin · November 25, 2024, 9:04pm

Yes, but you can’t just pick and choose exactly what behaviour you want in a narrow situation without considering how the type checker is supposed to work in general. There are fundamental differences in how mypy and pyright operate and model the types and those will come with tradeoffs.

In the case of pyright I think that if you want your example to work (without a hint) then it also amounts to having something like:

a = [1, 2, 3, 4]
reveal_type(a) # list[Literal[1, 2, 3, 4]]
a.append(5) # error

Apparently mypy’s method to avoid this involves having distinct sorts of literal types or something:

from typing import Literal
a: Literal[1] = 1
reveal_type(a) # Literal[1]
reveal_type(1) # Literal[1]?
reveal_type([a]) # list[Literal[1]]
reveal_type([1]) # list[int]

I’m sure if you dig into it you will find cases where that leads to strange behaviour that isn’t what you wanted either.

Mike’s suggestion for how pyright could instead handle your case by analysing the full graph of program flow presumably comes with significant downsides as well.

mikeshardmind · November 25, 2024, 9:06pm

Oscar Benjamin:

In the case of pyright I think that if you want your example to work (without a hint) then it also amounts to having something like:
a = [1, 2, 3, 4]
reveal_type(a) # list[Literal[1, 2, 3, 4]]
a.append(5) # error

I think this case is fine to leave to inference for now; there’s no annotation involved here that expresses a specific preference of intent. I would like it if we could standardize some parts of inference, but I don’t think the examples above should require it: pyright should just stop ignoring the stated developer intent where it exists.

Complexity, runtime, and the fact that, because Python isn’t a compiled language and allows a lot of dynamic modification, this is still subject to limitations on how sound it can be based on what is visible to the type checker, especially for checking use of library code in a larger application.

mikeshardmind · November 25, 2024, 10:19pm

Pyright isn’t erroring on this, which came up in a help channel in a Discord server when someone asked why the following is the case:

x: float = True # NO ERROR
y: int = False # NO ERROR

Code sample in pyright playground

from typing import reveal_type

x: float = True
y: int = False

reveal_type(x)  # Type of "x" is "Literal[True]"
reveal_type(y)  # Type of "y" is "Literal[False]"

def what(x: float):
    if not isinstance(x, float):
        raise TypeError(f"Expected a float, got a {type(x)!r}")


if __name__ == "__main__":
    what(True)

It appears this special casing of Literals in pyright’s inference model is not just hurting developer intent, but also allowing things it should not.

mdrissi · November 25, 2024, 10:30pm

This looks mostly unrelated and has to do with PEP 484 defining that int subtypes float in the type system, but that’s false at runtime. The int <: float typing relationship is a long-known convenient lie, and we’ve had other discussions on that topic.

edit: Even if no literal special casing exists, according to the type specs today bool <: int <: float, so that code is expected to pass all type checkers. Notably, that will pass mypy too; it’s not pyright-specific behavior.
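
For reference, a small runtime sketch of the gap described above (illustrative, not part of the original post; the variable names are made up for the example). The class hierarchy only supports bool <: int, while the int <: float step exists only in the type system:

```python
# Runtime view of the relationships the type system assumes.
bool_is_int = issubclass(bool, int)      # a real subclass relationship
int_is_float = issubclass(int, float)    # not a subclass at runtime
true_is_float = isinstance(True, float)  # so isinstance checks reject it

# Type checkers still accept these assignments, because an int is
# treated as acceptable wherever a float is expected.
x: float = True
y: int = False

print(bool_is_int, int_is_float, true_is_float)
```

This is why an isinstance(x, float) guard can raise at runtime on a call that type-checks cleanly.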

mikeshardmind · November 25, 2024, 10:40pm

sigh right, that awful lie does apply here.

oscarbenjamin · November 25, 2024, 11:34pm

This behaviour from type checkers is very unfortunate. It is true that bool is a literal subclass of int but you would very rarely want that when using a type checker. On the other hand int is not a subclass of float but the very first typing PEP mandated that the type checkers must accept it:

def func(x: float):
    return x.hex()

func(2.0)
func(2) # AttributeError

In pyright there is the capability to distinguish these because when possible it tracks the real type of the object regardless of the annotation:

a: float = 1
reveal_type(a) # pyright: Literal[1] mypy: float

It can also distinguish the real types for type narrowing:

def func(x: complex | str):
    if isinstance(x, complex):
        return 1
    else:
        reveal_type(x) # float | int | str
        # mypy says: str

The fact that the runtime often allows mixing int and float is precisely why it would be useful for a type checker to distinguish them: type errors could be detected that unit testing would miss.

If I was going to have the type system diverge from the runtime here then it would be to say that bool is not a subtype of int but instead we have gone the other less useful way:

z: complex = True

hauntsaninja · November 26, 2024, 12:06am

There are a couple different things going on in this thread. From a mypy maintainer’s perspective:

- mypy’s behaviour on Dr-Irv’s original example looks pretty defensible
  - In particular, I’m not sure I see the consistency argument between this and what the check should do when inferring the type of a literal list, since there is an explicit type hint involved here
- mypy’s behaviour regarding narrowing of declared symbol types on initial assignment is likely to change at some point, see Types not added to binder on initial assignment · Issue #2008 · python/mypy · GitHub. I think the current mypy behaviour has shown to be not particularly intuitive, since mypy will of course narrow on subsequent assignments
  - There’s an old typing-sig thread about this where we went into this in some detail
- I would accept a flag to mypy that let users no longer treat bool as a subtype of int. This comes up pretty regularly and would help users find real bugs in their code. I would be opposed to including this by default or as part of --strict, unless there was ecosystem or Typing Council consensus. See Consider adding `strict-bool` mode · Issue #8363 · python/mypy · GitHub
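
As an illustration of the kind of bug such a flag could surface (a hypothetical sketch; the `repeat` helper is invented for this example and is not taken from the linked issue): because bool is a subtype of int, a boolean passed where a count is expected type-checks today and also runs without complaint.

```python
def repeat(text: str, times: int) -> str:
    """Hypothetical helper: repeat `text` `times` times."""
    return text * times


# A caller mixes up a flag and a count. Type checkers accept this
# because bool is a subtype of int, and the runtime accepts it
# because True behaves as 1 and False behaves as 0.
once = repeat("ab", True)
nothing = repeat("ab", False)
```

Neither call raises, so only a stricter treatment of bool in the type checker would point at the mistake.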

oscarbenjamin · November 26, 2024, 1:03am

I don’t think that anyone has objected to it.

What mypy does here is fine but the question in this thread is whether pyright should be expected to match. There is a type annotation but only on one object and this discussion is about the assumed type of another object that has no annotation:

a: B = C()
l = [a]
reveal_type(l)

There is no type annotation for l and in general no way for a type checker to know what the type should be since it is a mutable container: the type is determined by what someone might like to append/insert in future.
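
A minimal sketch of why the element type of a mutable container matters (illustrative; B and C are assumed to be a base class and its subclass, as in the other examples in this thread, and the list names are made up). Since list is invariant, whichever element type is chosen decides which later append calls a type checker accepts:

```python
class B:
    pass


class C(B):
    pass


a: B = C()

# With an explicit list[B] annotation, appending a plain B is fine.
as_list_of_b: list[B] = [a]
as_list_of_b.append(B())

# With list[C], the same append would be rejected by a type checker,
# even though it would run without error.
as_list_of_c: list[C] = [C()]
# as_list_of_c.append(B())  # type checker error: B is not a C
```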

In this thread some feel that the type should be assumed (I avoid the word “inferred” since this cannot be inferred) as list[B] but pyright chooses list[C] if not given an explicit annotation.

mikeshardmind · November 26, 2024, 1:28am

I think we can here. One inference choice results in a type checker calling a valid program an invalid program, and the other results in it recognizing it as a valid one. The correct one also uses the type the developer provided.

I think we can easily fix this case of differing behavior with a single line addition to the specification about inference. “When choosing between multiple inferred types, if a developer has provided an explicit annotation that should apply to the value in question, and that type is valid for that value, use the developer’s provided annotation, otherwise error at the site of inference and reference the annotation’s conflict”

This uses what the developer has stated intent for, and refuses to assume they are wrong without informing them that is what is happening, giving them the opportunity to choose the correct fix.

oscarbenjamin · November 26, 2024, 2:00am

This is a dangerous approach for a type checker: reinterpret everything until one possible interpretation might be correct. The goal of the type checker is to distinguish valid and invalid programs so presuming validity as part of interpreting the program in order to then judge its validity is problematic.

mikeshardmind · November 26, 2024, 2:02am

How is that phrasing problematic? If it is safe to assign a value to a symbol with that annotation, it’s safe to say that’s the type; if it isn’t, error there. I’m not saying to presume the user is correct to the point of allowing something invalid. I’m saying that in the case of multiple interpretations, one of which has an annotation, use the annotation, and error if the annotation would be an error.

mikeshardmind · November 26, 2024, 2:11am

class B:
    pass


class C(B):
    pass


# Interpret this as B from here on out. Using things that exist on C, but not B,
# should be an error, because you've specified this is B, and the value you've
# specified is consistent with B.
b: B = C()
a = [b]  # this is now list[B]

This is much better than telling people they have to write something like

for val in list[Literal[...]](gen_expr_here):
    ...

and it’s clearly safe, because you can avoid all of this with what I pointed out earlier:

def make_b() -> B:
    return C()

b: B = make_b()
a = [b]
reveal_type(a)  # mypy list[B]  pyright list[B]

The only issue here is pyright intentionally ignoring an annotation to use something other than what the developer explicitly specified.

oscarbenjamin · November 26, 2024, 2:30am

It doesn’t ignore the annotation because it uses it to constrain assignment. What I don’t quite understand here is why there is a need to write:

b: B = C()
a = [b]

rather than

b = C()
a: list[B] = [b]

The former case adds a hint for no reason when the type is easily inferred. The latter case has a clear purpose: it says what types can go in the list which is something that a type checker cannot know otherwise because it doesn’t know what types we might want to add in future. If the intention was to annotate b in order to constrain a then the annotation is in the wrong place.

mikeshardmind · November 26, 2024, 2:39am

Because that was the minimal reproduction of the actual issue from the beginning of the thread, where the annotation would have to go somewhere that doesn’t match how people naturally write valid code.

from typing import Literal, TypeAlias

Days: TypeAlias = Literal["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def myfunc(y: Days):
    print(y)

def iterate_this(daylist: list[Days], adict: dict[Days, Days]):
    for x in set([adict[x] for x in daylist]):
        myfunc(x)

Note that here, there’s nowhere obvious to place the annotation. pyright is ignoring the type of what is going in, and there’s not even a non-literal local string construction happening here; it’s a literal when passed to this function. (Why would a user think a redundant annotation on an intermediate variable would change the outcome here?!)

I’d have personally written it as a set comprehension, not a set call on a list comprehension, but the point remains in that there isn’t always a natural place to place an additional annotation just to get pyright to respect it for what pyright sees as an intermediate step.

pyright would require instead of writing:

def iterate_this(daylist: list[Days], adict: dict[Days, Days]):
    for x in {adict[x] for x in daylist}:
        myfunc(x)

to write:

def iterate_this(daylist: list[Days], adict: dict[Days, Days]):
    pyright_required: set[Days] = {adict[x] for x in daylist}
    for x in pyright_required:
        myfunc(x)

This is obviously user-hostile, and it perpetuates people finding typing to be annoying rather than non-intrusively helpful.

This gets even worse with laziness and generator expressions, as you then need to import Generator or Iterator and create an intermediate variable for a generator expression, just to get pyright to use the annotation that’s already there.

rsdenijs · November 26, 2024, 7:33am

This argument does not make sense to me. In

b: B = C()
a = [b]

There is no right or wrong regarding the type of a, as the question is underdetermined given the available information. The developer did not specify any intention regarding the type of a; they only specified its contents. List[Any] would also be a valid inference.

Liz · November 26, 2024, 10:51am

I would say explicitly creating a homogeneous container from an annotated value is extremely specific on intent, especially when the rest of the program is only valid if that is the type of the values in the container.

Substituting Any here would be a terrible idea. Is your point that we should specify this better so that inference behaves in the obvious way and is specified?

If not, should users have to manually annotate every subexpression, even those that involve situations where annotations would not be syntactically legal?

What’s the point you are trying to make? Saying it’s unspecified, so a developer should expect the type checker to pick anything, isn’t useful or constructive given the issues this causes for normal code.

A type checker that errors with valid code that it has all the information it needs to know is valid is also not useful to users.