Well said. I think this is the most important aspect of this discussion to me, and why I’m not a fan of pyright’s behavior in this case, and why I brought it up in the first place.
You’ve arrived at the same problem here, just from the other end of it. With the behavior you describe, the resulting set from the example now allows adding non-Days items, so for the idiomatic use of a homogeneous container type, this still requires splitting things up to prevent that from being possible.
I don’t think it should be problematic or require ever-expanding rules here. The default behavior for generator expressions, comprehensions, and generics in general should be to use the currently known type while respecting annotations. Pyright currently fails by not respecting annotations here, and would otherwise produce the expected inference.
This is a consistent rule: when people have odd cases, they can add information, but the default behavior matches the most common idiomatic use, and it would be documented so it doesn’t later become a source of problems.
But the entire reason “requires splitting” is an issue is that the set is ephemeral in this case, making it impossible to add anything else to it. And if you did add something else to it right in the same expression, that would also be a clear indication of user intent that the set can contain other types.
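As an illustrative sketch of that last point (the `Days` alias and `label_days` name are hypothetical, not from the thread): when the wider element appears in the same set display, that is exactly the in-expression signal of intent described above, and a checker can reasonably infer the wider element type:

```python
from typing import Literal

Days = Literal["mon", "tue", "wed"]

def label_days(days: list[Days]) -> set[str]:
    # Adding "all" in the same set display signals that this set is
    # meant to hold more than just Days values, so inferring set[str]
    # (or set[Days | str]) here matches the user's visible intent.
    return {*days, "all"}
```

For example, `label_days(["mon"])` evaluates to `{"mon", "all"}` at runtime.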
Can you provide a realistic code example where a) it’s necessary to prevent “something else” being added to the container, but b) you still think the right inferred type is obvious?
```python
my_set = {1, 2, 3}
if some_condition:
    my_set.add(4)
elif some_other_condition:
    my_set.add("4")  # oops, type error mistake
if 4 in my_set:
    ...  # frustrated bug hunting: why isn't this working?
```
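To make that failure mode concrete, here is a hedged sketch (condition names assumed) of what an explicit annotation buys: with `my_set: set[int]`, a type checker can flag the `add("4")` call statically, whereas without it the mistake only shows up at runtime as a membership test quietly failing:

```python
my_set: set[int] = {1, 2, 3}
some_other_condition = True

if some_other_condition:
    my_set.add("4")  # with the annotation, a checker flags this line

# At runtime the mistake is silent: the string "4" is in the set,
# but the integer 4 is not, which is the frustrating bug above.
assert "4" in my_set
assert 4 not in my_set
```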
False negatives aren’t any better. They’re worse, because users don’t see them until they cause a runtime issue.
If we’re going to keep this this way, can we just allow a comment like this:
```python
b: list[B] = ...
c: list[C] = ...
merged: list[B] = ...
for val in (*b, *c):  # type: val as B
    log.info(val)
    merged.append(val)
```
so that people don’t have to write
```python
from typing import Generator

b: list[B] = ...
c: list[C] = ...
merged: list[B] = ...
gen: Generator[B, None, None] = (val for val in (*b, *c))
for val in gen:
    log.info(val)
    merged.append(val)
```
to get the inference they want?
Inferring for total consistency rather than immediate use would be good, but it requires program-flow inference to be fully correct when generalized to expressions that are not obviously temporary. @mikeshardmind and I have each brought this up before, only to have everyone from Jelle to Eric to Guido shoot down this approach to getting more soundness with fewer false positives, with a couple of them even deriding its academic roots. So I’d rather settle for a directive comment that tells the type checker what type is expected and requires a strict match, than trust that this isn’t a long-term pipe dream that will never be implemented.
It’s also possible for something like a list subclass to override `__new__` and reuse what “appears to be” a temporary iterable, because there are ways to communicate ownership.
What principle would you want a type checker to use for inferring the type of my_set in this example?
I think the implied heuristic here (“don’t infer a literal type for a type parameter”) is precisely the same heuristic that everyone in this thread is complaining about pyright applying.
If the intent is that only integers can be added to the set, `my_set: Set[int] = ...` makes that explicit, rather than relying on inference heuristics.
In the case where there are no type annotations, I think pyright is doing just fine as it is.
The issue is when there’s a type annotation.
Literally all the issues people have raised in this thread involve pyright explicitly ignoring an annotation in favor of its own inference. How does adding an annotation help if type checkers are allowed to ignore it?
```python
a: Literal[1] = 1
my_set = {a}
my_set.add(4)  # Here, I would like a type error reported, but pyright doesn't give me one.
```
I don’t think the initial initialization of a container should be privileged in that way.
```python
a: Literal[1] = 1
b: Literal[2] = 2
my_set = {a}
if condition:
    my_set.add(b)  # I think a type error here would be a false positive, due to too much guessing
```
The false negative, if that line is an error, is worse than the false positive if it isn’t.
A false negative can mean the server crashes in production.
A false positive is a minor inconvenience to deal with.
Someone can respond to a false positive to add a type annotation to my_set.
Someone can’t respond to a false negative, because there’s nothing to respond to until it’s too late.
The accumulated impact of lots of false positives is that people feel typing is difficult to adopt and type checkers don’t understand their (perfectly working) code, and in the end, less use of typing. I think false positives are a serious problem. The logical conclusion of “false positives are always better than false negatives” is an unusable type system.
I don’t think there’s any false negative here, because the user didn’t request any enforcement of the type of the container, so the best indication we have to go on is what they actually added to it.
Are you willing to commit to all type checkers needing to check all uses of mutable (actually, all generic ones, not just mutable) containers for total consistency? Anything short of that, and this is a case that sure seems like it should err on the side of what is annotated here, because the error won’t happen in the false negative case until use violates this.
I can’t commit to anything on behalf of all type checkers, but I do think that a (hypothetical) type checker that works in the way I suggested above, should accurately incorporate all observed constraints on the type parameters, in all code flow paths.
And that’s where the problem here lies. People need this to work across type checkers because it causes problems otherwise. If this isn’t something we can specify, then we need something we can specify that does fix this. The answer can’t be predicated on “just annotate more”, because the issues here involve a type checker ignoring those.
I don’t think that pyright is ignoring the annotations. The issue here is about the types that are assumed for objects/variables that don’t have annotations. There won’t be false negatives by the time you return from a function because of the return annotation:
```python
from typing import Literal

def func() -> set[Literal[1, 2]]:
    a = {1}
    return a
```
Likewise if you have a persistent mutable container somewhere then you should set an explicit annotation on that:
```python
class A:
    stuff: set[Literal[1, 2]]
```
Precisely what gets inferred for the mutable containers that are local/intermediate variables doesn’t matter except that we want the types to be checked for consistency and we want to minimise unnecessary constraints on the code as well as redundant annotations.
I think that pyright’s behaviour mostly does a good job with that but it gets it wrong in the OP case involving Literal. The issue as I see it is that this is because Literal is being used like an enum when it wasn’t really designed for that.
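For what it’s worth, here is a minimal sketch of the enum-style alternative being alluded to (the `Day` enum is hypothetical, and how any particular checker infers the container is of course the very question under discussion):

```python
from enum import Enum

class Day(Enum):
    MON = "mon"
    TUE = "tue"

# The idea: each member is a value of the Day class, so a set built
# from members is naturally understood as a set of Day, and adding
# another member later is unambiguously within the intended type.
my_set = {Day.MON}
my_set.add(Day.TUE)
```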
There were examples earlier in the thread where it does matter, and they showed that what happens when you work around it is exactly what we want to happen.
The original code uses Literal in a way supported by the specification. Your personal view that this should be an enum instead doesn’t change that this is a supported and specified use of Literal.
This doesn’t make any sense as a method for figuring out whether there are any false negatives.
Using this reasoning, it could be argued that there are ZERO false negatives from any type checker, because it’s reporting all the errors that go against the constraints that this type checker uses.
The standard for a false negative depends on whether the code does what the user wants it to, regardless of what is specified by any annotations.
There’s no way to judge from that example that there isn’t a false negative there.
A type checker is only useful if it finds problems that go against what the user wants. Simply finding what goes against some typing specification is useless without paying attention to what the user wants.
The typing system is a tool, not an ideal standard or end-goal.
I gave an example based on real-world situations that I’ve come across many times, where the program doesn’t do what the user wants.
Your proposed system wouldn’t find the problem that goes against what users want.
Saying it “accurately incorporates all observed constraints on the type parameters, in all code flow paths” is appealing to a useless standard.
“Accurately incorporating all observed constraints on the type parameters, in all code flow paths” is only useful if it finds the problems where the program doesn’t do what the user wants.
Here’s a stripped example of pyright ignoring an annotation without literals being in the picture, and that annotation then propagating further into a generator expression. Generator expressions are not usually possible to annotate without type-only gymnastics to create a temporary assignment, since most people writing a generator expression use it immediately in a loop.
```python
from typing import reveal_type

class A:
    pass

class B(A):
    pass

a: A = B()
b = B()

reveal_type(a)  # mypy sees A, pyright sees B
reveal_type(b)  # mypy and pyright both see B
reveal_type((a for _ in range(2)))  # Generator[T, None, None]; pyright solves T as B, mypy as A
reveal_type((b for _ in range(2)))  # same as above, both see B
```