Revisiting variance

I can create a different example based on what was provided there if it would help you, but you’ve zeroed in on the least important part of it. There’s an int where a Permissions is expected in an invariant generic. Add a print of x._items after the merge and a reveal_type of x and x._items, and remember that x is invariant.

I’d rather not wait on someone else tweaking my example to get to the point on this. 3.13 is around the corner, and this has been treated as “oh, it’s actually this other issue” back and forth for a while now.

If this still isn’t good enough, I’ll take the time to come up with yet another way it could occur; there are two additional ways it can happen here.

This time, with a str and int conflict.

Isn’t this just a badly implemented TypeIs function? The generic type claims its _merge_func accepts tuples of T, but the function never checks whether that is actually the case. From what I understood before, your issue is different from it just being hard (or in this case impossible, afaik) to correctly implement safe TypeIs functions, right? If I use the unsafe_typeis2 function in kaboom2, I also don’t get the same inferred type but MutabilityIsADistraction[Any], even with mypy.

I made a variant of your example that only uses covariant types and has, from my pov, the same behaviour in this playground. This does at least indicate that there are serious difficulties with actually writing correct TypeIs functions that also correctly cover subtypes. But from what I’ve understood, that is not your primary concern here?

1 Like

It’s possible to argue that for the first one, but not the second, which is implemented as Collection → Collection[str].

As one typing council member put it:

But we now have an example of such a TypeIs function (Collection, rather than Sequence, but there’s nothing significantly different, and it could be replicated with Sequence as well)

it’s there as

def unsafe_typeis2(collection: Collection) -> TypeIs[Collection[str]]:

but even matching to

def unsafe_typeis2(collection: Collection[object]) -> TypeIs[Collection[str]]:

the issue persists.
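For concreteness, the body of such a function presumably looks something like the element-wise check used later in this thread (a sketch of the likely shape, not the playground’s exact code):

from collections.abc import Collection
from typing import TypeIs  # typing_extensions on Python < 3.13

def unsafe_typeis2(collection: Collection[object]) -> TypeIs[Collection[str]]:
    # Passes vacuously for any empty collection, so it can only ever
    # observe the Collection-level API, never any subtype-specific
    # behaviour in which the element type also appears.
    return all(isinstance(item, str) for item in collection)

The vacuous pass on empty collections is what the WeirdCollection posts below exploit.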

It’s fundamentally not possible for this to be safe (despite the motivation in the PEP being for this to be safe and for negation to be possible) given the information erasure that’s possible, and this is before getting into the specification issues in the other thread about TypeIs, and type negation being specified without forbidding types that can’t be negated.

1 Like

T is invariant in this example due to the return type list[T]. I appreciate the additional examples, but this doesn’t remove variance from the equation here.

Ok, fair, I should’ve gone with tuple[T, ...] instead. With that, I’m pretty sure the type variable is covariant, since it only appears in covariant positions in return types. We could even change it to just be T and only return self.special. But the issue is still exactly the same.

How is this ultimately not an issue with the TypeIs function just not being correct at runtime? It claims that the object is of a particular type when it isn’t.

1 Like

I don’t know why so many people are fighting to keep this even with its issues, when it’s already solvable without falsely implying safety:

from collections.abc import Sequence
from typing import TypeIs, cast  # TypeIs via typing_extensions on < 3.13

def narrow(s: Sequence[object]) -> TypeIs[Sequence[int]]:
    return all(isinstance(x, int) for x in s)


def outer_one(x: Sequence[int | str]):
    if narrow(x):
        return sum(x)

def outer_two(x: Sequence[int | str]):
    if all(isinstance(e, int) for e in x):
        return sum(cast("Sequence[int]", x))

When I tried your case and made the modifications to tuple, I’m getting a narrowing of the inner type to Any, which Jelle called a bug when it happened in pyright,

and which doesn’t happen in @mikeshardmind’s. This seems to suggest there’s something else going on here; maybe it’s broken in another way.

However, even if we can say that it’s not a variance issue after all of the bugs shake out, that just changes it from “variance erasure makes TypeIs unsafe” to “information erasure makes TypeIs unsafe”, and that leaves the realm of it even being possible to make TypeIs match its promises, and should absolutely result in TypeIs being removed.

If it’s just variance, we can do something in the short term that will help and not require a full removal or a snap decision; you could require all uses of TypeIs to be of the form:

def safer(instance: G[T1]) -> TypeIs[G[T2]]:
    if type(instance) is G:
        ...  # rest of implementation
    return False

require that G is the same in input and output and is matched exactly in the body, and that all of G, T1, and T2 are runtime-valid, non-gradual, non-protocol types.
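As a concrete (and entirely made-up) instance of that pattern, a sketch with a hypothetical covariant Box class, rather than anything from the thread’s examples:

from typing import TypeIs  # typing_extensions on Python < 3.13

class Box[T]:
    def __init__(self, item: T) -> None:
        self._item = item

    @property
    def item(self) -> T:  # T appears only in return position, so T is covariant
        return self._item

def is_str_box(instance: Box[object]) -> TypeIs[Box[str]]:
    # Exact-type match: any subclass of Box, which could attach extra
    # T-dependent API this check cannot see, is rejected outright.
    if type(instance) is Box:
        return isinstance(instance.item, str)
    return False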

Even that isn’t without issue; it leaves my objections about type negation from the other thread intact.

No, he just accidentally didn’t add the 2 to unsafe_typeis in the second outer function call. If you actually use unsafe_typeis2 in any of our example versions, the inferred type always has Any as the generic parameter. I can’t tell if this is intended behaviour or not. It might be that type checkers are assuming that the intersection of a concrete type and Any is just Any, or it might be a different bug.

I agree that this issue at least makes it clear that actually implementing a fully correct TypeIs function is significantly harder than it seems. Any type like Collection that exposes some API would reasonably have a TypeIs function use precisely that API to check the type, but this becomes unsafe when a subtype adds to that API in certain ways. In particular, there can be two objects that differ only in the behaviour of the subtype API, and that would make them elements of two different types, even though the entire behaviour of the parent type API is exactly the same between the two objects.

More concretely, in our example we can have two WeirdCollection objects that support the same Collection interface as an empty tuple, which means that they both are members of Collection[T] for arbitrary T. But since they differ in their behaviour on the WeirdCollection specific API, one of them is a member only of WeirdCollection[int] and the other only of WeirdCollection[str].
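To put that in code, a hedged reconstruction (the playground’s actual WeirdCollection differs in its details):

from collections.abc import Collection, Iterator

class WeirdCollection[T](Collection[T]):
    def __init__(self, special: T) -> None:
        self._special = special

    # The entire Collection-level API behaves like an empty collection,
    # identically for every T:
    def __len__(self) -> int:
        return 0

    def __iter__(self) -> Iterator[T]:
        return iter(())

    def __contains__(self, item: object) -> bool:
        return False

    # Subtype-specific API, where the element type actually matters:
    def special(self) -> T:
        return self._special

# Indistinguishable through the Collection API (both look empty), so an
# element-wise TypeIs check treats them identically, yet one is only a
# WeirdCollection[int] and the other only a WeirdCollection[str]:
a = WeirdCollection(0)
b = WeirdCollection("s")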

From what I can tell, the initial idea of this thread was that this difference in behaviour across different subtype APIs could only occur if the variances of the type variables are different. But it seems like just matching variance isn’t enough. What we’d need is to make it so that a type checker interprets the intersection of WeirdCollection[Any] and Collection[str] to mean that the Collection API methods will use str in place of the type variable, but WeirdCollection methods still use Any.

I’m not sure if that actually guarantees soundness in all cases, if it’s feasible to implement, or if it’s even a well-defined idea in general. But I do think that this is the direction a potential change to type narrowing (or the type system behaviour in general) should go in, rather than focusing on variance.

1 Like

The presence of Any in both of these examples contradicts the accepted PEP, which states:

Formally, type NP should be narrowed to A∧R, the intersection of A and R, and type NN should be narrowed to A∧¬R, the intersection of A and the complement of R. In practice, the theoretic types for strict type guards cannot be expressed precisely in the Python type system. Type checkers should fall back on practical approximations of these types. As a rule of thumb, a type checker should use the same type narrowing logic – and get results that are consistent with – its handling of isinstance(). This guidance allows for changes and improvements if the type system is extended in the future.

Following the rule of isinstance, you can’t have Any show up there; at its most unknown, it would be “str or one of its subtypes”.

Following the set theory instead, you still don’t get Any, and believe me, we spent months arguing this one while working on intersections. Multiple typing council members, as well as multiple people knowledgeable in type theory both in Python and in other languages, came to agree to something that approximates as “Any & T isn’t reducible to any singular type, but has an Unknown upper bound and a known lower bound of T”. (Not all of the terminology there even reached agreement, and much time was spent trying to collapse it directly to either Any or T.)
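To make the isinstance rule concrete, a quick sketch (the exact reveal_type output varies a little by type checker):

from typing import Any, reveal_type

def f(x: Any) -> None:
    if isinstance(x, int):
        reveal_type(x)   # checkers report int here, not Any
        x.bit_length()   # safe: the int API is a known lower bound on what x supports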

I’m inclined to believe, based on working out a few more examples by hand to avoid any issues with type checkers having bugs, that there are issues with TypeIs here that go beyond variance.

With that said, just as the existence of a mutability issue didn’t mean there wasn’t also a variance issue, this doesn’t make the variance issue stop being an issue in its own right, just a less pressing one[1] if we agree that TypeIs is broken enough in other ways to justify removing it before it becomes something people rely on.


  1. as there would no longer be a way for the variance-specific portion of the higher-kinded types and functions issue to surface, and it would give us more time to work through a less disruptive solution for it. If we can find a way to make the variance problem irrelevant until someone is ready to tackle HKTs, it would save a lot of headaches and delay having many more contentious arguments. ↩︎

The ship has already sailed for TypeIs. Stub libraries are already using it and developers are relying on it (via typing_extensions), so I think it’s too late to remove it from the type system even if we were convinced that it was broken in some way. The time to raise objections about a new type feature is prior to its acceptance.

I’m still unconvinced that TypeIs introduces any new issues. As @Jelle notes above, there are ways to introduce similar issues with other type narrowing techniques. Here’s a simple example:

from collections.abc import Sequence

def make_float_list(v: Sequence[float]) -> list[float]:
    # Do not create a new list if v is already a list.
    return v if isinstance(v, list) else list(v)

int_list: list[int] = []
float_list = make_float_list(int_list)
float_list.append(1.0)

# Fails at runtime: int_list and float_list are the same list object,
# which now contains 1.0.
assert all(isinstance(x, int) for x in int_list)

This is a theoretical hole, but I agree with Jelle that it isn’t a problem in practice.

It was accepted for Python 3.13, which hasn’t been released yet, and the problems with it were not discovered immediately. I don’t think subverting the normal process by shipping features with issues prior to release via typing_extensions is healthy. In most cases it means that the moment a PEP is accepted, unlike non-typing features, there isn’t a beta window in which mistakes can be fixed or features removed. Many typing features exist in typing_extensions before they are even accepted, meaning that anyone interested would have to test every single feature, with an unknown deadline for when it might be accepted, to avoid missing the window to address issues with things that are accepted.

Short of it being removed, though, I don’t think any documentation should imply that this is safe in any way, which it currently does in multiple places, but that can be addressed elsewhere.

3 Likes

I offered my thoughts on some of the more recent examples over at Problems with TypeIs - #23 by carljm. The only other thing I’ll observe about those examples is that many of them use un-parameterized types like MutabilityIsADistraction or Collection, which are shorthand for MutabilityIsADistraction[Any] and Collection[Any]. As soon as you introduce Any, unsafety is to be expected. So a real hole in the type system should be demonstrable without using any Any types, explicit or implicit, in annotations (or via missing annotations).

Regarding safety and user preferences: complete safety/soundness is absolutely not a goal of the Python typing specification. For one thing, it’s a gradual type system which doesn’t integrate with the language to insert runtime casts around uses of the gradual type (Any) – this already means that soundness is completely out of reach.

But even in the absence of Any, soundness is not the only or overriding goal. The goal is to provide a type system that is usable by Python developers and successfully catches as many bugs as possible. Static analysis of a very dynamic language is inherently a tradeoff between false positives and false negatives. Aiming solely for soundness aims to reduce false negatives to zero, but unavoidably the result will be an increase in false positives on code that is actually correct at runtime (given invariants the static analysis isn’t smart enough to see.) The way this curve tends to work out in practice, eliminating those last few false negatives will generally have a massive cost in increased false positives (and the proposals in this thread around variance most certainly exemplify this.) Generally the actual result of this (as in the Pyre example above) is that users just get frustrated and choose to silence the type-checker, and the end result is less safe and catches fewer bugs than if the type checker had chosen to accept a few more false negatives in order to achieve many fewer false positives.

So the Python typing specification will always accept some tradeoffs where the cost in false positives is too high to justify eliminating a few false negatives, if they don’t occur that frequently.

Also, it is a goal of the Python typing specification that it actually describes the behavior of Python type checkers, so that it is useful for both type checker implementers and type system users, not that it describes an impractical extreme which no type checker with actual users is willing to implement and few users want to actually use. So there will always be natural push and pull between the type specification and the practical realities faced by type checkers, and that’s good; it makes the type checkers better (and more similar in their behavior), and it makes the type spec more useful.

9 Likes

I think the balance between false positives and false negatives should not be dictated by an official typing spec, but rather should be chosen by each user in their configuration of their tools.

2 Likes

I’ve needed some time to think about this, but I don’t think this is at all a fair characterization of my perspective, why I raised this issue, or even the underlying issue that raised it.

And in the related discussion, we have a concrete example that doesn’t rely on Any, and which shows that information erasure paired with unsafe narrowing is enough to be unsound.

My answers to this were to suggest putting TypeIs on hold until these issues could be fully evaluated, as it seemed all existing cases were already coverable with cast, and then when that was ruled out by typing council members, to continue exploring this to document where users could introduce unsound behavior without requiring specification changes or type checker behavior changes to accompany it.

I, and many others I have spoken with, find that this is the wrong approach and that Python’s type system should define the way types work in Python, not how a subset of current type checkers happen to implement type checking.[1]

The inversion to accommodate specific type checkers and match their behavior is unfortunate, but it’s the reality we appear to have, and I don’t see any willingness for people to just describe Python and let type checkers build upon that at varying levels of pedantry in what they raise errors for. This becomes especially apparent in certain situations with the typeshed, where incorrect types are intentionally accepted over correct ones to cover up issues in specific type checkers, and the answer has not uniformly been either to just use Any until it could be accurately typed or to have the type checker at issue handle the situation better.

I understand that getting from what we have to that ideal isn’t practical in the current situation, as the people who would need to buy into it seem uninterested, but it would be an exceptionally clear way forward that would remove the majority of the contention between the type system, type checkers, and the users finding it useful,[2] as more soundness would allow more things to be correctly inferred without needing a complex explicit annotation from users. Much of the currently required type gymnastics is to satisfy type checkers that aren’t smart enough to do more on their own, or, in some cases I’m aware of, that have specifically avoided doing more on their own because that would deviate from the specification, even when it would be sound.

With that non-interest from a significant majority in mind, it has not been my goal to have absolute soundness in the type system, but to have clarity about where there is and is not soundness and which places introduce unsound behavior. This kind of reference is useful to several invested categories of people interacting with typing, without requiring type checkers to change how they currently operate, and with only a note in documentation rather than new behavior required by the specification.


  1. right now, there’s only one type-checker that’s even compliant with the specification test suite, and the oldest type-checker doesn’t handle isinstance correctly, having an entire issue label for issues that interact with it. ↩︎

  2. This is also a path forward I’m working on at a slow, independent pace, also taking the view that the current specification has ossified around what current type checkers are capable of supporting, rather than what typing users want. ↩︎

3 Likes

Typing a language as dynamic as Python is inherently difficult, as the Pythonic conception of types often does not correspond to ready-made concepts prevalent in other less dynamic languages. Several peculiarities exist:

  1. As in gradual typing itself, things evolve over time and with use.
  2. Typing has its own ecosystem, but the release cadence of typing features is quite tightly coupled with CPython’s (though there are also backports).
  3. “Testing” (not in a strict sense of the word) in release is often the easiest way to go, given the particularities of typing in Python.

The typing spec going after what’s practical for type checkers to implement is a sensible idea, but it is not without risks. The boundary of “what’s practical” is constantly shifting as the consensus on idiomatic Python and the implementations of type checkers change. I think of typing in Python as somewhat of a living experiment: things like generics and variance are already well described both in theory and in statically typed languages, but are not easy to conceptualize in dynamic languages such as Python, requiring a lot of extra experimentation (not to mention language-specific design choices like str / Sequence[str] and float / float | int). As the language and typing evolve (potential additions such as algebraic types, immutability, etc.), we are sure to discover more of these typing rough edges.

Implementing variance in the current way may well have been a good practical idea, as it covers (however imperfectly and slightly wrongly, but usefully enough) the majority of current use cases, but it may have uncertain impacts down the line. Depending on the decided level of soundness, TypeIs might or might not warrant pulling from 3.13 (the ship has sailed, though it is not yet on the high seas).

Compared to the well-established parts of the language (syntax and API), typing is evolving at a much higher rate, and we should continue to encourage this kind of innovation whilst relieving the conflict between feature A, which seemed practical in 2025, and the upcoming feature B, which has become useful in 2027.


For smaller conflicts not at the scale of PEP 563 vs 649 :smiling_face_with_tear:

I think it is prudent to consider an on-ramp for typing features to receive more thorough evaluation after release but before being very widely employed. If this is at all possible, we can mark TypeIs (and other new features relying on the conception of variance as it is currently implemented) as provisional, and document the decisions so that users can adjust their expectations accordingly.

2 Likes

I’d like to echo this, but a little differently.

Right now, there’s no point in anyone trying anything in the type system if the existing type checkers don’t want to do it. There’s almost no point in even having multiple type-checkers if the specification says they all have to behave the same. And if they don’t all behave the same, then how does something like the typeshed function? Practically, it barely does.

So the specification seems to be targeted at the wrong thing. We don’t need a specification that describes type-checker behaviors, but one that describes Python; type-checkers can then use that to determine which rules would be most useful for their user base.

But we do also need one that says which features are considered mature enough to be used in the typeshed. The current specification isn’t good for that either, as only one available type-checker can handle everything tested for in the specification. This has in turn led to the typeshed having incorrect types on many occasions, with a reluctance to fix the types even when it is possible.

Without multiple levels of specification like this, it’s very hard to make meaningful progress. Everything is held back by staying compliant with type-checkers that don’t care about soundness at the language level.

Even additions like typing.Annotated and changes like PEP 649/749 strongly assume that annotations are valid Python expressions that are type hints within the current type system as specified, not whatever someone else’s custom checks implement, when previously (and currently) you could just use a future import to turn all of them into strings.
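A small illustration of that future import (NotDefinedAnywhere is a made-up name): with it, annotations are stored as strings and never evaluated, so they only need to be syntactically valid, not meaningful type expressions.

from __future__ import annotations

def f(x: NotDefinedAnywhere) -> None:
    pass

print(f.__annotations__["x"])  # prints the raw string 'NotDefinedAnywhere'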

I think exploring what isn’t sound in Python’s type system and marking those boundaries is as close to that as we can get right now, because it at least lets people interested in a sound type system be better informed about when they are leaving what the specification covers.

Right now, the way people work around this when they need more from a type system is by either ceasing to write Python or doing what basedmypy has done and implementing things the specification doesn’t support, deciding it doesn’t matter that there is no interoperability. It is self-selecting: people interested in type theory stop trying when issues end up treated this way.


There are also other problems with ignoring soundness: it results in people having misconceptions about how type theory works, who then contribute back to an audience that doesn’t care enough about theory to check for soundness issues thoroughly before accepting features.

And there are problems when people accept unsoundness without planning ahead. Decisions get made as if they exist only in the vacuum of now, but they have long-lasting impacts.

TypeIs has surfaced a few issues, but some of these issues aren’t because of TypeIs. The variance one brought up in this thread will prevent HKTs from ever having a sensible implementation. Fixing it doesn’t need to take the form of changing how variance is expressed in a way that would break most users just to enable HKTs. We could be deliberate, go the other way, rule out HKTs, and make type unification a fully decidable problem, which would allow any type checker to implement better inference using methods that rely on this, knowing the work would not be invalidated in the future.

The middle ground we exist in, where things are left unsound and future additions are left dangling rather than understanding why things are unsound and how they interact now and in the future, makes it hard to argue for even using what we do have to do better for users.


It would be accurate to say that I really don’t like seeing people’s concerns about soundness, and about the ability to improve the type system over time, dismissed by exaggerating their focus on soundness or downplaying the effects of unsoundness. It’s fine to accept some unsoundness, but I’d appreciate more deliberateness in doing so, in a way that isn’t dismissive of the concerns and complications but takes them as part of the decision-making process.

6 Likes