Whether private members should affect PEP 695's type-var variance

You can achieve this by declaring K and V manually with the variance you want. Auto variance should be safe by default; if you want to do unsafe things, you have to be explicit, and I don’t think that is too much to ask. PEP 695 was supposed to be a simplification for the most common cases; it was never intended to entirely replace manually specifying TypeVar variance, since that is not possible.
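For instance, a minimal sketch of the manual spelling, assuming covariance in V is the intent (the checker will then hold the rest of the implementation to that declaration):

from collections.abc import Callable
from typing import Generic, TypeVar

K = TypeVar("K")                        # invariant, the default
V_co = TypeVar("V_co", covariant=True)  # explicitly covariant

class LazyMap(Generic[K, V_co]):
    # __init__ is exempt from variance checks, so V_co may appear here
    def __init__(self, getter: Callable[[K], V_co], /) -> None: ...
    def __getitem__(self, key: K, /) -> V_co: ...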

That being said, I think it would be nice to extend TypeVars to allow combining auto variance with either covariance or contravariance, so that you would get an error if the auto variance disagrees with your declared variance, which you could then explicitly ignore to force your declared variance.

Then we could extend PEP 695 to set the covariant flag in addition to the auto-variance flag for any type vars with the suffix _co, and the contravariant flag in addition to auto variance for the suffix _contra. This would strike a good balance between being able to take advantage of PEP 695 syntax and type safety. The behavior for type vars without either of those suffixes would remain the same, although this would leave a gap in the specification for when you want to force invariance.
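A sketch of what that might look like; the assertion semantics are hypothetical, since today the suffix is just another name and the variance is always inferred:

from collections.abc import Callable

# Hypothetical semantics: the _co suffix would assert covariance on top of
# the inferred variance, instead of being just another name as it is today.
class Box[T_co]:
    __get: Callable[[], T_co]

    def __init__(self, get: Callable[[], T_co], /) -> None:
        self.__get = get

    def value(self) -> T_co:  # T_co only in return positions: inferred
        return self.__get()   # covariant, agreeing with the declared _co

    # A method like this would make the inferred variance invariant and
    # trip the hypothetical check against the declared _co suffix:
    # def replace(self, get: Callable[[], T_co]) -> None: ...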

A good idea.

from typing import Callable

# `V` in the next line is inferred as invariant by Pyright, which is our main concern in this example.
class LazyMap[K, V]:  
    __getter: Callable[[K], V]
    __cache: dict[K, V]

    def __init__(self, getter: Callable[[K], V], /) -> None:
        self.__getter = getter
        self.__cache = {}

    def __getitem__(self, key: K, /) -> V:
        try:
            value = self.__cache[key]
        except KeyError:
            value = self.__getter(key)
            self.__cache[key] = value
        return value


m0: LazyMap[object, int] = LazyMap(id)
m1: LazyMap[object, object] = m0  # Reported.

The last line gets reported because V is currently inferred as invariant and object doesn’t equal int. But what would be unsafe about that? Everything you get from the map is an int, which is also an object.

class A: ...
class B(A): ...
class C(B): ...

d0: dict[B, B]

d0 = {}
d1: dict[A, B] = d0 # Dangerous.
d1[A()] = B()
fake_b = next(iter(d0.keys()))  # Type inferred as `B`, but is actually `A`.
assert isinstance(fake_b, B)  # Boom.

d0 = {}
d2: dict[C, B] = d0  # Dangerous.
d0[B()] = B()
fake_c = next(iter(d2.keys()))  # Type inferred as `C`, but is actually `B`.
assert isinstance(fake_c, C)  # Boom.

d0 = {}
d3: dict[B, A] = d0  # Dangerous.
d3[B()] = A()
fake_b = next(iter(d0.values()))  # Type inferred as `B`, but is actually `A`.
assert isinstance(fake_b, B)  # Boom.

d0 = {}
d4: dict[B, C] = d0  # Dangerous.
d0[B()] = B()
fake_c = next(iter(d4.values()))  # Type inferred as `C`, but is actually `B`.
assert isinstance(fake_c, C)  # Boom.

Thanks, I understand now. Can’t you just explicitly declare V with the right variance, as @Daverball said here? You’re using dynamic features of Python here, and I don’t think it’s unreasonable to expect to have to be explicit.

At least that’s what I’ve always been told when I complain that it’s hard to annotate functions that accept duck-typed arguments correctly :grinning:

For dict, it’s members like __setitem__ that make V invariant instead of covariant and members like keys and items that make K invariant instead of contravariant. These members don’t exist on LazyMap in my example.
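For contrast, a minimal read-only sketch (a hypothetical protocol, not from this thread) where inference does land on the looser variances, because only __getitem__ is present:

from typing import Protocol

class ReadOnlyMapping[K, V](Protocol):
    # K appears only in parameter position and V only in return position,
    # so auto variance infers K as contravariant and V as covariant here.
    def __getitem__(self, key: K, /) -> V: ...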

The PEP 695 version of type vars currently doesn’t support explicit variance. There is already a proposal for it: Proposal: Optional Explicit covariance/contravariance for PEP 695 - Ideas - Discussions on Python.org.

This is explicitly unsafe due to the mutability.

If we extend your example by just three more lines, the issue is clear:

k = object()
m1[k] = object()
x = m0[k]  # x is an instance of object, but not an int, breaking expectations of m0

LazyMap doesn’t support __setitem__, so no.

sigh It doesn’t, but the point I’m getting at still applies, because the inner dict does. You can’t freely change that externally while reusing the type vars publicly.

A more detailed example that actually breaks it is incoming in a moment.

from typing import Callable

# `V` in the next line is inferred as invariant by Pyright, which is our main concern in this example.
class LazyMap[K, V]:  
    __getter: Callable[[K], V]
    __cache: dict[K, V]

    def __init__(self, getter: Callable[[K], V], /) -> None:
        self.__getter = getter
        self.__cache = {}

    def __getitem__(self, key: K, /) -> V:
        try:
            value = self.__cache[key]
        except KeyError:
            value = self.__getter(key)
            self.__cache[key] = value  # This line right here precludes it
        return value


m0: LazyMap[object, int] = LazyMap(id)
m1: LazyMap[object, object] = m0
# m1.__getitem__ internally calls self.__cache's __setitem__, whose V was bound as int

Your internal use of __setitem__ here prevents this from being provably safe. There’s some fun I could get into with this, but we’re going in circles. If you are using a dict as a mutable mapping, remember that dicts are invariant for a reason; it being internal doesn’t change that.

Yes, but nobody is forcing you to use PEP 695. You can still define TypeVars the old way with the variance you want. I agree it would be nice if you could declare variance with PEP 695 syntax, but it was never intended as a complete replacement for the old way to declare type variables, so you will just have to put up with sometimes still needing to declare a TypeVar manually.

You not wanting to do that is not enough of a reason to make auto variance less type safe.

Please show me how that would be less type safe.

from collections.abc import Callable
from functools import lru_cache


class LazyMap[K, V]:  
    __getter: Callable[[K], V]

    def __init__(self, getter: Callable[[K], V], /) -> None:
        self.__getter = lru_cache(getter)

    def __getitem__(self, key: K, /) -> V:
        return self.__getter(key)

m0: LazyMap[object, int] = LazyMap(id)
m1: LazyMap[object, object] = m0  # this is fine

FWIW, simply not using a dict in a way that attaches to your public types removes the invariance. You’re no longer telling the type checker that your internal type is invariant and depends on the public generics. It’s not that what you were doing is unsafe per se; it’s that you told the type checker you were using one thing, and it detected an incompatibility. “Fixing” that detection would require the type checker to know all possible ways the code will be used, which can’t happen, since static analysis doesn’t have the guarantee of a full code graph the way a compiler checking this would.

Even if it is type safe in this one narrow example, that still leaves all of your other examples, which weren’t type safe. Do you expect the type checker to perform a deep introspection of your entire implementation just to broaden the variance in case you don’t use a private member in an invariant way? Do you want to wait minutes for the type checker to do its job instead of seconds?

To put it another way: you have not demonstrated how the type checker is supposed to distinguish between the case where choosing the wrong variance would hide an error in the implementation and the case where it would be fine to choose wrong, because only a safe subset of operations has been performed.

The variable being private does not prevent the class’s implementation from performing unsafe operations.

In this example, can calling m1.__getitem__ ever actually get you into trouble? You can’t pass in a value of type V or anything that would let __getitem__ know it was being called as a LazyMap[object, object] instead, so there’s nothing you can do to make m0.__getitem__ return a value of the wrong type (unless you directly access the private member m1.__cache).

So for example, would it be possible to add a new feature Private[] to be used as follows:

from typing import Callable, Private

class LazyMap[K, V]:  
    _getter: Private[Callable[[K], V]]
    _cache: Private[dict[K, V]]

    def __init__(self, getter: Callable[[K], V]) -> None:
        self._getter = getter
        self._cache = {}

    @Private
    def _update_cache(self, key: K, value: V) -> None:
        self._cache[key] = value # "Safe" use of private member bypassing variance

    def __getitem__(self, key: K, /) -> V:
        try:
            value = self._cache[key]
        except KeyError:
            value = self._getter(key)
            self._update_cache(key, value) # Safe call to private method
        return value

m0: LazyMap[object, int] = LazyMap(id)
m1: LazyMap[object, object] = m0
m1['foo'] # Safe: returns an int, which is a valid object
m1._cache['bar'] = object() # Unsafe: accessing private member
m1._update_cache('bar', object()) # Unsafe: calling private method

This would allow safe type checking for both library authors and library users without any hacks or workarounds, and it would solve the problem in this thread of correctly inferring variance (private members would be ignored when calculating variance). It would also let you validate your assumption that your type can be used safely with the variance you want: if you just force a particular variance with a TypeVar and then # type: ignore any associated warnings, you have to hope you reasoned about it correctly. But with Private the type checker can actually check these assumptions for you.

The only final question is: is this actually safe? Or is there still some way you can break m0 by accessing its public interface through m1? I think there’s actually a bug in both mypy and pyright, because the way I thought you might try breaking this is by adding:

    def update_getter(self, getter: Callable[[K], V]) -> None:
        self._getter = getter

Then you could do m1.update_getter(lambda x: x) to break m0. But I think the presence of this method should actually force V to be invariant (well, contravariant, but invariant when combined with the other definitions). To see why, try this class:

from typing import Callable, Generic, TypeVar

T_co = TypeVar('T_co', covariant=True)

class Wrapper(Generic[T_co]):
    def __init__(self, value: T_co):
        self._value = value

    def get(self) -> T_co:
        return self._value

    #def set(self, value: T_co) -> None:
    #    self._value = value

    def set_from(self, fn: Callable[[], T_co]) -> None:
        self._value = fn()

def f(w: Wrapper[object]):
    reveal_type(w.get())
    #w.set('foo')
    w.set_from(lambda: 'foo')

x: Wrapper[int] = Wrapper(1)
f(x)
print(x.get())

Both mypy and pyright report no errors here, despite the fact that x.get() returns the string 'foo'. However, if you uncomment the Wrapper.set() method, they both complain about T_co being used non-covariantly.

The reason this is not a solution, and also the reason why your last example does not raise any type errors even though it arguably should, is the same: it would require type checkers to check multiple levels deep, i.e. to look at the implementation of a method in addition to its signature in order to detect variance issues. While this is certainly possible, you can also take this dependency chain infinitely far; when do you want the type checker to stop following the chain?
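To illustrate with a sketch (the helper names are hypothetical, and it assumes the proposed semantics where private declarations are ignored): each private hop pushes the invariant use of V one call further from any public signature, so only body-level inference of unbounded depth would catch it:

from collections.abc import Callable

class DeepLazyMap[K, V]:
    __getter: Callable[[K], V]
    __cache: dict[K, V]

    def __init__(self, getter: Callable[[K], V], /) -> None:
        self.__getter = getter
        self.__cache = {}

    def __getitem__(self, key: K, /) -> V:  # public: V only in return position
        try:
            return self.__cache[key]
        except KeyError:
            return self.__level1(key)

    def __level1(self, key: K, /) -> V:  # one call deeper...
        return self.__level2(key)

    def __level2(self, key: K, /) -> V:  # ...and deeper still; only here does
        value = self.__getter(key)
        self.__cache[key] = value        # the invariant use of V occur
        return value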

It is not realistic to have that deep a bidirectional inference at reasonable speeds, so you would be sacrificing performance for the benefit of detecting a small number of false positives/negatives introduced through shallow inference. So in cases like this it’s up to us to spot these problems and change the variance of the type vars accordingly. That is the price you pay for getting near-instant feedback from mypy.

Other type checkers in the future may make a different trade-off and perform more deep inference at the cost of speed.

In that very narrow example, and assuming nothing else ever touches it, no. However, there were several other examples earlier in this thread that did experience an issue, and at least one of them wouldn’t have been fixed by marking things private for the type checker, because the issue occurred inside the class’s own methods.

I used functools.lru_cache to demonstrate decoupling the exposed type, but I also previously discussed other options, like having the internal cache be typed as dict[Any, Any]. The entire problem was that the details of an internal type became part of the public type, by having it use the public generic type variables.
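For reference, a minimal sketch of that dict[Any, Any] option; the trade-off is the same as with lru_cache, in that the cache is decoupled from the public K and V, so the checker no longer verifies it against them:

from collections.abc import Callable
from typing import Any

class LazyMap[K, V]:
    __getter: Callable[[K], V]
    __cache: dict[Any, Any]  # decoupled from K and V on purpose

    def __init__(self, getter: Callable[[K], V], /) -> None:
        self.__getter = getter
        self.__cache = {}

    def __getitem__(self, key: K, /) -> V:
        try:
            value: V = self.__cache[key]  # Any -> V: unchecked, on us to keep true
        except KeyError:
            value = self.__getter(key)
            self.__cache[key] = value
        return value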

However, there were several other examples earlier in this thread that did experience an issue, and at least one of them wouldn’t have been fixed by marking things private for the type checker, because the issue occurred inside the class’s own methods.

Are you referring to this snippet?

    def __getitem__(self, key: K, /) -> V:
        try:
            value = self.__cache[key]
        except KeyError:
            value = self.__getter(key)
            self.__cache[key] = value  # This line right here precludes it
        return value

In this case, I am claiming (or rather, I think it’s the case, but haven’t 100% convinced myself or proven it) that this is safe if you use the Private[] formulation I made above. Even if you call __getitem__ through a LazyMap[object, object] when it is in fact a LazyMap[object, int], __getitem__ has no way of knowing that you’ve done this. The only way it could know is if it could somehow access a value that’s statically typed as V but is actually an object rather than an int.

And my claim is that in order for it to access such a value, the signature for __getitem__ (or some other public member) would have to include V in a non-covariant way. If you gate all such members behind Private[], then anyone using your class correctly only through public members has no way to violate this. In particular, in the example above, value has type V, but it’s come from self.__cache, which means it must have the real runtime type of V (which would be int) and won’t somehow be object.

I do think this claim needs some scrutiny though - it “feels” valid to me, but I haven’t properly sat down to try to formally prove it.

I don’t see a good case in Python for type checking something differently based on the private/public convention. The language doesn’t prevent access to private members, and it’s possible to type this correctly without it. The fewer typing divergences from how the language behaves, the better.

The initial case in the thread presents a thorn: what happens when a class’s private variable marked this way comes from __init__ and is mutable?

I could come up with a set of rules where this could be done, but it’s an increase in complexity around variance, which is already a complex topic that people get wrong under the current rules, and the complexity wouldn’t be isolated to the type-checking side; it would change how developers need to reason about it as well. I’m not inclined to see that as improving the developer experience, and would lean instead towards better diagnostics from type checkers that explain what caused a specific variance, plus a reference doc on patterns for dealing with variance.