Whether private members should affect PEP 695's type-var variance

It’s not clear in PEP 695 whether private members of a class should be taken into account when the variance of the class’s type variables is inferred. Pyright’s current implementation (v1.1.336) uses all members, but I personally think private ones should be excluded. What are your thoughts?


For example –

class Translator[K, V]:
    __dictionary: dict[K, V]

    def __init__(self, dictionary: dict[K, V], /) -> None:
        self.__dictionary = dictionary

    def __getitem__(self, key: K, /) -> V:
        return self.__dictionary[key]

I imagined that K would be contravariant and V would be covariant, because K is only used as input and V is only used as output. But in fact, K and V are both inferred as invariant, because __dictionary is typed as dict[K, V] and both arguments of dict are invariant.

What’s worse, even after re-typing __dictionary as collections.abc.Mapping

from collections.abc import Mapping

class Translator[K, V]:
    __dictionary: Mapping[K, V]

    def __init__(self, dictionary: Mapping[K, V], /) -> None:
        self.__dictionary = dictionary

    def __getitem__(self, key: K, /) -> V:
        return self.__dictionary[key]

K is still inferred as invariant instead of contravariant, because collections.abc.Mapping has methods like keys and items which return the key type and so keep it invariant.

To make this sample work as I expected, I need to add a custom protocol, containing only the __getitem__ method that I actually use, to further restrict the type of __dictionary. But I don’t think that should be required.

There’s no such thing as a truly private member in Python. There are conventions, and with the double underscore, name mangling happens, which can prevent accidental overriding in subclasses, but the member can still be accessed from outside the class. Even if it weren’t accessible outside the class, this example would still need to be invariant: V being potentially mutable is enough to force invariance here.

This feels similar to the mutable keyword in C++, which lets classes mark variables that are internal only and may be mutated even in const methods. From the perspective of the public API, the const nature is maintained. Similarly, I think any type holes that involve accessing or manipulating a private member should not be considered for variance.

It’s common practice in stubs for private members not to be included at all; most typeshed stubs exclude internal members and APIs. If the typical way of writing a stub for a class yields covariance, the implementation should also be covariant.

Even if this were accepted (and I don’t think it should be; Python isn’t C++, and there are fewer guarantees about mutability and about private meaning private), there are problems with that approach. The public methods return things contained by the private container in the example above, not copies. If those are mutable, this extends the issue to the public-by-convention API. See the link in my prior post for a long discussion of why this is the case.

I don’t follow how your link applies here. Let’s do some concrete examples:

class Foo:
  x: float

class SubFoo(Foo):
  x: int

Here mutability is an issue: you can make a SubFoo, pass it to a function expecting Foo, and mutate x. But that relies on x being an attribute. You cannot corrupt a SubFoo/Foo object through a return type; overriding of return types is handled separately. Most of typeshed and existing type checkers’ current variance assumptions would be problematic if return types implied invariance.
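A minimal runnable sketch of that attribute-mutation hazard (the function name `widen` is my own, for illustration):

```python
class Foo:
    x: float

class SubFoo(Foo):
    x: int  # narrowing a mutable attribute in a subclass is unsafe

def widen(foo: Foo) -> None:
    foo.x = 1.5  # perfectly legal for a Foo

sub = SubFoo()
sub.x = 1
widen(sub)  # sub now holds a float in x, violating its own int annotation
assert sub.x == 1.5
```

This is exactly why mutable attributes force invariance: reads alone would allow covariance, writes alone contravariance, and having both allows neither.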

class Foo:
    def bar(self) -> float:
        ...

class SubFoo(Foo):
    def bar(self) -> int:
        ...

is safe. You cannot mutate a returned int. As for a mutable container like list: list being invariant already prevents you from overriding with a narrower element type. So I think,

class Translator[K, V]:
  def __init__(self, dictionary: Mapping[K, V], /) -> None:
    ...

  def __getitem__(self, key: K, /) -> V:
    ...

should have Translator with K contravariant and V covariant. Translator[int, int] should be compatible with Translator[int, float]. Translator[int, list[int]] is incompatible with Translator[int, list[float]], but that’s because list is already invariant; that covers the mutation concern for a return type. This is also what pyright currently infers, and it is how a stub for that interface would usually be written. There are likely generic classes in typeshed whose variance would change if you started filling in private attributes.

edit: If you disagree then I’d request concrete example of type unsafety of passing Translator[int, int] to function expecting Translator[int, float] with only stubbed api (equivalent to implementation api excluding private variables).

Here is a minimal mypy playground example to illustrate why you can’t just rely on the invariance of dict to get the correct behavior, even if the only place you use the dictionary is inside __init__: mypy Playground

Since the value type was explicitly made covariant a Translator[str, object] should accept a Translator[str, int] since that is a proper subtype. But if you operate on the type[Translator[str, object]] and try to create an instance of it you now violate the invariance of the value type on the dictionary that’s passed into the constructor.

And this is completely ignoring the fact that the implementation still needs to be able to catch errors on private members, which is the larger issue with this. The type checker cannot enforce that nobody changes the values/keys in the mapping unless you use a protocol that explicitly forbids that. I.e. using SupportsGetItem is the correct protocol to use in this case, if you know you never want to modify the items in the mapping and never want to retrieve keys. If you use anything that’s mutable on the keys or values then you have to end up with invariant type vars to ensure correctness.

Your example is unrelated to private effect on variance and also unrelated to dict.

from typing import Generic, TypeVar

T = TypeVar("T")

class Foo(Generic[T]):
    def __init__(self, x: T) -> None:
        ...

class IntFoo(Foo[int]):
    ...

a: type[Foo[object]] = IntFoo
a("str")  # accepted by type checkers, but constructs an IntFoo from a str

The underlying issue with your example is that __init__ is excluded from normal variance calculations. __new__ is also excluded. Both of these do create a type-safety hole, one that is intentionally accepted because not allowing a class like Foo to be covariant would make working with generics difficult. This exact hole, which relies on a type trick to produce unsafety, is mentioned here (1/2). Yes, this is a safety hole, but __init__/__new__ both break Liskov compatibility rules so commonly that it is a necessary pragmatic hole.

Yes, I am aware of that. But that’s not really a counter-argument against auto-variance doing the safest thing it can: if you want to do something unsafe, you should be forced to specify variance explicitly or use a protocol that matches the variance you want, regardless of the safety exceptions afforded to __init__/__new__.

As you’ve noticed, mypy does not emit any errors for this example, even though it is doing something unsafe, so it is possible to circumvent safety; but that should not be the default in something that is inferred automatically.

That being said I understand that your point was unrelated to what the auto variance in your minimized example is supposed to yield, since you could remove the __init__ entirely and now your point stands.

I still disagree with your argument that the convention of excluding private members from third-party stubs justifies private members having no effect on variance. That convention exists to keep third-party stubs easily maintainable; for inline annotations it is irrelevant, since you want to be notified about errors in your implementation when you use inline annotations. If you don’t care, then don’t use inline type hints, or omit type hints from the parts of the API where you don’t care.

You’re inferring the wrong connection, and I could have been more explicit about it. I’ve been trying not to dump the entire history of broken variance handling and its consequences into a thread where someone already knows the correct variance and is arguing for less type safety because they don’t want to comply with it.

Anything that implements __setitem__ and __getitem__ essentially needs to follow the same rules as mutable instance variables as they aren’t any different from a typing perspective.

The comparison to stubs is also inappropriate, as a correct stub for this would use invariant type vars. When you use a stub to break out of what is typed directly, less is checked for you; that doesn’t make the stub correct if you do the wrong thing and it can’t be checked for you.


If the example were rewritten as

from typing import Protocol

class MapperProtocol[K, V](Protocol):
    def __getitem__(self, key: K, /) -> V: ...

class Translator[K, V]:
    __dictionary: MapperProtocol[K, V]  # It's still a `dict` in runtime.

    def __init__(self, dictionary: dict[K, V], /) -> None:
        self.__dictionary = dictionary

    def __getitem__(self, key: K, /) -> V:
        return self.__dictionary[key]

as I said at the end of my OP, there would be no problem: K would be contravariant and V covariant.

My argument is that it should not require so much effort to get the job done. When the internal data is more than a simple dict with dict.__getitem__, the extra code for a suitable abstraction will probably be much longer and much harder to maintain. I’ve already had headaches writing a stable protocol with both dict.__getitem__ and dict.get on it, one which frequently breaks on upgrades of Pyright.

That example works because, by using a protocol without __setitem__ as the type hint, you’re promising the type checker that new items won’t be set, while also advertising to all consumers that setting items is not part of the object’s contract (and the checker will yell at you if it sees items being set). The invariance forced by having both __setitem__ and __getitem__ is no longer there.

The internal dict there would otherwise be invariant, and it shares the parent class’s type vars. Without the protocol structurally defining what’s allowed, with __setitem__ absent from it, the type checker doesn’t know __setitem__ won’t be used. (This isn’t a compiled language being checked at compile time; static analysis can’t know everything here.)

They may not be technically private, but they are treated as private.

As I just confirmed, neither Mypy nor Pyright supports accessing such members by their mangled names like “_Class__name”, so it should be safe to say such members will not be accessed outside their enclosing classes.

While __dictionary is private, how it is used doesn’t affect variance of K or V, even if __setitem__ gets called. Class type-vars are already treated as invariant for internal usages.

I’ve bolded the relevant section of what’s already been said. You are exposing the internals in your types. Either don’t do that, or deal with that you are the one who tied the variance together like that.

If the dict is truly an internal implementation detail that you don’t want checked, its type information shouldn’t be leaking out. You can leave it typed as dict[Any, Any] and rely on it being internal and only accessed as supported; Any coerces to the specific type implicitly just fine. You’ve specified that this is related to the exposed type and created the connection, so the type checker is correctly informing you of the issue.

You can’t remove checking this (as a blanket action on the type system as a whole) without breaking libraries that want their own internals checked too. In my own code and code that I maintain, I’d happily use the protocol if that was the intent because it means my own code is checked that the important invariants are upheld internally.

This version is checked that internal use actually conforms with the exposed intended use
from typing import Protocol

class SupportsGetItem[K, V](Protocol):
    def __getitem__(self, key: K, /) -> V: ...

class Translator[K, V]:
    __dictionary: SupportsGetItem[K, V]  # whatever is assigned to this, only using `__getitem__` is supported

    def __init__(self, dictionary: dict[K, V], /) -> None:
        self.__dictionary = dictionary

    def __getitem__(self, key: K, /) -> V:
        return self.__dictionary[key]
This version decouples the internal use from the public interface, but loses the ability for the type checker to enforce that the internal use matches the exposed types
from typing import Any

class Translator[K, V]:
    __dictionary: dict[Any, Any]

    def __init__(self, dictionary: dict[K, V], /) -> None:
        self.__dictionary = dictionary

    def __getitem__(self, key: K, /) -> V:
        return self.__dictionary[key]

Both options are available in the type system currently, and it’s not necessary to remove the ability for people to check things they want checked.

I’m still not sure whether you would agree that __dictionary in my example is private and will not get accessed outside implementation of Translator. If the answer is no, that should be the base point that we have a disagreement.

With all due respect, whether we agree on __dictionary being accessible outside and the meaning of private in python isn’t going to change the situation here, as libraries and their authors are type checker users too. The ability to have internals checked for consistency with the exposed API is a feature for many of those users, not a bug. The type system also gives you ways to choose how rigidly that API boundary is defined.

With that said, it can be accessed outside of the translator in at least two ways. One of them, as you pointed out, isn’t supported by some type checkers already (mypy and pyright aren’t the only type checkers in the ecosystem). I disagree with type checkers taking this stance, but it isn’t relevant to the overall answer here.

The other is that you are accepting a mutable reference in __init__. There is no guarantee the caller is not still doing other things with the same dict.

Taking a step back for a moment, I think there are two distinct issues here:

  1. What’s the correct behavior for type checkers?
  2. How do we make it easy for users to express what they want checked?

Right now, I believe the answer to the first one is “the type checker is correct for the case at hand, and there are x other ways to express it depending on your intent”

The latter has been a larger, recurring topic with a lot of concerns. Variance and API boundaries not being “easy” was definitely brought up there, and I think this is a real thing we need to address.

I’m not sure I agree that a protocol with a single method counts as much effort, but I have a feeling the effort here is less the protocol itself and more the process of determining that it was needed and why. If that’s accurate, I think a more productive way forward would be pushing type checkers to provide built-in explanations of what caused a specific variance to be inferred. This is a not-uncommon pain point in the type system, and even people with experience in it, or with similar yet subtly different languages, get surprised by its effects at times.

Sadly, it cannot always be worked around with protocols.

import copy

class Translator[K, V]:
    __dictionary: dict[K, V]

    def __init__(self, dictionary: dict[K, V], /) -> None:
        self.__dictionary = dictionary

    def __getitem__(self, key: K, /) -> V:
        value = self.__dictionary[key]
        value = copy.copy(value)
        self.__dictionary[key] = value
        return value

__dictionary is accessed as a mutable mapping internally, so its type can at best be abstracted as a mutable-mapping protocol with __getitem__ and __setitem__ on it, and V will still be inferred invariant instead of covariant, all other conditions remaining the same.

Why would you want to reassign the value in the dictionary if you always return a fresh copy? The assignment is redundant in this case, unless you want the reference count of the original object to decrease. As soon as the implementation sets values, it simply is no longer covariant; there are no ifs and buts about it. If you want to ignore type safety, you have to do it manually and at your own peril.

If your aim is to create a one time copy of the elements so you don’t keep a reference to the original objects there are other covariant protocols you can use, such as SupportsItems[K, V] and Iterable[tuple[K, V]].

Please don’t pay too much attention to method implementations as they are not important for type-var variances.

Here I’m giving a more practical example:

from typing import Callable

class LazyMap[K, V]:
    __getter: Callable[[K], V]
    __cache: dict[K, V]

    def __init__(self, getter: Callable[[K], V], /) -> None:
        self.__getter = getter
        self.__cache = {}

    def __getitem__(self, key: K, /) -> V:
        try:
            value = self.__cache[key]
        except KeyError:
            value = self.__getter(key)
            self.__cache[key] = value
        return value