Whether private members should affect PEP 695's type-var variance

mikeshardmind · November 20, 2023, 2:52pm

In that very narrow example, and assuming nothing else ever touches it, no. However, several earlier examples in this thread did run into an issue, and for at least one of them, marking the member as private for the type checker wouldn’t have fixed it, because the issue occurred inside the class’s own methods.

I used functools.lru_cache to demonstrate decoupling the exposed type, but also previously discussed other options, like typing the internal cache as dict[Any, Any]. The entire problem was that the details of an internal type became part of the public type, because the internal type used the public generic types.
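A minimal sketch of the dict[Any, Any] option, assuming a LazyMap class along the lines of the one discussed in this thread (the class body, `slow_len`, and `calls` are illustrative reconstructions, not anyone's actual code). The internal cache's annotation no longer mentions the public type parameters, so only the public `__getitem__` signature uses them. Old-style TypeVar declarations are used here so the sketch also runs before Python 3.12.

```python
from typing import Any, Callable, Generic, TypeVar

# Variance is declared explicitly here; with PEP 695 syntax it would be inferred.
K_contra = TypeVar("K_contra", contravariant=True)
V_co = TypeVar("V_co", covariant=True)


class LazyMap(Generic[K_contra, V_co]):
    """Hypothetical lazy mapping that computes values on first access."""

    def __init__(self, getter: Callable[[K_contra], V_co]) -> None:
        self._getter = getter
        # Typed with Any on purpose: the cache's type does not mention K or V.
        self._cache: dict[Any, Any] = {}

    def __getitem__(self, key: K_contra) -> V_co:
        try:
            return self._cache[key]
        except KeyError:
            value = self._getter(key)
            self._cache[key] = value
            return value


calls: list[str] = []


def slow_len(text: str) -> int:
    calls.append(text)  # record each real computation
    return len(text)


lengths = LazyMap(slow_len)
first = lengths["hello"]
second = lengths["hello"]  # served from the cache; the getter is not called again
```

The cost of this approach is that the type checker no longer verifies what goes into or comes out of the cache inside the class.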

QuantumTim · November 20, 2023, 3:19pm

However, several earlier examples in this thread did run into an issue, and for at least one of them, marking the member as private for the type checker wouldn’t have fixed it, because the issue occurred inside the class’s own methods.

Are you referring to this snippet?

    def __getitem__(self, key: K, /) -> V:
        try:
            value = self.__cache[key]
        except KeyError:
            value = self.__getter(key)
            self.__cache[key] = value  # This line right here precludes it
        return value

In this case, I am claiming (or rather, I think it’s the case, but haven’t 100% convinced myself / proven) that this is safe if you use the Private[] formulation I made above. Even if you call __getitem__ through a LazyMap[object, object] when it is in fact a LazyMap[object, int], __getitem__ has no way of knowing that you’ve done this. The only way it would be able to know is if it can somehow access a value that’s statically typed as V but is actually an object rather than an int.

And my claim is that in order for it to access such a value, the signature for __getitem__ (or some other public member) would have to include V in a non-covariant way. If you gate all such members behind Private[], then anyone using your class correctly only through public members has no way to violate this. In particular, in the example above, value has type V, but it’s come from self.__cache, which means it must have the real runtime type of V (which would be int) and won’t somehow be object.

I do think this claim needs some scrutiny though - it “feels” valid to me, but I haven’t properly sat down to try to formally prove it.
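To make the condition in that claim concrete, here is a hedged runtime sketch. It assumes a LazyMap like the one in the snippet above, plus a hypothetical public `put` method (not part of any example in the thread) that takes V as a parameter. `cast()` stands in for a type checker that wrongly treated V as covariant. The wrongly typed value only gets into the cache through `put`; values that come from the getter keep their real runtime type.

```python
from typing import Callable, Generic, TypeVar, cast

K = TypeVar("K")
V = TypeVar("V")


class LazyMap(Generic[K, V]):
    def __init__(self, getter: Callable[[K], V]) -> None:
        self._getter = getter
        self._cache: dict[K, V] = {}

    def __getitem__(self, key: K) -> V:
        try:
            return self._cache[key]
        except KeyError:
            value = self._cache[key] = self._getter(key)
            return value

    # Hypothetical public member that uses V in a parameter position.
    def put(self, key: K, value: V) -> None:
        self._cache[key] = value


ints: LazyMap[str, int] = LazyMap(len)
# cast() stands in for a checker that wrongly treated V as covariant.
widened = cast("LazyMap[str, object]", ints)
widened.put("a", "not an int")

leaked = ints["a"]       # statically an int, but a str at runtime
untouched = ints["abc"]  # computed by the getter, so a real int
```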

mikeshardmind · November 20, 2023, 3:30pm

I don’t see a good case in Python for type checking something differently based on the private/public naming convention. The language doesn’t prevent access to private members, and it’s possible to type this correctly without it. The fewer typing divergences from how the language behaves, the better.

The initial case in the thread presents a thorn: what happens when a class’s private variable marked this way comes from __init__ and is mutable?

I could come up with a set of rules under which this could be done, but it’s an increase in complexity around variance, which is already a complex topic that people get wrong with the current rules. It also wouldn’t be complexity isolated to the type checking side; it would change how developers need to reason about variance as well. I’m not inclined to see that as improving the developer experience, and would lean instead towards better diagnostics from type checkers that explain what caused a specific variance, plus a reference doc on patterns for dealing with variance.

QuantumTim · November 20, 2023, 3:33pm

Hmm, I’m not sure I understand what you mean by this:

Why would a type checker need to check the implementation in this case? I would imagine that if you have a generic class with something like this:

class Foo[T]:
    def bar(self, value: Something[T]) -> None:
        ...

Where Something[T] is covariant in T, then this should force Foo[T] to be contravariant in T. For example, if Something is actually nothing (ie replace Something[T] with T), then we have the normal rule that function parameters are contravariant. I shouldn’t need to look at the implementation of bar to know that it’s possible to break the variance rules and this is unsafe. If my implementation doesn’t do anything with value to access anything of type T then I think it should be possible to rewrite the function definition using Something[Any] instead.

For an even simpler example, consider:

from collections.abc import Callable

class Base:
    def foo(self, value: Callable[[], object]) -> None:
        pass

class Derived(Base):
    def foo(self, value: Callable[[], int]) -> None:
        pass

This is basically the same, just using concrete classes instead of generics. In this case mypy actually correctly detects a violation of LSP, but pyright still doesn’t. But the point is, it shouldn’t need to look at the implementation to figure this out.
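A runtime sketch of why narrowing the parameter this way is unsafe. The method bodies, the `str` return type, and the `use` helper are assumptions added for illustration; the original methods just `pass`.

```python
from typing import Callable


class Base:
    def foo(self, value: Callable[[], object]) -> str:
        return repr(value())


class Derived(Base):
    # Narrows the parameter type: a caller holding a Base may still pass
    # a callable that returns any object.
    def foo(self, value: Callable[[], int]) -> str:
        return str(value().bit_length())


def use(base: Base) -> str:
    # Valid against Base's signature.
    return base.foo(lambda: "text")


ok = use(Base())
try:
    use(Derived())
    failed = False
except AttributeError:
    failed = True  # str has no bit_length()
```

The call inside `use` is correct according to `Base.foo`, so the problem is visible from the two signatures alone.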

Daverball · November 20, 2023, 3:54pm

I’m talking specifically about your example with set_from. A callable is by default covariant in its return type, so in order to detect that T_co should actually be invariant (because you are assigning the result of the callable to self._value), you have to look at the implementation in addition to the signature. In the set case, it’s already obvious from the method signature that T_co should be invariant.

This is still relatively shallow, so a type checker could choose to make this work and the performance hit would probably be not that bad, so I welcome you to open issues for this false negative, but you can take this arbitrarily far through nested generics and complex expressions to construct examples that even with this one case addressed, would still yield the incorrect variance.

Edit: Thinking through it more carefully I think you are right that because the parameter itself should be contravariant, that should negate the covariance of Callable and make T_co invariant since it can’t both be covariant and contravariant, which could be detected without deeper inference. I do still get tripped up about the interaction between generic parameters and their bound type vars on occasion.
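A hedged reconstruction of the kind of set_from example being discussed (the original snippet is not shown here, so the `Box` class and its members are assumptions). `Callable[[], T_co]` appears in a parameter position, which flips the callable's covariance; treating `T_co` as covariant anyway lets a wrongly typed value in.

```python
from typing import Callable, Generic, TypeVar

T_co = TypeVar("T_co", covariant=True)


class Box(Generic[T_co]):
    """Hypothetical container declared covariant, incorrectly."""

    def __init__(self, value: T_co) -> None:
        self._value = value

    def get(self) -> T_co:
        return self._value

    # The callable is covariant in its return type, but it sits in a
    # parameter position, so T_co is effectively used contravariantly.
    def set_from(self, factory: Callable[[], T_co]) -> None:
        self._value = factory()


box: Box[int] = Box(1)
widened: Box[object] = box         # accepted if the covariant declaration is trusted
widened.set_from(lambda: "oops")   # fine for a Box[object]
value = box.get()                  # statically an int, but a str at runtime
```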

QuantumTim · November 20, 2023, 4:03pm

Do you mean by using dict[Any, Any]? This is okay for type-checking consumers of the class, but doesn’t allow (full) type-checking of the class itself, because the use of Any basically bypasses those checks.

So I would simply annotate the __dictionary variable as Private[]:

class Translator[K, V]:
    __dictionary: Private[dict[K, V]]

    def __init__(self, dictionary: dict[K, V], /) -> None:
        self.__dictionary = dictionary

    def __getitem__(self, key: K, /) -> V:
        return self.__dictionary[key]

And then I think there is no problem here. There is no way to get a value of the wrong type into the dictionary through Translator by the same logic as above - any method that would add such an incorrect value to the dictionary must somehow be able to access such an incorrect value, which it can’t do if the variance rules are followed:

    def breaks_dict(self, key: K, some_value: X) -> None:
        self.__dictionary[key] = some_function(some_value)

If this method existed, then some_function(some_value) would need to return a value of the wrong type - which can only happen if X is related to V in a covariant way (eg if X == V, or X == tuple[V], or X == Callable[[], V]). And the type signature of this method would then force V to be contravariant/invariant.

The benefits I can see (assuming the feature is sound) are:

- Library authors can explicitly mark parts of their code that a type checker should “prevent” outside access to (ie not list it when tab completing, flag an error if used anyway).
- Instead of having to force override the variance of a type variable, the type checker would actually have a way to check the validity of what a developer thinks is the correct variance. I agree that better diagnostics around type checkers would be good, but perhaps all a developer needs to do to “fix” their variance issues is mark a particular field as private.
  - As a developer I much prefer when I can teach the type checker what it needs to know to validate my assumptions, rather than override it with # type: ignore, since the latter might still be hiding a problem that I reasoned about incorrectly.
- Library authors will be able to both provide correct typing to users of their code and have type checking for the implementation itself.

And I don’t think the rules would be too complex:

- When determining the variance of a class, ignore Private members.
- Outside of a class definition, error on uses of Private members, and don’t include them in tab-completion.
- Inside of a class definition, type check and tab-complete Private members as normal.

QuantumTim · November 20, 2023, 4:30pm

Hehe, yes, I was just trying to think it through again myself and work it out. I think that, actually, any type used in a method argument should have its variance flipped: covariant becomes contravariant and contravariant becomes covariant. See the following:

from typing import Callable, Generic, TypeVar

T_co = TypeVar('T_co', covariant=True)
T_contra = TypeVar('T_contra', contravariant=True)

class TestCo(Generic[T_co]):
    def __init__(self, value: T_co): # Special case for __init__
        self._value = value

    def get(self) -> T_co:
        return self._value

    def pass_to_cb(self, cb: Callable[[T_co], None]) -> None: # Variance is flipped
        cb(self._value)

class TestContra(Generic[T_contra]):
    def __init__(self, cb: Callable[[T_contra], None]): # Special case for __init__
        self._cb = cb

    def test(self, value: T_contra) -> None:
        self._cb(value)

    def test_from_tuple(self, tup: tuple[T_contra]) -> None: # Variance is flipped
        self._cb(tup[0])

    def test_from_callable(self, getter: Callable[[], T_contra]) -> None: # Variance is flipped
        self._cb(getter())

def fn(x: object):
  print(x)

co0: TestCo[int] = TestCo(5)
co1: TestCo[object] = co0
co1.get()
co1.pass_to_cb(fn)

contra0: TestContra[object] = TestContra(fn)
contra1: TestContra[int] = contra0
contra1.test(1)
contra1.test_from_tuple((2,))
contra1.test_from_callable(lambda: 3)

Note: stuff like tuple[T_contra] in the argument list doesn’t mean tuple is somehow being used contravariantly - it means that TestContra[object] is a subtype of TestContra[int], and to make that work the former has test_from_tuple(self, tup: tuple[object]) and the latter has test_from_tuple(self, tup: tuple[int]). Which is correct - if I have a TestContra[int] I can only pass a tuple[int] to it, but the actual function might support more possible arguments (eg tuple[object]). It might even support object for all I know.

mikeshardmind · November 21, 2023, 2:38am

No. If you wanted to decouple the types that way you could, but there’s no need to.

Typing it as

class Translator[K, V]:
    __dictionary: dict[K, V]

    def __init__(self, dictionary: dict[K, V], /) -> None:
        self.__dictionary = dictionary

    def __getitem__(self, key: K, /) -> V:
        return self.__dictionary[key]

works fine today as invariant, and any other variance is incorrect, as can be surfaced by exploring what happens when that dict exists elsewhere still.

The inner dictionary needs to follow the variance rules for mutable mappings and other references to it may still exist.
Because of this, anything that goes into it and comes out of it must be precisely that type.
Because of this, K, V being invariant is correct.
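A small sketch of the “other references to it may still exist” point, using old-style TypeVars so it runs before Python 3.12. The `words` dictionary is made up for illustration. It only demonstrates the aliasing itself: the Translator and its creator share one mutable dict, so a change made through the outer reference is visible through the Translator.

```python
from typing import Generic, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class Translator(Generic[K, V]):
    def __init__(self, dictionary: dict[K, V], /) -> None:
        # No copy is made: the caller keeps a live reference to this dict.
        self.__dictionary = dictionary

    def __getitem__(self, key: K, /) -> V:
        return self.__dictionary[key]


words: dict[str, str] = {"hello": "bonjour"}
translator = Translator(words)

words["cat"] = "chat"      # mutate through the outer reference
seen = translator["cat"]   # the Translator observes the change
```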

There has been no benefit shown to trying to come up with narrow cases where we can reason something is safe when the categorical rules work. The thread starts with someone wanting to special case variance in private, but we can find various ways that this isn’t needed that work today without adding special casing based on conventions of privacy that aren’t enforced.

Using a Translator[int, str] where a Translator[object, object] is expected hasn’t been shown to be useful, and I would argue cannot be useful.

If you care about the type specifically but allow Translator[int, str], you wouldn’t be expecting a Translator[object, object]. If you just care that it’s a translator, you can just accept a Translator (implicitly Translator[Any, Any]). The underlying premise of changing this is itself flawed, because the problems presented here are not a matter of type checking being incorrect but of badly specified expectations. Teaching people to specify types in a way that feels more intuitive (and providing more intuitive tooling and better diagnostic feedback) is a much better goal for “fixing this” than having typing diverge further from the actual language.