Take 2: Rules for subclassing Any

mikeshardmind · March 10, 2024, 3:46am

I’m going to cut to the chase first: I think this is the best option in the thread so far but I have concerns with it and I wonder if there’s a better way that is reasonable to work on now, or if this is good enough for now, and we can revisit it if it becomes a problem later or if we make enough progress on the other things that are bigger issues.

My concerns with MRO reordering for this

I think there’s some danger in introducing another pattern where typing intentionally deviates from runtime, so I gave it some more thought. I was especially concerned that we needed to be careful to prevent wording that would preclude fixing this in the future if it does become a problem.

I think the wording you provided works in this regard, at least for all of the things I’m aware of people currently working on additions for the type system, though I’m concerned a bit here because I found your argument against this persuasive enough in the past to take the time to avoid specifying something which would cause type errors when substituting in Any. I understand the pragmatism you’re aiming at here as it’s similar to the one I aimed for before as well, but I’m not without concerns. Part of the reason the middle ground between treating this as “Any” and treating it as “only the known types” was “LSP compatible-Any” is that is the most type information we could retain, while the argument made would not apply.

Argument made, for context

Here’s an example that shows how option 5, as it’s currently formulated, violates a core principle of the type system.

In a gradual type system, Any is treated as a stand-in for “any type that could conceivably satisfy type compatibility requirements”. In other words, when you see Any, you should think “is there any type that I could substitute here that would make this work without a type violation?”. If your code is fully typed (i.e. no use of Any) and no type violations are reported by a type checker, you should be able to replace any type used in your code with Any, and it should still type check. This is a core principle of any gradual type system, and its a principle that the Python type system (and mypy and pyright) try to honor in all cases. If you change an existing type to Any and new type errors are reported, then there’s something very wrong.

https://github.com/CarliJoy/intersection_examples/issues/31#issuecomment-1866869547

the case of type[SomeType] vs type[Any] is an existing place where this argument is ignored, so it isn’t universal in python right now, but I think that decision too could be changed in the future to use a lower and upper bound, rather than a single type, see below about “LSP-compatible Any”

As for impact, this is the status quo and I haven’t seen evidence that people expect more than this. It came to my attention while constructing “how to teach intersections”, not through any real code so since my prior message, I made attempts to construct an example where it might matter, and for each one, there were multiple solutions available which seemed better than “change the behavior of existing type checkers for this example in particular”

After examining both the immediate and foreseeable future impact, I believe that this would be an acceptable and low-impact decision to go with for now, and until someone has a motivating case beyond theory, and that it is a way that can be changed later if this particular decision causes problems later on.

Further detailing LSP compatible Any, and type bounds

Maybe this should have been expressed better, but based on a couple of responses, I think a couple of people realized this was related to the set-theoretic ideas.

I don’t think an “LSP-compatible any” (either by the methods I suggested, or improving the definition of Any as @carljm provided) would be “just Any”, but we would need a better way to express lower and upper bounds on a type within the type system to have this be a user-expressible concept, and likely for this to be usable by language servers as well. A language server could choose to serve completions for only the known lower bound, and have a separate configurable warning if using something above a minimum known bound, but within what’s allowed by the upper bound.

If Any is constrainable, then having a way to spell upper and lower bounds comes into play.

This could re-resolve the prior decision on type[Any] such that type checkers see

x: type[Any] = ...

And surmise that the lower known bound for x is type (the runtime type, type), and the upper bound is Any. This particular example might be a little too magical with the overloaded meaning of type, lets look at one of the ones from the original post:

from untyped import Unknown  # standing for an untyped library

class SerialMixin:
    def serialize(self) -> bytes:
        ...

class AmbiguousExample(Unknown, SerialMixin):
    def foo(self) -> int:

x = AmbiguousExample()
reveal_type(x.foo)  # Lower bound: () -> int, Upper bound:  () -> int
reveal_type(x.serialize)  # Lower bound: () -> bytes, Upper bound: Callable[..., Any]
reveal_type(x.bar)  # Lower bound: Any, Upper bound: Any

Interestingly, this also gives us another way to arrive at preferring the known definitions, saying that it’s only currently considered safe to use the known lower bound.

While I think most of Python falls under set-theoretic typing in all but the official adoption of the model, the issues this presents for existing indirect users of typing if not done with other supporting type system features suggests a strong reason to defer any such change to use a definition closer to the set-theoretic model of typing until we have stronger motivations to do so, or until there is nothing else more pressing. Doing this would be a larger undertaking than I can personally justify at this point in time, even seeing the theoretical issues at play, as well as the theoretical future benefits.