Encoding mypy's "overlapping types" in the type system

cslamber · January 24, 2025, 1:28am

mypy’s strict_equality flag is very useful but does not extend to functions that use equality under the hood, e.g. __contains__ on builtin data types, this limitation already having been the subject of several discussions and PRs in typing[1]. I propose to encode the notion of overlapping types into the type system as typing.Overlapping[T]. Overlapping[T] would be treated as the union of all types that could potentially be compare equal to a value of type T (or at least couldn’t be outright rejected by the checker). This would allow for object.__eq__ to be type concretely as def __eq__(o: Overlapping[Self]) -> bool instead of having object as its argument, and similarly for standard library containers like list.__contains__ and set.__contains__. Since this would bring the behavior of object.__eq__ exactly inline with mypy’s strict_equality behavior I believe this would not have the same false positives as the previous discussions on stricter typing of __eq__ and __contains__. We would be able to retain compatibility with strict_equality=False by letting Overlapping[T] always be object.

I’m curious if there’s any interest, oppositions, or better solutions (or better names). I got burned by the lack of strict checking on __contains__ recently so I know there’s a use at least in my case. If there’s not any glaring reason this would be rejected I could try putting together an implementation.

[1]: I’d like to link more but new users can’t use more than 2 links (this includes markdown footnotes apparently).

carljm · January 25, 2025, 12:56am

Thanks for the idea. I’m interested in better ways to catch equality and containment tests that will never succeed. A few thoughts:

If the semantics are “types whose inhabitants could possibly compare equal to inhabitants of the type T”, then I think it would need a name that more clearly references equality. I would assume “overlap” to mean “are there any objects that inhabit both of these types”, not “can objects inhabiting these two types compare equal to each other.”

For an arbitrary user type C, how is a type checker supposed to know what types can compare equal to C? One possible place to put that information is in the type annotation of the first argument to __eq__, but if we have Overlapping[Self] in that spot, it seems a bit circular.

One issue with annotating the first argument of __eq__ to reflect “types for which I will not return NotImplemented” rather than object is that it makes the type wrong for type checking the internal implementation of the __eq__ method. An __eq__ method really does need to handle all objects, if only to return NotImplemented for many of them. If it were annotated to only accept the types it actually supports, type checkers would report the return NotImplemented branch as dead code.

ImogenBits · January 25, 2025, 5:01pm

I think the best solution for this kind of thing is what rust calls associated types. That is, some protocol uses a type alias in some of its methods, and then each implementation provides the concrete meaning for that alias. In some ways that’s similar to a type variable, but rather than it expressing that e.g. list[T] can be instantiated with any particular type for T, it says that every implementation of (making up syntax here) Equality.Overlapping has a single particular type for Overlapping.

Since we need to specify a unique value for these aliases for any given class, we could just use regular class attributes for them (perhaps guarded by TYPE_CHECKING if runtime overhead is a concern). Then both object and any other class would use__eq__(self, other: Self.Overlapping) and classes would define what their Overlapping type is in their body. Of course, with something as fundamental as the behaviour of dunders like this, actually implementing this would require generous defaults and not actually forbid the current style of annotations, etc.

Looking at just equality comparisons, this is effectively the same as the Overlapping[T] idea, but this would open up similar possibilities for other protocols. For example, a parser protocol could annotate the output type and then each implementation can fill it in with their particular class.

cslamber · January 26, 2025, 3:58pm

Thanks for the feedback!

My idea at least to start would be just to generalize the status quo, that is the implementation of strict_equality, which doesn’t do anything nontrivial with respect to whether __eq__ return NotImplemented, only checking whether those types may contain the same object. For the purpose of hash-based __contains__ at least I think this doesn’t lose too much generality since hashing necessarily can’t express very nontrivial equality.

Re: __eq__ should accept object, I forgot to consider that. With the semantics of Overlapping[T] being the “union of all types that overlap” that would contain object itself, and so this type would just be object. I think then the correct form would be: type checkers may implicitly convert a type S to Overlapping[T] if the checker cannot prove that there is no object in both S and T, but not if that type were already the result of an implicit conversion (e.g. to object), and Overlapping[T] has no special properties that meaningfully distinguish it from object, so functions that accept it as an argument have to correctly return NotImplemented.

I don’t want to consider anything with nontrivial return NotImplemented for now (maybe this is a bad idea) so I don’t think that associated types are necessary.

mikeshardmind · January 26, 2025, 4:41pm

We could just redefine how type checkers should treat annotations on __eq__ and other special dunders. ie. They should understand that the datamodel requires equality can be called with any object, but the annotation specifies which ones a bool return is expected for. This could also improve local behavior within __eq__ and others, as a type checker could then point out when you didn’t handle a type you intended to, and actually differentiate NotImplemented’s use.

Alternatively, if we’re concerned about that breaking users, this could be done with overloads (some types) → bool, (object) → NotImplemented, which is something that shouldnt break any existing use.

Neither of these are comprehensive solutions, a comprehensive solution would require the type checker be aware of all possible types in the program, due to potential equality checks with type vars, and the general impossibility of inferring x or y from (x binop y) = z knowing only one of x and y, along with knowing z in python

cslamber · January 27, 2025, 2:30am

I think being more explicit about __eq__ overloads doesn’t itself resolve the question of typing set.__contains__ to disallow trivially-false operations since you would need a separate mechanism to forward the overloads to __contains__.

cslamber · February 6, 2025, 5:57pm

Another idea: users set the return type of __eq__ and __contains__ to be TypeGuard[T] for the relevant T, then checkers may have the diagnotic that constantly-evaluable TypeGuards should not be checked. This feels less composable, but I don’t have a good counterexample.