`__hash__`, `__eq__`, and LSP

object.__hash__, object.__eq__, and Hashable are currently underspecified in the type system. object.__hash__ and object.__eq__ are assumed to exist on object, and the types they are given do not match their intended use.

This becomes a problem for LSP because the language is designed so that these are removable, disableable qualities that happen to have a default behavior, not so that the default behavior is guaranteed for all types inheriting from object.

  • The data model specifically allows overriding __eq__ to return non-bool objects (see ORM use).
  • Defining __eq__ without defining an updated __hash__ sets __hash__ to None (demonstrated below).
  • __hash__ can be set directly to None to remove hashability.
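
As a quick runtime demonstration of those rules (a minimal sketch; the class names are illustrative):

class Declarative:
    def __eq__(self, other):
        # ORM-style: returns an expression object rather than a bool
        return ("EQ", self, other)

# Defining __eq__ without __hash__ implicitly sets __hash__ to None:
assert Declarative.__hash__ is None

class Unhashable:
    __hash__ = None  # explicit removal of hashability

# hash(Unhashable()) now raises TypeError: unhashable type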

Proposed changes:

  • The type of __hash__ on object is considered to be None | ((self) -> int)
  • The type of __eq__ on object is (self, other: Any, /) -> Any
  • Whether a type is Hashable is determined by the type system when the type is declared, following the rules of the language (defining __eq__ without a corresponding definition of __hash__, as well as explicitly setting __hash__ = None, removes hashability).
  • It is an error to remove hashability when subclassing any hashable type other than object (see the sketch after this list).
  • It is an error to add hashability when subclassing a type that does not have it.
  • Type checkers should determine that instances of the exact type object are both object and Hashable [1] (i.e. in the case of x = object(), type checkers should preserve the knowledge that x is Hashable in that local scope.)
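
A sketch of how the removal rules would apply; the error comments show the proposed diagnostics, not current type checker behavior:

class HasHash:
    def __hash__(self) -> int:
        return 0

class RemovedExplicitly(HasHash):
    __hash__ = None  # proposed error: removes hashability from a hashable non-object base

class RemovedImplicitly(HasHash):
    # Defining __eq__ without __hash__ implicitly sets __hash__ to None,
    # so this is the same proposed error as above.
    def __eq__(self, other: object) -> bool:
        return self is other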

This results in a consistent interface, at the cost of the type system not accepting a few extremely suspect design choices someone could make (such as removing hashability in one type, re-adding it in a further subtype, then removing it again in yet a further subtype, while expecting all of those types to automatically work with subtyping-based rules for Hashable), and at the cost of a limited amount of extra tracking using an existing protocol (Hashable) to handle cases where object() is used.

The special casing on object is limited and defined in such a way that it limits the impact to cases that can already cause issues such as:

from collections.abc import Hashable
from typing import Any

def problem(x: object, mapping: dict[Hashable, Any]):
    mapping[x] = x

Which could be immediately fixable in those cases to:

def fixed(x: Hashable, mapping: dict[Hashable, Any]):
    mapping[x] = x

Below are substantial additions taken from the discussion below, collected here as a one-place summary. They are intentionally tacked on at the bottom for convenience and do not add anything that mailing list users would be unaware of.

Why is adding hashability also an LSP violation?

It’s not possible for a type to lack a definition of __hash__. The interpreter’s implicit removal sets it to None, and the supported user-facing removal requires setting it to None directly. This is true even for types defined in C extensions, where __hash__ is a required part of the type’s implementation.

Adding hashability to an unhashable type therefore looks like:

class Unhashable:
    __hash__ = None

class Problem(Unhashable):
    def __hash__(self) -> int: ...

Limiting the special casing to object reduces the number of cases type checkers need to consider, and there are no practical reasons to allow this; subclassing should be cooperative in nature, building on types that are appropriate to build onto for your use.

Handling of Any

(Thanks @Liz, see comment below)

In the case where subclassing involves Any: if the hashability of any non-object, non-Any type can be determined, including from the definition of the type currently being defined, Any is assumed compatible and that determination is used.

In the indeterminable case, the assumption is that the type is hashable, preserving compatibility with Hashable. This is a type safety hole, but not a hole in the consistent definition of Any as a compatible type. It is trivially fixable by setting __hash__ to None when subclassing untyped unhashable types.
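
A sketch of both cases, where untyped.IsAny stands in for any base a type checker sees as Any (as in the example later in the thread):

from untyped import IsAny

class Determined(IsAny):
    __hash__ = None  # determinable from this definition: the type is unhashable

class Indeterminate(IsAny):
    pass  # nothing determinable: assumed hashable, staying compatible with Hashable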



  1. While intersections are not denotable, the ability of type checkers to track such a type is already required by typing.TypeIs, and it would only need to be tracked in the local scope where this happened; the non-denotability means object & Hashable is not a valid parameter or return type ↩︎

3 Likes

Could you expand on why this is a problem? I understand that the other way around (a non-hashable subclass of a hashable class) is problematic, but I don’t see what’s wrong with having a hashable subclass of a non-hashable parent.

1 Like

While there may not be many practical problems (I avoided checking, as I don’t see a use case to allow it, and there is a theoretical problem), __hash__ is not absent on such types; it’s present as None, which would make any special casing to allow this need to consider more than object.

1 Like

I like what happens here in the absence of Any at least, but I notice you did not specify what happens in the case of subclassing Any or an unknown type.

This case would still be under-specified:

from untyped import IsAny

class Foo(IsAny):
    pass
  1. object as the implicit root provides no guidance here.
  2. It’s not possible for the type system to determine if this is Hashable anymore.

I still think subclassing of Any was not considered properly and should never have happened, but the “best understanding” we have is that in this case it’s an unknown, but compatible, type. This still doesn’t help with __hash__ though, as __hash__ being None and __hash__ being a method returning an int are both compatible options.

This could be fixed in a few ways, but the way I would pick that actually has a chance of happening is that, with only unknown types as bases, the following happens in order:

  1. If __hash__ is defined or set to None, directly or implicitly, on Foo, Any is compatible, so it’s assumed to follow the rules here, and Foo’s definition is used.
  2. Otherwise, __hash__ is assumed to be a method returning int, as the default implementation provides. This keeps Any compatible with Hashable until proven otherwise.

This opens up a type hole, but we’re talking about subclassing Any, which as everyone has observed is already riddled with type holes. This can be fixed minimally by just setting __hash__ to None on Foo where it causes a problem, or longer term by improving the typing of the base.

2 Likes

Thanks, I agree that’s the right way to handle it, issues of subclassing Any aside. A section has been added to the original post to include this, while being clear it came about from discussion.

If removing hashability is prohibited, that would imply that MutableMapping / MutableSequence / MutableSet are all incorrect classes. Since they’re not type safe, using Hashable is still unsafe: you can coerce a mutable type to its immutable base and then pass it to code expecting hashability.
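
A minimal sketch of that hole, relying on the current typeshed behavior discussed further down (Mapping is treated as hashable):

from collections.abc import Hashable, Mapping

seen: set[Hashable] = set()

def register(key: Hashable) -> None:
    seen.add(key)  # requires key to actually be hashable

def run(m: Mapping[str, int]) -> None:
    register(m)    # accepted: Mapping is currently considered Hashable

run({"a": 1})      # a dict is a valid Mapping, but is unhashable: TypeError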

Not sure if anything can be done about this, since changing Mapping/MutableMapping to inherit from MappingBase etc probably is a non-starter. Since these are prominent ABCs, I imagine other code wanting the same sort of pair is likely to follow this pattern too.

Strictly speaking, they are, and that’s a topic that has come up before. These are not type safe classes, though the places where one can demonstrate an issue with them are currently extremely limited.

The “better” way here would have been protocols composable via intersections rather than a runtime-valid type. However, these types are already filled with type ignores in typeshed, so this isn’t a change to the status quo for these types.

Thanks for initiating this discussion with a proposal. I think that if we want to represent hashability usefully in the type system, your proposal (or something similar to it) is what’s required. Subclasses being substitutable for base classes is a fundamental assumption of the type system. If type checkers, and typeshed, are willing to consider a type hashable and subtypes of it unhashable, as they do today, then the Hashable protocol is an attractive nuisance: it gives the appearance of providing type safety that it cannot actually provide.

It doesn’t seem like there is a practical problem with providing hashability on a subclass of a class that is not hashable. I think the special case to allow this is quite small: it would be to consider that when __hash__ is set to None on a type, the type of __hash__ on that type remains None | ((self) -> int). This is in practice not really distinguishable from None, as either way the type doesn’t support hashing, but it would still allow a subtype to provide hashability without violating LSP. I’d be OK with this special case, in order to allow more safe programs to be admitted.
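
Under that special case, the earlier Problem example would become legal, since the concrete method type is assignable to the union (a sketch):

class Unhashable:
    __hash__ = None  # type remains None | ((self) -> int) under the special case

class Frozen(Unhashable):
    def __hash__(self) -> int:  # allowed: (self) -> int is assignable to the union
        return 0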

Why is this special case important? Requiring type checkers to maintain a special type for ExactlyObject seems to me like actually quite a heavy requirement, to support a use case (creating x = object() and then hashing x) that doesn’t have clear utility. I think the special case I described above (for allowing unhashable types to have hashable subtypes) is actually much less onerous for type checkers.

It appears that typeshed currently defines both Mapping and MutableMapping as hashable (given that type checkers currently don’t understand that defining __eq__ implicitly sets __hash__ to None); the unsafe explicit override __hash__ = None appears in typeshed only on dict, not on MutableMapping.

I think for Hashable to be sound enough to be remotely useful, we would have to consider all ABCs which are eventually inherited by unhashable types to themselves be unhashable. Many uses of the Mapping type in real code today are inhabited at runtime by unhashable dictionaries.

This would cause a problem for anyone who is currently inheriting Mapping to define a truly immutable and hashable runtime type, and relying on that type satisfying Hashable. I think these users would have to start multiply-inheriting both Mapping and Hashable, which would work as long as we allow subtypes of unhashable types to provide hashability. This is one reason I think it’s important that we do allow that.
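
For example, such a user-defined type might look like this (a hypothetical sketch; the names are illustrative):

from collections.abc import Hashable, Iterator, Mapping

class FrozenMap(Mapping[str, int], Hashable):
    def __init__(self, data: dict[str, int]) -> None:
        self._data = dict(data)

    def __getitem__(self, key: str) -> int:
        return self._data[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)

    def __hash__(self) -> int:
        # Hashes an immutable view of the same state __eq__ compares.
        return hash(frozenset(self._data.items()))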

This is perhaps broadening the scope here too much, but I think there are also some questions about how we should type rich comparison methods.

The general expectation for rich comparison methods is that they will not fail by throwing TypeError or similar; instead they’ll simply return NotImplemented for unsupported cases.

For type checkers to be able to accurately type comparisons, the type checker needs to know for which other types the rich-comparison method will return NotImplemented vs a “real” result. For at least some types in typeshed, the current practice seems to be to annotate __eq__ with an other type representing the “supported” types for which the method will not return NotImplemented. If this is confirmed as the desired approach, then it seems like object.__eq__ could actually be correctly typed as (self, other: Never) -> object. This would allow LSP-safe overrides of __eq__, without requiring Any in the signature.
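
A sketch of how that signature would make overrides LSP-safe; Base here stands in for the proposed typing of object.__eq__:

from typing import Never

class Base:
    def __eq__(self, other: Never) -> object:
        return NotImplemented

class Version(Base):
    def __init__(self, n: int) -> None:
        self.n = n

    # Widening the parameter (Never -> object) and narrowing the return
    # (object -> bool) are both LSP-safe directions.
    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Version):
            return NotImplemented
        return self.n == other.n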

There’s quite a bit of real-world code that uses object() as a cheap unique hashable token. I’m aware of two different task schedulers in Python that do this and return the token to the caller to use as a cancellation token.

It also comes up with things like functools.lru_cache, which relies on hashability of arguments, and with the common idiom of using an instance of object() as a better sentinel default than None in code where None is a valid value.
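
A sketch of the token idiom being described; the comments mark where the proposed object & Hashable tracking matters (the names are illustrative):

from collections.abc import Hashable

pending: dict[Hashable, str] = {}

def schedule(task: str) -> Hashable:
    token = object()        # proposal: token is tracked as object & Hashable here
    pending[token] = task   # OK only because that knowledge is preserved locally
    return token

def cancel(token: Hashable) -> None:
    pending.pop(token, None)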

I’m more okay with this change than with not requiring type checkers to preserve object() as object & Hashable in the local scope. I don’t see a strong utility for it, and I was trying to keep the rules as simple as possible while keeping to the cases that have utility. However, I think it interacts poorly with this bit:

If we don’t allow the prior bit of re-adding __hash__, then what we can do in a future with intersections is ensure that neither the Mapping nor MutableMapping abstract types (and likewise for other similar dichotomies like Sequence vs MutableSequence) express whether they are hashable, and functions relying on hashability can (in the future) express Hashable & Sequence, for example. This would properly reject list, as list is not Hashable. (This comes up with tuple-esque keyed dicts used for flattening of namespaces.)

Sorry, I didn’t express that last part properly. It leaves open a low-friction future change where abstract classes (again, in a future where hashability could be expressed separately yet as a requirement) would not imply hashability or the lack thereof unless they directly define __hash__, or __eq__ without __hash__, or __hash__ as None. Without that future change it wouldn’t work, but I don’t think that future change is compatible with the idea that arbitrary types can re-add hashability.

Ok, that’s pretty convincing. I guess for any type checker that already has a robust internal intersection implementation, representing this as object & Hashable shouldn’t be difficult.

This makes sense, but I don’t see why it depends on not allowing subclasses to add hashability.

I don’t think it’s important that we allow re-adding hashability in a subclass of a type that explicitly sets __hash__ = None, because protocol/ABC types like Mapping and Sequence (which people implementing actually-immutable-and-hashable container types will likely want to inherit/satisfy) won’t need to set __hash__ = None explicitly; they’ll just inherit None | ((self) -> int) from object. But I do think it’s important that the implicit effect of defining __eq__ (in the type system) is not to set __hash__ to None, but to set it to None | ((self) -> int), so subclasses are still allowed to be hashable.

Not setting either __eq__ or __hash__ should result in __hash__ being inferred as (self) -> int for the “you can’t remove hashability” rule, as that subclass would be hashable and further subclasses must not remove it. ABCs can be instantiable types, so we can’t special-case our way out of this for ABCs only.

Letting it be None | ((self) -> int) = None might work for when __hash__ is None (allowing adding hashability), but if __hash__ isn’t explicitly set to None on the first subclass, then inheriting the union from object amounts to allowing the removal of hashability.

And the issue with the collections ABCs is that they remove hashability (as a consequence of adding mutability).

Adding mutability isn’t a strictly additive operation; it removes capabilities too, which is why the current edge cases with coercing a list to a Sequence exist. So ideally, in the more distant future, we would have something akin to:

class MaybeMutableSequence(Protocol[T]):
    def __len__(self) -> int: ...
    def __getitem__(self, index: int, /) -> T: ...
    __hash__ = Any  # cases that explicitly don't care if it's hashable or mutable

class _MutableSequenceDiff(Protocol[T]):
    def __setitem__(self, index: int, value: T, /) -> None: ...
    __hash__ = None

type NonMutableSequence[T] = MaybeMutableSequence[T] & Hashable
type MutableSequence[T] = MaybeMutableSequence[T] & _MutableSequenceDiff[T]

And no public typing concept of “just a sequence”

Adding hashability is trickier. It’s unsafe to add hashability to a list, but Python lets you do this by just subclassing list and adding it; the interpreter doesn’t stop you from shooting yourself in the foot. And (pointing to lru_cache again) there are “safe” uses of adding hashability, but they rely on only ever hashing once and never allowing mutation once hashed (see the list subclass used for hashing args & kwargs in functools). This is also something that’s easy to get wrong if you don’t know the technical reasons that a hash needs to be constant and that equality must imply an equal hash.
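
A sketch of that pattern, modeled loosely on the list subclass in functools (the names here are illustrative, not the stdlib’s):

from collections.abc import Iterable

class HashedSeq(list):
    __slots__ = ("_hash",)

    def __init__(self, items: Iterable[object]) -> None:
        items = list(items)  # materialize so we can both store and hash
        super().__init__(items)
        # Hash is computed exactly once; safety relies on the sequence
        # never being mutated after this point.
        self._hash = hash(tuple(items))

    def __hash__(self) -> int:  # re-adds hashability to an unhashable base
        return self._hash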

I think a type ignore is an acceptable outcome here if people have a reason to re-add hashability; the type checker informs you that what you are doing is likely unsafe, but you can still tell it otherwise if you’re sure it’s fine.

There’s a lot more we need to get this to “as sound as could be with user expressibility” than is even on most people’s radars right now, though, so while I could indulge in mapping out everything we would need, I was trying to keep the scope to what would be useful in the near term while leaving the road open to further iterative improvement.

I also unfortunately don’t see any short-term future where people stop using the existing collections ABCs, even though I think it would be better in the long term if people viewed them as having problems similar to the numbers module.

How important is this, in your opinion? It seems to be the case that any safe implementation of __hash__ with a non-identity implementation of __eq__ requires awareness of how __eq__ is implemented, so I don’t think there’s a good reason to ever define these on separate classes and then use mixins to combine them; they should be defined in relation to each other.
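
For instance, the usual safe pattern defines both on one class, with __hash__ derived from exactly the state __eq__ compares (a minimal sketch):

class Point:
    def __init__(self, x: int, y: int) -> None:
        self.x, self.y = x, y

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self) -> int:
        # Hashes exactly the fields __eq__ compares, so equal objects hash equal.
        return hash((self.x, self.y))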

I think I agree with you that __hash__ being None with the ability to add it can be preserved for ABCs, but I think it needs to be done explicitly, not implicitly on an ABC; some ABCs can be instantiated directly, having no abstract methods, and double as a default implementation.

For a protocol, we could implicitly say that if __hash__ isn’t mentioned it could still be either, because protocols aren’t in the runtime type hierarchy, but I think symmetry with the ABC case keeps this more consistent from a developer perspective.

1 Like

I like the ideas proposed here, but I think the subclassing rules should be kept distinct from the typing clarifications.

The @override decorator was added to enable opt-in to this sort of subclassing behaviour, whereas I think you are proposing use of Any as an opt-out. This was added in PEP 698, which also briefly discusses and dismisses the introduction of a decorator that would force all subclasses to preserve annotations.

This flexibility is handled by all type checkers all the time anyway; __hash__ is by no means a special case in this respect. I know that pyright has a rule called reportIncompatibleMethodOverride which forces the preservation of signatures instead. It’s not in the Python language, but it does what you want.

A final question playing devil’s advocate: if __eq__ should return Any, should __hash__ be able to return bool (or any other subclass of int)?!
Do you want the inheritance of annotations to be exact, or just consistent?

  1. The @override decorator isn’t there to allow incompatible overrides; it’s to enforce that you are actually overriding something, and it isn’t related to this.
  2. pyright’s reportIncompatibleMethodOverride rule corresponds to part of the typing specification.
  3. The changes here are not proposed for the language as enforcement, but for the typing specification.
  4. Using Any on __eq__ is to allow its use as described in the data model, without users getting type errors or needing type ignores, while still accurately describing equality on object and ensuring that its definition interacts with the behavior the language has for __hash__ when __eq__ is defined.
  5. There’s really no valid use case for returning a bool from __hash__, but this proposal won’t stop people from doing that or cause it to be a type error. You’re allowed to be more specific in return types when subclassing.
1 Like