"Solid bases" for detecting incompatible base classes

How should type checkers handle a program like this?

from typing import reveal_type

class X(int, str):
    pass

def f(x: int):
    if isinstance(x, str):
        reveal_type(x)

Existing behavior

At runtime, the class definition will fail and no object can be an instance of both int and str.
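Here is a quick check of the runtime claim above. It is a minimal sketch that assumes CPython, and the helper name is made up for illustration. It attempts the class definition from the example and reports whether that worked:

```python
def can_define_int_str_subclass() -> bool:
    """Try the class definition from the example and report whether it worked."""
    try:
        class X(int, str):  # int and str each have their own C struct
            pass
    except TypeError:
        return False  # CPython refuses to build a combined layout
    return True


print(can_define_int_str_subclass())  # False on CPython
```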

However, currently pyright allows the class definition and reveals Type of "x" is "<subclass of int and str>".

Mypy raises some errors about incompatible methods for the class definition and (with --warn-unreachable) claims that “Subclass of "int" and "str" cannot exist: would have incompatible method signatures”. While this subclass indeed can’t exist, mypy’s algorithm here is wrong; it will claim that many classes can’t exist when in fact they can.

Ty, Astral’s in-development type checker, points out that “Class will raise TypeError at runtime due to incompatible bases: Bases int and str cannot be combined in multiple inheritance” and reveals Never after the isinstance() call. This is correct and consistent with the runtime, but ty’s implementation relies on hardcoded knowledge of a number of standard library classes.

At runtime, every class in CPython must be backed by a C struct. Classes implemented in Python generally share a struct of the same size (unless they define __slots__), but classes defined in C usually each define their own struct. To construct a subclass, CPython must create a layout that is compatible with all the bases, which is generally only possible if at most one of the bases is a class with a custom layout (known as a “solid base”); I’ll present a more precise definition below. Ty is attempting to mirror this mechanism in its implementation.
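The same mechanism is visible with pure-Python classes, because a non-empty __slots__ also gives a class its own layout. Here is a minimal sketch, assuming CPython, with hypothetical class names. A slotted class can be combined with an ordinary class, but it cannot be combined with a second slotted class:

```python
class Slotted:
    __slots__ = ("a",)  # adds a field to the instance struct


class OtherSlotted:
    __slots__ = ("b",)  # adds a different field


class Plain:  # no __slots__: no custom layout beyond dict/weakref support
    pass


class Fine(Slotted, Plain):  # OK: only one base has a custom layout
    pass


try:
    class Broken(Slotted, OtherSlotted):  # two custom layouts
        pass
except TypeError as exc:
    layout_error = str(exc)
else:
    layout_error = None

print(layout_error)
```

On current CPython versions the error message mentions an instance lay-out conflict, but the exact wording may vary between versions.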

Why does this matter?

Knowing that a subclass of two classes can’t exist is primarily important for detecting unreachable code. If a type checker can detect unreachable code in a program, that often means the programmer made some incorrect assumption, so it’s good to provide more tooling to help type checkers find unreachable code.

I am also thinking about adding support for intersection types. With intersection types, it’s important to be able to reduce uninhabited intersections (such as int & str) to Never. Therefore, a better way to detect incompatible bases would be a good complement to intersection support.

Proposal

We add a new decorator @typing.solid_base, which can be applied to class definitions. Semantics are as follows.

Every class has a single solid base. It is determined as follows:

  • A class is its own solid base if it has the @solid_base decorator, or if it has a non-empty __slots__ definition.
  • Otherwise, if there is a single base class, the solid base is the base’s solid base.
  • Otherwise, determine the solid bases of all base classes. If there is only one, that is the solid base. If there are multiple, but one is a subclass of all others, the solid base is the subclass. Otherwise, the class cannot exist.
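The three rules above can be sketched as a small recursive function. This is only an illustration under some assumptions. `marked` is a hypothetical set standing in for classes decorated with @solid_base, which lets the sketch run without the proposed decorator. `object` is treated as its own solid base. Metaclasses, MRO consistency, and @final are ignored.

```python
from typing import Optional


def _has_nonempty_slots(cls: type) -> bool:
    # Only __slots__ declared directly on this class counts, not inherited ones.
    slots = cls.__dict__.get("__slots__")
    if slots is None:
        return False
    if isinstance(slots, str):  # __slots__ = "name" declares a single slot
        return True
    return len(tuple(slots)) > 0


def compute_solid_base(cls: type, marked: frozenset = frozenset()) -> Optional[type]:
    """Return the solid base of cls under the proposed rules, or None if none exists.

    `marked` is a hypothetical stand-in for classes decorated with @solid_base.
    """
    # Rule 1: object, marked classes, and classes with non-empty __slots__
    # are their own solid base.
    if cls is object or cls in marked or _has_nonempty_slots(cls):
        return cls
    bases = cls.__bases__
    # Rule 2: with a single base, use that base's solid base.
    if len(bases) == 1:
        return compute_solid_base(bases[0], marked)
    # Rule 3: gather the distinct solid bases of all bases.
    candidates: list[type] = []
    for base in bases:
        sb = compute_solid_base(base, marked)
        if sb is None:  # an invalid base makes the child invalid too
            return None
        if sb not in candidates:
            candidates.append(sb)
    if len(candidates) == 1:
        return candidates[0]
    # One candidate must be a subclass of all the others.
    for cand in candidates:
        if all(issubclass(cand, other) for other in candidates):
            return cand
    return None  # the class cannot exist


# Plain runtime classes mirroring the example in the proposal; the decorator
# is simulated by passing MARKED instead.
class Solid1:
    pass


class Solid2:
    pass


class SolidChild(Solid1):
    pass


class C1:
    pass


class C2(Solid1, C1):
    pass


class C3(SolidChild, Solid1):
    pass


class C4(Solid1, Solid2):
    pass


MARKED = frozenset({Solid1, Solid2, SolidChild})

for klass in (C1, C2, C3, C4):
    sb = compute_solid_base(klass, MARKED)
    print(klass.__name__, "->", sb.__name__ if sb else None)
# prints, one per line: C1 -> object, C2 -> Solid1, C3 -> SolidChild, C4 -> None
```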

Type checkers should raise an error for class definitions that do not have a valid solid base, and simplify intersections of nominal classes without a valid solid base to Never. If they warn about unreachable code, they should use this mechanism to detect unreachable branches.
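For intersections, the check a type checker would need boils down to comparing two solid bases. Here is a minimal sketch of the rule of thumb. `DECLARED_SOLID` is a hypothetical stand-in for classes a stub would mark @solid_base, and the helper names are made up. It assumes the classes involved are themselves valid, and it ignores __slots__, metaclasses, and @final:

```python
# Hypothetical stand-in for the set of classes a stub would mark @solid_base.
DECLARED_SOLID: frozenset[type] = frozenset({int, str, bytes, list, dict})


def declared_solid_base(cls: type) -> type:
    """Most-derived declared solid base in cls's MRO, or object if there is none.

    Simplification: assumes cls is itself a valid class and ignores __slots__.
    """
    for klass in cls.__mro__:
        if klass in DECLARED_SOLID:
            return klass
    return object


def intersection_is_empty(a: type, b: type) -> bool:
    """True if no class can inherit from both a and b, judged by solid bases alone.

    Following the proposed rule, a common subclass needs one of the two solid
    bases to be a subclass of the other.
    """
    sa, sb = declared_solid_base(a), declared_solid_base(b)
    return not (issubclass(sa, sb) or issubclass(sb, sa))


class Mixin:  # an ordinary class; its solid base is object
    pass


print(intersection_is_empty(int, str))    # True: int & str reduces to Never
print(intersection_is_empty(bool, str))   # True: bool's solid base is int
print(intersection_is_empty(int, Mixin))  # False: int & Mixin may be inhabited
```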

Example:

from typing import solid_base

@solid_base
class Solid1:
    pass

@solid_base
class Solid2:
    pass

@solid_base
class SolidChild(Solid1):
    pass

class C1:  # solid base is `object`
    pass

# OK: solid bases are `Solid1` and `object`, and `Solid1` is a subclass of `object`.
class C2(Solid1, C1):  # solid base is `Solid1`
    pass

# OK: solid bases are `SolidChild` and `Solid1`, and `SolidChild` is a subclass of `Solid1`.
class C3(SolidChild, Solid1):  # solid base is `SolidChild`
    pass

class C4(Solid1, Solid2):  # error: no single solid base
    pass

Discussion

  • The exact rule is a bit more complicated (see above), but a good practical rule of thumb is: if a class has @solid_base, then any child classes of that class can’t inherit from any other class with @solid_base.
  • I used the name “solid base” because that’s what CPython calls it internally (in typeobject.c). The term doesn’t currently appear in CPython output anywhere, so we could choose a different name for the typing decorator if we wanted, but “solid base” seems like a pretty good name to me.
  • There are some other reasons why a class with a given set of bases would not be able to exist, such as incompatible metaclasses. Type checkers should use those too, but I’m focusing more narrowly on incompatible instance layouts here.
  • CPython doesn’t directly expose whether or not a class is a solid base, but it’s mostly possible to reconstruct it by looking at some attributes of the type object. I implemented this in pycroscope; if the solid_base decorator is added to the type system, we could add the same logic to tools like stubtest to validate the presence of the decorator in stubs.
  • @solid_base should usually be applied to classes implemented in C, and therefore it should be used in stubs. However, I think we should also allow it in implementation files as a way to allow users to restrict multiple inheritance in their class hierarchies. It’s not a common ask but I think it can be useful; for example, this might help the first post in this issue.
  • One concern might be that we’re encoding CPython-specific implementation details into the type system. However, this specific implementation detail is quite stable in CPython, and PyPy has a similar (but apparently stricter) restriction (example issue with some discussion).
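On the point above that CPython doesn’t directly expose solid-base status, a much cruder alternative to reconstructing it from type-object attributes is to simply try creating a subclass. This is not what pycroscope does. It is a hypothetical helper and assumes CPython. A TypeError can have causes other than layout, so a False result only means “cannot be combined”:

```python
def can_combine(*bases: type) -> bool:
    """Return True if CPython accepts a class with these bases, in this order.

    Caveat: TypeError is also raised for reasons unrelated to layout (a base
    that cannot be subclassed, an inconsistent MRO, a metaclass conflict), so
    False means "cannot be combined", not necessarily "layout conflict".
    """
    try:
        type("_Probe", bases, {})  # build a throwaway subclass
    except TypeError:
        return False
    return True


class Mixin:
    pass


print(can_combine(int, str))    # False: layout conflict
print(can_combine(int, Mixin))  # True
print(can_combine(bool))        # False, but because bool cannot be subclassed
```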

Is this a useful thing to add to the type system?


Thanks for the proposal. This is a well-articulated problem statement, and I think it’s an elegant solution.

As a type checker maintainer, I prefer not to hard code knowledge about types into type checker logic. It leads to inconsistent behaviors and brittle, hard-to-maintain code. So I’m generally supportive of mechanisms that allow such hard-coding to be avoided.

Putting aside intersections for a moment, I don’t think this proposal will move the needle much in terms of finding bugs in real-world code. It’s not typical for developers to attempt to create classes that inherit from two or more stdlib classes that have incompatible layouts. In rare cases where this is attempted, the problem is clear the first time the code is run. In other words, static analysis doesn’t add much value here. For that reason, I’m +0 on this feature without intersections.

Once we bring intersections into the discussion, my perspective changes. If intersections are spec’ed in a way that requires (or even encourages) type checkers to take into consideration these runtime limitations when determining whether an intersection type is inhabited, then I think this feature becomes very important and useful. In that case, I’m +1.


The current work I’m iterating on for intersections doesn’t specifically address this as a requirement, but it seems like a natural consequence of things already in the type system, most easily demonstrated by the existence of assert_never.

The more complex cases where I believe this may help, if and when the intersection work reaches a point where it can be accepted, involve intersections of callables. Knowing that two functions operate on disjoint sets of inputs is extremely important for not synthesizing incorrect information about what a function must do to satisfy the intersection of two callables.

One thing I was concerned about here when this came up in the intersection work is that this is an implementation detail not specified in the language. It is possible for a Python implementation to implement native types in a way where these would not be incompatible.

Currently, CPython, GraalPy, and PyPy each have some form of limitation like this.

For anyone not following through to the PyPy example Jelle linked above:

import io

class MyClass(io.BytesIO, io.RawIOBase):
    pass

is currently allowed on CPython, GraalPy, and PyPy, but it failed on PyPy before PyPy 3.9; the PyPy developers ended up adjusting their implementation to allow it. We should probably try to get more information on this and talk with the maintainers of other Python implementations.

For me, making multiple inheritance work is complicated enough (typing.Protocol unfortunately has a custom metaclass that sometimes gets in the way of mixins for me) that any effort to help pinpoint the problem (that is, whether the incompatibility is due to metaclasses or my own mistakes rather than rarely encountered CPython limitations) is welcome.

But here, “quite” and “similar” are worrying words. Does CPython currently make guarantees on multiple inheritance compatibility across future versions, or is that information on memory layout viewed as an implementation detail? Solid bases might be a way to document what classes a particular version of CPython supports in the way of multiple inheritance, but if it makes no guarantees about future compatibility, the utility as a publicly available feature to be relied upon is ultimately limited.

I agree that preventing developers from actually creating invalid multiple-base subclasses is not the value proposition here. The real value is in allowing type checkers to have a better understanding of which intersection types are inhabited, and the implications for control flow, reachability, overlapping overloads, etc. I think this value proposition is already significant today, even prior to adding explicit intersection types to the type system. As observed in the proposal, these implicit intersection types already exist in all major type checkers.


It’s not a common ask but I think it can be useful; for example, this might help the first post in this issue.

Although I don’t think this should be a reason not to pursue the idea, I’m skeptical it actually solves my original issue. Phantom types are special, and they might be subclassing a protocol. If I applied the solid base decorator to NonEmpty, which inherits Sized and Iterable, wouldn’t that make it incompatible with tuple and list? Or does the type system treat those as nominal subtypes of the protocols?


Sounds like the reception for this idea is generally positive.

I am on the fence about whether this should go in a PEP on intersections or in its own PEP, but I’m leaning towards a separate PEP, both because intersections will be complicated enough by themselves and because this feature is useful even without explicit intersections.

CPython currently says almost nothing about how this works. I think it should say a little more, so I opened python/cpython issue #136843, “Document when multiple inheritance is not allowed”.

If CPython changes its behavior for some classes in the future, we can change the type system. But the core idea has been stable for decades (I see references to “solid bases” from around 2001 in typeobject.c).

If some classes are solid bases in one CPython version but not others, we can use if sys.version_info checks. If there are differences across implementations that matter to users, we could add if sys.implementation.name == checks to the type system; the idea has come up before but there hasn’t been much demand.

Thanks, it does seem like I misread your issue. @solid_base would allow you to say “this class can’t multiple inherit from any solid base other than me”; what you want is “this class cannot multiple inherit from one specific other class, but may multiple inherit from others, including solid bases”.


I’m starting to write the spec for @solid_base, and I’m considering what it means to apply the @solid_base decorator to a Protocol.

One interpretation could be that any classes that implement the Protocol must have that Protocol as a solid base, that is, they cannot inherit from any other solid bases. This would imply that the only valid implementations of the Protocol are classes that have the Protocol as a nominal base class, or @final classes that inherit only from object. (Non-@final classes would be unsafe, since they could have base classes with a different solid base, which would break the transitivity of subtyping.)

Another interpretation might be that nominal subclasses of the Protocol cannot inherit from other solid bases, but structural subclasses can. However, this would mean that the @solid_base decorator on the Protocol doesn’t actually help reachability analysis. Yet another interpretation might be that only nominal subclasses of the Protocol are allowed, but then why are you even using a Protocol?

Since these behaviors seem unintuitive and not useful, I’m inclined to disallow the @solid_base decorator on Protocols and allow it only on nominal classes. Similar reasoning would apply to TypedDicts, which are also structural types.
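To make the Protocol problem concrete: structural implementers of a protocol need not share any nominal ancestry, so they can come from incompatible solid bases. Here is a small sketch. SupportsLen and size are hypothetical names, and the runtime behavior assumes CPython:

```python
from typing import Protocol


class SupportsLen(Protocol):
    def __len__(self) -> int: ...


def size(obj: SupportsLen) -> int:
    return len(obj)


# str and list both match SupportsLen structurally, without naming it as a base...
print(size("abc"), size([1, 2]))  # 3 2

# ...yet no class can inherit from both, because each has its own solid base.
try:
    class StrList(str, list):
        pass
    combined = True
except TypeError:
    combined = False  # CPython: str and list each have their own C layout
```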


I’m in favor of this only being allowed on concrete types, not on structural ones. This would exist to give type checkers implementation information about concrete types. It might make further sense to only allow this in stubs, since this is specifically about expressing something users can’t readily create from a Python file.


PEP draft here: python/peps pull request #4505, “PEP 800: Solid bases in the type system” by JelleZijlstra.


Functionality should be added to the @dataclass_transform decorator as well.

For example, the type checker should understand that @dataclass(slots=True) makes the class a slotted class, and therefore makes it its own solid base.

The PEP draft already says that classes with non-empty __slots__ are solid bases. I pushed a change just now to say explicitly that this includes dataclasses/dataclass_transform classes. I’m not sure anything else is necessary.


Yea this PEP makes sense to me.

The only thing I was able to find that could be improved, in my view, is the name “solid base”. It feels a bit general and non-descriptive to me. I realize that the name has some history in CPython. But if we can think of a better name, then I see no reason not to use it.

Some alternatives for “solid” that come to mind are “singular”, “prime”, “disjoint”, “exclusive”, “closed”, “mono”, and “solo”.

So unless the name “solid base” is set in stone already (which would be pretty ironic), maybe a poll could be a fun way to decide?

I was also about to ask if it’s time to start the bikeshedding.

The core semantics of the PEP make sense to me. But I think the name “Solid Bases” will be opaque even to many users who are generally familiar with the type system. If we can change the name of the term itself, the suggestion of yours that stands out to me is “disjoint”, because I can mentally connect it to a description using the normal mathematical meaning of that word, something like: “types inheriting from different disjoint bases are disjoint”. Alternatively, at least the PEP title should be expanded IMO, so that people can get a sense of what functionality or purpose it relates to. Something like “Solid Base Types for statically inferring unreachability”?


I like “disjoint bases”: a “solid base” class can be described as a base class that is disjoint from any other base class, in that a child class can only inherit from one disjoint base. “Disjoint” is a somewhat obscure technical term, but maybe that’s OK for a fairly niche feature that should primarily be used in stubs.


There’s precedent for disjoint in Python in, e.g., the set methods; to me ‘disjoint bases’ is immediately clearer on what it could mean than ‘solid bases’.

A

There are 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors.

(source)

So I’m happy to see there appears to be agreement on naming it “disjoint base”.

Personally, I’m more in favor of “solid base”. I think it is more technical and cool-sounding (and yes, I’m aware that this argument doesn’t carry much weight).

The name solid_base appears in the code of typeobject.c. It is used in several places in the reasoning about what is allowed, e.g. in bases or the MRO. I’ve come across it mainly when trying to understand the rules for __class__ assignment.

I’m not able to correlate the descriptions above unequivocally with solid base computed there, so my first question is to check it’s the same concept. I think so. (If it isn’t, we shouldn’t use the same word.)

I’ve understood the solid_base of a type as the least derived class that has the same representation (memory layout).

Now, types with the same representation form an equivalence class (a set) of types that permit mutual __class__-assignment. Also, assignment to __bases__ cannot move a type to another equivalence class. And this set is disjoint from other sets of types that share a different representation. I do not think it makes sense, however, to label a type as disjoint.

We want to express the idea that a type is in a distinct representation equivalence class from its immediate ancestors. So how about @distinct?

The least derived type in a representation equivalence class is not necessarily unique: I think two classes equally distant from object can be equivalent if they add the same slots to a common solid base.
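The __class__-assignment behavior described above can be observed directly. Here is a minimal sketch, assuming CPython, with hypothetical class names. It illustrates __class__ assignment specifically, and those rules are not identical to the solid-base rules:

```python
class A:
    pass


class B:
    pass


class S:
    __slots__ = ("x",)


class S2:  # adds exactly the same slots to object as S does
    __slots__ = ("x",)


a = A()
a.__class__ = B  # OK: A and B share the same representation

s = S()
s.__class__ = S2  # OK: equally distant from object, same slots added

try:
    s.__class__ = A  # S2 and A have different representations
    assigned = True
except TypeError:
    assigned = False
```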

The concept is meant to map directly to a “solid base” as understood by typeobject.c, with the addition that classes can be marked as “solid bases” (or disjoint bases) in the type system without being “solid bases” according to CPython. The description in the post you linked to is a little imprecise, but the one in the PEP text should be exact. (I just realized this is the pre-PEP thread. We should probably move to the PEP discussion thread.) The term “solid base” has a long history in CPython, about 25 years, but it has only appeared in the source code, not in docs or error messages, so I don’t feel too bad about using a different term.

The rules for __class__ assignment I believe are a little different from those for solid bases. I don’t think type checkers should support __class__ assignment at all, but that’s a different discussion.
