PEP 800: Solid bases in the type system

I don’t think this is a reasonable objection. Python’s type system is an unsound gradual type system (that is, a gradual type system with no runtime enforcement); this means there are many trivial ways to achieve unsoundness via unchecked code. (Just call a typed function from a call-site the type-checker doesn’t see, and pass the wrong argument types!)

If type checkers enforce (in the code they can see) that you can’t multiple-inherit from two different disjoint bases, that’s sufficient to make this feature sound, to the same degree anything in the Python type system can ever be considered sound. The fact that you could in theory still have such a subclass somewhere at runtime, because the type checker didn’t see its construction, is not an objection to this feature, it’s an objection to the entire Python type system.

3 Likes

Incorrect disjointedness breaks the gradual guarantee. I shouldn’t need to explain this to you, we’ve discussed as much before. Yes, that’s “bad information in”, “bad information out”, but why do we want this to allow expressing the wrong thing when we can detect the right thing? This is the same rationale that comes with automatic variance. It’s something users shouldn’t have to think about and that typecheckers can be correct about in pure python code without being told more.

I don’t understand how we can “detect the right thing.” The (non-typeshed, non-builtins) use case for this is C extension types, represented to the type system only by stubs, which use custom memory layouts (similar to int or str). It would be inaccurate for stubs to pretend that such types have Python-defined __slots__, and there’s no way for the type system to detect that these types are “solid bases”, unless it can be expressed explicitly in a stub.

please read the whole posts you are quoting.

I conceded above that we still need this in stubs as well, I just think it should be stub-only specifically because we can do better in pure python code.

Sorry; I saw the mention of stub-only in your earlier post, but it wasn’t clear to me from the subsequent paragraph whether you were just advocating for stub-only, or against the feature entirely.

Making it stub-only seems easy enough; just specify that type checkers should error (and not respect it) if used in a non-stub. I think a wrinkle might be that there could be reasonable use cases for using it under if TYPE_CHECKING also, and I’m not sure if type checkers want to take on that additional complexity.

I’m also not sure how much value there really is in making it stub-only. It seems like all the same mis-uses are possible in stubs. It reduces the scope for mis-use, sure. But it seems most of the same value could be achieved by just documenting what the “correct” use cases are, and documenting “if you’re using this in a non-stub file, it’s likely wrong.”

It prevents many potential mistakes and misunderstandings if we detect this and tell users they only need it in stubs and then document the use in stubs. It reduces the things users need to do if typecheckers are to detect this by inhereting from a solid base and/or presence of slots. Maybe it doesn’t need to be forbidden at runtime, but I don’t think this is something users should need to be thinking about. “I added slots” should be sufficient. “I removed slots” should also.

Stubs are always in the “we have to assume what is expressed is correct” territory, and there’s no good way around that in a static analysis context (a runtime typechecker or compiler using the annotations could enforce this by checking at the boundary, turning the assumption from open world, to closed world with checked entry)

1 Like

Should if TYPE_CHECKING be a valid way to convey things about native extension imports? If we can agree that a stub for the import is the correct tool here, This is not a concern we need to think about further. Something defined in this block would be either only accessible in a typing context and not a suitable base class, or has a corresponding implementation in the else case. If it has a corresponding runtime implementation and is not a native extension import, then putting the slots that are defined in the actual implementation in the positive branch would work.

I would like to see the type system do the right thing for users automatically when possible, and to me, that includes preventing users from building bad habits by telling them “we’ve got this for you, just focus on writing working code” when we can.

This is offtopic, but I’d like to say it publicly because I see it as a recurring theme in some typing adjacent discussions.

I think this isn’t quite right. I’ve been working with @mikeshardmind on how to improve the soundness when involving the case of fully typed code called by gradually typed callers.

There is a middle ground between where each of you and others who care about typing have clashed on this based on the perspectives I have seen expressed in a few threads, as well as in github issues. This middle ground can preserve qualities we all agree that we like, such as the gradual guarantee, while improving soundness in these cases, and without throwing out the existing type system to get it and without treating it as a failure of the type system as a whole.

I know there are some strong disagreements about the current type system in a few places, maybe we could get a few people together in a less formal setting to discuss some of these more without it being as much a recurring underlying theme on discourse and work towards a way to bridge some of the gaps here on improving the soundness that we actually can improve, without always falling back to the fact that python’s type system is always going to be unsound anyway.

Until then, I also just want to acknowledge that despite these disagreements, it’s very clear to me that all of the people involved just want to improve it and I appreciate that a lot. We just haven’t found a common path that everyone can agree on. Maybe we never will, but I think there’s more those involved agree on than not here, and if we can focus on the things we can find agreement on, some of these discussions might move at a faster productive pace.

1 Like

Thinking of which I agree that @disjoint_base don’t actually make classes disjoint, since they can still be multiple subclassed with classes that don’t have a special memory layout.


class A:
  pass

# let’s assume int has a @solid_base

The decorator to int, then, is a marker to indicate that it has a memory layout that differs from that of plain object.

I now think it’s best to call it @_special_base with a leading underscore. It’s documented for the uncommon cases where users or library authors want to use it, but the leading underscore makes it clear it’s not something one would ordinarily use.

object is a solid base but not a special base, so classes that only derive from object can mix freely with a special base. The details of the layout resolution process, as I understand, is like:

  • If a class has no special base specified in the class statement among its base classes, its special base layout (SBL) is undefined.
  • If a class has one special base, its SBL is that special base’s layout.
  • if a class has multiple special bases, the layout of the most derived special base that is a subclass of all other special bases is chosen as the class’ SBL. It is a TypeError if such a layout cannot be found. (Since there can only be one SBL, you can’t subclass two slots class that are not subclasses to each other, which you already can’t do now)
  • if a class has a non-empty __slots__
    • it must either have no SBL defined, in which case the resultant layout will be the __slots__ defined in this class,
    • or that the defined SBL must have non-empty __slots__. The resultant layout will be a combination of the SBL’s __slots__and its own. This allows a slots class to contain additional slots definitions to the base class.
    • The class becomes a special base by itself.
  • If the class does not have a non-empty __slots__
    • If the SBL is defined, the resultant layout is the SBL, and the class becomes a special base by itself automatically.
    • Otherwise, the resultant layout is the non-special object layout, and the class does not become a special base by itself.

In this case we only need to (and should only) put @_special_base in stubs, the rest is resolved automatically. We may want to expose the SBL resolution logic in Python for introspection even though it can’t be tweaked like MRO.

The naming of “special” to exclude object allows this kind of code to be reasoned correctly:

class A:
  __slots__ = ("foo",)

class B:  # implicit object as base class
  pass

class C(A, B): # it’s OK to mix a special base with object
  __slots__ = ("bar",) # OK

+1 for “disjoint base” as a good name for this concept. As someone with very little familiarity with the CPython internals, “solid base” means nothing to me, but “disjoint” immediately provides some useful intuition about what’s going on.

Overall, I really like this proposal; I find the motivation around reachability and intersections particularly compelling.

2 Likes

I certainly agree that type-checkers should automatically detect “disjoint-baseness” based on definition of __slots__ and inheritance from a disjoint base; this is already clear in the Specification section of the draft PEP. So there shouldn’t ever be a need for using this in runtime code to identify classes that inherit from a disjoint base or define __slots__.

The PEP suggests that some people may want to use it in runtime code to identify a class as “shouldn’t be inherited along with another disjoint base” even if that class doesn’t define __slots__ so technically could be multiply-inherited with a disjoint base at runtime. It sounds like this is concretely the part of the PEP you don’t like? To me this doesn’t seem like a primary use case, but I also don’t see a strong case for prohibiting such use. The fact that it is possible to use in runtime code certainly doesn’t require people to think about it every time they define a class, nor does it prevent type checkers from detecting the cases that are detectable (as the PEP already specifies).

Definitely interested in what this would look like!

3 Likes

The reason to prohibit it is that it has adverse effects on inference not matching runtime with gradual callers with exhaustiveness checks and other narrowing.[1] Maybe this isn’t that important, but I think typing matching runtime where possible is a good goal. As I see it, this is currently at the very edge of what isn’t supported by current typecheckers, and “fixing”[2] the current issues with the gradual guarantee makes many things have more self-consistent interpretations automatically.

I think that if we want something like “the module owning this type can subclass it, but users shouldn’t” or “I’d like static analysis to warn people that this type isn’t designed for multiple inheritence if they use it that way” we want to make this seperate from an error that is inherently about runtime “given the type checker has correct information, this will error at runtime, no exceptions”. I don’t want to tie user expressed disjoint invariants to the idea of “the implementation not just says, but requires these must be disjoint”. It might even be desirable to say “must be disjoint with [list of other types in the module]” to enable algebraic data type expressions that are actually mutually exclusive, without restricting the ability for users to add slots with composition, or mix in some serialization mixin, etc.

I think that even if there is a need for expressing other reasons that types need to be disjoint, we should look to understand those needs better first, rather than make this part usable for things it doesn’t necessarily capture well. This might also be a reason to continue using solid_base as a name, and save disjoint for allowing user expression of specific disjoint requirements rather than disjoint with everything else. (maybe disjoint_class_layout to keep both the disjoint nature, but also that this is just for class layout?)


  1. It may also be interesting to codify which things in the type system model a runtime limitation, and things that are “type system addons” like ReadOnly, having them produce a different category of error. While the latter do express things we (myself included) want the type system to enforce, this group of things is incompatible with the open-world for gradual guarantee and soundness. It requires either non-gradual use or closed world (everything visible to the typechecker) assumptions for those to be enforced correctly. Further details about the Ideas Liz and I were exploring for supporting this compatibly with people’s existing types and typecheckers this are worth discussing elsewhere, and shouldn’t hold this proposal up. ↩︎

  2. Using the word fixing loosely here, there are arguments that the by design nature means it isn’t broken, but I think the fact that this in in a gradually typed system with library code with incomplete types, it’s better to slow roll anything that could have a negative impact on inference or narrowing behaviors. ↩︎

6 Likes

@disjoint(cause) perhaps, with the default value of cause being "implementation", and other possibilities including "semantics", or however you like to express “the designer wants you to keep marriages in the family”.

Otherwise, I think it is confusing if the annotation does not match the behaviour of solid_base in the code, and they are called the same.