It feels like I’ve been reading about intersection types for years here, but I don’t think I’ve seen a PEP draft about them. Maybe the general feature is too complex.
I would get 90+% of the value of intersection types with the following restrictions, that can later be lifted:
(Syntax: A & B or Intersection[A, B].)
The intersection must contain one or more protocols.
The intersection may contain at most one concrete class/nominal type.
All members of the intersection must be completely disjoint, meaning no overlapping methods, instance variables or class variables.
One concrete use case I have is that PyTorch’s Dataset class doesn’t define __len__ itself but many operations on datasets require the presence of __len__. You could define a subclass, SizedDataset, with an abstract __len__, and use this type to, e.g., annotate your function arguments, but this is a nominal type, so it’s not compatible with other subclasses of Dataset that also define __len__. What I want is Dataset & HasLen but I can’t express it in the type system.
The most recent work about intersections is (I think) going on in this discussion. It might be worthwhile to ask there, or maybe tag relevant interested parties
I understand what you’re trying to express (that they can’t have conflicts in the cases where they overlap) but as currently expressed this only allows empty intersections (which would indeed accomplish the goal of simplifying implementation semantics )
Languages with full intersection support are very rare, precisely because they’re very complicated to properly implement. It’s (indeed) more common to see things like refinement types instead, which are a limited form of intersection types (wiki).
From Types and Programming Languages by Benjamin C. Pierce (2002):
Unfortunately, the power of intersection types raises some difficult pragmatic issues for language designers. So far, only one full-scale language, Forsythe (Reynolds, 1988), has included intersections in their most general form. A restricted form known as refinement types may prove more manageable (Freeman and Pfenning, 1991; Pfenning, 1993b; Davies, 1997).
Perhaps it’s outdated, but it’s something worth noting either way.
As I understand the proposal of OP the limitation to at most one concrete type and the rest being protocols would be more or less equivalent to refined types. What are behaviors if not the structural subtyping that protocols are directly meant to represent?
In Python, typing.Protocol can be used structurally and (pun intended) nominally (FWIW I think this was a mistake, but it is what it is). So I don’t think this will simplify anything, I’m afraid.
While that is true at runtime, the (documented) intended effect is to mark a concrete class as explicitly compatible with the protocol. I don’t see how that possibility somehow limits a type system from assuming they are meant to represent their structural behavior when used in an intersection as this proposal suggests.
class CanSpam(Protocol):
def spam() -> int: ...
class BadSpammer(CanSpam):
def spam() -> str: ...
Here, BadSpammer is structurally incompatible with CanSpam. But because it’s a nominal subclass of CanSpam, BadSpammer will be assignable to CanSpam regardless.
So my point is that the nominal subtyping relation can not be avoided by limiting intersection types to Protocols in this way.
I was just thinking the same thing - good point. However can we assume the chorus of users asking for type intersections (myself included) are happy to handle such cases, and so these intersections (with unresolvable method or property clashes) can safely be treated as empty? Where that’s not appropriate, won’t an empty type annotation quickly be identified as an error?
Two simple methods like that with different return types are telling me they both want to be a generic one, FWIW.
It seems to me that the simplest approach would be to declare str as a Container type as well. Maybe resolving the issues in typeshed to stabilize at least the built-in types could help to pave the way?
They can at runtime sure but an incompatible override like this should be flagged as unsound at definition time and we can’t really expect any typing based construct to handle unsound, conflicting classes soundly in all cases.
The specification explicitly documents that inheriting a Protocol as a base class is a statement that it implements that protocol and so type-checkers should simply reject the above definition directly.
Also I can’t see how this interacts particularly badly with intersections as proposed compared to any other typing construct that needs to contend with unsoundly annotated types.
All type checkers currently accept passing str to def (x: Sequence[str]). So then surely it should be reasonable to expect that something like object & Sequence[str] will also accept str, no?
And just to be clear; I don’t agree with what typeshed is doing here. My point is that these mistakes have been made, and that they’re not going away anytime soon, so we’re forced to deal with them.
There are very few typing features that don’t run into some edge case of python’s dynamic nature and legacy features where they aren’t completely sound. That such cases exists can’t really be an argument against a new feature unless it interacts particularly badly or very broadly and I actually don’t see that in any of the examples brought up yet.
Can we note that there may be problems that a pep would have to take into consideration when specified in detail and open the discussion to focus on broader considerations of the proposal instead?
I just don’t think that Protocols are a good fit here, is what I’m trying to say. So I’d rather we’d explore other options than to make this into yet another discussion of all the current issues surrounding Protocol.
These points don’t work. I tried a few times to get a limited form of it specified well enough. It may look fine, but there’s a few issues that come up with this set.
If it wasn’t for the ability to subclass Protocols nominally, then an intersection of only disjoint protocols would be possible. Including nominal types at all has more complications than I have the time to rehash right now, some are actually related to not knowing if a subtype is disjoint from a protocol due to the boundaries on information currently.
As for disjointedness, if the type system information boundaries allowed being certain of it, this restriction would remove the primary complication:
Any overlap in methods results in needing to incorporate newer type theory than anything the type system currently relies on. Presenting this in a way that is approachable to audiences ranging from a non-expert that just needs to understand how to satisfy an intersection, all the way to those needing to verify an implementation of a type checker is behaving correctly is the bulk of the remaining work on the proposal.
If you’re comfortable enough with the theory yourself, the latest draft pdf contains almost everything needed. I hope to have more time where I have the level of focus required to continue working on it soon.
I have to admit, I don’t really see what the issue with nominal subclasses of protocols is.
Were you thinking of something like this?
from typing import Protocol
class CanSpam(Protocol):
def spam(self) -> int: ...
class BadSpammer(CanSpam):
def spam(self) -> str: ... # type: ignore
class C(CanSpam):
def spam(self) -> int:
return 4
def f(b: BadSpammer) -> str:
return b.spam()
f(C())
But pyright (haven’t tried others) doesn’t fall for this and flags the last line as error (correctly!).
It feels to me like it’s always possible for the type checker to see which attributes are actually present on a class or protocol — regardless of whatever stuff was defined in the superclasses.
To clarify: What I meant was that any overlap of attributes is not allowed. So, if we have
class P(Protocol):
x: Path
class Q(Protocol):
y: int
class R(Protocol):
x: PosixPath
Then P & Q is allowed, but P & R is not allowed, even though PosixPath is a subtype of Path and P and R are in theory compatible.
The point of this is that we can ignore all questions of compatibility and can literally just consider the list of attribute names.
Reading previous proposals made me realize, by the way, that allowing intersections of only TypedDicts should also be pretty straightforward.