I love most aspects of PEP 695; however, I believe there is one major issue with it. The issue concerns the use of the : symbol to denote a subtyping relationship between types. (For example, def foo[T: int] constrains T to be a subtype of int.)
There are several reasons why this syntax is suboptimal. Please hear me out—I believe that together these reasons constitute strong evidence that the syntax should be changed.
Reason #1: This syntax is not forwards compatible
Using the : symbol to specify a subtyping relationship is not forwards compatible with two extensions that might eventually be made to Python’s type annotation syntax:
- The specification of supertype relationships (i.e. lower bounds).
- The specification of types parameterized by values (a.k.a. “dependent types”).
For each extension, I will briefly explain:
- why it is practical, and
- how it is incompatible with the use of : to specify a subtyping relationship.
Supertype relationships
Amongst programming languages that support constraints upon type parameters, some of them (e.g. Java, Scala, Julia…) support both lower bounds and upper bounds. This feature is especially useful for languages that support use-site variance annotations, e.g. Java and Kotlin.
Here is an example of what this might look like in Python. I shall use Scala’s syntax for type bounds, namely <: (upper bound) and >: (lower bound). I shall also use a hypothetical where syntax:
def foo[T](items: list[T])
    where
        T <: int | str  # Things that we might GET from the list.
        T >: int        # Things that we can PUT into the list.
    -> bool:
    x: int | str = items.pop()  # We can GET 'int | str'
    items.append(0)             # But we can only PUT 'int'
    ...
As defined, the function foo can operate generically on any list whose items are (at most) integers or strings, and to which it is safe to add arbitrary integers. Thus, it is safe to invoke this function on:
- list[int]
- list[int|str]
But it is NOT safe to invoke this function on:
- list[int|str|bytes] (we might GET a bytes object)
- list[NonZeroInt] (a hypothetical subtype of int; we cannot safely PUT an arbitrary int such as 0)
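To make the two bounds concrete, here is a small runtime model of the check a type checker would perform for int <: T <: int | str. It is only a sketch under simplifying assumptions: a union such as int | str is modelled as a frozenset of classes, subtyping is approximated with issubclass, and the helpers union, is_subtype and satisfies_bounds (like the NonZeroInt class) are hypothetical names introduced for illustration.

```python
# Hypothetical model: a union type such as int | str is represented as {int, str}.
UnionModel = frozenset[type]


def union(*classes: type) -> UnionModel:
    """Build the set-of-classes model of a union type."""
    return frozenset(classes)


def is_subtype(sub: UnionModel, sup: UnionModel) -> bool:
    """sub <: sup when every member of sub is a subclass of some member of sup."""
    return all(any(issubclass(s, t) for t in sup) for s in sub)


def satisfies_bounds(t: UnionModel, upper: UnionModel, lower: UnionModel) -> bool:
    """Check lower <: t <: upper, i.e. T >: lower and T <: upper."""
    return is_subtype(lower, t) and is_subtype(t, upper)


class NonZeroInt(int):
    """Hypothetical subtype of int (no runtime check that the value is non-zero)."""


UPPER = union(int, str)  # T <: int | str  (what we may GET from the list)
LOWER = union(int)       # T >: int        (what we may PUT into the list)

for label, t in [
    ("list[int]", union(int)),
    ("list[int|str]", union(int, str)),
    ("list[int|str|bytes]", union(int, str, bytes)),
    ("list[NonZeroInt]", union(NonZeroInt)),
]:
    print(label, satisfies_bounds(t, UPPER, LOWER))
```

Under this model, list[int] and list[int|str] pass, list[int|str|bytes] fails the upper bound (a bytes object could come out), and list[NonZeroInt] fails the lower bound (an arbitrary int cannot go in).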
I’m not necessarily advocating for this type system extension. However, it would be good to avoid unnecessary barriers. The use of : to specify upper bounds is one such barrier, because it doesn’t facilitate a subsequent syntax for lower bounds. (Unless we used the syntax int: T, but wow, that would be confusing!)
Types parameterized by values (dependent types)
Several advanced type systems support the parameterization of types by values. For example, one might define an Array with a statically-known length as:
class Array[T, length: int]:
Here is how an instance of Array would be constructed:
arr = Array[Int, 8](...)
This is a kind of “dependent typing”, and it has many practical uses. Indeed, the type system of Mojo, a language that extends Python, has dependent types and uses exactly the syntax shown above. But under the syntax proposed by PEP 695, length would be interpreted as a type variable with an upper bound of int, not as a value parameter of type int! So, fundamentally, using : to denote subtyping is incompatible with the above syntax for dependent typing.
- About Mojo: Mojo is a programming language that extends Python with static typing and Rust-like memory safety. It aims to be an excellent language for specifying high-performance machine-learning models. The project is being led by Chris Lattner, creator of LLVM and Swift. Its type system allows types to be parameterized by values, as shown above.
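For comparison, the nearest existing Python spelling of what PEP 695 would make of class Array[T, length: int] is a generic class whose second parameter is a TypeVar bounded by int. The sketch below uses the pre-3.12 TypeVar spelling; Array, Length and Array8OfInt are hypothetical names. It shows that the length then has to be supplied as a type, such as Literal[8], rather than as the value 8.

```python
from typing import Generic, Literal, TypeVar, get_args

T = TypeVar("T")

# Roughly what PEP 695 makes of `class Array[T, length: int]`:
# `length` becomes a *type* variable whose upper bound is int.
Length = TypeVar("Length", bound=int)


class Array(Generic[T, Length]):
    """Hypothetical fixed-length array; the length exists only at the type level."""


# To express "an array of 8 ints" we must pass a *type* (Literal[8]),
# not the value 8 that Mojo's `Array[Int, 8]` takes.
Array8OfInt = Array[int, Literal[8]]

print(Length.__bound__)
print(get_args(Array8OfInt))
```

Both type arguments recovered by get_args are types (int and Literal[8]); this encoding never gives the class a length value to compute with.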
Reason #2: This syntax is not consistent with Python’s existing syntax
Compare the following two code snippets:
# TEST whether x is an INSTANCE of 'Sequence'
if isinstance(x, Sequence):
    ...
# DECLARE that x is an INSTANCE of 'Sequence'
def foo(x: Sequence):
    ...
Now in contrast, compare the following:
# TEST whether T is a SUBTYPE of 'Sequence'
if issubclass(T, Sequence):
    ...
# DECLARE that T is a SUBTYPE of 'Sequence'
class Foo[T: Sequence]:
    ...
The syntax for testing instances (isinstance) differs from the syntax for testing subtypes (issubclass). An average Python user would therefore expect the syntax for declaring instances (:) to differ from the syntax for declaring subtypes (?). Unfortunately, PEP 695 proposes using the : symbol for both purposes.
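Python's existing typing machinery already keeps the instance-level and type-level relationships apart. The sketch below uses the pre-PEP 695 TypeVar spelling for the bound; first_item is a hypothetical helper added only for illustration.

```python
from collections.abc import Sequence
from typing import TypeVar


# Instance level: `x: Sequence` declares that x is an *instance* of Sequence,
# the same relationship that isinstance() tests.
def first_item(x: Sequence):
    return x[0]


# Type level: a bound declares that T is a *subtype* of Sequence,
# the same relationship that issubclass() tests.
T = TypeVar("T", bound=Sequence)

print(isinstance([1, 2, 3], Sequence))               # True
print(issubclass(list, Sequence))                    # True
print(first_item.__annotations__["x"] is Sequence)   # True
print(T.__bound__ is Sequence)                       # True
```

The annotation and the bound are stored in different places (a function's __annotations__ versus a TypeVar's __bound__), mirroring the isinstance/issubclass split.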
Reason #3: The : symbol already has 5 different meanings
In today’s Python, the : symbol is already used in 5 different places:
- Introducing a nested block
- Dictionary literal
- Slicing
- Type annotation
- Part of the walrus operator (:=)
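For illustration, here is a short function that uses all five existing roles of : within a few lines; summarize is a hypothetical name chosen only for this sketch.

```python
def summarize(values: list[int]) -> dict[str, int]:  # type annotation; ":" opens a block
    window: list[int] = values[1:3]                  # type annotation and slicing
    result = {"count": len(window)}                  # dictionary literal
    if (total := sum(window)) > 0:                   # walrus operator; ":" opens a block
        result["total"] = total
    return result


print(summarize([10, 20, 30, 40]))  # {'count': 2, 'total': 50}
```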
PEP 695 extends : with yet another meaning.
For learners, this will likely be a source of confusion.
For experts, this will potentially increase the cognitive overhead of reading function signatures. For example, consider the signature:
def foo[T: int](x: int, y: T, z: T):
In this single line of code, the : symbol has three distinct meanings:
- Subtype annotation
- Type annotation
- Introducing a nested block
Proposed solution
These problems can be avoided by choosing a different operator. I don’t care too much which operator is chosen; however, it would probably be most sensible to include the < character in it, since this appears to be the only ASCII character strongly associated with a subset/superset relationship.
Hence, I would suggest considering <:, or a similar operator built around the < character. <: is already the de facto standard notation for a subtype relation; it is used in Scala and Julia. The notation makes a lot of sense:
- < means “sub”
- : means “type”
This operator was also proposed by others earlier in this thread, and in the prior thread on PEP 695.
What about T: (int, str) syntax?
PEP 695 also proposes using the : syntax to specify that a type variable must be instantiated with exactly one type drawn from a fixed set (a “constrained” type variable). The syntax is:
class Foo[T: (int, str)]:
If we were to introduce the <: symbol (or something similar) to mean “subtype”, we would need a different symbol to express the above relationship, because it is not a subtyping relationship.
In mathematics, the ∈ symbol (“element of”) expresses such a relationship, and Python already has the in keyword for it. So I would propose using the in keyword in place of the : symbol:
class Foo[T in (int, str)]:
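The difference between a bound and a set of constraints is already visible in the typing module: TypeVar takes bound= for the former and positional types for the latter. The second half of the sketch is a deliberately simplified model of how a checker picks the type for each kind of variable; solve_bounded and solve_constrained are hypothetical helpers, not typing APIs. A bounded variable may be solved to the argument type itself, whereas a constrained one is solved to one of the listed types, which is why “element of” fits better than “subtype of”.

```python
from typing import TypeVar

# Pre-PEP 695 spellings of `T: int` (an upper bound) and `T: (int, str)` (constraints).
Bounded = TypeVar("Bounded", bound=int)
Constrained = TypeVar("Constrained", int, str)

print(Bounded.__bound__, Bounded.__constraints__)          # <class 'int'> ()
print(Constrained.__bound__, Constrained.__constraints__)  # None (<class 'int'>, <class 'str'>)


def solve_bounded(arg: type, bound: type) -> type:
    """Simplified model: a bounded variable is solved to the argument type itself."""
    if not issubclass(arg, bound):
        raise TypeError(f"{arg.__name__} is not a subtype of {bound.__name__}")
    return arg


def solve_constrained(arg: type, constraints: tuple[type, ...]) -> type:
    """Simplified model: a constrained variable is solved to one of the listed
    types exactly (here, the first one that arg is a subclass of)."""
    for candidate in constraints:
        if issubclass(arg, candidate):
            return candidate
    raise TypeError(f"{arg.__name__} matches none of the constraints")


print(solve_bounded(bool, int))             # <class 'bool'>
print(solve_constrained(bool, (int, str)))  # <class 'int'>
```

In this model, passing a bool leaves a bounded T as bool but forces a constrained T to int: membership in a set, not a subtype relation.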
Summary
Using the : symbol to denote a subtyping relationship would be problematic. We can avoid these problems with a slightly different syntax, for example <: for bounds and in for constraints.