PEP 695: Type Parameter Syntax

This is an interesting idea, and I’m a bit more positive about it than Alex and Eric. I see two ways to implement it, both of which come with some tradeoffs:

  1. Implement __call__ as returning its argument unchanged

This would make type aliases work similarly to NewType. It’s simple and fast. It could unlock syntax like this for NewType in the future, removing the need to repeat the name and allowing for lazy evaluation of the supertype:

type NT = NewType[int]

On the negative side, this would work differently from existing assignment-based type aliases in that the value would not be converted in any way. In Thomas’s example, List(x*2 for x in range(3)) would return a generator expression at runtime, not a list.
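Here is a minimal sketch of option 1. It uses a hypothetical IdentityAlias class in place of the real alias object, because in Python 3.12 typing.TypeAliasType is not callable at all. It reproduces the generator-expression behavior described above:

```python
from typing import Any


class IdentityAlias:
    """Hypothetical alias object illustrating option 1.

    Calling it behaves like calling a NewType: the argument comes back as-is.
    """

    def __init__(self, name: str, value: Any) -> None:
        self.__name__ = name
        self.__value__ = value

    def __call__(self, arg: Any, /) -> Any:
        # No conversion at all: __value__ is ignored when the alias is called.
        return arg


List = IdentityAlias("List", list[int])
result = List(x * 2 for x in range(3))
print(type(result).__name__)  # generator, not list
```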

  2. Implement __call__ as return self.__value__(arg)

This would mean that the type alias’s value is used as the converter function. That would make the List() call indeed turn the value into a list, but as Alex called out, it means calling the type alias would work only for simple aliases, not e.g. for unions. In addition, it would implicitly force the evaluation of the lazily evaluated type alias, which could be surprising behavior.
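The next sketch shows option 2 with a hypothetical ConvertingAlias class. A zero-argument lambda stands in for lazy evaluation of the alias value. Calling the alias forces that evaluation. The call converts its argument for list[int] and fails for a union:

```python
from collections.abc import Callable
from typing import Any


class ConvertingAlias:
    """Hypothetical alias object illustrating option 2."""

    def __init__(self, name: str, evaluate: Callable[[], Any]) -> None:
        self.__name__ = name
        # Stand-in for lazy evaluation: the value is computed on each access.
        self._evaluate = evaluate

    @property
    def __value__(self) -> Any:
        return self._evaluate()

    def __call__(self, arg: Any, /) -> Any:
        # Forces evaluation of the value, then uses it as a converter.
        return self.__value__(arg)


List = ConvertingAlias("List", lambda: list[int])
converted = List(x * 2 for x in range(3))
print(converted)  # [0, 2, 4]

IntOrStr = ConvertingAlias("IntOrStr", lambda: int | str)
try:
    IntOrStr("1")
    union_callable = True
except TypeError:
    # A union such as int | str is not callable, so this alias cannot convert.
    union_callable = False
print(union_callable)  # False
```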

Besides union types, abstract types also cannot be constructed. That’s why I think it’s generally an error to conflate type[SomeClass] with Callable[..., SomeClass], which is what calling a type does. I think that if you want something you can call, you should annotate it Callable[..., SomeClass].

Even better would be to provide a parameter definition. There is no way to know the parameter specification of the constructor of some type[SomeClass], since __init__ doesn’t have to obey the LSP: any subclass can change its signature.
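The following sketch shows why type[SomeClass] says nothing reliable about constructor parameters. Base, Child, and build are hypothetical names. Type checkers generally accept the cls() call because Base() takes no arguments, yet the subclass's __init__ requires one:

```python
class Base:
    def __init__(self) -> None:
        self.ready = True


class Child(Base):
    # Overriding __init__ with a different signature is allowed;
    # type checkers generally do not apply override checks to __init__.
    def __init__(self, required: int) -> None:
        super().__init__()
        self.required = required


def build(cls: type[Base]) -> Base:
    # Accepted statically, since Base() takes no arguments...
    return cls()


print(build(Base).ready)  # True
try:
    build(Child)  # ...but Child's constructor needs an argument.
except TypeError as exc:
    print("TypeError:", exc)
```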

Consider your list example. I guess the reason you’re making a type alias is so that you can change the container type? Then, I think you need two lines of code:

from collections.abc import Callable, Iterable
from typing import Any
type OurContainer = list[Any]
make_container: Callable[[Iterable[Any]], OurContainer] = list

I don’t think this redundancy is so bad because these two lines are really two different things that only happen to have the same value.

Suppose you decide that the elements are all integers. Then you might do:

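from collections.abc import Iterable

type OurContainer = list[int]
def make_container(it: Iterable[int], /) -> OurContainer:
    retval = list(it)
    # check that the elements are all integers
    if not all(isinstance(x, int) for x in retval):
        raise TypeError("all elements must be integers")
    return retval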

And so there’s some more benefit to splitting these two things.

I agree that type[SomeClass] is in general problematic, but that’s not what a type alias is, right? A type alias is a one-to-one mapping, so I think there’s no need to worry about inheritance and LSP here. Type checkers treat these two things very differently:

from typing import TypeAlias

class C: ...
MyType: type[C] = C
MyAlias: TypeAlias = C

But I realize now that my example was not very good because everyone is focusing on the __call__ part. The actual example I had in mind (which I then foolishly replaced with a simpler example) was this:

from enum import Enum, auto
from typing import TypeAlias

class DatasetSplit(Enum):
    train = auto()
    validation = auto()
    test = auto()

class Dataset:
    Split: TypeAlias = DatasetSplit

    def __init__(self, split: Split) -> None:
        self._split = split

ds = Dataset(split=Dataset.Split.train)

where the user of the library can either explicitly import DatasetSplit or access it via Dataset.Split. This type-checks in mypy and pyright and works at runtime. When I remove the TypeAlias annotation and just write Split = DatasetSplit in the class body, mypy complains: Variable "main.Dataset.Split" is not valid as a type. Pyright, however, accepts it.

EDIT: I know I could just use a nested class instead, but my point is just that something is possible to do with TypeAlias that isn’t possible with the new type X = Y syntax.

Good example.

I think that you answer your own question though. As you say:

And then in your example, you use it as an alias:

and as a class variable:

Therefore, I think it makes the most sense to simply have two statements: one to set a class variable, and one to set a type alias.

Unlike a type variable, a class variable can be assigned. And as far as I know, type variables are resolved statically whereas class variables are resolved dynamically (in case it is overridden in a subclass). So these really do need to be separate declarations.

I love most aspects of PEP 695; however, I believe there is one major issue with it. The issue relates to the use of the : symbol to denote a subtyping relationship between types. (For example, def foo[T: int] constrains T to be a subtype of int.)

There are several reasons why this syntax is suboptimal. Please hear me out—I believe that together these reasons constitute strong evidence that the syntax should be changed.

Reason #1: This syntax is not forwards compatible

Using the : symbol to specify a subtyping relationship is not forwards compatible with two extensions that might eventually be made to Python’s type annotation syntax:

  • The specification of supertype relationships (i.e. lower bounds).
  • The specification of types parameterized by values (a.k.a. “dependent types”).

For each extension, I will briefly explain:

  • why it is practical, and
  • how it is incompatible with the use of : to specify a subtyping relationship.

Supertype relationships

Amongst programming languages that support constraints upon type parameters, some of them (e.g. Java, Scala, Julia…) support both lower bounds and upper bounds. This feature is especially useful for languages that support use-site variance annotations, e.g. Java and Kotlin.

Here is an example of what this might look like in Python. I shall use Scala’s syntax for type bounds, namely <: (upper bound) and >: (lower bound). I shall also use a hypothetical where syntax:

def foo[T](items: list[T])
where
    T <: int | str  # Things that we might GET from the list.
    T >: int        # Things that we can PUT into the list.
-> bool:
    x: int|str = items.pop()  # We can get 'int|str'
    items.append(0)           # But we can only put 'int'
    ...

As defined, the function foo is able to operate generically on any list where the items are (at most) integers or strings, and for which it is safe to add arbitrary integers. Thus, it is safe to invoke this function upon:

  • list[int]
  • list[int|str]

But it is NOT safe to invoke this function on:

  • list[int|str|bool]
  • list[NonZeroInt] (a hypothetical subtype of int)

I’m not necessarily advocating for this type system extension. However, it would be good to avoid unnecessary barriers. The use of : to specify upper bounds is one such barrier, because it doesn’t facilitate a subsequent syntax for lower bounds. (Unless we used the syntax int: T, but wow, that would be confusing!)

Types parameterized by values (dependent types)

Several advanced type systems support the parameterization of types by values. For example, one might define an Array with a statically-known length as:

class Array[T, length: int]:

Here is how an instance of Array would be constructed:

arr = Array[Int, 8](...)

This is a kind of “dependent typing”, and there are many practical uses for it. Indeed, the type system of Mojo (a language that extends Python) has dependent types, and uses the exact syntax that I’ve shown above. But notably, under the syntax proposed by PEP 695, the length parameter would be interpreted as a type declaration, not a value declaration! So fundamentally, the use of : to denote subtyping is incompatible with the above syntax for dependent typing.

  • About Mojo: Mojo is a programming language that extends Python with static typing and Rust-like memory safety. It aims to be an excellent language for specifying high-performance machine-learning models. The project is being led by Chris Lattner, creator of LLVM and Swift. Its type system allows types to be parameterized by values, as shown above.

Reason #2: This syntax is not consistent with Python’s existing syntax

Compare the following two code snippets:

# TEST whether x is an INSTANCE of 'Sequence'
if isinstance(x, Sequence):
# DECLARE that x is an INSTANCE of 'Sequence'
def foo(x: Sequence):

Now in contrast, compare the following:

# TEST whether T is a SUBTYPE of 'Sequence'
if issubclass(T, Sequence):
# DECLARE that T is a SUBTYPE of 'Sequence'
class Foo[T: Sequence]:

The syntax for testing instances (isinstance) is different from the syntax for testing subtypes (issubclass). Therefore, the average Python user would anticipate that the syntax for declaring instances (:) is different from the syntax for declaring subtypes (?). But unfortunately, PEP 695 proposes using the : operator for both purposes.
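The runtime distinction between these two snippets can be checked directly. Here is a short illustration using collections.abc.Sequence:

```python
from collections.abc import Sequence

x = [1, 2, 3]

print(isinstance(x, Sequence))     # True: the object x is an instance
print(issubclass(list, Sequence))  # True: the class list is a subclass
print(isinstance(list, Sequence))  # False: the class object itself is not a Sequence
```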

Reason #3: The : symbol already has 5 different meanings

In today’s Python, the : symbol is already used in 5 different places:

  • Introducing a nested block
  • Dictionary literal
  • Slicing
  • Type annotation
  • Part of the walrus operator (:=)
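For illustration, here are a few lines of ordinary Python that use each of these meanings of :. The variable names are arbitrary:

```python
data: dict[str, int] = {"a": 1, "b": 2}  # type annotation; dictionary literal

if (n := len(data)) > 1:  # walrus operator; the trailing ':' introduces a block
    first_keys = list(data)[:1]  # slicing
    print(n, first_keys)  # 2 ['a']
```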

PEP 695 extends : with yet another meaning.

For learners, this will likely be a source of confusion.

For experts, this will potentially increase the cognitive overhead of reading function signatures. For example, consider the signature:

def foo[T: int](x: int, y: T, z: T):

In this single line of code, the : symbol has three distinct meanings:

  • Subtype annotation
  • Type annotation
  • Introducing a nested block

Proposed solution

These problems can be avoided by choosing a different operator. I don’t care too much which operator is chosen; however, it would probably be most sensible to include the < character as part of the operator, since this appears to be the only ASCII character that is strongly associated with a subset/superset relationship.

Hence, I would suggest considering one of the following operators:

  • <:
  • <

The first of these is already the de facto standard for a subtype relation. It is used in Scala and Julia. The notation makes a lot of sense:

  • < means “sub”
  • : means “type”

This operator was also proposed by others earlier in this thread, and in the prior thread on PEP 695.

What about T: (int, str) syntax?

PEP 695 also proposes using the : syntax to specify that a type variable must be instantiated with a type drawn from a set. The syntax is:

class Foo[T: (int, str)]:

If we were to introduce the <: symbol (or something similar) to mean “subtype”, we would need a different symbol to express the above relationship, because it is not a subtyping relationship.

In mathematics, the ∈ symbol (“element of”) is used to express such a relationship. And in Python, it turns out that we already have the in keyword for this. So I would propose using in in place of the : symbol:

class Foo[T in (int, str)]:

Summary

Using the : operator to denote a subtyping relationship would be problematic. We can avoid these problems by using a slightly different syntax, for example <: and in.

Regarding the dependent types: you can still achieve this with Literal types.

For example:

from typing import Literal

class Array[DType, Length: int]: ...

x: Array[float, Literal[4]]

def concat[DType, L1: int, L2: int](
    x: Array[DType, L1], y: Array[DType, L2]
) -> Array[DType, L1 + L2]: ...  # the `+` operator here would need to be generalized to int literal types

And I think that’s actually what Mojo is doing. It’s just that Mojo allows using 4 as shorthand for Literal[4] when it’s in a typing context, which allows them to write:

Array[float, 4]

And I actually think Python should also allow this. There is a fork of mypy, basedmypy (KotlinIsland/basedmypy on GitHub), which does allow it (they call it “Bare Literals”), so I think this would be safe to do. (It wouldn’t be safe for string literals, because they could be confused with forward references. Though maybe with PEP 649 it would become safe?)

Regarding your other points, I’m not familiar enough with lower bounds to comment on that. I do agree with reason #2 to some degree, but I think I can live with it…

I can see how you’d come to that conclusion, but it turns out that in Mojo, 4 really is being passed as a value, not as a literal type. You can use the value in subsequent computations:

class Array[T, length: int]:
    def __init__(self):
        self.x = sqrt(length) + 1
a = Array[string, 16]()
print(a.x)   # prints '5'

Note: I’ve modified the syntax of this Mojo program to make it look like Python. The above program won’t compile as-is, but a very similar one will.

Hmm, but then how does Mojo distinguish value-parameterization from type-parameterization? For example, I also saw this example in the Mojo docs:

from Autotune import autotune

def exp_buffer_impl[dt: DType](data: ArraySlice[dt]):
    # Pick vector length for this dtype and hardware
    alias vector_len = autotune(1, 4, 8, 16, 32)

    # Use it as the vectorization length
    vectorize[exp[dt, vector_len]](data)

Source: Modular Docs - Mojo🔥 programming manual

That looks to me like an upper bound.

Mojo doesn’t currently support type bounds, because subtyping (inheritance etc.) hasn’t been implemented yet. In that example, dt is a value, not a type, and thus DType is a type annotation, not a bound. (My understanding is that DType is a temporary hack that is being used to simulate a sum type / union type. I wouldn’t study it too closely.)

But I think we’re beginning to drift off topic. If you’d like to discuss the details of Mojo, you’re welcome to join the Mojo Discord server! (You can DM me there: nick.sm)

The point of relevance to this thread is that in Mojo, values and types can be used as “type arguments”. But the syntax for type bounds (:) that is proposed for PEP 695 is incompatible with this.

@nmsmith, thanks for your thoughtful feedback. However, PEP 695 has already been accepted and implemented in Python 3.12. You’re about six months too late to provide feedback and about 16 months too late to be involved in the (lengthy) discussions among members of the Python typing community where we debated various facets of the spec.

The good news is that all of the issues you raised in your feedback were considered and discussed. I’ll try to address each of them in turn.

  1. You raised a concern that the new syntax doesn’t allow for the addition of “lower bound” support in the future. The addition of a lower-bound constraint was deemed unlikely for Python. It’s a feature that is found in very few modern programming languages. Refer to the appendix of the PEP for a thorough overview of other languages and which generics features they support. To the best of my knowledge, lower bound support has not been requested by users of mypy, pyright or other Python type checkers, and no motivating use cases have been provided for such a feature. If a need for new kinds of constraints does arise in the future, there are ways that we could extend the syntax to accommodate them.

  2. You asserted that the syntax is not consistent with Python’s existing syntax. I don’t agree with that. It’s syntactically and semantically consistent with parameter and variable type annotations, where the annotation indicates that the symbol must be a subtype of the provided annotation. For example, a variable foo: str must contain a str instance or a subtype thereof. Likewise, a type parameter with an upper bound is constrained to be a subtype of the provided annotation. You drew an analogy with the isinstance and issubclass calls. These are runtime calls. They’re functions, not dedicated syntax. They are used for testing class hierarchy relationships of nominal types. They were not designed to be used with the full static type system (and indeed they predate the static type system by many years). You mentioned a function issubtype. Please correct me if I’m wrong, but I think you meant issubclass. “Subclass” and “subtype” are related but different concepts. “Subtype” is a static type concept, and isinstance and issubclass do not understand subtype relationships in general. For example, these calls don’t operate on generic types (like dict[str, str]) or structural types (like non-runtime protocols or TypedDicts).

  3. You raised the concern that the : token is already used in several ways in Python. Most languages reuse tokens in different parts of the grammar. This is generally not a problem for users. When we were exploring various syntax options for PEP 695, I conducted a poll among typed Python users to get their feedback on various options, and there was a strong preference for the syntax that is in the final specification. I’ll also note that the syntax that we chose for type parameter bounds is consistent with a number of other languages. Refer to the table in the appendix for details.

You proposed using <:. That was one of the options we discussed during our lengthy debates and explorations. It was also one of the options included in the survey I conducted. The feedback was that it looked strange, and it wasn’t intuitively obvious what it meant – compared to : which was immediately understood by most typed Python users who took the survey. If I remember correctly, <: was the least popular of the options in the poll.

Again, thanks for taking the time to provide the feedback. Your input is most welcome on future static typing proposals.

PEP 695 has already been accepted and implemented in Python 3.12. You’re about six months too late

Python 3.12 is still in beta/prerelease, and thus per this page, it is open to “feature fixes”. That’s why I’m here—I’m suggesting that : has several problems and that it might be worth fixing them.

If others agree with this assessment, then it is obviously a good idea to address the issue when Python 3.12 has ~0 users, rather than when it has millions of users!

You drew an analogy with the isinstance and issubclass calls. These are runtime calls. They’re functions.

My point is just that there is a fundamental distinction between:

  • an object being an instance of a type
  • a type being a subtype of a type

Python offers two different functions for testing these things, because they are not the same concept! I strongly disagree with your assertion that they are. It’s just not true.

The assertion that you’ve made is analogous to asserting that the ∈ (“element of”) operator in mathematics is the same thing as the ⊆ (“subset of”) operator.
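Python's built-in sets also keep these two relations apart. They use in for membership and <= (or < for a proper subset) for subset tests. Here is a small illustration:

```python
s = {1, 2, 3}

print(1 in s)       # True: 1 is an element of s
print({1, 2} <= s)  # True: {1, 2} is a subset of s
print(s <= s)       # True: every set is a subset of itself
print(s < s)        # False: < tests for a proper subset
```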

Please correct me if I’m wrong, but I think you meant issubclass.

Yes, that was a typo. Thanks.

When we were exploring various syntax options for PEP 695, I conducted a poll among typed Python users to get their feedback on various options, and there was a strong preference for the syntax that is in the final specification.

I checked that poll before making my original post. (Indeed, I read every post in the history of PEP 695.) The poll had an extremely small sample size (19 people), and was only sent to one particular subcommunity (the “Microsoft-internal Python forum”). What’s more, the respondents were not given information about the potential upsides/downsides of each operator (such as the information contained within my earlier post). Consequently, I don’t think it is fair to use the poll as an argument in favour of anything. At best, it tells us what people’s “gut feeling” is when encountering a foreign operator. People will obviously vote for what is most familiar and/or aesthetically pleasing. I agree with you that : is aesthetically pleasing (more so than <:); however, we can’t let aesthetics overrule practicality.

I’ve provided ample evidence that : is not a wise choice. I believe that evidence still stands. My motivation for sharing this assessment is to help Python grow in a good direction. That is all.

First of all, that was a really well laid out first post. Welcome to the community, Nick.

I have no idea whether there’s any point in debating a PEP that’s already been accepted.

I also don’t want to get caught in the crossfire here, but the thing that I like about Nick’s suggestion is that having separate notation for subtype and instance-of:

  • distinguishes concepts that are easily confused, and
  • opens the possibility of the operator being used in other places.

Specifically, the subtype notation could be used in parameter type annotations. Instead of:

def f(some_subtype: type[SomeType]) -> SomeType: ...

we could have

def f(some_subtype <: SomeType) -> SomeType: ...

But then you’d have to keep type[..] around for other cases, and it’s usually a bad idea to have two ways of doing things. Also, it’s probably too terse for an uncommon case that should be made more obvious.

Still, there is a nice consistency with the meaning of the restriction wherein T is a subtype of SomeType:

def f[T <: SomeType](x: T) -> T: ...

This operator comes at a cost though: You’re adding an operator that people have to learn, and you’re making the type variable specification a lot busier (all those < characters).

I don’t really have a strong opinion about changing the notation. I’d happily accept whatever gets decided, and I’m absolutely thrilled that this PEP was accepted.

One good thing that can come from this discussion would be to put some of these ideas about the meaning of : in the two type contexts (parameters and type variables) into the documentation.

Out of curiosity, don’t square brackets already make this distinction obvious?

Yes, sure. I think “obvious” might be different for different people. And everything becomes obvious after you get used to it—which is why I’ll happily accept whatever is decided.

you’re making the type variable specification a lot busier

Just to reiterate: I don’t really mind what syntax is chosen, as long as it is forward compatible, as I’ve talked about. If people find <: too noisy, then < is a reasonable second choice. Consider:

def foo[T < Sequence](..)

This looks clean and concise to me. So maybe we can have the best of both worlds: aesthetics and forward compatibility. (The trade-off with this option is that technically <= would be the mathematically accurate symbol.)

I’ve consulted with the developers of Mojo. They agree that the : syntax is incompatible with their syntax for dependent typing; however, they believe that they can resolve this problem by just using their own syntax for type annotations (rather than Python’s syntax), if need be. It’s not ideal, but it won’t be a major issue in practice.

At this point, I’ve provided all the information that I can about this issue, so I’ll leave it to the Python devs to figure out how to proceed from here.

Thank you for your time everyone.

You actually might be surprised at how many people use main, let alone the betas.

The key way to change something like this after Python has reached beta is for a groundswell of support to make the change, lest we break people already using the syntax to prepare for Python 3.12 final’s release.

I also don’t think we should make changes simply for Mojo’s benefit (yet). The language still requires an invite and they have not demonstrated they will be successful in their goal of being a Python superset. As such, I wish them luck, but to me Mojo is still too experimental and lacks enough users for us to have to restrict something in Python for them.

I suppose it’s too late to vote for a keyword like deftype or typedef instead of type. I see some discussion about the type soft keyword, but no serious discussion of alternatives to it.

I also didn’t see any response to PEP 695: Type Parameter Syntax - #70 by davidism which asks about conflicting with slice syntax. What became of that?

Yes, it’s too late.

It doesn’t conflict in the grammar, because PEP 695 syntax doesn’t appear in places where you could use a slice. I can’t say whether some people might find it confusing, but in something like this:

def f[T: int](x: list[T]) -> list[T]:
    return x[:1:2]

the two uses of : within square brackets are visually quite different.

Beyond the fact that this is unlikely to happen because of compatibility reasons, this has seemingly nothing to do with PEP 695.
