Basic terminology for types and type forms

We should also list the places where a type expression must be used. I think that’s the following set:

  • In a type annotation (always as part of an annotation expression)
  • The first argument to cast()
  • The second argument to assert_type()
  • The bounds and constraints of a TypeVar (whether created through the old syntax or the new PEP 695 syntax)
  • The definition of a type alias (whether created through the type statement, the old assignment syntax, or the TypeAliasType constructor)
  • The type parameters of a generic class (which may appear in a base class or in a constructor call)
  • The definitions of fields in the class-based forms for creating TypedDict and NamedTuple types
  • The base type in the definition of a NewType

@MegaIng thanks, will fix the typo.

I don’t think this definition is useful without first defining what a valid expression of a type is, and to whom it is valid. I think a more useful definition would be along the lines of

“An expression which is intended to indicate an expectation of a type to tools aware of the type system”. This leaves the validity of such an expression to those tools, which would imply the rules laid out elsewhere in the typing specification for what does and does not compose.

I don’t particularly like trying to create a pseudo-grammar like this. It’s imprecise about what type checkers need to do (for instance, with aliases), and we’re not writing a parser for it here. I do see value in clearly documenting which constructs may be composed with which other constructs, but not in this form.

With the above definition, I would change this to “places where an expression is treated as a type expression”, since the runtime does not enforce such use. Otherwise, I agree with the list that follows.

Love it. Ship it!

Appreciated

Makes sense. (I withdraw my prior hesitation on this point.)

Grammar sketch looks good to me as well.

I like the formalized grammar approach. After all, type expressions are an “embedded grammar” — a subset of the full grammar supported in Python. This approach eliminates ambiguities about which syntactic forms are allowed and disallowed within type expressions. This is a continual source of confusion among users.

Your grammar specification looks pretty complete to me. The only things I’d add are:

  • InitVar, which should be a qualifier (valid in some contexts)
  • Additional clarifications about strings used for forward references. I think it should explicitly say “string literal” rather than just “string”. Raw strings and f-strings should be disallowed. I’d also suggest disallowing implicitly concatenated strings, as pyright currently does, because they are problematic when reporting diagnostics with character ranges (something mypy historically has not concerned itself with). I’d also prefer that we disallow escape sequences in string literals, which pyright also disallows for the same reason as concatenation. It should also be noted that triple-quoted (multi-line) strings are allowed and should be parsed as though they are implicitly parenthesized.
  • Specialization for ParamSpec type parameters requires a list expression form. For example, if you have a generic class or type alias Foo[**P, T], you can specialize it using Foo[[int, str], int]. I don’t think that’s covered in the grammar currently.
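The string-literal restrictions in the second bullet can be made concrete with a small, hypothetical checker. It is only a sketch that assumes a quoted annotation arrives as its source text; the function name forward_ref_problems and its messages are invented here, and this is not how pyright implements the check. It uses ast to reject f-strings and non-string constants, and tokenize to see details that ast discards: the raw prefix, backslash escapes, and implicit concatenation.

```python
import ast
import io
import tokenize


def forward_ref_problems(source: str) -> list[str]:
    """Hypothetical check of a quoted annotation's source text, e.g. '"list[int]"'.

    Returns a list of problems; an empty list means the literal is acceptable
    under the restrictions suggested above.
    """
    node = ast.parse(source, mode="eval").body
    if isinstance(node, ast.JoinedStr):
        return ["f-strings are not allowed"]
    if not (isinstance(node, ast.Constant) and isinstance(node.value, str)):
        return ["not a string literal"]

    # ast folds prefixes, escapes and implicit concatenation away,
    # so inspect the raw STRING tokens instead.
    readline = io.StringIO(source).readline
    strings = [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.type == tokenize.STRING
    ]

    problems = []
    if len(strings) > 1:
        problems.append("implicitly concatenated strings are not allowed")
    for text in strings:
        # Bytes and f-strings were rejected above, so only r/u prefixes remain.
        prefix = text[: len(text) - len(text.lstrip("rRuU"))]
        if "r" in prefix.lower():
            problems.append("raw strings are not allowed")
        if "\\" in text:
            problems.append("escape sequences are not allowed")
    return problems
```

Triple-quoted literals pass this check, matching the suggestion that they stay allowed.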

The special forms Generic, Protocol and TypedDict may need to be called out explicitly as “special forms that are not allowed within type expressions”.

In your “list of places where a type expression must be used”, you include “The type parameters of a generic class (which may appear in a base class or in a constructor call)”. I’m confused by this statement. I suspect you mean “type arguments” rather than “type parameters” here. If so, then it makes sense to me. More generally, it might make sense to broaden this to say that type expressions are used in any explicit specialization of a generic class or type alias.

Edit: Thinking more about the “specialization for ParamSpec type parameters” above, it probably makes sense to create a new definition in the grammar — something like param_spec_expression. It should include ... and Concatenate and the list syntax. You can then use param_spec_expression to simplify the Callable variants in your current grammar, since Callable accepts the same expression forms.

+1 to the formalized grammar approach. Thank you so much @Jelle for the hard work!

Two nits I could think of:

  • In the current form of the grammar, are things like List["MyClass"] or "List"[MyClass] allowed as a type expression or an annotation expression? If the string quoting occurs at the outermost level (e.g. "List[MyClass]") I guess it’s unambiguous. But what should happen if the quoting is nested inside the types?
  • What’s the meaning of TypeGuard[T] if it is allowed to appear in positions other than a callable’s return type? I wonder if it’s possible to add restrictions to the grammar that explicitly limit not only callable parameter types but also callable return types, banning people from writing e.g. Callable[[TypeGuard[int]], None].

In the grammar as written above, every type or annotation expression may be a quoted expression. In the form A[B], B must be a type expression, but A is not (it can only be a name). Therefore, list["MyClass"] is legal but "list"[MyClass] is not (which matches the runtime behavior: the latter will throw an exception).
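The runtime behavior mentioned here is easy to check directly; MyClass is a placeholder class for illustration. list["MyClass"] produces an ordinary generic alias that stores the string as its argument, whereas "list"[MyClass] tries to index the string "list" with a class object and raises TypeError.

```python
class MyClass:
    pass


# Quoting the argument is fine: the string is stored as-is in the alias.
quoted_arg = list["MyClass"]

# Quoting the subscripted name is not: it indexes a str with a class object.
try:
    "list"[MyClass]
except TypeError as exc:
    quoted_base_error = str(exc)
else:
    quoted_base_error = None
```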

It’s not meaningful in other positions and should be rejected by type checkers. I think it’s fine if there are further restrictions that go beyond the grammar itself; for example, list[int, str] is allowed by the grammar but obviously should be rejected by a type checker. Similarly, Self is only allowed within a class. Such restrictions should be mentioned in the documentation for individual special forms.
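To illustrate the split between the grammar and further restrictions, here is a hypothetical arity check layered on top of parsing. list[int, str] parses as an ordinary subscript, so catching it needs a separate semantic rule. The EXPECTED_ARITY table and the arity_errors function are invented for this sketch and cover only a few builtins; real type checkers derive arity from the class definitions.

```python
import ast

# Hypothetical table: how many type arguments a few builtin generics take.
EXPECTED_ARITY = {"list": 1, "set": 1, "frozenset": 1, "dict": 2}


def arity_errors(source: str) -> list[str]:
    """Return arity complaints for subscripts in a type expression.

    The source must parse as a Python expression; the arity rule is a
    semantic check that sits on top of the grammar, not part of it.
    """
    errors = []
    tree = ast.parse(source, mode="eval")
    for node in ast.walk(tree):
        if not (isinstance(node, ast.Subscript) and isinstance(node.value, ast.Name)):
            continue
        expected = EXPECTED_ARITY.get(node.value.id)
        if expected is None:
            continue
        # A comma-separated subscript is an ast.Tuple; anything else is one argument.
        args = node.slice.elts if isinstance(node.slice, ast.Tuple) else [node.slice]
        if len(args) != expected:
            errors.append(
                f"{node.value.id} expects {expected} type argument(s), got {len(args)}"
            )
    return errors
```

Note that list[int, str] also evaluates without error at runtime, which is one more reason the restriction has to live in the type checker.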

The restrictions on TypeGuard are simple enough that it’s not impossible to encode some of them in the grammar, but such a restriction won’t be complete anyway (for example, Callable[[], TypeGuard[T]] should be disallowed), and especially if we also add TypeIs, it would increase the complexity of the grammar. Therefore, my weak preference is to keep those restrictions outside of the grammar itself.

In the grammar, there’s an edge case that seems to be missing: name '[' '(' ')' ']', which specifies an empty TypeVarTuple.

I don’t strictly disagree, but I’m not yet enthusiastic about having the specification work this way. I think there are still too many interconnected pieces that don’t compose properly in all cases throughout the type system, and formalizing composition in a separate place, rather than having it follow from independent pieces, may make it harder to change how things compose later.

I think it may be safer and easier to work with long term if, when a type checker encounters a composition it doesn’t have a meaning for, it raises an error such as “Composing X with Y may not be defined behavior”, with or without a grammar addition. We should prefer this approach going forward rather than preemptively precluding specific behavior unless necessary. The known cases where such a restriction is necessary should be part of the documentation and spec, so that if they stop being blocking reasons, the restriction can easily be revisited.

This should reduce the number of situations where multiple type forms need to be reconciled with each other later, because some interaction between them wasn’t considered when a later addition was made. Not considered? Type checkers can raise an error saying they don’t know what to do here because they think the construct hasn’t been defined to have behavior, without the interaction being blocked preemptively at the specification level. (Treat undefined behavior as an error, as a wide catchall: don’t let undefined behavior become what is allowed, but don’t disallow things at the specification level more broadly than necessary in advance, since that makes it harder to define what was previously undefined.)

I’m not entirely against expressing a grammar-like construct, but given the various special cases throughout the type system, if we add something like this, I think it should go in a section that makes clear that the grammar-like construct is intended to reflect how the type forms are specified to compose. The grammar should be considered derived from the rest of the specification (even if it isn’t actually machine-generated) rather than authoritative when it conflicts with something else in the specification. (That is, type checkers should be able to use such a grammar, but if the grammar and the spec are out of sync, the spec should be considered correct until the two are reconciled.)

I’m incorporating this into a proposed change to the typing spec now and ran into an issue. Consider this code:

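from typing import Generic, ParamSpec, reveal_type  # reveal_type needs Python 3.11+ at runtime

P = ParamSpec("P")

class Foo(Generic[P]):
    pass

# Explicit list form for the ParamSpec: accepted by type checkers.
f = Foo[[int, str, float]]()
reveal_type(f)

# Unpacked tuple inside the list: rejected by pyright and mypy today,
# although arguably equivalent to the specialization above.
f2 = Foo[[*tuple[int, str], float]]()
reveal_type(f2)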

Pyright errors on the f2 line with “Unpacked arguments cannot be used in type argument lists”, and mypy also rejects this code with “Unpack is only valid in a variadic position”.

However, I think it should be valid and equivalent to the definition of f. The spec currently says that unpacked tuples are valid only in a tuple type expression list, but both mypy and pyright already allow unpacked tuples in Callable types (e.g., Callable[[*tuple[int, str]], float]).

For now I’ll write the proposed spec change for type expressions to exclude unpacked types in ParamSpec specializations, since that matches current type checker behavior.

I have now posted a proposed typing spec change incorporating the definition I proposed above, plus a few missed cases that @erictraut and @TeamSpen210 brought up:

This is a large spec change in terms of lines, but it is not meant to prescribe any behavioral change relative to the current spec, only to move toward more precise definitions and shared terminology.

Please comment on the PR if you have any concerns with exact wording or with the RST syntax, and post here if you have larger concerns with the change.

I plan to wait for a few days and then submit the proposed change to the Typing Council.

I think there’s a danger of this composing badly with a potential future expansion of Callable specialization, where we would have a way to spell more (or all) of the different argument types within the parameterization. So disallowing it seems like a good idea: it keeps the available design space more open, and we don’t gain much from allowing it. The one case that is currently allowed is a little easier to justify, but I agree that it’s probably better to be consistent and either allow or disallow it in both cases. Making it a special case is weird.

Submitted to the Typing Council: Typing spec update for defining "type expression" and other terms · Issue #26 · python/typing-council · GitHub

The Council has signed off on the change and I merged it into the typing spec. Thanks everyone for the useful discussion!