PEP 718: subscriptable functions

For interest’s sake, is the only reason square brackets are proposed instead of C++-style <> brackets the implementation and usage of __getitem__, which would simplify the implementation and not require a new parser?

Asking to better understand how and why such features are implemented in a language at lower levels.

__getitem__ was chosen for consistency with other forms of subscription. Unfortunately, using any other syntax is a ship that sailed long ago; square brackets were chosen originally to keep the impact of PEP 484 on non-typed Python relatively small.

Using angle brackets would be a far more involved change. Implementing PEP 718 as currently proposed is merely a library-level change: add __getitem__ support to function objects. Using angle brackets, on the other hand, would be a backward-incompatible syntactic change. Currently, f<int>(a) is valid syntax, equivalent to f < int > a, which is meaningless but syntactically valid.
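To see this concretely, you can ask the parser (a quick illustration; f and a are just placeholder names):

import ast

# f<int>(a) parses today as the chained comparison f < int > (a):
tree = ast.parse("f<int>(a)", mode="eval")
print(ast.dump(tree.body))
# roughly: Compare(left=Name('f'), ops=[Lt(), Gt()],
#                  comparators=[Name('int'), Name('a')])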


I’m trying to understand this example. Even if foo() were able to access the type explicitly passed as T, how would it use that information?

# Both should return [[1, 2, 3]], as per the type given.
a = foo[list[int]]([[1, 2, 3]])  
b = foo[list[int]]([1, 2, 3])

How does foo() decide which code path to take (i.e. interpreting x as a Sequence[T] vs. as T)? I think it would have to look at the value given and dynamically decide whether it matches T or not; simply checking whether it is a sequence of any type would not work. AFAIK, dynamically checking whether a value matches a generic type doesn’t work in general in Python.
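For example, isinstance() refuses parameterized generics outright:

# Checking a value against a parameterized generic fails at runtime:
isinstance([1, 2, 3], list[int])
# TypeError: isinstance() argument 2 cannot be a parameterized generic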

An example where the runtime type is not enough is when a value matches many types and you have distinct behavior for each. A simple case is {"a": 1, "b": 2}. What type is that? dict[str, int] is one natural answer, but there are other types it matches: it is Mapping[str, int] and Iterable[str] as well. I’ve written code where the path taken for an Iterable and the path taken for a Mapping are different. While most of the time it would make sense for a dict to take the Mapping path, I’ve occasionally explicitly passed the type as Iterable in a function argument and had the code focus only on the keys.

Another example where the runtime type is tricky is with generic types, TypedDicts, and other types you cannot identify just from type(x) or the MRO.

from typing import TypedDict

class Foo(TypedDict):
    x: int

a = Foo(x=1)
print(type(a))  # <class 'dict'>

The type of a at runtime is not Foo; it is dict. TypedDicts do not remember their type at runtime. Generic instances often also do not have their type arguments readily available at runtime. There are libraries that try to identify these cases, but it’s easy to end up in situations where the inference is either costly or rather complex. Recursive generic types are one tricky case, and I do have code that picks a code path at runtime based on types, including recursive ones. A type hint for a JSON-like structure is a good example of a recursive type; a simple type(x)/MRO check won’t handle that.
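As an illustration, here is a minimal recursive alias for JSON-like data (a sketch using the Python 3.12 type statement; the JSON name is just illustrative):

type JSON = None | bool | int | float | str | list[JSON] | dict[str, JSON]

x = {"a": [1, {"b": None}]}
print(type(x))  # <class 'dict'> - says nothing about whether x matches JSON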

The discussion link in the PEP currently points to the discussion in Ideas, and not this thread. That should probably be updated.

Regarding the proposal itself, I think I agree with @pf_moore that either creating an intermediary variable or using typing.cast seems to cover all cases already covered by this PEP. I would like to see that point addressed and, as already asked here, some examples demonstrating that this actually is a win. One comparison that I would be really interested in seeing is performance: are intermediate variables (when applicable), casts, or __getitem__ faster?

I would also like some stats on how common these situations are. I think I’ve used a forced cast maybe once. If the situation isn’t very common, it probably shouldn’t get special syntax; but without stats, we can only guess.

>>> import statistics, timeit
>>> from typing import cast
>>> def foo[T](): pass

>>> statistics.fmean(timeit.repeat("foo[int]()", globals=globals()))
0.1284119249903597

>>> statistics.fmean(timeit.repeat("cast(int, foo())", globals=globals()))
0.06919300858862698

>>> statistics.fmean(timeit.repeat("foo()", globals=globals()))
0.0344321416108869

(numbers were run on an unoptimised build)
I wasn’t expecting the new form to be so slow, but then again it is an intermediary step that has to create a new object. In reality I don’t expect this to be much of a performance concern, as people would otherwise already be using an intermediary variable. The main reason I actually see for implementing this is reified types, which would need this feature to be useful; if people wanted that, they’d need to accept the costs associated with this approach. (how do I pass down the Class type to a method without using it as a parameter · Issue #1544 · python/typing · GitHub was made just a few days ago.)

I would like to add that if you use the same mechanism for classes, the numbers are slower for some reason.

>>> statistics.fmean(timeit.repeat("Bar()", globals=globals()))  # this makes sense
0.11875184160890058
>>> statistics.fmean(timeit.repeat("cast(Bar[int], Bar())", globals=globals()))
1.063153683010023
>>> statistics.fmean(timeit.repeat("Bar[int]()", globals=globals()))
1.2137858666013925

I’ll update that now; not sure how that slipped through.

typing.cast is inherently unsafe, and its usage shouldn’t be encouraged because it can lead to very hard-to-detect bugs in typed code. Using an intermediary variable is a decent solution and is always the most performant, but I think symmetry between functions and the class-based syntax is important here, especially as a first step towards adding reified types. I’m not sure how the examples in PEP 718: subscriptable functions - #7 by mdrissi or PEP 718: subscriptable functions - #9 by mdrissi aren’t strong if you want to keep the option of adding that in the future. There are also multiple examples in the PEP where an intermediary variable, e.g. with a lambda, isn’t ideal.

I think your timings are inconsistent because your casts are inconsistent. In the cast for the class case you are subscripting Bar inside the cast, so it makes sense that it’s as slow as the other case, while above you were casting to the builtin int, which is a lot less expensive since it’s just a simple name lookup. I’d expect it to be consistent with the earlier timings if you wrap the type in a string, which is what you generally should do if you care about the performance of cast: the type expression always gets evaluated at runtime unless you wrap it in a string.
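A minimal sketch of the difference, using a stand-in Bar:

from typing import cast

class Bar[T]: ...

a = cast(Bar[int], Bar())    # Bar[int] is evaluated: __class_getitem__ runs
b = cast("Bar[int]", Bar())  # the string is never evaluated at runtime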

I disagree, TBH. Personally, I try to avoid casts, partly because of the runtime cost, but also because of the unsafe nature of casting. Adding a theoretically safer solution which is significantly (2x) slower than a cast, which already had a performance overhead, seems like a bad decision: people will either have their prejudice that “Python is slow” reinforced, or they will avoid safer typing constructs because “the overhead is too high”. Neither situation is good. And the results for classes are even worse, although @Daverball casts some doubt on their validity, so I’d like to see better benchmarks if that is the case.

Wrapping types in strings feels like a hack, TBH. Hopefully PEP 649 – Deferred Evaluation Of Annotations Using Descriptors | peps.python.org will make this unnecessary.

I agree, although I don’t think PEP 649 is a catch-all at all. It helps reduce the need to deal with string annotations at runtime, which should make writing libraries such as Pydantic easier without imposing annoying restrictions on user code. Other than that, I don’t expect a large improvement over what is already possible with PEP 563 as far as reducing runtime overhead goes.

cast is a function like any other, so it’s unlikely to get special treatment, since dynamic dispatch still has to work as usual. We’d need a cast keyword to be able to statically defer the type clause of a cast.

Although I’m hoping PEP 649 will consider specifying a special syntax for an explicit forward reference [1], so we can still get the benefits of a forward reference in places where the dynamic nature of Python makes it difficult or impossible to use one. This has some overlap with deferred expressions, though, so I could see why it would be an unpopular addition without being made more generally useful.


  1. something like the type keyword, which is already kind of that, but only for type aliases; I would like to be able to use it inline in any expression without assigning a name to it ↩︎

I don’t think this point is that big a deal, because the annotation approach (which is safe)

var: int = foo()

is the approach to use if you care about performance (local annotations are completely stripped from the bytecode). However, I think there is room for optimisation in this case, as Jelle mentioned, through specialisation.
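A quick way to check that claim (a sketch with a throwaway pair of functions):

import dis

def annotated():
    x: list[int] = []  # local variable annotation
    return x

def plain():
    x = []
    return x

# Local annotations are never evaluated and emit no bytecode:
# both functions disassemble to the same instructions.
dis.dis(annotated)
dis.dis(plain)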

@Daverball With those tests I was intentionally not using strings, in order to show the performance overhead. But maybe a benchmark of just the speed of generic-alias construction is more useful here:

>>> statistics.fmean(timeit.repeat("foo[int]", globals=globals()))
0.0814359413983766
>>> statistics.fmean(timeit.repeat("Bar[int]", globals=globals()))
0.9336082836030982

(edit: figured out that Bar.__class_getitem__ is slow for pure Python classes because it’s checking more things)

>>> my_cool_list = list
>>> statistics.fmean(timeit.repeat("my_cool_list[int]", globals=globals()))
0.16742044179700316

Also, agreed that PEP 649 won’t be a fix here, and agreed that a way to specify forward references syntactically would be nice.

I think I shouldn’t have asked about performance: no one is going to use cast or this proposed syntax in a hot loop (unless it can be optimized away). Sorry about that.

Regarding the implementation details, why did you choose to implement __getitem__ instead of __class_getitem__, which would mirror all (?) the other generics that got this feature? Is it because we’re technically working with an instance of a function? If we do adopt this, it seems mirroring the current generics would make more sense.

Overall, I’ve come to think this is a good idea.


__class_getitem__ is for making types themselves subscriptable, and may be saved for the day that PEP 612 clarification: ParamSpec that captures a method signature · python/typing · Discussion #946 · GitHub is fixed and FunctionType[Params, ReturnType] makes sense and binds as you’d expect. __getitem__ was indeed chosen because this is an instance of FunctionType.
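A minimal sketch of the distinction (Box is a stand-in class):

class Box:
    def __class_getitem__(cls, item):
        return f"class subscription: {item!r}"

    def __getitem__(self, item):
        return f"instance subscription: {item!r}"

print(Box[int])    # resolved via __class_getitem__ on the type
print(Box()[int])  # resolved via __getitem__ on the instance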

@guido do you think this can be submitted to the SC for 3.13?

Currently you have to use an assignment to provide a precise type … but this code is unnecessarily verbose taking up multiple lines for a simple function call.

The primary competitor to this PEP seems to be just providing type context through a variable annotation (or maybe a cast). If this PEP is primarily:

x: something[T] = generic_function(...)
# becomes
x = generic_function[T](...)

then I’m not sure it’s doing enough to merit its cost. Maybe the PEP could benefit from showing more real-world / complex examples where the variable annotation doesn’t provide sufficient context?

This proposal also opens the door to monomorphisation and reified types

Could you spell this out more?

I think the PEP could also use more specification of type-checker behaviour; e.g. it’s not clear to me how the PEP works with overloads that have different numbers of type parameters. Similarly, it would be good to specify Feuermurmel’s example from upthread (the return type should be list[list[int]], and an error if that’s not the case).
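For instance (an illustrative sketch; f is a placeholder):

from typing import overload

@overload
def f[T](x: T) -> T: ...
@overload
def f[T, U](x: T, y: U) -> tuple[T, U]: ...
def f(*args): ...

# With PEP 718, which overload would f[int] refer to? The spec would
# need to say how a single subscript binds across overloads declaring
# different numbers of type parameters.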

(speaking only for myself, not the typing council)


Well, something might be a complex type specified once in the def, and we might not want to repeat it. OTOH, the latter has runtime overhead to evaluate T that the former doesn’t (if x is a local).

On the third hand, it’s always bothered me that for a generic class, we can write C[T]() in an expression, but for a generic function, we can’t use f[T](). I think that makes it an argument from consistency more than ergonomics. I can’t decide which is more Pythonic – we tend to value ergonomics, but we also value consistency (because it means having to learn fewer special cases). But neither to extremes.

(In all these examples, T is not a typevar but just some type expression.)

I agree the PEP’s case isn’t super-strong, and I haven’t read it carefully to agree or disagree with your other feedback.


This would be possible with reification, which is currently possible if you use a class.
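Here is a minimal sketch of that class-based reification (Maker is a stand-in name; __orig_class__ is a typing implementation detail that gets set when you instantiate through a subscripted alias):

from typing import get_args

class Maker[T]:
    def make(self):
        # __orig_class__ is e.g. Maker[int] when constructed that way
        (t,) = get_args(self.__orig_class__)
        return t()

print(Maker[int]().make())   # 0
print(Maker[list]().make())  # []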

Good catch, I’ll add a note about it.

I’m looking at the PEP again to see whether the use cases are strong enough to motivate a change to the core language. I feel that for this PEP to be accepted, it should give good examples of sound, type-safe code that is currently impossible or unduly verbose to write. Let’s go over the examples given in the PEP’s “Motivation” section.

Empty containers

def make_list[T](*args: T) -> list[T]: ...
reveal_type(make_list[int]())  # type is list[int]

This makes sense for the specific case of empty containers. Why would you write such a function, though? Are there examples of real users who needed this?

Factory callable

def factory[T](func: Callable[[T], Any]) -> Foo[T]: ...
reveal_type(factory[int](lambda x: "Hello World" * x))  # type is Foo[int]

This is a good example:

  • I can imagine a type-safe implementation
  • Type checkers would be able to tell if you use the wrong type. For example, if you wrote factory[str] on the second line, type checkers can flag that "Hello World" * str would fail.
  • Type checkers can’t realistically infer this type, because other types could potentially exist that can be multiplied with a string.

On the other hand, there is a trivial workaround that works today: you can write my_foo: Foo[int] = factory(lambda x: "Hello World" * x). Guido rightly points out above that this may be less convenient if the type is more complicated than just Foo[int]. Still, how common is this sort of thing in practice? Can you point to e.g. mypy issues where someone writing real code would have been helped by this?

Undecidable inference

def foo[T](x: Sequence[T] | T) -> list[T]: ...
reveal_type(foo[bytes](b"hello"))  # list[bytes]

Here the function subscript helps the type checker decide whether to pick the Sequence[T] or the T branch of the union. But that is problematic, because it seems I could as well write foo[int](b"hello") and get list[int] back (since bytes is a Sequence[int]).

Yet at runtime, foo will return the same value whether we wrote foo[int] or foo[bytes], because its implementation can’t see the subscript. Therefore, I don’t think there is a type-safe way to implement this foo function.
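A sketch of why: any concrete body has to commit to a branch without seeing the subscript (the branch order below is one arbitrary choice):

from collections.abc import Sequence

def foo[T](x: Sequence[T] | T) -> list[T]:
    if isinstance(x, Sequence):
        return list(x)
    return [x]

# bytes is a Sequence, so this returns [104, 101, 108, 108, 111]:
# a list[int], even though foo[bytes] promised list[bytes].
print(foo(b"hello"))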

If we want to fix this undecidable inference, I feel we should do it instead by providing a more precise specification for type checkers on how to solve this type variable.

Unsolvable type parameters

def foo[T](x: list[T]) -> T: ...
reveal_type(foo[int]([]))  # type is int

This is another function that is impossible to implement: how can it return an instance of an arbitrary type when given an empty list?

Conclusion

This PEP needs a stronger motivation from real use cases. Two of the examples are useful, but I am not convinced that they represent common real-world code. Two others involve functions that cannot be implemented in a type-safe way.

I vaguely remember several cases over years of following typing discussions where subscripting functions has come up. However, the PEP does not link to any such cases, so I have to evaluate it based on the examples given in the PEP itself. Right now, those examples aren’t convincing enough for me to support the PEP.

(Speaking only for myself, not the Typing Council.)


I’ll add one example that has become more timely given the lack of TypeForm[T] and the moves to make type[T] stricter.

def from_config[T](data: object) -> T:
    ...

class Foo:
    x: str
    y: int

from_config[int | list[int]](4)
from_config[Foo]({"x": "a", "y": 3})

where, with some runtime type reflection, the implementation would need access to the type argument: the function equivalent of a class being able to do cls.__orig_bases__ and access its own generic type. This would enable one use case I have for TypeForm[T] that cannot be safely written in the type system today, because non-simple types like int | str and list[int] are no longer compatible with type[T].

For my own usage, if PEP 718 were accepted and there were a runtime way for a function to access its own type argument, that would solve most of my TypeForm needs; similarly, trycast could be written with PEP 718 and would no longer need TypeForm.

edit: This also touches on whether it is possible to implement some of these functions in a type-safe way. For the undecidable-inference foo[int] vs. foo[bytes] case, if foo had access to T at runtime, it could use that to make a decision.
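A runnable sketch of that idea using today’s workaround of passing the type as an ordinary argument (the names are illustrative):

from collections.abc import Sequence

def foo[T](t: type[T], x: Sequence[T] | T) -> list[T]:
    if isinstance(x, t):
        return [x]      # x matched T itself
    return list(x)      # otherwise treat x as a Sequence[T]

print(foo(bytes, b"hello"))  # [b'hello']
print(foo(int, b"hello"))    # [104, 101, 108, 108, 111]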


I guess what you’re saying is that the PEP becomes useful if there were a way to access the type argument at runtime. But the current version of the PEP doesn’t provide for that.