PEP 718: subscriptable functions

TL;DR: This PEP proposes making various function-like instances subscriptable for typing purposes, so that code like:

def create_list[T](*args: T) -> list[T]: ...

create_list[int]()  # type is list[int]

would work without raising a TypeError.

Previous discussion: Making functions subscriptable at runtime

Thanks for reading, happy to answer any questions.

22 Likes

Should this be supported?

from typing import assert_type

def make_list[T]() -> list[T]:
    return []

make_int_list = make_list[int]

assert_type(make_int_list(), list[int])

This would of course work at runtime, but a static type checker implementing the PEP in a naive way might reject this.

If we decide that such generic function aliases should be supported, we should call it out explicitly in the PEP.

2 Likes

I don’t see any problem with supporting it. I’ll add it to the PEP.

Coming from a person who often confuses the static and runtime applications of types, the first thing that comes to mind is: why can’t I also do isinstance([1, 2, 3], list[int])? I don’t know what point I’m making, other than that this is something I’ve always found strange about types, and the assert_type brought it to mind.

1 Like

I don’t think this PEP really has anything to do with assert_type. It’s just used in the example to show how static checkers are supposed to interpret the new feature.

2 Likes

Many of the arguments listed in the PEP for why we need this can instead be read as “my type analyzer can’t do its inference job, so let’s change the language to allow humans to do the job instead”. That doesn’t come across as compelling. Why not improve the type analyzer?


This syntax also gives us something akin to the Markdown linkification dilemma of ordering the parens and brackets properly:

People will now be able to put [] and () in the wrong place, and instead of an obvious error with a message telling them specifically what was wrong for having typed frob.fetch[stuff](key) when they meant a non-typing-related frob.fetch(stuff)[key]… it’ll have what behavior, exactly?

It looks like it’d devolve into a fetch(key) call rather than fetch(stuff), and produce whatever error, if any, that might cause, somewhat removed from the source of what was actually a syntax problem?
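
If I’m reading the proposal right, a sketch of that failure mode (Frob, stuff, and key are hypothetical stand-ins for the names above):

class Frob:
    # Hypothetical stand-in for the fetch in the example above.
    def fetch(self, table: str) -> dict[str, int]:
        return {"row": 1}

frob, stuff, key = Frob(), "table", "row"

frob.fetch(stuff)[key]  # intended: call fetch, then subscript the result

# Under the PEP, the swapped spelling frob.fetch[stuff](key) would no longer
# fail fast at the subscript: it evaluates roughly as
# frob.fetch.__getitem__(stuff)(key), a GenericAlias-style wrapper whose
# __call__ forwards to fetch(key), so any error surfaces inside fetch(key),
# away from the real mistake.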


It wasn’t obvious to me from the PEP text when this syntax was intended to be used. At source analysis time by type analyzers, much like an annotation? Or at runtime? It clearly has a runtime performance impact to use it: there’s an extra expression in the [], a __getitem__ call, GenericAlias object construction, a return, and an indirect call through the GenericAlias.__call__ implementation happening.
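
For what it’s worth, the class analogue of that cost is measurable today; a quick, unscientific micro-benchmark sketch:

import timeit

# The subscripted spelling pays for __class_getitem__, GenericAlias
# construction, and the indirect GenericAlias.__call__ on every call:
print(timeit.timeit("list()", number=1_000_000))
print(timeit.timeit("list[int]()", number=1_000_000))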

9 Likes

I’ll add two examples of use cases for this PEP that hopefully cover situations where the type checker can’t simply be smarter.

When deserializing data it is pretty common to have a function whose return type cannot be known from the code, but where the writer may expect a certain type. For example,

data = pickle.loads("my_object.pkl")

What type is data? Currently it is Any as there’s no way to know the type. With this PEP it would be possible (optionally) to do,

data = pickle.loads[MyFoo]("my_object.pkl")

and to adjust the type definition of loads (simplified) from,

def loads(contents: str) -> Any:
  ...

to

def loads[T](contents: str) -> T:
  ...

Not specifying the type is still possible, as before, and will fall back to an unsolved type variable, which behaves like the Any we have today. For this simple example it is possible to instead do,

x: MyFoo = pickle.loads("my_object.pkl")

but for slight variations that add one more function call in between, like,

def process_foo(x: MyFoo) -> int:
  ...

process_foo(pickle.loads("my_object.pkl"))

you can’t really use that trick without splitting the code across multiple lines or leaving the pickle loading unchecked.
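
With the PEP, the subscripted call would instead compose inline:

process_foo(pickle.loads[MyFoo]("my_object.pkl"))  # argument checked as MyFoo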

The same idea applies to several other deserialization functions, like json.loads, where as the writer you may expect a certain type, but the type checker can’t possibly know what it is.

A different kind of example is one where the right type for a type variable is ambiguous.

from collections.abc import Sequence

def foo[T](x: Sequence[T] | T) -> list[T]:
  ...

foo(b"hello") # What is T here?

Since bytes is a Sequence[int], should T be int with return type list[int], or should T be bytes with return type list[bytes]? Any time you have unions/overloads together with type variables, it becomes possible to introduce ambiguous cases where the type checker has multiple options for what T should be. Type checkers today use heuristics that try to prefer the simplest value of T, but sometimes you want the other choice picked, and I don’t think there is any clear way to always know what the user expects for T. This PEP would make it easy to call foo[bytes](b"hello") or foo[int](b"hello"), allowing the writer to be clear about what they expect.
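
If the PEP were accepted, both resolutions could be spelled out explicitly and verified statically (using foo from above; assert_type is checked by the type checker, not at runtime):

from typing import assert_type

assert_type(foo[int](b"hello"), list[int])      # T solved as int
assert_type(foo[bytes](b"hello"), list[bytes])  # T solved as bytes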

3 Likes

TBH, this sounds like a case where the cure is worse than the disease. Can’t you just cast? That does the same, asserting that pickle.loads returns a specific type.

The arguments you are making seem rooted in an assumption that everything is fundamentally strictly typed. But in reality, pickle.load just isn’t. Things are different in a language like Rust that’s strictly typed from the ground up, but it feels very unnatural trying to impose that level of strictness on a dynamic language like Python.

Cast it, or annotate the destination variable. Why is it essential to have special syntax? In the end, foo[bytes](b"hello") and cast(list[bytes], foo(b"hello")) give exactly the same information to the type checker, and have exactly the same runtime overhead (one do-nothing function call). Why must this be syntax?

8 Likes

I think mainly for symmetry with classes. The same arguments also apply to classes, where generic classes already allow this syntax. It is common to have code like

class Foo[T]:
  ...

Foo(x)  # Usually fine, but sometimes the right T for x is ambiguous, so:

Foo[int](x)

In the class case there’s one more benefit, which is that the int there is actually available at runtime and can be accessed inside Foo to do some dynamic-dispatch-style logic. The function case could be similar, with a way to access the passed type if one was given, like

def singledispatch[T](x: T):
  # get_function_typevar is hypothetical: it would return the type passed
  # via subscription (singledispatch[SomeType]), if any.
  element_type = get_function_typevar() or type(x)
  ...

This would allow functions like singledispatch in the standard library, which currently infer their behavior based on type, to be given the intended type to use for dispatch.
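
For comparison, the class-side version of that runtime access already works today via the long-standing __orig_class__ attribute; a minimal sketch:

from typing import get_args

class Box[T]:
    def element_type(self) -> type:
        # __orig_class__ is only set when the instance was created through a
        # subscripted alias, e.g. Box[int]().
        return get_args(self.__orig_class__)[0]

print(Box[int]().element_type())  # <class 'int'>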

In practice there are alternative ways to write these patterns today. I commonly use the pattern of just passing the type as its own argument, like

def load_data[T](typ: type[T], contents: str) -> T:
  ...

load_data(MyFoo, "...")

So overall, I mostly see this as allowing the same patterns/ways of specializing that generic classes offer today but generic functions don’t. But I agree that there are reasonable, simple alternatives (your cast, or adding a type argument).

Edit: The cast approach also grows more complex when the return type is not just T (or another bare type variable) but some larger type expression that contains T or is affected by T. In that case you need to manually work out what the type checker would end up with, when all you know is the intended T. A simple example:

from typing import Callable

def load_function[T](contents: str) -> Callable[[T], tuple[int, str]]:
  ...
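
Concretely, the caller-side contrast would look something like this (a sketch; the string arguments are placeholders):

from typing import cast

# With the PEP: supply just the type argument.
f = load_function[bytes]("...")  # Callable[[bytes], tuple[int, str]]

# With cast: reconstruct the entire solved return type by hand.
g = cast(Callable[[bytes], tuple[int, str]], load_function("..."))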

The overload case, I think, can grow even messier. For all of these, the default Any fallback still exists as an alternative to working it out. It is easier, though, to hand the type checker T directly than to figure out what the cast would need to be.

I think with ParamSpecs it’s even possible to come up with an example where cast can’t work, as there exist types that type checkers can infer but that can’t be written down directly. At the same time, an ambiguous case complex enough that cast doesn’t cover it does seem pretty rare, and probably isn’t worth being exact about.

3 Likes

I’m sort of inclined to ask for a concrete example of real-world code that does this and which would benefit from the proposal. But the reality is that I’m not the one making the decision here, and ultimately there’s not much point trying to persuade me (I’m something of a typing skeptic and I’m inclined to take a negative view of a lot of the claimed benefits of typing[1] in any case).

But I get the impression from the comments above that the SC (or at least @gpshead) might have similar reservations, which is why I brought it up.


  1. For Python. I’m a huge fan of typing in Rust, and this is an “obviously good thing to have” for Rust…

4 Likes

I’m assuming you mean real-world code that does runtime class-based type dispatch? Real-world code that does runtime function-based type dispatch would then be enabled by this PEP, as the function version of the current class-based way.

The class-based way relies on the __orig_bases__ dunder mentioned in PEP 560. Recently (last April) a documented API, types.get_original_bases, was added, as __orig_bases__, while documented in the PEP, was described there as internal to typing. I think most real code still uses __orig_bases__, since it’s been there for years while get_original_bases is only a few months old. Using Sourcegraph, here are places that use it for runtime generic introspection that today can only be done with classes. I’ll pick two examples whose intent I could follow:

In this library, which looks like it has a SQL-ORM-style class, they use the generic subscript type to determine the model class being used. In this other library, the generic type represents the type of config to use and is reflected at runtime. The latter is similar to a pattern I’ve used in a bunch of production internal code, where I have:

from typing import get_args

class Config:
  ...

class Pipeline[PipelineConfig: Config]:
  @classmethod
  def config_type(cls) -> type[PipelineConfig]:
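    # For EvaluationPipeline below, cls.__orig_bases__[0] is
    # Pipeline[EvaluationSpec], so its first type argument is the config type.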
    return get_args(cls.__orig_bases__[0])[0]

  ... # Other methods/fields in class may be annotated with PipelineConfig

class EvaluationSpec(Config):
  ...

class EvaluationPipeline(Pipeline[EvaluationSpec]):
  ...
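
So, for example, the subclass can recover its config type at runtime:

assert EvaluationPipeline.config_type() is EvaluationSpec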

The config type is then mainly used for serialization/deserialization of a pipeline.

Here’s one past issue where a couple of users reported using __orig_bases__ in their libraries for class-style runtime reflection. Their examples, I think, should also be in the Sourcegraph listing.

2 Likes

Please don’t read what I say that way. If any of us SC members were speaking as the Steering Council, we’d explicitly say so. All of us are individuals. I’m merely asking hopefully relevant questions. The point of a PEP discussion is to identify questions and offer answers and ultimately see those clarified and captured in a relevant manner within an updated PEP itself. Ideally before it comes time to seek an actual decision.

5 Likes

Not really. I was more asking the typical question that gets asked of all proposals, which is “please show some existing real-life code that would benefit from this proposal, and how it would be rewritten to take advantage of the new feature”.

That may be what your example is demonstrating, but without the “here’s how it would look if it used PEP 718” part, I’m not able to see what the improvement would look like.

I suspect the problem here is that the benefit is fairly obvious if you’re deeply involved with typing - for me, as a non-expert, I see the proposal and think “that looks like Rust - it’s really useful in Rust, but I can’t see how it would fit into Python”. So I understand what the proposal is saying, but I need someone to explain why it’s a good idea in Python. My gut feeling is that I don’t want to write Rust in Python - the languages have a lot in common where it makes sense, but also have fundamental differences that we should respect.

2 Likes

My previous post had an example of runtime function-based type reflection that would be possible with PEP 718. get_function_typevar is the piece that requires PEP 718 to be definable. singledispatch could use this for cases where the type intended for dispatch is not just type(x), and the user could specify their intent by writing singledispatch[ty]. This is most likely to come up with protocols like Sequence/Mapping, which may not even be part of the MRO, or when an object satisfies multiple types that have dispatch implementations registered. A toy case of using singledispatch, if PEP 718 were accepted and it supported runtime specialization, would be:

from functools import singledispatch  # the subscript calls below assume a PEP 718-aware version

class Foo:
  ...

class Bar:
  ...

class Baz(Foo, Bar):
  ...

@singledispatch
def fun(arg):
  ...

@fun.register
def _(arg: Foo):
  ...

@fun.register
def _(arg: Bar):
  ...


x = Baz()

fun(x) # Which implementation will it pick? Can you make it pick other one?

# With PEP 718 it would be possible to pass an explicit type here and make it pick that one
# given singledispatch considering that type at runtime.
fun[Bar](x)
fun[Foo](x)

The class-based examples are intended as an analogue, since classes can sometimes be replaced with functions (a closure, if state is needed) and some APIs are documented only as callables, with the implementation left unspecified. At the moment, the difference between classes and functions means that if a library (including the standard library) has a documented generic class implementation of a callable, it cannot be replaced with a generic function without breaking backwards compatibility, because any type-subscript usage would become invalid.

Is lru_cache a function or a class? As a decorator it could be implemented as either a callable class or a function; the current implementation is a function. If it’s generic, converting it to a class is fine, but converting from class → function would be a breaking change today if it’s generic. One example of a Python standard library API that was implemented as a class and later became a function is TypedDict. I’d be curious whether there is any class → function transition in the standard library, predating typing, where the function today is generic and that change would have been an issue if done now. I don’t see an easy way to find class → function replacements, though, beyond reading through git history.

So is the point of PEP 718 runtime type reflection? I thought runtime use of type annotations was considered advanced. I’ve certainly never had a need for it myself. I think we’re still talking at cross purposes, because I assumed the main point of PEP 718 was type annotation and static type checking.

But you still seem to be misunderstanding what I’m asking for. Sorry if I’m not explaining myself well. I have never seen a real-world use of singledispatch that would have benefited from this capability. In fact, I’ve seen very few real-world uses of singledispatch at all. I’m not particularly interested in what theoretical uses could be made of PEP 718, I’m interested in what actual, current production code could be made better if PEP 718 was implemented. And what the improved code would look like, in comparison to the current code. I want to actually look at the “before” and “after” code, to understand what the improvement actually is.

Again, this seems all very theoretical. And to be blunt, it feels like we’re way out in the “long tail” of trying to make it possible to annotate everything, no matter how useful it is to have the annotations in practice. For me, part of the point of typing being “gradual” and “optional” is that we can simply not annotate certain functions, if the cost of doing so is greater than the benefit. Have we abandoned that idea?

Sorry. I’m finding this all very theoretical and (as a consequence) frustrating. If it’s not possible to discuss the PEP in more practical terms, I honestly think that counts as a pretty strong point against it.

2 Likes

I think one core difference is that I don’t view runtime usage as advanced, given there are many popular libraries that use it heavily (FastAPI, pydantic, cattrs, typeguard, etc.). I think the main new functionality this PEP grants is for runtime usage. Static-type-checking-wise it does simplify things in a few places, but cast will cover most of those too, and I think saying this is niche, versus falling back on the gradualness of not filling types in, is a fair argument.

I’m mainly using the singledispatch case because I have production code, in use for a while, that follows a similar pattern. I’m also picking it because I think that example is simple enough to follow. That code is company-internal, though, so I can’t share it, and I have been trying to extract examples similar to it. The config serialization cases around PipelineConfig above are very pydantic/cattrs-like, and I have hundreds of pieces of production code that do runtime-style type reflection similar to them.

It’s also difficult for me to discuss whether something is theoretical or practical; from my view, most of the examples I’m picking are intended to be concrete (the weird cast cases being the main exception). I will note that “part of the point of typing being gradual and optional” is true for type checker usage only. Runtime type checking libraries and reflection are optional in the sense that you choose to use them, but if you use a runtime type library then you can’t just drop the annotations, as the code’s behavior when run depends heavily on them. If you never use runtime-type-heavy libraries like typeguard/beartype/pydantic, then yes, you can view type checking as mostly gradual and optional. It becomes less optional as you adopt runtime type libraries.

Edit: Part of my view is influenced by the fact that I have written and maintain an internal library, very similar to cattrs, that is heavy on runtime type reflection.

2 Likes

I wonder - is the problem here that the examples you have are all closed source, so when I’m expecting you to be able to point me at actual source code, you don’t have that option and I’ve not spotted that disconnect?

Personally, I’m very strongly of the opinion that volunteer effort should be of benefit to open source projects first and foremost, and so I have a bit of a blind spot when it comes to features that aren’t motivated by open source use cases. So if the motivating cases here are largely internal projects, my views are unfortunately going to be biased against the PEP, and therefore should probably be discounted somewhat as a result.

3 Likes

Thanks for taking the time to write up the proposal @Gobot1234 ! I have some minor feedback:

Rationale

  • Strange section title, considering its contents. Maybe “Definitions” would be more appropriate?
  • Nit: There’s also some broken monospace formatting in the first paragraph of that section.


Currently, __orig_class__ is unconditionally set; however, to avoid potential erasure on any created instances, this attribute should not be set if __origin__ is an instance of any of the aforementioned types.

  • What does “aforementioned types” refer to here? Function objects, as mentioned earlier in the “Rationale” section?

Currently these classes are not subclassable and so there are no backwards compatibility concerns with regards to classes already implementing getitem.

  • Nit: Here __getitem__ probably should be styled as monospace.

1 Like

Thanks for the suggestions and pointers, David.

I was just going with the title suggested by PEP 12; there is also some rationale included in the paragraph after the short definition, so I thought the title appropriate.

Sorry about the delay in responding to these.

IMO there’s no better solution to this; there is precedent for it in other languages like Rust, C++, Kotlin, etc., none of which have found a better solution, because some situations, as @mdrissi has pointed out, are unknowable, and “In the face of ambiguity, refuse the temptation to guess.”

I somewhat agree with this concern, as there wouldn’t be any error from the actual code (apart from maybe a KeyError?). However, I don’t think it will be much of an issue outside of typed Python, which is where this would really be mentioned/taught, and where people should be using type checkers that will be able to catch these errors.

So for now I’m going to say it is mainly for type checkers to say, “ah yes, so you meant to use this call variation of this function”. However, in the future it might be possible to have this set a cell variable or something similar that would allow easy access at runtime to the generic that was passed (I was thinking of adding .__value__ to type parameters in a future PEP to get the specialised type parameter; this feature still stands on its own without that future PEP ever existing).