PEP 827: Type Manipulation

Well, the type of the method in this case includes the original type. The PEP’s language doesn’t specify that this is only safe on staticmethods, nor does it require that it be safe to rebind to a new type that isn’t a subtype of the original; instead it allows constructing a structural type, which by definition isn’t a subtype of the original.

If you pull methods from two different concrete types, rather than from protocols, you’re now running into essentially an ad-hoc intersection requirement to determine the type of Self for these methods.

It gets significantly harder than it already is to justify rebinding to an arbitrary type without consideration once you look at user-defined descriptor types.

None of this is actually explored in the PEP.

In terms of variance issues, there are a few that come up even without explicit type variables, due to some things that are implicit type variables, as well as Python’s data model. Prior discussions about Self and __replace__ cover a few of those, but that’s not an exhaustive set of the ways this can arise. Notably for Self, rebinding Self to an unrelated type (unrelated via subtyping, which a new protocol would be) is known to be unsound.

I don’t think this is actually reflective of the PEP’s goals. While the type is being copied, the reason it is being copied is so that libraries that apply transformations to runtime types to create new types have a way to express their behavior. They aren’t going to magic a method definition into existence; they have to get that implementation from somewhere, and that’s generally going to be the type whose method type they are copying.

1 Like

I’m not totally convinced by this argument. Specifically the “And if you’re curious how typing is plugged in here you’d ctrl+click onto the ComputeSelectReturnType type and learn.”

I see the problems this PEP tries to solve and have run into them as well, but the examples given in the PEP are really quite simple. In my experience in larger codebases you very quickly run into patterns where queries become composable using different functions. Something like the following pseudo code:

def select_statement[ModelT, K: typing.BaseTypedDict](
    typ: type[ModelT],
    /,
    **kwargs: Unpack[K],
) -> Query[..., ComputeSelectReturnType[ModelT, K]]:
    # This function returns some sort of wrapper around the select function
    # provided by the library to compose it with other statements.
    ...

def additional_filter() -> Query[..., ...]: ...

def get_all():
    # `SomeModel` stands in for whichever model class is being queried.
    return database_driver.execute(select_statement(SomeModel))

def get_all_with_filter():
    return database_driver.execute(select_statement(SomeModel) + additional_filter())

It is nice that the database library can provide ComputeSelectReturnType, but I’m convinced that at some point it will leak into the codebase that is using the library through an explicit reference.

I have seen the same happen with fastapi-like codebases, both at work and in other repositories. At some point you start adding tests for the endpoint functions or (unit)tests for the functions/models within them and the fancy types you added there will start leaking into your tests and other code.

Whether these patterns are good or bad is of course a different matter, but I think a case can be made that they will show up.

In both cases, “ctrl+click” is no longer something you do only if you want to learn: you now need to understand what ComputeSelectReturnType is, because you’re using it in your own code. If the original author does not want to “learn”, they might opt for Query[Any, Any] so they don’t have to use ComputeSelectReturnType at all, but as a reviewer I’d like to stay away from Any and would ask them not to.
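For instance, a hypothetical continuation of the pseudo code above (run_paged is a name I made up; Query and ComputeSelectReturnType are reused from the earlier sketch): the moment you factor out a helper that passes queries around, its signature has to spell out the library’s computed type.

def run_paged[ModelT, K: typing.BaseTypedDict](
    query: Query[..., ComputeSelectReturnType[ModelT, K]],
    page: int,
) -> list[ComputeSelectReturnType[ModelT, K]]:
    # The library-provided computed type now appears verbatim in application code.
    ...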

Whether the syntax and concepts in this PEP can be explained/understood is a different question. I already have quite a hard time explaining to those less versed in the typing ecosystem how to use ParamSpec or **P correctly, and I’m not particularly looking forward to having to explain this as well, but that is just my own experience. If the community thinks it can/should be explained I’ll make that work :smiley:

All I’m saying is that gut feelings and “I don’t want to see this in code” will not get us far.

I hope my examples can show that it will be hard to keep this out of code that is using libraries that use the new syntax/concepts. I’m personally convinced it will leak and therefore I think we do need to address the concerns about this complicating readability and understandability of the code.

3 Likes

I see that there are some misconceptions about shape-typing in NumPy, so allow me to clear that up.

TypeVarTuple

The Array example in the PEP uses variadic type parameters to represent its shape. This isn’t possible for several reasons:

  1. No bounds: The shape of numpy arrays is np.ndarray.shape: tuple[int, ...], but TypeVarTuple currently doesn’t support bound=int, i.e. you can’t declare Array[DType: np.dtype, *Shape: int]. This makes the Shape type parameter incompatible with the type it’s supposed to represent: np.ndarray.shape.
  2. Invariance: Currently, TypeVarTuple is, unlike the variadic type parameters of tuple, always invariant rather than covariant. When users (for whatever reason) want to use NewType("Ax1", int) to write Array[float64, Ax1], it wouldn’t be assignable anymore to e.g. Array[float64, int], which would likely be problematic.
  3. Ambiguous defaults: In many cases it’s not possible to statically determine the shape-type of an array. It would then be natural to write Array[_] to describe “an array with unknown shape” for some arbitrary dtype _. But Array[_] can just as well be interpreted as “a 0-dimensional array”, i.e. a scalar. There are workarounds for this, but in both cases the syntax is awkward: Array[_, *tuple[Any, ...]] and Array[_, *tuple[()]].

This is why we currently directly use variants of tuple[int, ...] in numpy for shape-typing purposes. For example, numpy.typing.NDArray[S: numpy.generic] is an alias of numpy.ndarray[tuple[Any, ...], numpy.dtype[S]], and the numpy.eye function returns a numpy.ndarray[tuple[int, int], ...].
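In code, those aliases look roughly like this (a simplified sketch of numpy’s stubs, not their exact definitions; the dtype parameter elided as “...” above is assumed here to be float64, numpy’s default for eye):

import numpy as np
from typing import Any

# Simplified: NDArray fixes the shape to "unknown" and parametrizes only the scalar type.
type NDArray[S: np.generic] = np.ndarray[tuple[Any, ...], np.dtype[S]]

# Simplified: eye is annotated as returning a 2-d array.
def eye(n: int) -> np.ndarray[tuple[int, int], np.dtype[np.float64]]: ...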

Literal

Using Literal integers for axis types, e.g. using tuple[Literal[2], Literal[2]] to describe the shape of a 2x2 array, isn’t helpful.

For example, in numpy, you can reshape an array a of shape (12,) (representing a vector) into a (4, 3) array by writing np.reshape(a, (4, -1)), which is equivalent to writing np.reshape(a, (4, 3)). However, tuple[Literal[4], Literal[-1]] makes no sense, and we can’t do integer arithmetic using static typing.
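Concretely, the runtime behavior being described:

import numpy as np

a = np.arange(12)           # a vector with shape (12,)
b = np.reshape(a, (4, -1))  # the -1 is computed at runtime: equivalent to (4, 3)
assert b.shape == (4, 3)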

Another issue is that, as I’ve said before, we often don’t know the exact shape-type statically. So if we have an array a with shape-type tuple[int, int] (so 2-d), which at runtime is 2x2, then we wouldn’t be able to pass it to a function f(ndarray[tuple[Literal[2], Literal[2]], Any]). So the issue here is that there’s no “gradual int”. And if we cannot restrict types to have a shape of Literal ints, then using Literal in shape-types serves no purpose (besides documentation).

But not using Literal integers for shape-typing also makes things a lot simpler. Broadcasting, for example, reduces to a simple “max” operation when the axis types are at least int. So broadcasting tuple[int] against tuple[int, int] is the same as taking the “largest”/“longest” of the two, which gives us tuple[int, int]. And I’m sure there’s an elegant way to define a type MaxShape using the proposed conditionals :slight_smile:.
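A minimal runtime sketch of that “max” reduction, assuming every axis type is plain int (the function name is mine, not numpy’s):

def broadcast_shape_type(a: tuple[type, ...], b: tuple[type, ...]) -> tuple[type, ...]:
    # With every axis at least `int`, broadcasting two shape-types reduces to
    # taking the longer ("largest") of the two tuples.
    return a if len(a) >= len(b) else b

assert broadcast_shape_type((int,), (int, int)) == (int, int)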

For a related discussion on the use of Literal in shape-types, see numpy/numtype#578 on GitHub: “Respect fully specialized shape types in operations”.

8 Likes

Can I present a simpleton’s perspective? I was excited about reading this PEP, given that I like a lot of what TypeScript offers, but ended up being a little disappointed. The prism example was not particularly compelling, and while it would be nice to be able to describe an entire ORM in one line of code, I find, heavily influenced by *grug* and some notions of what is pythonic, that the separate classes make a bit more sense and end up being simpler.
Also, I hate to be “that guy”, but with certain tools coming out that help us generate such simple classes and models rather quickly, I find the elegant, heavily implicit type HeroCreate = Create[Hero] to be just bothersome compared to a few different classes, even if they end up requiring some amount of duplication (“I need a view model for the schema class that will be a pydantic model constructed from attributes, missing the password field and with optional x, y, z fields - go.”). I think that a lot of “I wish the type system were more like TypeScript” comes from missing the fact that Python is an “OOP first” language, and that’s just the way you should tailor your code. Another source of pain is that attribute access in Python is a rather complicated process, while in JS proxies are not the most important or common sort of thing, and TypeScript could ignore them completely and still provide 99% of its value (as opposed to Python, where for better or worse that metaprogramming stuff is baked too deeply into the crust). That TypeScript types aren’t reified and don’t need to actually exist is also a huge boon, for reasons.
You know what I really miss? Being able to do SomeInterface['memberName'] to get the type of a member. When I think about it, in particular due to reification, I feel this would be very difficult to do in Python. Maybe some of the fancy type stuff could be restricted to TypedDicts? Maybe subclasses of a new kind of TypedDict that allows adding methods to subclasses? That could be interesting and would provide many simplifying assumptions that enable TypeScript-like stuff being brought into the fold, but it may not be glorious enough…

2 Likes

OK, I think you are correct that there are subtleties here, and that the PEP currently is making some implicit assumptions about Self types that need to be spelled out in more detail. (In particular, we need to do something along the lines of returning Self types for unannotated self arguments, if we want to allow copying methods.)

It’s unfortunately slightly more complex than just that. This is one of the parts of the proposal that interacts with the intersection specification.

Take the following example:

class A:

    def __init__(self, value: int, /):
        self._value = value

    def get_value(self) -> int:
        return self._value

class B:

    def __init__(self, b: int, /):
        self._b = b

    def get_value(self) -> int:
        return self._b

What happens here if get_value is picked from either of these? What about from both?
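To make the failure mode concrete at runtime (an illustration of the problem, not the PEP’s copying mechanism):

b = B(42)
print(B.get_value(b))  # 42: B's implementation reads self._b
print(A.get_value(b))  # AttributeError: 'B' object has no attribute '_value'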

The type system needs knowledge of what Self actually is here, whether implicit or not, because as far as the type system is supposed[1] to be concerned, when not type checking the function itself, the function is opaque, and only the type signature of the function is relevant.

Here, to copy the method (and therefore have a reason to copy the type of the method), it’s a requirement that wherever you copy it to, Self remains compatible with each related original.

While she stepped away from actively working on the intersection discussion, @Liz brought up some things during it that seem extremely relevant to what you’re trying to accomplish here (hoping she’s willing to correct anything I get wrong in summarizing it):

  1. Without fully recursive intersections, you can’t safely copy methods or descriptors from two separate nominal types.
  2. If limiting to structural types, you can safely copy from an arbitrary number of methods or descriptors, but it may be statically provable that there are no possible types that meet the definition of that protocol.
  3. If limiting the selection so that the fields copied are entirely disjoint across method/descriptor/data field names, you can safely copy an arbitrary number of methods, descriptors, and data members from structural types, and so long as each of the structural types can have possible implementations, the resulting combination will also have possible implementations.
  4. You can’t copy the implementation from a nominal type safely even if the nominal type meets the protocol definition, because it may have implementation details tied to that nominal type (self) that aren’t expressed by the protocol
  5. It’s not safe to copy data-only members from nominal types and assume they are actually data-only, because type checkers allow expressing properties as if they were just data members, and stubs frequently do so.

These were cornerstones of her suggesting that it may be possible to limit the intersection pep to structural types in the initial work, but expand what is allowed as we defined more. Had TypeIs not already been accepted, I likely would have been personally swayed by these, but we have the relatively unfortunate case where the intersection work has had to also serve as exploring the implications of things that were accepted.

Given the primary use cases the pep presents, I believe you can simplify it and serve the majority of the presented use cases if you limit the pep to structural types, and either don’t allow copying methods or descriptors, or allow it for only copying from a single nominal type while also creating the new type as a subclass of that type (more on this in a moment). This would cover the structural types of various web and orm cases presented, while still allowing libraries to define a set of methods those constructed types will have that aren’t sourced from a structural type definition.


  1. async def functions break this rule, and it causes problems for async generators defined in stubs. Given that automatic generation of types is a use case, I imagine you wouldn’t want stubs to have ambiguity from ignoring this. ↩︎

If you limit to constructing a new type, where the implementation of data members/methods is not sourced by copying the implementation (such as ORMs and dataclass-likes used for various API wrappers that supply their own implementations), as well as allowing a singular type from which a base set of methods may be provided (for things like telling an ORM to write the current state back to the DB), you get something that at runtime may look like this:

def construct_type(name, base, fields):
    # `create_field` stands in for whatever descriptor factory the library
    # uses to build each field's runtime implementation.
    return type(
        name,
        (base,),
        {field.name: create_field(field.name, field.type) for field in fields},
    )

As long as none of the created fields conflict with a definition in base, this would have sound type system implications using the mechanisms in the proposal to express the resulting type.
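Hypothetical usage, where OrmBase and Field are placeholders for a library’s own machinery rather than anything in the PEP:

# All names below are placeholders for illustration only.
Hero = construct_type(
    "Hero",
    OrmBase,  # supplies the base set of methods, e.g. write-back-to-db
    [Field("name", str), Field("age", int)],
)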

I help maintain ibis, a dataframe library, and I use it every day at work. My biggest pain point with it at this point is its lack of typing, e.g. if I rename a column in the upstream database, all my queries break because they are SELECTing the stale column.

If any of the authors or champions of this pep want to chat live about what Ibis would want out of this, please reach out. Sorry, I’m not going to be more useful and actually take the time here to respond to the details of the PEP, it was too intimidating to me. I appreciate the effort you all are putting into this!

There’s been some discussion of runtime type checking, but Hypothesis has a somewhat different feature: we can take a runtime type object[1], and create arbitrary instances[2] of that type from a reasonably complete distribution[3].
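For context, the existing feature looks like this with today’s Hypothesis (real API, trivial example):

from typing import Optional

from hypothesis import given, strategies as st

@given(st.from_type(Optional[int]))
def test_handles_both(x: Optional[int]) -> None:
    # Hypothesis generates arbitrary ints and Nones from the type object alone.
    assert x is None or isinstance(x, int)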

Going through the proposed parts:

  • :white_check_mark: Unpack of typevars for **kwargs: easy
  • :white_check_mark: Extended Callables: will be very nice to have
  • :red_question_mark: Type booleans, Conditional types, and Unpacked comprehension
    • Supporting these would require analysis of the whole expression at runtime, e.g. to determine whether a condition can only resolve to true (or false) and resolve to either the ‘forced’ type or a union of both options. This is particularly tricky when typevars force non-local structure on the choice of ‘branch’, though it’s doable if we’re careful.
  • :cross_mark: Boolean operators: many types of practical interest don’t support runtime is-assignable checks
  • :red_question_mark: Basic operators: mostly workable, I think. We already have some tricks for protocols; in the worst case you just find every type object in memory which implements the protocol!
  • :white_check_mark: Union processing: sure
  • :red_question_mark: Object inspection
    • leaving out members which aren’t explicitly type-annotated seems pretty odd; they do exist at runtime, and there’s a somewhat blurry spectrum from “not annotated” to “some arguments are annotated and others are not” to “heavy use of Any” to “completely annotated”.
    • I assume we’d have to use the proposed resolver library, but then see above re: expression analysis
  • :white_check_mark: Object creation: easy to handle as for status-quo protocols and typeddicts
  • :white_check_mark: InitField: solved by the proposed resolver library, I think
  • :white_check_mark: Callable inspection and creation: nice new feature for us
  • :red_question_mark: Generic callable: [skipping this because complicated and it’s midnight here]
  • :white_check_mark: Overloaded function types: nice to have
  • :red_question_mark: Raise error: looks designed for the proposed resolver library, but Hypothesis will need to catch and respond to resolution failures at runtime. Maybe that just works?
  • :cross_mark: Update class: unclear how I can handle this at runtime, especially on classes which e.g. use __slots__.

Overall I can see how much of this would be nice to have, but it seems ‘underbaked’ to me, and in its current form would cause serious problems for Hypothesis and whatever fraction of our 5%-of-Python-users also end up exposed to these types.

It does seem like landing a subset of these changes - e.g. typevar unpacking, extended callables, and function-overload types - would be a relatively uncontroversial win in Python 3.15, and allow more time to experiment with the larger package.


  1. instances of type, or other runtime objects representing a type such as GenericAlias or various other special forms from the typing module. It’s gotten more complicated since Python 3.4. ↩︎

  2. “morally speaking”, anyway; not everything has runtime instances, and there are a couple of places where the correct co-or-contravariance is unworkable. ↩︎

  3. e.g. it’s possible to generate an instance of any element of a union type, but we won’t hand you bools if you ask for ints. ↩︎

5 Likes

I owe you a more complete response, but I’m thinking some things through still. Self is a very weird type.

We mostly limit the PEP to constructing structural types, though UpdateClass allows for tweaking nominal types. I think that limiting the PEP to only inspecting structural types (if that is what you meant) would destroy it.

Earlier drafts of the PEP included NewProtocolWithBases (still included under potential future expansions), which would make the new protocol a subclass of some specified bases. We wound up dropping it because we were trying to cut down on the number of new concepts introduced and didn’t want to add “protocol-with-bases”, and because we could simulate most of what we would want it for using method copying. (Which I do think is probably still true; will try to follow up on that tomorrow.)

Big fan of hypothesis; I’ve used it less than I probably should have but it was super clutch when I used it last. I didn’t realize it had a type-driven mode!

My big question here is: would you need to be able to generate values of unevaluated type operators? If there aren’t type variables in the type, then you should be able to evaluate down to a concrete type. If there are type variables, you can probably pick concrete types to substitute for them.

I understand and don’t want you to feel the need to rush on a response on this.

To get to the essence of it: there’s no problem with creating a new nominal type. The only problem is that copying the type of a method, descriptor, or other such member that is associated to a nominal type via Self requires the destination’s notion of Self to be compatible with that of the source.

The only sound answer that I can give at this time is that Self would need to be an intersection of the Self types that contribute to the new type. Despite that simple expression, what that actually means on bounds for possibility to copy currently seems remarkably close to NewProtocolWithBases for the upper limits of capabilities without further changes to the foundation of the type system.


Some of this is due to intentional limitations of Python’s type system that I don’t think it is even possible to suggest changing currently, such as the typing of methods being opaque: if it weren’t, the type system could infer the minimum necessary attributes and their types to be copied for the method’s definition to remain compatible, rather than requiring full nominal subtyping for constructing new nominal types without going the route of bases.

Some of this is more difference of languages: JS/TS and python diverge on their general object models.

Some of this is the long tail of “practical” shortcuts. If we were more principled in the typeshed about stubs needing to actually match runtime, rather than various “convenient” things done historically, we could probably safely include copying data-only members of nominal types as well, but I worry about that in practice knowing how many places the typeshed is not just imprecise, but intentionally incorrect.

I think that’s an okay summary.

This keeps having relevance, and you’re one of the people who has continued to point out the problems it causes. Maybe there’s a proposal worth making about an actual policy that typeshed should never be intentionally incorrect, to finally work past this?

1 Like

Are you open to contributions at this stage?
I would be glad to help, whether with implementation, bug fixes, working through examples and theoretical questions, or adapting existing libraries’ type hints to use the PEP.

1 Like

Yes!

One really helpful thing would be to try to build something that uses the new features or adapt some existing code to do it. The mypy proof of concept now covers all the examples in the PEP, so it hopefully should be possible to do quite a bit.

I set up msullivan/typemap-test on GitHub (a demo testing repo for PEP 827 prototypes) as a quick demo of how to test something against the prototype repos.

Detailed feedback on the text/proposal is also valuable!

1 Like

Hey all, commenting as a maintainer of a documentation-purpose static and runtime analysis tool. In the PEP:

Tools that want to fully evaluate the annotations will need to either implement an evaluator or use a library for it (the PEP authors are planning to produce such a library).

Good. Providing basic type-related features in documentation tools is already hacky enough without a proper type-system / type-checker implementation (which you’ll all agree is not an easy thing to write).

For example, devs want to show the actual list of parameters in a function signature rather than **kwargs: Unpack[MyTypedDict], so the tool has to understand how unpack works, find the typed dict, iterate on the right members, transform them into parameters, etc.
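For example, a sketch of the expansion being described (MyTypedDict is the placeholder name from the sentence above):

from typing import TypedDict, Unpack

class MyTypedDict(TypedDict):
    host: str
    port: int

def connect(**kwargs: Unpack[MyTypedDict]) -> None: ...

# The tool is expected to render the signature as:
#     connect(*, host: str, port: int) -> None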

Another example is when a dev writes a Yields section in their docstring: the tool has to check the return type of the parent function and handle both Iterator and Generator to find the correct type for yielded values.
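Likewise, the two return forms the tool has to reconcile for a Yields section:

from collections.abc import Generator, Iterator

def f() -> Iterator[int]: ...              # yields int
def g() -> Generator[int, None, str]: ...  # yields int, sends None, returns str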

Simple things that are still manageable when manipulating static types represented as structured expressions or ASTs.

This PEP has the potential to make it much, much more difficult / hacky. I cannot foresee all the things devs will ask docs-tools to support but I’m sure they’ll have many requests. I’m curious to learn more about the plans for an evaluator library, and how it would be used in both static and runtime analysis contexts (especially in the context of generating documentation from Python code).

Then the PEP follows with:

Tools that specifically rely on introspecting annotations at runtime (tools that parse Python files are obviously unaffected) […]

I’m not sure I understand why this part was added in parentheses. My static analysis tool currently expects simple ASTs for type annotations. Introducing unpacked comprehensions with conditions as possible nodes definitely has an impact on my code. But maybe I just misunderstand this sentence.

All in all, I’d just ask for more consideration for documentation-purposes static/runtime analysis tools in this PEP. Happy to chat further about it :slight_smile:

2 Likes

I wanted to share some positive feedback regarding PEP 827.

As someone who moved away from Python toward TypeScript primarily for the advanced type manipulation (like keyof, Pick, and mapped types), seeing this proposal is incredibly exciting. The lack of these features has been a major friction point in my workflow, often making Python feel less “safe” for complex data structures compared to the TS ecosystem.

If PEP 827 (or a similar implementation of these utility types) is accepted, it would be the deciding factor for me to switch back to Python as my primary language. Having this level of expressiveness would finally bridge the gap for those of us who value strict, modern static analysis.

Kudos to the authors for pushing this forward!

8 Likes

This looks great. It should make a lot of mypy plugins obsolete once we get good coverage across type checkers, in some cases using the third-party library. Not least, the Concat type appears to allow static validation of simple textual grammars, such as a subset of valid URLs, which is something I’m looking at right now. At least you could overload on scheme prefixes, seemingly?
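A speculative sketch of what I mean, assuming Concat composes string literal types the way the PEP suggests (the overload targets and the return types here are made up):

from typing import Literal, overload

@overload
def fetch(url: Concat[Literal["https://"], str]) -> SecureResponse: ...
@overload
def fetch(url: Concat[Literal["http://"], str]) -> PlainResponse: ...
def fetch(url: str): ...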

I mean it is unaffected in the sense that nothing needs to change in order to extract sensible and complete ASTs for the new features. Actually supporting the new features in an interesting way probably will require some work.

It’s an interesting question whether a documentation generator tool ought to display unevaluated types or expanded ones.

Probably the best approach for a docs generator that wants to do a good job with typing is to try to delegate the work to one of the static typecheckers—I think that would be a better fit than using a dynamic evaluation library, which will have various limitations discussed in the PEP.

I’m not totally sure if the type checkers provide the interfaces necessary for this to work really nicely, though? Possibly the ones that have LSP support do?

2 Likes

Oh, so I now understand that the library the PEP authors plan to write will not be usable statically? Then yeah, the last solution is to rely on type checkers.

Looks like mypy and pyanalyze are candidates for direct Python use (store typing data while visiting ASTs, reveal types when introspecting?), while pyright and ty are candidates for use through LSP (spawn a server, communicate with JSON-RPC; could be tedious).

EDIT: pyanalyze imports code, so not a solution for static analysis (but still interesting for introspection).