Cast syntax for static typing

Currently, to cast from one type to another in typing you have to use the function typing.cast, which has a runtime cost and is not very pleasant to use. I propose a built-in syntax for this:

a as Type

This reuses the as keyword from imports and with statements. It is better than the existing typing.cast because it can be a no-op at runtime, removing the runtime cost, and the syntax is a lot cleaner. TypeScript uses the same syntax for the same purpose.
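For comparison, the current spelling looks like this (a minimal sketch; typing.cast simply returns its second argument unchanged, so the type argument exists only for the checker):

```python
from typing import cast

def first_str(items: list[object]) -> str:
    # We know items[0] is a str here, but the checker does not,
    # so we assert it with a cast (a real function call at runtime).
    return cast(str, items[0])

print(first_str(["hello", 1]))  # prints: hello
```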

I have a draft PEP for this, but I am unsure whether it is possible as I don’t have experience with Python’s PEG parser, so I have yet to submit it.

It conflicts with existing syntax:

with a as Type:

For as much “as” as possible, replace := with as:

with A() as B as c as d:
    #wat

Is there an alternative keyword or syntax we could add instead?

Runtime type casts (as opposed to type conversions) are a code smell. Can you give a realistic example of where they’re necessary and where the cost is actually significant?

In an assignment the syntax is like:

obj: Type = ...

That syntax doesn’t work as part of an expression because colon can be used for other things like dicts e.g. {x: int} means a dict rather than a set containing an element x of type int.
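Concretely, that spelling already has a meaning today:

```python
x = "key"
# {x: int} is a dict display: it maps the value of x to the int type.
# It is not an annotation saying that x is an int.
d = {x: int}
print(d)  # {'key': <class 'int'>}
```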

The simple solution to avoid ambiguity around the meaning of the colon is just to use brackets i.e.:

result = function((obj: Type))

I think that is currently always invalid syntax so it could be repurposed.

I’m not going to dig up an example right now, but I quickly want to say that if you want to use explicit type annotations in Python and have a type checker understand/accept what is happening, then there are many legitimate situations in which you will need to use something like cast, Any, type: ignore and so on, because the type checker just can’t follow what is happening well enough to see that the types involved will be valid at runtime. The same is true in C, where sometimes you have to use a union, a void pointer or a pointer cast, or in Rust, where sometimes you need unsafe code, etc. Type checkers are great, but they also throw up false positives, and if the checker is stringent then you probably need a way to break out of it, even if that escape hatch should only be used sparingly.

Of the available options, cast is the cleanest because you tell the type checker that some object really is of the type that it actually is. In other words, you supply the piece of information that the checker failed to infer. Unlike the other options, a cast does not disable type checking more broadly, but rather helps the checker to understand one part, allowing it to then check the rest. However, cast currently has a runtime cost, which is unacceptable in some applications (inner loops etc.). Really it’s not the cast call itself but the cost of constructing the type expression:

x = func(cast(y, SomeType[tuple[T, dict[S, T]], <etc>]))

At run time this literally calls SomeType.__getitem__ etc even though the type argument to cast is ignored.
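This is easy to observe with a small sketch (SomeType here is a hypothetical class; note also that cast’s real signature is cast(typ, val)):

```python
from typing import cast

class SomeType:
    subscriptions = 0

    def __class_getitem__(cls, item):
        # Runs every time SomeType[...] is evaluated at runtime.
        cls.subscriptions += 1
        return cls

y = 42
x = cast(SomeType[tuple[int, str]], y)
print(SomeType.subscriptions)  # 1 -- the subscription ran even though cast ignored it
```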

With casting syntax you could write it as

x = func(y: SomeType[<etc>])

and there would be no runtime overhead because it could be compiled to the equivalent of

x = func(y)

Agreed; it should definitely be a rare thing to need to override it at the call site. So this is something that should almost never be needed, and even then, it’ll only be important if it’s in a tight loop. That doesn’t sound like something that warrants syntax.

How much of a problem is it to have a typing override comment at call sites like this?

We could of course use comments for all typing information but many PEPs have worked in the direction of replacing type comments with dedicated syntax. Comments are particularly awkward here e.g. what would the comment syntax look like for this:

x = func(cast(g(y), S), cast(g(y), T))

Where does the comment go in order to apply to a subexpression?

The reason that cast is a callable function is so that it can be used inline as part of an expression. Unfortunately that’s a kludge because it brings runtime costs for something that should not have any runtime costs.

I’m not sure, and it’s hard to say with a contrived example, but if this is in a tight loop, my first thought would be to move S and T out to constants above the loop. You said that most of the cost was the construction of those types (since they could well be arbitrarily complicated), so moving that to a one-time cost should reduce the impact of the runtime cast.
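A sketch of that refactoring (hypothetical names; the point is that the type expression is built once, not per iteration):

```python
from typing import cast

# Build the expensive type expression once, outside the loop.
RowType = dict[str, tuple[int, int]]

total = 0
for y in range(1000):
    # cast is now just a cheap function call; no subscription per iteration.
    row = cast(RowType, {"k": (y, y)})
    total += row["k"][0]

print(total)  # 499500
```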

You should note that this was discussed only a few days ago:

https://mail.python.org/archives/list/python-ideas@python.org/message/ZP6UHIHBOIZLO63QJPYTUSFXQBEKJVV4/

Python type checkers implement gradual typing, so if type checking a particular part of your code is causing you grief, you can always just not type check that part.

It’s not like static type checks find all, or even most, bugs. They don’t. Skipping a section of code generally just means you have to add a few extra unit tests to cover that section.

For this syntax to really be significant, we need the intersection of at least six conditions:

  • You have code that benefits from static typing.
  • The type checker fails to correctly track the types and needs help from a cast.
  • Replacing those static type checks with unit tests is not sufficient.
  • Your cast is inside a tight loop, or some other performance critical piece of code.
  • And involves complicated type expressions (otherwise the runtime cost is negligible).
  • Which cannot be refactored to be outside of the loop.

To have all six conditions be true is a very small niche.

Having to add syntax so we can type check tinier and tinier niches of the Python code ecology means that everyone pays the cost of the new syntax while only a tiny few people get the benefit.

So I think that, in order to justify new syntax for casts, people will have to demonstrate that these casts are significantly more useful and common than I thought.

In other words: we would need to see evidence that this is a genuine, and common, pain point in the language, not just a “Nice To Have” for a small minority of coders.


This thread is going down the familiar line of arguing tangentially with the premise rather than the proposal. Some points that shouldn’t need much debate:

  • It is necessary to be able to use something like cast in Python’s typing.
  • The current support for that is deficient because it has a runtime cost.

We really don’t need to get further into the details of specific cases if we can agree those two points so let’s please not waste time arguing about whether cast is needed or whether there is any problem to be solved.

No, they DO require debate, particularly the second one. To what extent is this runtime cost significant? Allow me to offer a similar example. Decorated functions often look like this:

def deco(arg):
    def outer(f):
        @functools.wraps(f)
        def inner(*a, **kw):
            return f(*a, **kw)
        return inner
    return outer

This has a fair amount of runtime cost (multiple levels of wrapping and transformation for every call). Do we need to design a lower-overhead way of decorating functions?
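For what it’s worth, that overhead is straightforward to measure (a rough sketch; absolute numbers will vary by machine):

```python
import functools
import timeit

def deco(f):
    @functools.wraps(f)
    def inner(*a, **kw):
        return f(*a, **kw)
    return inner

def plain(x):
    return x

wrapped = deco(plain)

# Compare per-call cost of the bare function vs the wrapped one.
t_plain = timeit.timeit("plain(1)", globals=globals(), number=100_000)
t_wrapped = timeit.timeit("wrapped(1)", globals=globals(), number=100_000)
print(f"plain: {t_plain:.4f}s  wrapped: {t_wrapped:.4f}s")
```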

And to answer that question, we need to know how much that runtime cost actually is. To argue that it needs to be changed, you have to argue that the runtime cost is significant enough to warrant a change.

My personal opinion? It’s probably not significant for decorators and it’s not significant for type casts either. I’m open to argument on that, but that argument has to show that it’s significant, not dismiss the point as irrelevant.

Then in most cases the type expression can be refactored outside of your loop, like any other expensive expression.

You might even do:


if TYPE_CHECKING:
    T = insanely_expensive_type_expression
else:
    T = object

for y in critical_loop:
    x = function(cast(y, T))

Or easiest solution of all: just pass your expensive type expression as a string.


[steve ~]$ cat cast_test.py 
from typing import cast

def func(a: str) -> str:
    return a

value: int|str
value = 1
func(cast("str", value))

[steve ~]$ mypy cast_test.py 
Success: no issues found in 1 source file

It is a nonzero cost that I have measured and decided not to incur because I wanted something to be fast. I decided that using accurate type annotations was less important than actually having optimal runtime behaviour. The downside of that decision is that I am unable to get the full benefit of Python’s typing support. Conversely, I could have chosen to prioritise having optimal type annotations at the expense of less efficient code that consumed more CPU cycles.

It is possible to design a typing system in which this tradeoff decision does not even occur. I’m not aware of any other language where this even arises.

Everything in Python has some runtime cost. The language philosophy is that some runtime costs are worth paying, not all runtime costs have to be moved to compile time, and if you care about shaving off every last nanosecond of overhead, Python is probably not the language for you.

Literally every single other proposal has to justify that there is a problem that needs to be solved, and that the proposed solution’s benefits outweigh the costs. Why should static typing casts be any different?

In any case, it seems that we already have an existing solution for excessive runtime cost of type expressions. Wrap them in quotes to hide the expression from the interpreter. It doesn’t completely eliminate all runtime cost, but it reduces it down to a cheap constant lookup and function call. So the question now becomes:

  • Is that residual cost significant enough for enough people to justify complicating the language with more type checking syntax?
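As a rough sketch of that residual cost (timings vary by machine; the string form skips building the generic-alias object on every call):

```python
import timeit

setup = "from typing import cast; v = 1"
# Subscripted form: dict[str, tuple[int, int]] is constructed per call.
subscripted = timeit.timeit(
    "cast(dict[str, tuple[int, int]], v)", setup=setup, number=100_000)
# String form: the type expression is never evaluated at runtime.
stringified = timeit.timeit(
    "cast('dict[str, tuple[int, int]]', v)", setup=setup, number=100_000)
print(f"subscripted: {subscripted:.4f}s  string: {stringified:.4f}s")
```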

I don’t intend to go through the motions of arguing for any proposal on this so I’m going to leave it here (I don’t want to commit the time/energy for a python-ideas thread).

My suggestion if anyone is interested in pursuing it is that existing assignment type annotation syntax could be extended to expressions with a requirement to wrap the expression in parentheses. Then something like this

obj = func(cast(arg, T))

could be written instead as

obj = func((arg: T))

This would then have no runtime cost when run with from __future__ import annotations.

I haven’t thought it through completely but as far as I know with parentheses it is not ambiguous in Python’s grammar. The syntax mirrors that of assignments and other type hints so it should hopefully be intuitive to understand:

var: T = func()

class A:
    attribute: T

    def method(self, arg: T) -> None:
        ...

result = func((arg: T))

In Rust the syntax arg as T is used but there it has a different behaviour because it actually converts the object into a different type (e.g. from int to float). I think that simply arg: T better reflects what is happening here which is that the cast has no runtime effect and is simply providing information for a type checker.

In some cases this syntax could be unambiguous without the parentheses, so a variant proposal would be to figure out exactly which situations need parentheses and require them only in those cases. One ambiguous case is dict syntax, so parentheses would be needed to use this inside braces: {(a: b)}. In a lambda expression there could be different interpretations, e.g. whether lambda a: b: T means lambda a: (b: T) or (lambda a: b): T. Another place colons are used is in slices: array[start:(stop: int)]. Hypothetically you might end up wanting this in a default value expression:

def func(arg: T = (default: T)) -> None:
    ...

I think that is all the places that colons are used.

It is already possible to use string type annotations, and it is also possible to use a comment like type: ignore to reduce the runtime cost to zero (currently my preferred option in the cases where I have needed this). Since both type comments and string annotations have been suggested as alternatives, I want to say that I don’t think either of these was ever really intended to be a long-term part of Python’s typing syntax. Rather, both were intended as short-term kludges so that type checkers and type annotations could be developed at a faster pace than the syntax for type hints. Multiple PEPs have subsequently tried to establish enough syntax that, in the longer term, neither string annotations nor type comments would be needed. Import cycles aside, they are not needed in the case of cast, but avoiding them means paying an unnecessary runtime cost for every evaluation of the cast expression.

How often do you use cast()?

I searched in a few projects that ubiquitously use annotations, and found that cast() is used once in 2000-4000 lines of code, in one of 15-20 files.

I don’t use cast that often. In fact I just realised that I had the arguments the wrong way round in previous posts!

You only need to use it once to slow down a hot path though. Note that if you were to search my code you wouldn’t necessarily find cast itself because there are alternatives such as Any, type:ignore etc. Also how much you might need to use these things depends very much on what you are trying to do.

Someone asked for a concrete example of where you might want to use cast so here is a simplified but not unrealistic example:

from __future__ import annotations

from typing import TypeVar, Generic, Type, Hashable, cast

T = TypeVar('T', bound=Hashable)


class A(Generic[T]):

    __slots__ = ("value",)

    value: T

    # Intern all A instances in this dict so that we can use object.__eq__ and
    # object.__hash__ for fast comparison and set/dict operations even when the
    # values held by the instances might be arbitrarily complex.

    cache: dict[tuple[Type[Hashable], Hashable], Hashable] = {}

    def __new__(cls, value: T) -> A[T]:
        key = (type(value), value)
        cache = cls.cache
        try:
            return cache[key]
        except KeyError:
            obj = super().__new__(cls)
            obj.value = value
            return cache.setdefault(key, obj)


# Here a typechecker can understand the types of the values:
aint = A(1)
astr = A("a")
print(aint.value + 2)
print(astr.value + "b")

If you run mypy on this it will complain about the two return lines:

$ mypy t.py
t.py: note: In member "__new__" of class "A":
t.py:24:20: error: Incompatible return value type (got "Hashable", expected "A[T]")  [return-value]
                return cache[key]
                       ^~~~~~~~~~
t.py:28:20: error: Incompatible return value type (got "Hashable", expected "A[T]")  [return-value]
                return cache.setdefault(key, obj)
                       ^~~~~~~~~~~~~~~~~~~~~~~~~~
Found 2 errors in 1 file (checked 1 source file)

The question is how do you type cache in such a way that a type checker can understand that if value is of type T then cache[key] will return A[T] as required by the signature of __new__. The type of value is part of the key so the type of the looked up dict value is guaranteed at runtime but the type checker can’t understand that.

You could use a cast like:

return cast(A[T], cache[key])

You can also just use type: ignore:

return cache[key] # type: ignore

The latter has no runtime cost because it is just a comment, and I have run meaningful benchmarks in which the use of cast gave a measurable slowdown.

Use A for the value type of the cache.

class A(Generic[T]):

    __slots__ = ("value",)

    value: T

    # Intern all A instances in this dict so that we can use object.__eq__ and
    # object.__hash__ for fast comparison and set/dict operations even when the
    # values held by the instances might be arbitrarily complex.

    cache: dict[tuple[Type[Hashable], Hashable], A] = {}

    [...]
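Filling in that suggestion, here is a complete runnable sketch (the rest of the class is unchanged from the earlier example). The idea is that a bare A in the cache annotation means A[Any] under gradual typing, which is compatible with the declared A[T] return type, so no cast should be needed:

```python
from __future__ import annotations

from typing import TypeVar, Generic, Type, Hashable

T = TypeVar('T', bound=Hashable)


class A(Generic[T]):

    __slots__ = ("value",)

    value: T

    # Typing the cache's values as A (rather than Hashable) lets the
    # checker accept the returns in __new__ without a cast.
    cache: dict[tuple[Type[Hashable], Hashable], A] = {}

    def __new__(cls, value: T) -> A[T]:
        key = (type(value), value)
        cache = cls.cache
        try:
            return cache[key]
        except KeyError:
            obj = super().__new__(cls)
            obj.value = value
            return cache.setdefault(key, obj)


print(A(1) is A(1))  # True -- instances are interned
```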