Reinstantiating a generic class with a different specialization

Hi everyone, I was trying to improve a couple of type annotations in the pint project (Improve type annotations for `Quantity`'s arithmetic methods by RBerga06 · Pull Request #2303 · hgrecco/pint · GitHub) and I encountered an issue (in fact, several, but I figured I’d tackle them one at a time).

Let’s say I have a very simple generic class and, inside one of its methods, I need to call the constructor again to return an instance of a different specialization:

from dataclasses import dataclass

@dataclass
class Foo[T: int | float | str]:
    value: T

    def as_int(self) -> Foo[int]:
        return self.__class__(int(self.value))   #!

This already works at runtime; however, both pyright and mypy yield an error at the line marked with #!, pointing out (in my understanding) that self.__class__ gets resolved as type[Foo[T]], essentially a Callable[[T], Foo[T]]. This means that calling self.__class__ with int(…) is wrong, because int is not assignable to T.

However, at runtime self.__class__ is indeed Foo, essentially a def _[U](value: U) -> Foo[U] callable[1]: the code runs correctly.

It’s worth noting that I cannot simply return Foo(int(…)) because I need this code to work with subclasses of Foo. Of course, with my -> Foo[int] annotation, the subclass information will be lost, but that’s a different issue.

Is this a problem with typeshed or a type checker issue?

Is there, to your knowledge, another way of achieving what I’m doing that is already supported by type checkers, or do I simply have to cast(…)/# type: ignore it?


  1. Here I’m using a different notation because I don’t think it’s expressible with Callable alone.


Isn’t it quite correctly warning you about this?

f = Foo("abc")
f.as_int() # ValueError: invalid literal for int() with base 10: 'abc'

Do you need to make changes, or do a lot more that actually needs a new copy of the data class? Or can you just put an int property on the generic class:

from dataclasses import dataclass

@dataclass
class Foo[T: int | float | str]:
    value: T
    
    @property
    def int_value(self) -> int:
        if isinstance(self.value, str):
            raise NotImplementedError
        return int(self.value) 

No, that is not what it’s warning about. It gives the same error if you pass an int literal, for example.

That would be a value error, not a type error. With OP’s code as written (ie just the one class), if f.as_int() returns a value, it is guaranteed to be a Foo[int]. It’s similar to x = int(input()); even though that can ValueError, it’s still correct to infer that x is an int.

How about something like:

from dataclasses import dataclass

@dataclass
class Foo[T: int | float | str]:
    value: T

    @classmethod
    def Foo_int_from_other[S](cls, other: Foo[S]) -> Foo[int]:
        return cls(int(other.value)) 

This is unsafe because of subclassing. An arbitrary Foo[T] might not have a constructor that can return a Foo[int].

from dataclasses import dataclass

@dataclass
class Foo[T: int | float | str]:
    value: T

    def as_int(self) -> Foo[int]:
        return self.__class__(int(self.value))   #!

class AlwaysStr(Foo[str]):
    def __init__(self, value):
        super().__init__(str(value))

assert not isinstance(AlwaysStr("3").as_int().value, int) # uh-oh

Yes it is unsafe - even without subclassing. But if that really is in their code, I think Riccardo’s happy to live with that.

This is a broken subclass definition, not an issue with use of __class__ or alternatively type(self).

A smarter set of definitions would error on this subclass violating LSP, not on the use of __class__.

You can’t safely subclass a specialization with the current rules that treat functions as completely opaque. If we had a way to signal that a class uses its unspecialized type as part of its behavior, this problem would go away and the type system would match runtime appropriately.


This might be tricky if the existing dataclass uses “value” as both a kwarg in __init__ and a field name, but a third idea is just to coerce the input data on init (and never worry about it again):

from dataclasses import dataclass, InitVar

@dataclass
class Foo[T: int | float | str]:
    val: InitVar[T]
    value: int

    def __post_init__(self, val: T):
        try:
            self.value = int(val)
        except ValueError:  
            self.value = 0  # Or handle str val some other way

I wouldn’t recommend changing any working runtime code to fix an issue with typing, especially if someone is starting with the problem of trying to type a widely used existing library.


Which class/method in Pint are you hoping to type? There might be some other tricks that are applicable.

You’re probably right. But I personally do gain a lot of value when I take the hint, and recognise the typechecker is trying to tell me my code is too complicated.

If you swap the dataclass for a different generic type, e.g. list, then how should you convert a list[int | float | str] into a list[int] ?

I claim, you make a design decision and address the real problem.

You can give self an explicit non-specialized type annotation:

from dataclasses import dataclass

@dataclass
class Foo[T: int | float | str]:
    value: T

    def as_int(self: Foo) -> Foo[int]:
        return self.__class__(int(self.value))

(self: Foo[Any] also works, and may make it a bit more clear that the erasure of T is intentional.)

Like you mentioned, though, you still lose the subclass information (for that we need higher-kinded types).


Thanks everyone for the help!

I think I need to clarify a bit more what I’m trying to achieve here and provide some context.

This is not technically a problem with pint’s exposed API, it just has to do with a couple of type checker errors I’ve seen inside several method implementations.

For anyone not familiar with the pint project, a PlainQuantity is essentially a wrapper around a value (self.magnitude: typically, a scalar or a numpy array) that also carries a physical unit; it is currently defined as a class which is generic over the wrapped value’s type.

Typically, an arithmetic operation on one (__pos__, __neg__, __abs__, etc.) or two (__add__, __mul__, etc.) PlainQuantity instances will return another instance of the same class as self (i.e., self.__class__). However, as one might expect, the wrapped value’s type depends on the type(s) of the operand(s) in non-trivial ways; in particular, it’s not necessarily the same.

As an example, __abs__() on a complex returns float, so I decided to leverage the typing.SupportsAbs[] protocol and annotate PlainQuantity.__abs__() roughly as:

def __abs__[T](self: PlainQuantity[SupportsAbs[T]]) -> PlainQuantity[T]:
    return self.__class__(abs(self.magnitude), self.units)

For the library’s users, this works without problems, it’s just that the type checker emits an error in this file as described above.

I was hoping type checkers could understand that I wanted to re-instantiate the generic class with a different parameter, but I have a feeling this kind of type transformation might become more natural for type checkers to infer once HKTs are introduced (if ever).

I think @BenjyWiener’s solution is in the right direction (thanks again!) but I don’t think it’s always viable; for now, I’ll consider casting self.__class__ to the right specialization or simply # type: ignore-ing the line.

I’m not a typing expert, anyone who is please correct or clarify any misunderstandings below.

The way I understand the problem with as_int() is that a subclass of Foo could be StrFoo(Foo[str]). When as_int() is called on such an instance, Foo.as_int() will try to pass an int as value, but StrFoo only accepts str. It is therefore unsafe for as_int() to pass int to the subclass type since it may not accept int.

You can change “-> Foo[int]” to “-> typing.Self” to indicate it returns whatever type self is. Also, I think it’s preferred to use “type(self)(…)” rather than “self.__class__(…)”.

from dataclasses import dataclass
from typing import Self

@dataclass
class Foo[T: int | float | str]:
    value: T

    def as_int(self) -> Self:
        ret = type(self)(int(self.value))  # mypy error: not safe since a subclass could be a Foo[str] and not take int
        return ret

    def as_int_safe(self) -> Foo[int]:
        ret = Foo(int(self.value))  # safe because Foo[int] takes int.
        return ret

class StrFoo(Foo[str]):  # subclass that is valid (passes mypy --strict) that can't take int as value.
    ...

When faced with issues like this I find it helpful to think about what a subclass could do that would make what I’m trying to do unsafe, and work backwards from there.

Yeah, that’s the part that’s backward here currently. Typecheckers should reject the subclass that can’t safely exist, not the original class that is safely using language features as designed for itself.

I’d personally go with the type ignore. It’s a case where the type system is deficient, not a case where the type system needs more information that can be provided by a cast. It also doesn’t come with any new runtime overhead for users.


My initial gut reaction also was that the problem here must be in the AlwaysStr subclass due to a violation of the LSP (at first glance, it seems that since Foo is a class that can accept int, float or str, then AlwaysStr must also accept int, float or str to satisfy the Liskov Substitution Principle).

However, AlwaysStr doesn’t inherit from Foo, it inherits from Foo[str] and Foo[str] only accepts str, so it’s perfectly fine for AlwaysStr to also only accept str.

In fact, the type of foo.__class__ (or type(foo)) is not “type[Foo]”, it’s “type[Foo[T]]”. And the tricky part here is understanding “which specific T” does this type[Foo[T]] refer to. It seems like you expected / wanted the following statement to be true:

foo.__class__ is of type type[Foo[T]] for any T: int | float | str

but in reality the correct statement is actually

for any T: int | float | str, foo.__class__ is of type type[Foo[T]]

(yes, the difference between these two statements is only in the location of the “for any” quantifier).

You can directly verify that the second interpretation is correct by doing this:

foo1 = Foo("T is a string")
reveal_type(foo1.__class__) # type[Foo[str]]

foo2 = Foo(123456)
reveal_type(foo2.__class__) # type[Foo[int]]

So when you do self.__class__(int(self.value)) in as_int, self.__class__ is type[Foo[T]] for some T that was determined ahead of time (during the creation of self). So in the context of foo1.as_int, self.__class__ would be type[Foo[str]]. And you can’t construct an instance of Foo[str] using a value of type int.


I understand what you were trying to do, but unfortunately I don’t think that the current type system semantics allow describing the kind of relationship that you want to express. From the POV of the type system, Generics in python are just a fancy way to define a bunch of different classes at the same time.

Defining a generic class Vector[T] is roughly equivalent to just defining a bunch of classes: class VectorStr, class VectorInt, class VectorMyCoolType, etc (for all possible types T). When you inherit from Vector, you are actually always inheriting from some concrete variant where T is fixed. Even when you do something like

class SubVector[T](Vector[T]): ...

you aren’t defining a single class that “inherits from Vector[T]”. You are once again just defining a bunch of classes: class SubVectorStr(VectorStr), class SubVectorInt(VectorInt), class SubVectorMyCoolType(VectorMyCoolType), etc.

Of course, in this case, all of these classes actually end up being the same at runtime. But the type system doesn’t know that. In fact, the whole point of generics is to be able to distinguish these “variant” classes despite them actually being the same at runtime.

It’s not very surprising that the type system starts complaining when you try to use self.__class__ as if it was “just Foo without the T”. After all, just earlier you promised to treat the different “variants” of Foo as different classes, when you defined it as a generic class.


There’s a lot of circular reasoning and using what the type system does currently to justify itself here.

.__class__ at runtime isn’t a specialized type, it’s an unspecialized one. There are valid cases for using it this way that predate the static typesystem entirely.

What we need here is for use of .__class__ or type(self) to be understood to not carry the specialization, and for use of these to also indicate that all subclasses must not restrict the valid specializations.

The static type system is only useful as a tool to help developers ensure they get the behavior they expect, when it doesn’t accurately reflect runtime, it’s just an annoyance for those with the cases it doesn’t support yet.


I’m not sure if it’s fair to call this “circular reasoning”. Rather than “the type system justifying itself” it’s more “that’s how the type system was originally designed / defined to work” and then “demonstrating that it’s self-consistent”.

Like it or not, but the behavior you are observing is a direct consequence of how the current type system was supposed to work. It might be possible to change some aspects of the type system, but any such change would have to start with a pretty heavy PEP. You would have to come up with specific changes to the type system semantics that would make the AlwaysStr class invalid or change the type of x.__class__ in this particular context or something. Then, you would have to thoroughly explore all consequences of such a change, including backwards compatibility.

You can’t just say “this class should be invalid, because LSP”, because under the current type system semantics, this class doesn’t violate LSP. If you don’t like that, then you have to propose a specific set of changes to the python type system spec that would lead to the outcome that you want.

Like in almost all type systems, the goal of the python type system is to define a subset of programs that are “provably correct” (under some definitions of “provably” and “correct”). This means that there are many possible programs that don’t crash at runtime, but are rejected by the type system. This is intentional.

If you declare a: list[int] = [1, 2, 3], then the runtime type of a doesn’t change. You are still able to a.append("not an int") at runtime. In fact, that’s like the whole point of generics - to “carve out” specific subsets of “valid” values for the purposes of type checking without changing the runtime behaviour.
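Spelled out as a runnable sketch (the ignore comment is only there so a checker stays quiet about the deliberately ill-typed line):

```python
a: list[int] = [1, 2, 3]

# A type checker rejects this line, but the interpreter does not:
# the annotation has no effect on the list object itself.
a.append("not an int")  # type: ignore

print(a)
```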

When you declare Foo as a generic type, you are making a specific promise to the python type checkers - specifically that you want to treat Foo[int], Foo[str] and Foo[float] as separate types for the purposes of type checking, despite the fact that they will actually be represented by the same “actual” type during runtime.

The purpose of generics is to be able to encode the specialization information into the type system. If you have foo_str = Foo("example"), but then you don’t want type(foo_str) to “carry the specialization”, then Foo (by definition) must not be a generic class:

@dataclass
class Foo:
    value: int | float | str

    def as_int(self) -> Foo:
        return self.__class__(int(self.value))

Of course, this would mean that foo_str.value is now also just int | float | str (and you have to check it on every use). But that’s the trade off you are forced to make:

  • Either the type of foo_str carries the necessary information to know which specialization it is (and then foo_str.value is a str, but also type(foo_str) / foo_str.__class__ is Foo[str]).

  • Or the type of foo_str doesn’t carry the specialization information (and then type(foo_str) / foo_str.__class__ is “just” Foo, but also foo_str.value is an int | float | str).


Or the sensical option:

@dataclass
class Foo[T: (int, float, str)]:
    value: T

    def as_int(self) -> Foo[int]:
        return self.__class__(int(self.value))

And this works just fine so long as the type system is fixed. At runtime, the reality is that self.__class__ is Foo[*] where * is a stand-in for some notion of “not specialized”, rather than default (of Any), and construction results in specialization to Foo[int].

I’ve called the type system “deficient” in this discussion, as well as saying as much in the post you most recently quoted by pointing out that you’re using existing type system behavior to justify the type system having an artificial limitation that both doesn’t match runtime, and isn’t caused by a limitation on what we can reason about via type theory. I can say it, because the type system isn’t actually helping users in this case due to incorrect simplifications that have been made that don’t reflect runtime, and the reality is that the subclass would be invalid with better definitions.

To be clear in where I stand on this, I’m in favor of changing the type system to match runtime better and to allow more things people have actually written, but think that the best option users have right now is slapping a type ignore on, because it’s the type system that is wrong, not their program.

I’ve brought up existential quantification, as well as a more limited form covering almost exactly the case we need for this problem, as an actual proposal before, but even though it was aimed at actually solving other real issues, there wasn’t much support for it, and I’m not going to waste my time arguing for something that none of the type checker authors are willing to maintain. I’ll mention it as relevant when it comes up in other problems, and then leave it be if there’s no support for actually fixing the problems.
