Reinstantiating a generic class with a different specialization

This is only true if subclassing specialized generics is disallowed, which is obviously not an option as a global change[1]. In fact, calling self.__class__ at all is unsafe unless the class or its __init__ is marked @final.

For your use case, we would need a way to disallow specialized subclasses on a per-class basis. It’s not immediately clear to me how useful this feature would actually be or what the semantics should be, but I think it’s worth a discussion.


  1. collections.abc is built on this feature.


It seems to me that you didn’t understand my explanation (or I am not understanding what you are trying to say), because in the context of this discussion, there is basically no difference between T: int | float | str and T: (int, float, str).

It doesn’t matter whether T is allowed to be a subclass of int / float / str or only those specific types. The problem of AlwaysStr applies to either case.

If you want to have a productive discussion about this, you need to accept that the current semantics of the type system are not broken, wrong or incorrect (or otherwise require “fixing”).

This isn’t a matter of “the type system doesn’t work like it’s supposed to”, it’s a matter of “it’s currently impossible to express the specific semantics that you want your class Foo to have in terms of the type system”. You keep trying to express this kind of relationship in terms of Generics, but your desired semantics for Foo are not compatible with the kind of semantics that Generics were built to represent.

This is not circular reasoning or an appeal to “this is how it is, therefore it should be that way”. I’m just trying to explain that this is an intentional design choice, not an accidental mistake.

This isn’t a “limitation”. It’s an intentional restriction. Once again, consider the example of a: list[int] = [1, 2, 3]. In this case, the type system is intentionally restricting the set of valid values of a compared to what is actually possible at runtime.

By declaring a generic type Foo[T: ...], you are explicitly asking the type system to introduce additional restrictions to what should be considered a “valid” use of Foo. One of those restrictions is that type(foo_str) is Foo[str]. And Foo[str] is treated like it’s a separate static type (that is, a set of potential runtime values where Foo.value is restricted to be str).

“Unspecialized Foo” isn’t even a static type that you can “name” under the current semantics. In the context of static typing you always have to clarify which Foo[something] you are talking about (it’s technically possible to drop the [something] in some contexts, but in that case you get Foo[Any], which is “some statically unknown specialization of Foo”, not an “unspecialized Foo”).

Once again, this is an intentional feature, not a bug. The semantics that you are proposing would break something like 99% of collections.abc and similar use cases (which were basically the very reason why generics were introduced in the first place).

from abc import ABC, abstractmethod
from typing import reveal_type

class Container[T: int | float | str](ABC):
    @abstractmethod
    def __contains__(self, x: T, /) -> bool: ...

    def check(self) -> None:
        # note the T`1 below, it's not just "some T" it's
        # "the same T" in every case, this is important!
        x: T
        reveal_type(x) # T`1
        reveal_type(self) # Container[T`1]
        reveal_type(self.__class__) # type[Container[T`1]]

        # This is valid:
        is_type_Container_int: type[Container[T]] = self.__class__

        # This is not valid:
        # Incompatible types in assignment
        # expression has type "type[Container[T]]",
        # variable has type "type[Container[int]]"
        not_type_Container_int: type[Container[int]] = self.__class__

class MyContainerForOnlyStrings(Container[str]):
    # this implementation doesn't have to cover T: int | float

    # this is NOT a violation of the LSP, because Container[str]
    # is a separate type, distinct from other Container variants

    def __contains__(self, x: str, /) -> bool:
        reveal_type(self.__class__) # type[MyContainerForOnlyStrings]

        # This is valid:
        is_type_Container_str: type[Container[str]] = self.__class__

        # This is not valid:
        # Incompatible types in assignment
        # expression has type "type[MyContainerForOnlyStrings]",
        # variable has type "type[Container[int]]"
        not_type_Container_int: type[Container[int]] = self.__class__

        return True

There might be cases where it could be useful to be able to express some other semantics, but those semantics would have to be implemented in such a way that doesn’t change the semantics of Generics as they were originally defined.


Your example here isn’t good. It just highlights that the current semantics are wrong. Neither annotation for self.__class__ you provided is accurate, and the current semantics are just making some cases work while others don’t.

If we can’t fix things that are wrong in the type system, what are people who are currently having cases that work at runtime meant to do? This isn’t some dynamic behavior, it’s entirely something the type system should be able to deal with statically.


You’re still using what the type system currently does to justify itself, rather than reflecting runtime accurately here.

The type of .__class__ isn’t a specialized generic. It’s a type object that has yet to be specialized. Either annotation you provided should be allowed as a developer expressed constraint, but neither is an exact match, both are narrower.

class X[T]:
    ...

    def foo(self) -> None:
        x = self.__class__

In this example, for instances of X (and not of subclasses), x is X at runtime. It isn’t type[Self], as that implies having the same specialization as the instance it was accessed from, whereas it has not yet been bound to any particular specialization; nor is it type[X[Any]], as that implies that it has already been bound to some gradual unknown, rather than the reality that it has yet to be bound at all.
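A minimal runtime illustration of the point above, written with the older TypeVar spelling so it also runs before Python 3.12 (the helper method type_object and the subclass XStr are hypothetical). Subscripting X produces an alias object whose origin is the plain class, while self.__class__ is always the plain class or the actual subclass, never X[int]. CPython's typing module currently also records the alias on instances created through it as __orig_class__; treat that as an implementation detail, not something the typing spec guarantees.

```python
from typing import Generic, TypeVar, get_args, get_origin

T = TypeVar("T")


class X(Generic[T]):
    def type_object(self) -> type:
        # At runtime this is the plain class object (or the actual subclass);
        # it never carries a specialization such as X[int].
        return self.__class__


class XStr(X[str]):
    pass


x = X[int]()
alias = X[int]  # a typing alias object, distinct from the class X itself
```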

Nothing about this requires breaking any existing valid annotation to fix, it would only remedy both some false positives and some false negatives, so I’m also unsure why you believe it’s impossible to change this.

What needs to happen here is a way to mark classes that use the behavior as well as where they use that behavior. It’s fine to have a subclass of a specialized type, but in the case of relying on .__class__, it comes with additional considerations that need handling. It needs to remain true that all inputs are compatible with that use and result in a subtype of the specialized type, or that the involved location is overridden in the subtype to be compatible if it would not be automatically.


To be slightly more specific about the impact: having the type system correctly understand what happens when going from an instance to its type object fixes false positives like the one the OP came across.

The ability to mark specific functions as relying on certain behavior of the type object in specific locations would also prevent false negatives that already exist with such patterns: cases where the use isn’t correct, but the problem is hidden and, under current rules, opaque to someone subclassing a library type.


Please, explain what is “wrong” with the current semantics:

  1. Keep in mind that “typing.Generics don’t work like I expected” doesn’t mean that the semantics are wrong.

    If you want to convince me that the current semantics are “wrong” (in the sense that they are implemented incorrectly and/or disagree with the formal specification), please point me to the relevant lines in one of the typing PEPs or a piece of documentation on typing.python.org or docs.python.org.

  2. If you mean that the semantics are “wrong” in the sense that they were designed incorrectly (so that not only the current behavior but also the spec/documentation is wrong or incomplete), then you need to clearly articulate why you think that it’s wrong.

    All of the typing specs were extensively discussed and eventually accepted. It’s possible that everyone just missed some crucial detail when designing some aspect of the type system, but such a claim needs extraordinary proof.

  3. And if by “the semantics are wrong” you meant “it’s currently impossible to represent my use case using Generics”, then I don’t think that’s a very productive way to frame this discussion. It’s still possible that your use case is worth supporting, let’s just not jump straight to “typing.Generics semantics are wrong, we need to change them”.

Either way – if you want anything to be done about this issue, you need to make a concrete proposal (which would need to include, at the very least, a formal specification of what exactly you are proposing to change, your rationale, the alternatives that you’ve considered and rejected, and a thorough backwards compatibility analysis).

It seems to me that you are operating under the assumption that static type hints and runtime behavior should match exactly 1 to 1. This is not the case, this has never been the case and this was never the goal of static typing.

Python uses dynamic typing. Static type hints are by definition distinct from actual runtime types. In fact, static type hints are neither a subset, nor a superset of the dynamic types that are actually possible at runtime.

There are cases where static typing is intentionally more “permissive” than actual runtime behavior. The obvious examples are Any / missing annotations / gradual typing, but in general the static type system can’t always fully represent some complicated relationships / invariants that are actually always upheld at runtime.

More relevant to this discussion, there are also cases where static typing is intentionally stricter than actual runtime behavior. In fact, this is almost always the case. You can do almost anything you want at runtime (pass arguments of any type to any function, arbitrarily change the types of variables / attributes “on the fly”, etc.), and sometimes this might not even lead to an exception.

The whole purpose of the static type hint system is to empower programmers to express and document further additional restrictions that might not be present at runtime. I’ll repeat this once again – the purpose of typing.Generic is to express a certain kind of additional restriction that doesn’t exist at runtime.

So “these semantics don’t accurately reflect runtime” is NOT a good argument. During runtime, this

foo: Foo[int] = Foo(1)
foo.value = "str" # <- works just fine at runtime

works perfectly fine. So the fact that static type checkers reject this code also “doesn’t accurately reflect runtime”.

It’s a bit hard to have a productive debate, since you haven’t actually written down an exact specification for what you are proposing, but I’ll assume that your proposal is essentially this (slightly reworded by me):

The problem with this proposal is that it implicitly breaks almost all of collections.abc (and most other uses of Generics together with abstract base classes or inheritance).

If you make a Generic Foo “unspecialized by default” then you have two possible options:

  1. You make it so that all subclasses of Foo must keep the exact same type bounds on all type parameters of Foo. This means that you can’t inherit from a specialization, only from the “unspecialized Foo”.

    This breaks collections.abc because, for example, you can no longer implement a container class that only works with integers and make it inherit from Sequence[int], all subclasses of Sequence are forced to be generic with respect to their element type.

  2. You allow inheriting from specializations. This means that a subclass of Foo can have “tighter” type bounds than Foo itself. This leads to unsound behavior (as demonstrated by the AlwaysStr example).

    In the case of AlwaysStr, the self.__class__ in Foo.as_int must be some subclass of Foo[*], but AlwaysStr isn’t a subclass of Foo[*], it’s a subclass of Foo[str] (and Foo[str] isn’t a subclass of Foo[*] either).
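For concreteness, here is a minimal sketch of the kind of class option 1 would rule out: an int-only container that inherits from the specialization Sequence[int] and gets __contains__, __iter__, __reversed__, index and count from the collections.abc mixin methods. The class name Evens is made up for illustration.

```python
from collections.abc import Sequence
from typing import Union, overload


class Evens(Sequence[int]):
    """Read-only sequence of the first n even integers; int-only by design."""

    def __init__(self, n: int) -> None:
        self._n = n

    @overload
    def __getitem__(self, index: int) -> int: ...
    @overload
    def __getitem__(self, index: slice) -> Sequence[int]: ...
    def __getitem__(self, index: Union[int, slice]) -> Union[int, Sequence[int]]:
        if isinstance(index, slice):
            # Normalize the slice against our length and materialize it.
            return [self[i] for i in range(*index.indices(self._n))]
        if index < 0:
            index += self._n
        if not 0 <= index < self._n:
            raise IndexError(index)
        return 2 * index

    def __len__(self) -> int:
        return self._n
```

Nothing about Evens is generic in its element type, and it does not need to be: Sequence[int] is exactly the contract it promises to uphold.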

Sorry, this sentence isn’t very clear. Mark classes that use what behavior? Mark them how?

Also, did you mean that your proposed change in behavior should be “opt-in” only? So the current semantics of typing.Generic would stay unchanged (and self.__class__ would refer to the “specialized” Foo[T], not Foo[*]) by default, and your Foo[*] behavior only comes into play, when the original author of Foo marks the class in some way?

My whole issue was with the assertion that the current Generic semantics are “wrong” and “should be changed”. If what you are proposing is an “opt-in” addition to the current semantics, then I have no issue with that.

Although, I would recommend making the “opt-in” behavior per-type-parameter rather than per-class. So something like

@dataclass
class Foo[T: InstanceTypeParameter[int | float | str]]:
    value: T

    def as_int(self) -> Foo[int]:
        return self.__class__(int(self.value)) # ok

foo_str: Foo[str] = Foo("1") # ok
foo_int: Foo[int] = foo_str.as_int() # ok

# open question: how do you "spell" Foo[*] in code?
cls_foo: type[Foo[Unspecialized]] = foo_int.__class__ # ok
typ_foo: type[Foo[Unspecialized]] = type(foo_int) # ok

# this is not okay: inheriting from a specialized version of the class
class AlwaysStr(Foo[str]): ...
# this is not okay: inheriting with narrower type bounds
class NarrowerT[T: float | int](Foo[T]): ...

# open question: is this ok?
cls_foo_float: type[Foo[float]] = cls_foo # Foo[*] -> Foo[float] narrowing

# open question: how should "proper" subclasses be defined?
class ValidFooSubclass[T: InstanceTypeParameter[int | float | str]](Foo[T]):
    ...
# this is probably the most "correct" option, but it's a bit verbose
# and requires repeating the exact type bounds of T

The exact naming / syntax is up to bikeshedding. I called it InstanceTypeParameter because it’s a type parameter that only applies to instances, not to the class type itself. Alternatively, it could be called ErasedTypeParameter or something else entirely (although I think “type erasure” normally refers to the exact opposite behavior, where the type information is erased from the instance rather than from the class, so maybe this isn’t a good name).

I don’t think we should add a new syntax like Foo[*] to the language for something so niche.


You’re making assumptions I haven’t presented, nor would I present.

In an annotation context, we can’t change the meaning of a bare type object: it remains whatever the explicit default parameterization is, or Any in the absence of a default. What we can do is make the type system understand that, in a value context, accessing the type object without explicitly specializing it leaves it unspecialized.

I also used T[*] as a stand-in, explicitly so. I’m not suggesting new syntax. A real proposal here would involve a new special form in the typing module. As far as implementation goes, it could literally just be a module-level constant; its only purpose is to serve as a marker for cases that explicitly choose to keep the type unspecialized until later, as opposed to implicit cases.

In a gradual type system, false positives indicate that something has been done incorrectly. Some of those may be intentional, but when valid code that is not runtime-dynamic, and that is possible to reason about statically, is rejected due to a simplification in the type system, it’s the type system that is wrong.

The inability to accurately express something statically is different from your counter example of intentionally using a different parameter than what was expressed statically.

You may prefer the other description I used, calling the type system “deficient”, rather than “wrong”, but my frame of reference on this is that the type system is only a tool to help developers ensure the behavior and interfaces they express are checked. When something like this is inexpressible due to an incorrect simplification in the type system, the type system is wrong from that perspective.

The use of a type object obtained via type(self) or self.__class__ from within an instance method isn’t always safe to inherit. Some cases involving subclasses of a specialized type would require a new implementation. This remains true even if type constructors were not exempt from LSP checks. Because the type system erases information at function boundaries, some false negatives exist that should be rejected by type checkers. The checking logic for marking such functions, so that subtypes must uphold the same assumptions made about the type object, is tricky to implement, but the marker itself would just be something like a simple decorator.


I literally included a disclaimer about not being able to discuss things without a concrete proposal. I genuinely tried my best to make a good faith guess about what you might have meant from context and your other comments.

If you want to discuss “what you have presented” – consider actually presenting something, instead of just constantly repeating that the current semantics are “wrong” without addressing any of my points.

I have no idea what you mean by this, nor how it is relevant to the discussion. I almost started writing a response based on my best guess about what you could have meant by “bare type object” and why we “can’t change its meaning” in this case, but caught myself.

The “I don’t think we should add a new syntax” thing was more of an afterthought about my InstanceTypeParameter proposal, not about whatever you said. Please, address any of the concerns from my comment that were actually explaining why I think that what you are suggesting is not possible (or at the very least – conflicting with existing very well established expectations such as collections.abc).

I disagree. I explained, why I disagree. I don’t think that this is possible to do without changing the semantics in a backwards-incompatible way that would break a lot of code (including, but not limited to collections.abc).

If you think that this is possible – write a detailed specification explaining what changes to the current type system are required to make this work and why those changes won’t cause backward-incompatibility issues for existing code.

My claim is that this is not a false positive. The type system is correctly prohibiting you from using self.__class__ like it was “unspecialized Foo”, because self.__class__ might be a valid subclass of Foo[T] such as AlwaysStr. In that case, using self.__class__ like the OP is trying to do would be unsound.

When you define class Foo[T: int | float | str]: ..., you are statically expressing that Foo defines a set of distinct classes that share the same implementation, but have different types - Foo[int], Foo[str], Foo[InheritsFromStr], etc.

The type system then uses this declaration to statically prove that the code you provided as a “shared implementation” for those classes would be type safe, no matter which T is used (within the specified bounds, of course). The type system then correctly rejects

    def as_int(self) -> Foo[int]:
        return self.__class__(int(self.value))   #!

because it knows that somebody could (at a later point in time, in another file or library) inherit from Foo and then self.__class__ could be some subclass of Foo like AlwaysStr that only inherited from Foo[str]. And so attempting to pass an int to the constructor of that class would be unsound.
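A runtime sketch of the scenario described above, using the older TypeVar spelling. Foo mirrors the thread's example; AlwaysStr is a hypothetical version that checks its own contract in __init__ so the breakage becomes visible (a real subclass might not check, in which case the contract is silently violated instead). The type: ignore comment exists only so the code can be run; it marks the line a type checker rejects.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar("T", bound=Union[int, float, str])


@dataclass
class Foo(Generic[T]):
    value: T

    def as_int(self) -> "Foo[int]":
        # Rejected statically: self.__class__ may be a subclass of Foo[str].
        return self.__class__(int(self.value))  # type: ignore


class AlwaysStr(Foo[str]):
    """Subclass of a specialization that enforces its contract at runtime."""

    def __init__(self, value: str) -> None:
        if not isinstance(value, str):
            raise TypeError(f"AlwaysStr requires str, got {type(value).__name__}")
        super().__init__(value)
```

Foo("1").as_int() works, but AlwaysStr("1").as_int() ends up calling AlwaysStr(1), which breaks AlwaysStr's contract.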

And (once again) you can’t forbid classes such as AlwaysStr, because classes exactly like that are one of the primary use cases for generic classes in the first place (with a notable example being collections.abc).


As I already mentioned, I don’t think we can have a productive discussion about this issue until we have a concrete proposal to discuss. I take it that you aren’t satisfied with my “opt-in” InstanceTypeParameter compromise, so unless / until you (or somebody else) can actually bring forward their own proposal, I am not going to be engaging with this thread any further.


It’s not the use of __class__ here that’s unsound, but a subtype that would be incompatible with this that is unsound.

Something closer to what I’m envisioning here is that such methods are marked:

    @typing.uses_type_object
    def as_int(self) -> Foo[int]:
        return self.__class__(int(self.value))

Rather than rejecting all subclasses of specialized types, this requires that any subclass of a specialized type either provide its own valid implementation of the marked methods or be rejected. We can’t rely on this being inferred because, by design and also because of stubs, the interior of a function is a black box to the outside world.
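The proposed rule is about static checking, but its shape can be approximated at runtime. This is only an illustration under stated assumptions: uses_type_object below is a plain-Python stand-in for the hypothetical typing.uses_type_object (no such function exists in the typing module), and CheckedGeneric is a made-up helper base that uses __init_subclass__ and __orig_bases__ to reject a subclass of a specialized base (such as X[str]) that inherits a marked method instead of overriding it. Subclasses that stay generic, like X[T], are left alone.

```python
from typing import Any, Callable, Generic, TypeVar, get_args

F = TypeVar("F", bound=Callable[..., Any])
T = TypeVar("T", int, str)


def uses_type_object(fn: F) -> F:
    """Hypothetical marker: fn calls the type object with another specialization."""
    fn.__uses_type_object__ = True  # type: ignore[attr-defined]
    return fn


def _specializes_generic_base(cls: type) -> bool:
    # Only inspect bases written on this class itself, e.g. X[str].
    for base in cls.__dict__.get("__orig_bases__", ()):
        args = get_args(base)
        if args and not all(isinstance(arg, TypeVar) for arg in args):
            return True
    return False


class CheckedGeneric:
    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        if not _specializes_generic_base(cls):
            return
        for name in dir(cls):
            impl = getattr(cls, name, None)
            # The resolved attribute is still the marked base implementation.
            if getattr(impl, "__uses_type_object__", False):
                raise TypeError(
                    f"{cls.__name__} specializes a generic base but inherits "
                    f"{name!r}, which relies on the type object; override it"
                )


class X(CheckedGeneric, Generic[T]):
    def __init__(self, value: T) -> None:
        self.value = value

    @uses_type_object
    def as_int(self) -> "X[int]":
        return self.__class__(int(self.value))  # type: ignore


class CorrectedSubclass(X[str]):
    def as_int(self) -> "X[int]":
        return X(int(self.value))
```

Defining `class Broken(X[str]): pass` would raise TypeError at class creation, which is the runtime counterpart of reporting the error on the subclass rather than on as_int.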

As for “not breaking existing use”: any actually valid use of __class__ that isn’t specialized identically is currently not expressible, and would become expressible (allowing type: ignore comments to be removed); the only cases that would need such marking are ones that type checkers currently disallow anyway.

A special form for explicitly expressing keeping it unspecialized until later is possible, we can call it typing.TBD for now, the name isn’t important to discussing why it would work.

With this, accessing the type object from an instance of T at runtime (eg. type(self) or self.__class__) results in type[T[typing.TBD]] unless explicitly annotated as something else. When annotated this way, binding to a specific specialization happens using the same rules as if using T directly. All use still needs to be consistent with the surrounding annotations.

With this:


class X[T: (int, str)]:
    def __init__(self, value: T):
        self.value: T = value

    @typing.uses_type_object
    def as_int(self) -> X[int]:
        x = self.__class__  # no error
        reveal_type(x)  # type[X[typing.TBD]]
        ret = x(int(self.value)) 
        reveal_type(ret)  # X[int]
        return ret  # still no error

# A type checker would report an error on this class: as_int uses the type
# object in a way that is unsafe for this subclass unless as_int is replaced.
class ErrorInCorrectPlace(X[str]):
    ...

class CorrectedSubclass(X[str]):
    def as_int(self) -> X[int]:
        return X(int(self.value))  # or implement some sibling subclass and use it

This seems to be the main point of disagreement between us. I still think that the issue lies with the use of __class__ itself, not with the subtypes. To be clear, the actual “problem” becomes apparent only when those 2 things happen at the same time, but I think that “blaming” the subclasses here is wrong.

Yes, and this is also one of the reasons why I’m blaming the implementation of Foo.as_int itself, rather than the subclasses. The whole idea of static typing and “soundness” is based on the fact that you should be able to unambiguously identify and eliminate certain classes of errors simply by making each class / function declare a “contract” (aka its “public API”) and then making sure that all other code only uses functionality that is provably safe (based only on this contract).

Looking only at the public API of Foo, it’s impossible to conclude that AlwaysStr is doing something wrong. So to me, the issue in this case is that the author of Foo.as_int assumed that Foo provided an additional guarantee: that all subclasses of Foo must also be generic (with the same type bounds).

Following this logic, the “problem” here is caused by the fact that Foo.as_int tries to do something that is not guaranteed to be correct by any “contract” (or that the Foo class itself doesn’t declare this contract in the first place). Either way, the AlwaysStr class is blameless here.

The “contract” that is missing from the declaration of Foo is something along the lines of “all subclasses of Foo must be generic with the same type bounds as Foo”. It seems that you are automatically assuming that this contract should be “implicitly” present on any Generic class. But, as I’ve mentioned multiple times, Generic types in Python were designed under the assumption that inheriting from a specialization of a generic class is totally fine.

It seems obvious to me that the “correct” way to fix your use case is to add a way for Foo to declare this “missing” contract. Your @typing.uses_type_object decorator kind of does this, but in a really weird way – by annotating the function that ends up using the contract instead of annotating the class (or generic type parameter) that we want to declare this contract.

And this isn’t just a stylistic difference. For example, how would your annotation help if I wanted to do something like this:

from dataclasses import dataclass

@dataclass
class Foo[T: int | float | str]:
    value: T

def foo_str_as_int(foo: Foo[str]) -> Foo[int]:
    return foo.__class__(int(foo.value))   #!

You keep mentioning self.__class__ and type(self) specifically, but there’s really nothing special about self and methods here. The problem is exactly the same with a freestanding function as it is with a method.


Another issue with the function-level / class-level decorator approach. Consider the following:

from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Foo[X: int | float | str, Y]:
    x: X
    y: Y

    def as_int(self) -> Foo[int, Y]:
        x = int(self.x)
        y = self.y
        return self.__class__(x, y)

# this is OK:
class AlwaysStrX[X: int | float | str](Foo[X, str]): ...

# this is not allowed:
class AlwaysStrY[Y](Foo[str, Y]): ...

In this case, I want the X type parameter to follow the “every subclass must also keep the same type bounds” rules, but the Y parameter must keep the current semantics. Or in other words, I want the type of self.__class__ to be Foo[*, Y], not Foo[*, *]. This kind of constraint is impossible to represent just by using a decorator / annotation on Foo.as_int or Foo as a whole.

I have no problem with this part. I called it Unspecialized in my proposal and I don’t particularly care about the exact name.


P.S. Thank you for providing an actual proposal. I really think that it’s much better to discuss actual specification / implementation ideas rather than arguing about hypotheticals.


Right. This is for visibility into what methods are involved, and therefore need to be handled in subtypes. It’s also possible to detect if the decorator is needed for a reasonable diagnostic to be provided on both sides of the API boundary if something isn’t right between a library type and a user subtype. If the type object is only ever used in a way that always retains original parameterization, it’s unneeded.

Your example does show that more than just this would need to be indicated for __class__ to remain unspecialized when it is used outside of the class’s own definitions.

Even if it is not extended further, and the extended knowledge only applies to going from an instance to its type object within definitions belonging to the type, it would still greatly improve what can be expressed, especially by not requiring various dunders (for example __add__ and __replace__) to be reimplemented in every subtype.
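By contrast, a sketch of a use that keeps the original parameterization, which, as noted above, wouldn't need any marking: type(self) in __add__ builds an instance of whatever subclass self is, so IntBox inherits the method without reimplementing it. Box and IntBox are hypothetical names. CPython's dataclasses.replace follows the same pattern, constructing its result through obj.__class__, which is why it also returns an IntBox here.

```python
import dataclasses
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T", int, float)


@dataclass(frozen=True)
class Box(Generic[T]):
    value: T

    def __add__(self, other: "Box[T]") -> "Box[T]":
        # Same specialization in and out, so type(self) is safe to reuse.
        return type(self)(self.value + other.value)


class IntBox(Box[int]):
    pass


total = IntBox(1) + IntBox(2)
replaced = dataclasses.replace(IntBox(1), value=5)
```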

It could be extended further by having a means of describing the contracts involved in more depth, but that likely ends up requiring arbitrary rank types be supported in full.

Thank you for pointing out that what was said wasn’t clear enough without a demonstrated proposal. I’m still working on the balance between walls of text and leaving out things that seem obvious at first but, on reflection, are only obvious from some perspectives.

You seem to be focused on methods, but there is no fundamental reason, why methods would deserve special handling in the first place. Again, thinking in terms of contracts, class methods are given some extra capabilities compared to standalone functions, but those capabilities are almost exclusively related to visibility (ie access to private fields), not to type deduction (ie, I see no reason why self.__class__ and foo.__class__ should behave differently).

This isn’t the first time you’ve mentioned reimplementing the affected methods in subclasses, but I don’t think I’ve seen a concrete explanation of how this is supposed to work. For example, the OP’s Quantity wrappers don’t seem to require this kind of “reimplementation” for every subtype.

In fact, I was operating under the assumption that the “contract” missing from Foo, which we actually want to express, is precisely that all subtypes of Foo must be subtypes of the “whole unspecialized Foo”. So for such classes no “reimplementation” would be needed.

Can you provide a “practical” example, where “you can break the contract, but have to reimplement every method that used .__class__” would actually be useful / required?

At first glance, it seems to me that this is a completely separate and much harder issue to solve (because “users” of the contract don’t have to be restricted to only the methods on the class, and there is no way to “fix” outside consumers of the contract using method overwriting). Of course, in the general case this is completely unsolvable since contracts are supposed to be “one sided” (you declare that your class will adhere to some invariants and then everyone can rely on those invariants).
