`__orig_class__` availability

Here are some inconvenient behaviors of __orig_class__ that I’ve encountered.

1. Timing of availability

Suppose I want to type check an annotated class upon its initialization; the most intuitive thing is to add type_check(self, self.__orig_class__) as the last line of the class’s __init__ method.

However, self.__orig_class__ is not available within either __new__ or __init__. See related thread (1) below for more discussion on this.
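
For illustration, here is the timing issue in a nutshell (same PEP 695 syntax as the examples below):

class C[T]:
    def __init__(self) -> None:
        # __orig_class__ is only attached after __init__ returns, inside
        # typing's _BaseGenericAlias.__call__, so it is not visible yet.
        print(hasattr(self, "__orig_class__"))  # False

c = C[int]()                 # prints False
print(c.__orig_class__)      # __main__.C[int]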

2. Masking of errors when setting the attribute

Suppose I want to work around it - how about listening in __setattr__ and waiting for __orig_class__ to come in?

class C[T]:
    def __setattr__(self, attr: str, value) -> None:
        super().__setattr__(attr, value)
        if attr == "__orig_class__":
            type_assert(self, value) # May throw TypeCheckError

In this example, the TypeCheckError thrown from type_assert() will be masked by _BaseGenericAlias.__call__(). Here is the code to blame.
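
In current CPython, the relevant lines of that method are essentially these (the full context is reproduced in my proposal further below):

result = self.__origin__(*args, **kwargs)
try:
    result.__orig_class__ = self
except Exception:
    # Swallows *any* error raised while setting the attribute,
    # including the TypeCheckError raised by the __setattr__ above.
    pass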

The TypeCheckError thrown inside __setattr__ was supposed to propagate out and prevent object instantiation, not be silently swallowed internally.

My temporary solution is to proxy the entire _BaseGenericAlias.__call__ so I can run the check after it returns. Here is my temporary workaround - definitely not elegant.
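
For context, a stripped-down sketch of the proxy idea (the names _AliasProxy and check_on_init are made up for illustration; this is not the actual rttc code):

class _AliasProxy:
    def __init__(self, alias):
        self._alias = alias

    def __call__(self, *args, **kwargs):
        instance = self._alias(*args, **kwargs)  # __orig_class__ gets set in here
        type_assert(instance, self._alias)       # the error can now propagate
        return instance

def check_on_init(cls):
    # Make C[int] return a wrapper instead of a plain _GenericAlias.
    original = cls.__class_getitem__
    cls.__class_getitem__ = classmethod(
        lambda _, params: _AliasProxy(original(params)))
    return cls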

3. Only available on Generic-based objects, not builtins

class C[T]:
    pass

C[int]().__orig_class__    # __main__.C[int]

list[int]().__orig_class__ # AttributeError

I am wondering whether there is a roadmap in the typing module to improve or mitigate these inconveniences.


Related threads (evolving):

  1. typing/Generic orig_class availability during init in 3.6 vs 3.7 #658
  2. Runtime resolution of TypeVar

To make sense of why these issues matter to me, here is the feature I am trying to deliver in rttc (run-time type check):

from type_check import type_guard
from dataclasses import dataclass

@type_guard
@dataclass
class C[T]:
    x: T

# The following works because type_guard proxies the entire class
# Proxying will for sure break a lot of corner use cases.

C[int](1) # OK
C[str](1) # TypeCheckError: C[str].x = int(1) is not str

The 3rd problem listed in my original post prevents this feature from working consistently:

# Omit 2nd argument if object is instantiated with type args

type_check(C[int](x=1), C[int]) # True
# Can be written as
type_check(C[int](x=1)) # True

# However, this will error out:
type_check(list[int]([1])) # TypeError: list is not a parameterized generic

Proposing one possible solution to my OP

The root cause of these problems is that the typing system attempts to “inject” typing information into each type-annotated object. However, this is not always feasible, due to name conflicts, __slots__, or rejection by a custom attribute setter.

Let’s think about it the other way around: a global typing database (e.g. a WeakKeyDictionary) could serve the same purpose with no intrusion into the attribute space of the typed object!

Using _BaseGenericAlias.__call__() as an example:

class _BaseGenericAlias(_Final, _root=True):

    def __call__(self, *args, **kwargs):
        # [irrelevant code omitted]
        result = self.__origin__(*args, **kwargs)
        # ========== ORIGINAL ==========
        try:
            result.__orig_class__ = self
        except Exception:
            pass
        # ========== PROPOSED ==========
        global OrigClassMap         # WeakKeyDictionary[object, type]
        OrigClassMap[result] = self # Intrusive no more!
        # ==============================
        return result

The same technique could apply to any other intrusive typing attribute. It would also allow typing information to be preserved for builtins (i.e. list[T], tuple[T], set[T], etc.) as long as the instance can serve as a key in the map - and it requires no changes to the builtins at all!
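
Assuming the change above, consumer code could then recover the alias without ever touching the instance (get_orig_class is a hypothetical helper, not an existing API):

def get_orig_class(obj, default=None):
    # Pure lookup - the instance's attribute space is never involved.
    return OrigClassMap.get(obj, default)

c = C[int]()
assert get_orig_class(c) == C[int]  # no __orig_class__ attribute required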

Further research shows some implementation challenges:

Although WeakKeyDictionary is the closest tool in the stdlib that I can find, it cannot distinguish two objects (keys) that compare equal to each other - even if they have different ids - and it requires its keys to be hashable and weak-referenceable, which many builtin instances are not. New infrastructure is needed to support the implementation of this idea.

WeakKeyDictionary is referenced here to convey the idea that garbage collection should work exactly as if the properties were kept inside the objects as attributes, i.e. collecting the object also collects its typing information.
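
To illustrate what such infrastructure could look like, here is a rough sketch of a weak mapping keyed by object identity rather than by equality/hash (IdentityWeakMap is a made-up name, not a stdlib class):

import weakref

class IdentityWeakMap:
    """Maps objects to values by identity; an entry disappears when its
    key object is garbage collected (sketch only)."""

    def __init__(self):
        self._data = {}  # id(key) -> (weakref to key, value)

    def __setitem__(self, key, value):
        key_id = id(key)
        def _cleanup(dead_ref, key_id=key_id):
            entry = self._data.get(key_id)
            if entry is not None and entry[0] is dead_ref:
                del self._data[key_id]  # mirrors gc of an instance attribute
        self._data[key_id] = (weakref.ref(key, _cleanup), value)

    def __getitem__(self, key):
        ref, value = self._data[id(key)]
        if ref() is not key:  # stale entry from a reused id
            raise KeyError(key)
        return value

# Distinguishes equal-but-distinct objects and accepts unhashable keys:
m = IdentityWeakMap()
a, b = [1], [1]  # a == b, but a is not b
m[a], m[b] = list[int], list[str]
assert m[a] == list[int] and m[b] == list[str]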

An alternative solution would be to allow classes to explicitly request being given the orig_class as a keyword argument. This would very easily be done with a decorator:

@give_me_orig_class
class C[T]:
    def __init__(self, value, *, orig_class: TypeForm | None = None):
        self.value = value
        type_assert(self, orig_class)
        self.__orig_class__ = orig_class

give_me_orig_class would just set a single attribute on the type object - this will succeed for all non-builtin types.
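
Concretely, something along these lines (all names are illustrative; the second half would be a change inside typing itself, not user code):

_WANTS_ORIG_CLASS = "_wants_orig_class"

def give_me_orig_class(cls):
    setattr(cls, _WANTS_ORIG_CLASS, True)  # the single attribute on the type
    return cls

# ...and, in the proposed (not current) _BaseGenericAlias.__call__:
#
#     if getattr(self.__origin__, _WANTS_ORIG_CLASS, False):
#         result = self.__origin__(*args, orig_class=self, **kwargs)
#     else:
#         result = self.__origin__(*args, **kwargs)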

Benefits:

  • reduced memory footprint compared to a global dictionary or to always setting the attribute.
  • orig_class is available early in creation - as soon as __new__ is called, or actually as soon as metaclass.__call__ is executed. Definitely early enough for all use cases.
  • No masked errors - if the class requests an orig_class but doesn’t accept it, no errors will be caused.

Drawbacks:

  • Still not available for builtins - I don’t believe this is a huge loss, but I might be wrong.
  • If we also deprecate automatically setting the __orig_class__ attribute, it would not be available unless the class author explicitly opts in - since most of the time the class itself wants to do something with the type, I don’t think this is too big of an issue.
  • Clutter in signatures. This is IMO the smallest issue. Documentation tools and linters can easily start ignoring this attribute in auto-generated signatures and tooltips, and warn against specifying it manually. It having a fixed name also means it’s easy to google and figure out what it does (I am also open to calling the argument __orig_class__ to make it very clear that it has some magic property).

Will the decorator return a proxy of _BaseGenericAlias which injects a keyword argument into _BaseGenericAlias.__call__? If that’s the case, the behavior of such a proxy might not always align with the proxied class.

In addition, similar to the workaround I wrote for rttc, the proxy must also return a new proxy whenever __getitem__ is called, which might incur performance penalties at run time.

i.e.

Proxy(T)[P] => Proxy(T[P])
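
In code, that is roughly:

class Proxy:
    def __init__(self, target):
        self._target = target

    def __getitem__(self, params):
        # Every subscription has to allocate a fresh proxy around T[P].
        return Proxy(self._target[params])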

Another concern is that each of those magic properties will need a separate decorator:

@give_me_type_attributes
@give_me_type_parameters
@give_me_type_orig_class
@give_me_etc...
class A: ...

# Or
@give_me(attributes=True, parameters=True, orig_class=True)
class B:...

No, I described how it will work:

Thanks for the clarification - I did not understand that correctly.