Proposal: Enhancing Type Safety for `__set_name__` in Descriptors

junkmd · March 11, 2024, 3:32am

Abstract:
The current type system specification does not address the case where the owner class annotated on a descriptor's __set_name__ method differs from the class the descriptor is actually assigned to.
I propose a new static type constraint to address this problem.

Motivation:
A Python descriptor can implement __set_name__(self, owner, name, /).
This method is called when the owner class is created, and it receives the owner class and the attribute name as a string.
The arguments of the __set_name__ method can be statically typed like those of other methods.
However, under the current type system specification, a static type checker does not report a mismatch between the class in which the descriptor is assigned to an attribute and the type annotated for owner.
With this check, a descriptor could be reused across various classes in a type-safe manner.
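
As a minimal sketch of the runtime behavior described above (Python 3.6+), the following shows what __set_name__ receives. `RecordingDescriptor` and `Widget` are hypothetical names used only for illustration.

```python
class RecordingDescriptor:
    """Hypothetical descriptor that records what __set_name__ receives."""

    def __set_name__(self, owner: type, name: str, /) -> None:
        # Called once by type.__new__ after the owner class has been created.
        self.owner = owner
        self.name = name


class Widget:
    size = RecordingDescriptor()


# Read the descriptor from the class dict to bypass attribute lookup.
recorded = Widget.__dict__["size"]
print(recorded.owner is Widget, recorded.name)
```

The `owner` annotation here is plain `type`; the proposal is about what a checker should do when that annotation is narrower.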

Summary Examples:
Define abstract classes and/or protocols before defining the descriptor.
By annotating the second argument of __set_name__ with these abstract classes or protocols, we can constrain the classes to which the descriptor can be assigned.

from abc import ABC, abstractmethod
from typing import Protocol, Self


class MyProto(Protocol):
    def method1(self) -> str:
        ...


class MyAbstract(ABC):
    @abstractmethod
    def method2(self) -> int:
        ...


class Descriptor1:
    def __get__(self, instance, owner, /) -> Self | None:
        ...

    def __set_name__(self, owner: type[MyProto], name: str, /) -> None:
        ...


class Descriptor2:
    def __get__(self, instance, owner, /) -> Self | None:
        ...

    def __set_name__(self, owner: type[MyAbstract], name: str, /) -> None:
        ...


class ConcreteA(MyAbstract):
    def method1(self) -> str:
        return "hello from ConcreteA!"

    def method2(self) -> int:
        return 1234

    d1 = Descriptor1()  # OK
    d2 = Descriptor2()  # OK


class ConcreteB(MyAbstract):
    def method2(self) -> int:
        return 5678

    d1 = Descriptor1()  # Type checker error: `ConcreteB` is NOT a structural subtype of `MyProto`
    d2 = Descriptor2()  # OK


class ConcreteC:
    def method1(self) -> str:
        return "hello from ConcreteC!"

    d1 = Descriptor1()  # OK
    d2 = Descriptor2()  # Type checker error: `ConcreteC` is NOT a nominal subtype of `MyAbstract`


class ConcreteD:
    def method3(self) -> float:
        return 3.14

    d1 = Descriptor1()  # Type checker error: `ConcreteD` is NOT a structural subtype of `MyProto`
    d2 = Descriptor2()  # Type checker error: `ConcreteD` is NOT a nominal subtype of `MyAbstract`
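
Until a type checker performs such a check, a runtime guard can approximate the constraint. This is only a sketch and not part of the proposal: `CheckedDescriptor`, `Good`, `Bad`, and `define_bad_owner` are hypothetical names, and `MyProto` is redeclared here as `@runtime_checkable` so that the block is self-contained.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class MyProto(Protocol):
    def method1(self) -> str: ...


class CheckedDescriptor:
    """Hypothetical descriptor that validates its owner at class creation."""

    def __set_name__(self, owner: type, name: str, /) -> None:
        # issubclass() against a runtime-checkable protocol only checks that
        # the methods exist; it does not compare signatures.
        if not issubclass(owner, MyProto):
            raise TypeError(f"{owner.__name__} does not implement MyProto")
        self.name = name


class Good:
    def method1(self) -> str:
        return "ok"

    d = CheckedDescriptor()


def define_bad_owner() -> str:
    """Try to create an owner without method1 and report the error type."""
    try:
        class Bad:
            d = CheckedDescriptor()
    except (TypeError, RuntimeError) as exc:
        # Python < 3.12 wraps errors raised in __set_name__ in RuntimeError.
        return type(exc).__name__
    return "no error"


result = define_bad_owner()
```

Unlike the static check proposed above, this guard only fires when the module defining the class is imported.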

Any opinions are welcome. I hope this proposal will be beneficial for the Python community.

MegaIng · March 11, 2024, 3:37am

Is this something that needs to be taken care of in the typing standards somewhere? Shouldn’t this just be something type checkers can implement of their own accord?

junkmd · March 11, 2024, 9:41am

Indeed, unlike PEP 695 or PEP 604, this can be realized without changing the language specification.
Therefore, it could be implemented whenever the developers or maintainers of a type checker want to implement it.
However, there might be other developers like me who use both mypy and pyright and want this feature in both.
I have also seen discussions at the Typing Council along the lines of “the interpretation of xxx should be unified across type checkers” after each type checker had implemented its own interpretation of xxx.

I would like to know the community’s opinion on whether this feature should be implemented at the discretion of each type checker as part of the type ecosystem’s flexibility, or whether a standard specification should be agreed upon and adopted by the type checkers.

Thank you.

Daverball · March 11, 2024, 12:07pm

Ultimately this is part of Python’s object model, so it really doesn’t need to be part of the type specification, since the language and its object model already specify at which point __set_name__ gets called and with which arguments.

So I’d really consider it a type checker bug if they don’t invoke __set_name__ for class level assignments, as they already do for __get__ and __set__ for instance/class level access.

But I can also see why this is perhaps not high on the list of priorities for maintainers of type checkers, since doing anything more complex than using the name in a __set_name__ is generally frowned upon and you’d already get a type error if your __get__ was annotated correctly. So all you’re really changing is how soon you know about the error and it doesn’t help you inside stub files anyways, since you don’t know if there is an actual assignment at runtime, so you can’t emit the error there.

MegaIng · March 11, 2024, 2:22pm

In my understanding, this is generally done if different type checkers have different interpretations of how this should behave, i.e. if this is a point of divergence in what library authors need to annotate, for example. But I don’t think there is much to be discussed here. All it primarily needs is someone to do the work to actually implement this. You should ask the maintainers of the type checkers you care about directly, but I don’t think either mypy or pyright is going to be opposed to this if it is just a small addition.

Writing standards alone does not implement stuff in the type checkers. That still requires work done by someone, and in fact the same amount of work whether or not there is a standard to look at (assuming the expected behavior is obvious enough, which it is here). The fastest way to get this check to be added is for you to add it yourself.

Jelle · March 11, 2024, 2:47pm

I agree this doesn’t look like it requires a spec change. The spec also doesn’t say that type checkers should verify that in a for loop, the object being iterated over is iterable. That falls under the general idea that type checkers should model the runtime behavior and flag cases where they can see that something will fail. This example seems to fall under the same category.
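
To illustrate the for-loop analogy, here is a small sketch; `total` is a hypothetical function. Type checkers typically flag the loop without any spec rule requiring it, because they model the runtime failure shown below.

```python
def total(values: int) -> int:
    result = 0
    # A type checker is expected to flag this loop: "int" is not iterable.
    for v in values:
        result += v
    return result


try:
    total(3)
except TypeError as exc:
    # At runtime the same mistake surfaces as a TypeError.
    message = str(exc)
    print(message)
```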

I’d encourage you to contribute an implementation to type checkers that you are interested in.

erictraut · March 11, 2024, 3:08pm

This is a pretty obscure part of the object model. I’ve never seen it used in a code base.

Support for __set_name__ has been requested in the mypy issue tracker, but in the past four years it has received only two upvotes (thumbs-up reactions). And no one has ever requested this support in the pyright issue tracker.

junkmd · March 12, 2024, 9:17am

This has helped my understanding.
Implementations using for loops are clearly more numerous than those defining custom descriptors.
If even such checks are left to each type checker without being written into the static type specification, I fully understand that nothing needs to be added to or changed in the specification for __set_name__.

Thank you.

junkmd · March 12, 2024, 9:28am

Thank you everyone.

I made this proposal because there was a descriptor in my project that I wanted to reuse in a type-safe manner.
Once the project I’m currently working on settles down, I’m considering proposing/contributing to the type checker I use.
Also, if there is someone who can implement this feature in any type checker on my behalf, I would be happy to cooperate.

Daverball · March 12, 2024, 9:40am

By the way, just in case this got lost along the way: If your descriptor is a true descriptor and uses at least one of __get__ /__set__/__delete__ you can currently annotate the owner/instance parameter for those methods with your Protocol. You will then get an error if you try to access/set/delete the descriptor on an object that doesn’t satisfy the Protocol.

So you get almost the same level of safety this way [1].

[1] And in some cases actually more safety, since __set_name__ doesn’t take into account potentially unsafe subclassing.
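
A minimal sketch of this approach, assuming hypothetical names `HasMethod1`, `Greeting`, `Ok`, and `Missing`: the instance parameter of __get__ is annotated with a Protocol, so a checker can reject access on objects that don't satisfy it.

```python
from typing import Any, Protocol


class HasMethod1(Protocol):
    def method1(self) -> str: ...


class Greeting:
    """Hypothetical descriptor usable only on objects that have method1()."""

    def __get__(self, instance: HasMethod1, owner: Any = None, /) -> str:
        # For brevity this sketch does not handle class-level access,
        # where instance would be None.
        return instance.method1().upper()


class Ok:
    greeting = Greeting()

    def method1(self) -> str:
        return "hello"


class Missing:
    greeting = Greeting()  # nothing is reported at this assignment


print(Ok().greeting)
# Missing().greeting  # expected to be rejected by a checker; AttributeError at runtime
```

The difference from the proposal is where the error appears: at each attribute access rather than at the class-level assignment.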

junkmd · March 12, 2024, 10:01am

I understand that such errors can be raised at runtime in those situations.
I made this proposal thinking that if the type checker could flag such code through static analysis before execution, it could efficiently prevent bugs.

Thank you.

junkmd · March 12, 2024, 10:18am

I misunderstood your post as referring to runtime behavior and wrote my reply based on that misunderstanding. Let me correct and add to that.

Even in static analysis, if the owner/instance is not a subtype of the protocol when __get__ is called, an error is indeed reported; in pyright, for example, it is a reportAttributeAccessIssue.

However, I thought that it would be more useful if it could be discovered earlier, at the point where __set_name__ is hooked.

Thank you.

Melendowski · March 16, 2024, 1:52pm

The one code base I thought might use __set_name__, since the library relies heavily on descriptors, is param, but they seem to sidestep that hook completely and obtain the owner and name through a different mechanism.

github.com/holoviz/param/blob/main/param/parameterized.py#L1569

          def _validate(self, val):
              """Implements validation for the parameter value and attributes"""
              self._validate_value(val, self.allow_None)
          
          def _post_setter(self, obj, val):
              """Called after the parameter value has been validated and set"""
          
          def __delete__(self,obj):
              raise TypeError("Cannot delete '%s': Parameters deletion not allowed." % self.name)
          
          def _set_names(self, attrib_name):
              if None not in (self.owner, self.name) and attrib_name != self.name:
                  raise AttributeError('The {} parameter {!r} has already been '
                                       'assigned a name by the {} class, '
                                       'could not assign new name {!r}. Parameters '
                                       'may not be shared by multiple classes; '
                                       'ensure that you create a new parameter '
                                       'instance for each new class.'.format(type(self).__name__, self.name,
                                          self.owner.name, attrib_name))
              self.name = attrib_name

github.com/holoviz/param/blob/main/param/parameterized.py#L3414

              """
              if not docstring_describe_params or not param_pager:
                  return
              class_docstr = mcs.__doc__ if mcs.__doc__ else ''
              description = param_pager(mcs)
              mcs.__doc__ = class_docstr + '\n' + description
          
          def _initialize_parameter(mcs, param_name, param):
              # A Parameter has no way to find out the name a
              # Parameterized class has for it
              param._set_names(param_name)
              mcs.__param_inheritance(param_name, param)
          
          # Should use the official Python 2.6+ abstract base classes; see
          # https://github.com/holoviz/param/issues/84
          def __is_abstract(mcs):
              """
              Return True if the class has an attribute __abstract set to True.
              Subclasses will return False unless they themselves have
              __abstract set to true.  This mechanism allows a class to
              declare itself to be abstract (e.g. to avoid it being offered

github.com/holoviz/param/blob/main/param/parameterized.py#L3336

          mcs._param__private = _param__private
          mcs.__set_name(name, dict_)
          mcs._param__parameters = Parameters(mcs)
          
          # All objects (with their names) of type Parameter that are
          # defined in this class
          parameters = [(n, o) for (n, o) in dict_.items()
                        if isinstance(o, Parameter)]
          
          for param_name,param in parameters:
              mcs._initialize_parameter(param_name, param)
          
          # retrieve depends info from methods and store more conveniently
          dependers = [(n, m, m._dinfo) for (n, m) in dict_.items()
                       if hasattr(m, '_dinfo')]
          
          # Resolve dependencies of current class
          _watch = []
          for name, method, dinfo in dependers:
              watch = dinfo.get('watch', False)
              on_init = dinfo.get('on_init', False)

They do have this note:

github.com/holoviz/param/blob/main/param/parameterized.py#L3544

          Note that instantiate is handled differently: if there is a
          parameter with the same name in one of the superclasses with
          instantiate set to True, this parameter will inherit
          instantiate=True.
          """
          # get all relevant slots (i.e. slots defined in all
          # superclasses of this parameter)
          p_type = type(param)
          slots = dict.fromkeys(p_type._all_slots_)
          
          # note for some eventual future: python 3.6+ descriptors grew
          # __set_name__, which could replace this and _set_names
          setattr(param, 'owner', mcs)
          del slots['owner']
          
          # backwards compatibility (see Composite parameter)
          if 'objtype' in slots:
              setattr(param, 'objtype', mcs)
              del slots['objtype']
          
          supers = classlist(mcs)[::-1]
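
The contrast between the two mechanisms can be sketched roughly as follows. This is not param's code: `NamedByMeta`, `NamingMeta`, `NamedByHook`, `A`, and `B` are hypothetical names, and the metaclass is a simplified stand-in for the approach in the excerpts above.

```python
class NamedByMeta:
    """Hypothetical descriptor whose owner and name are set by a metaclass."""

    def __init__(self) -> None:
        self.owner = None
        self.name = None


class NamingMeta(type):
    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        # Scan the class body and tell each descriptor its owner and name.
        for attr, value in namespace.items():
            if isinstance(value, NamedByMeta):
                value.owner, value.name = cls, attr


class NamedByHook:
    """Hypothetical descriptor that relies on the __set_name__ hook."""

    def __set_name__(self, owner, name, /):
        self.owner, self.name = owner, name


class A(metaclass=NamingMeta):
    x = NamedByMeta()


class B:
    y = NamedByHook()


print(A.__dict__["x"].name, B.__dict__["y"].name)
```

Both descriptors end up knowing their owner and name, but only the second goes through the hook that the proposed check would inspect.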