Need a way to type-hint attributes that is compatible with duck-typing

randolf-scholz · December 14, 2023, 8:49pm

Issue: There are many ways to realize attributes that are mutually incompatible.

When annotating an attribute foo: Foo in a class or protocol, often all I care about is that getattr(obj, "foo") succeeds and produces an instance of Foo. Python offers several ways to achieve this:

(1) regular instance attributes
(2) class attributes (ClassVar)
(3) @property
(4) @cached_property
(5) custom descriptors

These are not interchangeable when using type hints. But there should be a way to type hint foo in a parent class that is compatible with the duck-typing assumption isinstance(getattr(obj, "foo"), Foo).
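
To make the duck-typing condition concrete, here is a minimal runtime sketch. All class names (`Foo`, `InstanceAttr`, `ClassAttr`, `Prop`, `CachedProp`, `FooDescriptor`, `Descr`) are hypothetical and chosen only for illustration. The point is that all five realizations pass the same `getattr` check, even though their annotations are not interchangeable for type checkers.

```python
from functools import cached_property
from typing import ClassVar


class Foo:
    """Placeholder value type (hypothetical)."""


class InstanceAttr:
    def __init__(self) -> None:
        self.foo = Foo()  # (1) regular instance attribute


class ClassAttr:
    foo: ClassVar[Foo] = Foo()  # (2) class attribute


class Prop:
    @property
    def foo(self) -> Foo:  # (3) property
        return Foo()


class CachedProp:
    @cached_property
    def foo(self) -> Foo:  # (4) cached_property
        return Foo()


class FooDescriptor:
    """(5) Custom descriptor that produces a Foo on instance access."""

    def __get__(self, instance, owner=None):
        return self if instance is None else Foo()


class Descr:
    foo = FooDescriptor()


candidates = [InstanceAttr(), ClassAttr(), Prop(), CachedProp(), Descr()]
# Every realization satisfies the duck-typing assumption at runtime.
results = [isinstance(getattr(obj, "foo"), Foo) for obj in candidates]
```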

See also: How to do the most fundamental duck-typing with `Protocols`? · python/typing · Discussion #1525 · GitHub

Some ideas

EDIT: In principle, it may be appropriate to define a whole range of type qualifiers Readable, ReadOnly, Writeable, WriteOnly and Mutable, each with a very concrete meaning. For details, please see this comment: Need a way to type-hint attributes that is compatible with duck-typing - #19 by randolf-scholz

Introduce a special form Attribute (and maybe also MutableAttribute) so that
```
class A:
    foo: Attribute[Foo]
```
is compatible with (1)-(5) above. (Or alternatively, make foo:Foo the generic case and foo: Attribute[Foo] the case that is specifically satisfied by a regular instance attribute only.)

Use overloading of __getattr__ with Literals (ugly imo)

class A:
    @overload
    def __getattr__(self, key: Literal["foo"]) -> Foo: ...

NeilGirdhar · December 15, 2023, 9:33am

This is a good question.

Could we extend protocol to support it?

class X(Protocol):
  foo: Foo

class A(X):
  ...

So protocols could make claims about methods that exist as well as attributes?

randolf-scholz · December 15, 2023, 10:10am

Well, this is already supported, but PEP 544 makes it clear that foo: Foo in a Protocol does specify a mutable instance attribute:

To distinguish between protocol class variables and protocol instance variables, the special ClassVar annotation should be used as specified by PEP 526. By default, protocol variables as defined above are considered readable and writable. To define a read-only protocol variable, one can use an (abstract) property.

Consequently, type checkers will flag it if a subclass implements it otherwise. Here is a compatibility chart (mypy 1.7.1 / pyright 1.1.339):

subclass\parent	attribute	classvar	property	cached_property
attribute	,	,	,	,
classvar	,	,	,	,
property	,	,	,	,
cached_property	,	,	,	,

EDIT: mypy results table was accidentally transposed… (mypy-playground)

ajoino · December 15, 2023, 10:58am

Edit: The table was updated and the text below reflects a previous version.

Considering this example

class Bar(Protocol):
    foo: Foo

class AttrBar:
    def __init__(self, foo):
        self.foo = foo

class PropBar:
    def __init__(self, foo):
        self._foo = foo

    @property
    def foo(self):
        return self._foo

It makes sense to me that both AttrBar and PropBar duck-type as a Bar, so to me pyright’s behaviour in column ‘attribute’ is weird. I guess you can say that because you can’t set PropBar.foo it’s not compatible, but that seems to be orthogonal in a sense? I’m not very good at type theory, so it would be nice if someone could explain in detail how attributes and properties differ for a type checker.
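
At runtime the two classes are indistinguishable from the protocol's point of view. As a small sketch (the placeholder class `Foo` and the added `@runtime_checkable` decorator are assumptions, not part of the example above), `isinstance` against a runtime-checkable protocol only tests that the member is present, so both classes pass. The disagreement is purely about static checking.

```python
from typing import Protocol, runtime_checkable


class Foo:
    """Placeholder value type (hypothetical)."""


@runtime_checkable
class Bar(Protocol):
    foo: Foo


class AttrBar:
    def __init__(self, foo: Foo) -> None:
        self.foo = foo


class PropBar:
    def __init__(self, foo: Foo) -> None:
        self._foo = foo

    @property
    def foo(self) -> Foo:
        return self._foo


# The runtime check looks only at the presence of `foo`,
# not at its type or at whether it can be assigned.
attr_ok = isinstance(AttrBar(Foo()), Bar)
prop_ok = isinstance(PropBar(Foo()), Bar)
```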

pf_moore · December 15, 2023, 11:18am

Yeah, it’s assignability.

The problem is that “readonly” isn’t exactly part of the type - the value isn’t “readonly”, it’s the place the value is stored that has that property. So yes, it’s somewhat orthogonal. But (as far as I know - I’m far from an expert) Python’s type system can’t express the idea that an attribute (or indeed any “location”) can be readonly - leading to this sort of dilemma where you either have to be unnecessarily precise, or you can’t express what you really mean (which is “I will never write to this and the value in it has type Foo”).

randolf-scholz · December 15, 2023, 11:26am

Just to make this clear: this proposal is not about the ability to specify read-only variables. It is about the ability to write type hints that are agnostic about the writeability of a variable.

Daverball · December 15, 2023, 11:28am

I think this is actually intended behavior (save for maybe cached_property). property in a Protocol sort of already means “I only care about being able to read this attribute”, that’s why it’s compatible with both a regular attribute and a ClassVar downstream, since either of those will still be readable. At least as long as you don’t also define a setter in the Protocol. I’m not really sure how it behaves then.

What is a lot more frustrating to me personally is how this interacts with custom descriptors, so while this workaround for the lack of having a ReadOnly in Protocol works for some cases, it usually does not work for custom descriptors, which is where I’d really like to be able to use it.

Take for example a SQLAlchemy model vs. a NamedTuple or a dataclass. There’s no way to write a Protocol that will accept both a Mapped[T] and T when all you care about is getting a T when accessing that attribute on an instance.
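
To illustrate the descriptor problem without depending on SQLAlchemy, here is a sketch with a hypothetical generic descriptor `Column[T]` that stands in for a descriptor-based attribute such as `Mapped[T]` (it is not SQLAlchemy's API). `Model`, `Record` and `read_id` are likewise made up. Instance access yields an `int` in both classes, yet the declared attribute types differ (`Column[int]` vs `int`), which is the mismatch described above.

```python
from dataclasses import dataclass
from typing import Any, Generic, TypeVar, overload

T = TypeVar("T")


class Column(Generic[T]):
    """Hypothetical descriptor: class access returns the descriptor itself,
    instance access returns the stored value of type T."""

    def __set_name__(self, owner: type, name: str) -> None:
        self._name = "_" + name

    @overload
    def __get__(self, instance: None, owner: type) -> "Column[T]": ...
    @overload
    def __get__(self, instance: object, owner: type) -> T: ...
    def __get__(self, instance: Any, owner: type) -> Any:
        if instance is None:
            return self
        return getattr(instance, self._name)

    def __set__(self, instance: object, value: T) -> None:
        setattr(instance, self._name, value)


class Model:
    id: "Column[int]" = Column()

    def __init__(self, id: int) -> None:
        self.id = id  # goes through Column.__set__


@dataclass
class Record:
    id: int


def read_id(obj: Any) -> int:
    # Both objects yield an int here; only the declared types differ.
    return obj.id
```

`read_id` has to fall back to `Any` for its parameter, which is exactly the loss of precision the thread is trying to avoid.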

randolf-scholz · December 15, 2023, 11:48am

I double-checked the results, and it appears the mypy-portion was transposed, my bad. (mypy-playground)

For pyright’s rationale for disallowing overriding a property with a regular attribute, see: False(?) rejection of overriding abstract property with literal · Issue #5564 · microsoft/pyright · GitHub. AFAIK there is only one (really dirty) workaround: Abstract class properties report inconsistent typing · Issue #2601 · microsoft/pyright · GitHub

pf_moore · December 15, 2023, 12:06pm

Good point. There’s two distinct cases here:

(1) A protocol that says you’re only allowed to read from the property, but classes are considered to support the protocol even if they declare a setter for the property, and
(2) A protocol that classes only satisfy if they prohibit writing.

I was thinking of (1), which is (I think) what you mean by agnostic, rather than (2). But “readonly” may not be the best way of describing it, I agree. I hadn’t appreciated that your question was specifically about how type checkers decide if a type satisfies the protocol.

Or am I still misunderstanding, and there’s something apart from the question of whether a class satisfies a given protocol that matters here?

randolf-scholz · December 15, 2023, 12:16pm

That’s it. The only detail that still matters is the distinction between what happens on the type and what happens on the instance. pyright rejects overriding a property with an attribute, because when querying the type, they will return different things. For instance,

class A(Protocol):
    @property
    def foo(self) -> int: return 42

actually makes 2 promises: if isinstance(obj, A) then isinstance(obj.foo, int) and if issubclass(typ, A) then isinstance(typ.foo, property).
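
The difference between the two promises can be observed at runtime. In this sketch the class names `WithProperty` and `WithAttribute` are hypothetical. Both satisfy the first promise, but only the property-based class satisfies the second.

```python
class WithProperty:
    @property
    def foo(self) -> int:
        return 42


class WithAttribute:
    def __init__(self) -> None:
        self.foo = 42


# Promise 1: instance access yields an int in both cases.
instance_values = (WithProperty().foo, WithAttribute().foo)

# Promise 2: class access yields the property object only for WithProperty.
class_access_is_property = isinstance(WithProperty.foo, property)
# The plain instance attribute does not exist on the class at all.
attribute_on_class = hasattr(WithAttribute, "foo")
```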

What I want is the ability to write a Protocol-Class HasFoo that captures the structural type that encodes the set of all runtime values which satisfy the condition:

isinstance(obj, HasFoo) if and only if hasattr(obj, "foo") and isinstance(obj.foo, Foo)

Without any additional assumptions about the writeability of foo or what happens when trying to access foo on a type instead of an instance.

pf_moore · December 15, 2023, 12:24pm

OK, I see what you mean now, thanks.

Technically that hasattr check can’t be evaluated statically (you could define a __getattr__ that returned a foo attribute only on a Tuesday…) but I think it should be possible to come up with a check that is possible to handle statically which is close enough for all practical purposes.
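
A small sketch of why the check cannot be decided statically in general. Instead of the weekday, the hypothetical class `Sometimes` uses a flag (`enabled`, also an assumption) so that the behaviour is deterministic. The same object answers `hasattr` differently depending on its state.

```python
class Sometimes:
    """Hypothetical class whose `foo` exists only while `enabled` is true."""

    def __init__(self, enabled: bool) -> None:
        self.enabled = enabled

    def __getattr__(self, name: str) -> int:
        # Called only when normal attribute lookup fails.
        if name == "foo" and self.enabled:
            return 42
        raise AttributeError(name)


obj = Sometimes(enabled=True)
before = hasattr(obj, "foo")
obj.enabled = False
after = hasattr(obj, "foo")
```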

Daverball · December 15, 2023, 12:35pm

For regular classes I would consider it correct ^[1]. Protocol is a bit different in my mind, though, since it only needs to be structurally compatible, and IIRC the mypy docs explicitly mention the use of property in a Protocol as a stand-in for a read-only marker, so I don’t think the argument holds as long as you actually special-case property in Protocol. ^[2]

But there’s other reasons why we need something like a Readable anyways, e.g. to accurately represent a whole bunch of types defined using the C-API, since there it’s possible to have actual read-only attributes, that aren’t properties. So the current workaround of annotating those attributes as property isn’t fully type safe.

In ABCs it could be useful as well, if you want to be able to be more loose and change the contract with what subclasses have to implement to be considered compatible.

^[1] And mypy does incorrectly report compatibility there too.
^[2] Although this means you give up the ability to enforce the use of property through a Protocol, but that seems like a way less common use-case.

a-reich · December 15, 2023, 1:21pm

I think this seems right except FWIW there are a couple places where the type system does recognize something at least similar to this distinction. There’s the Final qualifier, which isn’t about the value of a variable but says that you can’t store something in the same name later, and similarly for @final on methods/classes. There’s also a still-under-discussion PEP to mark keys of a TypedDict as readonly, but TDs are a special case.

It’s arguably a little bit weird to have these “type qualifiers” or whatever you want to call them that aren’t describing the type of a variable/value, including other qualifiers like ClassVar or @deprecated etc., so maybe we don’t want to add tons of them for every possible situation, but it’s been done and type checkers have implemented the logic to understand those things.

grievejia · December 20, 2023, 1:16am

I think it’s easy to achieve what you want if you don’t insist that foo must be an attribute. Just make it a zero-arg method, e.g.

class A(Protocol):
  def foo(self) -> int: ...

Methods are “immutable” by nature so there’s no assignability issues. As long as a class defines a 0-arg int-returning foo() method, it will be considered an instance of A. The only thing that’s lost is that you’ll need to access the data via obj.foo() instead of obj.foo which, IMHO, is a very minor syntactical annoyance.
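
A minimal sketch of this method-based approach. The names `HasFooMethod`, `Computed`, `Stored`, `StoredAdapter` and `double_foo` are hypothetical. It also shows the cost: a class that already exposes `foo` as a plain attribute needs a wrapper before it fits the protocol.

```python
from typing import Protocol


class HasFooMethod(Protocol):
    def foo(self) -> int: ...


class Computed:
    def foo(self) -> int:
        return 42


class Stored:
    """Hypothetical existing class that exposes `foo` as a plain attribute."""

    def __init__(self, foo: int) -> None:
        self.foo = foo


class StoredAdapter:
    """Wraps an attribute-based object so that it offers the method form."""

    def __init__(self, inner: Stored) -> None:
        self._inner = inner

    def foo(self) -> int:
        return self._inner.foo


def double_foo(obj: HasFooMethod) -> int:
    return 2 * obj.foo()
```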

Daverball · December 20, 2023, 7:36am

That’s fine as a workaround if you are designing a new API from scratch, but often you either need to stay backwards-compatible or want to be compatible with multiple dataclass-like objects from various other libraries, where you don’t control how they look: they may use regular attributes, but they may also use some custom descriptor. You have no easy way to be compatible with all of them, even though it should be really easy if you only care about an attribute being readable and containing a certain type.

alicederyn · December 20, 2023, 9:01am

I am of the opinion that PEP 705’s ReadOnly annotation should be extended to support this use case. It’s not in the current PEP only to keep things in standalone pieces.

Daverball · December 20, 2023, 10:13am

There’s a subtle but significant difference between ReadOnly and Readable. ReadOnly to me says the implementation is not allowed to make this attribute writeable, while Readable leaves that option open. We also have ReadableBuffer vs ReadOnlyBuffer.
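
Neither qualifier exists for attributes at the time of this thread, so the following is only a runtime sketch of the distinction, using hypothetical classes `Fixed` and `Adjustable` and a hypothetical helper `try_write`. Under the reading above, a `Readable[int]` requirement would accept both classes, while a `ReadOnly[int]` requirement would accept only `Fixed`.

```python
class Fixed:
    """`foo` can be read but not written (property without a setter)."""

    def __init__(self) -> None:
        self._foo = 1

    @property
    def foo(self) -> int:
        return self._foo


class Adjustable:
    """`foo` can be read and written (plain instance attribute)."""

    def __init__(self) -> None:
        self.foo = 1


def try_write(obj: object) -> bool:
    """Return True if assigning to `obj.foo` succeeds at runtime."""
    try:
        obj.foo = 2  # type: ignore[attr-defined]
    except AttributeError:
        return False
    return True


# Both classes are readable; only Adjustable is writeable.
readable = [obj.foo == 1 for obj in (Fixed(), Adjustable())]
writable = [try_write(Fixed()), try_write(Adjustable())]
```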

A rarer use case, but one that may still matter for a Protocol that needs to be contravariant, would be a Writeable modifier.

randolf-scholz · December 20, 2023, 10:14am

What you are suggesting is the opposite of what’s desired here. I want to be able to write Protocols that are flexible enough to not care about implementation details (like attribute vs property). This is crucial in order to be able to type hint generic functions that can interact with classes from different libraries, without having to write tons of overloads. For example, I may have a protocol like

class SupportsShape(Protocol):
    shape: tuple[int, ...]  # note: usually not writeable.

That can be used for numpy.ndarray / pandas.DataFrame / torch.Tensor, etc. If some library decides to implement shape as a property, this Protocol suddenly doesn’t match anymore.
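
For comparison, here is a sketch of the property-based workaround discussed earlier in the thread, applied to this protocol. `AttrArray`, `PropArray` and `ndim` are hypothetical stand-ins, not the real numpy, pandas or torch classes. According to the earlier replies, type checkers accept both a plain attribute and a property for a property-declared protocol member, but this does not extend to custom descriptors.

```python
from typing import Protocol


class SupportsShape(Protocol):
    @property
    def shape(self) -> tuple[int, ...]: ...


class AttrArray:
    """Hypothetical container storing `shape` as a plain attribute."""

    def __init__(self, shape: tuple[int, ...]) -> None:
        self.shape = shape


class PropArray:
    """Hypothetical container exposing `shape` as a read-only property."""

    def __init__(self, rows: int, cols: int) -> None:
        self._rows, self._cols = rows, cols

    @property
    def shape(self) -> tuple[int, ...]:
        return (self._rows, self._cols)


def ndim(x: SupportsShape) -> int:
    # Only reads `shape`, so writeability is irrelevant here.
    return len(x.shape)
```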

randolf-scholz · December 20, 2023, 10:25am

One way to think about it is that these modifiers can be translated into knowledge about the class’s __getattr__ and __setattr__ methods (*):

Note: I abbreviate Literal["foo"] → "foo", otherwise the table gets too wide.

Modifier	`self.__getattr__`	`self.__setattr__`
`foo: Readable[T]`	`(name: "foo") -> T`	—
`foo: ReadOnly[T]`	`(name: "foo") -> T`	`(name: "foo", val: Never) -> None`
`foo: Writeable[T]`	—	`(name: "foo", val: T) -> None`
`foo: WriteOnly[T]`	`(name: "foo") -> Never`	`(name: "foo", val: T) -> None`
`foo: Mutable[T]`	`(name: "foo") -> T`	`(name: "foo", val: T) -> None`

(*) If Never is interpreted as the true, uninhabitable bottom type (uninhabitable means that no instances can exist, i.e. calling __setattr__ with Literal["foo"] and T is equivalent to raising an exception). I have since learned that Never is unfortunately not considered uninhabitable by Python’s type checkers, so possibly there needs to be another PEP to introduce a true bottom type that is uninhabitable.

Example of applying these principles

class A:
    foo: Readable[int]
    bar: ReadOnly[bool]
    baz: Mutable[str]

From a type-theory POV, this should be translatable to

class A:
    @overload
    def __getattr__(self, name: Literal["foo"]) -> int: ...
    @overload
    def __getattr__(self, name: Literal["bar"]) -> bool: ...
    @overload
    def __getattr__(self, name: Literal["baz"]) -> str: ...


    @overload
    def __setattr__(self, name: Literal["bar"], value: bool) -> Never: ...
    @overload
    def __setattr__(self, name: Literal["baz"], value: str) -> None: ...

EDIT: For ReadOnly, the __setattr__ might actually be better represented by (name: "foo", val: Never) -> None than (name: "foo", val: T) -> Never. This still prevents calling obj.foo = ..., but at the same time allows contravariant overriding, so that a subclass could replace a ReadOnly variable with a Mutable variable.
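
None of these qualifiers exist in `typing` for attributes, so the translation cannot be checked statically today. As a rough runtime analogue of the `__setattr__` column only, the sketch below uses a hypothetical `_read_only` set (an assumption, not part of the proposal) to reject writes to `bar` while allowing writes to `baz`. `foo` stands for the Readable case, where nothing is promised about writing.

```python
class A:
    """Hypothetical runtime emulation of the proposed qualifiers."""

    _read_only = frozenset({"bar"})  # names treated like ReadOnly[...]

    def __init__(self) -> None:
        # Bypass our own __setattr__ during initialization.
        object.__setattr__(self, "foo", 1)
        object.__setattr__(self, "bar", True)
        object.__setattr__(self, "baz", "x")

    def __setattr__(self, name: str, value: object) -> None:
        if name in self._read_only:
            raise AttributeError(f"{name!r} is read-only")
        object.__setattr__(self, name, value)


a = A()
a.baz = "y"  # Mutable: assignment succeeds.
try:
    a.bar = False  # ReadOnly: assignment is rejected.
    bar_rejected = False
except AttributeError:
    bar_rejected = True
```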

randolf-scholz · December 20, 2023, 11:07am

@alicederyn I wonder if this can be somehow combined with PEP 705. The only essential difference is that you want to apply these constraints to __getitem__ and __setitem__ rather than __getattr__ and __setattr__. This could be special-cased for TypedDict. I guess the way it works is that metaclasses can decide what they want to do with these annotations, so for mapping-like containers they can translate them into constraints on __getitem__ and __setitem__ instead.

I wonder if there is a possibility for something similar to dataclass_transform that allows this to be a general concept, so that type checkers do not need to special-case TypedDict as much.