Yes, it’s ready to be a PR. It would be great if you could make a PR and cc me for review there; I’d prefer to give it a more detailed review in the form of a PR.
@carljm if you intended me to request a review on GitHub, I don’t seem to be able to, so I’m just going to mention you here.
The PEP PR says:

> Immutable classes like :class:`fractions.Fraction` often do not define `__init__`; instead, they perform initialization in `__new__` or classmethods. The proposed feature won’t be useful to them.
The fact that this is incompatible with the proper way to implement immutable types at runtime is a major failing. Saying that the proposed feature “won’t be useful to them” hardly accounts for this: special-casing `__init__` for this is backwards. Immutable types should generally always use `__new__`, and subclasses of types that use `__new__` should also use `__new__`. Sometimes you can choose to use `__init__` instead, but in general it is not optional:
```python
class Point1(tuple):
    def __new__(cls, x=0, y=0):
        obj = super().__new__(cls, [x, y])
        return obj

class Point2(tuple):
    def __init__(self, x=0, y=0):
        super().__init__([x, y])

p1 = Point1()
print(p1)  # (0, 0)

p2 = Point2()  # Error
```
Note that `Point1` is exactly how `namedtuple` is implemented, and it uses `__new__` rather than `__init__` because it must. The reason `__init__` cannot be used here is precisely because tuple is immutable: the runtime won’t let you have an uninitialised tuple. If you replace `tuple` with `list` in the code above, then `__init__` must be used instead.
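A minimal sketch of the mutable counterpart (`Point3` is a name introduced here just for illustration):

```python
class Point3(list):
    def __init__(self, x=0, y=0):
        # list is mutable, so its contents can be set after creation;
        # list.__init__ replaces the (empty) contents in place
        super().__init__([x, y])

p3 = Point3()
print(p3)  # [0, 0]
```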
I think the argument that `__new__` is the correct place to initialize attributes that are actually read-only at runtime is strong, and it will be a problem if we don’t allow initialization of `ReadOnly` attributes in `__new__`.
I think this is not a problem; a type checker is free, in general, to disallow explicit calls to `__init__` on an existing instance, and mypy already does. (I would be fine with specifying that type checkers should do so, but it’s not an issue where standardization has a lot of benefit, IMO, so I’m also happy for this to be up to individual type checkers.)
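For example, this is the kind of call mypy already rejects (a sketch; the exact diagnostic wording may differ):

```python
class A:
    def __init__(self, x: int) -> None:
        self.x = x

a = A(1)
a.__init__(2)  # mypy: accessing "__init__" on an instance is unsound
```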
Here are the rules that I would propose. They allow a type checker to avoid the unsoundness @Eneg observed, while also allowing a different type checker to prioritize simplicity of implementation at the cost of some false negatives, if preferred:
- A type checker must allow initialization of read-only attributes on `self` in `__init__`.
- A type checker must allow initialization of read-only attributes in `__new__`, on instances of the class on which `__new__` is defined, if that instance was created in `__new__` via a call to a super-class `__new__` method.
- A type checker may choose to allow any initialization of read-only attributes on instances of the same class in `__new__`, without regard to the origin of the instance. (This may not be sound, if an instance of the class is handled in `__new__` that was not created there, but it is simpler to implement.)
- A type checker must error on any assignment to a read-only attribute that is not within `__new__`, `__init__`, or the class body.
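To make these rules concrete, here is a minimal sketch (assuming the proposed `ReadOnly` attribute qualifier is importable from `typing`; the comments mark how each assignment would be judged):

```python
from typing import ReadOnly  # assuming the proposed attribute qualifier

class Point:
    x: ReadOnly[int]

    def __new__(cls, x: int = 0) -> "Point":
        obj = super().__new__(cls)
        obj.x = x  # allowed: obj was created via a super-class __new__ call
        return obj

    def __init__(self, x: int = 0) -> None:
        self.x = x  # allowed: initialization on self in __init__

def move(p: Point) -> None:
    p.x = 1  # error: assignment outside __new__, __init__, or the class body
```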
These rules would not allow initialization of read-only attributes in classmethods other than `__new__`. Alternate classmethod constructors would need to defer to `__new__` or `__init__` for initializing read-only attributes.
I think it’s not possible to support other classmethods soundly without adding new explicit type annotation syntax to reflect whether an instance is fully initialized or not, and I think that would be too much complexity and weight for this feature.
See my example above for why it is necessary to assign in alternative (non-`__new__`) constructors.
Any proposal here should answer the question: how would this apply to `fractions.Fraction`?
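For context, the pattern at issue looks roughly like this (a heavily simplified sketch of CPython’s `fractions` module; recent versions use an internal helper along these lines):

```python
class Fraction:
    _numerator: int
    _denominator: int

    def __new__(cls, numerator=0, denominator=None):
        # public constructor: accepts many input forms, validates,
        # normalizes sign, and reduces to lowest terms (all elided here)
        ...

    @classmethod
    def _from_coprime_ints(cls, numerator, denominator):
        # internal fast path: trusts its inputs, skips validation, and
        # assigns the attributes directly on a bare instance
        obj = super().__new__(cls)
        obj._numerator = numerator
        obj._denominator = denominator
        return obj
```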
Thanks. I had re-read the thread, but hadn’t fully processed that aspect of the example.
I think in fact this comment I made was wrong:
I think it would be possible to slightly amend my proposal above such that any classmethod is allowed to mutate read-only attributes of an instance of its own type, but only if that instance was acquired directly from a call to `__new__`.
The thing that is lost here is the option for a type checker to choose a simpler implementation that doesn’t require tracking the provenance of instances in classmethods. In order to not emit false positives, such a type checker would have to allow all assignments to readonly attributes in all classmethods, which seems like it opens the door to a lot of potential false negatives.
Writing to read-only attributes should not be allowed outside of `__init__` and `__new__`. If it is necessary to construct a new instance in a classmethod, the read-only values should be passed to the constructor call, not assigned directly within the classmethod as is done in the `_new_raw` example above.
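In other words, the suggested refactor is roughly this (a sketch; `_new_raw` follows the naming from the earlier example, and the real validation logic is elided):

```python
class Fraction:
    _numerator: int
    _denominator: int

    def __new__(cls, numerator: int = 0, denominator: int = 1):
        obj = super().__new__(cls)
        # validation/normalization of the inputs would happen here
        obj._numerator = numerator
        obj._denominator = denominator
        return obj

    @classmethod
    def _new_raw(cls, numerator: int, denominator: int):
        # pass the read-only values through to the constructor call
        # rather than assigning the attributes here
        return cls(numerator, denominator)
```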
What @oscarbenjamin’s example shows is that the refactor you propose may not be compatible with having the constructor be “public”.
In languages that allow private and public constructors, of course, this is not a problem; you have a private constructor that accepts raw values, and public constructor(s) that internally call the private constructor.
But in Python, you can’t make the primary constructor (the one defined by `__new__` and `__init__`) private. So if you don’t want to allow external code to supply arbitrary, unvetted values for “raw” construction, you end up with something like what Fraction does, which requires setting those attributes in a private constructor that is not `__new__` or `__init__`.
I think my amended proposal can accommodate that pattern, safely. But it does require some additional complexity in the type checker.
The other approach is to say that all Python classes should behave more like a plain dataclass, and their default constructor should just accept raw attribute values, with other classmethod constructors maybe providing more friendly signatures, and documentation telling external clients which to prefer. This is a reasonable way to write Python code, but I think it’s quite opinionated (and doesn’t match existing reality) to say all classes should be written this way.
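A minimal sketch of that style (the names here are hypothetical, not from the thread):

```python
from dataclasses import dataclass
from math import gcd

@dataclass(frozen=True)
class Rational:
    # the default constructor accepts raw, already-normalized values
    numerator: int
    denominator: int

    @classmethod
    def new(cls, numerator: int, denominator: int = 1) -> "Rational":
        # friendlier classmethod constructor: validates and normalizes,
        # then defers to the raw default constructor
        if denominator == 0:
            raise ZeroDivisionError("denominator must be nonzero")
        g = gcd(numerator, denominator)
        if denominator < 0:
            g = -g
        return cls(numerator // g, denominator // g)
```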
It seems that I’ve been focused too much on pyright’s behavior and didn’t double-check with other type checkers, but the current Motivation > Protocols section is largely incorrect: mypy and pyre do support `ClassVar`s/descriptors (see this comment).
I’ve noticed that covariant overrides of read-only attributes open the door to a class of bugs. A subclass that overrides a read-only attribute with a narrower type may have it incorrectly initialized:
```python
class Base:
    foo: ReadOnly[int]

    def __init__(self, foo: int) -> None:
        self.foo = foo + 1  # True + 1 == 2

class Subtype(Base):
    foo: bool  # narrower type; lack of ReadOnly doesn't matter

    def __init__(self, foo: bool) -> None:
        super().__init__(foo)  # is a call to super always safe?
        assert isinstance(self.foo, bool)
```
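Running this makes the failure concrete (continuing the example above):

```python
Subtype(True)  # AssertionError: self.foo is 2 (an int), not a bool
```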
I wonder if what I’ve observed here is just an unsoundness on the `bool.__add__` side, or whether it could happen elsewhere.
While it was always a mistake to have `bool` as a subclass of `int`, I don’t think that there’s anything unsound about `bool.__add__` there.
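To spell out why (my illustration, not from the thread): `bool` does not override `__add__`; it inherits `int.__add__`, which is declared to return `int` and does exactly that.

```python
# bool inherits int.__add__, whose declared return type is int,
# and that is the type it actually returns
assert bool.__add__ is int.__add__
assert type(True + 1) is int
assert True + 1 == 2
```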
I’m not sure what the right way to spell this for a type checker would be, but there does need to be some way for a subclass to have a narrower type for an attribute than the superclass. My interest in this proposal for read-only attributes is entirely about this; it was initially motivated by SymPy, where the pattern is used extensively, like:
```python
from __future__ import annotations

from typing import Hashable, Self

class Basic:
    args: tuple[Basic, ...]

    def __new__(cls, *args: Basic) -> Self:
        if not all(isinstance(arg, Basic) for arg in args):
            raise TypeError
        obj = super().__new__(cls)
        obj.args = args
        return obj

class Atom(Basic):
    args: tuple[()]
    value: Hashable

class Expr(Basic):
    pass

class Boolean(Basic):
    pass

class Symbol(Expr, Atom):
    value: str

    def __new__(cls, value: str) -> Symbol:
        if not isinstance(value, str):
            raise TypeError
        obj = super().__new__(cls)
        obj.value = value
        return obj

class Integer(Expr, Atom):
    value: int

    def __new__(cls, value: int) -> Integer:
        if not isinstance(value, int):
            raise TypeError
        obj = super().__new__(cls)
        obj.value = value
        return obj

class And(Boolean):
    args: tuple[Boolean, ...]

    def __new__(cls, *args: Boolean) -> Boolean:
        if not all(isinstance(arg, Boolean) for arg in args):
            raise TypeError
        return super().__new__(cls, *args)

class Add(Expr):
    args: tuple[Expr, ...]

    def __new__(cls, *args: Expr) -> Expr:
        if not all(isinstance(arg, Expr) for arg in args):
            raise TypeError
        if len(args) == 0:
            return Integer(0)
        if len(args) == 1:
            return args[0]
        else:
            return super().__new__(cls, *args)
```
This code is rejected by mypy because it doesn’t handle `__new__` properly (same issue I linked above):
```
$ mypy v.py
v.py:48: error: Incompatible return type for "__new__" (returns "Boolean", but must return a subtype of "And")  [misc]
v.py:56: error: Incompatible return type for "__new__" (returns "Expr", but must return a subtype of "Add")  [misc]
Found 2 errors in 1 file (checked 1 source file)
```
While pyright does understand `__new__`, it instead rejects the code because `.args` and `.value` are mutable:
```
$ pyright v.py
/stuff/current/active/sympy/v.py
  /stuff/current/active/sympy/v.py:26:5 - error: "value" overrides symbol of same name in class "Atom"
    Variable is mutable so its type is invariant
    Override type "str" is not the same as base type "Hashable" (reportIncompatibleVariableOverride)
  /stuff/current/active/sympy/v.py:36:5 - error: "value" overrides symbol of same name in class "Atom"
    Variable is mutable so its type is invariant
    Override type "int" is not the same as base type "Hashable" (reportIncompatibleVariableOverride)
  /stuff/current/active/sympy/v.py:46:5 - error: "args" overrides symbol of same name in class "Basic"
    Variable is mutable so its type is invariant
    Override type "tuple[Boolean, ...]" is not the same as base type "tuple[Basic, ...]" (reportIncompatibleVariableOverride)
  /stuff/current/active/sympy/v.py:54:5 - error: "args" overrides symbol of same name in class "Basic"
    Variable is mutable so its type is invariant
    Override type "tuple[Expr, ...]" is not the same as base type "tuple[Basic, ...]" (reportIncompatibleVariableOverride)
4 errors, 0 warnings, 0 informations
```
Currently I don’t see a way to get pyright to accept this besides adding `type: ignore` in every class. The different types like `Expr`, `Boolean` etc. have different methods, so it is necessary to be able to distinguish the fact that the args of an `And` are `Boolean` whereas the args of an `Add` are `Expr`: every method in the `Add` class needs to call the `Expr` methods on its args. Apparently pyright’s model of the types can only accept this if those attributes are marked as read-only somehow.
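Presumably, with the proposed qualifier the intent would become expressible something like this (my sketch, assuming `ReadOnly` permits covariant overrides as the proposal intends):

```python
from typing import ReadOnly  # assuming the proposed attribute qualifier

class Basic:
    args: ReadOnly[tuple[Basic, ...]]

class Boolean(Basic):
    pass

class And(Boolean):
    # read-only attributes can be covariant, so narrowing the
    # element type in a subclass would be accepted
    args: ReadOnly[tuple[Boolean, ...]]
```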
The problem with this code, which a perfect, ideal type checker would report, is that the override of `foo` is not initialized, because there is no assignment of a `bool` to `self.foo` in `Subtype`. (This is an example of why making sure that attributes are initialized is a difficult problem.)
The `Base` class’s `foo: int` is initialized, but if we want to override `foo` to be `bool`, there needs to be an assignment of a `bool` to `self.foo` in `Subtype`. It’s the same problem in this code:
```python
class B:
    pass

class C(B):
    pass

class Base:
    foo: ReadOnly[B]

    def __init__(self) -> None:
        self.foo = B()

class Subtype(Base):
    foo: C  # narrower type

    def __init__(self) -> None:
        super().__init__()
        assert isinstance(self.foo, C)
```
There’s no initialization of the override `foo: C`. If you want to override something, you need an initialization of the override in the subclass.
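Continuing the example above, a version a checker could accept might look like this (my sketch; whether a second assignment after `super().__init__()` should be permitted for a `ReadOnly` attribute is itself a question for the proposal):

```python
class Subtype(Base):
    foo: C  # narrower type

    def __init__(self) -> None:
        super().__init__()  # Base assigns a B() to foo...
        self.foo = C()      # ...so the override re-initializes it with a C
        assert isinstance(self.foo, C)  # now holds at runtime
```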
@carljm I believe everything necessary has been filled out.
Is all that’s left now to wait for the merge and the PEP editors’ review?