Yes, it’s ready to be a PR. It would be great if you could make a PR and cc me for review there; I’d prefer to give it a more detailed review in the form of a PR.
@carljm if you intended me to request a review on GitHub, I don’t seem to be able to, so I’m just going to mention you here.
The PEP PR says:

> Immutable classes like :class:`fractions.Fraction` often do not define `__init__`; instead, they perform initialization in `__new__` or classmethods. The proposed feature won’t be useful to them.
The fact that this is incompatible with the proper way to implement immutable types at runtime is a major failing. Saying that the proposed feature “won’t be useful to them” hardly accounts for this: special-casing `__init__` for this is backwards. Immutable types should generally always use `__new__`, and subclasses of types that use `__new__` should also use `__new__`. Sometimes you can choose to use `__init__` instead, but in general it is not optional:
```python
class Point1(tuple):
    def __new__(cls, x=0, y=0):
        obj = super().__new__(cls, [x, y])
        return obj

class Point2(tuple):
    def __init__(self, x=0, y=0):
        super().__init__([x, y])

p1 = Point1()
print(p1)  # (0, 0)

p2 = Point2()  # Error
```
Note that `Point1` is exactly how `namedtuple` is implemented, and it uses `__new__` rather than `__init__` because it must. The reason `__init__` cannot be used here is precisely because tuple is immutable: the runtime won’t let you have an uninitialised tuple. If you replace `tuple` with `list` in the code above, then `__init__` must be used instead.
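A minimal sketch of the mutable counterpart (`Point3` is a name introduced here just for illustration):

```python
class Point3(list):
    def __init__(self, x=0, y=0):
        # list is mutable, so its contents can be set after creation;
        # list.__init__ replaces the (empty) contents in place
        super().__init__([x, y])

p3 = Point3()
print(p3)  # [0, 0]
```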
I think the argument that `__new__` is the correct place to initialize attributes that are actually read-only at runtime is strong, and it will be a problem if we don’t allow initialization of `ReadOnly` attributes in `__new__`.
I think this is not a problem; a type checker is free, in general, to disallow explicit calls to `__init__` on an existing instance, and mypy already does. (I would be fine with specifying that type checkers should do so, but it’s not an issue where standardization has a lot of benefit, IMO, so I’m also happy for this to be up to individual type checkers.)
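For example, this is the kind of call mypy already rejects (a sketch; the exact diagnostic wording may differ):

```python
class A:
    def __init__(self, x: int) -> None:
        self.x = x

a = A(1)
a.__init__(2)  # mypy: accessing "__init__" on an instance is unsound
```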
Here are the rules that I would propose. They allow a type checker to avoid the unsoundness @Eneg observed, while also allowing a different type checker to prioritize simplicity of implementation at the cost of some false negatives, if preferred:
- A type checker must allow initialization of read-only attributes on `self` in `__init__`.
- A type checker must allow initialization of read-only attributes in `__new__`, on instances of the class on which `__new__` is defined, if that instance was created in `__new__` via a call to a super-class `__new__` method.
- A type checker may choose to allow any initialization of read-only attributes on instances of the same class in `__new__`, without regard to the origin of the instance. (This may not be sound, if an instance of the class is handled in `__new__` that was not created there, but it is simpler to implement.)
- A type checker must error on any assignment to a read-only attribute that is not within `__new__`, `__init__`, or the class body.
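To make these rules concrete, here is a minimal sketch (assuming the proposed `ReadOnly` attribute qualifier is importable from `typing`; the comments mark how each assignment would be judged):

```python
from typing import ReadOnly  # assuming the proposed attribute qualifier

class Point:
    x: ReadOnly[int]

    def __new__(cls, x: int = 0) -> "Point":
        obj = super().__new__(cls)
        obj.x = x  # allowed: obj was created via a super-class __new__ call
        return obj

    def __init__(self, x: int = 0) -> None:
        self.x = x  # allowed: initialization on self in __init__

def move(p: Point) -> None:
    p.x = 1  # error: assignment outside __new__, __init__, or the class body
```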
These rules would not allow initialization of read-only attributes in classmethods other than `__new__`. Alternate classmethod constructors would need to defer to `__new__` or `__init__` for initializing read-only attributes.
I think it’s not possible to support other classmethods soundly without adding new explicit type annotation syntax to reflect whether an instance is fully initialized or not, and I think that would be too much complexity and weight for this feature.
See my example above for why it is necessary to assign in alternative (non-`__new__`) constructors.
Any proposal here should answer the question: how would this apply to `fractions.Fraction`?
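For context, the pattern at issue looks roughly like this (a heavily simplified sketch of CPython’s `fractions` module; recent versions use an internal helper along these lines):

```python
class Fraction:
    _numerator: int
    _denominator: int

    def __new__(cls, numerator=0, denominator=None):
        # public constructor: accepts many input forms, validates,
        # normalizes sign, and reduces to lowest terms (all elided here)
        ...

    @classmethod
    def _from_coprime_ints(cls, numerator, denominator):
        # internal fast path: trusts its inputs, skips validation, and
        # assigns the attributes directly on a bare instance
        obj = super().__new__(cls)
        obj._numerator = numerator
        obj._denominator = denominator
        return obj
```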
Thanks. I had re-read the thread, but hadn’t fully processed that aspect of the example.
I think in fact this comment I made was wrong:
I think it would be possible to slightly amend my proposal above such that any classmethod is allowed to mutate read-only attributes of an instance of its own type, but only if that instance was acquired directly from a call to `__new__`.
The thing that is lost here is the option for a type checker to choose a simpler implementation that doesn’t require tracking the provenance of instances in classmethods. In order to not emit false positives, such a type checker would have to allow all assignments to readonly attributes in all classmethods, which seems like it opens the door to a lot of potential false negatives.
Writing to read-only attributes should not be allowed outside of `__init__` and `__new__`. If it is necessary to construct a new instance in a classmethod, the read-only values should be passed to the constructor call, not assigned directly within the classmethod as is done in the `_new_raw` example above.
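In other words, the suggested refactor is roughly this (a sketch; `_new_raw` follows the naming from the earlier example, and the real validation logic is elided):

```python
class Fraction:
    _numerator: int
    _denominator: int

    def __new__(cls, numerator: int = 0, denominator: int = 1):
        obj = super().__new__(cls)
        # validation/normalization of the inputs would happen here
        obj._numerator = numerator
        obj._denominator = denominator
        return obj

    @classmethod
    def _new_raw(cls, numerator: int, denominator: int):
        # pass the read-only values through to the constructor call
        # rather than assigning the attributes here
        return cls(numerator, denominator)
```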
What @oscarbenjamin’s example shows is that the refactor you propose may not be compatible with having the constructor be “public”.
In languages that allow private and public constructors, of course, this is not a problem; you have a private constructor that accepts raw values, and public constructor(s) that internally call the private constructor.
But in Python, you can’t make the primary constructor (the one defined by `__new__` and `__init__`) private. So if you don’t want to allow external code to supply arbitrary, unvetted values for “raw” construction, you end up with something like what Fraction does, which requires setting those attributes in a private constructor that is not `__new__` or `__init__`.
I think my amended proposal can accommodate that pattern, safely. But it does require some additional complexity in the type checker.
The other approach is to say that all Python classes should behave more like a plain dataclass, and their default constructor should just accept raw attribute values, with other classmethod constructors maybe providing more friendly signatures, and documentation telling external clients which to prefer. This is a reasonable way to write Python code, but I think it’s quite opinionated (and doesn’t match existing reality) to say all classes should be written this way.
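A minimal sketch of that style (the names here are hypothetical, not from the thread):

```python
from dataclasses import dataclass
from math import gcd

@dataclass(frozen=True)
class Rational:
    # the default constructor accepts raw, already-normalized values
    numerator: int
    denominator: int

    @classmethod
    def new(cls, numerator: int, denominator: int = 1) -> "Rational":
        # friendlier classmethod constructor: validates and normalizes,
        # then defers to the raw default constructor
        if denominator == 0:
            raise ZeroDivisionError("denominator must be nonzero")
        g = gcd(numerator, denominator)
        if denominator < 0:
            g = -g
        return cls(numerator // g, denominator // g)
```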
It seems that I’ve been focused too much on pyright’s behavior and didn’t double-check with other type checkers, but the current Motivation > Protocols section is largely incorrect: mypy and pyre do support `ClassVar`s/descriptors (see this comment).
I’ve noticed that covariant overrides of read-only attributes open the door to a class of bugs. A subclass that overrides a read-only attribute with a narrower type may have it incorrectly initialized:
```python
class Base:
    foo: ReadOnly[int]

    def __init__(self, foo: int) -> None:
        self.foo = foo + 1  # True + 1 == 2

class Subtype(Base):
    foo: bool  # narrower type; lack of ReadOnly doesn't matter

    def __init__(self, foo: bool) -> None:
        super().__init__(foo)  # is a call to super always safe?
        assert isinstance(self.foo, bool)
```
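Running this makes the failure concrete (continuing the example above):

```python
Subtype(True)  # AssertionError: self.foo is 2 (an int), not a bool
```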
I wonder if what I’ve observed here is just an unsoundness on the `bool.__add__` side, or whether it could happen elsewhere.
While it was always a mistake to have `bool` as a subclass of `int`, I don’t think that there’s anything unsound about `bool.__add__` there.
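To spell out why (my illustration, not from the thread): `bool` does not override `__add__`; it inherits `int.__add__`, which is declared to return `int` and does exactly that.

```python
# bool inherits int.__add__, whose declared return type is int,
# and that is the type it actually returns
assert bool.__add__ is int.__add__
assert type(True + 1) is int
assert True + 1 == 2
```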
I’m not sure what the right way to spell this for a type checker would be, but there does need to be some way for a subclass to have a narrower type for an attribute than the superclass. My interest in this proposal for read-only attributes is entirely about this; it was initially motivated by SymPy, where the pattern is used extensively, like:
```python
from __future__ import annotations

from typing import Hashable, Self

class Basic:
    args: tuple[Basic, ...]

    def __new__(cls, *args: Basic) -> Self:
        if not all(isinstance(arg, Basic) for arg in args):
            raise TypeError
        obj = super().__new__(cls)
        obj.args = args
        return obj

class Atom(Basic):
    args: tuple[()]
    value: Hashable

class Expr(Basic):
    pass

class Boolean(Basic):
    pass

class Symbol(Expr, Atom):
    value: str

    def __new__(cls, value: str) -> Symbol:
        if not isinstance(value, str):
            raise TypeError
        obj = super().__new__(cls)
        obj.value = value
        return obj

class Integer(Expr, Atom):
    value: int

    def __new__(cls, value: int) -> Integer:
        if not isinstance(value, int):
            raise TypeError
        obj = super().__new__(cls)
        obj.value = value
        return obj

class And(Boolean):
    args: tuple[Boolean, ...]

    def __new__(cls, *args: Boolean) -> Boolean:
        if not all(isinstance(arg, Boolean) for arg in args):
            raise TypeError
        return super().__new__(cls, *args)

class Add(Expr):
    args: tuple[Expr, ...]

    def __new__(cls, *args: Expr) -> Expr:
        if not all(isinstance(arg, Expr) for arg in args):
            raise TypeError
        if len(args) == 0:
            return Integer(0)
        if len(args) == 1:
            return args[0]
        else:
            return super().__new__(cls, *args)
```
This code is rejected by mypy because it doesn’t handle `__new__` properly (same issue I linked above):
```
$ mypy v.py
v.py:48: error: Incompatible return type for "__new__" (returns "Boolean", but must return a subtype of "And")  [misc]
v.py:56: error: Incompatible return type for "__new__" (returns "Expr", but must return a subtype of "Add")  [misc]
Found 2 errors in 1 file (checked 1 source file)
```
While pyright does understand `__new__`, it instead rejects the code because `.args` and `.value` are mutable:
```
$ pyright v.py
/stuff/current/active/sympy/v.py
  /stuff/current/active/sympy/v.py:26:5 - error: "value" overrides symbol of same name in class "Atom"
    Variable is mutable so its type is invariant
    Override type "str" is not the same as base type "Hashable" (reportIncompatibleVariableOverride)
  /stuff/current/active/sympy/v.py:36:5 - error: "value" overrides symbol of same name in class "Atom"
    Variable is mutable so its type is invariant
    Override type "int" is not the same as base type "Hashable" (reportIncompatibleVariableOverride)
  /stuff/current/active/sympy/v.py:46:5 - error: "args" overrides symbol of same name in class "Basic"
    Variable is mutable so its type is invariant
    Override type "tuple[Boolean, ...]" is not the same as base type "tuple[Basic, ...]" (reportIncompatibleVariableOverride)
  /stuff/current/active/sympy/v.py:54:5 - error: "args" overrides symbol of same name in class "Basic"
    Variable is mutable so its type is invariant
    Override type "tuple[Expr, ...]" is not the same as base type "tuple[Basic, ...]" (reportIncompatibleVariableOverride)
4 errors, 0 warnings, 0 informations
```
Currently I don’t see a way to get pyright to accept this besides adding `type: ignore` in every class. The different types like `Expr`, `Boolean` etc. have different methods, so it is necessary to be able to distinguish the fact that the args of an `And` are `Boolean` whereas the args of an `Add` are `Expr`: every method in the `Add` class needs to call the `Expr` methods on its args. Apparently pyright’s model of the types can only accept this if those attributes are marked as read-only somehow.
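Presumably, with the proposed qualifier the intent would become expressible something like this (my sketch, assuming `ReadOnly` permits covariant overrides as the proposal intends):

```python
from typing import ReadOnly  # assuming the proposed attribute qualifier

class Basic:
    args: ReadOnly[tuple[Basic, ...]]

class Boolean(Basic):
    pass

class And(Boolean):
    # read-only attributes can be covariant, so narrowing the
    # element type in a subclass would be accepted
    args: ReadOnly[tuple[Boolean, ...]]
```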
The problem with this code, which a perfect, ideal type checker would report, is that the override of `foo` is not initialized, because there is no assignment of a `bool` to `self.foo` in `Subtype`. (This is an example of why making sure that attributes are initialized is a difficult problem.)
The `Base` class’s `foo: int` is initialized, but if we want to override `foo` to be `bool`, there needs to be an assignment of a `bool` to `self.foo` in `Subtype`. It’s the same problem in this code:
```python
class B:
    pass

class C(B):
    pass

class Base:
    foo: ReadOnly[B]

    def __init__(self) -> None:
        self.foo = B()

class Subtype(Base):
    foo: C  # narrower type

    def __init__(self) -> None:
        super().__init__()
        assert isinstance(self.foo, C)
```
There’s no initialization of the override `foo: C`. If you want to override something, you need an initialization of the override in the subclass.
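Continuing the example above, a version a checker could accept might look like this (my sketch; whether a second assignment after `super().__init__()` should be permitted for a `ReadOnly` attribute is itself a question for the proposal):

```python
class Subtype(Base):
    foo: C  # narrower type

    def __init__(self) -> None:
        super().__init__()  # Base assigns a B() to foo...
        self.foo = C()      # ...so the override re-initializes it with a C
        assert isinstance(self.foo, C)  # now holds at runtime
```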
@carljm I believe everything necessary has been filled out.
Is all that’s left now to wait for the merge and the PEP editors’ review?