Expanding `ReadOnly` to normal classes & protocols

beauxq · October 18, 2024, 6:22pm

That sounds ok.

Making sure that something is initialized is a hard problem. TypeScript has been trying to do it for a long time, and its system still has some significant holes. Python has not really been trying to do it.

TypeScript at least has an annotation to assert that something is initialized when the type checker doesn’t see that it is. (“!”) It seems better if Python doesn’t try to do it until it has something like that.

Eneg · October 18, 2024, 6:30pm

I wouldn’t say they disagree. Fwiw I haven’t exactly said that an instance variable of a metaclass literally becomes a ClassVar of a class. From pyright’s docs it follows that becoming a ClassVar would imply a “pure class variable”, whereas the existing behavior implies a “normal class variable”. I don’t think this conflicts with what I’ve said before.

beauxq · October 18, 2024, 6:40pm

If that doesn’t conflict with what you said, then what you said doesn’t fit the use case that it seemed you were talking about.

TeamSpen210 · October 18, 2024, 8:06pm

The initialisation problem doesn’t seem complex to me. The variable should be initialised, unless the class is an ABC or protocol. If it is one of those two, the attribute should be considered abstract, making subclasses abstract/not instantiatable if not assigned.
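For readers who want to see the "abstract attribute" behaviour in code, the closest analogy available today is probably an abstract property on an ABC: a subclass stays abstract until it assigns the name. This is only a sketch of the analogy, not the proposed `ReadOnly` semantics, and `Shape`, `Unfinished`, `Triangle`, and `can_instantiate` are hypothetical names.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    # Stand-in for an uninitialised read-only attribute on an ABC.
    @property
    @abstractmethod
    def sides(self) -> int: ...


class Unfinished(Shape):
    # Does not assign `sides`, so it remains abstract.
    pass


class Triangle(Shape):
    # A plain class attribute replaces the abstract property,
    # which makes the class concrete.
    sides = 3


def can_instantiate(cls: type) -> bool:
    """Return True if calling cls() does not raise TypeError."""
    try:
        cls()
    except TypeError:
        return False
    return True
```

Here `can_instantiate(Unfinished)` is false and `can_instantiate(Triangle)` is true, which mirrors the "subclasses are abstract if the attribute is not assigned" rule.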

In regards to the mutable thing, it can be a mistake yes, but also exactly what you want (a cache for example). Using another ABC in the superclass means that the superclass is restricted to that API, while the child can use implementation-specific functionality.

In my own code I’ve used this a fair bit, mainly with booleans. In that case it does work as a mutable classvar, but this would allow the superclass to promise and enforce that it doesn’t change the value.

erictraut · October 20, 2024, 11:23pm

@Eneg, thanks for spearheading this PEP. This is great progress!

I think the latest draft is on the right track, but I see a number of behaviors that are unspecified or appear to be inconsistent with existing type system features.

The Final type qualifier allows users to omit the type argument and rely on the type checker’s type inference rules to determine the symbol’s type. Is this the case with ReadOnly as well? One could make arguments in both directions here. If we want ReadOnly to act more like Final, then it should probably support inference. On the other hand, PEP 705, which introduced ReadOnly, says that a type argument must be provided. But that’s probably because types are always required for TypedDict items.
The current draft indicates that “the set of initializing methods consists of __new__ and __init__”. I don’t think __new__ should be included here. The __new__ method is not the right place to be initializing either class variables or instance variables, IMO. Neither mypy nor pyright support initialization of Final variables in __new__ methods for this reason, nor does the typing spec allow it. It’s also not clear to me how this would interact with the existing complex type checking rules that apply to constructors.
The current draft indicates that “type checkers may permit assignment in additional special methods”, but it doesn’t specify which ones. I think the spec needs to be clear here. We don’t want divergent behaviors between type checkers. My preference is that we limit it to __init__ only, which is consistent with the specification for Final. If we want to expand the list to include __post_init__, then I think we should do the same for Final.
I’m not convinced that we should allow double initialization — in both the class body and in __init__. Is there a good reason to allow both initialization techniques within the same class? Interestingly, mypy allows this for Final but pyright does not. The typing spec is currently silent on whether this should be allowed. Regardless of where we land on this decision, it would be good for this PEP to provide clarity for both ReadOnly and Final.
There’s currently an ambiguity in the typing spec about whether Final can be used for attributes in protocols. Mypy and pyright currently disagree on this point. I don’t have a strong opinion either way, but I think this PEP is the right place to clarify this behavior and eliminate the ambiguity. Perhaps supporting Final in protocols isn’t important once ReadOnly is supported?
I think it’s important for the spec to clarify the variance implications for a ReadOnly attribute. In particular, such attributes should be treated as covariant. Contrast that with writable attributes, which are invariant. This has implications for variance inference as well as override behaviors. A subclass that overrides a ReadOnly attribute can use a narrower type for the attribute (i.e. a type that is assignable to the type specified in the parent class), whereas a subclass that overrides a writable attribute must use a type that is consistent with the parent class.
The spec doesn’t specify whether a ReadOnly attribute can be overridden by a non-ReadOnly attribute in a child class. Unless I’m mistaken, this should be type safe. I realize that the typing spec is currently rather light on specifying override behaviors (it’s an area that requires more work), but PEP 705 provided good guidance here for ReadOnly items within TypedDicts, and I’d prefer to see the same level of clarity in this PEP.
The spec doesn’t say anything about the deletability of ReadOnly attributes. Normal attributes can be deleted. The typing spec isn’t clear on whether Final implies non-deletability. Currently, pyright enforces non-deletability for Final attributes but not for Final variables (which is arguably inconsistent). Mypy never enforces non-deletability for Final symbols. I’d like to see this PEP clarify the intended behavior for both Final and ReadOnly in this regard. I presume that we’d want all Final and ReadOnly symbols to be non-deletable?
The spec implies that ReadOnly can be used only for “attributes” — class variables and instance variables. In fact, the term “attribute” features prominently in the title of the PEP. What about global and local variables? Is there a reason why ReadOnly would not be supported for these? Should ReadOnly be allowed anywhere that Final is currently allowed? I guess I don’t have a strong opinion here, but if the design deviates from Final in this regard, I think the PEP should justify this decision.
The spec mentions a few cases — notably frozen dataclasses and NamedTuples — where attributes are implicitly ReadOnly. Is it permitted to use ReadOnly in these cases? I presume the answer is yes — and that it has no effect, since these attributes were already implicitly ReadOnly.

Those are the questions that came to mind during my initial reading. I anticipate that we will uncover additional questions when doing a reference implementation.
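The variance point in the list above can be made concrete with a small runtime sketch. It shows why narrowing a writable attribute in a subclass is unsound, which is the reason writable attributes are invariant. The classes and the `replace_pet` function are hypothetical, and no `ReadOnly` annotation is used because the proposal is not assumed to be implemented.

```python
class Animal:
    pass


class Dog(Animal):
    def bark(self) -> str:
        return "woof"


class Owner:
    pet: Animal  # writable attribute

    def __init__(self, pet: Animal) -> None:
        self.pet = pet


class DogOwner(Owner):
    pet: Dog  # narrowed override of a writable attribute (unsound)

    def __init__(self, pet: Dog) -> None:
        super().__init__(pet)


def replace_pet(owner: Owner) -> None:
    # Valid against Owner's declaration, but it breaks DogOwner's promise
    # that `pet` is always a Dog.
    owner.pet = Animal()


dog_owner = DogOwner(Dog())
replace_pet(dog_owner)
pet_can_bark = hasattr(dog_owner.pet, "bark")  # the Dog has been replaced
```

If `pet` were read-only in `Owner`, the assignment in `replace_pet` would be rejected, and the narrowed override in `DogOwner` would then be safe. That is the covariance argument.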

beauxq · October 21, 2024, 1:49am

I think it’s common for the entire initialization process of a single variable to involve multiple assignments.
It often starts with a single assignment (kind of seen as a default), and then a (possibly complex - not a single if statement) checking of a condition to see if it should be something other than the default.

This is why C# has good reason to allow multiple assignments to readonly fields.

This makes even more sense in Python for an instance attribute, because the default can be in the class scope, and then take up less memory for those instances on which the default applies.
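A minimal sketch of the pattern described here, assuming a hypothetical `Job` class whose override condition is computed in `__init__` rather than passed in directly. Instances that keep the default store nothing for it in their own `__dict__`.

```python
class Job:
    retries: int = 3  # class-level default, shared by every instance

    def __init__(self, name: str) -> None:
        self.name = name
        # Only instances that deviate from the default get their own entry.
        if name.startswith("critical-"):
            self.retries = 10


routine = Job("cleanup")
critical = Job("critical-backup")

routine_attrs = vars(routine)    # no "retries" key: the class default applies
critical_attrs = vars(critical)  # has its own "retries" entry
```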

A significant reason why Final should differ from this is that Final is specified to imply ClassVar (in non-dataclasses).
Even without Final, I think it’s strange that mypy allows this (in non-strict mode):

class C:
    x: ClassVar[int] = 5
    
    def __init__(self):
        if some_condition:
            self.x = 6

mypy doesn’t allow this with Final nor ClassVar in strict mode.

beauxq · October 21, 2024, 1:57am

(C# also allows the default in the class scope and then another assignment in the constructor, but I suspect in C# it doesn’t save memory. So even without the memory consideration, I think it’s still worth it.)

Eneg · October 21, 2024, 9:21am

Thanks for the detailed feedback!

Yup, should be.

I wasn’t sure on this. I know one example of a class in a typed library (yarl.URL), which:

is immutable,
does not define __init__,
~~does all initialization work in __new__~~ (they initialize it both in __new__ and alternate constructors)

Not a very adequate example, since their alternate constructors would violate the current rules.

I don’t know how I could be specific here. Naming the exact methods isn’t a sound idea - if we include dataclasses’ __post_init__, then what about attrs’ __attrs_post_init__? What about the other methods attrs specifies?
And if we include attrs, then what about any other future or existing class transformation libraries…

I might be getting ahead of myself here, but the way I see this working is by some configuration mechanism. It could be type checker level configuration (extendedInitMethods = ["__post_init__", "__attrs_post_init__", ...]), or some extension to dataclass_transform.

Under this idea, I think the best I can do is outline the general rules those methods should follow. I realize I didn’t do it enough justice in the current iteration.

I reckon you’re talking specifically about the ability to overwrite a class default from init, and not the multiple assignments within init?
I’ve included it per @beauxq’s request. I don’t have any clear objections, though their rationale of memory savings isn’t compelling to me - as I’ve mentioned multiple times, I’d expect __slots__ to yield better results.

Overall I’m -0 on this.

I don’t see why it couldn’t be, though IMO it is of very little utility. From my observations pyright currently treats it like a normal attribute (matches instance attributes, rejects descriptors).
The current convention of using @property additionally covers properties too.
A major point for ReadOnly is the support for both, + custom descriptors.

Will do.

I agree. IMO the ability to del a declared attribute, whether read-only or not, breaks some important assumptions in a typed codebase.

I purposefully only mention attributes. I believe the interpretation of ReadOnly in the context of local/global variables would be exactly the same as what Final already defines. One could argue it’s better to have only one qualifier with that meaning. OTOH, parity with Final might be desirable. I take the former stance.
Regardless, I reckon you’d like the spec to mention this.

This is indeed what I’d answer. I’ll make the spec clear on this.

Eneg · October 21, 2024, 9:35am

The same could be achieved by placing the default value as __init__ parameter default in a class defining slots. The cost of multiple references to the same object in a slotted class should be lower than the base cost of a per-instance __dict__.
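A sketch of the slotted alternative mentioned here, with a hypothetical `SlottedJob` class. It only illustrates the shape of the pattern and makes no claim about which approach uses less memory in a given program.

```python
class SlottedJob:
    # Declaring __slots__ means instances have no per-instance __dict__.
    __slots__ = ("retries",)

    def __init__(self, retries: int = 3) -> None:
        # The default lives in the signature; every instance stores a
        # reference to the value in its slot.
        self.retries = retries


job = SlottedJob()
has_instance_dict = hasattr(job, "__dict__")
```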

beauxq · October 21, 2024, 1:41pm

The same is not achieved by that.

You’re suggesting changing from a common, simpler pattern to an uncommon, less simple pattern.
__slots__ is only smaller if there are very few variables with this default pattern. __slots__ can’t save memory if there are just a few more of these defaults.

I’ve tried to make it clear that memory is NOT the primary concern. It’s just a little bit extra on top of the already sufficient rationale of the common initialization pattern.

__slots__ does not yield better results if there’s more than 1 variable with a default.

Eneg · October 21, 2024, 8:01pm

Ok, that’s fair.

I fail to see how

class ObjA:
    value: ObjB = ObjB(...)

    def __init__(self, value: ObjB | None = None) -> None:
        if value is not None:
            self.value = value

is simpler than

class ObjA:
    value: ObjB

    def __init__(self, value: ObjB = ObjB(...)) -> None:
        self.value = value

Neither pattern works with mutable objects, and neither initializes the default object any differently.

Personally, I have not seen any library use this, let alone place multiple of such “flyweight” attributes on one class.
Thus I find it hard to believe it is as common as you portray it ^[1]; could you present some existing examples?

Edit: I know one example of a library where a somewhat similar pattern is used. The class variable does not share a name with the instance attribute, and is mutable. It serves as a convenience way of setting a default value for the instance attribute.

[1] survivorship bias, yup

beauxq · October 22, 2024, 4:03am

Eneg:

I fail to see how

class ObjA:
    value: ObjB = ObjB(...)

    def __init__(self, value: ObjB | None = None) -> None:
        if value is not None:
            self.value = value

is simpler than

class ObjA:
    value: ObjB

    def __init__(self, value: ObjB = ObjB(...)) -> None:
        self.value = value

I failed to notice when you injected the assumption that the value being set would be passed as a parameter. Why are you assuming that?
And the pattern that you suggested that I’m referring to included __slots__

In python/cpython (GitHub):

Lib/multiprocessing/shared_memory.py:71
Lib/idlelib/undo.py:173
Lib/asyncio/streams.py:189
Lib/asyncio/tasks.py:82
Lib/csv.py:97

This is far from an exhaustive list.

Eneg · October 27, 2024, 7:43pm

New commits (source)
Of notable changes:

noted that ReadOnly remains invalid for locals and globals, as it’d be 1:1 to Final in that context (opinions?)
syntax section, denoting that self.id: ReadOnly = 123 is valid (without [<type>])
initialization section
- I’ve removed the ability to assign to read-only attributes within __post_init__ (see rejected ideas)
- type checkers should warn on uninitialized read-only attributes outside ABCs and protocols ^[1]
subtyping section - I’d like some feedback particularly on the protocols/ABCs part, the wording can likely be improved

Todo:

parity changes to Final
“Type consistency” section? Not sure if something like that is necessary (akin to PEP 705)

[1] Are there any edge cases where this comes in useful? I don’t think this is in the same bag as the ClassVar[ReadOnly[...]] example.

oscarbenjamin · November 8, 2024, 2:16pm

Best practice for defining immutable classes is to use __new__ and other class methods for construction rather than __init__. This ensures that the object is never available partially initialised and would always be seen as fully constructed in an __init__ method of any subclass. Restricting initialisation of read-only attributes to __init__ is bad because __init__ should usually not be used for immutable classes. Restricting it to __new__ and not other class methods or functions is also bad because other constructor methods are needed in practice besides __new__. In fact one important reason for using __new__ rather than __init__ is because you can have multiple class method constructors and choose which one to call in context whereas there is no way to have multiple __init__ methods.

For an example see fractions.Fraction in the stdlib but there are many more outside of the stdlib. Here is a simplified version that shows how the class method constructors might typically look:

from __future__ import annotations
from math import gcd

class Fraction:

    _numerator: int
    _denominator: int

    @property
    def numerator(self) -> int:
        return self._numerator

    @property
    def denominator(self) -> int:
        return self._denominator

    def __new__(cls, num: int | str | None = None, den: int | None = None, /) -> Fraction:
        if den is not None:
            if isinstance(num, int) and isinstance(den, int):
                return cls._new(num, den)
            else:
                raise TypeError("Fraction() takes two integers")
        elif num is not None:
            if isinstance(num, int):
                return cls._new_raw(num, 1)
            elif isinstance(num, Fraction):
                return num
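            elif isinstance(num, str):
                return cls._from_str(num)
            else:
                # Avoid silently returning None for unsupported types
                raise TypeError("Fraction() takes an int, a str, or a Fraction")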
        else:
            return cls._zero()

    @classmethod
    def _new(cls, num: int, den: int) -> Fraction:
        g = gcd(num, den)
        num = num//g
        den = den//g
        if den < 0:
            num, den = -num, -den
        return cls._new_raw(num, den)

    @classmethod
    def _new_raw(cls, num: int, den: int) -> Fraction:
        # Assumes num and den are already normalised
        obj = super().__new__(cls)
        obj._numerator = num
        obj._denominator = den
        return obj

    @classmethod
    def _zero(cls) -> Fraction:
        return cls._new_raw(0, 1)

    @classmethod
    def _from_str(cls, s: str) -> Fraction:
        if "/" in s:
            num, den = s.split("/")
            return cls._new(int(num), int(den))
        else:
            return cls._new_raw(int(s), 1)

    def __repr__(self) -> str:
        return f"{self._numerator}/{self._denominator}"

    def __mul__(self, other):
        if not isinstance(other, Fraction):
            return NotImplemented
        return self._mul(other)

    def _mul(self, other: Fraction) -> Fraction:
        # Cancelling small gcd here is much more efficient
        # than going through __new__ at large bitsizes
        an, ad = self._numerator, self._denominator
        bn, bd = other._numerator, other._denominator
        g1 = gcd(an, bd)
        g2 = gcd(ad, bn)
        return self._new_raw(an//g1 * bn//g2, ad//g2 * bd//g1)

Note in this example that the __new__ method needs to provide the friendly public interface for users of the class and therefore has to accept many different types and check for them. Also __new__ cannot assume that input arguments are normalised so when given int arguments it always needs to cancel GCD and make the denominator positive (it should also check for zero denominator…). The same problems would apply if __init__ was used as well.

Internal calls to construct new instances should typically bypass __new__ and use a more specific class method constructor for known types. Note in particular that __mul__ computes normalised numerator and denominator and therefore needs to bypass the normalisation that is performed by __new__ which is why it uses the _new_raw class method instead of __new__.

The implicit rule understood by people who write such immutable classes is that if a class method or function actually creates a new Foo (by calling super().__new__()) then it should fully initialise the Foo before returning it. In the example shown, _new_raw does this and is the only place that assigns the attributes: every other method just forwards an already initialised object that is received from elsewhere. Every method or function that returns a Foo therefore returns a fully initialised Foo that should be considered immutable once received by the caller.

Ideally a type checker could understand this and detect the error in this code:

class A:
    _val: ReadOnly[int]
    def __new__(cls, val: int) -> A:
        obj = super().__new__(cls)
        return obj  # Uninitialised A: obj has no _val

Allowing assignment of read-only attributes in an __init__ method is contradictory because __init__ can be called on an already created object or may never be called. I can understand why you might want to allow it for __init__ given that most Python programmers don’t know __new__. Allowing this for __init__ but not for __new__ is backwards though.
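Both halves of that claim can be observed at runtime. The sketch below uses a hypothetical `Token` class: `__init__` runs again when `__new__` hands back an existing instance, and it never runs when the object is created through `object.__new__` directly.

```python
class Token:
    _instances: dict = {}

    def __new__(cls, value: int):
        existing = cls._instances.get(value)
        if existing is not None:
            return existing  # __init__ will still run on this object
        obj = super().__new__(cls)
        cls._instances[value] = obj
        return obj

    def __init__(self, value: int) -> None:
        # Count how many times this object has been "initialised".
        self.init_calls = getattr(self, "init_calls", 0) + 1
        self.value = value


first = Token(1)
second = Token(1)             # same object; __init__ runs a second time
bare = object.__new__(Token)  # created without __init__ ever being called
```

After this runs, `first is second` holds, `first.init_calls` is 2, and `bare` has no `value` attribute.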

beauxq · November 8, 2024, 5:33pm

Also, PEP 526 says that instance variables annotated in the class scope “should be initialized in __init__ or __new__”.
So to not allow __new__ would be a deviation from PEP 526.

Eneg · November 8, 2024, 7:24pm

This always bothered me; there’s hardly any documentation on the subject.
I have limited understanding of it after looking through some examples of immutable classes. ^[1]
It’s also why my initial version of the draft included assignment within __new__.

I don’t think the immutable classes mentioned so far absolutely couldn’t use __init__, although it’d be impractical for many reasons, like performing checks or normalization multiple times.

Your example does not actually perform any assignment within __new__, but delegates that to _new_raw. I gather you’d like to allow the assignment not only in __new__ and __init__, but also classmethods?

I can think of an example where assignment within methods other than __init__ can be problematic:

class CachingThing:
    _cache: ClassVar[ReadOnly[dict[int, Self]]] = {}
    foo: ReadOnly[int]

    def __new__(cls, foo: int) -> Self:
        bar = foo * 2
        if bar in cls._cache:
            # different, already initialized instance!
            self = cls._cache[bar]

        else:
            self = super().__new__(cls)

        self.foo = foo  # error?
        cls._cache[foo] = self
        return self

Though I’m not sure the same situation couldn’t be constructed for __init__.

Thanks for mentioning the PEP. I hadn’t read it before, and it provides an example of the “instance variable with class default” case.

[1] aforementioned yarl.URL, Fraction

oscarbenjamin · November 8, 2024, 8:43pm

I think that a lot of Python programmers don’t understand __new__. It seems clear to me that the mypy code for __new__ was not really designed by people with much experience of using it.

I like to have a single class method like _raw_new that does the actual initialisation with the raw normalised types so that all other constructors can ultimately delegate to it. That is not essential but there do need to be other entry points that bypass __new__ because __new__ is necessarily part of the public interface. Absolutely there need to be other class methods besides __new__ that can construct the object.

Your example demonstrates one of the important uses of __new__ that __init__ cannot be used for: if you have all immutable instances then you may want to intern them to make them unique in memory. I would have written the code differently though:

class InternedThing:

    _cache: ClassVar[dict[tuple[int, str], Self]] = {}

    _foo: int
    _bar: str

    def __new__(cls, foo: int, bar: str) -> Self:
        key = (foo, bar)
        obj = cls._cache.get(key)
        if obj is None:
            obj = cls._raw_new(foo, bar)
            obj = cls._cache.setdefault(key, obj)
        return obj

    @classmethod
    def _raw_new(cls, foo: int, bar: str) -> Self:
        obj = super().__new__(cls)
        obj._foo = foo
        obj._bar = bar
        return obj

    @property
    def foo(self) -> int:
        return self._foo

    @property
    def bar(self) -> str:
        return self._bar

This code does not have the problem that you show because we separate the cache manipulation in __new__ from the object construction in _raw_new. The __new__ method here only deals in fully initialised instances and does not assign anything to their attributes.

The distinction that needs to be understood here is that _raw_new is allowed to mutate the supposedly “immutable” object because it created the object and has not yet shared it anywhere: every immutable object has to begin life in a mutable state while it is being constructed. As soon as _raw_new returns that object, no one else (not even __new__ or __init__) should mutate it.

The __new__ method is absolutely needed in many different situations. The __init__ method only initialises objects but __new__ and other class methods are what create the objects. It is not possible to control the creation of objects with __init__ (e.g. to return an existing object from the cache as above).

Eneg · November 8, 2024, 10:06pm

My example demonstrates how allowing assignment outside __init__ can break the invariant ReadOnly is supposed to impose. The instance retrieved from _cache is already fully initialized and writing to its .foo shouldn’t be possible, yet the code flow allows mutating its read-only attribute.
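To make the mutation visible, here is a runnable variant of the earlier CachingThing example. It is a sketch under two assumptions: the `ReadOnly` qualifiers are replaced by comments so it runs on current Python, and `Self` is spelled as the class name. The calls at the bottom are hypothetical.

```python
from __future__ import annotations

from typing import ClassVar


class CachingThing:
    _cache: ClassVar[dict[int, CachingThing]] = {}
    foo: int  # would be ReadOnly[int] under the proposal

    def __new__(cls, foo: int) -> CachingThing:
        bar = foo * 2
        if bar in cls._cache:
            # A different, already initialised instance.
            self = cls._cache[bar]
        else:
            self = super().__new__(cls)

        self.foo = foo  # overwrites the cached instance's attribute
        cls._cache[foo] = self
        return self


first = CachingThing(2)   # stored under key 2 with foo == 2
second = CachingThing(1)  # 1 * 2 == 2 hits the cache, so `first` is mutated
```

After the second call, `second is first` holds and `first.foo` has changed from 2 to 1.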

I don’t know if type checkers have or can have the context to flag the code as unsafe.
Unlike in __init__, where the instance can only possibly come from the first positional parameter, __new__ can get the instance from any arbitrary function or method.

The issue isn’t about whether you can write code free of this problem. ^[1] It’s about the fact you can write code with this problem, and whether type checkers could warn about it (and would they?)

[1] I could put the assignment in the else branch and the problem would be gone.

oscarbenjamin · November 8, 2024, 11:06pm

You can do it inside __init__ as well:

class Foo:
    val: ReadOnly[int]
    def __init__(self, val):
        self.val = val

f = Foo(1)
f.__init__(2)

I think that a proper model for how this works needs to distinguish somehow between initialised and uninitialised objects: the readonly attributes are only assignable while the object is uninitialised. You can suppose that the first argument to __init__ is always uninitialised even if the runtime does not enforce this. You can also suppose that an object created locally with __new__ is uninitialised until passed out to another scope. The latter case works better because __new__ cannot be called twice. Note that inside __new__ the object is still created recursively by __new__:

class Foo:
    def __new__(cls, ...):
        obj = super().__new__(cls)
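The initialised/uninitialised distinction described in this post is a static model, but it can be approximated at runtime to show the intended behaviour. The sketch below is an emulation only and is not how a type checker would implement it; the `Sealed` class and its `_sealed` flag are hypothetical.

```python
class Sealed:
    __slots__ = ("_val", "_sealed")

    def __new__(cls, val: int):
        obj = super().__new__(cls)
        obj._val = val      # allowed: the object is not sealed yet
        obj._sealed = True  # from here on, every assignment is rejected
        return obj

    def __setattr__(self, name: str, value: object) -> None:
        # An unset slot raises AttributeError, so getattr falls back to False
        # while the object is still under construction.
        if getattr(self, "_sealed", False):
            raise AttributeError(f"{name!r} is read-only after construction")
        super().__setattr__(name, value)


sealed = Sealed(1)
```

Assigning `sealed._val = 2` after construction raises `AttributeError`, while the assignments inside `__new__` succeed because the object has not yet left the scope that created it.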

Eneg · November 18, 2024, 9:54pm

Update: (source)

Final & ReadOnly attributes are not deletable
Final does not imply ClassVar when initialized at class-level
Final should be a superset of ReadOnly again (by changing Final, not conforming to existing spec)
3 open issues

@carljm, could you take a look at the PEP?
I’ve noticed that in the meantime someone created and PR’d their own draft PEP, and they’ve been receiving feedback directly on GitHub.
Is mine at the stage where I could PR it? If not, what is missing?