PEP 795: Add deep immutability to Python

(they explicitly say this in the newly added section “Expected Usage of Immutability”)

1 Like

Thanks. I missed that. But taking into account both the point I made about static type checking being evidence that people don’t want to track such things manually, in conjunction with the confusion expressed both here and in the previous thread around what exactly gets frozen when I freeze a value, I’m not convinced that’s a realistic position to take.

That section also doesn’t cover the case “If a library relies on user-provided data to be mutable…” Which is the default case - as I said, library code taking a list as input quite reasonably (but not always) expects to be able to call .append on that list. So either the library has to document every case where it plans on mutating data that’s typed as being mutable (i.e., declaring the type is no longer sufficient) or the caller has to check undocumented details of the implementation if they want to pass a frozen type to a function, or users must never pass frozen types to any function (including stdlib functions) unless they are explicitly documented as supporting frozen values. At which point, aren’t we back to a situation where we could just include mutability in the type?

4 Likes

I don’t know whether this has been brought up already, but the people behind the pyre and pyrefly typechecker have been thinking about a way to represent immutability in the type system. See, for example, their slides from PyCon. They introduce a “wrapper type” (or rather, a special form) called ImmutableRef, which can be used like this:

class Foo:
    x: str = "foo"

def no_attr_modification(foo: ImmutableRef[Foo]) -> None:
    foo.x = "bar" # Type checker error: Cannot assign to attribute `x` since it is immutable

def returns_readonly() -> ImmutableRef[Foo]:
    return Foo()

def no_attr_modification2() -> None:
    foo = returns_readonly()
    foo.x = "bar" # Type checker error: Cannot assign to attribute `x` since it is immutable

It’s deep immutability:

class Bar:
    x: str = "hello"
class Foo:
    bar: Bar = Bar()

def test(foo: ImmutableRef[Foo]) -> None:
    reveal_type(foo.bar) # Type is `ImmutableRef[Bar]`
    foo.bar.x = "world" # Type checker error: Cannot assign to attribute `x` since it is immutable

It can also be applied to the self argument:

class Foo:
    def immutable_ref_self(self: ImmutableRef[Foo]) -> None:
        # Attribute accesses on `self` will be restricted.
        ...

However, in their proposal, the immutability only applies to a specific reference and is not “globally” applied to the object:

class Foo:
    x: str = "foo"

def callee_readonly(foo: PyreReadOnly[Foo]) -> None:
    # foo.x canNOT be modified in this function
    ...

def caller(foo: Foo) -> None:
    callee_readonly(foo)
    # foo.x CAN be modified in this function
    foo.x = "bar"
7 Likes

Wait, why can’t you share types without immutability? (See my comments about my proxy type below.)

No, it was sort of a hobby project that I made for fun. It’s based on similar work I was doing to implement arbitrary object immortalization.

Basically, there’s a SharedObjectProxy class that holds three things:

  • A reference to an object.
  • A mutex.
  • A pointer to an interpreter.

The proxy is made immortal (to prevent races on the reference count field), but the reference to the object is not made immortal. Then, all attributes and whatnot are serialized by the mutex on the proxy. So, for example, SharedObjectProxy.__call__ is roughly equivalent to:

class SharedObjectProxy:
    def __call__(self, args, kwds):
        interp = _PyInterpreterState_GET() 
        if interp != self.interp:
            enter_interpreter(self.interp)

        self.mutex.acquire()
        res = self.wrapped.__call__(args, kwds)
        self.mutex.release()

        if _PyInterpreterState_GET() != interp:
            enter_interpreter(interp)

        return new_proxy(res)

Everything on the class is implemented in a similar manner where the interpreter and mutex are acquired before making an equivalent call to the wrapped object.

The proxy is the reference to the object. You can interact with it in nearly all the same ways (except something like is, I guess). If you have a function wrapped by the proxy, then you just call the proxy. For example, if you wanted to share an (unserializable) file between interpreters:

import _interpreters

def my_function(my_file):
    # You're not actually calling the file, you're calling the proxy.
    # my_file.__getattr__("read") will be wrapped and serialized for
    # the calling interpreter, and then return a bound method object
    # that is wrapped by another proxy. Then, the call to read() is
    # actually calling the __call__ of the proxy, which is again
    # switching interpreters and serializing.
    print(my_file.read())

file = open(__file__)
interp = _interpreters.create()
_interpreters.set___main___attrs(
    interp,
    {
        "my_function": _interpreters.share(my_function),
        "my_file": _interpreters.share(file)
    }
)
_interpreters.run_string(interp, "my_function(my_file)")

There’s no chance of races at all here. By switching to the correct interpreter, you don’t have to worry about races on the wrapped object. I don’t totally remember why I added the mutex, you might not even need it. The tradeoff here is that my_function and my_file cannot be deallocated for the lifetime of the interpreter (not the runtime!), but I think that’s a little bit easier to swallow than sys.modules becoming immutable on accident.

Oh, ok, that’s a bit better. The PEP is a little confusing on this point: does the reference count of an object have to be 1 in order for it to be freezable? If so, that’s going to be a very hard sell, because of the points I made earlier (we don’t want a lot of explicit deletion in the language).

If not, I’d suggest removing the use of explicit deletion from the example in the abstract. The abstract is the first thing that people see when they read the PEP (and what most people will only read), so it’s a bit of needless negative perception if they get met with that. I think the Counter example would fit really well there instead, and then move the current snippet to a dedicated section on cycles.

Here’s the rejection part:

  1. Deep freezing immutable copies as proposed in PEP 351 The freeze protocol. That PEP is the spiritual ancestor to this PEP which tackles the problems of the ancestor PEP and more (e.g. meaning of immutability when types are mutable, immortality, etc).

It would be nice to understand why this was rejected, especially if there was prior art for it. To me, a prior PEP indicates that an idea was good or worth pursuing.

Another good thing to include in this PEP would be how it addresses Guido’s concerns on PEP 351.

I’m curious how you plan to implement this, though. Is the idea to protect reference count modifications behind Py_CHECKWRITE?

Imagine a legacy C function (i.e., written before this PEP, so it doesn’t use Py_CHECKWRITE) like this:

static PyObject *
trampoline(PyObject *self, PyObject *ob) // METH_O
{
    return Py_NewRef(ob);
}

The function has no way of knowing whether ob is immutable or not, so there’s no choice but to use atomic reference counting, right?

2 Likes

Yes - those would have to go. That way the freezing system can leverage on Python’s ability to hook on attribute access -that is my counter-proposal against the big-freeze-wave. ANyway - as I put in my first post, such a proxy could be very specialized - it could even be the very same object, if Python’s internals for retrieving anything in the object - it just needs to receive the information that “this instance is frozen in this context” in a side-channel. (That could probably rely on the structures used by context-vars and make use of weakrefs - I never said it would be simple! And then, if its expensive, instead of a proxy on every access, a proxy which would allow ‘lazy freezability’)

(and thus, good-bye to “direct slot access” in native code - that is really a no-no, but it also should be in “ordinary” frozen objects - of course extension code going through pointers in slots could always go into a frozen buffer and write things there)

But then, limiting what is frozen instead of just freezing entire class hierarchies actually sounds better - I will read more carefully the post you linked.

As for PyDict_GetItem would have to be modified to propagate the ‘frozen’ status of a retrieved item if the dict has the same ‘frozen’ status. Ordinarily the ‘frozen’ status could be a proxy-class wrapper (then it could even be done in Python code, without modifying PyDictGetItem - frozen hierarchies would need a special dictionary class), but working on lower levels, it could be a single bit passed along some side-channel for the current context, and different behavior on all attribute/item retrieval code - yes. (As I understand, the proposal already has to modify all attribute/item writing code anyway, so that is not that far fecteched)

[Checking your other post for a possible simpler freezing]

here, yes:

In other words, I think we should limit the scope of the proposal such that freezing an instance does not freeze its class, freezing a class does not freeze its bases, and freezing a function does not freeze its globals and builtins, at least by default.

That- maybe reducing the scope of freezing could really simplify a lot of things around!

But I don’t think that it means the same of your “in other words” in the previous paragraph to that:

In almost all legitimate real-world use cases I can think of, barring instance registries and caches, classes and module globals are altered only during initialization. Once they start spawning instances ready for concurrency, modifying classes and modules would be considered monkey-patching and a bad practice.

Yep - that can be “bad practice” in several domains/scopes, but that is the way the language works - and that PEP 795 as it is now would all but forbid. (Actually, worse than forbidden: it would still be allowed, but then fail in unpredictable scenarios when combined with other, totally unrelated code)

The idea of either proxying elements in a freeze graph, or limiting the things that are actually frozen is exactly so that this kind of code, for bad practice it is, keep working, as the alternative is not only breaking the language really hard, but a fundamental change to its semantics.

3 Likes

I think the last line should be f = None, FYI

3 Likes

This seems to me to be a critical issue for the folks advocating some kind of “immutable proxy” idea; if the proxy views the data of the object with any updates made to it (rather than at the time the proxy was created), how does that avoid data races? It’s a proxy that can be read by concurrent threads but also might change arbitrarily at a time that’s outside those threads’ control. The “deep” part is necessary.

3 Likes

It would be more robust if you made a deep copy first.

Hi Ben!

Yes — there are (at least) two aspects to this, one philosophical and one technical. Let’s start with the technical. If we allow developers to make mistakes that permit data-races to happen (i.e., our approach is not sound), then we need to fortify the Python interpreter against this happening. This is the reason why PEP703 needs to add so much complexity to the CPython implementation — a lot of it is about ensuring that the interpreter does not get corrupted or leaks memory if a program has a bad race (and without compromising performance). So technically, on-top of free-threaded Python, we are already paying for this. However, on-top of subinterpreters, we are not. If we wanted to pursue the approach you propose, then it would prevent subinterpreters from sharing immutable data directly by reference unless we effectively ported a lot of the complexity from free-threaded Python to subinterpreters. So technically, I disagree with your statement that making objects truly immutable is ”for all practical purposes it is wholly unnecessary.”

On to the more philosophical aspect. This proposal is the first step in an attempt at making it easier to write concurrent and parallel Python programs. We are trying to ensure that data-races lead to exceptions rather than silently corrupting the heap of your program or generating the wrong result. I think there is a fundamental incompatibility between the consenting adults approach and concurrency — a module that does not properly implement immutability can then no longer be used with threads without being susceptible to data races. One can make a somewhat similar argument with deep immutability — it is possible that a module does something with freezing that makes it impossible to use in combination with some other code, but when that happens, you will get an exception, not a non-deterministic and possibly silent bug.

I think that adopting concurrency in Python should be done in a way that excludes this nasty class of bugs if possible. This would seem like the Pythonic way to me (although I’m no authority), and it requires a sound approach. It is of course not up to me, but I think that if the cost of a sound approach is that one will sometimes have to go through some extra debugging steps due to unintentionally freezing some objects when, it is a price worth paying.

Do you see any merit to my reasoning?

FWIW, I see where you are coming from and I might have agreed in a different context, but not when it comes to data race bugs.

2 Likes

I am still somewhat concerned about the “viral” nature of the freeze operation, and in particular that it automatically applies to classes and to global interpreter state like sys.modules if referenced.

First, let me ask about classes:
Can we just not do this?

If I have a class instance, x = Foo(), then what’s wrong with the following two snippets being distinct?

# just the instance
freeze(x)

# the instance and its class
freeze(x)
freeze(Foo)

If the instance references its class in an ordinary attribute, that’s fair game. But this would at least make it possible to freeze instances of classes which contain mutable state as class attributes.

Otherwise, many classes which implement __init_subclass__ cannot have their instances frozen.

Sure, a class could be used to smuggle data to its instances or otherwise render them prone to race conditions. But so can access to the filesystem or any other externality. Freezing classes needs stronger justification than it simply being “safer”. I’m not currently satisfied that the PEP addresses why freezing must be so aggressive – for comparison, consider deepcopy.

Regarding sys and potentially other user-facing global interpreter state, can we allow objects to make themselves “unfreezable”? Then freeze({"x": sys.modules}) would fail, rather than break all imports.


Related to objects making themselves unsafe to freeze, I think it would be very valuable to have a dunder which gets invoked when an object is frozen.

Here’s a simple class which demonstrates why this would be useful:

from functools import cached_property
from math import sqrt

class P:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    @cached_property
    def magnitude(self):
        return sqrt(x**2 + y**2)

When freezing, magnitude may or may not have been calculated. If I want to force it to be computed, so that it can be retrieved later, just before freezing is my last chance to do so.

And yes, this means that freezing can become expensive and can raise errors, but it’s not so different from pickling or deep copying in that respect.
The alternative is that classes like P above become safe or unsafe to freeze depending on their usage prior to freezing. I think that’s a worse outcome, since it makes existing code “freezing unfriendly” with no obvious solution.

This hook can also return a sentinel (False, NotImplemented) or raise an error to indicate that the object cannot be frozen.


I remain wary of this proposal.

There’s a significant risk of breakage if this mechanism is used in nontrivial programs. If used incorrectly, the manner of that breakage could be very hard to diagnose.

If future proposals are supposed to give us the tools to make this safe to use, then the features provided should be experimental until those tools are available. That means “not available in the default build of CPython”.

4 Likes

No, it’s a critical issue for people using data sharing across threads. You don’t need immutability at all if you just don’t modify the data[1], and if you have immutability and still try to modify the data then your code is just as incorrect.

Your program needs to get this right regardless of what tools exist. The reason we need more tools is simply to reduce the overhead of the runtime, they’re not going to solve anyone’s actual coding problems. The best we can hope for is that the tools are designed in a way to lead people towards code that will work rather than away from it (sometimes called the “pit of success”, because it’s easy to fall in but when you do then you succeed).


  1. The runtime can optimise better if it knows this, but the code will be correct either way. ↩︎

5 Likes

Why? Python isn’t fortified in this manner right now, and the world hasn’t ended. People write code with data races, and that code has bugs. So what? The goal is surely to make (data race) bugs more difficult to introduce, not to guarantee code is 100% bug free.

The hardening needed for free threading is to prevent interpreter crashes, not data races. As far as I know, neither free threading nor multiple interpreters are subject to crashes because of data races.

So I think you are taking too extreme a position, based on an attempt to achieve a far stronger goal than people want, and that’s why people aren’t willing to accept the costs the proposal imposes.

11 Likes

I advocate for a proxy - but not for a “proxy which allows mutation” - not at this point, when the PEP seens to be about “freezing” and not 'freezing things for others, while the owner can mutate then".

That said, even if that is to be the goal, it is the same proxy: as it should get attributes (and items) of a frozen graph lazily, as requested (and vesting then the needed information so that they are frozen in that call-chain). For pure-python code, guarding __getitem__ and __getattribute__ would be enough. The owner would be the only part of the code with a non-frozen reference to the object - - and that is the one used for mutation.

This is not really hard for any pure-Python code (and I have similar dictionary/list wrapper code in my extradict.NextedData class)

Just for demonstration of a “freezeable mapping” (which under the idea of “consenting adults” or even better: through proper support from the typing system) could even be used to pass deeply-nested mappings across subinterpreters:

from collections import UserDict
from collections.abc import Mapping

class FreezeableDict(UserDict):
    frozen = False
    def freeze(self):
        cls = type(self)
        frozen = cls.__new__(cls)
        # carries the over content of the original dict with no copy to the new instance:
        frozen.data = self.data
        frozen.frozen = True
        return frozen
    def __getitem__(self, key):
        value = super().__getitem__(key)
        if self.frozen and isinstance(value, Mapping):
            value = type(self)(value).freeze()
        return value
    def __setitem__(self, key, value):
        if self.frozen: raise TypeError("mutation to frozen mapping")
        super().__setitem__(key, value)
    def __delitem__(self, key): ...
mutable = FreezeableDict(a=1, b = {})

And you can just paste this in a REPL and play along:

mutable = FreezeableDict(a=1, b = {})
frozen = mutable.freeze()
mutable["b"]["c"] = 3
print(frozen["b"]["c"])
frozen["b"]["d"] = 5
TypeError(...)

For the naive pure-python code, I create a new instance of the frozen mapping - because “frozen” is an ordinary attribute. But maybe, for keeping things simple, another object, meaning the “frozen root of an object graph” should really be another instance, separated from the mutable reference, but for its content.

I fully agree, and weight on - again - on the "viral natura of freezing as problematic.

As far as “KISS” is concerned, it seems that just not freezeing classes at all, (not even when retrieved in methods inside the instance) should be the safe bet.

If something more sophisticated is demonstrated to be needed, the proxy approach could also work - so that a class retrieved through a frozen instance will display the “frozen behavior” (by effectively being a proxy to the original class). The same class retrieved from elsewhere is not changed.

I’m not sure whether that’s true, but if it is, I’d say that just means we have to accept at least some level of possibility of data races. In other words I think a certain level of flexibility that underlies the “consenting-adults approach” is central to Python — much more central than a need for 100% safe concurrency. It just isn’t feasible to graft that level of certainty onto a language that has so many ways for people to do “unsafe” things.

The reason I’m not sure it’s true, though, is that I think it’s only true if we insist on that 100% safety level. I feel similarly about other tensions involving the “strict” side of programming theory. For instance, is there a “fundamental incompatibility” between the consenting adults approach and strict typing? Not really. What’s incompatible is insisting on completely safe/sound typing, and below that there’s a level of strictness that is perhaps achievable but too much trouble[1]. And, insofar as this incompatibility exists, strict typing is what has to give, not the consenting adults approach.

To me the question is whether we can find some level of increased safety that is worth the tradeoff in flexibility and inconvenience. If we can come up with something that catches many common error cases without messing up too much of how things are, great. But we don’t have to catch every single data race.


  1. I’d say this covers just about all of it, although others certainly disagree :slight_smile: ↩︎

5 Likes

It would be more robust if you made a deep copy first.

but then, all the benefits of freezing are gone.

If data is to be copied into the frozen object, that is already possible, and there is no gain at all in all this discussion.

The idea is being able to use data in another thread/interpreter without fearing data races or corruption.

It is already possible to share immutable objects between threads without worrying about data races or corruption. It is also possible to share mutable objects between threads and just not mutate them if mutation is not needed. In my experience issues around thread safety don’t come from mutating things that are not supposed to be mutated but rather from concurrent access to things that are intended to be mutated.

The proposal for freezing objects only makes sense if you think of it in a world where something else is going to come along and prevent all current use of mutable or immutable sharing. The expected solution this future provides for any current issues with shared mutable state is just that sharing would be disabled.

2 Likes

This is the key point, and the reason I think the concerns about virality are overblown.

I expect people to freeze objects that were already intended not to be mutated to prevent later misuse (whether by an api consumer, or just later refactoring in a large codebase)

Freezing objects that the intent is that they aren’t mutated prevents multithreading misuse without the cost of a lock, and allows further optimizations by the interpreter in how objects are handled, and results in unambiguously erroring wherever that object was later attempted to be mutated rather than needing to detect that a data race occurred later.

I’d also be fine with the default being “this object isn’t freezable” (inverting the the way the pep currently has it for pure python types) as a result, the objects people will want to freeze should be objects they control the definition of anyway.

1 Like

I am not sure if people have actually realized the consequences of this proposal:

It requires a complete change in how most people write their code. import X style statements instead of from X import ... are fundamentally incompatible with this proposal - any function that needs to be frozen cannot make use of module objects, which means it cannot reference module objects, whether mutable or not. Making modules freezable just shifts the problem - now no module can reference e.g. sys, which can’t be made freezable.

And “any function that needs to be frozen” means “any function that is referenced by any function that needs to be frozen or is part of a class whose objects need to be frozen”.

This is an incredible viral operation, fundamentally. And it will lead to most code in libraries being rewritten (probably with an enourmous amounts of user requests and issues being raised) to match the new required coding style because some chain is going to hit almost any library. And because of how viral this operation is, not supporting it isn’t really any option. If even one decently popular library doesn’t want to change their coding style to be compatible, this feature is dead in the water.

And this is just one of the consequences I have been able to come up with - I am sure there are others that are equally far reaching.

My currently conclusion is:

  • This proposal doesn’t stand on it’s own - without the further work already implemented, this proposal has little purpose.
  • The proposal fails to achieve it’s goal - see my counter example for how to get the current implementation to have a reference to a shared mutable object. And I am sure there are other examples as well.
  • The proposal has far reaching and hard to guess consequences. This post showcases just one of them.

So I am a strong -1 and would probably actively refuse to make any library I maintain be compatible with it.

Originally I was only weak -0, but the more I have thought about this, the more I am thinking that this proposal is catastrophic.

2 Likes

This is actually the point - a class in Python is ordinarily “intended not to be mutated” - but by freezing classes, you are mandating that they not be mutated. This is not Python anymore - this is a language with a separate “build” (or “pre-run”) step and which can’t do introspection or meta-programing.

Perceive that it is not an all or nothing approach - what is needed and is not contemplated is a frozen information which will be attached to objects retrieved through a frozen region. For object instances, the frozen bit in the object header suffices. But for classes, it is important -and I and others are claiming: important enough for a lot of people being against this going forward as it is,

Summary

(and it is not be a matter of opinion: it can be shown that due to indirect references of one project using different requirement libraries, this can break existing projects)-

that the same classes retrieved from references not involved with the frozen region aren’t changed. This can’t be accomplished by an ordinary bit in the class instance itself -

So far, what allows this is being spelled by having a wrapper-proxy class when the class is retrieved from a frozen region - but it could be some other mechanism: the important thing is a piece of information referring to that “instance of the class reference”.

Perceive that this allows for “100% immutability of the whole region, including all class hierarchies and reachable modules” as you intend, while neither modifying the behavior of existing Python code, neither changing the semantics of the language itself to something close to static languages.