PEP 795: Add deep immutability to Python

Mostly that.

And not only frozen functions in an object graph: the proposal goes so far as eagerly freezing the class object of any instance present in a frozen region, including the superclasses. And since they want “100% of reachable objects”, it would also mean MyClass.mymethod.__globals__, at which point it would reach all the interpreter state for most Python programs, if not all.

As is, the proposal is just broken.

Well, they work around this - instead of all of __globals__ they only freeze the objects in there that are actually used in the function, which stops it from breaking everything all the time. But they cannot do that for module references.
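A minimal sketch of the reachability involved, in plain Python rather than the PEP's implementation. The helper `referenced_globals` is hypothetical: it approximates “the globals a function actually uses” from the names in its bytecode, and shows that a module such as sys is still reached that way.

```python
import sys

CONFIG = {"debug": False}
UNUSED = ["never referenced by report()"]


def report():
    # References the global CONFIG and the module sys, but not UNUSED.
    return CONFIG["debug"], sys.platform


def referenced_globals(fn):
    """Return the subset of fn.__globals__ named in fn's bytecode.

    Hypothetical helper: co_names also lists attribute names, so this
    over-approximates, and it ignores nested code objects.
    """
    names = set(fn.__code__.co_names)
    return {k: v for k, v in fn.__globals__.items() if k in names}


used = referenced_globals(report)
print(sorted(used))
```

Narrowing the traversal this way leaves UNUSED alone, but the sys module object is still in the result, which is the module-reference problem described above.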


Let me ask a question that will hopefully shed some light.

What can this proposal do (that we care about) that we cannot do with the current suite of tools for immutable data?

Specifically, I would list frozen dataclasses, frozen attrs classes, py-rpds, pyrsistent, tuples, and frozenset. There are more, but that’s a good start.

I’m trying to grasp why I’m reading this proposal rather than the introduction of a frozendict type, or a proposal for language-level support for frozen dataclasses.

I may want to freeze a large object graph. The best way, today, is to improve the application architecture so that those objects are frozen from the start, and to provide evolvers rather than any mutators. That’s actually just the better way in general though: think about the data as immutable and design your APIs around that constraint.
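As an illustration of the “evolvers rather than mutators” style with tools that exist today, here is a small sketch using a frozen dataclass and `dataclasses.replace`. The `Settings` class and its fields are made up for the example.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Settings:
    host: str
    port: int = 8080
    tags: tuple[str, ...] = ()


base = Settings(host="localhost")

# An "evolver" returns a new object instead of mutating the old one.
prod = replace(base, host="example.org", tags=("prod",))

try:
    base.port = 9090  # ordinary assignment is rejected
except FrozenInstanceError:
    rejected = True
else:
    rejected = False
```

The original object is untouched, and every “change” is a new value that can be handed to other code without worrying about later edits.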

Fundamentally, the one thing it seems that this can do is freeze structures which were not designed to be frozen. That’s why accidentally freezing sys has come up as a topic. There’s an API there and it relies on mutation. So we shouldn’t be making it easy or, ideally, possible to accidentally freeze it.

That seems like a fine start. Immutability being opt in, rather than forced everywhere, basically does the trick.

Suppose, following from this, that a subclass can opt out. Then we can make sys.modules an UnfreezableList, which is a trivial subtype and prevents people from doing damage there.

Do you consider that to be a reasonable approach? If I have an API which relies on the mutability of some global state – let’s say I have a dict that acts as a registry – I’m now able to indicate that it can’t be frozen or my library will be broken?

Maybe, but I don’t think they’re invalid.

I’ll put it this way:

  • this makes it easy to break things at a distance in a large and complex codebase
  • I want to know how we’re going to prevent that
  • the answer is, apparently, “don’t do that”
  • my question is, “how will I know I’m doing that?”

We already have object traversal rules in deepcopy; can freezing use the same rules?
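For reference, a short sketch of what `copy.deepcopy` does and does not follow today. It copies instance data recursively, but treats functions and classes as atomic and returns the same object. The `Node` class and `on_event` function are illustrative only.

```python
import copy


class Node:
    def __init__(self, payload, callback):
        self.payload = payload
        self.callback = callback


def on_event():
    return "called"


original = Node({"values": [1, 2, 3]}, on_event)
clone = copy.deepcopy(original)

print(clone.payload is original.payload)  # nested data is copied
print(clone.callback is on_event)         # functions are not copied
print(copy.deepcopy(Node) is Node)        # classes are not copied
```

So a freeze that reused these rules would stop at classes and functions rather than continuing into `__globals__` or base classes.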

As a library author, this is a new way for users to break my library code. And this discussion isn’t providing me with the knowledge to be able to tell them what not to do.


It can take code that wasn’t written with immutable types and more reliably verify an invariant that was previously assumed, but not enforced.

This turns existing data races (which can be difficult to prove or disprove in a multi-threaded scenario) caused by misuse of “things that shouldn’t be mutated, but are mutable for performance or convenience reasons” into runtime exceptions that are raised consistently whenever that code path is hit.

Importantly, this includes cases where an object is built up mutably but, once shared across threads, the application’s expectation is “no more editing this”. We don’t have such a type in the language that works without copies (no, the private HAMT implementation doesn’t count), and implementing one yourself requires writing native code. (Frozen dataclasses aren’t actually immutable.)
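The remark about frozen dataclasses can be checked directly. A small sketch (the `Point` class is illustrative): the guard lives in the generated `__setattr__`, so calling the base implementation bypasses it.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Point:
    x: int
    y: int


p = Point(1, 2)

try:
    p.x = 10  # goes through Point.__setattr__, which raises
except FrozenInstanceError:
    blocked = True
else:
    blocked = False

# Calling object.__setattr__ directly skips the dataclass guard.
object.__setattr__(p, "x", 10)
```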

While it’s not ideal (going back to the drawing board and picking the right data structure would be preferable), it’s better than “just use a (deep)copy” as an option that avoids expensive copies, in both performance and memory, without requiring large restructuring of existing code.

I agree with this in theory, but I’ve seen enough “temporary fixes that became permanent” to know that there’s a ton of code that will only ever get the minimal changes needed to remove a bug.

I don’t disagree that there need to be better answers here, but people claiming it’s impossible to draw those lines seem… extreme.

I think this will likely require opt-in, rather than opt-out.

If it requires opt-in, that lets us draw a line in the sand somewhere: certain things aren’t covered, and if you declare a type freezable when it does one of those things, that’s your error for declaring it freezable, not Python’s. (Maybe with some level of detection, surfaced as a warning.)

Easy: only freeze things that should never be mutated after they are frozen. The errors are the responsibility of whoever froze the object, not any arbitrary library’s function that was called with the frozen object. This is the same as if someone created a task then called cancel on it. The cancellation is theirs to handle.

And it is theirs to handle and not your responsibility as a library author. If they froze something and it caused an error, they already were violating their expected invariants in calling a library function that would mutate the object, the answer is to stop calling your function after freezing it.

I don’t think allowing subclasses to opt back out is necessarily a design I would encourage if we’re trying to have this be a property of types (typing concerns get rougher than they already are), but at worst, the concerns here are no worse than those that already exist for hashability being both removable and re-addable as a capability, and the path to properly supporting that in the type system is known, so I don’t see this as irreconcilable either.
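For readers unfamiliar with the hashability precedent, a brief sketch of how a capability can be removed by one subclass and restored by another. The class names are made up for the example.

```python
class Base:
    # Inherits object.__hash__, so instances are hashable.
    def __init__(self, key):
        self.key = key


class Unhashable(Base):
    # Defining __eq__ without __hash__ sets __hash__ to None.
    def __eq__(self, other):
        return isinstance(other, Base) and self.key == other.key


class HashableAgain(Unhashable):
    # A further subclass can restore the capability.
    def __hash__(self):
        return hash(self.key)


def is_hashable(obj):
    try:
        hash(obj)
    except TypeError:
        return False
    return True
```

An opt-out/opt-in freezability flag on types would raise the same question for static typing: a subclass is no longer guaranteed to support everything its parent does.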


Agreed. But isn’t the problem with this proposal that because of the way freezing follows deep links into the object structure, even freezing things that themselves should never be mutated after they are frozen results in linked objects (like the often-mentioned sys.modules) which should remain mutable getting frozen as well?

The PEP authors seem to be unwilling to respond to these points, so it’s hard to know what options exist, but if it were possible for a freeze operation to be limited to only the things that the user considers “should never be mutated after they are frozen”, then that might be better. I don’t see how that’s possible to do in an automatic way, though 🙁


You are not making that easy by doing the automatic deep-graph freezing frenzy, including classes and subclasses.

The meaning of “things that should never be mutated after they are frozen” is rather subjective, and you are failing to recognize that. That is just one point.

There are proposals in these discussions - I just did not see any of the PEP authors addressing them in a reasonable manner, just because they’d change the “preset course” you have in mind.

I will enumerate them again:

  1. There is no need to freeze classes, and even less so class hierarchies. The PEP section devoted to this shows a rather artificial example of something one would not be doing anyway. That can be covered with a simple phrase in the docs, along the lines of “note that classes are not made immutable along with instances, and if there is concurrent access to class attributes of immutable objects the results are undefined”.

We’d be able to use all the cleverness and things devised so far - the immutable bit, and it would be very useful for more than 99% of the cases where this immutability is desirable.

  2. Even so, a proxy-based approach could allow classes retrieved from frozen regions to behave as if frozen. That is trivial for pure Python code, and rather feasible with runtime modification.

And, if those are not enough, we can take a step back, and simply pick a different approach. In my perception, being able to deep-freeze just “json-like” structures (numbers + strings + dicts + lists + special singletons) would allow, by itself, a lot of the workloads depending on frozen data to be implemented.
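To make the “json-like” subset concrete, here is a rough sketch using only existing stdlib types. Note the assumption: it builds a read-only copy (tuples and `MappingProxyType` views) rather than freezing in place, so it is not what the PEP proposes. The function name `frozen_copy` is hypothetical.

```python
from types import MappingProxyType

_ATOMS = (str, int, float, bool, type(None))


def frozen_copy(value):
    """Return a read-only copy of a JSON-like value.

    dict -> MappingProxyType over a private dict, list -> tuple.
    Anything outside the JSON-like subset is rejected. Cycles are
    not handled.
    """
    if isinstance(value, _ATOMS):
        return value
    if isinstance(value, list):
        return tuple(frozen_copy(item) for item in value)
    if isinstance(value, dict):
        if not all(isinstance(key, str) for key in value):
            raise TypeError("only string keys are supported")
        return MappingProxyType({k: frozen_copy(v) for k, v in value.items()})
    raise TypeError(f"not JSON-like: {type(value).__name__}")


doc = frozen_copy({"name": "pep", "ids": [1, 2, {"ok": True}]})
```

Because the accepted types are a closed set, there is no class, function, or module edge for the traversal to follow, which is what keeps this variant from being viral.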

For yet another approach:

I’ve experimented deeply with concurrent sub-interpreter code since the 3.12 beta, and put out some code in the “extrainterpreters” package: not being able to freeze data really impairs things, in practice mandating a deepcopy to be on the safe side.

On my roadmap I was thinking about a “lending safe”: you’d call a method to “put” a reference to your data structure (a region, in PEP 795 speak) there, and it would then allow that region to be retrieved in another interpreter, one at a time, as long as the safe holds the only reference to the root and contained objects. But it would not follow the branches for __class__ or __bases__: I was thinking about practical workloads, and that was not a worry even for a moment. And if it were, a metaclass with an implicit lock (in this case a specialized cross-interpreter lock) guarding access to class attributes would be easy to write.

This “safe” approach would also cover lending mutable data to other threads, async tasks and interpreters, with no need for an immutability bit on any instance, simply because only the current holder of the safe’s contents would have a reference to the objects inside, and so it would be the only thread of code capable of modifying them.


Eh. Memory is ultimately mutable so if you try hard enough you can mutate anything. I’ve never seen a frozen dataclass mutated without it being very intentional, so to me they are immutable enough.


There definitely need to be better answers here. My questions about fractions.Fraction were asked in the previous thread and then restated at the top of this thread. They are still unanswered:

If this proposal was fleshed out enough to be worth detailed discussion then it should be possible to give some sort of answer to my question about Fraction. I assume we are waiting for a response because the authors don’t know the answer but if that is the case then it means that no one knows what the details of this proposal are.

I did not pick Fraction as an example randomly. The Fraction class is a class in the stdlib that is implemented in pure Python and that has instances that are intended to be immutable. The internal data of a Fraction is literally just two ints. If it is not possible to freeze a Fraction then it is basically not possible to freeze instances of any existing classes implemented in Python.
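To make the question concrete, this sketch lists what is reachable from a single Fraction in current CPython. It says nothing about what the PEP's freeze would actually do with these objects, which is exactly the open question.

```python
import abc
import numbers
from fractions import Fraction

half = Fraction(1, 2)

# The per-instance state is two ints stored in slots.
state = {name: getattr(half, name) for name in Fraction.__slots__}

# What a traversal through __class__ and its bases would reach.
mro_names = [cls.__name__ for cls in Fraction.__mro__]
metaclass = type(Fraction)

try:
    half.numerator = 3  # the public attribute is a read-only property
except AttributeError:
    public_write_blocked = True
else:
    public_write_blocked = False

print(state, mro_names, metaclass.__name__)
```

The instance data is tiny, but the class side includes the numbers ABC hierarchy and the ABCMeta metaclass, so the answer depends entirely on how far the traversal goes.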

The authors of this proposal are apparently unable to list which objects would be frozen in a clear and specific simple case like freezing a Fraction or to say whether Fraction is even freezable at all. Without answers here it is impossible to assess the virality on the one hand or the applicability of the proposal on the other because it is entirely unclear what things would be freezable or what things would be frozen by freeze.


Right, frozen dataclasses doing their object.__setattr__ wonkiness is never an issue in practice, at least within anything I’ve ever seen.

FWIW, I think frozen classes are a great example of where an approach something like the current proposal could be fantastic. If there were a way to flip a bit on any object and make setattr fail forever after, that would be awesome. It has the advantage that I’d know the scope of that action.
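A pure-Python approximation of “flip a bit and make setattr fail forever after”, as a sketch only. The `Freezable` mixin is hypothetical, it is shallow by design, and like frozen dataclasses it can be bypassed with `object.__setattr__`; interpreter-level enforcement is what the proposal would add.

```python
class Freezable:
    """Hypothetical mixin: a one-way flag that disables attribute writes."""

    _frozen = False

    def freeze(self):
        # Bypass our own guard to set the flag exactly once.
        object.__setattr__(self, "_frozen", True)
        return self

    def __setattr__(self, name, value):
        if self._frozen:
            raise AttributeError(f"{type(self).__name__} is frozen")
        super().__setattr__(name, value)

    def __delattr__(self, name):
        if self._frozen:
            raise AttributeError(f"{type(self).__name__} is frozen")
        super().__delattr__(name)


class Job(Freezable):
    def __init__(self, name):
        self.name = name
        self.items = []
```

The scope is obvious here: only the attributes of the one object are locked, and a list stored in an attribute remains mutable.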

I think we must be talking past one another on this front. I’m not at all concerned about pure functions I provide, and not much concerned about types I provide which are purely data. I’m primarily concerned about classes which are responsible for some amount of execution.

Let’s suppose I provide a type, ThingDoer.
Can you freeze a ThingDoer instance? That freezes the ThingDoer class. If ThingDoer has a metaclass, ThingDoerMeta, that gets frozen too, right?
Maybe it works and maybe it doesn’t, but deciding whether or not it should work is my responsibility as a maintainer.
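A sketch of the kind of type this worry is about. `ThingDoer` and `ThingDoerMeta` are the names from the post; the `history` and `created` attributes and the `do` method are invented here to show class-level and metaclass-level state that ordinary use mutates.

```python
class ThingDoerMeta(type):
    created = 0  # metaclass-level state

    def __call__(cls, *args, **kwargs):
        ThingDoerMeta.created += 1
        return super().__call__(*args, **kwargs)


class ThingDoer(metaclass=ThingDoerMeta):
    history = []  # class-level state shared by all instances

    def __init__(self, name):
        self.name = name

    def do(self, thing):
        ThingDoer.history.append((self.name, thing))
        return len(ThingDoer.history)


doer = ThingDoer("worker")
doer.do("first")

print(type(doer).__name__, type(type(doer)).__name__)
```

If freezing an instance also froze ThingDoer and ThingDoerMeta, then creating a second instance or calling do() anywhere else in the program would start failing, far from the freeze call.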

If the contract around object freezing is “only freeze instances of types you control, or builtins”, that’s fine, in that I don’t have to worry about any of my packages, but I also think it would harm the proposal quite badly to lose the entire stdlib.

There is a question to answer here about the support which library authors and the stdlib need to provide. Maybe it’s all very simple and I’m not seeing it. I’d like that, since it would mean that someone can tell me the simple answer and I’ll just follow that advice. But I suspect that it’s not so simple, since data and execution are often mixed together, sometimes not even intentionally.


I think we are here, if I may for a moment go to the slightly bigger picture:

The proposal explicitly states multiple goals:

  1. to reduce data-races.
  2. to increase interpreter performance.
  3. to create the building blocks to share objects created in python across subinterpreters.

The first one is arguably handleable without a language addition and just good practices, or encouraging use of libraries that provide better concurrency safe primitive types.

The second one is not, but copy-on-write proxies could be used to create a “fence on virality” and still have some benefit.

The third one seems nearly impossible to safely achieve without full virality. In fact, I’d say that the third one actually creates a motivation case for a module-level boundary on freezing code objects contained, and only allowing those code objects to reference other frozen modules or types.

Immutability is the obvious way to achieve all of those goals, and it aligns with writing good concurrent code. The virality comes from a mix of ability to reference mutable external scopes and how dynamic the language is.

The reason I’m personally advocating to switch this from opt-out, to opt-in is specifically points 2 and 3. I think the virality is a good thing for the proposal, but should require the person who authored an involved type to have considered it before it just happens, but once considered, if we draw the boundaries appropriately, this should “just work.”


If the only desire were to reduce data races, I see “less controversial” proposal options[1] available that cover the user side, assuming some willingness to rewrite code. But we will always have performance and implementation-complexity overhead from protecting interpreter state if the only immutability we have is “well, this is actually mutable, we just have a setattr implementation that errors”.


[1] I’d be 100% for a new “actually data-only immutable record type declaration” (one that doesn’t support custom descriptors, subclasses, mutability, or user-defined methods, and is just an immutable record type implemented directly by the interpreter) that the interpreter doesn’t have to do as much to protect, and for encouraging its use in concurrency.


The PEP seems to have tunnel vision on a runtime approach to enforcing deep (viral) immutability. Unfortunately, it does not seem to have considered alternatives such as (1) doing nothing or (2) syntactic constructs that enforce immutability at static or type-checking time without runtime enforcement.

Plenty of other mainstream languages[1] have immutability enforced in the type system without any runtime enforcement. They all tend to build from constant declarations, up to (partially or fully) immutable aggregate types that can be reference or value types.

[1] e.g., Swift, Kotlin and TypeScript

As long as the language has a reasonable memory model (and CPython de facto inherits the C memory model), one should not get data races from sharing these syntactically-enforced immutable data structures. Even “faux” immutable classes (e.g., frozen dataclasses) would be doing alright.

More specifically, all (faux-)immutable data structures do start as a mutable one on the thread instantiating them. But as long as memory barriers are in place — and you get this transparently from properly implemented higher-level concurrency/sync primitives[2] — those threads are going to read these (pinky-promised) immutable data structures in full consistently.

[2] e.g., ThreadPoolExecutor.submit having this from its internal SimpleQueue, which has this from PyMutex_* indirectly via PyCriticalSection usage.
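A small sketch of that publication pattern with today's stdlib: the data is built mutably on one thread, then handed to workers through `ThreadPoolExecutor.submit` as a read-only view. The function names are illustrative, and the immutability here is by convention plus `MappingProxyType`, not runtime freezing.

```python
from concurrent.futures import ThreadPoolExecutor
from types import MappingProxyType


def build_table():
    table = {}
    for n in range(5):
        table[n] = n * n  # mutation happens only on the building thread
    # Hand out a read-only view; the underlying dict is never exposed.
    return MappingProxyType(table)


def lookup(table, key):
    return table[key]


squares = build_table()
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(lookup, squares, n) for n in range(5)]
    results = [f.result() for f in futures]
```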

There would always be escape hatches e.g. runtime reflection, or manipulation through memory access. But at that point these usages are intentional enough that people should be left to their own destiny.

So it would be unfortunate to see immutability being categorically equated to runtime enforcement (freezing).

(Not to mention that there had been a mainstream language going all-in with required runtime freezing for cross-thread sharing, but then moved away from runtime freezing completely a few years later, due to code portability between language targets/modes and the usability/learning curve)

The PEP also floated some optimization opportunities being unlocked. While I can see the point on immutable object graphs, there are two general counter arguments I can think of:

  1. Optimizing (eliding) reference counting is a separate goal and is not contingent on runtime immutability enforcement via freezing. The deferred reference counting work for free threading is one such example. It is also unclear how the reduction from e.g. immutable object graphs would stack up once deferred RC is in place.

  2. Freezing classes may very well enable the JIT & other things to make stronger assumptions about the memory layout and method dispatch, leading to more efficient accessors & code inlining. However, it seems again a separable goal from freezing data of the instances.

Having read the PEP for a second pass:

  1. the only value proposition strictly requiring runtime freezing is sharing of immutable object graphs across sub-interpreters.

  2. the PEP seems to be subtly angled as a counter pitch to some aspects of PEP 703 free-threading.

Perhaps the author should consider reframing the PEP as an object graph sharing solution to improve sub-interpreter usability.

If it continues with the current framing (a la deep immutability for the language), the discussion will inevitably drift into the questions around other alternatives and the necessity under PEP 703 — like those of mine above, or some earlier in the thread.


Explicit is better than implicit. Whenever typing is mentioned it’s dismissed as a future problem but if something like this can’t be typed it can’t be explicit. Why not start with a way to write truly immutable data classes? Add immutable collection types and some runtime magic to ensure that only immutable data is included in the class. Ignore/exclude functions for now maybe. This doesn’t seem to be directly addressed in the PEP as an alternate approach and as a minimum I think it should be as it’s not clear why this approach isn’t the one being taken.

In the current proposal there’s this whole load of behaviour changes that stem from a single function call in some thread setup. It’s obviously implicit behaviour changes, that’s why the exceptions raised would include the line where freeze is called.

I’m all for this in theory. I’ve done a load of multithreading stuff in C++ and it’s all immutable data classes. In Python until now I’ve not bothered, but free threading will probably change that. For me personally, dataclass(frozen=True) and np_arr.flags.writeable = False is enough, but I can understand the desire for broader immutability.


I am somewhat confused about the goal here.

Is it to help me write code in which it is impossible to modify objects that should be immutable (e.g., to avoid data races)? This proposal does make the objects immutable, so it might make it easier to check that there is no data race, but does it actually help me write code to avoid data races?

That is, even with this proposal, trying to change an immutable object is an error, which means my code is wrong. (Ok, you can catch the exception but my reading of the PEP is that you don’t want to get to this point.) I do understand that having this capability can help in developing and debugging but it should actually be irrelevant once the code is user-facing, right?

If so, shouldn’t we instead “just” be developing good tools for developing parallel/concurrent/multithreaded code?

The PEP says “We expect programmers to use immutability to facilitate safe communication between threads”, but it seems that immutability is just a signal to developers (others and your later self) that the code is thread-safe (in this particular way). Is this more of a proper statement of the goal?

I understand that (a) there are also potential optimisations; and that (b) “user-facing” might be “developer-using-the-library-facing” which is not quite the same thing.


I didn’t.

If the answer is the challenge of getting consensus as to how to type it (or even write it explicitly) then fair enough.

What you are missing here is this:

More explicitly the proposal is aimed at subinterpreters and subinterpreters do not allow arbitrary objects to be shared. The intention of this PEP is that subinterpreters would allow sharing of objects that have been frozen by freeze(obj) but would continue not allowing other objects to be shared. The frozen flag would not just be a signal to other developers but rather something that is checked by the runtime without which the runtime would prevent you from sharing objects between interpreters and threads at all.

The intention is that the runtime guarantees that there is no shared mutable state but it is not the freezing that provides this guarantee. The freezing mechanism described here is an escape hatch that allows some immutable objects to be shared in a world where sharing is otherwise generally disabled.

If you instead envisage the current threading model in which mutable objects can be shared freely then it is potentially useful to be able to freeze some objects but that is not what this is about. None of the claimed guarantees about data races apply if you can still share arbitrary objects without freezing them.


While this is totally true, do keep in mind that it doesn’t allow arbitrary objects to be shared because of this problem. Starting without sharing makes sharing opt-in, and leaves open exactly how that will happen (because there are a variety of options).

In contrast, free-threading starts with sharing and makes safety opt-in, leaving open exactly how that will happen (though 50+ years of consensus says synchronization/locks and that isn’t really being challenged by Python).

Immutability works for both, but for different reasons. As a sharing technique, certain approaches make sense, while as a safety technique, others do. It’ll be really easy for us all to talk past each other if we don’t acknowledge this.


I’ve been playing with the current implementation. Seems that it’s sometimes trying to freeze type variables that are arguments to generic functions. And that can’t be done. This goes back to the concerns of the viral nature of freezing. A simple decorator example:

from __future__ import annotations
from immutable import freeze
from typing import Callable
def foo[**P, R](fn: Callable[P, R]) -> Callable[P, R]:
    def fn_wrap(*args: P.args, **kwargs: P.kwargs) -> R:
        return fn(*args, **kwargs)
    return fn_wrap
freeze(foo) # TypeError: Cannot freeze instance of type typing.ParamSpec due to custom functionality implemented in C

I’m surprised that from __future__ import annotations doesn’t make a difference. I have a debug library that uses ParamSpec in decorators. I didn’t directly freeze it but it can’t be called from a frozen function due to the viral nature of freezing.

The ParamSpec object is accessible through foo.__type_params__; from __future__ import annotations has no effect on that.

ParamSpec is actually mutable at the C level (it caches the evaluated value of the __default__ attribute), so it would need special treatment under freezing.

This kind of “incidental” mutability for the purpose of caching is probably pretty common: the PEP should provide guidance on how authors of classes should deal with freezing in such cases.
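A pure-Python instance of the same pattern, as a sketch: a class that is logically immutable but uses `functools.cached_property`, which writes the computed value into the instance `__dict__` on first access. The `Polygon` class is made up for the example.

```python
from functools import cached_property


class Polygon:
    """Logically immutable: the vertices never change after __init__."""

    def __init__(self, vertices):
        self._vertices = tuple(vertices)

    @cached_property
    def perimeter(self):
        pts = self._vertices
        return sum(
            ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
            for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1])
        )


square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
before = set(vars(square))
value = square.perimeter  # first access stores the result on the instance
after = set(vars(square))
```

If such an object were frozen before the first access, the attribute read itself would need to mutate the instance, so class authors would have to either precompute the cache or move it elsewhere.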


Thanks for the clarification. The PEP does explicitly address decorators that are there for the purpose of caching. But my example and your explanation suggests anything in the stdlib that performs caching (I think re off the top of my head does some) would need adjusting to work with freezing. (sorry if I’m explaining the obvious, it’s not obvious to me)

Thanks for this framing. Our underlying motivation has been to improve sharing between sub-interpreters. In developing that work, we felt that the deep immutability concept was interesting in its own right, and should be considered in its own right as a potential feature for Python.

We believed that providing an intermediate proposal would allow us to deliver value to Python earlier. But in doing so, we have not made as clear as we should have the underlying difficult runtime constraints on sharing objects between sub-interpreters.

Focussing on just deep immutability has also been extremely useful as we have gained a massive amount of feedback from all of you about where deep immutability would and would not be useful. We are still digesting all of that feedback, and will be updating the PEP to reflect it.

This is a great suggestion. We have a lot of material and work on the next step, so we could do this. Our concern was that would be a very large PEP, and we wanted to get something out that was smaller and more focused. But based on the feedback we have received, it seems that this is a more useful framing.

which I believe is referring to

In the sub-interpreters work, this would not constitute a data race, as the import would get the interpreter-local version of the sys module state. However, in the free-threaded world this would allow a frozen function object to mutate the state of the sys module, and potentially data-race.

I think overall, moving more things towards being interpreter/thread local is a good idea to reduce the issues around data-races. If the caching in various modules/types was moved to be interpreter/thread local, then there wouldn’t be data-races.
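As a rough illustration of thread-local caching with existing tools, this sketch keeps one cache dict per thread via `threading.local`. The names are hypothetical, and the trade-off is that each thread recomputes values the others already have.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()


def _cache():
    # Each thread lazily gets its own dict, so no cache entry is shared.
    try:
        return _local.cache
    except AttributeError:
        _local.cache = {}
        return _local.cache


def cached_square(n):
    """Return (value, hit) where hit tells whether this thread had it cached."""
    cache = _cache()
    hit = n in cache
    if not hit:
        cache[n] = n * n
    return cache[n], hit


first = cached_square(7)   # miss on the main thread
second = cached_square(7)  # hit on the main thread
with ThreadPoolExecutor(max_workers=1) as pool:
    from_worker = pool.submit(cached_square, 7).result()  # miss again
```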

As @steve.dower pointed out:

Sub-interpreters allow us to gradually enable more things to be safely shared. Our initial aim would be to be able to pass immutable JSON-like data between sub-interpreters to allow more efficient messaging than currently exists.

Based on the feedback we have received, we will

  • move to a completely opt-in model (we have gradually been moving towards this, and this discussion has helped us to clarify that this is the right approach)
  • fully backtrack any failed freeze, so that an object can hold an attribute of an unfreezable type and then simply never become frozen. Hence, the sys module would have an attribute of an unfreezable type, and you would never be able to freeze the sys module.
  • actively consider expanding the scope of the PEP to include the sub-interpreter work, and we would really appreciate feedback on this as an option from the community.
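The backtracking behaviour in the second bullet can be pictured with a toy model. Everything here is an assumption for illustration: `Cell`, its `freezable` flag, and `try_freeze` are invented, and this is not the PEP's object model or API.

```python
class Cell:
    """Toy object with a frozen flag; not the PEP's object model."""

    def __init__(self, *children, freezable=True):
        self.children = list(children)
        self.freezable = freezable
        self.frozen = False


def try_freeze(root):
    """Freeze everything reachable from root, or nothing at all."""
    marked, stack, seen = [], [root], set()
    while stack:
        cell = stack.pop()
        if id(cell) in seen:
            continue
        seen.add(id(cell))
        if not cell.freezable:
            for done in marked:  # backtrack: undo the partial freeze
                done.frozen = False
            return False
        if not cell.frozen:
            cell.frozen = True
            marked.append(cell)
        stack.extend(cell.children)
    return True


leaf = Cell()
registry = Cell(freezable=False)  # stands in for something like sys
good = Cell(leaf)
bad = Cell(leaf, registry)

ok_bad = try_freeze(bad)
state_after_bad = (bad.frozen, leaf.frozen)
ok_good = try_freeze(good)
```

In this model a graph that reaches an unfreezable object is left exactly as it was, while a graph that does not reach one freezes normally.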