PEP 795: Add deep immutability to Python

Python already has various immutable objects like tuples and ints and so on. I think that the proposed freezing mechanism here only makes sense as something that can be used with an object that is already immutable in the current sense of the word in Python. Such an object could already be safely shared across threads in the free-threaded build.

To put it another way, you would not be able to freeze NumPy arrays, but NumPy could add an immutable array type and then you would be able to freeze that. We don’t need this PEP to add immutable NumPy arrays and share them safely, though.

I think that in practice the mechanism in this PEP adds value in these senses:

  • It marks an already immutable object as shareable across subinterpreters without unsafely breaking their isolation.
  • It provides some optimisations such as reducing reference count contention on shared objects.

The latter benefit applies also to the free-threaded build (and perhaps also multiprocessing with copy-on-write memory). The former benefit only applies if using subinterpreters.

1 Like

I’ll repeat my earlier point. I’m 100% in favour of immutable data structures. I think they would be a great thing to have for concurrent programming. Either as a 3rd party library, or as a stdlib module. I don’t think they need deep language support - just as someone could monkeypatch a stdlib module, but code routinely assumes they haven’t, we can reasonably assume that people won’t deliberately hack around the immutability guarantees of an immutable data structure.

In fact, the proposed freeze function is exactly this sort of problem in reverse. It deliberately hacks around the mutability contract of (for example) the list class, in a way that normal code would quite reasonably consider unacceptable.

You have to remember, Python is built on a "consenting adults" principle. We don’t need deep interpreter integration for immutability. Convention, plus sensible API design for immutable classes, is plenty. If you’re claiming that immutability must be integrated deep into the interpreter, it’s your responsibility to prove that assertion. And at the moment, I’ve seen no proof - just repeated statements of the idea as if it were a self-evident fact (which it isn’t!)

I don’t believe that. The "amazing stuff" that got added[1] was just re-establishing the data structure integrity guarantees that the GIL had previously provided. I’ve been repeatedly corrected for stating that free-threading meant it was easier to write threaded code, so I’m very sure that there’s no such amazing user-level changes.

Aren’t reference count manipulations atomic anyway, because of the free threading work? If not, then surely you could crash the interpreter just using multiple threads, and not need multiple interpreters?

I’m really not seeing why subinterpreters are any different here. You say that "poorly synchronised programs" could blow up the interpreter (or worse - I’m not sure what’s "worse" here…) I don’t see how. Can you give an example, where poor synchronisation blows up an interpreter when using multiple interpreters, but that same approach would not blow up when implemented just using multiple threads? Interpreter crashes are meant to be impossible, in the sense that it should not be possible for user code to provoke them. You’re claiming that’s not the case, unless I’m misunderstanding. (It might be worth remembering at this point that PEP 795 is not proposing any changes to what can be shared between subinterpreters. If allowing frozen objects to be shared is a critical part of the justification for this PEP, it needs to be part of this PEP, otherwise you’re asking the SC to approve just the cost, with no associated benefit).

Maybe. I’ve said this before, but got no response. You can’t split the cost of a feature and the benefit into two separate PEPs. Otherwise you have no justification for the ā€œcostā€ part, and you’ve effectively disallowed the option of approving ā€œpart 1ā€ (the cost) but later rejecting ā€œpart 2ā€ (the benefit). I can’t speak for the SC, but I wouldn’t be willing to consider a proposal split like that.

It’s dead easy to capture immutability in the type system. Have a separate runtime type for ā€œimmutable listā€. Then users choose whether to create a normal list or an immutable list, and one can be passed freely but the other can’t.

You have never explained why it’s necessary for the core types to be freezable, rather than creating new, immutable versions of those core types. We have frozenset and set. Why can’t we expand that model?
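As a rough illustration of the "separate immutable type" approach being argued for here, a minimal immutable list can be sketched in pure Python today (the name `FrozenList` is hypothetical, not part of any PEP or stdlib proposal):

```python
from collections.abc import Sequence

class FrozenList(Sequence):
    """A hypothetical immutable list: contents are fixed at construction."""
    __slots__ = ("_items",)

    def __init__(self, iterable=()):
        # Bypass our own __setattr__ guard once, to store a private tuple.
        object.__setattr__(self, "_items", tuple(iterable))

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)

    def __setattr__(self, name, value):
        # Block attribute rebinding at the Python level.
        raise AttributeError("FrozenList is immutable")

    def __repr__(self):
        return f"FrozenList({list(self._items)})"

fl = FrozenList([1, 2, 3])
print(fl[0], len(fl))  # 1 3
```

Like `frozenset` next to `set`, such a type opts into immutability at creation time rather than converting an existing mutable object in place.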

I feel like you’re over-stating the problem here. Isn’t it up to the library to just document that sort of information? Sure, many libraries might not do that right now, but free threading is very new, and people and libraries need to adapt.

What’s the rush for immutability? Couldn’t we wait a few releases, to see whether the problems you’re claiming will arise are actually as bad as you’re suggesting? PEP 795 feels like a solution looking for a problem. Let’s wait and see if the problem arises, and then bring out immutability as a possible solution[2].


  1. Which was amazing work, I’m not denying that! ↩︎

  2. I’ll still be arguing that having frozenlist, frozendict, etc. types will be a better approach, but at least we’ll have real use cases to evaluate the options against. ↩︎

7 Likes

What definition of immutable are you using? I would say this makes sense with something that is functionally immutable like a frozen dataclass, but not truly immutable (like a str). But the list of types that are truly immutable in Python is quite short, I think: None, bool, bytes, str, int, float and tuple (based on the concurrent.interpreters docs).
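The distinction drawn here between "functionally immutable" and "truly immutable" can be demonstrated directly: a frozen dataclass rejects ordinary assignment, but the guard is only a Python-level convention that `object.__setattr__` bypasses (indeed, that bypass is how frozen dataclasses initialise their own fields):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    x: int
    y: int

p = Point(1, 2)
try:
    p.x = 10                     # blocked by the frozen dataclass machinery
except Exception as e:
    print(type(e).__name__)      # FrozenInstanceError

# ...but the protection is only conventional at the Python level:
object.__setattr__(p, "x", 10)
print(p.x)                       # 10 -- functionally, not truly, immutable
```

A str, by contrast, has no such back door: its contents are fixed at the C level.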

NumPy arrays can already be made read-only, which I suspect would meet the requirements of this PEP, so I’m not sure what other support would be needed - but they probably wouldn’t need to add a new type.

I think the answer is "they’re not different, because to pass an object between subinterpreters the current implementation almost always requires them to be copied, and the objects are then distinct and unrelated" - but that has a performance impact and a memory impact.

Seconding this - I had assumed this was part of the PEP, my apologies.

I think that’s right although I gave the example of Fraction above and it doesn’t work with that. My general question from post #3 remains unanswered:

I think that if you cannot freeze Fraction then chances are high that many existing Python classes cannot be frozen. That means that a library providing types that you might want to freeze would have to add new freezable types.

NumPy allows creating read-only views into an array:

In [2]: a = np.array([1, 2, 3, 4])

In [3]: b = a[:]

In [4]: b.flags.writeable = False

In [5]: b[0] = -1
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[5], line 1
----> 1 b[0] = -1

ValueError: assignment destination is read-only

In [6]: a[0] = -1

In [7]: b
Out[7]: array([-1,  2,  3,  4])

As long as this is not an object array then this does not compromise the isolation of the object graphs but it is definitely possible to have a data race with this.

1 Like

Agreed. I tried the current reference implementation, and the first thing I installed (a debug library I wrote that uses ParamSpec) wasn’t freezable, because it uses ParamSpec in a decorator.

This is where I had problems, but it’s explained in the next comment from Jelle.

On the free-threaded build, there are a number of tricks used for thread-safe reference counting; atomicity is one of them. But subinterpreters are supported on the default build, which doesn’t have atomic reference counting, so it’s a moot point.

It hasn’t been totally clear to me how the proposed atomic reference counting is going to be implemented; as I said before, it’s been tried and failed a number of times up until PEP 703 (which makes the tradeoff of breaking the stable ABI).

1 Like

I’d like to try to tackle a lot of things in a single post. Maybe this post is too long, but I don’t see how to fit everything into separate replies in a meaningful way.

First, it is becoming pretty clear to me that we should have scoped this PEP better. Thanks especially to Paul Moore for making that clear. We are in the process of updating the PEP so that it includes the ability to share immutable objects across subinterpreters by reference, rather than having this as a separate PEP, which we intended for the same Python release anyway. This means that two things (which we have already implemented and will make sure end up in the prototype) have to be added as well: support for managing immutable cyclic garbage without cycle detection, and support for atomic reference counting for immutable objects. We will ping here as soon as the PEP text is updated.

Below I will try to explain the big picture for the immutable objects in PEP795, why we need strictly enforced immutability to be able to share immutable objects across subinterpreters, and what subinterpreters and free-threaded Python stand to gain from a performance standpoint from immutability (and more).

Big picture of PEP795

I’ll start with the big picture. PEP795 builds towards a data-race free programming model for Python. What we envision beyond PEP795 is a Python that does not permit concurrent mutation, meaning it should not be possible for two threads to mutate an object at the same time. The same property falls out of properly synchronised programs — we simply want to enforce that and throw an exception if the program fails in this regard. Ensuring that a program is properly synchronised is hard, and debugging things like data races is often very challenging. Our goal is to ensure that programs that could data-race throw an exception instead.

The way we propose to enforce data-race freedom is through the ability to create groups of mutable objects which we call regions. Regions are isolated, which is a bit hard to define precisely at this point as I haven’t introduced all the moving pieces yet. For now, we can define isolation as "objects inside the region only point to other objects inside the same region". For now, all objects in a Python program belong to some region.

A region is owned by a thread and unless a thread owns a region, it cannot access the objects inside it. If you want to share a region between threads it must be wrapped inside a lock, which ensures that only one thread at a time owns the region. Trying to release a lock for a region while hanging on to direct references into the region throws an exception. (Otherwise, two threads could end up mutating the same object at the same time.) In the rest of this text, I’ll use "lock" to refer to a lock object that manages the ownership of a single region. (It is possible to extend this model with a read-write lock which permits multiple threads to read objects at the same time, but not mutate them.)
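The ownership discipline described above can be approximated with a toy model in today’s Python (to be clear: `Region`, `put` and `get` here are stand-ins I invented for illustration, not the PEP’s API; real regions would enforce this at the interpreter level rather than with assertions):

```python
import threading

class Region:
    """Toy model of a lock-managed region: objects are only accessible
    while the calling thread holds the region's lock."""
    def __init__(self):
        self._lock = threading.Lock()
        self._objects = {}
        self._owner = None

    def __enter__(self):
        # Acquiring the lock makes the calling thread the region's owner.
        self._lock.acquire()
        self._owner = threading.get_ident()
        return self

    def __exit__(self, *exc):
        self._owner = None
        self._lock.release()

    def put(self, key, value):
        assert self._owner == threading.get_ident(), "region not owned"
        self._objects[key] = value

    def get(self, key):
        assert self._owner == threading.get_ident(), "region not owned"
        return self._objects[key]

r = Region()
with r:                                  # take ownership
    r.put("counter", 0)
    r.put("counter", r.get("counter") + 1)
# outside the `with` block, any access raises: we no longer own the region
```

The key difference from an ordinary lock is that the model also forbids keeping direct references into the region after release, which this sketch does not (and cannot cheaply) enforce.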

PEP795-style immutability extends this model by introducing immutable objects that live outside regions. Since they live outside of the regions, they can be accessed by anyone at any time. They can be accessed across threads, and importantly, objects in regions are permitted to reference immutable objects.

Analogous to PEP795, if you have a class Foo and you want to have instances of Foo in different regions, then you must first make Foo immutable. To simplify this, we propose that certain types of objects should support implicit (automatic) freezing. For example, if a type Foo is shared across two regions it would become immutable automatically. Objects that seem sensible to freeze automatically include types (modulo possible opt-in), strings, tuples, and numbers. (Note that the read-write lock mentioned above does not allow sharing of types between regions.)

Regions with locks add an escape mechanism to immutability that PEP795 does not discuss. Freezing does not propagate through locks. Thus, if you want an immutable object to contain some mutable state — for example, let’s say you want to have a counter inside an immutable class that counts how many instances we make of this class, you can do this by sticking a lock with a region inside in the immutable class to hold your counter.

Regions are created explicitly, like r = Region(). Additionally, each thread has a "default region". The default region is where objects reside after being created. This region is essentially a thread-local storage which also contains all the stack frames of that thread. Variables and objects in the default region are permitted to point into the regions that the thread owns. This means that objects are thread-local when created.

Objects live in the default region until some other region creates a reference to that object. Then the object will be moved from the default region into the other region. (This operation may propagate if the moved object contains a reference into the default region.) Moving an object out of a region other than a default region is supported but is slightly more involved. (It involves disconnecting the object or objects you want to extract from the remaining objects in the region.) A region can be merged into another region and regions can also be nested.

During the Python Language Summit in May this year, we presented this bigger picture. The slides can be found here and we also have a recorded version of the presentation here. You can specifically look at slides 9-17, which show much of the stuff described above, with some minimal syntactic "examples". I should point out that the move and share keywords used in the presentation were mock syntax that we invented for that presentation.

Interaction with the subinterpreters model

I will begin by describing subinterpreters, because that will exemplify some of the problems of concurrency when there isn’t a single GIL. This will show things that free-threaded Python has solved in the process.

A big benefit of our design is that it permits subinterpreters to directly share objects by reference. The subinterpreters model currently relies on the subinterpreters being isolated — if you want to share an object, you have to pickle it and send it across. This programming model is warranted by the fact that each subinterpreter uses a GIL to ensure that it can manipulate reference counts safely (in a way that makes them appear atomic in the program), and that its GC will only encounter objects created by that subinterpreter.

Unpacking subinterpreter isolation

Let’s unpack this further with an example. Assume that we were able to directly share an object O by reference across two subinterpreters, and that both subinterpreters call a function F on O. Now, both subinterpreters will execute something like the following:

_tmp = O.F # get a reference to the function object for F
_tmp.RC += 1 # increase the reference count on the function object
_tmp(…) # call the F function
_tmp.RC -= 1 # decrease the reference count

If we zoom in on the first reference count operation, it really looks something like this:

_tmp_rc = _tmp.RC
_tmp_rc = _tmp_rc + 1
_tmp.RC = _tmp_rc

That is to say that this operation is not atomic. Both subinterpreters use their own GIL to ensure that no other thread on the same subinterpreter is able to mess with this critical section. However, since they use different GILs, they do not synchronise with each other. Let’s assume that the reference count of O.F is 1. I’ll use a _1 or _2 suffix on the _tmp variables to denote the different variables on the different subinterpreters and prefix each line by (1) or (2). It is possible that the two subinterpreters effectively execute their statements in the following global order.

(1) _tmp_rc_1 = _tmp.RC # _tmp_rc_1 holds 1
(1) _tmp_rc_1 = _tmp_rc_1 + 1 # _tmp_rc_1 holds 2
(2) _tmp_rc_2 = _tmp.RC # _tmp_rc_2 holds 1
(2) _tmp_rc_2 = _tmp_rc_2 + 1 # _tmp_rc_2 holds 2
(1) _tmp.RC = _tmp_rc_1 # writes 2 into _tmp.RC
(2) _tmp.RC = _tmp_rc_2 # writes 2 into _tmp.RC

As the example above demonstrates, because both subinterpreters read the same initial RC from O.F, one RC increment gets lost. Since we decrement the function’s RC after the call, we might bring the RC down to zero (unless we suffer the same race coming back).

When I said it was possible for the above to happen, I was not referring to a remote possibility. This is actually quite likely to happen because reference count manipulations happen so frequently. (Indeed, we suffered this bug early on to the point of hello world-size programs not being able to run.)
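The interleaving above can be replayed deterministically in plain Python, modelling the shared reference count as a dict entry and performing each read/modify/write step by hand (a toy model of the race, not interpreter internals):

```python
# Deterministic replay of the lost-update interleaving: two
# "subinterpreters" each run read / add / write as separate steps.
rc = {"O.F": 1}            # the shared reference count

# (1) subinterpreter 1 reads and increments locally
tmp_rc_1 = rc["O.F"]       # holds 1
tmp_rc_1 += 1              # holds 2
# (2) subinterpreter 2 reads the *same* initial value
tmp_rc_2 = rc["O.F"]       # holds 1
tmp_rc_2 += 1              # holds 2
# both write back
rc["O.F"] = tmp_rc_1       # writes 2
rc["O.F"] = tmp_rc_2       # writes 2

print(rc["O.F"])           # 2 -- should be 3: one increment was lost
```

Two increments ran, but the count only rose by one; a matching pair of decrements would then drive it to zero while references still exist.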

Does atomic reference counting fix the problem?

No. While making the _tmp.RC += 1 (etc.) operations atomic would address the incorrect count in the example above, that happens to be the case only because both threads are reading O.F. (There is also an additional problem with respect to cycle detection/GC that we will return to later.) To illustrate why atomic reference counting does not work in all cases, let’s change the example and see what happens if two subinterpreters share an object O, where one writes to the field O.F at the same time as the other reads O.F.

Subinterpreter 1 does:

O.F = y

Subinterpreter 2 does:

x = O.F

Like before, let’s zoom into what happens in Python due to these two statements:

Subinterpreter 1:

y.RC += 1
_tmp = O.F
O.F = y
_tmp.RC -= 1

Subinterpreter 2:

x = O.F
x.RC += 1

Let’s assume that the RC manipulations are atomic, so there are no problems there. Assume the reference count of O.F is 1 from the start. Here is a situation that could occur, again writing statements in temporal order, and prefixing each line by (1) or (2) to show which subinterpreter is running.

(1) y.RC += 1
(1) _tmp = O.F
(2) x = O.F # x is a valid pointer here (*)
(1) O.F = y # (**)
(1) _tmp.RC -= 1 # here O.F’s RC hits zero, O.F is deallocated, and x is no longer valid
(2) x.RC += 1 # this write can write to something that is not an RC, and corrupt the heap (use-after-free)

So, even though the reference count manipulations were atomic, this program just wrote a value into memory which could be in the middle of a newly allocated object etc. If we ran the above example in free-threaded Python, it would avoid the use-after-free via the per-object locks that ensure that the operation marked (*) cannot happen before the line marked (**) when the two statements x = O.F and O.F = y are racing.
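This second race can also be replayed deterministically, modelling the heap as a dict of refcounts where a slot disappears when its count hits zero (again a toy model, not interpreter internals):

```python
# Deterministic replay of the read-write race from the example above.
heap = {"old": 1, "y": 1}    # refcounts; "old" is what O.F points at

def decref(obj):
    heap[obj] -= 1
    if heap[obj] == 0:
        del heap[obj]        # "deallocate" the object

heap["y"] += 1               # (1) y.RC += 1
tmp = "old"                  # (1) _tmp = O.F
x = "old"                    # (2) x = O.F -- x is a valid pointer here (*)
                             # (1) O.F = y                            (**)
decref(tmp)                  # (1) _tmp.RC -= 1 -- "old" is deallocated

try:
    heap[x] += 1             # (2) x.RC += 1: the slot is gone
except KeyError:
    print("use-after-free: x points at deallocated memory")
```

In the model the dangling access raises KeyError; in real memory it would silently corrupt whatever now occupies that address.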

PEP795 will permit immutable objects to be shared across subinterpreters. By enforcing immutability rather than relying on programmer discipline, we can be sure that the 2nd example (the read-write race on O.F) will not happen — it will result in an exception being thrown rather than the use-after-free. Thus, from a reference counting standpoint, we can safely share PEP795-style immutable objects across subinterpreters as long as we ensure that reference count manipulations on immutable objects are atomic.

For mutable objects the situation is even simpler. Because the contents of a region are only accessible to one thread at a time, neither the read-read race nor the read-write race is possible. So it will be possible to share mutable objects directly across subinterpreters using regions without any changes to Python’s reference counting.

Now, let’s bring back the foreshadowed additional problem with cycle detection / GC. Each subinterpreter expects to be able to manage its own heap in isolation. Every Python object is part of a doubly-linked list which is used by the GC. When we start sharing objects across subinterpreters, concurrent mutation could lead to a subinterpreter’s list being corrupted. Free-threaded Python avoids concurrent mutation corrupting this list by switching out Python’s allocator and relying on mimalloc’s heap walk. In PEP795, we address this problem by removing frozen objects from cycle detection. This will not leak memory because we can use a clever trick where each strongly connected component of immutable objects can be managed by a single reference counter. Thus, we regain the possibility of detecting cycles in immutable garbage without the need to trace the immutable objects.
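The "one counter per strongly connected component" trick can be illustrated on a tiny object graph (a sketch only: the SCC is hard-coded here, whereas a real implementation would compute components, e.g. with Tarjan’s algorithm, at freeze time):

```python
# A cycle of immutable objects A -> B -> C -> A, with D pointing into it.
edges = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}

# The strongly connected component containing the cycle (hard-coded for
# this tiny graph; freezing would compute this once, since immutable
# objects can never grow new edges afterwards).
scc = {"A", "B", "C"}

# The component's single shared counter tracks only references that
# cross the SCC boundary from outside; internal cycle edges are ignored.
external_refs = sum(
    1
    for src, dsts in edges.items() if src not in scc
    for dst in dsts if dst in scc
)
print(external_refs)   # 1: only D -> A keeps the component alive
```

When that shared counter hits zero the whole component is garbage, so the internal cycle can be reclaimed without ever running a cycle detector over the immutable objects. Immutability is what makes the precomputation safe: the components can never change after freezing.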

In our proposal, because of region isolation, (almost) every cycle will be contained within a single region. (This is analogous to, but more fine-grained than, how subinterpreters work today – each subinterpreter can collect cyclic garbage in its isolated heap individually from the others.) Thus, we can make cycle detection incremental by checking individual regions, and in principle also without even stopping the program if we are doing cycle detection in regions which are not currently in use.

Interaction with free-threaded Python

Free-threaded Python has already fortified the interpreter against both read-read and read-write races, so under free-threaded Python our proposal does not enable new things that a program can do, but it does unlock many performance optimisations. Fortifications like atomic reference counting and per-object locking all come at a cost (hence the biased locking and deferred RC), and it would be possible to lower these costs by relying on the invariants that regions bring. With respect to cycle detection / GC, Stop-the-World (STW) pauses could be made shorter (or in the best cases avoided) by operating on individual regions instead of on the whole program. Also, when operating inside a region, per-object locks and atomic reference counting are not needed. When operating on immutable objects, per-object locking is likewise unnecessary, although reference count manipulations must remain atomic.

Why immutability through static typing is not enough for this PEP

Currently, the Python type system is not really expressive enough to express the immutability that this PEP proposes. But even if it were, since Python’s type system is (intentionally) weak, there is no guarantee that just because a program passes type checking, it behaves according to the types. Some parts might drop out of type-checked land, some parts might use reflection, etc. Bottom line: checking immutability through types is – from the perspective of the implementation on top of subinterpreters – no different from simply trusting that programmers do not mutate shared objects. The second that happens, we are wide open to the kinds of problems that were discussed above. (We are back to why free-threaded Python adds e.g. per-object locks.)

Benefits outside of new capabilities or performance

Above, we focused a lot on new "capabilities" (e.g. sharing objects by reference in subinterpreters) and performance (e.g. removing unnecessary locks and atomics in free-threaded Python). Ultimately – we believe that our proposal will make it easier to write concurrent Python programs, and that some of that is already enabled in PEP795. We have carefully designed it so that single-threaded programs will not observe any changes (all objects are inside the default region) unless they explicitly start to create multiple regions.

5 Likes

I have tried to answer some of your questions here: PEP 795: Add deep immutability to Python - #90 by stw

+1 – these two PEPs will be merged.

Just an FYI: as long as the requirement to freeze function objects stays, I am going to stay loudly opposed to this PEP - at this point I have convinced myself that this step alone is so damaging to the Python ecosystem as to be intolerable.

(Alternatively, convert a decently sized & popular pure-python library to be fully compatible with this PEP with a very small amount of code changes - that might convince me as well)

2 Likes

Thanks. BTW, you don’t need to add an explicit reply - the linked post appears right before the one I quoted when I read the thread :slightly_smiling_face:

That presentation feels a lot better, although I did only skim the later parts of the post, which was mostly technical details as far as I could see. I’d strongly recommend further refining the proposal to start with an overview that focuses solidly on why an end user would want this, what benefits it provides, and how it’s better than what we have now.

That overview seems to be "Subinterpreters will be able to share objects, and the per-subinterpreter GIL can be removed." (I’m not 100% sure if the second point is true.)

Which is great, but raises a number of questions, which I think the PEP should answer, long before getting into technical details.

  1. How does this approach compare with the approaches that have been discussed during the evolution of the subinterpreter feature (CSP-like mechanisms, queues, etc)? How does this compare with the current intentions for improving sharing with subinterpreters?
  2. How will this affect C extensions? My understanding is that making a C extension "multiple interpreter ready" already involves some work. Does this proposal add to that work? Do we really want to increase the bar for C extensions supporting subinterpreters, this early in the introduction of the feature?

There’s also the questions from this thread which are still unanswered:

  1. How will global state like sys.modules work in your region-based model? Will I still be able to write code that optionally imports a module only when needed for a particular feature?
  2. How does someone write a class that is freezable? @oscarbenjamin keeps using the Fraction class as an example, and I don’t think I’ve seen an answer to that yet.
  3. How can a developer know the "blast radius" of freezing an object to make it shareable?

All of these are questions that I (as a developer, who won’t be involved in implementing this PEP) want to understand, in order to form an opinion on the proposal. The technical details are important, of course, but they should be later, in an ā€œimplementationā€ section or similar - not as part of the motivation or specification.

1 Like

I’m not quite this strongly opposed, but I do think it’s imperative for the PEP authors to explain clearly why freezing won’t have the wide-ranging effects people are concerned about.

Also, what about ā€œimplementation levelā€ mutability? If functools.cache is written to be thread-safe, is a cached copy of a function (which is otherwise immutable/frozen) safe to share? Because ultimately we’re trying to improve performance here, and disabling caching goes directly against that goal.

2 Likes

Just a potential footgun for subinterpreters.

If a module has a mutable configuration side effect (like logging, say), then one interpreter might import and configure logging one way while another interpreter configures it differently (interpreters normally have separate sys.modules). If logging were then frozen and passed to a different interpreter, it would behave differently depending on how it is called. And I think that could be a very hard thing to teach. That specific example is probably a bit crazy, but you can see where I’m going, hopefully.

I see the benefit this has for subinterpreters. It may not be a big deal for most people as I think for threads it’s overkill, but there’s enough interest in subinterpreters to make it into the standard. I’m not opposed on that basis but it is up to other people if it’s right, might be good to hear from the author of the interpreters pep if they’ve not already commented (I’m on my phone rn so don’t know)

1 Like

Any code that uses logging could never be frozen - that is far too much intentionally mutable state (like, idk, stdin or file objects).

So while this example isn’t a danger in the way you think it is, it still hints towards a major problem.

No, I know logging isn’t a good example I just couldn’t think of anything more coherent in 35 degrees heat :laughing:

TLDR:

  • The roadmap for regions seems to include adding move semantics to Python, which seems unpleasant.
  • I think this would be better as a broad set of immutable utilities like:
    • Entirely separate, frozen imports, types, builtins and such
    • queues to pass only frozen objects
    • a thread type that only allows frozen data be passed around

/TLDR

Some ice cream later, I think my point is that this proposal "breaks" some of the isolation of multiple interpreters in ways that might not be intuitive even to people using multiple interpreters. I can see definite advantages to this though.

I’d been thinking of this from a threading perspective where every thread shares the GIL (or runs free) and all global state is shared, now I’m thinking a bit more on the multiple interpreters side.

My understanding of multiple interpreters is limited, but my experience with concurrency is good, mostly c++ task pipeline/actor model style stuff. I see a few primary use cases for multiple interpreters:

  • ā€œsubinterpretersā€ where one main script starts each interpreter for a task based pipeline. This allows concurrency without needing to remove the GIL. This would work well for the actor model of concurrency.
  • ā€œseparated interpretersā€ where you might be working with conflicting dependencies within the same process. Each interpreter uses a different sys path.
  • ā€œseparated processesā€ where each interpreter is perhaps serving client specific resources like a website with different branding. This means that modules with mutable state can be imported multiple times without needing to use true subprocesses with the difficulty of communicating inherent there.

The above works because each interpreter has its own GIL (maybe), import state and builtins. As they’re isolated, there’s no way to share mutable data, so imports are duplicated, code objects are duplicated, etc., as at runtime that’s all theoretically mutable. This is intentional and one of the benefits, but has a drawback: memory & performance.

The proposed freeze mechanism would create truly immutable data that can be passed freely. But it’s just that - it has to be passed from one interpreter to another (even if just passing by reference to some kind of main function). From the perspective of multiple interpreters this would solve the duplicated data/imports problem, give better CPU cache behaviour etc., but feels clunky. For interpreters it seems like this would be better managed by some kind of "shared interpreter import" - a mechanism that imports modules/classes/functions available to all interpreters and then freezes them (using exactly the proposed freezing behaviour). Again, explicit is better than implicit.

The other problem this proposal solves is that instances of user created objects can also be frozen and then passed freely. This is currently not possible in multiple interpreters as python objects need to be pickled, essentially a full copy. But as I understand it pickling/unpickling may break as a result of the intended benefits of multiple interpreters - perhaps one interpreter is using a sys.path that does include the necessary import for a particular object, perhaps another is using some conflicting sys.path. If so I hope they know what they’re doing.

So I think I might have just talked myself out of this a little bit. I think the functionality is desirable, but the way it’s represented doesn’t align well with related projects.

Perhaps this might be better done as something like a concurrent.immutable module with the following:

  • A set of immutable builtins that would allow replacement of a module’s __builtins__
  • A separate import mechanism that imports modules/types/functions then freezes them, optionally after some configuration modification is applied.
    • Frozen types would include information about the sys.path value, etc used to import them
    • This allows code modules to be shared between multiple interpreters without needing to duplicate memory.
    • This also allows threaded code to be 100% certain calls to modules have no unintended side effects, while separating it from the standard import mechanism. Transient imports would then use this same separated, isolated, frozen import mechanism. Failures of this import might highlight potential threading issues.
  • Creation of truly immutable instances using the proposed freeze mechanism, but limited to objects created from these immutable imported types. This would limit the blast radius by separating frozen code from non-frozen code while at the same time if your entire program is immutable not duplicating any memory/code objects.
  • A FrozenQueue type that only allows passing frozen objects, and works with multiple interpreters
  • Perhaps some kind of object-pub/sub ReadProxy to allow for a non-frozen object to be modified with updates broadcast to listeners that are retrieved when read.
  • A new thread type that only allows frozen data inputs/frozen functions, allowing optimizations that rely on regions to apply to free-threading.
  • Caching in the stdlib could be done on a per-interpreter basis. I assume existing caching mechanisms are safe for free-threading.

The benefit is making freezing an explicit first step rather than an ad-hoc, can-happen-anywhere step, which doesn’t sit well with me. It allows multiple interpreters to keep isolated imports while benefiting from reduced overhead where appropriate. The blast radius is explicit, and for threading it is an extra, opt-in safety measure. I’m not sure how mixing frozen and non-frozen types would work, but I’d probably opt in to freezing my entire program (if it was easy and worked), so I don’t mind.

Re other comments:

Slightly conflicting statements even before free-threading. But I get it: trying to make a Rust-style ownership model usable in Python. I’m not a big fan of how Rust implements it, as there’s a lot of line noise to achieve the explicit ownership - but I’m a C++ kid, so who am I to talk. I do think tools to do this are good, however. The above frozen import mechanism means that each interpreter/thread has an owned region of mutable imports, objects and such, plus a shared immutable region.

This seems like it would require new move semantics, and I think that hasn’t been highlighted enough. That would take a while to be adopted - that’s why multiple interpreters currently copy everything. I don’t see the benefit of being able to transfer ownership in Python. Because everything is a reference, just create a new instance that references some of the old values. In theory this might hurt performance on large data, but if concurrency is your goal then you just became single-threaded anyway. See pandas’ copy-on-write behaviour for what I mean.
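The ā€œcreate a new instance that references some of the old valuesā€ point is just existing Python behaviour: containers hold references, so a ā€œmodified copyā€ of a structure can share almost all of its memory with the original.

```python
big = tuple(range(1_000_000))            # large immutable payload
record_v1 = {"data": big, "version": 1}

# "Modify" by building a new dict that reuses the old references;
# the million-element payload is never copied.
record_v2 = {**record_v1, "version": 2}

print(record_v2["data"] is record_v1["data"])  # True: the payload is shared, not copied
print(record_v1["version"], record_v2["version"])  # 1 2: the originals are untouched
```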

If you had a new thread type that only accepted immutable data and only called frozen functions or had thread-local mutable imports, you could probably get a similar level of isolation without the move semantics.

I feel like your example is a big simplification. Sure, read/modify/write atomicity might not solve the problem you show, but compare-and-swap instructions would, and they are still lock-free and atomic. But complicated.
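Pure Python doesn’t expose a CAS instruction, but the pattern it enables is a retry loop: read, compute, and commit only if nobody got there first. A sketch, with a hypothetical `Cell` class whose `compare_exchange` simulates the atomic instruction using a lock (on real hardware it would be a single machine instruction):

```python
import threading

class Cell:
    """Simulated atomic cell; compare_exchange stands in for a CAS instruction."""
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        return self._value

    def compare_exchange(self, expected, new):
        # On real hardware this check-and-store is one atomic instruction.
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

counter = Cell(0)

def increment(cell, times):
    for _ in range(times):
        while True:                 # the CAS retry loop
            seen = cell.load()
            if cell.compare_exchange(seen, seen + 1):
                break               # our update won; otherwise another thread
                                    # got there first, so re-read and retry

threads = [threading.Thread(target=increment, args=(counter, 10_000))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.load())  # 40000: no lost updates despite concurrent read/modify/write
```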

This requires move semantics, invites reference-counting errors, and needs all manner of changes that I don’t think play well with Python. I.e. to transfer object id 123 with local references x and y to another region would require action at a distance that invalidates all local identifiers for a value, or manually deleting y before moving x? I’d also say it’s quite difficult to construct an object that doesn’t contain objects referenced externally, so even if object id 123 has a single reference, it contains objects with refcounts above 1 and therefore can’t be moved unless it’s frozen. In which case, just freeze it.

This is really confusing. If any object in the cycle is immutable then all objects in the cycle are immutable, therefore the cycle is removed from cycle detection and all objects in the cycle use a common refcount. If objects reference objects in another region, surely they need to be immutable so the cycle detection problem across regions is a non-problem?

I fully agree static typing doesn’t solve the problem for multiple interpreters, however…

This model is unlikely to be adopted universally, so the existing model of multiple threads must keep working. This goes back to my point that you’d need a thread class that explicitly opts in to this frozen-code approach, which would allow these optimizations.

2 Likes

I can understand what you mean by this. If this immutability is part of the object model and data races mean memory corruption then Python’s type system is not strict enough to provide the guarantees wanted for memory safety here.

You also need to see this in reverse though: a Python programmer needs to write code that works with these immutable objects and there is a good chance that they will want to track immutability in the static type system even if the runtime tracks it in a different way.

The claim in the PEP is that the mechanism here means that data races from shared mutable state don’t happen, because either the state is not shared or mutation results in an exception. An exception is better than a memory-corrupting data race but is still not really the desired goal of the Python programmer: they want their code to work without data races, memory corruption, or exceptions. They don’t want to guard defensively everywhere against the possibility that a list may or may not be an immutable list, meaning that a bunch of its methods don’t work even if isinstance(x, list) holds and a static type checker has confirmed that it is a list.

This is part of why I think in practice you would freeze objects of types that are already considered to be immutable e.g. you freeze a tuple rather than a list. Then runtime checks with isinstance and static type checkers can tell the difference between what is mutable or not and whether or not it is valid to call x.append(). The idea that you take any current mutable type and freeze some of its instances seems like it would lead to confusion everywhere.
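The confusion is easy to demonstrate with the immutability Python already has: type identity is how both runtime isinstance checks and static type checkers tell you whether mutation is allowed. A read-only view like types.MappingProxyType follows this convention; a hypothetical frozen list would break it.

```python
from types import MappingProxyType

config = MappingProxyType({"retries": 3})

# The type itself communicates immutability, to both runtime and type checkers:
print(isinstance(config, dict))     # False: not a dict, so no dict mutators expected
try:
    config["retries"] = 5
except TypeError as exc:
    print(exc)                      # mappingproxy rejects item assignment

# A frozen list under the PEP would instead pass isinstance(x, list) while
# x.append() raises at runtime - something neither isinstance nor a static
# checker could warn about. Today's answer is to use a different type:
xs = (1, 2, 3)                      # tuple, not a frozen list
print(hasattr(xs, "append"))        # False: the absence of mutators is visible
```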

Maybe the freezing mechanism does not depend at all on static typing but there will still be a need to represent the immutability it implies in both the runtime and static type systems.

4 Likes

There’s a lot of interesting info here, but the main impression I get is that if we’re at the stage of hashing out these big-picture ideas, it’s too early for a PEP proposing concrete behavior changes. These ideas may be motivating ideas for the PEP, but on reading all this I was rather bowled over, because none of it is really in the PEP except in the ā€œfuture extensionsā€ part. I think the PEP needs to include justification of these changes independent of all this data-race-free stuff, or else the PEP needs to grow substantially larger and incorporate the entire data-race-free plan. (I guess this is basically @pf_moore’s point about not separating the costs and benefits, although I’m not sure the separation I see is really costs and benefits so much as just… different concepts.)

In that vein, I would like to draw attention to two things:

That is a huge red flag to me and suggests the PEP is premature. Your subsequent text in your comment goes on at length about how regions will be used. But how can the impact, cost, and benefit of the proposal be evaluated when this fundamental concept remains nebulous? In the first place, the PEP doesn’t actually include the whole data-race-free plan, so it’s sort of asking people to take it on faith that these freezing semantics will support that. But then the data-race-free plan also doesn’t include a clearly defined notion of isolation, so that’s another leap of faith about whether things will all fall into place once region isolation is worked out. It just seems to me like the cart before the horse.

Then there is this:

The worrisome part to me is that despite all this discussion, these concrete questions are still unanswered. If the notion of deep immutability is fully nailed down, these should be easy questions to answer, with clearly defined behavior. The fact that we still have no answers, and instead still discussion about big-picture concepts like regions and isolation, suggests to me that these ideas need some more working over in this discussion space.

To clarify, I appreciate the work that’s been done here, and down the road it would be great if these plans could come to fruition. I’m certainly a believer in thinking big and envisioning large-scale changes! :slight_smile: But I think in order to get there we need a more clear view of where we’re headed, and right now this PEP seems to be taking a first step into the mist toward an unseen destination. Your long post gives some glimpse of that destination, but I’d say that discussion needs to be brought into the loop here so that everyone can see the connection between the concrete proposal and the eventual goals.

7 Likes