Pre-PEP: Safe Parallel Python

I’d like to present an early draft of a possible PEP for safe parallel Python, which I hope will provide a means to allow in-process parallelism while retaining safety and without (much) loss of performance.

24 Likes

I’ll leave it to others to discuss the benefits of this proposal, but I have a few questions about the API.

As far as I can tell, there are four new proposed properties, all of which are special names/dunders:

  • object.__freeze__()
  • object.__protect__()
  • object.__mutex__
  • object.__shared__

From the Python docs: “A class can implement certain operations that are invoked by special syntax (such as arithmetic operations or subscripting and slicing) by defining methods with special names.” Whilst there are certainly plenty of special names (mostly non-methods) which don’t correspond to operators (e.g. function.__closure__, object.__annotations__, function.__doc__, __slots__), these are usually either very internal (e.g. __closure__) or have some other way to access them (e.g. get_annotations() or help(function)). That’s not to say there aren’t exceptions (__slots__), but in general, special names are special for a reason.

However, when looking at the example, there seemingly isn’t a ‘standard’ way to access these properties.

If I’m learning Python and trying to make an object immutable to share it between threads, I’m expecting to find some kind of freeze() builtin, or a threading.freeze() function, and am probably not going to reach for a specially named attribute on the object itself, which seems like it shouldn’t be messed with (object.__freeze__()).

Particularly, it can make a perfectly normal function where I’m just trying to mutate some variables look very internal:

def func1(a, b):
    with a.__mutex__ + b.__mutex__:
        some_mutation_here()
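For comparison, here is the kind of wrapper I’d instinctively reach for instead. (This is purely hypothetical: the freeze() name and the Box class are mine; under the draft PEP, __freeze__ would come from the interpreter, not from a user-defined class.)

```python
def freeze(obj):
    # Hypothetical builtin-style wrapper that hides the dunder.
    return obj.__freeze__()

# Stand-in class so this sketch runs today; the draft PEP would
# provide __freeze__ on objects itself.
class Box:
    def __init__(self, value):
        self.value = value
        self.frozen = False

    def __freeze__(self):
        self.frozen = True
        return self

b = freeze(Box(1))
```

To me, freeze(b) reads far less like touching internals than b.__freeze__() does.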

Or, I might assume that object.__protect__() is meant to be overridden/overridable, like the special methods for operators. Now, maybe this is the intent? But it isn’t clear in the PEP.

object.__shared__ seems more like the kind of implementation detail that you would leave as a dunder though, so I don’t mind this.

Anyway, those are my thoughts. Have you had any ideas yourself, as to APIs for accessing the added special attributes/methods?

6 Likes

I tend to agree that it looks odd to be referring to dunder-things directly.

Would the __mutex__ be reentrant? As in, if I attempt to lock it again inside an already-locked region, will it work, raise an exception, or... what?

It would be helpful to have a bit more depth in the examples. I’ve read the Thread safe tuple iterator example a few times but feel like I’m missing something. Maybe show how that object would be shared thread to thread (or maybe superthread to superthread)? Like, cool: we have an object that uses the stuff in the PEP, but I don’t see how I would use it after that.

1 Like

Hi Mark :waving_hand:

I like this PEP, in part.

I haven’t read it in full detail yet, but I can say that I definitely liked the inspiration drawn from the Pyrona project.

The part I’m quite against is the general sentiment that mutable shared objects are just :fire::enraged_face::fire:

I think the guardrails you lay down make a lot of sense and I would gladly see them in Python.
The problem is that mutable shared state can only exist, under this PEP, when guarded by a mutex.
Correct me if I’m wrong, but the special synchronized state is only assignable to built-in objects.
That would completely stifle development of concurrent data structures (that is, shared mutable objects) that can ultimately provide more thread safety guarantees and better multithreaded scaling than their sequential counterparts that have to be mutex protected.

I know this is a small niche, but it is my niche, so I’ll be vocal about it :wink:

We briefly talked about this issue at the last PyCon US, did you give it any more thought?
Honestly, I can only see AtomicDict working under this proposal if I can set a trust-me-bro flag/state, otherwise a concurrent hash table would be pointless to develop.

I wouldn’t like a solution where we say: we have concurrent data structures in the stdlib, so we don’t need this flag.
Because, even if we did, someone ten years from now could come up with a novel, better approach for some concurrent data structure, and there should be no reason why they can’t have that published on PyPI and be meaningfully useful.

Please, let me know if I’m missing something from the PEP that would enable this.

I know the counter argument is: how do I, the VM, know that what is marked as trust-me-bro can actually be trusted?
On one hand, we’re all consenting adults here.
On the other, we may wish to be stricter in order to provide more guards against data races, but as you write in the PEP, it “eliminat[es] most race conditions.”

For instance, if I read the ... in ThreadSafeList as including a __setitem__(), then it’s easy to write the usual my_not_so_thread_safe_list[item] += 1 data race.
If instead it’s an append-only list, then we can see how a programmer might rewrite it with __setitem__() and fall into the same mistake.
(Better clarity in the examples would help.)
Under this PEP, data races can still exist at the upper level, where the mutexes are acquired and released, possibly at the wrong places.
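The lost update behind my_not_so_thread_safe_list[item] += 1 can be shown deterministically by spelling out the interleaving by hand (no actual threads needed; the variables a and b play the two threads):

```python
items = [0]

a = items[0]        # thread A reads 0
b = items[0]        # thread B reads 0, before A has written back
items[0] = a + 1    # A writes 1
items[0] = b + 1    # B also writes 1: A's increment is lost
```

Even if every individual __getitem__ and __setitem__ is internally locked, the read-modify-write sequence as a whole is not.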

(I’m saying this to elicit the synchronized state be available to normal humans as well, not because I think the PEP is unsound. I like the PEP and I believe it should move forward.)

6 Likes

Adding a freeze() function to the builtins makes sense.

I expect both __protect__ and __mutex__ to be used within classes that are specifically designed to handle concurrency in a robust way, they might not be used commonly enough to justify a new builtin.

I’ve not given much thought to ergonomics yet, just functionality.

Both __protect__ and __mutex__ could be overridden, for proxies maybe. The language allows it, and I don’t see a need to prohibit it.
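A proxy override could look roughly like this. (Sketch only: Target’s threading.Lock is my stand-in for the real per-object mutex the PEP would provide.)

```python
import threading

class Target:
    def __init__(self):
        # Stand-in for the proposed per-object mutex.
        self.__mutex__ = threading.Lock()

class Proxy:
    def __init__(self, target):
        self._target = target

    @property
    def __mutex__(self):
        # Forward to the wrapped object, so that locking the
        # proxy locks the underlying target.
        return self._target.__mutex__

t = Target()
p = Proxy(t)
```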

1 Like

__mutex__ is not re-entrant. It would deadlock, but the PEP proposes adding deadlock detection, so it will raise an exception instead.
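For reference, today’s threading.Lock behaves the same way (non-reentrant), which can be probed without deadlocking by passing blocking=False; threading.RLock shows the reentrant alternative:

```python
import threading

lock = threading.Lock()
lock.acquire()
second = lock.acquire(blocking=False)  # non-reentrant: already held, fails
lock.release()

rlock = threading.RLock()
rlock.acquire()
third = rlock.acquire(blocking=False)  # reentrant: same thread may re-acquire
rlock.release()
rlock.release()
```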

3 Likes

What about PEP 795 (“Add deep immutability to Python”)? As I understand it, PEP 805 and PEP 795 overlap in the immutability part. Or will PEP 795 extend PEP 805’s freeze?

1 Like

You make a good point about the ThreadSafeList not necessarily being thread safe. I’ll adjust the example.

I can only see AtomicDict working under this proposal if I can set a trust-me-bro flag/state

That’s the synchronized state, although I won’t be renaming it “trust me bro” :slight_smile:

The term “built-in” refers to any C/C++/Rust function or class, not just the standard library.

>>> numpy.ndarray(1).max
<built-in method max of numpy.ndarray object at ...>

2 Likes

PEP 795 proposes deep immutability, which is problematic, as freezing an object can end up freezing its module and all the classes and functions in that module.

For now, PEP 805 proposes only shallow freezing, but deep freezing is a possible extension. PEP 805 avoids the problem of freezing classes and functions by making classes and modules “stop the world” mutable, so deep-freezing an object would not change its class or module.

5 Likes

Ahh ok, I see.
Sorry, I mistook built-in as meaning the built-in dict/list/…
Maybe add some section outlining how 3rd party extensions can make use of the changes proposed in this PEP?

With that aside, it’s a big +1 from me!
(Although, it would’ve been an even bigger +1 had the synchronized state been called trust-me-bro.)

5 Likes

I’m not sure whether making freeze() a builtin is the best move. Making a freeze() function, certainly, but I would think it should go in whichever library the Channel and SuperThread classes are going in - since it has related functionality. This is the same approach PEP 795 is taking.

Unless, of course, you expect that people will be using freeze() for various not-parallel-related purposes as well, which would make it less of a specialist function and adding it to the builtins would definitely make sense.

Fair enough - would you envision the Channel class being the main user of __protect__ or __mutex__? If not, it might be nice to include some stdlib batteries which do this job of ‘handling concurrency in a robust way’.

Currently Python lacks an officially supported way to mark an object as (shallowly) immutable. freeze() will be useful to indicate that an object should not and cannot be mutated and this information will be useful to type checkers and linters.

The main point of freeze() being a builtin, however, is that freeze() will require very extensive interpreter support and ABI breakage, which means that its significance is way more than something that just lives in a separate library. Freezing a list, dict or other builtin will be a fundamental operation that changes the behavior of that object even in single-threaded applications.

This pre-PEP avoids the biggest pitfall of PEP 795: the object graph is not a well-defined concept in Python. What exactly are the objects referred to by another object? Those in its __dict__, in its slot descriptors, or in some dunders synthesized by the interpreter? Deep-freezing in PEP 795 exposes this implementation uncertainty as publicly-visible behavior with a huge blast radius, in a possibly forward-incompatible manner (what if more slots and dunders are coming in the future?). All that object-graph complexity is unnecessary for simple structured stateless objects such as fractions.Fraction, ipaddress.IPv4Address and the like. To me it is entirely reasonable for now to suspend judgement on what the object graph really consists of and postpone deep-freezing until we first get shallow-freezing right.

13 Likes

I’d like to share my perspective regarding the proposed Safe Parallel Python API, specifically the use of dunders like __freeze__ and __mutex__.

From a usability standpoint, exposing dunder methods directly to developers might be confusing. In Python, dunders are usually reserved for interpreter-level protocols, and users typically do not call them directly.

A more Pythonic approach could be to provide a library-level API or context managers. For example:

from concurrent.mutex import Mutex

counter = 0
mu = Mutex()

with mu.lock():
    counter += 1
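(For comparison, the stdlib’s existing threading.Lock already supports exactly this shape as a context manager, without the hypothetical concurrent.mutex module:)

```python
import threading

counter = 0
mu = threading.Lock()

with mu:  # acquires on entry, releases on exit, even on exceptions
    counter += 1
```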

For built-in data structures, I would propose creating concurrent-safe versions, similar to what we have in Java:

  • ConcurrentList
  • ConcurrentQueue
  • CopyOnWriteList
  • ConcurrentSet
  • ConcurrentDeque
  • AtomicInt
  • AtomicBool
  • AtomicRef
  • BoundedQueue
  • ConcurrentSortedDict / ConcurrentSortedSet

Maybe all of these could be part of a concurrent module in the standard library.
Explicit is better than implicit: adding thread-safety methods directly to built-in types creates implicit behavior, whereas providing separate concurrent-safe types allows the user to choose explicitly which type to use.


For user-defined types and functions, instead of __protect__:

from concurrent.sync import synchronized_class, synchronized_method

@synchronized_class
class ConcurrentClass:

    @synchronized_method
    def concurrent_method(self):
        ...

Or, in my opinion, a more explicit syntax:

sync class ConcurrentClass:
    sync def concurrent_method(self):
        ...
  • sync class would mark all methods in the class as automatically synchronized with an internal lock.
  • sync def could be used for individual methods of any class.
    This makes thread safety explicit and obvious, rather than relying on users to manually add locks in each method.
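A synchronized_method decorator along these lines can already be sketched in today’s Python (the name, and its placement in a concurrent.sync module, are hypothetical; the per-instance RLock is my assumption about its semantics):

```python
import functools
import threading

def synchronized_method(method):
    # Hypothetical sketch: serialize calls through a per-instance RLock.
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        with self._sync_lock:
            return method(self, *args, **kwargs)
    return wrapper

class Counter:
    def __init__(self):
        self._sync_lock = threading.RLock()  # created before any sharing
        self.value = 0

    @synchronized_method
    def increment(self):
        self.value += 1

c = Counter()
threads = [threading.Thread(target=c.increment) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```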

Otherwise, I can show a table of keyword candidates with examples:

  Keyword      Example
  sync         sync class Counter:
                   sync def increment(self):
  locked       locked class ConcurrentList:
                   locked def append(self, x):
  atomic       atomic class Counter:
                   atomic def increment(self):
  safe         safe class SharedList:
                   safe def append(self, x):
  guarded      guarded class BankAccount:
                   guarded def transfer(self, amount):
  protected    protected class SharedDict:
                   protected def set_item(self, k, v):

Instead of __freeze__ for class attributes:

class Point:
    final x: int = 0
    final y: int = 0

Again, this follows the principle that explicit is better than implicit.

Maybe I forgot something. Thank you for your attention!

__slots__ is, as far as I know, quite internal too. (Sorry for answering like 12 days later; I’d forgotten to send this reply.)

From the Python docs glossary:

__slots__
A declaration inside a class that saves memory by pre-declaring space for instance attributes and eliminating instance dictionaries. Though popular, the technique is somewhat tricky to get right and is best reserved for rare cases where there are large numbers of instances in a memory-critical application.

So, sure it’s tricky to get right (internal?) but is still ‘popular’ (widely used, not very internal).
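Concretely, __slots__ is written directly in ordinary user code, which is what makes it feel less internal than, say, __closure__:

```python
class Point:
    __slots__ = ("x", "y")  # declared directly by the class author

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
has_dict = hasattr(p, "__dict__")  # False: no per-instance dict

try:
    p.z = 3  # slots also block new attributes
    extensible = True
except AttributeError:
    extensible = False
```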

4 Likes

We can expect the use cases of __slots__ and those of __freeze__ / __protect__ to overlap significantly. In any case, the introduction of these operations will be a large change to the Python data model.

Java is, afaik, one of the forerunners in building concurrent constructs into the language design from the start, but how do these constructs actually fit into the context of Python? For one, every object in Java has its own intrinsic reentrant lock, but per-object locks in Python are not yet proposed to be reentrant. Java, being a static language, also does not have moving parts like __dict__ or other auxiliary objects. And I have a feeling that these objects will become a problem. In the current proposal, does locking an object also lock its __dict__? If so, there needs to be a mechanism to link the __shared__ state of a dict (and some other auxiliary objects) with that of a parent object. Given the existence of __dict__, implementing final in Python will be more work than “don’t change those 64 bits (or 32 bits on x86) on the heap”.

It may be possible to start small (such as only allowing objects without __dict__ to be frozen), but sooner or later we’ll run into issues with these objects again. It may be necessary to discourage or even prohibit certain operations (such as assigning to certain dunders, foo.__dict__ = some_dict or foo.__class__ = some_class).

3 Likes

Totally agree :+1:
If you would like to work on this, cereggii is welcoming contributors :slight_smile:

I think the point of freezing, as opposed to say Java’s final, is that you can modify an attribute as many times as necessary, before freezing it.
The freezing could also be performed outside the class, if that makes more sense for a program; for instance, say an object is now leaving a thread’s boundaries and should become immutable before it’s shared.
This can be a common use-case for an actor pattern where threads exchange objects through channels and the receiving actor shouldn’t be allowed mutations on the object-message.
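Today’s closest stdlib analogue for that boundary, at least for a dict-shaped message, is a read-only view. (Note it is only a view: the owner keeps a mutable handle, unlike the true freezing the PEP proposes.)

```python
from types import MappingProxyType

message = {"retries": 3}
view = MappingProxyType(message)  # shallow, read-only view to hand out

try:
    view["retries"] = 5  # the receiving actor cannot mutate the view
    mutated = True
except TypeError:
    mutated = False
```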

I think you have Java in mind here? Your sync doesn’t seem to have much to do with the proposed synchronized object state. Or am I misunderstanding?

I don’t think we want to have synchronized methods like in Java, not that it’s a bad idea per se, but I think Mark was aiming for something different.
I think the primary purpose of the PEP is to make sure that only certain objects are shared, and can be shared, between threads.
That’s where ownership, immutability, and protection come in.

It’s an interesting question. I think they would fit rather well. For instance, this is an example of how to use a concurrent hash table in Python. I’m sorry, but I lost your train of thought there: what are your concerns about concurrent data structures in Python?

1 Like

I’d put freezing in the same category as weak-referencing with regard to the user: whilst the user may try to create weakrefs on any object, not all objects are meaningfully weak-referenceable; some objects are persistent, and some can’t be weak-referenced at all. Freezing doesn’t just affect synchronization: it is itself a mutation of state, and it changes the behavior of the object permanently afterwards. Freezing should only be encouraged at certain points of the object lifecycle (before it is ever published to another thread), in a way that the library author intended.
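The weakref analogy is easy to check today: plain user-defined classes are weak-referenceable, while e.g. ints are not:

```python
import weakref

class Node:
    pass

n = Node()
ref = weakref.ref(n)  # user-defined classes support weak references

try:
    weakref.ref(42)  # int instances do not
    int_weakrefable = True
except TypeError:
    int_weakrefable = False
```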

Java having these constructs is a result of the needs of its famous memory model. The Python data model does not currently make guarantees in the same vein as the Java Memory Model, nor does the Python interpreter perform the kind of aggressive optimization and JIT compilation that the Java VM does. There are practical limits to how much things can be optimized in a language as dynamic as Python, much of which relies on dicts and descriptors, and the end result of the Python JIT project will probably look different from that of the JVM or other language runtimes. In terms of performance, this PEP in effect gives up a small amount of dynamism in the Python data model for the benefit of enabling better specialization strategies in the JIT in the face of free-threading. The kinds of decisions and guarantees that we choose to make or not make in Python (like whether we should go as low-level as taking advantage of instruction reordering, as the JVM does) will strongly affect the kind of concurrency constructs that we ultimately need.

I see your point — I didn’t mean to propose a literal Java-style synchronized keyword. My intention was more about explicitness: just like Python has async def as syntactic sugar over coroutines, we could imagine something like sync def (or locked def) as syntactic sugar over a decorator such as @synchronized_method.

On freeze: I agree that it’s not the same as final. My earlier example was more about static immutability. What you describe — freezing at certain lifecycle boundaries (e.g. before an object crosses into another thread/actor) — makes sense and is a different but complementary use-case. Both seem valuable, but they solve different problems.

So, the need for separate concurrent-safe structures will depend on the chosen memory/data model for Python, right?