Ok, perhaps I can give a few examples to better illustrate the problems I am interested in.
1. Memory Model and Classic Loop Optimizations
Let’s start with a classic one for optimizing VMs. For the following code, let’s assume the functions thread1 and thread2 are executed in parallel on two different threads.
    import time

    class Example1:
        def __init__(self):
            self.keep_looping = True
        def thread1(self):
            while self.keep_looping:
                pass
            print("Done")
        def thread2(self):
            time.sleep(10)
            self.keep_looping = False
Which of the following behaviors would you ideally want to see?
- 1.1 "Done" is never printed.
- 1.2 "Done" is printed after 10 seconds.
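For anyone who wants to try this on their own interpreter, here is a minimal sketch of a harness for the example above. The delay and timeout parameters and the run_example1 function are my own additions for illustration; the original sleeps for a fixed 10 seconds. It reports whether thread1 ever observed the write, which is the difference between 1.1 and 1.2.

```python
import threading
import time


class Example1:
    def __init__(self, delay):
        self.keep_looping = True
        self.delay = delay  # hypothetical parameter; the original sleeps 10 s

    def thread1(self):
        # Spins until it observes the write made by thread2.
        while self.keep_looping:
            pass
        print("Done")

    def thread2(self):
        time.sleep(self.delay)
        self.keep_looping = False


def run_example1(delay=0.2, timeout=5.0):
    """Return True if thread1 terminated (1.2), False if it kept spinning (1.1)."""
    ex = Example1(delay)
    # daemon=True so that a thread1 that never sees the write cannot
    # keep the process alive after the harness has returned.
    t1 = threading.Thread(target=ex.thread1, daemon=True)
    t2 = threading.Thread(target=ex.thread2)
    t1.start()
    t2.start()
    t2.join()
    t1.join(timeout)
    return not t1.is_alive()


if __name__ == "__main__":
    print("thread1 terminated:", run_example1())
```

On the CPython builds I am aware of, I would expect this to report that thread1 terminated. An implementation that hoisted the read of self.keep_looping out of the loop would instead run into the timeout.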
2. Trivial Parallel Access to Objects
Again, let’s assume the functions thread1 and thread2 are executed in parallel:
    class Example2:
        def __init__(self):
            self.x1 = 0
            # ...
            self.x16 = 0
        def thread1(self):
            for _ in range(100_000):
                self.x1 += 1
        def thread2(self):
            for _ in range(100_000):
                self.x16 += 1
What would you ideally want?
- 2.1 thread1 and thread2 run in parallel without any need for synchronization.
- 2.2 thread1 and thread2 need to synchronize somehow before accessing self.x1 and self.x16 respectively.
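To make "executed in parallel" concrete, here is a rough sketch of how the two methods could be driven from two threads. The helper run_in_parallel and the function run_example2 are hypothetical names I introduce here, not part of the example itself, and the elided fields stay elided.

```python
import threading


class Example2:
    def __init__(self):
        self.x1 = 0
        # ... (fields elided in the example above stay elided here)
        self.x16 = 0

    def thread1(self):
        for _ in range(100_000):
            self.x1 += 1

    def thread2(self):
        for _ in range(100_000):
            self.x16 += 1


def run_in_parallel(*targets):
    """Hypothetical helper: start one thread per callable, then wait for all."""
    threads = [threading.Thread(target=target) for target in targets]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


def run_example2():
    ex = Example2()
    run_in_parallel(ex.thread1, ex.thread2)
    return ex.x1, ex.x16


if __name__ == "__main__":
    print(run_example2())
```

Since each field has exactly one writer, I would expect both counters to end at 100_000 on CPython. The question is what the implementation has to do internally to give that result.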
3. Trivial Parallel Access to Lists
Let’s assume two parallel threads as before:
    class Example3:
        def __init__(self):
            self.list = [0] * 1_000
        def thread1(self):
            for _ in range(100_000):
                self.list[0] += 1
        def thread2(self):
            for _ in range(100_000):
                self.list[999] += 1
What would you ideally want?
- 3.1 thread1 and thread2 run in parallel without any need for synchronization.
- 3.2 thread1 and thread2 need to synchronize somehow before accessing self.list[0] and self.list[999] respectively.
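A small runnable sketch of this list case, under the assumption that the two threads are meant to touch the first and the last element: with a list of 1_000 elements, the last valid index is 999. The function run_example3 is a name I made up for the harness.

```python
import threading


class Example3:
    def __init__(self):
        self.list = [0] * 1_000

    def thread1(self):
        for _ in range(100_000):
            self.list[0] += 1

    def thread2(self):
        # Assumption: the last element is intended, so the index is 999.
        for _ in range(100_000):
            self.list[999] += 1


def run_example3():
    """Run both methods on separate threads and return the two counters."""
    ex = Example3()
    threads = [
        threading.Thread(target=ex.thread1),
        threading.Thread(target=ex.thread2),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return ex.list[0], ex.list[999], len(ex.list)


if __name__ == "__main__":
    print(run_example3())
```

The two threads never touch the same element and the list never changes length, which is what makes this the "trivial" case.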
4. Trivial Parallel Access to Dictionaries
As before:
    class Example4:
        def __init__(self):
            self.dict = {'x1': 0, 'x2': 0}
        def thread1(self):
            for _ in range(100_000):
                self.dict['x1'] += 1
        def thread2(self):
            for _ in range(100_000):
                self.dict['x2'] += 1
What would you ideally want?
- 4.1 thread1 and thread2 run in parallel without any need for synchronization.
- 4.2 thread1 and thread2 need to synchronize somehow before accessing self.dict['x1'] and self.dict['x2'] respectively.
- 4.3 thread1 and thread2 only need some minimal check that the dictionary is “consistent”.
5. Parallel Access to Objects While Objects Change Shape
Expanding on Example2, let’s define a class without any fields, and let’s add the fields dynamically in the parallel threads:
    class Example5:
        def __init__(self):
            pass
        def thread1(self):
            setattr(self, "x1", 0)
            for _ in range(100_000):
                self.x1 += 1
        def thread2(self):
            setattr(self, "x2", 0)
            for _ in range(100_000):
                self.x2 += 1
Let’s change the question.
What do you think is a reasonable goal for a practical implementation?
- 5.1 thread1 and thread2 run their loop without any need for synchronization, and only need some form of synchronization to add the fields.
- 5.2 thread1 and thread2 need to synchronize every time before accessing self.x1 and self.x2 respectively.
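As a sketch of what a correct outcome looks like here, the following harness runs both threads and returns the object's fields afterwards. The function run_example5 is a hypothetical name for illustration. The point is that both dynamically added fields have to survive, regardless of which thread changes the object's shape first.

```python
import threading


class Example5:
    def __init__(self):
        pass

    def thread1(self):
        setattr(self, "x1", 0)  # changes the shape of the object
        for _ in range(100_000):
            self.x1 += 1

    def thread2(self):
        setattr(self, "x2", 0)  # may race with the setattr in thread1
        for _ in range(100_000):
            self.x2 += 1


def run_example5():
    """Run both methods on separate threads and return a copy of the fields."""
    ex = Example5()
    threads = [
        threading.Thread(target=ex.thread1),
        threading.Thread(target=ex.thread2),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return dict(vars(ex))


if __name__ == "__main__":
    print(run_example5())
```

I would expect both x1 and x2 to be present and equal to 100_000 afterwards. An implementation that lost one of the two fields during the concurrent shape change would be the failure mode that 5.1 has to rule out.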
6. Parallel Access to Lists While They Change Length
Expanding on Example3, let’s add two more threads that change the length of the list:
    class Example6:
        def __init__(self):
            self.list = [0] * 1_000
        def thread1(self):
            for _ in range(100_000):
                self.list[0] += 1
        def thread2(self):
            for _ in range(100_000):
                self.list[999] += 1
        def thread3(self):
            for _ in range(500):
                self.list.append(0)
        def thread4(self):
            for _ in range(500):
                self.list.pop()
What do you think is a reasonable goal for a practical implementation?
- 6.1 thread1 and thread2 run their loop without any need for synchronization, and only need some form of synchronization to change the length of the list.
- 6.2 thread1 and thread2 need to synchronize every time before list[0], list[999], list.append(0), and list.pop() are executed.
- 6.3 thread1 and thread2 only need some minimal check that the list is “consistent”.
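Here is a rough sketch of a harness for this four-thread case. A few things are my own assumptions and not part of the example: thread2 uses index 999 (the last valid index of the initial list), it counts an IndexError instead of dying when the element it wants is temporarily gone, and the names index_errors and run_example6 are made up for the bookkeeping.

```python
import threading


class Example6:
    def __init__(self):
        self.list = [0] * 1_000
        self.index_errors = 0  # hypothetical bookkeeping, only thread2 writes it

    def thread1(self):
        # Index 0 always exists: the list never shrinks below 500 elements.
        for _ in range(100_000):
            self.list[0] += 1

    def thread2(self):
        # Index 999 can disappear while thread4 is ahead of thread3.
        for _ in range(100_000):
            try:
                self.list[999] += 1
            except IndexError:
                self.index_errors += 1

    def thread3(self):
        for _ in range(500):
            self.list.append(0)

    def thread4(self):
        for _ in range(500):
            self.list.pop()


def run_example6():
    """Run all four methods on separate threads and summarize the outcome."""
    ex = Example6()
    targets = (ex.thread1, ex.thread2, ex.thread3, ex.thread4)
    threads = [threading.Thread(target=target) for target in targets]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return len(ex.list), ex.list[0], ex.index_errors


if __name__ == "__main__":
    print(run_example6())
```

With 500 appends and 500 pops, I would expect the final length to be 1_000 again and list[0] to reach 100_000. What thread2 sees depends on the interleaving, so the value at index 999 and the number of IndexErrors can differ from run to run.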
7. Parallel Access to Dictionaries Under Arbitrary Access
Adding two more threads to Example4:
    class Example7:
        def __init__(self):
            self.dict = {'x1': 0, 'x2': 0}
        def thread1(self):
            for _ in range(100_000):
                self.dict['x1'] += 1
        def thread2(self):
            for _ in range(100_000):
                self.dict['x2'] += 1
        def thread3(self):
            for i in range(100_000):
                self.dict[f"x{i}"] = 0
        def thread4(self):
            for i in range(100_000):
                del self.dict[f"x{i}"]
What do you think is a reasonable goal for a practical implementation?
- 7.1 thread1 and thread2 run their loop without any need for synchronization. thread3 and thread4 need to synchronize somehow before executing self.dict[f"x{i}"] = 0 and del self.dict[f"x{i}"] respectively.
- 7.2 All threads need to synchronize before accessing the dictionary.
- 7.3 thread1 and thread2 only need some minimal check that the dictionary is “consistent”, but thread3 and thread4 need to synchronize somehow before accessing the dictionary.
Back to My Original Question
So, with these examples in mind, let me restate my original question:
Where do you see Python going or want it to go with respect to its programming model and the guarantees it should provide in the presence of optimizing JIT compilers and free threading?