PEP 703: Making the Global Interpreter Lock Optional (3.12 updates)

I tried to do something like that above, but a) I didn’t test it very well and b) I didn’t explain it well. I think those of us who reach for multiprocessing on a regular basis [1] are a bit dazzled by the possibilities here and not taking the time to make an argument for general Python. :star_struck:

I don’t think the idea needs a keyword; it’s just a parallel version of itertools (a Python equivalent to Rust’s rayon). These functions don’t protect you from shooting yourself in the foot, but for lots of common tasks (like “iterate over a million things and do a thing to each one”) they should be a drop-in replacement.
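As a sketch of what I mean (the name pmap is made up here, and it is essentially the same parmap used in the test below), swapping out the builtin is the only change to the calling code:

from concurrent.futures import ThreadPoolExecutor

def pmap(fn, *iterables, workers=8):
    # Thread-backed stand-in for the builtin map: same call shape,
    # and results come back in input order.
    with ThreadPoolExecutor(workers) as pool:
        yield from pool.map(fn, *iterables)

serial = list(map(str.upper, ["a", "b", "c"]))
parallel = list(pmap(str.upper, ["a", "b", "c"]))
assert serial == parallel

On current CPython the threads mostly serialize on the GIL for pure-Python work like this; with nogil the same code can actually use all the cores.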

I realized my previous example was partially hamstrung by running on a 2-vCPU machine (which is only one physical core). On an n2d-standard-8 (4 physical cores), I can use the parmap defined in the previous post to parallelize a map operation and get about a 4x speedup vs Python 3.11.

Test details

I cloned and built nogil 3.12, and installed Python 3.11 from conda-forge (I could have cloned and built Python 3.12 as well, but this was simpler). In each I just ran an interpreter and defined:

>>> from concurrent.futures import ThreadPoolExecutor
>>> from time import time
>>> def parmap(fn, *args, n=8):
...     with ThreadPoolExecutor(n) as exc:
...         yield from exc.map(fn, *args)
...
>>> def a(n):
...     for i in range(n):
...         i += 1
...
>>> m = [1_000_000] * 100

Then I just timed the difference between list(map(a, m)) and list(parmap(a, m)).
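Each measurement can be taken with a one-liner like the following, using the time imported above (output omitted):

>>> t0 = time(); _ = list(map(a, m)); time() - t0      # serial baseline
>>> t0 = time(); _ = list(parmap(a, m)); time() - t0   # threaded version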

Results:

command               py311 (s)   nogil (s)   nogil / py311
a(1_000_000)          0.0214      0.0231      1.08
list(map(a, m))       2.123       2.577       1.21
list(parmap(a, m))    2.191       0.545       0.249

It’s curious that the serial map is 21% slower under nogil when a single function call is only 8% slower [2], but I hope this demonstration helps illustrate why nogil has the potential to be broadly useful.


  1. people using Python for science, finance, ML, etc.

  2. which is still nothing to ignore, for sure
