I tried to do something like that above, but (a) I didn’t test it very well and (b) I didn’t explain it well. I think those users who reach for multiprocessing on a regular basis [1] are a bit dazzled by the possibilities here and not taking the time to make an argument for general Python.
I don’t think the idea needs a keyword; it’s just a parallel version of itertools (a Python equivalent to Rust’s rayon). These functions don’t protect you from shooting yourself in the foot, but for lots of common tasks (like “iterate over a million things and do a thing to each one”) they should be a drop-in replacement.
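As a sketch of what “drop-in” could mean here (the name `par_map` is my own illustration, not an actual stdlib proposal; it mirrors the thread-pool approach used later in this post):

```python
from concurrent.futures import ThreadPoolExecutor

def par_map(fn, *iterables, n=8):
    # Same call shape as the builtin map(); with the GIL removed,
    # CPU-bound work in fn can actually run in parallel across threads.
    with ThreadPoolExecutor(n) as pool:
        yield from pool.map(fn, *iterables)

# Usage is identical to map():
squares = list(par_map(lambda x: x * x, range(10)))
```

`Executor.map` preserves input order, so results come back exactly as `map` would return them.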
I realized my previous example was partially hamstrung by using a 2-vCPU machine (which is one physical CPU). On an n2d-standard-8 (4 physical CPUs), I can use the parmap defined in the previous post to parallelize a map operation and get about a 4x speedup vs. Python 3.11.
Test details
I cloned nogil 3.12 and built it, and installed Python 3.11 from conda-forge (I could have cloned Python 3.12 and built that, but this was simpler). In each I just ran an interpreter and defined:
```python
>>> from concurrent.futures import ThreadPoolExecutor
>>> from time import time
>>> def parmap(fn, *args, n=8):
...     with ThreadPoolExecutor(n) as exc:
...         yield from exc.map(fn, *args)
...
>>> def a(n):
...     for i in range(n):
...         i += 1
...
>>> m = [1_000_000] * 100
```
Then I just timed the difference between `list(map(a, m))` and `list(parmap(a, m))`.
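The timing itself was nothing fancy; a minimal, self-contained version of it might look like this (with a much smaller workload than the post’s, so it finishes quickly; the exact loop I ran is not shown in the post):

```python
from concurrent.futures import ThreadPoolExecutor
from time import time

def parmap(fn, *args, n=8):
    with ThreadPoolExecutor(n) as exc:
        yield from exc.map(fn, *args)

def a(n):
    for i in range(n):
        i += 1

m = [10_000] * 20  # same shape as the post's workload, scaled down

t0 = time()
serial = list(map(a, m))       # sequential baseline
t_map = time() - t0

t0 = time()
parallel = list(parmap(a, m))  # thread-pool version
t_parmap = time() - t0

print(f"map: {t_map:.4f}s  parmap: {t_parmap:.4f}s")
```

On a stock (GIL-holding) interpreter the two times should be close, since `a` is pure-Python CPU work; the parallel win only shows up under nogil.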
Results:
| command | py311 (s) | nogil (s) | nogil / py311 |
|---|---|---|---|
| `a(1_000_000)` | 0.0214 | 0.0231 | 1.08 |
| `list(map(a, m))` | 2.123 | 2.577 | 1.21 |
| `list(parmap(a, m))` | 2.191 | 0.545 | 0.249 |
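To spell out where the “about 4x” figure comes from, compare Python 3.11’s sequential `map` against nogil’s `parmap` using the numbers in the table:

```python
# Headline numbers from the table above (times in seconds)
py311_map = 2.123      # list(map(a, m)) on Python 3.11
nogil_parmap = 0.545   # list(parmap(a, m)) on nogil 3.12

speedup = py311_map / nogil_parmap
print(f"{speedup:.2f}x")  # close to the ~4x claimed, on 4 physical CPUs
```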
It’s curious that the plain `map` is so much slower under nogil when a single function call is only 8% slower [2], but I hope this demonstration is somewhat useful for illustrating why nogil has the potential to be broadly useful.