I don’t believe that asking for real-world use case comparison between threads and multiprocessing is nonconstructive.
Nonconstructive is hoping that removing GIL will solve parallelization problems, or thinking that GIL is stopping us from solving parallel problems.
Read Mark Shannon comment: PEP 703: Making the Global Interpreter Lock Optional (3.12 updates) - #9 by markshannon
For the purposes of this discussion, let’s categorize parallel application into three groups:
- Data-store backed processing. All the shared data is stored in a data store, such a Postgres database, the processes share little or no data, communicating via the data store.
- Numerical processing (including machine learning) where the shared data is matrices of numbers.
- General data processing where the shared data is in the form of an in-memory object graph.
Python has always supported category 1, with multiprocessing or through some sort of external load balancer.
Category 2 is supported by multiple interpreters.
It is category 3 that benefits from NoGIL.How common is category 3?
All the motivating examples in PEP 703 are in category 2.
Look closely at performance overhead numbers. Also, we don’t know what the overhead will be in the algorithms that we hope to benefit from GIL removal, because I haven’t seen any number yet (If I have missed any number, please show me). In other words, would this benefit outnumber the performance overhead.
Frankly speaking, why should I have to pay more for something I don’t use? It’s like paying a toll for a road I never use. If it weren’t for the performance overhead, we wouldn’t be discussing this at all, and the PEP would have been accepted quietly.
If you can demonstrate a GIL removal design without any overhead, I would gladly put ‘NOGIL’ on a t-shirt and wear it proudly all year round.