Python - Python Interpreter with Multiprocessing & Multithread - Theoretical Clarification

vaejiang · April 9, 2022, 11:41pm

Hey Python Community,

I have two questions about Python concurrency that I would like clarified.

Task Description:

Let us say I set up two py scripts. Each script is running two IO-bound tasks (API Calls) with multithreading (Max Workers as 2).

Questions:

1: If I don’t use a virtual environment and run both scripts through the global Python interpreter (the one in the system-wide Python installation), does this make the task I described single-process and multithreaded, since we are using one interpreter (a single process) and have two scripts running a total of 4 threads?

2: If I use PyCharm to create two separate projects, each with its own Python interpreter, does that setup turn the task into multiprocess and multithreaded, since we have two Python interpreters running, each with two threads?

Thank you so much for your help, and I am looking forward to hearing from the community!

CAM-Gerlach · April 10, 2022, 12:24am

To address both of your questions: assuming you’re starting both scripts from the shell, systemd, etc., each script runs in its own completely independent process with two threads, and the two processes have nothing in common.

It makes no difference what Python executable you’re running from, whether it is the same location on disk, symlinked to the same (same virtualenv or system Python), symlinked to the same (two virtualenvs using the same base interpreter), or two totally different interpreter installations.

In memory, there are two independent copies of the interpreter running different scripts, which may or may not happen to be loaded from the same location on disk, but share no more connection than any other two processes you may launch by the same mechanism, unless you explicitly set one up just as you would do for anything else (with pipes, sockets, etc).
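A minimal sketch of this, assuming the two copies are started from a parent script with `subprocess` purely for demonstration (the names `CHILD_CODE` and `launch_twice` are hypothetical): the same interpreter executable is launched twice, and each launch reports its own process ID.

```python
import os
import subprocess
import sys

# Hypothetical child program: it only reports its own process ID.
CHILD_CODE = "import os; print(os.getpid())"


def launch_twice():
    """Start the same interpreter executable twice and return both child PIDs."""
    children = [
        subprocess.Popen(
            [sys.executable, "-c", CHILD_CODE],
            stdout=subprocess.PIPE,
            text=True,
        )
        for _ in range(2)
    ]
    # Both children are started before either one is waited on.
    return [int(child.communicate()[0]) for child in children]


if __name__ == "__main__":
    first, second = launch_twice()
    print("parent pid:", os.getpid())
    print("child pids:", first, second)
```

Both children come from the same file on disk (`sys.executable`), yet each gets its own PID, which is the sense in which they are independent processes.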

At least to a reasonable approximation, management of different Python interpreters on disk is almost completely orthogonal to communication between them in memory. You’re either going to be managing concurrency internally, inside your Python (or C) code in whatever process you launch (e.g. threading, multiprocessing, asyncio, concurrent.futures, subinterpreters, etc), or you’re going to be doing so outside of Python itself (using subprocess, or by launching multiple processes via the shell, systemd, etc, and then using sockets, pipes, D-Bus or various libraries to communicate between them). In the latter case, it doesn’t matter which interpreter you’re using unless you rely on a technique that requires binary compatibility between Pythons and data objects, e.g. pickle, rather than serializing to a common format like JSON or using some standard protocol. You generally want to avoid that anyway, and even then what actually matters isn’t using the same Python interpreter, but merely any compatible one.
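As a rough sketch of the "outside of Python itself" case, here is one way two processes could exchange data as JSON over pipes. The child program, the message shape (`numbers`, `total`) and the name `ask_child` are all assumptions made up for illustration.

```python
import json
import subprocess
import sys

# Hypothetical child program: reads one JSON request from stdin and
# writes one JSON reply to stdout.
CHILD_CODE = """
import json, sys
request = json.load(sys.stdin)
json.dump({"total": sum(request["numbers"])}, sys.stdout)
"""


def ask_child(numbers):
    """Send a list of numbers to a separate process and return its JSON reply."""
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=json.dumps({"numbers": numbers}),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)


if __name__ == "__main__":
    print(ask_child([1, 2, 3]))
```

Because only JSON text crosses the pipe, the child could be run by a different Python installation (or a different language entirely) without changing the parent.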

Do you mean threading in the standard library? This would be a good choice for I/O-bound tasks, as would asyncio, a ThreadPool, etc.
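For reference, a minimal sketch of what one such script might look like with a thread pool limited to two workers. The function `fake_api_call` is a hypothetical stand-in that just sleeps; a real script would make its API request there.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_api_call(name, delay=0.2):
    """Hypothetical stand-in for an I/O-bound API call: it only sleeps."""
    time.sleep(delay)
    return f"{name} done"


def run_tasks(names):
    """Run the calls on at most two worker threads, keeping input order."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        return list(pool.map(fake_api_call, names))


if __name__ == "__main__":
    print(run_tasks(["task-1", "task-2"]))
```

The worker threads all live inside the one process that runs this script; starting a second script like it creates a second process with its own pool.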

steven.daprano · April 10, 2022, 8:49am

If you run this command:

python script_one.py

to start the first script, and then wait for it to finish, then run:

python script_two.py

and wait for it to finish, then you have two processes, each with two threads, but only one process exists at a time.

If you arrange to run both scripts together, at the same time, then you have two processes with two threads each.

It doesn’t matter how you run them, whether you use PyCharm or not, or whether they run at the same time or separately: you have two processes, one for each script.

Blackward · April 11, 2022, 12:39am

Then you presumably create two independent APPLICATIONS.

To start/run an INSTANCE of such an application, you start/run an INSTANCE of the python interpreter providing it the name/path of your main “.py” file of said application.

Every application instance runs in its own process. But if your application does not spawn SUBprocesses and does not exchange data with other application instances either, we normally do not talk about “multiprocessing” (due to the independence). The term “multiprocessing” is typically used when two or more processes exchange data while they are running. Have a look at Python’s (multiprocessing) “Queue”, which is typically used for said data exchange / communication.

But if your applications (.py scripts) spawn one or more SUBthreads (e.g. as daemons), then we talk about “multithreading”. So the answer to your question might be: “multithreading”.

If you want to create IO-bound threads, have a look at Python’s “select” mechanism. It is quite helpful for IO-stuff.
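A small sketch of the standard library `select` module, assuming a local `socket.socketpair()` as a stand-in for a real network connection (the name `poll_socketpair` is hypothetical). `select.select` reports which sockets are ready to read, so the code can wait for data without blocking on `recv`.

```python
import select
import socket


def poll_socketpair(timeout=1.0):
    """Check readiness before and after data is sent on a local socket pair."""
    left, right = socket.socketpair()
    try:
        # Nothing has been sent yet, so a zero-timeout poll finds nothing to read.
        readable, _, _ = select.select([right], [], [], 0)
        ready_before = bool(readable)

        left.sendall(b"ping")

        # Wait up to `timeout` seconds for the socket to become readable.
        readable, _, _ = select.select([right], [], [], timeout)
        data = right.recv(1024) if readable else b""
        return ready_before, data
    finally:
        left.close()
        right.close()


if __name__ == "__main__":
    print(poll_socketpair())
```

With many sockets in the first list, a single thread can wait on all of them at once, which is an alternative to dedicating one thread per connection.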

Furthermore you could be interested in the following enhanced lists/queues - which simplify working with threading/multiprocessing:

Have Fun,
Cheers Dominik
