ThreadPoolExecutor parameter to request "one thread per logical CPU"

Now that we have builds with free-threading, maybe a “one thread per logical CPU” scheduling option for ThreadPoolExecutor would make sense for tackling problems that are CPU-bound.

Passing a true value on a non-free-threaded build would raise an exception.

If you don’t mind adding a dependency on psutil, you can do this right now without modifying the standard library or waiting for all your dependencies to support Python 3.15. Here’s how we do it in pytest-run-parallel, which falls back to the standard library’s APIs for the logical CPU count if psutil isn’t available.
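The detection logic is roughly along these lines (a sketch of the idea, not the exact pytest-run-parallel code):

import os

def get_logical_cpus():
    # Prefer psutil when it's installed; fall back to the standard library.
    try:
        import psutil
    except ImportError:
        pass
    else:
        count = psutil.cpu_count(logical=True)
        if count is not None:
            return count
    # os.process_cpu_count() (3.13+) respects CPU affinity;
    # os.cpu_count() is the older, affinity-unaware fallback.
    if hasattr(os, "process_cpu_count"):
        return os.process_cpu_count()
    return os.cpu_count()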

Sorry, I don’t see how it is related :sweat_smile:

What I’m thinking about is automatic scheduling that places a thread on a logical CPU that is not already running the task. It’s not actually about the number of workers.

I don’t understand how that’s different from this:

import concurrent.futures

# One worker thread per logical CPU.
with concurrent.futures.ThreadPoolExecutor(max_workers=get_logical_cpus()) as tpe:
    for _ in range(10000):
        tpe.submit(task)

where task is some worker function and I’m ignoring arguments to the task function for the sake of brevity. Won’t that supply work to the thread pool until the task queue is exhausted?

Maybe I don’t understand how ThreadPoolExecutor really works, but I believe the max_workers count says nothing about where a thread runs. It just says that, at any given time, there are at most max_workers threads. That’s how it is on non-free-threaded builds, where only one logical CPU is used.

That isn’t quite true. For pure Python code that always holds the GIL, only one logical CPU is ever used at any given time, but ThreadPoolExecutor does launch separate OS threads that each independently run Python code. Whether or not the code actually runs on separate CPUs comes down to details of the CPU affinity and the OS scheduler.

If you call into code that releases the GIL, then you can get multi-CPU parallelism, even on the GIL-enabled build.

The free-threaded build doesn’t have a GIL, so even pure Python code can run on multiple CPUs simultaneously.
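As a quick illustration (a minimal sketch; the numbers are arbitrary and timings will vary with your hardware), here is a pure-Python CPU-bound task fanned out across threads. On a GIL-enabled build the four calls effectively serialize; on a free-threaded build they can run on four CPUs at once:

import concurrent.futures
import sys
import time

def count_down(n=5_000_000):
    # Pure-Python busy loop: CPU-bound, holds the GIL on GIL-enabled builds.
    while n:
        n -= 1

start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as tpe:
    for _ in range(4):
        tpe.submit(count_down)
elapsed = time.perf_counter() - start

# sys._is_gil_enabled() exists on 3.13+; assume the GIL on older versions.
gil = getattr(sys, "_is_gil_enabled", lambda: True)()
print(f"GIL enabled: {gil}, elapsed: {elapsed:.2f}s")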

It sounds like you haven’t actually experimented with ThreadPoolExecutor on the free-threaded build. Maybe give it a try? Here’s an example based on generating the Mandelbrot set with a pure Python generator: Mandelbrot threads - Python Free-Threading Guide

Yes, you are correct that I haven’t experimented with free-threading builds.

Does ThreadPoolExecutor on free-threaded builds make sure not to start a thread on a logical CPU that is already running a thread for the same task while another logical CPU is idle?


Sorry, I see you already noted: “Whether or not the code actually runs on separate CPUs comes down to details of the CPU affinity and the OS scheduler.”

Maybe this is something Python should not take care of. It’s something the OS must deal with.


There seems to be a bit of a misunderstanding about how multiple threads and processes work. You can have a single-core CPU and still have multiple threads and processes, but the scheduler makes them take turns. If you have multiple cores, then more than one can run at a time, but the scheduler still makes them take turns on the available cores.

The actual number of processes and threads is generally a lot larger than the number of cores. For example, here on Linux I currently have 253 processes and (I think…) 743 threads running on a machine that only has 4 cores, and yet CPU load is at 10% while I type into this website. Most of those threads are sleeping, and the scheduler takes care of dispatching them to different cores when actually needed.

The allocation of “this thread on that core” is usually only a short-lived thing, and I think it’s not really something that ThreadPoolExecutor has much influence over besides just choosing how many threads to create.
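To see that oversubscription in action (an illustrative sketch; the task and counts are made up), hand a pool far more mostly-sleeping tasks than you have cores and the OS time-slices them without trouble:

import concurrent.futures
import os
import time

def mostly_sleeping(i):
    # Like most real-world threads: asleep nearly all of the time.
    time.sleep(0.1)
    return i

print(f"cores: {os.cpu_count()}")
# Many more worker threads than cores; the scheduler interleaves them.
with concurrent.futures.ThreadPoolExecutor(max_workers=64) as tpe:
    results = list(tpe.map(mostly_sleeping, range(256)))
print(f"completed {len(results)} tasks")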


os.process_cpu_count() could be useful.
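For example (os.process_cpu_count() was added in Python 3.13, and unlike os.cpu_count() it respects the CPU affinity of the current process):

import os

# Logical CPUs this process is actually allowed to run on (3.13+);
# may be None if the count can't be determined.
workers = os.process_cpu_count() or 1
print(workers)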

Indeed, you will get better performance if you leave scheduling to the OS.