ThreadPoolExecutor parameter to request "one thread per logical CPU"

Now that we have builds with free-threading, maybe a “one thread per logical CPU” scheduling option for ThreadPoolExecutor would make sense for tackling problems that are CPU-bound.

Passing a true value on a non-free-threaded build would raise an exception.

If you don’t mind adding a dependency on psutil, you can do this right now without modifying the standard library or waiting for all your dependencies to support Python 3.15. Here’s how we do it in pytest-run-parallel, which falls back to the standard library’s APIs for the logical CPU count if psutil isn’t available.
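The detection logic is roughly along these lines (a sketch of the idea, not the exact pytest-run-parallel code):

import os

def get_logical_cpus():
    # Prefer psutil when it's installed; fall back to the standard library.
    try:
        import psutil
    except ImportError:
        pass
    else:
        count = psutil.cpu_count(logical=True)
        if count is not None:
            return count
    # os.process_cpu_count() (3.13+) respects CPU affinity;
    # os.cpu_count() is the older, affinity-unaware fallback.
    if hasattr(os, "process_cpu_count"):
        return os.process_cpu_count()
    return os.cpu_count()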

Sorry, I don’t see how it is related :sweat_smile:

What I’m thinking about is automatic scheduling that places a thread on a logical CPU that is not already running the task. It’s not actually about the number of workers.

I don’t understand how that’s different from this:

import concurrent.futures

# One worker thread per logical CPU.
with concurrent.futures.ThreadPoolExecutor(max_workers=get_logical_cpus()) as tpe:
    for _ in range(10000):
        tpe.submit(task)

where task is some worker function and I’m ignoring arguments to the task function for the sake of brevity. Won’t that supply work to the thread pool until the task queue is exhausted?

Maybe I don’t understand how ThreadPoolExecutor really works, but I believe the max_workers count says nothing about where a thread runs. It just says that, at any given time, there are at most max_workers threads. That’s how it is on non-free-threaded builds, where only one logical CPU is used.

That isn’t quite true. For pure Python code that always holds the GIL, only one logical CPU is ever used at any given time, but ThreadPoolExecutor does launch separate OS threads that each independently run Python code. Whether or not the code actually runs on separate CPUs comes down to details of the CPU affinity and the OS scheduler.

If you call into code that releases the GIL, then you can get multi-CPU parallelism, even on the GIL-enabled build.

The free-threaded build doesn’t have a GIL, so even pure Python code can run on multiple CPUs simultaneously.
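As a quick illustration (a minimal sketch; the numbers are arbitrary and timings will vary with your hardware), here is a pure-Python CPU-bound task fanned out across threads. On a GIL-enabled build the four calls effectively serialize; on a free-threaded build they can run on four CPUs at once:

import concurrent.futures
import sys
import time

def count_down(n=5_000_000):
    # Pure-Python busy loop: CPU-bound, holds the GIL on GIL-enabled builds.
    while n:
        n -= 1

start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as tpe:
    for _ in range(4):
        tpe.submit(count_down)
elapsed = time.perf_counter() - start

# sys._is_gil_enabled() exists on 3.13+; assume the GIL on older versions.
gil = getattr(sys, "_is_gil_enabled", lambda: True)()
print(f"GIL enabled: {gil}, elapsed: {elapsed:.2f}s")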

It sounds like you haven’t actually experimented with ThreadPoolExecutor on the free-threaded build. Maybe give it a try? Here’s an example based on generating the Mandelbrot set with a pure Python generator: Mandelbrot threads - Python Free-Threading Guide

Yes, you are correct that I haven’t experimented with free-threading builds.

Does ThreadPoolExecutor on free-threaded builds make sure not to start a thread on a logical CPU that is already running a thread for the same task while another logical CPU is idle?


Sorry, I see you already noted: “Whether or not the code actually runs on separate CPUs comes down to details of the CPU affinity and the OS scheduler.”

Maybe this is something Python should not take care of. It’s something the OS must deal with.


There seems to be a bit of a misunderstanding about how multiple threads and processes work. You can have a single-core CPU and still have multiple threads and processes, but the scheduler makes them take turns. If you have multiple cores, then more than one can run at a time, but the scheduler still makes them take turns on the available cores.

The actual number of processes and threads is generally a lot larger than the number of cores. For example, here on Linux I currently have 253 processes and (I think…) 743 threads running on a machine that only has 4 cores, and yet CPU load is at 10% while I type into this website. Most of those threads are sleeping, and the scheduler takes care of dispatching them to different cores when actually needed.

The allocation of “this thread on that core” is usually only a short-lived thing, and I think it’s not really something that ThreadPoolExecutor has much influence over besides just choosing how many threads to create.
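To see that oversubscription in action (an illustrative sketch; the task and counts are made up), hand a pool far more mostly-sleeping tasks than you have cores and the OS time-slices them without trouble:

import concurrent.futures
import os
import time

def mostly_sleeping(i):
    # Like most real-world threads: asleep nearly all of the time.
    time.sleep(0.1)
    return i

print(f"cores: {os.cpu_count()}")
# Many more worker threads than cores; the scheduler interleaves them.
with concurrent.futures.ThreadPoolExecutor(max_workers=64) as tpe:
    results = list(tpe.map(mostly_sleeping, range(256)))
print(f"completed {len(results)} tasks")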


os.process_cpu_count() could be useful.
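For example (os.process_cpu_count() was added in Python 3.13, and unlike os.cpu_count() it respects the CPU affinity of the current process):

import os

# Logical CPUs this process is actually allowed to run on (3.13+);
# may be None if the count can't be determined.
workers = os.process_cpu_count() or 1
print(workers)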

Indeed, you will get better performance if you leave scheduling to the OS.