ThreadPoolExecutor.map with buffersize returns truncated iterable under certain circumstances

Hi folks!

I am wondering why my first example, ThreadPoolExecutor(max_workers=10).map(fn, iterable, buffersize=20), returns only 20 of the 100 mapped items. Using CPython 3.14.0.

Looking at the implementation of map(), there is some weakref wizardry. Is the reason for the truncated iterable that the executor has already been garbage collected by the time the buffer needs to be appended to? https://github.com/python/cpython/blob/a005835f699b5ba44beb8c856db1f62454522e1e/Lib/concurrent/futures/_base.py#L633

Is this a CPython bug perhaps? Or expected behavior? Or am I holding it wrong?

import concurrent.futures


def get_results():
    ints = range(100)
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        return executor.map(str, ints, buffersize=20)

print(len(list(get_results())))  # -> 20 (???)


def get_results_without_buffersize():
    ints = range(100)
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        return executor.map(str, ints)


print(len(list(get_results_without_buffersize())))  # -> 100

def get_results_2():
    ints = range(100)
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        yield from executor.map(str, ints, buffersize=20)

print(len(list(get_results_2())))  # -> 100

def get_results_3():
    ints = range(100)
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        print(len(list(executor.map(str, ints, buffersize=20))))

get_results_3()  # -> 100

What happens if you collect all the results inside the context manager? I suspect this is because the function returns before all the work is finished, and the thread pool has already been shut down.
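Something like this, say (just a sketch; the function name is made up, the point is consuming the map before the with block exits):

import concurrent.futures


def get_results_collected():
    ints = range(100)
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        # Consume the map inside the with block, so every task is submitted
        # and finished before the executor shuts down.
        return list(executor.map(str, ints, buffersize=20))


print(len(get_results_collected()))  # expect 100 if the early shutdown is the cause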

That works as expected, see last example.


Oh yeah :person_facepalming: sorry for not reading through. So I think you fully identified the issue. I wouldn’t call it a bug in Python; exiting from a context manager early is going to do that.
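For context, exiting the with block is documented to behave like calling shutdown(wait=True), and as I understand it a buffered map() only submits enough tasks to fill the buffer before you start consuming it. Unrolled roughly as a sketch:

import concurrent.futures


def get_results_unrolled():
    ints = range(100)
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=10)
    try:
        # With buffersize, map() submits work lazily: roughly buffersize
        # tasks up front, and more only as results are consumed.
        return executor.map(str, ints, buffersize=20)
    finally:
        # This is what exiting the with block does: shutdown(wait=True),
        # which can only wait for tasks that were already submitted.
        executor.shutdown(wait=True)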

Not sure. Removing the buffersize argument makes the example work as expected (added this example above too):

def get_results_without_buffersize():
    ints = range(100)
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        return executor.map(str, ints)


print(len(list(get_results_without_buffersize())))  # -> 100

This might just be a race, because str() on an int is so fast. You could try a slower function (e.g. something with a time.sleep in it) and see how many results you get.
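Something along these lines (a sketch; the 0.1s sleep is arbitrary):

import concurrent.futures
import time


def slow_str(i):
    time.sleep(0.1)  # make each task noticeably slower than plain str()
    return str(i)


def get_results_slow():
    ints = range(100)
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        return executor.map(slow_str, ints, buffersize=20)


print(len(list(get_results_slow())))  # how many results survive?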

Hm, I just tested this myself (on 3.11) and it does seem to block until the whole map is complete… so perhaps this is indeed a bug…
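E.g. something like this shows the blocking (a sketch, not necessarily what I ran; slow_str and the timing print are just for illustration):

import concurrent.futures
import time


def slow_str(i):
    time.sleep(0.1)
    return str(i)


def get_results_timed():
    ints = range(100)
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        # No buffersize here, so every task is submitted up front and
        # exiting the with block waits for all of them to finish.
        return executor.map(slow_str, ints)


start = time.monotonic()
results = get_results_timed()
print(f"returned after {time.monotonic() - start:.1f}s")  # close to the full runtime
print(len(list(results)))  # -> 100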