I am wondering why my first example, ThreadPoolExecutor(max_workers=10).map(fn, iterable, buffersize=20), yields only 20 of the 100 mapped items. I'm using CPython 3.14.0.
What happens if you collect all the results inside the context manager? I suspect this is because the function returns before finishing all the work, and the thread pool is closed.
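To make that suggestion concrete, here is a minimal sketch of draining the iterator before the with block exits. The helper name collect_inside and the MAP_KWARGS version guard are made up for illustration; the guard is only there because buffersize exists from Python 3.14 onward.

```python
import sys
from concurrent.futures import ThreadPoolExecutor

# buffersize was added in Python 3.14; this guard (an addition for
# illustration) lets the sketch run on older interpreters too.
MAP_KWARGS = {"buffersize": 20} if sys.version_info >= (3, 14) else {}


def collect_inside(n):
    """Hypothetical helper: map str over range(n), consuming inside the with block."""
    with ThreadPoolExecutor(max_workers=10) as executor:
        # list() drains the iterator while the executor is still open,
        # so every item is scheduled before shutdown.
        return list(executor.map(str, range(n), **MAP_KWARGS))


results = collect_inside(100)
print(len(results))
```

If this gives all 100 items while the original example does not, the difference comes from where the iterator is consumed rather than from the mapped function.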
Oh yeah, sorry for not reading through. So I think you fully identified the issue. I wouldn’t call it a bug in Python: exiting from a context manager early is going to do that.
This might just be a race, because str([int]) is so fast. You could try with a slower function (e.g. something with a time.sleep in it) and see how many results you get.
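A rough way to check the race hypothesis, assuming the original example returned the iterator from inside a function (the thread does not show it). The names slow_str and mapped_after_exit are hypothetical, and the 10 ms sleep is an arbitrary choice.

```python
import sys
import time
from concurrent.futures import ThreadPoolExecutor

# buffersize was added in Python 3.14; omitted on older interpreters
# (this guard is an addition for illustration).
MAP_KWARGS = {"buffersize": 20} if sys.version_info >= (3, 14) else {}


def slow_str(x):
    """Hypothetical slow stand-in for str."""
    time.sleep(0.01)
    return str(x)


def mapped_after_exit(n):
    """Return the map iterator without consuming it inside the with block."""
    with ThreadPoolExecutor(max_workers=10) as executor:
        return executor.map(slow_str, range(n), **MAP_KWARGS)


# Consume only after the executor has been shut down, then count the results.
count = sum(1 for _ in mapped_after_exit(100))
print(f"got {count} of 100 results")
```

If the count does not change between str and slow_str, timing is probably not what decides how many items come back.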
Indeed, this silent truncation you spotted is misleading. It happens when the iteration over the generator is not started within the with block.
Calling next for the first time while the executor is still open (inside the with block) makes the generator hold a strong reference to it, so a later next after shutdown (outside the with block) raises explicitly:
from concurrent.futures import ThreadPoolExecutor

ints = range(100)
with ThreadPoolExecutor(max_workers=10) as executor:
    it = executor.map(str, ints, buffersize=20)
    # first next() happens while the executor is still open
    assert next(it) == "0"
# raises "RuntimeError: cannot schedule new futures after shutdown"
next(it)
We should remove the weakref wizardry you mentioned, so that the generator holds a strong ref from the moment it is created, not only once the iteration starts.
Perhaps it would also be more intuitive to yield all results from the buffer before raising? The behavior would be:
from concurrent.futures import ThreadPoolExecutor

ints = range(100)
with ThreadPoolExecutor(max_workers=10) as executor:
    it = executor.map(str, ints, buffersize=20)
# buffered results available
assert next(it) == "0"
assert next(it) == "1"
assert next(it) == "2"
...
assert next(it) == "19"
# raises "RuntimeError: cannot schedule new futures after shutdown"
next(it)
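Whichever behavior is chosen, user code can sidestep the question by keeping the with block open for as long as results are being consumed, for instance by wrapping the call in a generator function. A minimal sketch, where lazy_map and the MAP_KWARGS version guard are made-up names for illustration:

```python
import sys
from concurrent.futures import ThreadPoolExecutor

# buffersize was added in Python 3.14; omitted on older interpreters
# (this guard is an addition for illustration).
MAP_KWARGS = {"buffersize": 20} if sys.version_info >= (3, 14) else {}


def lazy_map(fn, iterable, max_workers=10):
    """Hypothetical wrapper: the executor stays open until iteration ends."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # The with block is only exited once the caller has exhausted
        # (or closed) this generator, so new tasks can keep being scheduled.
        yield from executor.map(fn, iterable, **MAP_KWARGS)


results = list(lazy_map(str, range(100)))
print(len(results))
```

One caveat with this pattern: if the caller abandons the generator early, the executor is only shut down when the generator is closed or garbage collected.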