I think this is a reasonable suggestion and will put it on my list to add to CPython 3.12. Prior to that, I can add a recipe to the docs, perhaps something like:
    from itertools import islice

    def batched(iterable, n):
        it = iter(iterable)
        while (batch := list(islice(it, n))):
            yield batch
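Restating the recipe with its import so it runs as-is, plus a quick check of the splitting behavior:

```python
from itertools import islice

def batched(iterable, n):
    it = iter(iterable)
    while (batch := list(islice(it, n))):
        yield batch

print(list(batched('ABCDEFG', 3)))  # [['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]
```

The final batch is shorter when the input length is not an exact multiple of n.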
Will put some thought into the API:
- Possible names are “chunked”, “batch”, and “batched”. The word “chunked” is familiar because of its use in HTTP headers and in `more-itertools` (though some of that module’s name choices are questionable). “Batch” is nice because it is defined as “a quantity or consignment of goods produced at one time”. The name “batched” is more consistent with “sorted” and “reversed”.
- Possible chunk data types are tuple, list, or iterator. A tuple makes more sense when the input size is an exact multiple of the chunk size. Except for the case of `*args`, we mostly don’t use tuples for variable-length outputs. A list better communicates possible variable width, and it is convenient when the result list needs to be altered. An iterator is a better fit with the theme of the module, but as `itertools.groupby()` has shown, it is awkward to work with. Also, the data will already be in memory, so using an iterator wouldn’t save memory even if the chunk size is huge.
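To make the case for lists concrete: because each batch is a list, a caller can alter it in place, for example by padding a short final batch (the padding value and batch size here are just for illustration; a tuple-based version would force building a new object):

```python
from itertools import islice

def batched(iterable, n):
    it = iter(iterable)
    while (batch := list(islice(it, n))):
        yield batch

padded = []
for batch in batched(range(7), 3):
    # lists can be extended in place; tuples would require a copy
    batch.extend([None] * (3 - len(batch)))
    padded.append(batch)
print(padded)  # [[0, 1, 2], [3, 4, 5], [6, None, None]]
```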
- There may be a use case for a zip-like argument `strict=True` to enforce even-sized outputs when that is what is expected.
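By analogy with `zip(..., strict=True)`, such a flag might look like the following sketch (the parameter name and error message are assumptions, not a settled API):

```python
from itertools import islice

def batched(iterable, n, *, strict=False):
    it = iter(iterable)
    while (batch := list(islice(it, n))):
        if strict and len(batch) != n:
            raise ValueError('batched(): incomplete batch')
        yield batch
```

With `strict=True`, a seven-element input and `n=3` would raise `ValueError` on the final one-element batch instead of yielding it.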
- The norm is for itertools to terminate when the input iterable is exhausted. However, a user may need to force the batched iterator instance to flush a partial batch. We may need to support `batcher_instance.throw(Flush)` so they have a way to get their data without waiting on a potentially infinite input iterator to terminate.
- We need to limit n to ints or objects with `__index__`. The value must be greater than zero. Also, we should think about what happens when a user inevitably sets n to an unreasonably large size and posts a bug report saying, “it hangs”.
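Validation along those lines could go through `operator.index()`, the same hook used elsewhere in the stdlib for integer-like arguments (the error message is illustrative). Note that with a generator the check is deferred until first iteration, which is itself an API wrinkle to consider:

```python
import operator
from itertools import islice

def batched(iterable, n):
    n = operator.index(n)   # accept ints and anything implementing __index__
    if n < 1:
        raise ValueError('n must be at least one')
    it = iter(iterable)
    while (batch := list(islice(it, n))):
        yield batch
```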
- The argument order can be `(iterable, n)` or `(n, iterable)`. The former matches the argument order for `islice` and for the combinatoric itertools. We have some experience with the latter in the case of `heapq.nsmallest`, and it does have some advantages. For example, it works well with partial function evaluation: `dozens = partial(batched, 12)`. Also, it reads better when the iterable argument is a large expression: `batched((expr(var) for var in someiterable if somecondition(var)), 64)`. In that example, the `64` dangles too far away from the `batched` call that it modifies. We’ve seen this problem in the wild with `random.sample`, for example. Making n a keyword-only argument might help.
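A keyword-only n would keep the iterable as the sole positional argument while still supporting the partial-application idiom mentioned above (the keyword-only signature is a sketch, not a settled API):

```python
from functools import partial
from itertools import islice

def batched(iterable, *, n):
    # keyword-only n: batched(data, n=3) reads clearly even when the
    # iterable argument is a long generator expression
    it = iter(iterable)
    while (batch := list(islice(it, n))):
        yield batch

dozens = partial(batched, n=12)     # partial application still works
print(list(dozens(range(26))))      # two full dozens plus a batch of two
```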