Add batching function to itertools module

But dozens = partial(batched, n=12) is also fine, surely? And if not, dozens = lambda it: batched(it, 12) would be just as good.

I don’t really have a preference regarding argument order, myself.

Maybe there should be an option to raise an exception when such an “odd-lot” final batch is detected; in your de-serialisation of fixed-length records example, it would indicate an error. zip() has the strict argument for a similar reason.
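A minimal sketch of what such an option could look like. Note that the name batched_strict and its exact semantics are my invention here, simply mirroring the behaviour of zip(strict=True):

```python
from itertools import islice

def batched_strict(iterable, n):
    """Yield successive n-tuples; raise if the final batch is short.

    Hypothetical sketch only -- "strict" here just mirrors
    zip(strict=True) for odd-lot detection.
    """
    it = iter(iterable)
    while batch := tuple(islice(it, n)):
        if len(batch) != n:
            raise ValueError("batched_strict(): incomplete final batch")
        yield batch
```

With fixed-length records, a short final batch then fails loudly instead of silently producing a truncated record.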

Analyzing a Python implementation that nobody has written yet is admittedly difficult; nonetheless I have a few objections.

  1. First of all, I don’t see why one would use such a method to count the number of batches instead of this:

    def num_batches(tot_len, batch_len):
        # Ceiling division: how many batches of batch_len cover tot_len items.
        return 1 + (tot_len - 1) // batch_len
    

    The tot_len is either known in advance or can be computed with

    def count(iterable):
        """Return the number of items in an iterator."""
        n = 0
        for _ in iterable:
            n += 1
        return n
    
    # Or count the batches directly, with no need for tot_len:
    count(batched(iterable, n))
    
  2. The problem in your example

    len(list(batched(iterable, n)))
    #   ^^^^ this is the problem
    

    is that you are forced to materialize all sub-iterators in a list because iterators generally do not implement __len__. This issue is addressed in the previous point, or it can be worked around by implementing __len__ on the object returned by batched (when the length of the underlying iterable is known), so that len(batched(iterable, n)) becomes valid.

    Now, while it’s true that this example causes buffering, I don’t see it as batched’s fault, but rather list’s: you are collecting things in a list just to count them, and even setting aside the discussion about batched, that is clearly sub-optimal.

  3. I don’t think the Rust version would cause buffering in a similar situation, if I understand it correctly, unless one unnecessarily collects the sub-iterators in a Vec. The analogous code would be the following:

    // NB: with the itertools crate, chunks() returns IntoChunks, which is
    // not itself an Iterator; it must be converted into one before counting.
    let batches = iterator.chunks(5);
    batches.into_iter().count()
    

    Thanks to the deterministic nature of Drop, at any time during the iteration (which occurs implicitly in the count method) there is a single sub-iterator in scope; hence no buffering occurs because, again,

    it only buffers if several chunk iterators are alive at the same time.

    Achieving the same behavior in Python might be harder because of the “non-determinism” of the garbage collector, which can defer the destruction of objects beyond their last point of use (in CPython, reference counting usually destroys objects promptly, but that is an implementation detail); however, I’m sure there are ways to achieve the same effect in Python.
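To make point 2 concrete, here is a hypothetical wrapper (sized_batched is my name; the inner _batched is just a pure-Python sketch of the proposed function) that supports len() whenever the source does, so counting the batches requires no materialization at all:

```python
from itertools import islice

def _batched(iterable, n):
    # Pure-Python sketch of the proposed batched().
    it = iter(iterable)
    while batch := tuple(islice(it, n)):
        yield batch

class sized_batched:
    """Hypothetical: batched() plus __len__ when the source is sized."""

    def __init__(self, iterable, n):
        self._iterable, self._n = iterable, n

    def __iter__(self):
        return _batched(self._iterable, self._n)

    def __len__(self):
        # Ceiling division over the (known) source length.
        return 1 + (len(self._iterable) - 1) // self._n

b = sized_batched(range(10), 3)
len(b)   # 4, computed without iterating at all
list(b)  # [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9,)]
```

This only works when the underlying iterable itself supports len(); for a bare iterator you are back to counting by consumption.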

Let me rephrase what I said earlier to be more precise then:

in every circumstance where your iterator version is sufficient, this other implementation is equivalent in terms of memory usage (no buffering, no extra allocation), unless one unnecessarily keeps alive more than one sub-iterator.
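The Python analogue of the Rust chunks(...).count() pattern can be sketched with the same pure-Python recipe (batched here is my stand-in for the proposed function): each batch is discarded as soon as it is counted, so only one batch is ever alive at a time and memory use stays bounded by the batch size.

```python
from itertools import islice

def batched(iterable, n):
    # Pure-Python sketch of the proposed itertools.batched().
    it = iter(iterable)
    while batch := tuple(islice(it, n)):
        yield batch

def count_batches(iterable, n):
    # Each batch tuple becomes garbage immediately after being counted:
    # at no point is more than one batch alive, so nothing is buffered.
    return sum(1 for _ in batched(iterable, n))

count_batches(iter(range(100)), 7)  # 15, without building any list
```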