Proposal: Add a smart chunking utility (like chunked()) to itertools or the stdlib

Hi All,

I’d like to propose adding a general-purpose, composable chunking utility to the Python standard library (possibly under itertools). The goal is to cover a wide range of real-world chunking needs like fixed-size grouping, sliding windows, conditional filtering, and chunk selection logic.

Project Info:

🔧 Features:

  • Fixed-size chunking
  • Sliding window (via stride)
  • nth_position and chunk_position for advanced selection
  • Optional padding for incomplete chunks
  • Filter function to keep only qualifying chunks
  • Materialize as generator or list

Example usage:


```python
from smartchunks import chunked

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Basic chunking
print(list(chunked(data, size=3)))
# → [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Sliding window
print(list(chunked(data, size=3, stride=1)))
# → [[1, 2, 3], [2, 3, 4], ..., [7, 8, 9]]

# Advanced selection
print(list(chunked(data, size=2, nth_position=2)))
# → [[1, 3], [5, 7], [9]]
```
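For comparison, the three examples above can be reproduced with the standard library alone. This is a sketch that assumes nth_position=2 means "keep every 2nd item, then chunk", which is what the example output suggests. The helper names chunks and sliding_window are hypothetical; the latter is modeled on the sliding_window recipe in the itertools documentation.

```python
from collections import deque
from itertools import islice

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]


def chunks(iterable, size):
    # Consecutive lists of up to `size` items; the last one may be shorter.
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch


def sliding_window(iterable, n):
    # Overlapping windows with stride 1: a deque with maxlen=n drops the
    # oldest item each time a new one is appended.
    it = iter(iterable)
    window = deque(islice(it, n - 1), maxlen=n)
    for x in it:
        window.append(x)
        yield list(window)


print(list(chunks(data, 3)))
# → [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(list(sliding_window(data, 3)))
# 7 windows, from [1, 2, 3] through [7, 8, 9]
print(list(chunks(islice(data, None, None, 2), 2)))  # every 2nd item, then chunk
# → [[1, 3], [5, 7], [9]]
```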

Would love to hear your feedback on whether this could belong in the standard library, perhaps as an enhancement to itertools.

Thanks so much!
– Maurya Allimuthu

I’m against it.

The tricky cases are not an API that’s sensible to use, and the easy cases are easy combinations of zip, iter, and tee.
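To illustrate the point, here is a minimal sketch of the easy cases built from zip, iter, and tee (plus zip_longest for padding). The function names fixed_chunks, padded_chunks, and sliding are hypothetical; note that these idioms yield tuples rather than lists.

```python
from itertools import tee, zip_longest

data = [1, 2, 3, 4, 5, 6, 7, 8]


def fixed_chunks(iterable, n):
    # n references to the *same* iterator, so each zip tuple takes n
    # consecutive items; zip stops early, dropping an incomplete last chunk.
    return zip(*[iter(iterable)] * n)


def padded_chunks(iterable, n, fillvalue=None):
    # zip_longest keeps the incomplete last chunk and pads it with fillvalue.
    return zip_longest(*[iter(iterable)] * n, fillvalue=fillvalue)


def sliding(iterable, n):
    # tee gives n independent iterators; advance the i-th one by i items,
    # then zip them together to get overlapping windows (stride 1).
    iterators = tee(iterable, n)
    for i, it in enumerate(iterators):
        for _ in range(i):
            next(it, None)
    return zip(*iterators)


print(list(fixed_chunks(data, 3)))   # → [(1, 2, 3), (4, 5, 6)]
print(list(padded_chunks(data, 3)))  # → [(1, 2, 3), (4, 5, 6), (7, 8, None)]
print(list(sliding(data, 3)))
# → [(1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 7), (6, 7, 8)]
```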


Did you check more-itertools? It already has:

  • chunked
  • windowed
  • stagger


Thanks, Ronny and Neil, I appreciate your quick feedback 🙏

@Neil Girdhar:
Yes, I reviewed more_itertools. It’s fantastic, and I agree that chunked, windowed, and stagger cover a lot of ground.
Where smartchunks expands beyond that is in pipeline logic:

  • nth_position and chunk_position: useful for selective time-series sampling (e.g., telemetry, logs)
  • stride + filter_fn: mimics overlapping windowed views but supports filtering out irrelevant chunks
  • apply_nth_before_chunk: lets you reorder the pipeline, e.g. map → chunk vs. chunk → map
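To make the ordering question concrete, here is a sketch of how sampling, chunking, chunk selection, and filtering compose with plain itertools. It only approximates what nth_position, chunk_position, apply_nth_before_chunk, and filter_fn might do; their exact semantics in smartchunks are assumed here, and the chunks helper is hypothetical.

```python
from itertools import islice

data = list(range(1, 13))  # [1, 2, ..., 12]


def chunks(iterable, size):
    # Consecutive lists of up to `size` items; the last one may be shorter.
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch


# Sample every 2nd item first, then chunk ("nth before chunk").
sample_then_chunk = list(chunks(islice(data, None, None, 2), 3))
# → [[1, 3, 5], [7, 9, 11]]

# Chunk first, then keep every 2nd chunk ("chunk before nth").
chunk_then_sample = list(islice(chunks(data, 3), None, None, 2))
# → [[1, 2, 3], [7, 8, 9]]

# A filter_fn-style step is just filter() over the chunk stream.
big_chunks = list(filter(lambda c: max(c) > 6, chunks(data, 3)))
# → [[7, 8, 9], [10, 11, 12]]
```

Swapping the order is a matter of swapping which call wraps which, so no extra flag is needed for it.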

So it’s not meant to replace the basics, but to cover cases where chunk filtering and logic routing matter.

@Ronny Pfannschmidt:
Totally fair: the easy cases should stay easy (and zip/tee are still king there).
The more expressive options are inspired by real-world usage in logs, NLP batching, error detection pipelines, etc.

That said, I’d be open to a simpler version like chunked_plus() with a few extra knobs:

  • stride
  • optional filter
  • optional pad
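As a rough sketch of that simpler direction, here is one possible chunked_plus() with only the stride, filter, and pad knobs. The signature and semantics (keyword-only stride, filter_fn, pad, fillvalue; stride defaulting to size; a short tail kept only if it contains items no full window already covered) are assumptions for illustration, not smartchunks’ actual API.

```python
def _windows(items, size, stride, pad, fillvalue):
    # Yield items[start:start + size] for start = 0, stride, 2 * stride, ...
    covered = 0  # index just past the last item placed in a full window
    for start in range(0, len(items), stride):
        chunk = items[start:start + size]
        if len(chunk) == size:
            covered = start + size
            yield chunk
            continue
        # Incomplete tail: keep it only if it holds items no full window covered.
        if len(items) > covered:
            if pad:
                chunk = chunk + [fillvalue] * (size - len(chunk))
            yield chunk
        break


def chunked_plus(iterable, size, *, stride=None, filter_fn=None,
                 pad=False, fillvalue=None):
    """Hypothetical sketch of chunked_plus() with stride, filter and pad knobs."""
    if stride is None:
        stride = size  # default: non-overlapping chunks
    if size < 1 or stride < 1:
        raise ValueError("size and stride must be positive")
    # Simplification: the input is materialized into a list up front.
    chunks = _windows(list(iterable), size, stride, pad, fillvalue)
    return filter(filter_fn, chunks) if filter_fn is not None else chunks


data = [1, 2, 3, 4, 5, 6, 7, 8]
print(list(chunked_plus(data, 3)))
# → [[1, 2, 3], [4, 5, 6], [7, 8]]
print(list(chunked_plus(data, 3, pad=True, fillvalue=0)))
# → [[1, 2, 3], [4, 5, 6], [7, 8, 0]]
print(list(chunked_plus(data, 3, stride=1, filter_fn=lambda c: sum(c) > 15)))
# → [[5, 6, 7], [6, 7, 8]]
```

Each knob reduces to slicing plus a filter() call, which is one way to judge whether a combined function earns its place over composing the pieces directly.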

Would something in that direction feel more at home in itertools or more_itertools?

Just compose functions.

Compose in the other order.

Stagger does this.

Also, it’s a bit odd that, knowing about more-itertools, you chose examples that more-itertools already handles.


Hi Neil and team, thanks for the pointer!

I’m well aware of, and appreciate, how more_itertools provides core utilities like chunked, windowed, and stagger. These cover the essentials brilliantly.

What smartchunks adds on top of that:

  1. nth_position – lets you skip items before or after chunking
  2. chunk_position – lets you sample every nth chunk
  3. pipeline order – choose the order of transformations with apply_nth_before_chunk
  4. stride + filter_fn – get overlapping chunks and filter based on custom logic
  5. Optional padding and generator/list materialization

So where more_itertools provides the building blocks, smartchunks offers a composed chunk-processing pipeline in one function, ideal for conditional workflows like log segmentation, telemetry sampling, or NLP batching.

I’m open to condensing this into a clear chunked_plus() or simplified interface if that aligns better with itertools. Would love to hear if that reframing makes sense!

Thanks again 🙏
– Maurya

itertools added batched in 3.12.
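For reference, a short example of itertools.batched(), which yields tuples of up to n items (the last one may be shorter). The fallback branch is an addition so the snippet also runs on Python versions before 3.12; it mirrors the batched recipe from the itertools documentation.

```python
from itertools import islice

try:
    from itertools import batched  # available in Python 3.12+
except ImportError:
    def batched(iterable, n):
        # Fallback modeled on the recipe in the itertools docs.
        if n < 1:
            raise ValueError("n must be at least one")
        it = iter(iterable)
        while batch := tuple(islice(it, n)):
            yield batch

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(list(batched(data, 3)))  # → [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
print(list(batched(data, 4)))  # → [(1, 2, 3, 4), (5, 6, 7, 8), (9,)]
```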
