Add `flatten_list` to itertools

There are two “Pythonic” ways to flatten an iterable l:

  1. [item for sublist in l for item in sublist]
  2. import itertools; itertools.chain.from_iterable(l)

Both of those will remove one level of nesting from the list l, or if any of the elements of l is not iterable, it will raise an exception.

The issue with this is that if a list contains a mixture of leaves and iterables, you will have to wrap all of the leaves in a container to avoid an exception. Worse, if any of the elements of the iterable are themselves iterable but not intended to be flattened (e.g. strings), you will get silent data corruption.

Acknowledging that homogenous lists are the most common case, this is still a case that I find myself running into reasonably frequently:

import itertools
l = [[1],[2,3],"foo"]
# I want [1,2,3,"foo"]
print(list(itertools.chain.from_iterable(l)))
print([item for sublist in l for item in sublist])
# but in both cases I get [1,2,3,'f','o','o']

I propose a function flatten_list with the following semantics, in pseudocode.

 def flatten_list(l: list, *, max_depth: int=1) -> list:
        def flatten_list_recursive(l,depth):
                for elem in l:
                    if isinstance(elem, list) and depth:
                        flatten_list_recursive(elem,depth-1)
                    else:
                        out.append(elem)
        if not isinstance(l, list): raise TypeError
        out = []
        flatten_list_recursive(l,max_depth)
        return out

This function will continue to flatten lists until it hits the max depth, but will leave any other kind of iterable alone. This allows for flattening heterogenous data types that contain non-iterables, lists, and iterables that you don’t want to flatten, like strings, and it allows for arbitrary flattening, not just one level of recursion.

Does this seem like it could be a useful addition to the itertools module?

If it’s to be considered an itertool, it needs to accept arbitrary iterables (even infinite ones) and yield its results - a relatively small tweak to your algorithm, I expect.

What you may be interested in, though, is that it’s already in more-itertools:

https://more-itertools.readthedocs.io/en/stable/api.html#more_itertools.flatten

2 Likes

Got it, the only thing is that it feels wrong to me is to flatten things like strings and bytes the same way as lists, tuples, etc. Do you also have that intuition, and do you think there’s a way to distinguish between types that should be flattened and types that shouldn’t in a principled way?

The collapse function in more_itertools [1] will ignore strings and bytes. You can specify the thing you want to flatten (e.g. only lists) edit: actually I misread, you can specify the things you don’t want to flatten.


  1. Chris linked to flatten which only does one level ↩︎

2 Likes

That’s exactly what I was looking for, thanks! It seems very useful to me and I didn’t know about it, IMO it could be a good addition to the standard library.

2 Likes

That’s an arguable point, but there is one data type that definitely should be able to be flattened, and would be tricky to pin down: generators. In nearly every way, a generator (the result of calling a generator function) should behave identically to a list when passed to itertools functions. So all you could really do would be to pick out a couple of data types and exclude them, which is what collapse() does (btw, thanks for the correction there, I automatically reach for “flatten” as it’s what I expect that to be called).

2 Likes

There is some “prior art” on this idea from older resources mentioned here related to making your own flatten() function, one that handles nesting and mixed types.

If you were seeking something pre-implemented, there are some tools in more-itertools as mentioned above. Demonstrations found here.

As to whether it should be added to the standard library, the latter options have been around so long, I’m not sure. The demand is certainly there.