There are two “Pythonic” ways to flatten an iterable l:
[item for sublist in l for item in sublist]
import itertools; itertools.chain.from_iterable(l)
Both of those will remove one level of nesting from the list l, or, if any of the elements of l is not iterable, raise an exception.
The issue with this is that if a list contains a mixture of leaves and iterables, you will have to wrap all of the leaves in a container to avoid an exception. Worse, if any of the elements of the iterable are themselves iterable but not intended to be flattened (e.g. strings), you will get silent data corruption.
Acknowledging that homogeneous lists are the most common case, this is still a case that I find myself running into reasonably frequently:
import itertools
l = [[1],[2,3],"foo"]
# I want [1,2,3,"foo"]
print(list(itertools.chain.from_iterable(l)))
print([item for sublist in l for item in sublist])
# but in both cases I get [1,2,3,'f','o','o']
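The other failure mode described above, a bare leaf mixed in with the sublists, can be sketched the same way. The names mixed, try_flatten and wrapped are made up for this illustration only.

```python
import itertools

mixed = [[1], [2, 3], 4]  # 4 is a bare leaf, not wrapped in a list


def try_flatten(make_flat):
    """Return the flattened list, or the name of the exception raised."""
    try:
        return list(make_flat())
    except TypeError as exc:
        return type(exc).__name__


# Both approaches fail as soon as they try to iterate over the int.
chain_result = try_flatten(lambda: itertools.chain.from_iterable(mixed))
comp_result = try_flatten(lambda: [item for sublist in mixed for item in sublist])
print(chain_result, comp_result)

# The workaround: wrap every leaf in a list by hand before flattening.
wrapped = [x if isinstance(x, list) else [x] for x in mixed]
print(list(itertools.chain.from_iterable(wrapped)))
```

Note that chain.from_iterable is lazy, so the exception only appears once the result is consumed, here by list().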
I propose a function flatten_list with the following semantics, in pseudocode.
def flatten_list(l: list, *, max_depth: int = 1) -> list:
    def flatten_list_recursive(l, depth):
        for elem in l:
            # Only lists are unpacked, and only while depth remains.
            if isinstance(elem, list) and depth:
                flatten_list_recursive(elem, depth - 1)
            else:
                # Leaves, strings, tuples, etc. are kept as-is.
                out.append(elem)

    if not isinstance(l, list):
        raise TypeError
    out = []
    flatten_list_recursive(l, max_depth)
    return out
This function will continue to flatten lists until it hits the max depth, but will leave any other kind of iterable alone. This allows for flattening heterogeneous data that mixes non-iterables, lists, and iterables that you don’t want to flatten, like strings, and it allows flattening to an arbitrary depth, not just one level of nesting.
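Since itertools functions generally return iterators, here is a rough sketch of how the same semantics might look as a lazy generator. The name iter_flatten and the sample data are hypothetical, and this is only one possible reading of the proposal, not a reference implementation. Unlike the pseudocode, this sketch treats a negative max_depth the same as 0.

```python
from typing import Any, Iterator


def iter_flatten(items: list, *, max_depth: int = 1) -> Iterator[Any]:
    """Lazily yield the elements of items, unpacking nested lists
    up to max_depth levels deep.

    Hypothetical helper for illustration; not part of itertools.
    """
    if not isinstance(items, list):
        raise TypeError("expected a list")
    for elem in items:
        if isinstance(elem, list) and max_depth > 0:
            # Only lists are unpacked; strings, tuples, etc. pass through whole.
            yield from iter_flatten(elem, max_depth=max_depth - 1)
        else:
            yield elem


data = [[1], [2, [3, [4]]], "foo", 5]
print(list(iter_flatten(data)))               # [1, 2, [3, [4]], 'foo', 5]
print(list(iter_flatten(data, max_depth=2)))  # [1, 2, 3, [4], 'foo', 5]
print(list(iter_flatten(data, max_depth=0)))  # data unchanged
```

The string "foo" and the bare 5 come through untouched at every depth, which is the behaviour the two existing idioms can't give me.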
Does this seem like it could be a useful addition to the itertools module?