Currently (thanks to PEP 448), we have a very nice shorthand for combining small numbers of iterables/dictionaries:
[*it1, *it2, *it3]
{*it1, *it2, *it3}
{**dict1, **dict2, **dict3}
I think a natural extension of this syntax would be to allow starred expressions in comprehensions and generator expressions, to provide easy ways to combine an arbitrary number of iterables/dictionaries:
[*x for x in its] # list with the concatenation of all iterables in `its`
{*x for x in its} # set with the union of all iterables in `its`
{**d for d in dicts} # dict with the combination of all dictionaries in `dicts`
(*x for x in its) # generator representing the concatenation of all iterables in `its`
These would effectively be shorthand for the following forms, and more concise because they avoid introducing and repeating auxiliary variables:
[x for it in its for x in it]
{x for it in its for x in it}
{key: value for d in dicts for key, value in d.items()}
(x for it in its for x in it)
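To make that equivalence concrete, here is a small sketch that runs on current Python. The sample values of `its` and `dicts` are made up for illustration. It builds each collection with the longhand comprehension and checks it against `itertools.chain.from_iterable`, `set.union`, or repeated `dict.update`, which is the behavior the starred forms are intended to have.

```python
import itertools

# Sample inputs (illustrative values, not taken from the proposal).
its = [[1, 2], (3, 4), "ab"]
dicts = [{"a": 1, "b": 2}, {"b": 3, "c": 4}]

# Longhand forms that the starred comprehensions would abbreviate.
as_list = [x for it in its for x in it]
as_set = {x for it in its for x in it}
as_dict = {key: value for d in dicts for key, value in d.items()}
as_gen = (x for it in its for x in it)

# Reference result for the dict case, built with a plain loop.
merged = {}
for d in dicts:
    merged.update(d)

assert as_list == list(itertools.chain.from_iterable(its))
assert as_set == set().union(*its)
assert as_dict == merged  # later dictionaries win on duplicate keys
assert list(as_gen) == as_list  # note: this consumes the generator
```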
Of course, there are alternative ways to do things like this. Taking the concatenation of a bunch of lists as an example, any of the following would result in the same output (though some are more efficient than others):
[x for it in its for x in it]
list(itertools.chain(*its))
sum((it for it in its), [])
functools.reduce(operator.concat, its, [])
However, the proposed syntax of `[*it for it in its]` is both more concise and more intuitive (to me, at least) than any of these options. Given the existing unpacking syntax, the additional syntax proposed here feels like a natural way to build such a collection given the existing language features, more so even than, e.g., `[x for it in its for x in it]` (where, in my experience with teaching Python, the two `for` clauses often feel backwards for beginners, so the impulse is to swap their order).
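As a rough illustration of both points, the sketch below checks that the alternatives listed above agree on a made-up list of lists, and shows what happens when the two `for` clauses are swapped. The helper names `flatten_with_loops` and `flatten_swapped` are hypothetical, chosen only for this example.

```python
import functools
import itertools
import operator

# Illustrative data: a list of lists, so that every option below applies.
its = [[1, 2], [3], [], [4, 5]]

flat = [x for it in its for x in it]

# The alternatives all agree on this input.
assert list(itertools.chain(*its)) == flat
assert sum((it for it in its), []) == flat  # builds a new list at every step
assert functools.reduce(operator.concat, its, []) == flat  # likewise


def flatten_with_loops(its):
    """Nested loops in the same order as the comprehension's clauses."""
    out = []
    for it in its:      # outer loop comes first
        for x in it:    # inner loop comes second
            out.append(x)
    return out


def flatten_swapped(its):
    """Clauses in the 'intuitive' but wrong order."""
    # `it` is not bound yet when the first clause's iterable is evaluated.
    return [x for x in it for it in its]


assert flatten_with_loops(its) == flat

try:
    flatten_swapped(its)
except NameError:
    swapped_failed = True
else:
    swapped_failed = False
```

Here the swapped version fails with a `NameError`. That is a friendly outcome: if a variable named `it` happened to exist in an enclosing scope, the swapped version would run and silently produce the wrong result.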
Currently, the proposed syntax results in specific error messages:
>>> [*x for x in its]
...
SyntaxError: iterable unpacking cannot be used in comprehension
>>> {**d for d in dicts}
...
SyntaxError: dict unpacking cannot be used in dict comprehension
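For anyone who wants to check how their own interpreter treats these expressions, here is a minimal sketch using the built-in `compile`. The helper name `unpacking_error` is hypothetical. The exact message wording may vary between Python versions, and on an interpreter that implements this proposal the helper would presumably return `None` instead.

```python
def unpacking_error(source):
    """Return the SyntaxError message for `source`, or None if it compiles."""
    try:
        # Compiling only parses the expression; names such as `its`
        # do not need to exist.
        compile(source, "<example>", "eval")
    except SyntaxError as exc:
        return exc.msg
    return None


list_msg = unpacking_error("[*x for x in its]")
dict_msg = unpacking_error("{**d for d in dicts}")
longhand_msg = unpacking_error("[x for it in its for x in it]")

print(list_msg)
print(dict_msg)
print(longhand_msg)  # the longhand form is always valid, so this is None
```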
My suspicion is that these error messages are mostly encountered by people who wishfully use this syntax and already have a correct intuition for how it would/should behave, rather than by people accidentally typoing a `*` or `**` into those expressions. My confidence that this extension feels natural is inspired in part by students using this notation on written exams, assuming that it is already a part of the language.
Here are a few small example programs written several different ways, to show how this proposed syntax compares against what we can already do in Python. For each, the last version demonstrates the proposed syntax.
- Finding all files contained within a directory and its subdirectories:

def get_all_files(path):
    all_files = []
    for _, dirs, files in os.walk(path):
        all_files.extend(files)
    return all_files

def get_all_files(path):
    return [file for _, _, files in os.walk(path) for file in files]

def get_all_files(path):
    return list(itertools.chain(*(files for _, _, files in os.walk(path))))

def get_all_files(path):
    return [*files for _, _, files in os.walk(path)]
- Typical CS2-level Python exercise, finding the leaf values of a tree using recursion:

def leaf_values(tree):
    if not tree['children']:
        return {tree['value']}
    out = set()
    for child in tree['children']:
        out.update(leaf_values(child))
    return out

def leaf_values(tree):
    if not tree['children']:
        return {tree['value']}
    return {
        grandchild
        for child in tree['children']
        for grandchild in leaf_values(child)
    }

def leaf_values(tree):
    if not tree['children']:
        return {tree['value']}
    return {*leaf_values(child) for child in tree['children']}
- Merging information across configuration dictionaries:

def merge_configs(configs):
    out = {}
    for conf in configs:
        if conf.get("enabled"):
            out.update(conf)
    return out

def merge_configs(configs):
    return {
        key: value
        for conf in configs
        if conf.get("enabled")
        for key, value in conf.items()
    }

def merge_configs(configs):
    return {**conf for conf in configs if conf.get("enabled")}
- Filtering out values from an HTML document:

def matching_items(source):
    out = []
    for section in BeautifulSoup(source).find_all('ul', class_='mylist'):
        out.extend(section.find_all('li'))
    return out

def matching_items(source):
    return [
        item
        for section in BeautifulSoup(source).find_all('ul', class_='mylist')
        for item in section.find_all('li')
    ]

def matching_items(source):
    return [
        *section.find_all('li')
        for section in BeautifulSoup(source).find_all('ul', class_='mylist')
    ]
More formally, my friend/colleague Erik Demaine (@edemaine) and I have put together a draft PEP and a basic reference implementation:
- Draft PEP: peps/pep-9999.rst on the comprehension_unpacking branch of adqm/peps (GitHub)
- Reference Implementation: the comprehension_unpacking branch of adqm/cpython (GitHub)
This isn’t the first time that this idea has been proposed. Erik and I previously proposed this extension in a thread on the python-ideas mailing list in 2021, where it was met with positive feedback (all replies to that thread were at least +0), but we were ultimately unable to find a sponsor at that time.
So I’m giving it another go here, hoping for feedback (on the proposal and/or the reference implementation) and, ultimately, hoping to find a sponsor for moving forward with the PEP process if there’s still enthusiasm behind this idea.
For additional reference, similar ideas were also presented in PEP 448 itself and in another mailing list thread from 2016, and more recently in another thread on this forum.