+1. I like this; it feels like a natural extension to me that doesn’t need much explanation if you’re already familiar with the existing unpacking syntax. It simplifies a construct that gets used a lot in real code.
For the unparenthesised generator case, maybe it’d make sense to keep that prohibited and require the user to parenthesise to remove the ambiguity. You already can’t do f(x for y in it, 4), for instance, since that’s also ambiguous. Everywhere else parentheses are required; the function call is the one special case, so limiting that special case seems reasonable, especially if there’s a clear error message.
I certainly agree with this statement, but for that reason, I think it’s important that function calls remain special in that same sense for generator expressions of all kinds. The general rule is: if you’re passing a generator expression into a function as its sole argument, you don’t need to wrap it in extra parentheses. Since this proposal involves expanding the notion of a generator expression to include things like *x for x in y, then it seems more consistent to me if f(*x for x in y) does not raise an exception, but rather is interpreted as f((*x for x in y)).
In current Python versions, if you want to unpack the result of a generator expression, you need explicit parentheses, e.g., you have to do f(*(x for x in y)) if you want to unpack the generator to get separate arguments for the function. To me, it seems natural to extend that same principle here: if the intent is to unpack the result of the generator expression, parentheses around the genexp should still be required, regardless of whether the genexp itself contains a *.
Said another way, I think we should aim for uniform behavior across all kinds of comprehensions and genexps, regardless of whether they make use of * internally. We currently have:
f([x for x in y]) # pass in a single list
f({x for x in y}) # pass in a single set
f(x for x in y) # pass in a single generator (no parentheses around genexp)
f(*[x for x in y]) # pass in elements from the list separately
f(*{x for x in y}) # pass in elements from the set separately
f(*(x for x in y)) # pass in elements from the generator separately (parentheses required)
So for the proposed new kind of comprehension/genexp (containing a *), it feels like we ought to mirror that same structure, following the same conventions:
f([*x for x in y]) # pass in a single list
f({*x for x in y}) # pass in a single set
f(*x for x in y) # pass in a single generator (no parentheses around genexp)
f(*[*x for x in y]) # pass in elements from the list separately
f(*{*x for x in y}) # pass in elements from the set separately
f(*(*x for x in y)) # pass in elements from the generator separately (parentheses required)
Also, from an implementation perspective, this interpretation involves only small changes to the grammar (just adjusting things so that * can be on the front of the target of the comprehension/genexp), so it feels like a natural generalization of the rules and structures that are already in place. By contrast, raising an exception for f(*x for x in y) would require adding a special case to the grammar to reject what otherwise looks like a coherent and consistent expression.
I guess the question here, though (and the main source of the ambiguity) is whether people will naturally see the * as part of the genexp in that case or not, i.e., whether the * attaches to x or to x for x in y in the examples above. To me, it really feels like the star unambiguously attaches to x (and I think I would have felt that way even before I started working on this), but as @pf_moore said, this is certainly a case where opinions and perspectives can differ.
Either way, I just wanted to try to provide a little more support/justification/clarification for my interpretation here.
How about tackling this without new syntax, just adding classmethods to builtins:
list.collect(x for x in its)
set.collect(x for x in its)
dict.collect(d for d in dicts)
tuple.collect(x for x in its)
One advantage over syntax is that it is a convention that can be adopted by user types, e.g.
SortedList.collect(x for x in its)
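As a rough sketch of what adopting such a convention could look like for a user type (SortedList and collect here are hypothetical names for illustration, not an existing API):
import itertools

class SortedList(list):
    @classmethod
    def collect(cls, iterables):
        # Flatten an iterable of iterables into one sorted list.
        return cls(sorted(itertools.chain.from_iterable(iterables)))

its = [[3, 1], [2], [5, 4]]
print(SortedList.collect(x for x in its))  # [1, 2, 3, 4, 5]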
Better yet, you only need one new function:
import itertools

def collect(iterables):
    return itertools.chain(*iterables)
list(collect(x for x in its))
set(collect(x for x in its))
dict(collect(d.items() for d in dicts))
tuple(collect(x for x in its))
In a perfect world, itertools.chain would have been defined from the start to take arguments in the same style as min and max, and it would have worked as our collect function out of the box.
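For reference, the min/max calling convention being alluded to accepts either a single iterable or separate arguments:
print(min([3, 1, 2]))  # 1 -- a single iterable argument
print(min(3, 1, 2))    # 1 -- separate arguments
# itertools.chain only supports the second style, hence the need for
# chain(*its) or chain.from_iterable(its) when you have an iterable of iterables.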
The helper function is not strictly needed. You could just write:
from itertools import chain
list(chain(*(x for x in its)))
set(chain(*(x for x in its)))
dict(chain(*(d.items() for d in dicts)))
tuple(chain(*(x for x in its)))
itertools.chain.from_iterable flattens an iterable of iterables
Thanks, I had forgotten about from_iterable. Not quite as discoverable as plain chain.
from itertools import chain
list(chain.from_iterable(x for x in its))
set(chain.from_iterable(x for x in its))
dict(chain.from_iterable(d.items() for d in dicts))
tuple(chain.from_iterable(x for x in its))
Eh, a little verbose. But better than adding more asterisks to Python.
These can be written slightly more concisely, since *(x for x in its) amounts to *its (and you could say something similar for the from_iterable versions). This kind of use of itertools.chain is actually mentioned in the examples at the top of this thread, as well as in the draft PEP:
Alternatively, the notation is effectively short-hand for the following uses of itertools.chain:
list(itertools.chain(*its))
set(itertools.chain(*its))
dict(itertools.chain(*(d.items() for d in dicts)))
itertools.chain(*its)
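For concreteness, the observation that *(x for x in its) amounts to *its can be checked directly (its below is just an illustrative list of lists):
from itertools import chain

its = [[1, 2], [3, 4]]
assert list(chain(*(x for x in its))) == list(chain(*its)) == [1, 2, 3, 4]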
I still personally feel that syntax proposed here is more concise, readable, intuitive, and discoverable than using itertools.chain or the variant with two loops and an auxiliary variable in the comprehension.
In a similar manner, the example in OP
sum((it for it in its), [])
can be simplified to
sum(its, [])
This extends to tuples and sets and seems like the best solution to me (but it does not apply to dicts).
Indeed, you’re right that that could be written more concisely. Thanks for pointing that out!
This syntax is definitely nice and concise, but I do think it has some downsides:
- In my experience, many people are unaware of what that second argument to sum means and so can’t glance at this and tell what it does.
- It only works if every element in its is a list, whereas the other options work for arbitrary iterables.
- As you mention, it doesn’t extend to dicts or generators.
- Perhaps most importantly, because this form actually does repeated concatenation instead of mutating a single list, it can be dramatically slower than the other options presented throughout this thread, particularly if you’re concatenating lots of lists together (see the rough timing sketch below).
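A quick sketch of that last point, with an illustrative its built from many single-element lists (exact timings will vary, and the gap grows with the number of lists):
# Rough timing sketch: sum(its, []) builds a new list on every addition, so its
# cost grows quadratically with the output size, while chain.from_iterable
# does a single linear pass.
from itertools import chain
from timeit import timeit

its = [[i] for i in range(5_000)]
t_sum = timeit(lambda: sum(its, []), number=5)
t_chain = timeit(lambda: list(chain.from_iterable(its)), number=5)
print(f"sum:   {t_sum:.3f}s")
print(f"chain: {t_chain:.3f}s")  # typically much faster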
To even better justify the new syntax over existing approaches, I think we can further generalize unpacking in comprehensions by allowing * and ** to unpack any arbitrary expression within a comprehension, not just the outermost expression.
So, as an example, to flatten [1, 2, [3, 4]] into [1, 2, 3, 4]:
[*x if isinstance(x, Iterable) else x for x in its]
would be equivalent to:
[x for it in its for x in (it if isinstance(it, Iterable) else [it])]
This can potentially be done by making * and ** true unary operators that wrap the operand in a new Unpackable object for the yielder of a comprehension to unwrap and unpack.
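To make that idea concrete, here is a very rough sketch of how such a wrapper might behave; Unpackable and _splice are hypothetical names for illustration only, not anything that exists in CPython:
from collections.abc import Iterable

class Unpackable:
    # Hypothetical wrapper that a standalone * expression would produce.
    def __init__(self, iterable):
        self.iterable = iterable

def _splice(items):
    # What the comprehension machinery might do with each evaluated element:
    # splice in the contents of Unpackable results, pass everything else through.
    for item in items:
        if isinstance(item, Unpackable):
            yield from item.iterable
        else:
            yield item

its = [1, 2, [3, 4]]
# [*x if isinstance(x, Iterable) else x for x in its] would then roughly mean:
result = list(_splice(Unpackable(x) if isinstance(x, Iterable) else x for x in its))
print(result)  # [1, 2, 3, 4]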
Edit: I misread @blhsing’s proposal, which renders most of my objection below moot. I’m +0 on whether optional unpacking (i.e. *x if cond else x for x in its) should be supported.
I would be -1 on [edit: Ben’s proposal in the preceding post]. It takes an (IMO) intuitive behavior [1] and adds quite a bit of cognitive load to use correctly. To wit:
- How many layers of flattening would this operator provide? Your example demonstrates an answer of “0 or 1”, but is it limited to that?
- What happens if you try to use this on strings (e.g. ["abc", ["def", "ghi"]])? Unless the flattening occurs arbitrarily deep, the only reasonable answer would be ["a", "b", "c", "def", "ghi"] (which is probably not what you wanted).
- I am aware of no other instance where unpacking can optionally work on non-iterables. Adding this special case adds just a bit more cognitive load on remembering the rules for unpacking.
I would also be surprised if this particular need (flatten an iterable that’s a mixture of iterables and non-iterables) is widespread enough to need support in the syntax of Python.
I’m +1 on the original proposal. The proposed syntax blends the syntax for unpacking in literals with the syntax for comprehensions in a way that struck me as intuitive upon first reading:
[*it for it in its] “unrolls” to [*its[0], *its[1], *its[2], …], which in turn “unrolls” to [its[0][0], its[0][1], …, its[1][0], its[1][1], …]
This reading of “unrolling” also shows my position on how dictionary unpacking should work: the last instance of a key is the one that’s kept, exactly as if one wrote out {**d[0], **d[1], **d[2], …}
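For concreteness, here is that “unrolled” reading spelled out with today’s syntax (its and dicts below are just illustrative inputs):
its = [[1, 2], [3], [4, 5]]
# Proposed: [*it for it in its] -- written out with today's syntax:
flat = [x for it in its for x in it]
print(flat)  # [1, 2, 3, 4, 5]

dicts = [{"a": 1, "b": 2}, {"b": 3}]
# Proposed: {**d for d in dicts} -- later keys win, as with explicit unpacking:
merged = {**dicts[0], **dicts[1]}
print(merged)  # {'a': 1, 'b': 3}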
While itertools.chain.from_iterable can do something similar, I find that I have to refer to the itertools documentation every time to figure out if I want itertools.chain or itertools.chain.from_iterable. The discussion in this very thread showed that several contributors to the discussion were not aware of (or forgot) the niceties of these functions.
Similarly, the nested for loop comprehension version is somewhat unintuitive (is it [x for x in it for it in its] or is it [x for it in its for x in it]?[2]). I find this unintuitive enough that I actively avoid writing nested for-loop comprehensions.
Obviously, one can replace the whole comprehension with a full nested for loop, and that is sometimes the right solution.[3] But for simple cases, I think this proposed syntax is intuitive and concise like no other existing alternative.
Thanks for all the input so far! I’ve made a few changes since posting this, which I wanted to call out here as a little bit of a progress update. Here are fresh links to the current version of everything:
- Draft PEP: peps/peps/pep-9999.rst at comprehension_unpacking · adqm/peps · GitHub
- Reference Implementation: GitHub - adqm/cpython at comprehension_unpacking
Here’s a brief list of the changes I’ve made since my original post:
- Rewrote and reorganized big chunks of the PEP itself (current rendered version, diff)
- Added some responses from this conversation (note that “Rejected” is maybe too strong a word for this point in the process)
- Added examples from the standard library
- Updated the reference implementation:
- Made a first pass at documentation for the proposed changes (diff)
- Added/changed some test cases (diff 1, diff 2)
- Fixed some issues with the grammar in my original implementation to make sure that existing tests were all passing (diff)
- Added some new specific error messages for malformed comprehensions (diff)
- Fixed an error with how async comprehensions were being handled (thanks to one of the test files in @alexprengere’s draft implementation) (diff)
I think I’m -1 on this as well, in large part because this would represent a much bigger change to the language than I was intending to propose. I also agree with @kapinga’s specific concerns.
I see exactly 1. Where do you see “0 or 1”?
I don’t want to speak for @kapinga, but looking again, it seems that I may have misread @blhsing’s message (I originally interpreted the intention as being that * could apply to anything, including an integer, and that *1 and *[1] would both “unpack” to a single integer 1). Looking again, that doesn’t in fact seem to be what @blhsing was suggesting. Sorry about that!
That said, I’m still -1 on that idea since it feels like a much bigger change than I was aiming for (and also maybe because I’m not grokking the general rule behind the translation between the two pieces of code given):
I think
[*x if isinstance(x, Iterable) else x for x in its]
was meant like this:
[(*x) if isinstance(x, Iterable) else x for x in its]
In words: If x is an iterable, then do *x (extend the result list by x), otherwise do just x (append x to the list).
Or without a list comprehension:
result = []
for x in its:
    if isinstance(x, Iterable):
        result.extend(x)
    else:
        result.append(x)
Thanks for the explanation, and sorry for not being clear. I understood the intended equivalence between those two particular pieces of code, but I’m still not sure I see the general rule that one would use to translate between the two (or how this would be implemented).
Maybe the thing I’m struggling with is twofold:
- This would require *x to stand alone (which I’m now realizing was indeed part of the intention). But I’m not immediately sure what the resulting object would actually look like, nor how it might be useful outside of the context of a list/set/dict view; if it’s only useful in the context of comprehensions/generators, then I don’t see the value in trying to make it work outside of that context.
- This would require additional typechecking at runtime to know whether to attempt to unpack or not (depending on what that expression evaluates to), which further muddies the waters to me.
I think that your intuition is close to what would have been decided if we were starting fresh today, but as Paul said, that ship has sailed and we are stuck with f(x for x in it) meaning f((x for x in it)).
That said, a better meaning for f(x for x in [x_1, x_2, x_3]) is f(x_1, x_2, x_3). It’s too late for that though.
Also, I’m +1 on this proposal. I was one of the implementers of PEP 448 (Unpacking generalizations). We wanted to add this syntax in the beginning, but felt like being modest in our proposal in order to push through the significant opposition. I figured that in time, as people got used to unpacking, this would feel natural to more and more people.
If I remember correctly, Guido was actually for this. Did you look at the original commit thread where we implemented 448?