Conditional collection literals

Charles Machalow:

what about this syntax?

A = [
    1,
    2,
    3 if use_three else ...,
    4,
]

To be clear: use ellipsis to say continue on without this item.

The problem with this is that ... is itself a valid expression.

>>> use_three = False
>>> A = [1, 2, 3 if use_three else ..., 4]
>>> A
[1, 2, Ellipsis, 4]

Thanks for the quote of @Alex-Wasowicz’s post, which is basically the same proposal and which I totally missed earlier.

This proposal doesn’t have to be a breaking change though. The implementation of the tokenizer just has to change to always emit INDENTs and DEDENTs even when inside brackets, and the parser will disregard all INDENTs and DEDENTs when it finds no statements inside and will proceed to build the collection node in the conventional, backwards-compatible way. It’s when there is a statement inside that it will switch to an indentation enforcing mode to parse the INDENT/DEDENT tokens properly.
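
For reference, here is a small sketch of what the tokenizer does today, using the standard-library tokenize module. Inside brackets it emits no INDENT or DEDENT tokens, and line breaks come out as NL rather than NEWLINE. The two sample sources and the token_names helper are made up for illustration.

```python
import io
import tokenize


def token_names(source):
    """Return the token type names that tokenize produces for source."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type] for tok in tokenize.generate_tokens(readline)]


# Irregular indentation inside brackets: the tokenizer ignores it today.
inside_brackets = token_names("A = [\n    1,\n        2,\n]\n")

# Indentation in a statement block is significant and produces tokens.
inside_block = token_names("if x:\n    y = 1\n")

print("INDENT" in inside_brackets)  # False
print("INDENT" in inside_block)     # True
```

So the change described above would mean emitting INDENT/DEDENT in the first case too, and leaving it to the parser to ignore them when no statement appears inside the brackets.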

This means that the code in Alex’s post will produce a syntax error but existing code will continue to work.

Yes, to be sure there are existing ways to do the same things.

The proposal is all about an arguably more readable and more expressive syntax much like how comprehensions cover some of the most common use cases of building collections with for loops.

Of course I saw that you intentionally misunderstood Charles’ proposal.

This cannot be expressed with the formal grammar of the parser.

It can. With a PEG parser, the new block-based grammar just has to have a lower priority (on the right of the | operator) than the current indent-less grammar inside brackets.

This was not the point!

In order to “disregard” indent, newline and dedent tokens, you would need to add something like ([indent][newline][dedent])* between any two tokens in the grammar.

Ah yes, I missed the point, and you’re right that this would be the way for the parser to implement the existing indent-less grammar with indents/dedents ignored. So yes, it’s certainly doable.

I thought I’d commented before - I’m afraid I hate it, because it creates a weird new type of expression that’s required to be multi-line.

I would like it if parentheses were mandatory and only a single list/tuple/set item would be inside. I find that unambiguous and readable in both multi-line and single-line formats.

optional_item := ( <item> if <condition> )

[1, (2 if cond), 3, 4]     #  [1,2,3,4] or [1,3,4]

(1, *(2, 3 if cond), 4)    # (1,2,3,4) or (1,4)
(1, (2, 3 if cond), 4)     # (1, (2,3), 4) or (1,4)

call(
    COMMAND,
    ("-v" if verbose),
    *(("-d", str(level)) if level > 0),
    arg1, arg2
)

(I am omitting dicts and kwargs, I’m not sure about them)

Most if not all of the existing workarounds (unpacking from a ternary operation, unpacking from a list multiplied by a Boolean, using a generator, explicit appends, etc.) either require or read better with multiple lines anyway, so I don’t personally see that requirement as a downside.
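
For anyone skimming, here is a sketch of those four workarounds side by side in today's Python. The command name "tool" and the values of verbose and level are placeholders, not taken from anyone's real code.

```python
verbose = True
level = 2

# 1. Unpacking from a ternary expression.
cmd_ternary = [
    "tool",
    *(["-v"] if verbose else []),
    *(["-d", str(level)] if level > 0 else []),
    "arg1",
]

# 2. Unpacking from a list multiplied by a Boolean (True == 1, False == 0).
cmd_multiply = [
    "tool",
    *(["-v"] * verbose),
    *(["-d", str(level)] * (level > 0)),
    "arg1",
]


# 3. Unpacking from a generator function.
def options():
    if verbose:
        yield "-v"
    if level > 0:
        yield "-d"
        yield str(level)


cmd_generator = ["tool", *options(), "arg1"]

# 4. Explicit appends.
cmd_append = ["tool"]
if verbose:
    cmd_append.append("-v")
if level > 0:
    cmd_append.extend(["-d", str(level)])
cmd_append.append("arg1")

print(cmd_ternary)  # ['tool', '-v', '-d', '2', 'arg1']
```

All four build the same list. Note that the multiplication variant evaluates its operands even when the condition is false.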

I don’t know if this is significant, but I see a paradigm break for collection instantiation without unpacking or comprehension: the size of the collection cannot be known before evaluation.

If the paradigm stated above is to be respected, the conditional would only be allowed in unpacking, but possibly with multiple conditionals.

*(a, b if cond, c)

For dict unpacking, new parentheses are required:

**dict(a=1, (b=2) if cond, c=3)
**{'a':1, ('b':2) if cond, 'c':3}

This is my favorite so far. But maybe replace if with a different (soft) keyword to make it easier for readers to recognize the difference from ternary expressions. How about when? Also, I think this can be made to work for all collection types and function calls.

Therefore:

(1, (2, when cond), 4)                          # (1, (2,), 4) or (1, 4)
(1, (2, 3 when cond), 4)                        # (1, (2, 3), 4) or (1, 4)
(1, *(2, 3 when cond), 4)                       # (1, 2, 3, 4) or (1, 4)
[1, [2 when cond], 3, 4]                        # [1, 2, 3, 4] or [1, 3, 4]
{1, *{2, 3 when cond}, 4}                       # {1, 2, 3, 4} or {1, 4}
{1: x, {2: a, when cond}, 4: y}                 # {1: x, 2: a, 4: y} or {1: x, 4: y}
{1: x, **{2: a, 3: b when cond}, 4: y}          # {1: x, 2: a, 3: b, 4: y} or {1: x, 4: y}
f(1, *(2, 3 when cond), 4)                      # f(1, 2, 3, 4) or f(1, 4)
f(1, *(2, 3 when cond), 4, **(a=b when cond2))  # f(1, 2, 3, 4, a=b) or f(1, 4)

How is that for:

  • the parser?
  • the human reader?

In summary:

  • a display like *(a, b, c, ... when cond) is roughly equivalent to *((a, b, c, ...) if cond else ()), and
  • a display like (a when cond) is roughly equivalent to *((a,) if cond else ())

This illustrates the reduction in punctuation.
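
To make the comparison concrete, here is a sketch of the right-hand sides in today's Python. The build function and its sample values are only for illustration; the when forms appear only in comments because they are not valid syntax.

```python
def build(cond):
    a, b, c = "a", "b", "c"
    # Proposed: (1, *(a, b, c when cond), 4)
    group = (1, *((a, b, c) if cond else ()), 4)
    # Proposed: [1, (a when cond), 4]
    single = [1, *((a,) if cond else ()), 4]
    # Proposed: {1: "x", **{2: a, 3: b when cond}, 4: "y"}
    mapping = {1: "x", **({2: a, 3: b} if cond else {}), 4: "y"}
    return group, single, mapping


print(build(True))
# ((1, 'a', 'b', 'c', 4), [1, 'a', 4], {1: 'x', 2: 'a', 3: 'b', 4: 'y'})
print(build(False))
# ((1, 4), [1, 4], {1: 'x', 4: 'y'})
```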

OK, but you asked my opinion… :slightly_smiling_face:

And I see a very significant difference between “single line is allowed but looks ugly” and “single line isn’t allowed”. The discussions on the grammar changes around INDENT/DEDENT tokens demonstrate why (and “yes, but you can tweak the grammar to make it work” misses the point that the need to tweak the grammar is what people aren’t comfortable with).

Leaving the question of when or if aside for now, I like that:

(value if condition)           # optional value
(value, if condition)          # optional 1-tuple
(t1, t2, t3 if condition)      # optional tuple
[li1, li2, li3 if condition]   # optional list
{s1, s2, s3 if condition}      # optional set
{k1:v1, k2:v2 if condition}    # optional dict

Would it be possible to introduce a new OMIT object and have collections disregard it, instead of introducing a syntax change?
For example, [1, 2, OMIT, 4] would be the same as [1, 2, 4], and {'x': 1, 'y': OMIT, 'z': 3} would become {'x': 1, 'z': 3}.
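
The behaviour can be approximated today with a helper function. This is only a sketch: OMIT and compact are hypothetical names, and it filters after construction rather than changing what the displays themselves do.

```python
OMIT = object()  # hypothetical sentinel, not a real builtin


def compact(collection):
    """Return a copy of a list, tuple, set or dict without OMIT entries."""
    if isinstance(collection, dict):
        return {k: v for k, v in collection.items() if v is not OMIT}
    return type(collection)(item for item in collection if item is not OMIT)


use_three = False
print(compact([1, 2, 3 if use_three else OMIT, 4]))  # [1, 2, 4]
print(compact({"x": 1, "y": OMIT, "z": 3}))          # {'x': 1, 'z': 3}
```

One difference from a built-in version: a dict key whose value is OMIT is still evaluated and hashed here before being dropped.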

This is something I’ve been thinking about for years. Semantically what we want is an inline generator:

def g():
    if environment == "dev":
        yield "--whitelist"
        yield developers
    else:
        yield "--blacklist"
        yield banned_users

args = [
    "run",
    *g(),
]

So maybe it could look like one:

args = [
    "run",
    if environment == "dev":
        yield "--whitelist"
        yield developers
    else:
        yield "--blacklist"
        yield banned_users
]

The idea is that when a control-flow statement appears in a collection literal, the body would get parsed as a regular Python code block where yield statements let you emit items that will be immediately unpacked into the enclosing collection.

This avoids having to introduce a weird variant of python code where each expression needs to be followed by a comma, like in your original example. This also means that arbitrary statements would also be allowed within the code block:

items = [
    while not stream.done():
        yield stream.current()
        stream.advance()
]

This is probably the least “creative” design that fulfills the requirements. It doesn’t require coming up with new parsing rules or semantics, it’s just generator unpacking with an inline generator. Another advantage is that if over time your inline generator starts growing in complexity you can trivially extract it into a proper generator function.
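
As a sketch of that extraction step, here is the while example written with a named generator function, which runs today. The Stream class is a hypothetical stand-in, since the stream object above is not defined anywhere in this thread.

```python
class Stream:
    """Hypothetical stand-in for the stream object in the example above."""

    def __init__(self, values):
        self._values = list(values)
        self._pos = 0

    def done(self):
        return self._pos >= len(self._values)

    def current(self):
        return self._values[self._pos]

    def advance(self):
        self._pos += 1


def drain(stream):
    # The body is exactly what the inline form would contain.
    while not stream.done():
        yield stream.current()
        stream.advance()


items = [*drain(Stream("abc"))]
print(items)  # ['a', 'b', 'c']
```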

Overall it has nice symmetry with both generator expressions and generator functions:

items = (i**2 for i in range(10))

items = (
    for i in range(10):
        yield i**2
)

def g():
    for i in range(10):
        yield i**2
items = g()

If the introducer keyword for the control-flow statement doesn’t appear at the beginning of the line, we could accept inline generators with a single statement as the body:

items = [x, y, if condition: yield z]

Insightful…
Maybe a bit more readable this way:

(a, yield b if cond, c)
{'a':1, yield 'b':2 if cond, 'c':3}

This does not introduce any new keyword, only a yield-if statement.

Inventing a new expression variant, even if it leans on existing keywords, will probably make it harder to reach consensus compared to reusing existing syntax verbatim. I think my proposal is less prone to syntactic bikeshedding.

(a, if cond: yield b, c)
{'a': 1, if cond: yield ('b', 2), 'c': 3}

Both if cond: yield b and if cond: yield ('b', 2) would work as-is when pasted into a proper generator function. Users already know how to interpret this syntax, and slotting it verbatim inside the collection literal makes the semantics pretty obvious. It’s not as compact as your alternative, but it doesn’t require parsing a new kind of construct for dictionary entries (yield 'b': 2), and it expands more naturally into the multiline version:

(
    a,
    if cond:
        yield b
    c,
)
{
    'a': 1,
    if cond:
        yield 'b', 2
    'c': 3,
}

Note that the single-line version would probably require parenthesized tuples when yielding key-value pairs for dictionaries.
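
For comparison, here is a sketch of the same two collections built today with named generator functions. The function names and the sample values of a, b, c and cond are assumptions for illustration; the dict case relies on dict() accepting an iterable of key-value pairs.

```python
cond = True
a, b, c = 1, 2, 3


def optional_items():
    if cond:
        yield b


def optional_entries():
    if cond:
        yield "b", 2  # a (key, value) pair


t = (a, *optional_items(), c)
d = {"a": 1, **dict(optional_entries()), "c": 3}

print(t)  # (1, 2, 3)
print(d)  # {'a': 1, 'b': 2, 'c': 3}
```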

A lot of folks are posting to say how “elegant” or “more readable” an inline if can be. I do not find that convincing when all I’m being shown are single character expressions like 1 or x.

Obviously we’d want any conditional control flow to short circuit execution, so

x = [
    f() if condition
]

should not call f when the condition is false.
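
The ternary-plus-unpacking spelling available today already behaves this way. A quick sketch, with a made-up f that records its calls:

```python
calls = []


def f():
    calls.append("f")
    return 42


condition = False
x_false = [*((f(),) if condition else ())]
calls_after_false = list(calls)

condition = True
x_true = [*((f(),) if condition else ())]

print(x_false, calls_after_false)  # [] []
print(x_true, calls)               # [42] ['f']
```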

This is where I think the arguments about improved readability become a lot weaker. When composed with other behavior, this becomes much less readable. Anyone is, of course, free to disagree, but consider a relatively simple nested structure:

d = {
    "x": [
        func1() if func2()
    ] if func3(),
    "y": {
        "p": [
            func4() if func5()
        ] if func6()
    } if func7()
}

What is the execution order of such code?
Assuming the conditions all evaluate to true, it’s probably 3, 2, 1, 7, 6, 5, 4
But a reader will encounter the functions in numerically ascending order. And maybe the order should actually be… 3, 7, 2, 1, 6, 5, 4
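
One way to check the first guess is to translate the structure into today's ternary-plus-unpacking form and record the calls. This assumes the proposed syntax would desugar that way, which nobody has specified; the make helper is only for illustration.

```python
calls = []


def make(n):
    """Build a function that records its number and returns True."""
    def func():
        calls.append(n)
        return True
    return func


func1, func2, func3, func4, func5, func6, func7 = (make(n) for n in range(1, 8))

d = {
    **({"x": [*((func1(),) if func2() else ())]} if func3() else {}),
    **(
        {"y": {**({"p": [*((func4(),) if func5() else ())]} if func6() else {})}}
        if func7()
        else {}
    ),
}

print(calls)  # [3, 2, 1, 7, 6, 5, 4]
```

Under that assumption each condition runs before the value it guards, so the order is 3, 2, 1, 7, 6, 5, 4.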

I could make a similar argument about inline if-else. Some folks will point at the existence of inline if-else as evidence that this is a good idea. But here’s the thing: I only very rarely use inline if-else because I think it’s generally bad for readability for the exact reasons that I’m suspicious of this idea.

The indented form is interesting for conditional kwarg passing: that one is a real case that sometimes drives me to assemble a dict to unpack, which I find renders things less readable. But given that calls can have arbitrary whitespace between arguments, I don’t see a good way to make that work which is worth the cost. That idea is worth thinking about more, but I haven’t seen a compelling proposal for it here. Spinning it off from a thread about collection literals into a separate discussion is probably the right next step, if someone thinks they have a solution.
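
A sketch of the dict-assembly pattern I mean, with hypothetical connect and open_connection functions:

```python
def connect(host, *, timeout=10, retries=3):
    """Hypothetical function whose keyword defaults we want to leave alone."""
    return {"host": host, "timeout": timeout, "retries": retries}


def open_connection(host, timeout=None, retries=None):
    # Only pass the keywords the caller actually supplied.
    kwargs = {}
    if timeout is not None:
        kwargs["timeout"] = timeout
    if retries is not None:
        kwargs["retries"] = retries
    return connect(host, **kwargs)


print(open_connection("example.org", timeout=5))
# {'host': 'example.org', 'timeout': 5, 'retries': 3}
```

The call site ends up separated from the decision about which keywords it receives, which is the readability cost.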

New syntax is a very slow and expensive way to solve problems. Is everyone who wants this already using generator functions to build collections? That’s what I would do and it looks nice to me because writing a generator keeps the control flow constructs which we already know.
Keeping top-to-bottom control flow and using existing tools? No particularly strong downsides? Sounds “elegant”.

Yes, it has to be that for the same reason that ternary operators do that. Really, this is just sugar for a ternary operator with a fixed alternative case and automatic unpacking.

If you have something that works for function calls, it should also work for displays. The same way that generalized unpacking was done for both displays and function calls. It would be unnecessarily inconsistent to do only one.

I don’t think this argument is useful in the Ideas category. All ideas are about making Python better. They’re not about solving problems today. You can see what people are doing today in the early replies in this thread, and I don’t think they’re “elegant”. They’re pretty ugly compared to some of the solutions posted here.
