Can `ast.BoolOp` use `left` and `right` instead of a `values` list?

There’s a comment in the Python grammar present in the ast module documentation right now:

    -- BoolOp() can use left & right?
    expr = BoolOp(boolop op, expr* values)

BoolOp uses op and a list of values, but BinOp does a pure binary tree with left and right. This is something I’ve wondered quite a few times before:

  • Why is it designed like this?
  • Why is the comment there?
  • And if it can be changed, is the standardization worth the breaking change?

I think that is due to the fact that, much like Compare, a BoolOp isn’t limited to left and right values

>>> print(ast.dump(ast.parse('a and b and c', mode='eval'), indent=2))
Expression(
  body=BoolOp(
    op=And(),
    values=[
      Name(id='a', ctx=Load()),
      Name(id='b', ctx=Load()),
      Name(id='c', ctx=Load())]))

You can say the same about 1 + 2 + 3 though. They’re semantically the same.

2 Likes

They are not the same because the boolean operators can short-circuit, I suppose? Good question, actually!

1 Like

I’m not entirely sure. Are there any situations in which (a and b) and c is not the same as (a and b and c)? The disassembly of both looks identical. Or is it something that can happen with mixed and and or operators that means it’s better to group all the ands and all the ors?

At least in pyparsing, a left-associative rule generates a list of tokens like [test1, test2, test3] while a right-associative rule generates [test1, [test2, test3]]. It’s more work to create and implement short-circuiting on a left-associative binary tree than on a list.

I think the mathematical properties of BoolOp or and and, and BinOp | and & are identical, so there is no reason why you’d implement things one way or the other. The ast module seems to be built by many different people (essentially every time new syntax is added IIRC) over a long time so inconsistencies are to be expected.

Not really, booleans do short circuiting.

OK, I figured it out. What @effigies said makes perfect sense, and that’s why only Compare and BoolOp have lists: they both short circuit.

I’d still say it’s the same property, if you replace 1 and 0 with “any truthy” and “any falsy”.

x = a and b and c

and

x = (a and b) and c

produce different bytecodes. The truth value of a is always evaluated once in the former case, but can be evaluated two times in the latter case.

Boolean expressions in boolean context are optimized. Both

if a and b and c: x

and

if (a and b) and c: x

produce the same bytecode. The truth value of a is always evaluated once.

This is an implementation detail, and you code should not rely on it.

Other than this, the difference between multi-argument BoolOp and nested two-argument BoolOp is that the former uses iteration, and the latter uses recursion. The recursion level is limited (otherwise it can cause stack overflow), this limits the number of sequential and’s and or’s, the same way as the number of sequential arithmetic operators is currently limited.

3 Likes

Hmm. I’m looking at my current build of 3.12, and dis.dis() is showing the same values. At what point does this get optimized?

>>> dis.dis('x = a and b and c')
  0           0 RESUME                   0

  1           2 LOAD_NAME                0 (a)
              4 JUMP_IF_FALSE_OR_POP     3 (to 12)
              6 LOAD_NAME                1 (b)
              8 JUMP_IF_FALSE_OR_POP     1 (to 12)
             10 LOAD_NAME                2 (c)
        >>   12 STORE_NAME               3 (x)
             14 RETURN_CONST             0 (None)
>>> dis.dis('x = (a and b) and c')
  0           0 RESUME                   0

  1           2 LOAD_NAME                0 (a)
              4 JUMP_IF_FALSE_OR_POP     3 (to 12)
              6 LOAD_NAME                1 (b)
              8 JUMP_IF_FALSE_OR_POP     1 (to 12)
             10 LOAD_NAME                2 (c)
        >>   12 STORE_NAME               3 (x)
             14 RETURN_CONST             0 (None)

It’s also possible that something’s changed since March when I last built from source, or that my Python installation is broken.

1 Like

Current main and 3.12.

$ echo 'x = a and b and c' | ./python -m dis
  0           0 RESUME                   0

  1           2 LOAD_NAME                0 (a)
              4 COPY                     1
              6 POP_JUMP_IF_FALSE        6 (to 20)
              8 POP_TOP
             10 LOAD_NAME                1 (b)
             12 COPY                     1
             14 POP_JUMP_IF_FALSE        2 (to 20)
             16 POP_TOP
             18 LOAD_NAME                2 (c)
        >>   20 STORE_NAME               3 (x)
             22 RETURN_CONST             0 (None)
$ echo 'x = (a and b) and c' | ./python -m dis
  0           0 RESUME                   0

  1           2 LOAD_NAME                0 (a)
              4 COPY                     1
              6 POP_JUMP_IF_FALSE        2 (to 12)
              8 POP_TOP
             10 LOAD_NAME                1 (b)
        >>   12 COPY                     1
             14 POP_JUMP_IF_FALSE        2 (to 20)
             16 POP_TOP
             18 LOAD_NAME                2 (c)
        >>   20 STORE_NAME               3 (x)
             22 RETURN_CONST             0 (None)

Furthermore, the new bytecode is larger and slower. It is a regression.

2 Likes
1 Like

Ah, thanks Steven, that explains it. Guess it’s time for me to build a new Python!

1 Like