`x or x or x` versus `(x or x) or x`

Another way of looking at this is through the AST (3.12):

>>> print(ast.dump(ast.parse("x or x or x", mode='eval'), indent=2))
Expression(
  body=BoolOp(
    op=Or(),
    values=[
      Name(id='x', ctx=Load()),
      Name(id='x', ctx=Load()),
      Name(id='x', ctx=Load())]))
>>> print(ast.dump(ast.parse("(x or x) or x", mode='eval'), indent=2))
Expression(
  body=BoolOp(
    op=Or(),
    values=[
      BoolOp(
        op=Or(),
        values=[
          Name(id='x', ctx=Load()),
          Name(id='x', ctx=Load())]),
      Name(id='x', ctx=Load())]))

I’m not totally sure these mean the same thing. At least, it appears that in the triple-or, evaluation takes the truthyness of each in turn, returning the first truthy value, or the last value. In the nested case, it does this for the first pair (returning x) then has to compute the truthyness of that value (again).

Hm, I think you could make the argument that expression still evaluates correctly, but that future evaluations might not be what you expect because an assumption about short-circuiting was violated.

This is arguably a bug in that the specification states what will be evaluated, and it’s common to rely on that behavior (e.g. to avoid evaluating something that raises exceptions). Although this specific example is pathological and it’s not clear if a realistic example exists.

Python doesn’t know the types at compile time. But it might be able to apply some simple optimizations when it sees literals.

I wouldn’t say that the second evaluation is “extra” in 3.12, because it makes sense if you consider what the parse tree looks like. Instead, there’s a missed optimization that used to be applied, but I guess was overlooked now in the changes (yet again) to the bytecode spec and/or the code generator.

1 Like

It doesn’t do this if the first element is a truthy variable, either.

Oh, I think I understand what’s happening better. edit: this statement was incorrect, nvm

In 3.12 the evaluation of (x or x) doesn’t return True, it returns x. So you end up testing the truthiness of x again thus you call __bool__ again, printing aha or whatever.

In 3.10 it seems like the initial truthiness short-circuits the whole thing?

class B:
    def __init__(self, val):
        self.val = val
    def __bool__(self):
        print(f'aha: {self.val}')
        return True
a = True

(B(1) or a) or B(2)
# prints 
# aha: 1
# aha: 1

Because (B(1) or a) evaluates to B(1) by testing it for truthiness, and then B(1) or B(2) does the same.

In the end, this change in behavior happened from Python 3.11 to 3.12 because of the removal of JUMP_IF_FALSE_OR_POP, see Remove `JUMP_IF_FALSE_OR_POP` and `JUMP_IF_TRUE_OR_POP` · Issue #567 · faster-cpython/ideas · GitHub . I think the peephole optimizer hasn’t quite been updated to optimize the new sequence of instructions.

4 Likes

@pablogsal This thread is about a possibly incomplete change in parsing/optimizing in 3.12. Seems better to ping you here rather than suggest an issue.

4 Likes

Thanks for the ping! This looks more on the compiler side CC @iritkatriel

Not sure that’s a good example, many orms trigger side effects (eg database queries) when attributes are accessed.

1 Like

Yes, right, side effects in a general sense are fine. But two accesses to the same attribute should result in the same value each time.

[everything is this response is from 3.12.2]
AFAICT flat x or y or z is compiled in 3.12 same x or (y or z).

Python 3.12.2 (main, Mar 12 2024, 11:02:14) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import dis
>>> dis.dis('x or (y or z)')
  0           0 RESUME                   0

  1           2 LOAD_NAME                0 (x)
              4 COPY                     1
              6 POP_JUMP_IF_TRUE         8 (to 24)
              8 POP_TOP
             10 LOAD_NAME                1 (y)
             12 COPY                     1
             14 POP_JUMP_IF_TRUE         3 (to 22)
             16 POP_TOP
             18 LOAD_NAME                2 (z)
             20 RETURN_VALUE
        >>   22 RETURN_VALUE
        >>   24 RETURN_VALUE

That looks like a sensible optimization to me — ignoring “broken” objects, or is associative:

>>> for x,y,z in itertools.product([0,1], repeat=3):
...    print(x,y,z, '(x or y) or z =', (x or y) or z, 'x or (y or z) =', x or (y or z))
... 
0 0 0 (x or y) or z = 0 x or (y or z) = 0
0 0 1 (x or y) or z = 1 x or (y or z) = 1
0 1 0 (x or y) or z = 1 x or (y or z) = 1
0 1 1 (x or y) or z = 1 x or (y or z) = 1
1 0 0 (x or y) or z = 1 x or (y or z) = 1
1 0 1 (x or y) or z = 1 x or (y or z) = 1
1 1 0 (x or y) or z = 1 x or (y or z) = 1
1 1 1 (x or y) or z = 1 x or (y or z) = 1
  • Same for (x and y) and z vs. x and (y and z).
  • However mixing is NOT associative e.g. (x or y) and z != x or (y and z).

But it’s not enough to be “sensible optimization”; Python strives for predictable behavior, so does this match parser, and does this match official grammar?

Well, AST surprised me

it’s neither this associativity nor that, but a single flat “BoolOp with 3 children” :open_mouth::

>>> print(ast.dump(ast.parse('(x or y) or z'), indent=2))
Module(
  body=[
    Expr(
      value=BoolOp(
        op=Or(),
        values=[
          BoolOp(
            op=Or(),
            values=[
              Name(id='x', ctx=Load()),
              Name(id='y', ctx=Load())]),
          Name(id='z', ctx=Load())]))],
  type_ignores=[])
>>> print(ast.dump(ast.parse('x or (y or z)'), indent=2))
Module(
  body=[
    Expr(
      value=BoolOp(
        op=Or(),
        values=[
          Name(id='x', ctx=Load()),
          BoolOp(
            op=Or(),
            values=[
              Name(id='y', ctx=Load()),
              Name(id='z', ctx=Load())])]))],
  type_ignores=[])
>>> print(ast.dump(ast.parse('x or y or z'), indent=2))
Module(
  body=[
    Expr(
      value=BoolOp(
        op=Or(),
        values=[
          Name(id='x', ctx=Load()),
          Name(id='y', ctx=Load()),
          Name(id='z', ctx=Load())]))],
  type_ignores=[])
  • Similarly x and y and z is a flat “And with 3 children”.
  • (This is not new — python 3.11.8 also parses flat 3-children.)
  • Mixes can not be flattened. x and y or z is parsed as '(x and y) or z' BUT x or y and z is parsed as x or (y and z) — that is, and has higher precendence.

What about grammar?

  • 6.17. Operator precedence confirms and has higher precedence.
  • 6.11. Boolean operations explains that x or y, x and y do short-circuiting, but not how exactly it behaves for longer chains.
    It gives grammar:
    or_test  ::=  and_test | or_test "or" and_test
    and_test ::=  not_test | and_test "and" not_test
    not_test ::=  comparison | "not" not_test
    
  • Grammar agrees “and” has higher precendence.
  • Grammar gives no hint that “BoolOp with 3 children” is a possibility, but I’m not sure if precise AST structure must match grammar, as long as it’s equivalent?
  • Anyway, that grammar is left-recursive. IIUC it implies this parsing:
    x "or" y "or" z
    and_test(x) "or" and_test(y) "or" and_test(z)
    or_test(and_test(x) "or" and_test(y)) "or" and_test(z)
    or_test(or_test(and_test(x) "or" and_test(y)) "or" and_test(z))
    

:warning: So if I read that right, documented grammar requires (x or y) or z associativity which contradicts x or (y or z) execution.
If optimization is desired, it should be at least documented.

Well if x is falsey, I do expect x or x to call __bool__ twice.
AFAICT, what we see here is not about x or x happening to have same object on both sides (and I wouldn’t want the language to special-case that).

  • When (x or y) or z is executed and x.__bool__() is True on first call, it reduces to x or z so x.__bool__() gets called again.
  • Whereas x or (....) execution, when x.__bool__() is True, that’s the final answer.