The oparg of the new `COPY` opcode is not zero-based?

Apologies for yet another bytecode-related discussion. This time I was dealing with COPY as a replacement for DUP_TOP and I stumbled upon this unexpected behaviour. The following script (this requires the bytecode package to run)

from bytecode import Bytecode, Instr

print(
    [
        eval(
            Bytecode(
                [
                    Instr("LOAD_CONST", 24),
                    Instr("LOAD_CONST", 42),
                    Instr("COPY", i),
                    Instr("RETURN_VALUE"),
                ]
            ).to_code()
        )
        for i in range(3)
    ]
)

produces the output

[3, 42, 24]

Based on the documentation of COPY, which reads

Push the i-th item to the top of the stack. The item is not removed from its original location.

I would have expected COPY 0 to duplicate the TOS, which just before the COPY opcode should have been the literal 42. Instead I get the literal 3. COPY 1, on the other hand, gives the 42 I was expecting from COPY 0.

Based on the simple experiment above, I think I can conclude the following:

  • the COPY oparg is not zero-based;
  • COPY 0 does not crash the interpreter, but instead produces an unexpected TOS;
  • COPY 1 is the replacement for DUP_TOP.

This makes me wonder whether this behaviour is by design, or just accidental. Is there, perhaps, a special meaning for COPY 0 that is not documented in the section for the dis module?

This slightly modified example shows the full content of the stack

from bytecode import Bytecode, Instr

print(
    [
        eval(
            Bytecode(
                [
                    Instr("LOAD_CONST", 24),
                    Instr("LOAD_CONST", 42),
                    Instr("COPY", i),
                    Instr("BUILD_TUPLE", 3),
                    Instr("RETURN_VALUE"),
                ]
            ).to_code()
        )
        for i in range(3)
    ]
)

and produces

[(24, 42, 3), (24, 42, 42), (24, 42, 24)]

If you look at the source code you’ll see that COPY with oparg=0 is invalid, but this is only checked using a C assert() call. That only does anything when built in debug mode (./configure --with-pydebug). Other than that, COPY uses PEEK(oparg), and PEEK is indeed 1-based.
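To make the 1-based indexing concrete, here is a minimal pure-Python sketch of the semantics described above. It is not CPython's implementation: the helper names `peek` and `copy_op` are made up for illustration, and the stack is modelled as a plain list whose last element is the TOS. The explicit check stands in for the C assert(); without it, Python's `stack[-0]` would silently return the bottom of the list, which is not what the C code does either.

```python
def peek(stack, n):
    """Hypothetical model of a 1-based PEEK(n): n == 1 is the top of the stack."""
    if n < 1:
        # Stand-in for the C assert() on oparg; only debug builds check this.
        raise ValueError("PEEK is 1-based, so n must be >= 1")
    return stack[-n]


def copy_op(stack, oparg):
    """Hypothetical model of COPY: push the oparg-th item (1-based) onto the stack."""
    stack.append(peek(stack, oparg))


for oparg in (1, 2):
    stack = [24, 42]  # state after the two LOAD_CONST instructions
    copy_op(stack, oparg)
    print(oparg, stack)
```

Under this model COPY 1 leaves 24, 42, 42 on the stack and COPY 2 leaves 24, 42, 24, matching the BUILD_TUPLE output above for i = 1 and i = 2. COPY 0 has no counterpart in the model because it is invalid.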

I really urge you to (a) read the source code and (b) use a CPython binary built in debug mode before posting questions here. :slight_smile:

https://github.com/python/cpython/pull/96462 is an attempt at improving the documentation of the stack effect for several opcodes. I am waiting for some feedback before dealing with the latest conflicts.

Best

Matthieu

I really urge you to (a) read the source code and (b) use a CPython binary built in debug mode before posting questions here. :slight_smile:

I have started the OP by apologising, so I hope I can be pardoned this time! :slightly_smiling_face: I confess I didn’t go to the source code this time because the behaviour was already clear from my experiments, and I didn’t expect to find the rationale for a 1-based stack there either. Hence I thought I had better chances of getting an insight as to why these new opcodes behave like this here.

I think this could be regarded as one of those cases when no documentation is better than some documentation. For I went to the dis documentation and assumed that indexing was 0-based (how often do you deal with a 1-based collection?). Based on the fact that I have also been swapping the TOS with itself with SWAP 1, I think it’s fair to conclude that the indexing mentioned in the dis module is 1-based. It would have been great if that was mentioned somewhere, like @MatthieuDartiailh is doing in their PR. I look forward to those docs improvements! :slightly_smiling_face:
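The SWAP 1 observation fits the same picture. Here is a rough pure-Python sketch, assuming SWAP i exchanges the TOS with the i-th item counted from the top, 1-based. The name `swap_op` is made up for illustration and this is only a model, not how the interpreter is written.

```python
def swap_op(stack, oparg):
    """Hypothetical model of SWAP: exchange TOS with the oparg-th item (1-based)."""
    if oparg < 1:
        raise ValueError("oparg is 1-based, so it must be >= 1")
    stack[-1], stack[-oparg] = stack[-oparg], stack[-1]


stack = [24, 42]
swap_op(stack, 1)  # swaps the TOS with itself, so nothing changes
print(stack)

stack = [24, 42]
swap_op(stack, 2)  # swaps the TOS with the item just below it
print(stack)
```

With 1-based indexing SWAP 1 is a no-op and SWAP 2 is the smallest swap that actually changes the stack.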

https://github.com/python/cpython/pull/96462 has been merged, which should make the stack effect of the opcodes clearer and also clarify the meaning of several arguments.
