Removing the LOAD_CLOSURE opcode

Hi all,

I’m new to hacking on cpython and the Python source code in general. I’ve been working on Remove LOAD_CLOSURE #105775

From the ticket and poking around in compiler.c I know that LOAD_CLOSURE was added to keep the local/cell variable logic more readable. Particularly when we get to fix_cell_offsets, we don’t have information about how the variable being loaded was introduced to the scope, so labeling with LOAD_CLOSURE provided a kind of clean way to preserve that information throughout the instruction’s lifecycle.

My questions are the following (and apologies if this is not the right place for them, I wasn’t sure if they were better asked here, on the ticket itself, or one of the mailing lists):

  • Does it seem like fix_cell_offset mainly where we’ll need some change in logic to accommodate converting LOAD_CLOSURE to LOAD_FAST, or are there other considerations aside from coming up with some way to maintain information about the instructions provenance
  • When I first set up my cpython development environment, I followed the guidance on the developer’s guide and have been studying the compiler design page closely. From my reading of it, when I remove an opcode, I need to
  1. Remove references to the opcode from the code and update any logic impacted
  2. Remove the opcode from Lib/opcode.py and the dis module documentation
  3. Run make regen-opcode regen-opcode-targets to regenerate Include/opcode.h and Python/opcode_targets.h
  4. Update the magic number in Lib/importlib/_bootstrap_external.py and PC/launcher.c
  5. (See below)
  6. Run make regen-importlib

On the Compiler Design page under the Introducing New Bytecode section, it says

Changes to Lib/importlib/_bootstrap_external.py will take effect only after running make regen-importlib. Running this command before adding the new bytecode target to Python/ceval.c will result in an error. You should only run make regen-importlib after the new bytecode target has been added.

I didn’t understand the last two sentences here --it makes sense to me that if we’re introducing a new opcode, we should ideally have logic to handle the new instructions in ceval.c, but it’s not clear to me what error would be triggered if we’re not referencing the new instruction, so I feel I’m not understanding something

I’d really appreciate any guidance on these points, and again, I apologize if these questions aren’t appropriate for this forum!

I did a bit more poking around – I see that replacing LOAD_CLOSURE with LOAD_FAST would impact some of the optimization functions flowgraph.c like insert_superinstructions and fast_scan_many_locals. This is expected since this is the whole reason for this ticket, but I assume for this unit of work we would want to preserve the existing behavior to reduce risk, so my solution should probably have some check in those functions or earlier for the provenance of any LOAD_FAST instruction – i.e. did it originate in a closure as a cell reference or free variable. If so, we won’t do any optimizations since we aren’t currently in the case of a LOAD_CLOSURE op. Not sure if that seems reasonable – interested in any suggestions on how to proceed

Otherwise I’m still sort of wrapping my head around the local variable lifecycle

@polynomialherder
You’ll have a better chance of reaching the relevant people if you comment on the issue rather than post here (not everyone follows all the lists. )

There is also a mentorship section list for questions about contributing, if you prefer. It is followed by people who are interested in mentoring new contributors.

1 Like

Appreciate the tips! I’ll try those, thanks