I am interested in the the implementation of list comprehension, but I could not find it in GitHub - python/cpython: The Python programming language.
And I would like to know how does Python verify whether list comprehension grammar is implemented correctly and validate it generally runs faster than list initialization in for loop? And I look into the CPython source code, but I find it difficult for me to find the related test cases about the implementation.
Python code gets compiled to python bytecode. You can see the result of the bytecode of a function by using the
dis(“disassembly”) module. For example:
>>> def my_function(): ... return [x**2 for x in range(10)] ... >>> import dis >>> dis.dis(my_function) 1 0 RESUME 0 2 2 LOAD_CONST 1 (<code object <listcomp> at 0x000001D02A5D8030, file "<pyshell#2>", line 2>) 4 MAKE_FUNCTION 0 6 LOAD_GLOBAL 1 (NULL + range) 18 LOAD_CONST 2 (10) 20 PRECALL 1 24 CALL 1 34 GET_ITER 36 PRECALL 0 40 CALL 0 50 RETURN_VALUE Disassembly of <code object <listcomp> at 0x000001D02A5D8030, file "<pyshell#2>", line 2>: 2 0 RESUME 0 2 BUILD_LIST 0 4 LOAD_FAST 0 (.0) >> 6 FOR_ITER 7 (to 22) 8 STORE_FAST 1 (x) 10 LOAD_FAST 1 (x) 12 LOAD_CONST 0 (2) 14 BINARY_OP 8 (**) 18 LIST_APPEND 2 20 JUMP_BACKWARD 8 (to 6) >> 22 RETURN_VALUE
dis output will change from version to version, so your results here may be a bit different. The dis module docs can help you to understand the bytecode.
If you want to see how to get from python code to bytecode, you can check out compile.c, though that file is generally pretty hard to read, especially for newcomers.
If you want to see how bytecode actually gets executed, you can check out ceval.c. For example, here is the c implementation of the
LIST_APPEND instruction in Python 3.10:
There are actually a few steps involved here before we even get to compile.c.
- There’s first a tokenizer/lexer that converts a stream of characters into a stream of tokens. This figures out what’s an operator and what’s a variable name, etc.
- Then there’s a parser at Parser/parser.c that converts the stream of tokens into an “Abstract syntax tree” (AST), which figures out things like “this is a for loop inside of a function”. However, parser.c is a huge file that is literally not designed to be read by humans – it’s computer-generated c code, based off of the “grammar” at Grammar/python.gram.
- Then the AST gets fed into the compiler at Python/compile.c, which compiles into bytecodes and assembles all of those bytecodes into a bytecode “code object”.
- Then bytecode within code objects is interpreted by the main interpreter at Python/ceval.c, which actually carries out the actions that your original python code was supposed to do.
As to why list comprehensions are a bit faster than for-loop-and-append, we can check out the bytecode. Below is Python 3.10:
>>> def func1(): ... my_list =  ... for i in range(10): ... my_list.append(i**2) ... return my_list ... >>> def func2(): ... return [i**2 for i in range(10)] ... >>> import dis >>> dis.dis(func1) 2 0 BUILD_LIST 0 2 STORE_FAST 0 (my_list) 3 4 LOAD_GLOBAL 0 (range) 6 LOAD_CONST 1 (10) 8 CALL_FUNCTION 1 10 GET_ITER >> 12 FOR_ITER 9 (to 32) 14 STORE_FAST 1 (i) 4 16 LOAD_FAST 0 (my_list) 18 LOAD_METHOD 1 (append) 20 LOAD_FAST 1 (i) 22 LOAD_CONST 2 (2) 24 BINARY_POWER 26 CALL_METHOD 1 28 POP_TOP 30 JUMP_ABSOLUTE 6 (to 12) 5 >> 32 LOAD_FAST 0 (my_list) 34 RETURN_VALUE >>> dis.dis(func2) 2 0 LOAD_CONST 1 (<code object <listcomp> at 0x0000025DA56DD580, file "<stdin>", line 2>) 2 LOAD_CONST 2 ('func2.<locals>.<listcomp>') 4 MAKE_FUNCTION 0 6 LOAD_GLOBAL 0 (range) 8 LOAD_CONST 3 (10) 10 CALL_FUNCTION 1 12 GET_ITER 14 CALL_FUNCTION 1 16 RETURN_VALUE Disassembly of <code object <listcomp> at 0x0000025DA56DD580, file "<stdin>", line 2>: 2 0 BUILD_LIST 0 2 LOAD_FAST 0 (.0) >> 4 FOR_ITER 6 (to 18) 6 STORE_FAST 1 (i) 8 LOAD_FAST 1 (i) 10 LOAD_CONST 0 (2) 12 BINARY_POWER 14 LIST_APPEND 2 16 JUMP_ABSOLUTE 2 (to 4) >> 18 RETURN_VALUE
The inner loop of
LOAD_METHOD, LOAD_FAST, LOAD_CONST, BINARY_POWER, CALL_METHOD, POP_TOP where
func2 has just
LOAD_CONST, BINARY_POWER, LIST_APPEND. The difference essentially boils down to the fact that Python can know that the list comprehension is dealing with just a list, and so it doesn’t have look up what
append means over and over.
Thank you very much, the explaination is clear and friendly!
After a python code using list comprehension goes through “tokenizer->parser->complier->interpreter” process, we could get the result. Are there test cases or benchmarks about list comprehension like list.
append API to check whether the result is expected?
There are test cases yes, a quick search gives
test_listcomps. There’s probably more elsewhere.
Dennis, what version of Python are you running?
I’ve never seen some of those op-codes. PRECALL? RESUME?
I know that bytecode can change from release to release, but it has been mostly stable for a long time!
And I can’t even begin to guess what PRECALL does that CALL couldn’t do, or why the very first op-code is “RESUME”. Resume what?
Interesting, does the “elsewhere” means in CPython code of GitHub or other projects of Github?
Besides, it’s amazing, how did you search it successfully. What keywords should I search for and do I search in the GitHub?
I searched in the CPython with keyword “listcomp” as follows, but test_listcomps does not occur.
I didn’t use the search tools, instead I just looked in the
test package (since that’s where tests are), looking for something that matched.
You’ll be able to find the discussions/issues leading up to those by following the links in the PRs.
I’m not Spencer Brown, but sometimes the “go to file” button on the top level of the GitHub repo is more helpful than the actual search functionality,
Thank you, yeah, the actual search functionality seems not as good as I expected.