Hello, where is the source code of list comprehension of Python?

zzz · April 27, 2022, 1:44am

Hello,
I am interested in the the implementation of list comprehension, but I could not find it in GitHub - python/cpython: The Python programming language.
And I would like to know how does Python verify whether list comprehension grammar is implemented correctly and validate it generally runs faster than list initialization in for loop? And I look into the CPython source code, but I find it difficult for me to find the related test cases about the implementation.

sweeneyde · April 27, 2022, 2:44am

Python code gets compiled to python bytecode. You can see the result of the bytecode of a function by using the dis(“disassembly”) module. For example:

>>> def my_function():
...     return [x**2 for x in range(10)]
... 
>>> import dis
>>> dis.dis(my_function)
  1           0 RESUME                   0

  2           2 LOAD_CONST               1 (<code object <listcomp> at 0x000001D02A5D8030, file "<pyshell#2>", line 2>)
              4 MAKE_FUNCTION            0
              6 LOAD_GLOBAL              1 (NULL + range)
             18 LOAD_CONST               2 (10)
             20 PRECALL                  1
             24 CALL                     1
             34 GET_ITER
             36 PRECALL                  0
             40 CALL                     0
             50 RETURN_VALUE

Disassembly of <code object <listcomp> at 0x000001D02A5D8030, file "<pyshell#2>", line 2>:
  2           0 RESUME                   0
              2 BUILD_LIST               0
              4 LOAD_FAST                0 (.0)
        >>    6 FOR_ITER                 7 (to 22)
              8 STORE_FAST               1 (x)
             10 LOAD_FAST                1 (x)
             12 LOAD_CONST               0 (2)
             14 BINARY_OP                8 (**)
             18 LIST_APPEND              2
             20 JUMP_BACKWARD            8 (to 6)
        >>   22 RETURN_VALUE

This dis output will change from version to version, so your results here may be a bit different. The dis module docs can help you to understand the bytecode.

If you want to see how to get from python code to bytecode, you can check out compile.c, though that file is generally pretty hard to read, especially for newcomers.

If you want to see how bytecode actually gets executed, you can check out ceval.c. For example, here is the c implementation of the LIST_APPEND instruction in Python 3.10:

github.com

python/cpython/blob/dbe666d39880352360cb07787d47237448196bf4/Python/ceval.c#L2178-L2188

      
        
            case TARGET(LIST_APPEND): {
                PyObject *v = POP();
                PyObject *list = PEEK(oparg);
                int err;
                err = PyList_Append(list, v);
                Py_DECREF(v);
                if (err != 0)
                    goto error;
                PREDICT(JUMP_ABSOLUTE);
                DISPATCH();
            }

sweeneyde · April 27, 2022, 3:11am

There are actually a few steps involved here before we even get to compile.c.

There’s first a tokenizer/lexer that converts a stream of characters into a stream of tokens. This figures out what’s an operator and what’s a variable name, etc.
Then there’s a parser at Parser/parser.c that converts the stream of tokens into an “Abstract syntax tree” (AST), which figures out things like “this is a for loop inside of a function”. However, parser.c is a huge file that is literally not designed to be read by humans – it’s computer-generated c code, based off of the “grammar” at Grammar/python.gram.
Then the AST gets fed into the compiler at Python/compile.c, which compiles into bytecodes and assembles all of those bytecodes into a bytecode “code object”.
Then bytecode within code objects is interpreted by the main interpreter at Python/ceval.c, which actually carries out the actions that your original python code was supposed to do.

As to why list comprehensions are a bit faster than for-loop-and-append, we can check out the bytecode. Below is Python 3.10:

>>> def func1():
...     my_list = []
...     for i in range(10):
...         my_list.append(i**2)
...     return my_list
...
>>> def func2():
...     return [i**2 for i in range(10)]
...
>>> import dis
>>> dis.dis(func1)
  2           0 BUILD_LIST               0
              2 STORE_FAST               0 (my_list)

  3           4 LOAD_GLOBAL              0 (range)
              6 LOAD_CONST               1 (10)
              8 CALL_FUNCTION            1
             10 GET_ITER
        >>   12 FOR_ITER                 9 (to 32)
             14 STORE_FAST               1 (i)

  4          16 LOAD_FAST                0 (my_list)
             18 LOAD_METHOD              1 (append)
             20 LOAD_FAST                1 (i)
             22 LOAD_CONST               2 (2)
             24 BINARY_POWER
             26 CALL_METHOD              1
             28 POP_TOP
             30 JUMP_ABSOLUTE            6 (to 12)

  5     >>   32 LOAD_FAST                0 (my_list)
             34 RETURN_VALUE
>>> dis.dis(func2)
  2           0 LOAD_CONST               1 (<code object <listcomp> at 0x0000025DA56DD580, file "<stdin>", line 2>)
              2 LOAD_CONST               2 ('func2.<locals>.<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_GLOBAL              0 (range)
              8 LOAD_CONST               3 (10)
             10 CALL_FUNCTION            1
             12 GET_ITER
             14 CALL_FUNCTION            1
             16 RETURN_VALUE

Disassembly of <code object <listcomp> at 0x0000025DA56DD580, file "<stdin>", line 2>:
  2           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                 6 (to 18)
              6 STORE_FAST               1 (i)
              8 LOAD_FAST                1 (i)
             10 LOAD_CONST               0 (2)
             12 BINARY_POWER
             14 LIST_APPEND              2
             16 JUMP_ABSOLUTE            2 (to 4)
        >>   18 RETURN_VALUE

The inner loop of func1 has LOAD_METHOD, LOAD_FAST, LOAD_CONST, BINARY_POWER, CALL_METHOD, POP_TOP where func2 has just LOAD_CONST, BINARY_POWER, LIST_APPEND. The difference essentially boils down to the fact that Python can know that the list comprehension is dealing with just a list, and so it doesn’t have look up what append means over and over.

zzz · April 27, 2022, 3:32am

Thank you very much, the explaination is clear and friendly!

After a python code using list comprehension goes through “tokenizer->parser->complier->interpreter” process, we could get the result. Are there test cases or benchmarks about list comprehension like list.append API to check whether the result is expected?

TeamSpen210 · April 27, 2022, 5:17am

There are test cases yes, a quick search gives test_listcomps. There’s probably more elsewhere.

steven.daprano · April 27, 2022, 5:30am

Dennis, what version of Python are you running?

I’ve never seen some of those op-codes. PRECALL? RESUME?

I know that bytecode can change from release to release, but it has been mostly stable for a long time!

And I can’t even begin to guess what PRECALL does that CALL couldn’t do, or why the very first op-code is “RESUME”. Resume what?

zzz · April 27, 2022, 5:51am

Interesting, does the “elsewhere” means in CPython code of GitHub or other projects of Github?

Besides, it’s amazing, how did you search it successfully. What keywords should I search for and do I search in the GitHub?
I searched in the CPython with keyword “listcomp” as follows, but test_listcomps does not occur.

TeamSpen210 · April 27, 2022, 7:08am

I didn’t use the search tools, instead I just looked in the test package (since that’s where tests are), looking for something that matched.

erlendaasland · April 27, 2022, 1:32pm

PRECALL was introduced in GH-30011
RESUME was introduced in GH-30364
PRECALL is being removed in GH-92925

You’ll be able to find the discussions/issues leading up to those by following the links in the PRs.

sweeneyde · April 27, 2022, 7:09pm

3.11 alpha7

I’m not Spencer Brown, but sometimes the “go to file” button on the top level of the GitHub repo is more helpful than the actual search functionality,

zzz · April 28, 2022, 2:53am

Thank you, yeah, the actual search functionality seems not as good as I expected.