Learning more about the internal workings of Python in depth

karthick-12345 · March 10, 2022, 9:22am

Hi All,

I would like your help understanding Python in more depth. I want to learn how Python code gets interpreted internally by the interpreter, step by step. I have gone through lots of online courses, YouTube videos, etc., but they mostly explain the syntax and not Python’s internals, so it would be very helpful if you could help me with this.

Could you please guide me toward learning this in more depth?

Thanks

fungi · March 10, 2022, 12:03pm

Hopefully you’ve already found the Python Developer’s Guide, but if
not it has some sections along the lines of what it sounds like you
want: Python Developer's Guide

karthick-12345 · March 19, 2022, 5:07pm

Thank you for your help!!!

karthick-12345 · March 21, 2022, 7:02pm

Hi fungi,

Sorry to say this, but I have gone through the Python Developer’s Guide and I wasn’t able to understand the concepts. I don’t know where I am falling short, or what I need to know first before going to the Developer’s Guide. @fungi, could you please help me here?

jeanas · March 21, 2022, 7:27pm

I think the answer depends on what you want from this knowledge. Are you just curious to know the basics of how the Python interpreter works? In the CPython implementation (the standard one, though there are others), the code is first compiled into bytecode, which is then run by a virtual machine (VM). If you import a module, you will see a __pycache__ directory appearing next to the module file. This contains the cached bytecode. See also the dis module.
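As a minimal sketch of the caching described above (CPython only; the module name hello_mod.py and the temporary directory are made up for illustration), the standard library's py_compile module can be asked to write the cached bytecode file explicitly, which is roughly what an import does behind the scenes:

```python
import pathlib
import py_compile
import tempfile


def compile_to_cache(source_text):
    """Write source_text to a throwaway module and byte-compile it.

    Returns the path of the cached bytecode file as a pathlib.Path.
    """
    workdir = pathlib.Path(tempfile.mkdtemp())
    module_path = workdir / "hello_mod.py"  # hypothetical module name
    module_path.write_text(source_text)
    # py_compile.compile returns the path of the bytecode file it wrote.
    # By default that is inside a __pycache__ directory next to the source.
    cached = py_compile.compile(str(module_path), doraise=True)
    return pathlib.Path(cached)


cache_file = compile_to_cache('message = "Hello, world!"\nprint(message)\n')
print(cache_file.parent.name)  # directory holding the cached bytecode
print(cache_file.name)         # file name includes the interpreter version tag
```

The temporary directory is left on disk so the result can be inspected; delete it by hand when done.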

If you are looking for knowledge that will enable you to contribute to CPython, I can’t help you much. You probably need to ask a more specific question.

At any rate, CPython is just one implementation of Python. There are others, each with its own inner workings. Usually, you don’t need to know about them as a user.

karthick-12345 · March 22, 2022, 6:25am

Thanks @jeanas for your response.

Yes, you are correct. I am just curious to know how raw Python code (.py) gets executed on a computer (like a Windows machine).

To understand this, I started researching how raw code is executed by the Python interpreter and by Windows machines.

From this analysis, a few questions came to mind, listed below:

How does the raw (.py) file get loaded into the Python interpreter (either mechanically or systematically)?
Once the Python code is loaded, what is the first thing that happens inside the interpreter?
How does our human-written code get converted into something the interpreter understands?
What different kinds of memory get allocated during the life cycle of program execution?
How does the code get executed line by line, and if an error occurs at any point, how is the error message shown to the end user?
How does memory get allocated, or a network connection get made (if required)?

Overall, I am interested in learning the end to end process or lifecycle or procedure.

Could you help me understand these concepts? If possible, could you please give a one-line answer for each question above? Also, if you could guide me on how to learn this in depth, it would be very helpful for my career.

Thanks in advance.

steven.daprano · March 22, 2022, 9:22am

The answer depends on which interpreter you use.

At the highest level, the process will always be something like this:

Python source code → byte code → machine code → CPU runs machine code → microcode → hardware flips bits in memory

but the fine details are very complicated. Even in that simple version, I have skipped a lot of steps. (E.g. the Python source code has to be read, then parsed into a parse tree, then the parse tree has to be converted to an abstract syntax tree, which can be compiled into byte code.)
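Some of those skipped steps can be watched from inside Python. This is only a sketch for CPython: the ast module exposes the abstract syntax tree stage (not any earlier parse tree), and the file name "<example>" is just a placeholder label.

```python
import ast

source = 'message = "Hello, world!"\nprint(message)\n'

# Step 1: parse the source text into an abstract syntax tree (AST).
tree = ast.parse(source, filename="<example>", mode="exec")
print(ast.dump(tree))

# Step 2: compile the AST into a code object, which holds the byte code.
code_object = compile(tree, filename="<example>", mode="exec")
print(type(code_object).__name__)
print(code_object.co_names)  # names the byte code refers to

# Step 3: hand the code object to the interpreter to run.
exec(code_object)
```

The tree has one node per statement (an assignment and an expression), and the exact text printed by ast.dump varies between Python versions.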

CPython (the interpreter you are probably using):

compiles Python source code to the CPython virtual machine byte code;

Jython:

compiles Python source code to Java Virtual Machine byte code;

IronPython:

compiles Python source code to dot-Net CLR (Common Language Runtime) byte code;

PyPy:

runs a “Just In Time” compiler, which in simple language means that parts of the code which are reused often will be automatically re-compiled to machine code when needed.

GraalPython:

Like Jython, but uses the Java “Graal” virtual machine instead of the JVM.

An example may help make things clear. The Python source code:

message = "Hello, world!"
print(message)

gets compiled into byte code like this in CPython 3.10:


b'd\x00Z\x00e\x01e\x00\x83\x01\x01\x00d\x01S\x00'

Of course nobody can make sense of that. It’s a stream of bytes! (Hence the name “byte-code”.) Fortunately CPython comes with a disassembler that turns the unreadable byte-code into symbolic form that we can read:


  2           0 LOAD_CONST               0 ('Hello, world!')
              2 STORE_NAME               0 (message)

  3           4 LOAD_NAME                1 (print)
              6 LOAD_NAME                0 (message)
              8 CALL_FUNCTION            1
             10 POP_TOP
             12 LOAD_CONST               1 (None)
             14 RETURN_VALUE
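To reproduce this yourself, something along these lines should work in CPython. Treat it as a sketch: the file name "<example>" is a placeholder, and the raw bytes and instruction names differ between Python versions, so your output will probably not match the 3.10 listing exactly.

```python
import dis

source = 'message = "Hello, world!"\nprint(message)\n'
code_object = compile(source, "<example>", "exec")

# The raw byte code is an ordinary bytes object.
raw = code_object.co_code
print(raw)

# The constants and names that the instructions refer to by index.
print(code_object.co_consts)
print(code_object.co_names)

# Symbolic listing, one instruction per line.
dis.dis(code_object)

# The same instructions as objects, handy for inspecting them in code.
opnames = [instruction.opname for instruction in dis.get_instructions(code_object)]
print(opnames)
```

The numbers after LOAD_CONST and LOAD_NAME in the listing are indexes into co_consts and co_names respectively.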

Next, the interpreter runs the byte-code. Each byte-code instruction (like LOAD_CONST and CALL_FUNCTION) corresponds to machine code built into the interpreter. In the case of CPython, that machine code was built by the C compiler that made the interpreter. In the case of Jython, it was made by the Java compiler; and in the case of IronPython, it was made by one of the CLR languages, like C# or F#.

However it was made, the interpreter has something that knows how to call a Python function. That CALL_FUNCTION byte-code causes the interpreter to grab the function (here, print) and its arguments, and run the function.

CALL_FUNCTION needs to be clever enough to understand functions created with Python itself (def function...) as well as built-in functions that are written in C (or Java, or C#, or F#, or whatever language).
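The difference between the two kinds of function is visible from Python. The sketch below assumes CPython (other interpreters may report different types), and greet is a made-up function for illustration.

```python
import dis


def greet(name):
    """A function written in Python: it carries its own byte code."""
    return "Hello, " + name


# A Python-level function has a __code__ attribute holding its byte code.
print(type(greet).__name__)        # function
print(hasattr(greet, "__code__"))  # True
dis.dis(greet)

# In CPython, print is implemented in C, so there is no byte code to show.
print(type(print).__name__)        # builtin_function_or_method
print(hasattr(print, "__code__"))  # False

# Both kinds are called with exactly the same syntax.
print(greet("world"))
print(len("world"))
```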

So once that happens, the next step is that the function’s code gets run by the CPU. The CPU sees the function, which is a series of machine code instructions, and runs them.

Back in the 1960s and 70s, CPUs were pretty simple, and each machine instruction corresponded to a physical circuit in the CPU that flipped bits. But in the 21st century, CPUs are much more complicated, and each machine code instruction corresponds to one or more microcode instructions, which in turn correspond to actual hardware that flips bits.

Trying to understand Python code at the level of flipping bits is almost impossible! That simple, two-line “Hello World” program above would probably be tens of thousands of machine code instructions.

karthick-12345 · March 22, 2022, 11:18am

Thanks @steven.daprano for your answer. It’s really helpful and gives lots of insight.

My next topic to explore is how Traceback (most recent call last): works, that is, how an error gets generated and printed in the user console if there is any issue in the code. Basically, I have doubts about how errors get generated and printed in the console.

For a simple program it is very easy; from the console we can understand what happened, as shown below:

>>> ref
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    ref
NameError: name 'ref' is not defined

Ans: In the above case, I am trying to refer to the variable ref, which has not been defined so far. So I get this error message, and I can correct it.

But in a large application or program, how can we trace back from an error in the console and identify the reason or root cause of the error? While practicing, I have sometimes noticed that the error printed refers to some built-in Python function as well, and in those cases I got confused about how to trace back and identify the root cause.

It would be very helpful if you could explain this topic or guide me on a learning path for it.

TobiasHT · March 22, 2022, 12:46pm

I found the book by Anthony Shaw quite informative. You can try getting it. It’s called CPython Internals.

fungi · March 22, 2022, 2:15pm

At a simple level, the traceback you’re looking at may end in Python
builtins or stdlib objects, but there’s almost always an entry into
them from your script. Read the traceback from the bottom up,
looking for the first reference to a file included in the
application you’re troubleshooting, and the problem is likely
somewhere in the vicinity of that call (though it could be even
higher up in the traceback/earlier in the call stack depending on
what’s wrong with it).
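The bottom-up reading described above can also be done programmatically with the standard traceback module. This is only an illustrative sketch: load_settings and innermost_own_frame are hypothetical names, and "our code" is taken to mean the single file that defines load_settings.

```python
import json
import traceback


def load_settings(text):
    """Hypothetical application function containing the actual mistake."""
    # Bug: the caller passes text that is not valid JSON.
    return json.loads(text)


def innermost_own_frame(exc, own_file):
    """Return the last traceback frame that belongs to own_file, or None."""
    frames = traceback.extract_tb(exc.__traceback__)
    # extract_tb lists frames oldest first, so walk it in reverse to
    # mimic reading the printed traceback from the bottom up.
    for frame in reversed(frames):
        if frame.filename == own_file:
            return frame
    return None


own_file = load_settings.__code__.co_filename
try:
    load_settings("{not valid json}")
except ValueError as exc:
    # The deepest frames are inside the json package, not in our code.
    all_frames = traceback.extract_tb(exc.__traceback__)
    print("deepest frame:", all_frames[-1].filename, all_frames[-1].name)
    found = innermost_own_frame(exc, own_file)
    print("our code:", found.name, "line", found.lineno)
```

Here the error is raised deep inside the json package, but the frame worth looking at first is load_settings, which is where the bad input was handed over.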

steven.daprano · March 23, 2022, 6:07am

If you get a traceback, you have a bug in your code.

It doesn’t matter whether the traceback originated in a function you wrote, or a builtin, the cause is still the same: a bug in your code.

As an experienced coder who has been programming with Python for more than 20 years, if I get a traceback, 99.9% of the time it is my error. As a beginner, it will be more like 99.9999% of the time.

It is not impossible for a beginner to discover a bug in the builtins, or the stdlib, but if you get a traceback, you should not think “Have I found a bug in Python?”. You should think “Where is the bug in my code?”

Fortunately the traceback gives you lots of information to solve the problem. Often the traceback tells you everything you need:

len("a", "b")
# raises TypeError: len() takes exactly one argument (2 given)

but sometimes it is just the beginning of the process of working out where the bug is.

You don’t need to know anything about Python internals to fix bugs. You need to know about Python’s externals, the functions and how to use them, not how they are implemented internally.

karthick-12345 · March 23, 2022, 11:47am

Hi,

Could you help me understand or learn how to debug a Python program when an error occurs?
What are the things I need to look at around the error when it occurs?

Simply put, to become more expert at debugging Python programs, what should I learn?

CAM-Gerlach · March 24, 2022, 2:25am

The most important thing is to learn how to read, interpret and use an error traceback, as both @steven.daprano and @fungi explain above, which will inform you what the immediate error was, when/where it occurred, and often suggest why or how to fix it. Given there is an infinite variety of errors, there is no magic bullet that would resolve every error (if only…).

Beyond what @fungi suggests above, if you still can’t figure out what went wrong, a common beginner approach is to use print() statements (or logging calls, a debugger, dynamic introspection, etc) to see what your variables are and whether they match what you expect. In addition, you can carefully reference the documentation of whatever function/method/class is involved, try a simpler version of your code until it works (to isolate what’s causing the problem), and step through the code using a debugger. Failing that, as a beginner, 99% of the problems you’re likely to run into have already been seen by someone else in your shoes, so googling the error message usually can provide relevant help (though there’s a lot of bad advice out there, and a lot of it might not actually be applicable to your particular problem).
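For instance, a rough sketch of the "check what your variables actually are" approach, using logging calls instead of print(). The average function and the logger name debug_demo are invented for this example.

```python
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
log = logging.getLogger("debug_demo")  # hypothetical logger name


def average(values):
    """Return the mean of values; hypothetical function under investigation."""
    # Log the input so it can be compared against what we expected to receive.
    log.debug("average() called with %r (type %s)", values, type(values).__name__)
    total = sum(values)
    log.debug("total=%r count=%d", total, len(values))
    return total / len(values)


# Start with the simplest input that should work...
print(average([1, 2, 3]))

# ...then try the input that fails, and let the log show what was passed in.
try:
    average([])
except ZeroDivisionError as exc:
    log.error("failed: %s", exc)
```

To step through the code interactively instead, putting a call to the built-in breakpoint() inside average drops you into the pdb debugger at that line.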

Best of luck!

karthick-12345 · March 24, 2022, 4:18am

Thanks, everyone, for giving insights on my topics.

More answers, ideas, or ways of approaching and handling errors are always welcome…