Best guide to the CPython Source Code?

Is the CPython source code deliberately obfuscated?

Is there a “best” guide to the CPython source code? I see Real Python has published CPython Internals, but it targets 3.9, so I’d have to catch up from there. Is that a good option, or am I better off reading the internals guide at devguide.python.org? Is there something even better?

I’m looking for something more in depth than an Explain Like I’m 5 guide, but not quite “Computer Science Degree and 15 years in the industry.”

What’s my best next resource?

No, it’s not deliberately obfuscated. If you see ways to make it clearer or to improve the devguide, feel free to open PRs!

I haven’t read the Real Python internals book, but it’s likely to provide a broader overview and be more detailed and accessible than the devguide’s internals section. Looking at the table of contents, I wouldn’t worry that it’s written targeting 3.9 — it should still be plenty applicable to 3.12.

There are some interesting PyCon talks floating around; for example, you might find Sebastiaan Zeeff’s talk “Demystifying Python’s Internals: Diving into CPython by implementing...” (on YouTube) interesting.

I’ve heard fairly good things about the Real Python book, since it’s written by a former Python core dev who knows his stuff. It and the devguide likely complement each other well, since they cover different things. However, if you’re a self-paced learner, you can also try just reading the devguide and a basic overview online, then diving into the source code yourself and seeing how you fare.

When I go into the source code, I notice things like this: there is no _thread.py that I can find, yet I can import _thread, and if I look at its code stub, it imports threading, which in turn imports _thread. I don’t fully understand how that code stub gets made or where the source code for _thread actually lives. I’ve only checked this on Windows so far, not macOS or Linux, but seeing things like that led me to ask about obfuscation.

I will look at that talk on YouTube. Thank you!

Thanks! It’s good to know they complement each other well. I’m likely buying the book.

As described in any of the places where the top-level source tree layout is summarized, the Lib directory of the source tree contains the standard library modules implemented in pure Python, while the Modules directory contains the source files of the modules implemented in C, of which _thread is one (its code is in Modules/_threadmodule.c).
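You can also ask the import system itself where a module comes from, instead of hunting through the file tree. The sketch below uses importlib.util.find_spec; the helper name module_origin is made up for illustration. The spec’s origin attribute is the literal string "built-in" for modules compiled into the interpreter, "frozen" for frozen modules, and a file path for modules loaded from disk. On a typical CPython build, _thread should report built-in while threading reports a path ending in threading.py.

```python
import importlib.util


def module_origin(name: str) -> str:
    """Report where the import system finds the module `name`.

    spec.origin is "built-in" for modules compiled into the interpreter,
    "frozen" for frozen modules, and a file path for modules on disk.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "not found"
    # Namespace packages have no single origin (origin is None).
    return spec.origin or "unknown"


for name in ("_thread", "threading"):
    print(name, "->", module_origin(name))
```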

With low-level C code like this, there isn’t always the same 1:1 mapping between source files and loaded modules. In general, though, a fuzzy search for the module’s name in both directories will find where the code lives, and in the rare case it doesn’t, searching for a relevant function from the module will.
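A rough version of that fuzzy search, as a sketch: it assumes a local CPython checkout (the path in the usage section is hypothetical), and find_source_candidates is a made-up helper name. It strips leading underscores from the module name, so _thread also matches _threadmodule.c and threading.py.

```python
from pathlib import Path
from typing import List


def find_source_candidates(checkout: Path, module_name: str) -> List[Path]:
    """Fuzzy-search Lib/ and Modules/ of a CPython checkout for source files
    whose names contain the module name, ignoring leading underscores."""
    needle = module_name.lstrip("_").lower()
    hits: List[Path] = []
    for subdir in ("Lib", "Modules"):
        base = checkout / subdir
        if not base.is_dir():
            continue  # not a CPython checkout, or a partial one
        for path in base.rglob("*"):
            # Only consider Python and C source/header files.
            if path.is_file() and path.suffix in {".py", ".c", ".h"}:
                if needle in path.name.lower():
                    hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical location of a local CPython clone; adjust as needed.
    checkout = Path("~/cpython").expanduser()
    for path in find_source_candidates(checkout, "_thread"):
        print(path)
```

Against a real checkout this returns more than just Modules/_threadmodule.c (Lib/threading.py and several test files also match), which is the nature of a fuzzy search: it narrows things down and you skim the rest. Your editor’s file search or git grep does the same job.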

I didn’t see where you asked it originally, but I’m not sure why you’d think the core devs would deliberately make life harder and more confusing for themselves by “obfuscating” the code, rather than the reason for the layout simply not being clear to you yet. Keep in mind that Python has been around for over twice as long as you’ve been in the industry, longer than many other modern languages, and there may be historical as well as current reasons for the specific implementation details of the reference interpreter.

As a side point, would it be practical to have __file__ attributes on C-implemented modules? Even if it’s not 100% perfect, it might be a useful starting point.

Yeah, I tried that first as a way to give the user an easier solution, but I assume it doesn’t work in this particular case because _thread is a low-level built-in module, compiled into the interpreter itself, rather than a normal C extension module. On other standard library C extension modules that aren’t built in, e.g. threading and multiprocessing, __file__ works just fine.

I was talking to someone with significantly more coding experience than I have (in programming generally, not Python specifically) about trying to learn the Python source, and he mentioned that the deeper you get into code, the more obfuscated it typically gets.

I looked it up, and it seems to be something some folks do deliberately; there’s even software that obfuscates code, including tools written for Python. It basically turns the code into spaghetti that’s difficult to follow, renames variables, etc.

It can also happen naturally, as programs get larger and harder to follow.

It’s good to know the core devs don’t deliberately obfuscate their code!

Both of those modules are actually implemented in Python (at least in CPython 3.12; I didn’t check other versions). The threading module imports a lot of its functionality from _thread. But yes, extension modules do have a reference (e.g. _decimal.__file__), so it’s only for the core built-in modules that it would be a concern.

Oh, oops—I just assumed they were extension modules without taking a second to check like I should. My bad; thanks for the catch!

Your point was still right even if your examples weren’t. No biggie 🙂
