Couple of CPython API questions

Hello all… thanks for reading.

I’m porting some pure Python to a C extension. Have a couple of burning questions, for which I’m struggling to find answers:

1- Is there a good strategy for tracking PyObject reference counts during development that I am overlooking?

2- Why is there no PyList_Pop/Remove, et al. (and I say “et al.” because surely this one seemingly odd omission means there are probably others)? When I first got my feet wet, it seemed like a magical land where the API lined up nicely with my knowledge of Python. But these sorts of omissions seem strange to me. And why would we have PyDict_DelItem() and not some basic list operations?

Right now, my workaround is (based on a Stack Overflow answer):

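static inline PyObject* list_pop(PyObject *lst) {
    /* Pops the last item; list.pop() with no argument does the same. */
    return PyObject_CallMethod(lst, "pop", "n", Py_SIZE(lst) - 1);
}

static inline PyObject* list_remove(PyObject *lst, Py_ssize_t i) {
    /* "n" expects a Py_ssize_t. Note that list.remove() removes by value, not by index. */
    return PyObject_CallMethod(lst, "remove", "n", i);
}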
The Stack Overflow answer mentions implementing the underlying code, which I can probably muster (if after this port it proves to be a significant bottleneck, which I doubt), but still: why no already-done implementation in the API? I looked at the underlying code and it seems like all the pieces are there, but I mean, they’d have to be in order for the Python methods to work…

Just count the increment and decrement operations. Most C API functions do not change the reference count of their arguments and return an object with an increased reference count (so you need to decrease it after use), but there are exceptions: functions which steal references and functions which return a borrowed reference. Every borrowed reference needs to be increfed if you release the GIL directly or indirectly.
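For what it's worth, a stolen reference can be observed from Python itself. The sketch below assumes a standard (not free-threaded) CPython build and uses ctypes.pythonapi to call the real C API; c_api and steal_demo are hypothetical helper names made up for this illustration. It hands a list an extra reference through PyList_SetItem, which the docs describe as stealing, and reports the count relative to the start after each step.

```python
import ctypes
import sys


def c_api(name, restype, *argtypes):
    """Hypothetical helper: bind one C API function with a private signature."""
    return ctypes.PYFUNCTYPE(restype, *argtypes)((name, ctypes.pythonapi))


Py_IncRef = c_api("Py_IncRef", None, ctypes.py_object)
PyList_SetItem = c_api(
    "PyList_SetItem", ctypes.c_int,
    ctypes.py_object, ctypes.c_ssize_t, ctypes.py_object,
)


def steal_demo():
    """Return the refcount change, relative to the start, after each step."""
    item = object()   # fresh object, so nothing else refers to it
    lst = [None]      # one existing slot for PyList_SetItem to overwrite
    base = sys.getrefcount(item)

    Py_IncRef(item)   # we now own one extra reference
    after_incref = sys.getrefcount(item) - base

    # PyList_SetItem steals that reference: the list takes it over
    # without incrementing the count a second time.
    PyList_SetItem(lst, 0, item)
    after_setitem = sys.getrefcount(item) - base

    lst.clear()       # the list releases the reference it took over
    after_clear = sys.getrefcount(item) - base
    return after_incref, after_setitem, after_clear
```

Under those assumptions steal_demo() should return (1, 1, 0): the count does not rise again when the list takes over the extra reference, and it drops back once the list lets go.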

Because removing an item from the list is just PyList_SetSlice(list, i, i+1, NULL). There is no need to add a special C API for this. In contrast, PyDict_DelItem() is complex and cannot be expressed in terms of other C API functions.

PyList_Pop is essentially a few simple consecutive operations: getting an item with PyList_GetItem(), taking ownership with Py_INCREF(), and removing it from the list with PyList_SetSlice(). Since PyList_GetItem() and Py_INCREF() are atomic, there is no race condition.
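The same sequence can be tried out from Python before writing it in C. This is only a sketch, assuming a standard CPython build: it calls the real PyList_GetItem and PyList_SetSlice through ctypes.pythonapi, and c_api and list_pop_at are hypothetical names. There is no explicit Py_INCREF here because, as far as I understand ctypes, a py_object return value takes its own reference to the object, which plays that role.

```python
import ctypes


def c_api(name, restype, *argtypes):
    """Hypothetical helper: bind one C API function with a private signature."""
    return ctypes.PYFUNCTYPE(restype, *argtypes)((name, ctypes.pythonapi))


# PyList_GetItem returns a borrowed reference; declaring the result as
# py_object makes ctypes take its own reference to the returned object.
PyList_GetItem = c_api(
    "PyList_GetItem", ctypes.py_object,
    ctypes.py_object, ctypes.c_ssize_t,
)

# The replacement list is declared as a raw pointer so that None can be
# passed to mean NULL, i.e. "replace the slice with nothing".
PyList_SetSlice = c_api(
    "PyList_SetSlice", ctypes.c_int,
    ctypes.py_object, ctypes.c_ssize_t, ctypes.c_ssize_t, ctypes.c_void_p,
)


def list_pop_at(lst, i):
    """Remove and return lst[i] using only the two C API calls above."""
    item = PyList_GetItem(lst, i)         # fetch the item and hold on to it
    PyList_SetSlice(lst, i, i + 1, None)  # delete the one-element slice
    return item
```

For example, with lst = [10, 20, 30], list_pop_at(lst, 1) returns 20 and leaves [10, 30]. Unlike list.pop(), PyList_GetItem does not accept negative indices, so a C version would need to handle "pop the last item" by computing the index itself.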

Thank you very much! I definitely understand the PyList stuff now. Will re-implement that.

Just a follow up question about object references: is there a good way to know what functions will steal or borrow? I think that’s ultimately the source of my confusion. I have yet to see any indication in the API docs, but I can be pretty dumb at times lol… My strategy thus far has just been doing simple experiments and printing the count before & after. It’s effective, however tedious. Plus it’s easy to trip myself up and be unsure if I’m actually done with an object or not. Some of that, I’m just sort of anticipating I’ll figure it out when the port is complete and segfaults 1000 times before I untangle it >_>

Anyway, thanks again!

The API documentation notes whether a reference is borrowed (example: PyList_GetItem), versus a new reference (example: PySequence_GetItem). If you’re interested, those annotations are made from this file.
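The difference between those two functions can be checked with the "print the count before and after" approach. The sketch below assumes a standard (not free-threaded) CPython build and calls both functions through ctypes.pythonapi; c_api and refcount_deltas are hypothetical names. The results are declared as c_void_p so that ctypes returns a plain address and does not touch the count itself.

```python
import ctypes
import sys


def c_api(name, restype, *argtypes):
    """Hypothetical helper: bind one C API function with a private signature."""
    return ctypes.PYFUNCTYPE(restype, *argtypes)((name, ctypes.pythonapi))


PyList_GetItem = c_api(
    "PyList_GetItem", ctypes.c_void_p,
    ctypes.py_object, ctypes.c_ssize_t,
)
PySequence_GetItem = c_api(
    "PySequence_GetItem", ctypes.c_void_p,
    ctypes.py_object, ctypes.c_ssize_t,
)
Py_DecRef = c_api("Py_DecRef", None, ctypes.c_void_p)


def refcount_deltas():
    """Return the refcount change, relative to the start, after each call."""
    item = object()
    lst = [item]
    base = sys.getrefcount(item)

    PyList_GetItem(lst, 0)             # borrowed: nothing for us to release
    borrowed = sys.getrefcount(item) - base

    addr = PySequence_GetItem(lst, 0)  # new reference: we own it now
    new = sys.getrefcount(item) - base

    Py_DecRef(addr)                    # release the reference we were given
    released = sys.getrefcount(item) - base
    return borrowed, new, released
```

Under those assumptions refcount_deltas() should return (0, 1, 0), which matches the "borrowed" and "new reference" annotations in the docs.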

In general though, as Serhiy noted, most C API functions simply do not change refcounts of their arguments.

In general, it’s also a better idea to use Cython than to rewrite pure Python into a C extension. It’s rather less effort, less chance of introducing bugs, more resilient to interpreter updates, and will probably get you about the same performance boost as hand-written C.

On the other hand, if this is just a learning exercise that pretty much all goes out the window; have fun and good luck :slight_smile:


Thanks very much for the replies. I’ve learned a lot, and my questions have been answered! I’m not sure how I overlooked those hints in the docs - I swear I must have looked right at them several times.

As for Cython, I did try that first, but it took me several days just to break even on performance in my initial experiments, and it quickly became clear that, for this specific project, it’d be faster/easier to do in pure C. I’d have to learn a bunch of Cython syntax plus a bunch of C concepts anyway. So that, plus some experience in C# and Swift, made this a sensible choice in my mind.

And yes, I am definitely very glad to finally be learning this stuff, and the whole project started as a learning exercise. Being a self-taught programmer means sometimes the only way to learn is to just dive in face first (and ofc hope someone out there will throw me a bone when I get stuck)!

Just in case anyone is curious, the code I am porting is for a Minecraft-like world generator, starting with a project called Fogleman Minecraft. I just have not been able to get pure Python fast enough to get blocks from cache to screen, plus the generation algorithms were already at their limits while not even 1/4 complete… I saw a ridiculous gain when I ported the terrain generator over, so I’m hoping that porting some more of the code will ease the bottleneck enough. My only concern is that a lot of it still has to go through the CPython API, just like the original Python code, and there’s only so much I’ll be able to split into pure C. If it’s a bust, oh well. I will have learned another very valuable skill in my career, and will have bits that I can carry on to port the whole thing to another C-based language if it comes to that.

It’s not surprising that it takes some time to learn a new language. Especially with Cython, it’s not easy to know how to write optimized code: there can be multiple ways to write some piece of Cython code, with dramatic speed differences. With only a little bit of effort, you can write Cython code that’s as fast as C code. And you can always access the C API from Cython, but that’s rarely needed.

It’s quite common (and easy) to mix Python, Cython and C code: use pure Python where performance doesn’t matter, use C for the low-level algorithmic code and use Cython as glue between the two.

Well, Cython shouldn’t require learning a new language (or so I thought was the point). Going into that waste of time, I was led to believe it was simply syntax sugar that was almost sure to speed up (or break even with) almost any Python code. The reality, however, is much, much worse. To be honest, I was being pretty nice in my last reply about Cython, but my initial impression is not good. It seems like pointless overhead both in the learning curve and in the output, with very few outlying use cases that proponents seem to point at where it lives up to the hype (not to mention the many pitfalls where it can hinder performance).

Either way, I guess the point is, if I needed to master C anyway, then why not just skip the learning curve of Cython, and just learn/write C & an API? It furthers my career if nothing else (plus already knew enough C# that it was a cakewalk).

On top of all that, I’d rather work in just two languages, and have two directly intertwined code bases, rather than add a 3rd layer of unnecessary complexity. Being self-taught, sometimes I feel like I’m just either dumb or hilariously ignorant on these topics, but feel like unnecessary complexity happens a lot in this industry and is encouraged to a scary degree… so please feel free to correct me where I’m wrong…

Random thought while I’m here: my dream language would be fairly low level like C, but with Python-like syntax, strongly typed, with a solid and well-thought-out set of standard libraries/modules, maybe not quite as all-inclusive as Python’s, but sharing many.

I’m quite experienced in C and C++, yet for the matters of interfacing C/C++ code with Python I prefer using Cython as it allows writing high-level abstractions easily while freeing me from most of the burdens that come with the CPython C API.

Yes, it’s better to understand C if you really want to understand what Cython does under the hood. But being able to write C code is not necessarily a good reason to write your software in C directly. Similarly, you may be skilled in assembler, yet not want to write software directly in assembler.

That’s actually pretty much correct in my opinion. I would be curious to see a case where Cython has worse performance than CPython for a piece of pure Python code. I’m pretty sure that Cython would consider that a bug.

Sorry, if I was unclear, but I meant Cython can be slower than pure Python. And from what I read as I was trying to learn, the pitfalls are plentiful. And indeed, I did manage to see negative performance until I spent days researching and tweaking, only to JUST break even with pure Python performance in the end on only a very minimal/partial test case. But I guess the code was way more readable, which is always a huge plus in my book.

At that point, I decided C would be easier/quicker. However, it’s not…It’s easier in some ways, especially already knowing a C-based language, but also a lot more difficult in other ways…Definitely not quicker.

I suppose this makes a great deal of sense, however, it’s really hard to know when Cython is going to actually help performance over pure Python. At least with CPython, I know the end result can’t be worse, or if it is, it’s my fault.

That being said, my approach so far has been to port as much as I can to pure C - no CPython at all - then write CPython functions to expose them to Python (basically just functions that unpack the args, call the C function with them, then set up a PyObject to return). This has proved effective, but extremely tedious on the more complex port I’m doing now. The first CPython port I did was simple and gave me false hope.

Anyway, I will give Cython another shot.

Thanks for all the insights!

Would you care to share that minimal test case?

By the way, you may also want to take a look at C++ and the pybind11 project. Though if you don’t know C++ already, be aware that it is a rather large language to learn (but it’s really better than C in terms of abstractions and robustness).

Well, unfortunately, I don’t seem to have an example of the slow initial version, but for what it’s worth, this is the version where I managed to break-even compared to the pure Python after several days of poking and researching:

Here is the original pure Python:

Here is the CPython port (which is definitely much much faster, but I have not measured exactly by how much…and apparently I’m only allowed 2 links as a new user so sorry for the workaround):

bitbucket dot org/experimentfailed/testcraft/src/working/terrain_gen/terrain_gen.c
bitbucket dot org/experimentfailed/testcraft/src/working/terrain_gen/terrain_gen.h

And while I’m at it, for the bravest of souls, here is the more complex port I’m working on:

Cython: bitbucket dot org/experimentfailed/testcraft/src/working/old/world.pyx - I believe this was slower than the pure Python, and is about the point where I was exasperated into just porting the whole thing over to CPython

Pure Python: bitbucket dot org/experimentfailed/testcraft/src/working/

And finally, CPython (it’s a hot mess right now):
bitbucket dot org/experimentfailed/testcraft/src/working/world_c_test/world.c
bitbucket dot org/experimentfailed/testcraft/src/working/world_c_test/world.h

This part is on hold for more than one reason - first, the soul searching this thread has induced - thanks all for the help! Second, I need to completely re-think the entire module from top to bottom to cleanly separate out what could be done well in pure C, vs. what needs to be CPython. I have loads of ideas, but fighting a bit of burnout at the moment.

The issue with the pure Python (and Cython so far) is getting blocks from the cache dicts, to the screen. Filling the cache with the ported terrain_gen module, is orders of magnitude faster (about 1000 blocks/tick on my slowest machine) than the rest of the engine can keep up with. After this, if I’m still having an issue, I’m going to get more aggressive and bring Pyglet directly into my C port (or otherwise give myself direct access to opengl). I’m just not 100% sure how to do either yet.

The ultimate goal is to create a “make your own Minecraft-like game in Python” type of engine/system, featuring my somewhat unique approach to terrain generation. I came up with the general idea as I knew there was no way pure Python could ever hope to generate chunks like Minecraft, et al, fast enough (tried and failed hilariously). However, it also cannot quite keep up with my approach, tho it does impressively well.

That’s not really a “minimal” example (it’s actually multiple Python files with external dependencies), so it’s not so easy to say what went wrong for you.

Yeah, sorry…It’s the most minimal I’ve done. I guess my interpretation of “minimal” is a little different to most. Probably because I’m already intimately familiar with the project, so from my perspective, it’s only two relatively straightforward functions that needed attention.

The CPython port only took hours (no more than 8), including learning necessary bits of the API, brushing up on my C syntax, and modifying the Simplex Noise library’s C code to be a standalone header file. Compared to Cython which took several days just to break even on performance…As I think I mentioned, this experience is what lulled me into thinking that just porting the rest of the trouble areas to CPython would be easier/faster. And boy has that taken me for a ride…

Anyway, you can see why I stuck to very general questions. It’s not a simple project, and I didn’t feel it appropriate to ask for a free code review, or whatever. But hey - you asked, so figured it was worth a shot in case anything obvious stood out to you. If I get desperate enough, I’ll pay someone for a deeper level of help!

Thank you (and thanks to all) for the insights and advice, etc. I have learned a ton more than I set out to or expected from this thread!