As far as I understand, PyList_SetItem “steals” a reference to item. According to the CPython source code, even when the function returns an error (value < 0), it still “steals” the reference to item—meaning item’s reference count is already decremented internally in the error path.
If that is the case, then the additional explicit Py_DECREF(item) in the user code would cause a double decref. This could lead to the reference count dropping below zero and potentially trigger undefined behavior or memory issues.
My questions are:
Is the above pattern actually incorrect?
What is the correct way to handle item when PyList_SetItem fails?
If a double decref does occur, what kind of problems might it cause in practice?
I would appreciate clarification from the community or references to the official CPython documentation on this matter.
Usually you do nothing - you no longer own the reference.
It will usually crash. (Although of course SetItem rarely fails because you usually meet all the preconditions. So mostly you don’t see this code-path)
There’s generally an expectation that a call will be all-or-nothing, that if the calls fails, nothing will have changed.
PyList_SetItem doesn’t follow that expectation.
Yes, it’s documented, but given that the OP has seen many third‑party CPython extensions that get it wrong, maybe the documentation needs to be more explicit.
Instead of:
This function “steals” a reference to item
perhaps:
This function always “steals” a reference to item, even on error
The text at that link doesn’t unambiguously say what happens to the item’s refcount on failure (since the sentence also mentions something that only happens on success). We should document exactly what it does since we can’t possibly change it – we’d break all the existing code that uses it correctly.
I personally think that the current behavior (always DECREF) is the more ergonomic of the two design choices for this API, given the choice of DECREF’ing on success. (The choice to steal is understandable but confusing since so few functions steal an argument – in modern times we probably would have marked this fact in the API name, e.g. PyList_SetItemStealing. But it’s too late for that and it’s not worth deprecating anything for this small wart.)
I’ve noticed that a significant number of developers have made the same mistake. It’s highly likely because the documentation regarding whether the successful execution of this function affects the reference count can be easily misinterpreted.
For what it’s worth, if you look in the C error paths for almost any Python C API usage, I bet you’ll find bugs like this. Messing up error handling is one of the most common sources of C API bugs and it’s the number 1 thing I look for in NumPy C code reviews.
Error paths also tend to be poorly tested, since it often requires constructing an elaborate setup to trigger errors. Most people don’t bother, so we have lots of buggy error paths in the ecosystem.