Leaky abstractions and CPython performance

Here is a list of affected projects among the top 1000 PyPI projects:

pypi/JPype1-1.4.0.tar.gz: JPype1-1.4.0/native/common/jp_exception.cpp: PyTracebackObject *tb_create(
pypi/JPype1-1.4.0.tar.gz: JPype1-1.4.0/native/common/jp_exception.cpp: PyTracebackObject *last_traceback,
pypi/JPype1-1.4.0.tar.gz: JPype1-1.4.0/native/common/jp_exception.cpp: PyTracebackObject *traceback = (PyTracebackObject*)
pypi/JPype1-1.4.0.tar.gz: JPype1-1.4.0/native/common/jp_exception.cpp: PyObject_GC_New(PyTracebackObject, &PyTraceBack_Type);
pypi/JPype1-1.4.0.tar.gz: JPype1-1.4.0/native/common/jp_exception.cpp: PyTracebackObject *last_traceback = NULL;
pypi/JPype1-1.4.0.tar.gz: JPype1-1.4.0/native/common/jp_exception.cpp: PyTracebackObject *last_traceback = NULL;
pypi/onnx-1.12.0.tar.gz: onnx-1.12.0/third_party/pybind11/include/pybind11/detail/type_caster_base.h: auto *trace = (PyTracebackObject *) scope.trace;
pypi/pybind11-2.10.0.tar.gz: pybind11-2.10.0/pybind11/include/pybind11/pytypes.h: auto *tb = reinterpret_cast<PyTracebackObject *>(m_trace.ptr());
pypi/Cython-0.29.32.tar.gz: Cython-0.29.32/Cython/Utility/Coroutine.c: PyTracebackObject *tb = (PyTracebackObject *) exc_state->exc_traceback;
pypi/Cython-0.29.32.tar.gz: Cython-0.29.32/Cython/Utility/Coroutine.c: PyTracebackObject *tb = (PyTracebackObject *) exc_tb;

Regarding tensorflow, the link you posted is a random old copy of tensorflow; the main branch of tensorflow is not affected. I specifically checked the scientific projects before creating the issue, and numpy and friends are not affected. The JPype1 fix is a simple one-line change.
As for blender, they seem to be using private APIs for things for which public APIs exist. Regardless, the change is a one-liner.

Also, if you consider the affected projects to be a lot, then also consider that the performance improvements will be observed by virtually every Python program and application that uses try/except, and there are a lot of those too 🙂

Agreed that this doesn’t look very serious. I wonder, would it be better to set it to zero to indicate “not yet computed” or is -1 better? Most of the time if people don’t fix their code they will just report a bad line number which isn’t terrible. Is there a difference between 0 and -1?

Note that “no documentation” is not a very good argument, and PyTracebackObject looks like a good example for that.

According to git log, the Python API for tracebacks was first documented around 1992. The Python API’s tb_lineno attribute seems to have been fully documented since then.
In C, there’s a struct _traceback with an int tb_lineno member. What could it mean? I was 6 in 1992, but I imagine things were pretty clear to anyone working with tracebacks from C.

For those on this discussion, the steering council has rejected the proposal.

Fair enough. I don’t see how we can add a deprecation on a struct member name in C, so I think this means the whole proposal is scrapped.

You can surround the field like this:

and direct access will print a warning.

This is the macro, for reference:

And you’ll need to put the Py_DEPRECATED in #ifdef Py_BUILD_CORE. (I don’t think we have a macro for that, yet.)

Deprecating the whole struct – and adding setters/getters/initializer – is also an option.

True. But it does help stability a lot.
FWIW, the API for creating objects could be made more stable – by adding a new function each time an argument is added, or by creating with good defaults for the new members and requiring setters. It’d be a hassle for code objects since they change too often, but I wouldn’t be afraid of it here.

I think it’s up to Kumar whether he wants to proceed.

I am a strong -1 on adding any initializer or setters for PyTracebackObject. If you need to create a traceback, call the type with args, just like the Python-level TracebackType. There is no need for an initializer C API for every structure at the C level unless necessary.
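
For reference, this is roughly what "call the type with args" looks like at the Python level. The constructor takes (tb_next, tb_frame, tb_lasti, tb_lineno) and is available since Python 3.7. The helper name make_traceback is only for illustration.

```python
import sys
import types


def make_traceback(frame, lineno, tb_next=None):
    """Build a traceback entry for `frame` that reports `lineno`."""
    # Positional order: tb_next, tb_frame, tb_lasti, tb_lineno
    return types.TracebackType(tb_next, frame, frame.f_lasti, lineno)


# Usage: attach a hand-made traceback entry to an exception.
exc = RuntimeError("example")
exc = exc.with_traceback(make_traceback(sys._getframe(), 42))
```

A C extension could presumably do the same by calling the traceback type object with these four arguments, instead of allocating the struct and filling in fields by hand.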

Just adding a function each time the code object changes won’t make it a stable API. It will cause more issues than it solves, both on the user side and on our side. We have code.replace, which can be used. I am -1 on adding any function to set any code object fields or a C-level initializer. Just call the Python-level type with the C API.
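
A small sketch of the code.replace approach, assuming Python 3.8 or later (where CodeType.replace was added). The function relabel and the file/function names are illustrative only.

```python
def relabel(func, filename, name):
    """Return a copy of func's code object with a new filename and name."""
    # replace() copies the code object and overrides only the given fields,
    # so callers do not need to know the full constructor signature.
    return func.__code__.replace(co_filename=filename, co_name=name)


def sample():
    return 1


new_code = relabel(sample, "generated.tmpl", "render")
print(new_code.co_filename, new_code.co_name)
```

Because only the fields you name are touched, this keeps working when new fields are added to code objects.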

The deprecation work is now tracked in GH-96709.

The JPype project has some issues related to tracebacks when porting the project to Python 3.11: Build for Python 3.11 · Issue #1086 · jpype-project/jpype · GitHub. IMO it would help to add a new C API function to Python for creating a traceback object. JPype needs such an API:

PyTracebackObject *tb_create(
		PyTracebackObject *last_traceback,
		PyObject *globals,  // locals is set to NULL
		const char* filename,
		const char* funcname,
		int linenum);

Cython uses the PyTracebackObject* type to handle exceptions, see for example my rejected PR: https://github.com/cython/cython/pull/4672

I think all Cython needs from the PyTracebackObject is a way to get the frame. Although in principle it could probably do that via Python attribute lookup if it came to it.
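
As a sketch of the attribute-lookup route, here is the same thing done purely through the documented tb_next and tb_frame attributes, without touching the struct layout. The helper name innermost_frame is made up for this example.

```python
def innermost_frame(tb):
    """Follow tb_next to the last traceback entry and return its frame."""
    while tb.tb_next is not None:
        tb = tb.tb_next
    return tb.tb_frame


def inner():
    raise KeyError("k")


def outer():
    inner()


try:
    outer()
except KeyError as e:
    frame = innermost_frame(e.__traceback__)
    print(frame.f_code.co_name)
```

From C, each of these attribute reads would be an attribute lookup on the traceback object, which is slower than a struct access but does not depend on the struct's fields.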

What would the proposed tb_create() function do? What does it need globals for?

See the current JPype implementation of this function: jpype/jp_exception.cpp at 82a7658277b36c628ea305ea705cd012d81471dd · jpype-project/jpype · GitHub

Wow, the JPype function does a lot: it creates a dummy code object from file/func/line, then it creates a new thread state, linking it to the frame from last_traceback (if any), then it creates a new frame from that thread state, also passing it the dummy code object and the dict of globals, and finally it creates a traceback object linked to the frame and to _last_traceback.

I can totally see that this doesn’t work in 3.11 (the big comment in the function shows that the specifics were just reverse-engineered from the code and what works in 3.10). It also looks a bit like what _PyTraceback_Add does (but not quite). That function appears entirely undocumented; it was added for expat by @markshannon.

Maybe Mark has an idea of what we can offer JPype?

Cython and template engines like Jinja like to generate frames/traceback including their own filenames and line numbers. Having a simple C API for that would be nice.
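
For comparison, a pure-Python way to get such an entry today is to compile a tiny stub that raises on the desired line and run it. This is a minimal sketch of that trick, not the code any particular project ships; the name fake_traceback, the `__exc__` global and the default function name are assumptions made for the example. It needs Python 3.8+ for code.replace.

```python
import sys


def fake_traceback(exc, filename, lineno, funcname="<template>"):
    """Return a traceback entry that points at filename:lineno."""
    # Pad with newlines so the raise statement sits on the wanted line.
    source = "\n" * (lineno - 1) + "raise __exc__"
    code = compile(source, filename, "exec")
    code = code.replace(co_name=funcname)
    try:
        exec(code, {"__exc__": exc, "__name__": filename})
    except BaseException:
        # The first entry is this helper's own frame; the next one is the stub.
        return sys.exc_info()[2].tb_next


tb = fake_traceback(ValueError("bad value"), "page.html", 7, "render")
print(tb.tb_frame.f_code.co_filename, tb.tb_lineno)
```

It works, but it still builds a throwaway code object and frame just to carry a filename and a line number, which is the overhead a dedicated API could avoid.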

So the API you propose is suboptimal—ideally we just give file, line and previous tb, and we modify things so a tb can hold those without dummy frame and code object.

I didn’t propose a tb_create() API for Python. I just showed what’s needed in existing projects. If we add a new API, it should simplify the tb_create() implementation.