PEP draft: Add PyResource callback C API to close resources [WITHDRAWN]

vstinner · July 26, 2023, 6:19pm

Hi,

I wrote a PEP draft, but I don't know if what I propose even makes sense, so I have been too shy to assign it a PEP number yet.

=> You can read it at: PEP – Add PyResource callback C API to close resources.

Abstract

Add the PyResource structure to the C API: a structure holding a callback to close a resource.

Add PyResource_Close(), plus new variants of pointer-returning functions that use PyResource:

PyResource_Close()
PyByteArray_AsStringRes()
PyBytes_AsStringRes()
PyCapsule_GetNameRes()
PyEval_GetFuncNameRes()
PyUnicode_AsUTF8AndSizeRes()
PyUnicode_AsUTF8Res()

These functions keep the resource valid until PyResource_Close() is called.
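To make the contract concrete, here is a minimal standalone C model of the idea. It assumes the structure has exactly two fields, close_func and data (the names used later in this thread), and get_name_res() is a made-up getter shaped like the *Res functions above. None of this is the draft's actual code or CPython's.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical standalone model of the proposed PyResource structure:
   the resource pointer plus the callback that releases it.
   Not CPython code; field names follow the thread. */
typedef struct {
    void (*close_func)(void *data);
    void *data;
} PyResource;

/* Release the resource by calling its close callback. This sketch also
   clears the structure so that closing twice is a harmless no-op. */
void PyResource_Close(PyResource *res)
{
    if (res->close_func != NULL) {
        res->close_func(res->data);
    }
    res->close_func = NULL;
    res->data = NULL;
}

/* Hypothetical "*Res" getter: returns a heap copy of a name and
   registers free() as the close callback. The returned pointer stays
   valid until PyResource_Close() is called. */
const char *get_name_res(const char *name, PyResource *res)
{
    size_t len = strlen(name) + 1;
    char *copy = malloc(len);

    res->close_func = NULL;
    res->data = NULL;
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, name, len);
    res->close_func = free;
    res->data = copy;
    return copy;
}
```

A caller would get the pointer with get_name_res("spam", &res), use it, then call PyResource_Close(&res); the caller never needs to know that free() is the release function.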

References

See also proposed PyObject_AsObjectArray() function

See also my issue proposing to add a PyObject_AsObjectArray() function, which uses the proposed PyResource API to "close" the array (technically, it's more a "view" on the array).

The implementation shows how the PyResource API can be used to untrack an object in the GC if it was tracked (and track it again on close), or to use a different "close callback" if it was already untracked.

The point here is not to discuss whether it's good or not to untrack/track an object in the GC; it's just to show that such an API gives more freedom in what can be done when "creating" and "closing" a resource.
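A rough model of the tracked/untracked case, assuming a stand-in object with a gc_tracked flag instead of real GC calls. FakeList and as_object_array() are hypothetical; the real implementation lives in the linked issue and may differ. The only point shown is that the getter chooses which close callback to register.

```c
#include <stddef.h>

/* Hypothetical stand-in for a Python container: only the pieces needed
   for the idea (an item array and a "GC tracked" flag). */
typedef struct {
    int gc_tracked;
    size_t size;
    void **items;
} FakeList;

typedef struct {
    void (*close_func)(void *data);
    void *data;
} PyResource;

/* Close callback used when the object was tracked before the view was
   created: track it again. */
static void retrack_close(void *data)
{
    FakeList *list = data;
    list->gc_tracked = 1;
}

/* Close callback used when the object was already untracked: there is
   nothing to restore. */
static void noop_close(void *data)
{
    (void)data;
}

/* Hypothetical analogue of PyObject_AsObjectArray(): expose the item
   array and pick the close callback based on the GC state. */
void **as_object_array(FakeList *list, size_t *size, PyResource *res)
{
    if (list->gc_tracked) {
        list->gc_tracked = 0;   /* untracked while the view is alive */
        res->close_func = retrack_close;
    }
    else {
        res->close_func = noop_close;
    }
    res->data = list;
    *size = list->size;
    return list->items;
}

void PyResource_Close(PyResource *res)
{
    res->close_func(res->data);
}
```

The consumer code is identical in both cases; only the registered callback differs.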

pitrou · July 27, 2023, 9:51am

You should perhaps invite the PyPy and HPy developers in this discussion.

vstinner · July 27, 2023, 2:38pm

I announced the PEP on the HPy Discord channel. I exchanged with steve-s, who also commented on the PR directly.

encukou · September 7, 2023, 9:08am

I’m worried that it’s too much like Py_buffer. AFAICS, the difference in use cases is that for PyResource the data format is known, and doesn’t have to be described.

Is that enough to justify a simpler struct (which will presumably be faster, since it doesn't have to initialize as much data)?

One change that might be worth it is to add a separate field for the argument to close_func, so that it doesn’t have to be the same as data.

Why? I imagine this will usually be implemented in one of two ways:

For objects that happen to have an immutable copy of the data in the correct format:
- retrieving the PyResource increfs the exporting object
- data points to the buffer
- close_func is Py_DecRef
- close_arg is the object

For other objects:
- retrieving the PyResource allocates a buffer and fills it
- data is that new buffer
- close_func is the corresponding free function
- close_arg is data
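The two strategies can be sketched in standalone C, assuming the extra close_arg field suggested above. FakeStr, fake_decref(), export_borrowed() and export_copy() are hypothetical stand-ins for a PyObject, Py_DecRef() and real exporters.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical PyResource variant with a separate close_arg field. */
typedef struct {
    const void *data;
    void (*close_func)(void *arg);
    void *close_arg;
} PyResource;

/* Hypothetical refcounted exporter standing in for a PyObject. */
typedef struct {
    long refcnt;
    char *utf8;   /* immutable copy already in the right format */
} FakeStr;

static void fake_decref(void *arg)
{
    FakeStr *obj = arg;
    obj->refcnt--;
}

/* Case 1: the object already holds the data. Keep the object alive
   with a new reference and point into its buffer. */
void export_borrowed(FakeStr *obj, PyResource *res)
{
    obj->refcnt++;
    res->data = obj->utf8;
    res->close_func = fake_decref;
    res->close_arg = obj;        /* close_arg differs from data here */
}

/* Case 2: the data must be produced. Allocate a buffer owned by the
   resource and release it with free(). */
int export_copy(const char *src, PyResource *res)
{
    size_t len = strlen(src) + 1;
    char *buf = malloc(len);
    if (buf == NULL) {
        return -1;
    }
    memcpy(buf, src, len);
    res->data = buf;
    res->close_func = free;
    res->close_arg = buf;        /* close_arg is the same as data */
    return 0;
}

void PyResource_Close(PyResource *res)
{
    res->close_func(res->close_arg);
}
```

In case 1 the pointer to release (the object) differs from the pointer handed out (its buffer), which is why a single data field would not be enough.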

Another change would be adding a field for the length. It seems that in most uses, the *Res functions must be paired with a corresponding "get length" call so the buffer can be used safely. And if the exporting object doesn't have the data in the correct format, an API to get the length will be tricky to implement. (And for strings, a separate length API is a footgun: you need the number of UTF-8 bytes rather than the str length…)
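A sketch of the length field and of the footgun, assuming a hypothetical PyResourceWithSize struct and exporter. In UTF-8, "café" is 5 bytes but 4 code points (its Python str length), so the str length cannot be used to size the buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical PyResource variant that also carries the length, so the
   caller never needs a separate "get length" call. */
typedef struct {
    const char *data;
    size_t size;   /* number of bytes in data, not characters */
    void (*close_func)(void *arg);
    void *close_arg;
} PyResourceWithSize;

/* Hypothetical exporter filling data and size together. strlen() is
   used only because this sketch assumes NUL-free input; a real
   exporter would know the byte size directly. */
void export_utf8(const char *utf8, PyResourceWithSize *res)
{
    res->data = utf8;
    res->size = strlen(utf8);
    res->close_func = NULL;   /* nothing to release: static data */
    res->close_arg = NULL;
}

/* Count code points in valid UTF-8 by skipping continuation bytes
   (bytes of the form 10xxxxxx). */
size_t utf8_code_points(const char *s, size_t nbytes)
{
    size_t count = 0;
    for (size_t i = 0; i < nbytes; i++) {
        if (((unsigned char)s[i] & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

For "caf\xc3\xa9", size is 5 while the code point count is 4; a buffer sized from the str length would be one byte short.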

Those additions would push PyResource closer toward the complexity of Py_buffer. Will we still want to maintain a parallel API for the simpler case?

A shortcoming in this design is that in some uses the data will be copied twice. This will happen if:

- retrieving the PyResource must allocate a buffer, and
- the consumer needs it in a different buffer (for example, a bytes object)

But most uses of PyResource involve small data, like function names, so maybe it's not worth overengineering for this case. Types with bigger data can always provide ad-hoc "copy into" functions.
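The double copy and the ad-hoc "copy into" alternative can be compared with a copy counter. Everything here (FakeFunc, get_name_twice(), copy_name_into()) is hypothetical and only models the data flow.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical exporter whose name is not stored in the consumer's
   format, so each export has to copy it. */
typedef struct {
    const char *name;
} FakeFunc;

/* Number of buffer copies performed, to compare both paths. */
int copy_count = 0;

static void counted_copy(char *dest, const char *src, size_t n)
{
    memcpy(dest, src, n);
    copy_count++;
}

/* Path 1: a *Res-style getter allocates a temporary buffer (copy 1),
   then the consumer copies it into its own buffer (copy 2). */
int get_name_twice(const FakeFunc *f, char *dest, size_t dest_size)
{
    size_t len = strlen(f->name) + 1;
    char *tmp = malloc(len);
    if (tmp == NULL || len > dest_size) {
        free(tmp);
        return -1;
    }
    counted_copy(tmp, f->name, len);   /* done by the exporter */
    counted_copy(dest, tmp, len);      /* done by the consumer */
    free(tmp);                         /* the close callback's job */
    return 0;
}

/* Path 2: an ad-hoc "copy into" function writes directly into the
   consumer's buffer (one copy). */
int copy_name_into(const FakeFunc *f, char *dest, size_t dest_size)
{
    size_t len = strlen(f->name) + 1;
    if (len > dest_size) {
        return -1;
    }
    counted_copy(dest, f->name, len);
    return 0;
}
```

For short names like function names, the extra copy is cheap, which matches the argument that it may not be worth optimizing for.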

vstinner · September 8, 2023, 4:56pm

Oh, I closed my draft PEP PR two weeks ago, but I forgot to close this discussion. How can I close a PEP discussion?

vstinner · September 8, 2023, 4:58pm

I agree. If I come back to this topic, I will consider using Py_buffer to expose an array of Python objects (PyObject**). Apparently, some projects already use Py_buffer for that!