An API for mutating immutable objects is on the record as being problematic; it wouldn’t be added today. More importantly, the protection is this:
The data must not be modified in any way, unless the object was just created using PyBytes_FromStringAndSize(NULL, size).
(Just a note in the docs – but this is C; mutating a read-only buffer is also “only” banned like this.)
The rule is that you can’t modify the data once you expose the bytes object to Python code. That precondition doesn’t make sense for an argument to a buffer export function.
One possibility for such a zero-copy function is this:
- It would “steal” its argument, so its caller can’t use it any more.
- To make sure a bytes object can’t be retrieved from Py_buffer.obj (or memoryview.obj), it would set the type to a new class that has the same memory layout as bytes, but no functionality.
- In ~Python 3.2 this would be safe (equivalent to destroying the bytes and creating a new object) but nowadays you’d need to check with faster-cpython and free-threading teams if there are new assumptions this would break.
- To make the optimization transparent externally, in the refcount>1 case it would need to copy the data.
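To illustrate why the exporter would have to be hidden and why the refcount>1 case needs a copy, here is a small sketch using only existing Python behaviour. It does not implement the hypothetical zero-copy function; the variable names are just for illustration.

```python
data = b"spam"
view = memoryview(data)

# A bytes exporter only hands out read-only buffers...
is_readonly = view.readonly
try:
    view[0] = ord("S")
    write_rejected = False
except TypeError:
    write_rejected = True

# ...and the view keeps a reference to its exporter, so anyone holding
# the view can get the original bytes object back.
exporter_is_original = view.obj is data

# If another reference to the bytes exists, a mutable buffer has to be
# a copy, otherwise the change would be visible through that reference.
alias = data
mutable = bytearray(data)  # bytearray(bytes) copies the data
mutable[0] = ord("S")
```

After this runs, `alias` still holds the original contents while `mutable` holds the modified copy, which is the externally visible behaviour a transparent optimization would have to preserve.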
I don’t see a way to expose that as a Python function, as those can’t steal their arguments.
If we require extension authors to call an API to mark bytes as “finished”, we might as well use a “writer” pattern – start with a struct with the same memory layout as bytes; fill it up; then initialize the PyObject header.
Victor’s recent proposal for this was rejected for unrelated reasons.
Yeah. That side of the equation is pretty clear.
I should have said this earlier, but: thank you for looking into this!
Sadly, I don’t see a solution myself. I hope you arrive at one, and I hope that pointing out the issues I see at this stage is helpful.
bytes data directly follows the header, so this would require reserving a bytes header in every bytearray. Or adding a pointer to every bytes + a pointer indirection to every operation on bytes.
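A rough way to see the “data directly follows the header” layout from Python, assuming CPython: the size reported for a bytes object grows by exactly one byte per byte of payload, which is consistent with header and data sharing one allocation. This only observes sizes and is an implementation detail, not a language guarantee.

```python
import sys

# Size of the fixed part of a bytes object (header plus trailing NUL).
empty_size = sys.getsizeof(b"")

# Size of a few bytes objects of different lengths.
sizes = {n: sys.getsizeof(b"x" * n) for n in (1, 16, 1000)}

# Subtracting the payload length leaves the same fixed overhead each time,
# i.e. there is no separate buffer pointer or extra allocation to account for.
overhead = {n: size - n for n, size in sizes.items()}
```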
Looks like some variation of this is a possibility:
That’s a trade-off between speed and memory usage; the diff size doesn’t matter that much.
I don’t know whether the trade-off is worth it, but here I don’t see any important invariants broken!