Much more minimal proposal now:
- `bytearray` internally contains a `bytes` object as its backing store.
- An explicit API that equates to `.to_bytes()` + `.clear()`; my current favorite of the three bike-shed names is `.take_bytes([n])`.
So there isn’t any implicit detach or need to keep track of extra references. With that, code in asyncio and elsewhere can drop the end-of-function copy if they measure it and find it worthwhile. It would be nice to be able to handle other cases too, but as you pointed out there are complications.
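To make the shape of that concrete, here is a rough Python sketch of the pattern. `take_bytes()` is only the proposed spelling from the bike shed above, not an existing method, and `drain_today()` is just an illustrative stand-in for the kind of end-of-function copy asyncio does today:

```python
def drain_today(buf: bytearray) -> bytes:
    """What buffer/protocol code does today: an end-of-function copy."""
    data = bytes(buf)   # O(n) copy of the accumulated data
    buf.clear()
    return data

def drain_proposed(buf: bytearray) -> bytes:
    """Equivalent under this proposal; .take_bytes() does not exist yet."""
    return buf.take_bytes()   # hands off the backing buffer, no copy

buf = bytearray()
buf += b"some received I/O data"
assert drain_today(buf) == b"some received I/O data"
assert buf == bytearray()
```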
Already, PyByteArray_AS_STRING() returns a non-modifiable block of bytes when the underlying size is 0 / default initialized (there is a shared empty-buffer instance in that case). The buffer location also changes on resize today (see the discussion around memory allocation / PyMem_Realloc() a bit earlier in the thread), so the return value of PyByteArray_AS_STRING() is already not constant over an object's lifetime in the presence of resizing. Given that, I think PyBytes_FromStringAndSize(NULL, size) + _PyBytes_Resize(bytes, size) can provide the same behavior as today. The ob_start member used by PyByteArray_AS_STRING() stays valid and changes at the same times it does now.
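As a rough illustration of that claim (not code from the PR), this C sketch grows a heap `bytes` object with exactly those two calls; the only contract callers get is the one they already have from PyByteArray_AS_STRING(): re-fetch the pointer after any resize.

```c
#include <Python.h>

/* A minimal sketch, not the PR: grow a bytes object using only
 * PyBytes_FromStringAndSize(NULL, n) and the private _PyBytes_Resize().
 * The allocation may move on resize, exactly like the value returned by
 * PyByteArray_AS_STRING() already can, so the pointer is re-fetched. */
static PyObject *
grow_backing_bytes(Py_ssize_t initial, Py_ssize_t larger)
{
    PyObject *backing = PyBytes_FromStringAndSize(NULL, initial);
    if (backing == NULL) {
        return NULL;
    }
    char *start = PyBytes_AS_STRING(backing);  /* plays the role of ob_start */
    memset(start, 0, (size_t)initial);

    /* Only legal while we hold the sole reference to `backing`.  On failure
     * _PyBytes_Resize() decrefs it and sets the pointer to NULL. */
    if (_PyBytes_Resize(&backing, larger) < 0) {
        return NULL;
    }
    start = PyBytes_AS_STRING(backing);        /* may have moved */
    memset(start + initial, 0, (size_t)(larger - initial));
    return backing;
}
```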
From my perspective, bytearray buffers tend to be 1024+ bytes in size and DEFAULT_BUFFER_SIZE is 128 KiB, so the memory overhead of an extra PyVarObject versus a pure buffer is non-zero but not large. There are more writes to set the extra fields in the PyVarObject case, but from my measurements, compared to the copy of large buffers that is required today, I think it comes out ahead most of the time, particularly for large I/O blocks.
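As a quick back-of-the-envelope check of that overhead claim (measuring today's objects, not the PR's layout), the fixed per-object cost of one extra bytes object is a few dozen bytes against a 128 KiB payload:

```python
import sys

# Fixed header cost of one extra bytes object vs. a 128 KiB payload.
payload = 128 * 1024
extra_object = sys.getsizeof(b"")   # a few dozen bytes on a 64-bit build
print(extra_object, payload, f"{extra_object / payload:.4%}")
```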
The memory layout of PyByteArrayObject gains a new pointer at the end. Its existing ob_alloc field technically becomes redundant (though I don't remove it in my PR). So the base object gets slightly bigger, but not by much.
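For reference, this is roughly what that means against the current struct in Include/cpython/bytearrayobject.h; the name of the added field below is my placeholder, not necessarily what the PR uses:

```c
#include <Python.h>

/* Sketch only: today's PyByteArrayObject fields plus the extra trailing
 * pointer described above, under a different struct name so the
 * illustration compiles alongside the real header.  "ob_backing" is a
 * placeholder name. */
typedef struct {
    PyObject_VAR_HEAD
    Py_ssize_t ob_alloc;    /* bytes allocated; technically redundant once a
                               bytes object owns the storage, but kept in the PR */
    char *ob_bytes;         /* physical start of the buffer */
    char *ob_start;         /* logical start; what PyByteArray_AS_STRING() uses */
    Py_ssize_t ob_exports;  /* count of live buffer exports */
    PyObject *ob_backing;   /* NEW (placeholder): the bytes object that owns the
                               buffer and gets handed off by the proposed
                               take_bytes() */
} ProposedByteArrayObject;
```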