PyUnicode_FromKindAndData memory ownership semantics clarification

From the documentation:

PyObject *PyUnicode_FromKindAndData(int kind, const void *buffer, Py_ssize_t size)
Return value: New reference.

Create a new Unicode object with the given kind (possible values are PyUnicode_1BYTE_KIND etc., as returned by PyUnicode_KIND()). The buffer must point to an array of size units of 1, 2 or 4 bytes per character, as given by the kind.

If necessary, the input buffer is copied and transformed into the canonical representation. For example, if the buffer is a UCS4 string (PyUnicode_4BYTE_KIND) and it consists only of codepoints in the UCS1 range, it will be transformed into UCS1 (PyUnicode_1BYTE_KIND).

To be clear, the buffer will always be copied (to memory that is fully owned by the resultant PyUnicode object) even if it isn’t transformed, right?
If not, how am I intended to know whether to free a dynamically allocated buffer after the call? (For that matter, would it mean I can’t use an automatic-storage buffer that doesn’t outlive the PyUnicode object?)


More correctly, the input buffer is copied and, if necessary, transformed into the canonical representation.


I hadn’t read the description that closely, so I didn’t notice that its phrasing is misleading. I just assumed, correctly, that it would always copy.
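
For anyone who wants to check this without writing an extension module, here is a minimal sketch that calls the function through ctypes. It assumes CPython (where `ctypes.pythonapi` exposes the C API) and hard-codes the kind constants as 1 and 4, which is what `unicodeobject.h` defines; the helper name `build_then_clobber` is made up for illustration. It only shows observed behaviour on whatever build you run it on, not a documented guarantee.

```python
import ctypes

# Assumption: CPython, where ctypes.pythonapi gives access to the C API.
from_kind_and_data = ctypes.pythonapi.PyUnicode_FromKindAndData
from_kind_and_data.argtypes = [ctypes.c_int, ctypes.c_void_p, ctypes.c_ssize_t]
from_kind_and_data.restype = ctypes.py_object  # new reference, handled by ctypes

# Assumption: enum values as defined in CPython's unicodeobject.h.
PyUnicode_1BYTE_KIND = 1
PyUnicode_4BYTE_KIND = 4


def build_then_clobber(kind, unit_type, text):
    """Build a str from a caller-owned buffer, then overwrite that buffer."""
    buf = (unit_type * len(text))(*(ord(ch) for ch in text))
    s = from_kind_and_data(kind, ctypes.addressof(buf), len(text))
    # Scribble over the caller's buffer after the call has returned.
    for i in range(len(text)):
        buf[i] = ord("z")
    return s


# 1-byte input: no transformation is needed, which is the case asked about.
same_kind = build_then_clobber(PyUnicode_1BYTE_KIND, ctypes.c_uint8, "abc")

# 4-byte input with only ASCII code points: gets narrowed to 1 byte per char.
narrowed = build_then_clobber(PyUnicode_4BYTE_KIND, ctypes.c_uint32, "abc")

print(same_kind, narrowed)
```

If the string shared storage with the input buffer, the first result would change once the buffer is overwritten. In C terms this is why freeing a malloc'ed buffer, or letting a stack buffer go out of scope, right after the call is fine.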

Good to know.

(Maybe this is a Documentation issue, then.)

I agree that the doc is misleading; “is copied” should be removed from the doc. Does someone want to propose a PR?

The other way around: “is copied” should stay, but without any “if” attached to it.