I checked how DuckDB uses PyUnicode_4BYTE_DATA(). They use it to write into a newly created Unicode object, not to read from an existing one.
They could use PyUnicode_FromStringAndSize() instead, but they don't because it is slow.
Maybe we need to check whether PyUnicode_FromStringAndSize() is really slower than their code and, if so, why. (UnicodeWriter? Checking for lone surrogates?)