Deprecating the direct use of str internals, e.g. PyASCIIObject, PyCompactUnicodeObject, PyUnicodeObject structs

Yes, I know. But that’s not the point.

The point is that if we want to go ahead and hide type structs from C extension writers, we need to provide an alternative way of adding more data to such objects at the C level, both for object types which extend the size of the object to store variable-sized data and for the more common ones which don’t.

If I understand correctly, you want to put the new data between the end of the static entries in PyUnicodeObject and the variable-sized part, right?

I don’t think that’ll work: the standard Unicode APIs believe the variable-sized part starts right where you just put your new data, so they will happily overwrite it.
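To make the overwrite concrete, here is a toy model in plain C. It does not use the real CPython structs; `ToyStrHeader`, `ToyTaggedStr` and the `toy_*` functions are hypothetical names invented for this sketch. It only illustrates the general failure mode: code written for the base type computes the start of the character data from the size of the base header, so a field placed right behind that header sits in the area it writes to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a string header; NOT a CPython struct. */
typedef struct {
    size_t length;              /* number of characters, excluding the NUL */
} ToyStrHeader;

/* Hypothetical subtype that puts its own field directly after the header. */
typedef struct {
    ToyStrHeader base;
    unsigned int tag;
} ToyTaggedStr;

/* Base-type code: assumes the characters start right after the base header. */
static char *toy_str_data(void *obj) {
    return (char *)obj + sizeof(ToyStrHeader);
}

/* Base-type writer: copies length + 1 bytes (text plus NUL) into the object. */
static void toy_str_fill(void *obj, const char *text) {
    assert(obj != NULL && text != NULL);
    memcpy(toy_str_data(obj), text, ((ToyStrHeader *)obj)->length + 1);
}

/* Returns 1 if the subtype's tag was overwritten, 0 if not, -1 on
   allocation failure. */
static int toy_tag_is_clobbered(void) {
    const char *text = "hello";
    size_t len = strlen(text);
    unsigned int seen;
    ToyTaggedStr *obj = calloc(1, sizeof(ToyTaggedStr) + len + 1);
    if (obj == NULL) {
        return -1;
    }
    obj->base.length = len;
    obj->tag = 0xABCD1234u;

    /* The base writer knows nothing about `tag` and writes over it. */
    toy_str_fill(obj, text);

    memcpy(&seen, &obj->tag, sizeof seen);
    free(obj);
    return seen != 0xABCD1234u;
}
```

In this model the base writer's first bytes land exactly on `tag`, which is the situation described above for data inserted between the static entries and the variable-sized part.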

A better way is to not touch the initialization logic and add your data at the end, after PyUnicode_New() has done its work and the object has been finalized by adding data to it. This will require replacing the object type with the subtype (aka subclass) and then possibly reallocating the object to make room for the extra data after the variable-sized part. The reallocation can be avoided by asking for some extra room in the object when allocating it.
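As a rough illustration of the "append at the end" idea, here is a toy model in plain C. Again, this is not CPython code: `DemoStr`, `DemoExtra`, the `allocated` field and all `demo_*` functions are assumptions made up for the sketch, and the step of switching the object's type to the subtype is left out entirely. It only shows the memory layout: the base allocator finishes the string first, and the extra data is then placed after the variable-sized part, reallocating only when no extra room was requested up front.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a string header; NOT a CPython struct. */
typedef struct {
    size_t length;      /* number of characters, excluding the NUL */
    size_t allocated;   /* total bytes allocated for this object */
} DemoStr;

/* Hypothetical per-object data a subtype wants to attach. */
typedef struct {
    unsigned int tag;
} DemoExtra;

static char *demo_str_data(DemoStr *s) {
    return (char *)s + sizeof(DemoStr);
}

/* Offset of the extra data: header + characters + NUL, rounded up to the
   alignment of DemoExtra. */
static size_t demo_extra_offset(size_t length) {
    size_t end = sizeof(DemoStr) + length + 1;
    size_t align = _Alignof(DemoExtra);
    return (end + align - 1) / align * align;
}

/* Base allocator: builds a finished string and optionally reserves
   extra_room bytes after the variable-sized part. */
static DemoStr *demo_str_new(const char *text, size_t extra_room) {
    size_t len = strlen(text);
    size_t total = sizeof(DemoStr) + len + 1 + extra_room;
    DemoStr *s = malloc(total);
    if (s == NULL) {
        return NULL;
    }
    s->length = len;
    s->allocated = total;
    memcpy(demo_str_data(s), text, len + 1);
    return s;
}

/* Attach extra data after the finished string, growing the allocation only
   if needed. Returns the (possibly moved) object; on failure returns NULL
   and the original object stays valid. */
static DemoStr *demo_attach_extra(DemoStr *s, unsigned int tag) {
    assert(s != NULL);
    size_t offset = demo_extra_offset(s->length);
    size_t needed = offset + sizeof(DemoExtra);
    if (s->allocated < needed) {
        DemoStr *grown = realloc(s, needed);
        if (grown == NULL) {
            return NULL;
        }
        s = grown;
        s->allocated = needed;
    }
    DemoExtra extra = { tag };
    memcpy((char *)s + offset, &extra, sizeof extra);
    return s;
}

/* Read the tag back from behind the variable-sized part. */
static unsigned int demo_get_tag(const DemoStr *s) {
    DemoExtra extra;
    assert(s != NULL);
    memcpy(&extra, (const char *)s + demo_extra_offset(s->length), sizeof extra);
    return extra.tag;
}
```

Calling `demo_str_new("hello", 0)` followed by `demo_attach_extra()` takes the reallocation path, while `demo_str_new("hi", 64)` reserves enough room up front so that attaching the data leaves the allocation untouched. In both cases the string characters are never moved relative to the header, so base-type code that reads them keeps working.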

But regardless, we need a generic non-hacky solution for these things.
