Buffer protocol and arbitrary (data) types

Right, I don’t want to push that here, and it wouldn’t be something for Cython to support. One could do such extensions (just like supporting the export of device memory) by extending the Py_buffer struct with new fields.
You can do that safely, in a backwards-compatible way that works on current Python, NumPy, and Cython, by introducing a PyBUF_EXTENDED request flag and pre-initializing any new fields. All current consumers just ignore the flag (which is fine).
(Downstream can backport the flag and extended struct to old Python versions.)

That would be a distinct discussion. But, based on that thought: if we have concerns about safety, one could add a PyBUF_EXTENDED_FORMAT request flag to ensure new format strings will never be seen by current consumers.
The downside would be that memoryview(requires_special_format) fails. That isn’t a showstopper, but wrapping in memoryview is a nice pattern to simplify ownership tracking (NumPy does this).
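
To make the trade-off concrete, here is a toy model of the request-flag idea in plain Python. It does not use the real buffer protocol: the SpecialExporter class, its get_buffer method, the format spelling, and the bit value chosen for PyBUF_EXTENDED_FORMAT are all made up for illustration. Only the PyBUF_FORMAT value is taken from CPython.

```python
# Real flag value from CPython's C API.
PyBUF_FORMAT = 0x0004
# Hypothetical flag from the proposal above; the bit value is arbitrary.
PyBUF_EXTENDED_FORMAT = 0x1000


class SpecialExporter:
    """Toy exporter whose element type has no standard format spelling."""

    # Hypothetical spelling of a non-standard format string.
    extended_format = "[mymodule$mytype:16]"

    def get_buffer(self, flags):
        # Refuse the export unless the consumer opted in to new formats,
        # so that consumers written before the flag existed never see them.
        if not flags & PyBUF_EXTENDED_FORMAT:
            raise BufferError("consumer did not request extended formats")
        return self.extended_format


def legacy_consumer(exporter):
    """Stands in for an existing consumer such as memoryview."""
    return exporter.get_buffer(PyBUF_FORMAT)


def aware_consumer(exporter):
    """Stands in for a consumer that understands the new formats."""
    return exporter.get_buffer(PyBUF_FORMAT | PyBUF_EXTENDED_FORMAT)
```

In this model aware_consumer(SpecialExporter()) gets the format string back, while legacy_consumer(SpecialExporter()) raises BufferError, which is the memoryview failure described above.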

but it becomes the user’s own problem to work out if they need to allocate memory when they modify the struct).

Right, for types without references this wouldn’t matter. I guess it would be cool if it were at least plausible for Cython to be extended in a way that helps deal with embedded references (reference counting for embedded objects or even custom allocations).

In the simplest case, Cython would have to check whether the format matches a user-provided format string exactly. It would be cool if there were a plausible extension that lets numpy.datetime64 match any unit and exposes the unit neatly to the author of def func(datetime64[:] times).
In the case of datetime64, the possible units are limited so this could also be solved with a union type, though.
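
As a rough sketch of the two matching strategies, the snippet below contrasts an exact comparison with a matcher that accepts any datetime64 unit and reports which one it saw. The "M8[unit]" spelling is only an assumed buffer format borrowed from NumPy's dtype strings, and both function names are hypothetical; nothing here is existing Cython or NumPy behaviour.

```python
import re

# The units NumPy documents for datetime64.
DATETIME64_UNITS = frozenset(
    ["Y", "M", "W", "D", "h", "m", "s", "ms", "us", "ns", "ps", "fs", "as"]
)


def matches_exactly(fmt, expected):
    """Simplest case: the exporter's format must equal the declared one."""
    return fmt == expected


def match_datetime64(fmt):
    """Return the unit if fmt is an assumed 'M8[unit]' format, else None."""
    m = re.fullmatch(r"M8\[([A-Za-z]+)\]", fmt)
    if m is None or m.group(1) not in DATETIME64_UNITS:
        return None
    return m.group(1)
```

With the second function, a consumer could accept "M8[ns]" and "M8[D]" alike and hand the unit to the function body, whereas the exact matcher would need one declaration per unit (or a union over all of them).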

I think it would be better if the Python standard could be extended (even if it was just something like a code to indicate “mystery structure of size X”)

I agree that it would be best if Python prescribes it. We can already spell “X random bytes with name Y”, but to me it seems safer to:

  • Use a currently invalid format string (“mystery struct” isn’t just random bytes with a name)
  • Prescribe a naming scheme to ensure clashes don’t happen (e.g. include the defining module name)
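
A small sketch of what those two points could look like together. The bracketed "[module$name:size]" spelling and the helper names are invented here purely for illustration; the only real API used is the standard struct module, which serves as a stand-in for "a current consumer that validates format strings".

```python
import struct


def is_valid_struct_format(fmt):
    """True if the standard struct module accepts fmt today."""
    try:
        struct.calcsize(fmt)
    except struct.error:
        return False
    return True


def make_opaque_format(module, name, size):
    """Build the hypothetical spelling for an opaque, named struct."""
    return f"[{module}${name}:{size}]"


def parse_opaque_format(fmt):
    """Return (module, name, size) or None if fmt is not in that spelling."""
    if not (fmt.startswith("[") and fmt.endswith("]")):
        return None
    qualified, colon, size = fmt[1:-1].rpartition(":")
    module, dollar, name = qualified.partition("$")
    if not colon or not dollar or not size.isdigit():
        return None
    return module, name, int(size)
```

Here is_valid_struct_format("16x") is true (plain padding bytes are already expressible), while is_valid_struct_format(make_opaque_format("mymodule", "mytype", 16)) is false, so such a string cannot be mistaken for a format that existing code already understands. Including the module name keeps two libraries from accidentally claiming the same type name.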