Buffer protocol and arbitrary (data) types

@ngoldbaum and I have written a PEP draft for this proposal with some small changes/extensions, and @da-woods was kind enough to help with a Cython PoC implementation.

You can find the draft here: PEP Draft buffer protocol custom dtypes - HackMD
The PoC implementations can be found here for NumPy and here for Cython (the NumPy one currently requires the Cython one to build, but this is not strictly necessary).
(Just to note, I have thoughts on further extensions [1] but that is for a different thread!)

To summarize the main points (of course flexible on details):

  • We use [] to mark such a custom dtype. To deal with aliases, we allow ; as a separator for multiple aliases within the brackets (hopefully not used much!).
  • Each type identifier always starts with unique_name$, e.g. numpy$..., after which we allow arbitrary printable ASCII characters (minus ;[]). Pointers, etc. will need to be encoded.
    For example, NumPy can then define that a type name follows after the $. (EDIT: Finish sentence)
  • We have double-checked that none of the large packages (Cython, NumPy, Python, torch, …) has problems with this.[2]
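
To make the bracket syntax above a bit more concrete, here is a minimal Python sketch of how a consumer might split such an identifier into its aliases. This is only an illustration under assumptions: the function name parse_custom_dtype, the example names StringDType and example$utf8, and the choice to treat "printable ASCII" as 0x20 to 0x7E are all mine and not taken from the draft, which is the authority on the actual grammar.

```python
# Assumption: "printable ASCII" means 0x20-0x7E, minus the reserved ';', '[' and ']'.
_ALLOWED = {chr(c) for c in range(0x20, 0x7F)} - set(";[]")


def parse_custom_dtype(fmt):
    """Split a '[name$payload;name$payload]' identifier into (name, payload) pairs.

    Hypothetical helper for illustration; not part of NumPy, Cython or the draft.
    """
    if not (fmt.startswith("[") and fmt.endswith("]")):
        raise ValueError("expected the identifier to be wrapped in []")
    aliases = []
    # ';' separates aliases inside the brackets.
    for alias in fmt[1:-1].split(";"):
        if not alias or not set(alias) <= _ALLOWED:
            raise ValueError(f"invalid alias {alias!r}")
        # Everything before the first '$' is the unique (package) name.
        name, sep, payload = alias.partition("$")
        if not sep or not name:
            raise ValueError(f"alias {alias!r} lacks a 'unique_name$' prefix")
        aliases.append((name, payload))
    return aliases
```

A consumer would then typically look for the first alias whose unique name it recognises, and reject the buffer if there is none, rather than ignoring the format.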

We would be happy to get feedback, and we hope to turn this into a proper (pre-?)PEP soon!


  1. I am thinking about extending the protocol further to allow storing more things, including non-CPU memory; here is an earlier start. But I think this proposal is much simpler and more directly useful with NumPy/Cython (and further extensions become more useful if they can build on it). ↩︎

  2. Some tend to ignore the format entirely; this is already unsafe, e.g. for object arrays. ↩︎
