PEP draft: Safer mutability semantics for the buffer protocol

Thoughts on the protocol addition:

  1. If you ensure the returned buffer is also flagged immutable, you can probably gamble that no current consumer cares about additional flags being passed. That holds at least for the big producers: NumPy, Cython, and Python itself, I am very certain.
    I can see Python not wanting to condone such behavior, but there is little to lose if you want to early opt in.
  2. I very much like the idea of adding new capability flags: it is low cost and would allow adding new features without Python API changes in the future (users can backport new flags). I think this alone is reason enough to just add that new flags slot (even without a concrete plan).
  3. Having overlap detection helpers or flags in Python seems unnecessary to me. Not saying it can’t be useful, but in practice overlap is rare and complicated overlap even more so. In NumPy I prefer at least defaulting to read-only for self-overlapping arrays (not that it is always true).
  4. I am curious if there are thoughts on alternative ways to handle the locking (effectively?). This does not allow holding a view for a long time. (I am totally fine with that, though. The use-case here is simply not taking a long-lived view.)
  5. I do wonder a bit if “exclusive” is just immutable+writable, although maybe it is clearer even if it isn’t?
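
For comparison on point 4, CPython already has a limited form of export-based locking: a `bytearray` refuses to be resized while a `memoryview` on it is alive. A minimal sketch of that existing behavior (it only pins the size, not the contents, so it is not the immutable/exclusive semantics discussed here):

```python
buf = bytearray(b"abc")
view = memoryview(buf)      # exporting the buffer pins its size

try:
    buf.append(0)           # resizing is refused while the export is alive
    resized_while_exported = True
except BufferError:
    resized_while_exported = False

buf[0] = ord("x")           # in-place writes are still allowed
seen_through_view = bytes(view)

view.release()              # ending the export lifts the restriction
buf.append(0)
```

The restriction lasts exactly as long as the view does, which is why this style of locking fits short-lived views better than long-lived ones.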

I can see immutable as a useful concept in general, because it is stronger than not writable. Not writable forbids writing, but doesn’t strictly guarantee that the data cannot be changed. I suspect everyone currently considers that a user problem, but I could see that changing eventually, especially since copy-on-write semantics are becoming more common with pandas adopting them. JAX is also immutable (but probably doesn’t care about pushing the subtleties to the user).
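
The gap between "not writable" and "immutable" can be seen with only the standard library. A small sketch: a read-only `memoryview` rejects writes, yet the data it exposes still changes when the owner is modified.

```python
data = bytearray(b"abc")
ro = memoryview(data).toreadonly()   # read-only view of mutable data

try:
    ro[0] = ord("x")                 # writing through the view is refused
    wrote_through_view = True
except TypeError:
    wrote_through_view = False

data[0] = ord("x")                   # but the owner can still change the bytes
seen = bytes(ro)                     # the read-only view observes the change
```

Here `ro.readonly` is `True`, but a consumer holding `ro` gets no guarantee that the contents stay fixed.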

About NumPy itself:

  1. NumPy does not have a locking mechanism on array data. We do have a check for writability, which may be a possible locking point. I suspect we may need such a mechanism for object arrays in a no-gil world, although I am not certain that would actually live at the same place (it might fit better on the dtype rather than the views).
  2. NumPy does not currently keep track of views. Keeping track of arbitrary views and their overlap would require some new data structure, perhaps a tree of views.
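
A short sketch of what these two points look like today, using only existing NumPy calls (`flags.writeable`, `base`, `np.shares_memory`). The writeable flag is per array object, and a view only knows its base, not the other way around:

```python
import numpy as np

a = np.arange(6)
v = a[1:4]                    # a view taken before any "locking"

a.flags.writeable = False     # the writability check applies to `a` only
try:
    a[1] = 10
    wrote_via_base = True
except ValueError:
    wrote_via_base = False

v[0] = 10                     # the earlier view is still writeable
changed = int(a[1])           # so the "read-only" array changed anyway

view_knows_base = v.base is a          # child -> parent link exists
overlap = np.shares_memory(a, v)       # overlap must be computed on demand
```

Nothing here lets `a` enumerate its outstanding views, which is the bookkeeping an immutable or exclusive guarantee would need.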

In other words, a fair bit of work seems necessary to have enough bookkeeping so that NumPy could guarantee immutable/exclusive access in all but the simplest cases.
That isn’t a problem, just a warning that the API addition here seems like the easy part.

PS: In general, I would be a lot more interested in extensions around additional datatypes or device support, but that is just nerd sniping to see if I get some inertia to drive those.
