I am the author of tinyarray, an extension module that provides small arrays of numbers that behave like built-in tuples of numbers (immutable & hashable), but offer numerical operations à la NumPy. In addition, tinyarrays are significantly faster and more memory efficient than both tuples and NumPy arrays.
Tinyarrays, like tuples, are variable-length objects and contain a PyVarObject structure. This avoids a second memory allocation per object.
In current Python, PyVarObject is just PyObject with one additional field: ob_size. This field seems to be meant to hold the number of elements of the variable-size object, and that is how tuples use it, for example. However, other types, like PyLongObject, use it to store, among other things, the sign of the number. From the point of view of the interpreter it is just arbitrary data.
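For the curious, the layout can be observed from Python with ctypes. The field offsets below are an assumption about a default (non-free-threaded) CPython build, where `id()` returns the object's address; this is a peek for illustration, not a stable API:

```python
# Peeking at the PyVarObject header of a tuple with ctypes.
# Assumes a default (non-free-threaded) CPython build, where id() is
# the object's address and the header is {ob_refcnt, ob_type, ob_size}.
import ctypes

class PyVarObjectView(ctypes.Structure):
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),  # PyObject.ob_refcnt
        ("ob_type", ctypes.c_void_p),     # PyObject.ob_type
        ("ob_size", ctypes.c_ssize_t),    # the extra PyVarObject field
    ]

t = ("a", "b", "c")
view = PyVarObjectView.from_address(id(t))
print(view.ob_size)  # tuples store their element count here: 3
```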
Inspired by PyLongObject, tinyarray uses ob_size in similarly creative ways, and this has been working well for many years. Indeed, as far as I can tell, the CPython interpreter does not use ob_size by itself in any way; it seems to be just a field that a type may use however it likes. This seems to be confirmed by the (somewhat scarce) documentation.
Now for my question: I noticed a (laudable) ongoing effort to Make structures opaque in the Python C API. As far as ob_size is concerned the aim is for all access to this field to go through the functions/macros Py_SIZE and Py_SET_SIZE. Is my understanding correct that it is still OK to store arbitrary data there as long as it fits into a Py_ssize_t?
In other words, in the process of making objects opaque, why not eventually get rid of ob_size completely and make storing the number of elements of a variable-size type an implementation detail of that type, as it (informally?) seems to be the case anyway?
I would like to change the internal layout of tinyarray’s data and before I do that I would like to verify I understand how ob_size is supposed to be used.
We can’t prevent you from putting anything there, but if you’re getting creative, it would be best if you use your own field for this – there’s no advantage to letting others see the value if they can’t interpret it :)
AFAIK, Python assumes the field contains the size in some non-critical situations, like the default implementation of __sizeof__. (You do override that, right?)
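The default `__sizeof__` for variable-size objects is roughly tp_basicsize + Py_SIZE(self) * tp_itemsize, which only makes sense when ob_size really is the element count. A quick way to see this with tuples (which use ob_size that way):

```python
# The default __sizeof__ of a variable-size object is roughly
# tp_basicsize + Py_SIZE(self) * tp_itemsize, so each tuple element
# adds exactly one pointer-sized slot.
import ctypes
import sys

per_item = sys.getsizeof((0, 1)) - sys.getsizeof((0,))
print(per_item == ctypes.sizeof(ctypes.c_void_p))  # True
```

A type that repurposes ob_size would get a nonsensical (or negative) size out of this default, which is why overriding `__sizeof__` matters.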
That documentation is quite misleading, sorry. It’s for type objects, rather than instances. Docs for instances are missing entirely :(
I’m trying to become an expert on these matters, but I don’t think I can document ob_size well – though I doubt anyone can. This post is just my opinion, but if anyone thinks it’s incorrect I’d love to hear it. Eventually I will find the time and courage to document ob_size :)
This is fine. We’re considering doing just that for a future refactoring of the int type (internally PyLong). The use of ob_size to store the size (what __len__ returns) is just a convention. You may have to override a few behaviors whose default implementation uses ob_size; you can use the current int type as an example (since it already abuses that field).
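One observable consequence of int packing the sign into its header (historically the sign of ob_size) rather than a separate field, together with its overridden `__sizeof__`, is that a negative number costs no more memory than its positive counterpart:

```python
# int stores the sign in its header (historically the sign of ob_size)
# rather than in a separate field, so negating a number costs no memory...
import sys

assert sys.getsizeof(-5) == sys.getsizeof(5)
# ...and the size grows only with the number of digits:
print(sys.getsizeof(2**60) > sys.getsizeof(5))  # True
```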
Well, yes, but then the space taken by ob_size would be wasted, at the very least the sign bit. (And it’s not about a single bit: due to alignment, storing one additional bit takes up to 8 bytes.) Or are you suggesting implementing something equivalent to PyVarObject without incorporating that structure, and not calling PyObject_NewVar in the constructor?
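The alignment point can be demonstrated with a pair of hypothetical structs: appending a one-byte flag after a pointer-aligned field pads the struct out to the next pointer-size multiple, so the single bit ends up costing a whole word:

```python
# Alignment padding: adding a one-byte flag after a pointer-sized field
# grows the struct by a full pointer-sized slot, not by one byte.
import ctypes

class JustPtr(ctypes.Structure):
    _fields_ = [("ptr", ctypes.c_void_p)]

class PtrPlusFlag(ctypes.Structure):
    _fields_ = [("ptr", ctypes.c_void_p), ("flag", ctypes.c_bool)]

print(ctypes.sizeof(PtrPlusFlag) - ctypes.sizeof(JustPtr))  # 8 on 64-bit
```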
Sure, it has been working like this since 2012 or so. It’s just that now I’m looking for a way to cheaply store one more bit somewhere, and, seeing the effort to make structures opaque, I started asking myself whether I’m relying on too many tricks.
It sounds like that would be best, but the API is not quite there to make it easy. Let’s treat it as a CPython API design idea rather than a recommendation to you :)
I mentioned it in a recent related discussion.