Improve the numpy API to make it more flexible, compatible, object-oriented

Hello everyone.
I’m writing this because, even though I use numpy and scipy a lot in my programs, I don’t find them very satisfying in every use case.

So I started thinking about ways to make it better, and I wrote this manifesto.

To start thinking about the API, I asked myself the following question: what, fundamentally, is a numpy array?
Answer: a buffer.
It’s just an object that stores numbers in an optimized way and allows compiled operations on them, which speeds up computation and keeps storage compact.

This was the guiding idea.

The manifesto is more complete than this post, but here are the features the proposed API should add:

  • extensible (list-like) optimized arrays

  • stacking arrays without copying

  • arrays not owning the data
    allowing any object that supports the buffer protocol to be used as an array, without any copy

  • the array element type (dtype) can be any user-defined class
    including python class or compiled library type (think about

  • multiple array types
    with common functions and additional methods specialized for the particular array

  • object oriented and subclassable types

  • possibility to add custom optimized operations
    so if you want to rewrite some operations like array.add or some math functions using a JIT compiler or a C-module, you can

  • non-buffer arrays, like databases from files
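
To make the "arrays not owning the data" item concrete, here is a minimal sketch of what current numpy already allows with `np.frombuffer`. The variable names are only for illustration, and this shows existing behaviour, not the proposed API:

```python
import numpy as np

# Any object exposing the buffer protocol works; a bytearray is the simplest.
raw = bytearray(8 * 4)  # room for four float64 values, zero-initialised

# np.frombuffer wraps the existing memory instead of copying it.
view = np.frombuffer(raw, dtype=np.float64)

view[0] = 1.5                    # write through the array ...
first_bytes = bytes(raw[:8])     # ... and the bytearray sees the change

owns_data = view.flags.owndata   # False: the array does not own its memory
```

The array is only a window onto `raw`, so whichever object created the memory stays responsible for keeping it alive.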

Participation, criticism and ideas are welcome.

The main motivation for this is to try to provide a more convenient and powerful numpy to the community :slight_smile:

What do you think of such an API?
Do you folks have ideas of what to add to it? (They can be just function name changes, or fundamental design concerns.)
Is anyone interested in implementing this together with me in the future?

There is a numpy mailing list which would be the best place to direct this sort of idea. If it gains traction you can work on a Numpy Enhancement Proposal (NEP). See here for more details:

Oh :open_mouth: thanks for the advice! I didn’t notice that numpy used a system similar to PEPs.

The thing is, this new API is largely incompatible with the existing structures in numpy. For instance, dtype is intended to serve the same purpose, but the features the API proposes for it make it impossible to implement as an extension of the current numpy. The same goes for extensible arrays.
This API is mostly made of breaking changes.

I guess it is still a good idea to submit it to them :slight_smile:

You really need to go to the numpy community, do some more research, and then go from there. But a couple of comments:
numpy arrays are more than an API: the implementation is very much built in, and changing it would make for massive incompatibilities. I like to think of numpy arrays as two things:

  1. A wrapper around a block of data (as you say, a buffer) – more specifically, “strided” data. It can be used as a way to interact with code written in other languages: C, C++, Fortran, more recently Julia, Rust, …

  2. A Python nd-array object – this is the API that you see from Python.
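
A small sketch of what “strided” means in practice, using only standard numpy attributes (the array and variable names are just for illustration):

```python
import numpy as np

a = np.arange(12, dtype=np.int64).reshape(3, 4)

# Strides are the byte steps to the next element along each axis.
row_major_strides = a.strides    # (32, 8) for a C-contiguous 3x4 int64 array

# Transposing only swaps shape and strides; the data block is untouched.
t = a.T
transposed_strides = t.strides   # (8, 32)
same_memory = np.shares_memory(a, t)

# Basic slicing is also a view: taking every other column doubles that stride.
cols = a[:, ::2]
slice_strides = cols.strides     # (32, 16)
```

The same block of memory is described three different ways here, which is why compiled code in other languages can consume it given just a pointer, a shape and the strides.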

If you change the Python API much, then it will no longer be compatible with mountains of Python code. If you change the underlying representation it will be incompatible with mountains of extension code. So the numpy community has a challenge when trying to move the library forward!

(note: you could make a new Python ndarray object, and if it uses the enhanced buffer protocol, you could at least get access to many compiled extensions)
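
As a rough illustration of that note, Python's built-in `memoryview` is a consumer of the buffer protocol (PEP 3118), and it can read a numpy array's layout without numpy-specific code. This is only a sketch of the existing mechanism, not of a new ndarray object:

```python
import numpy as np

a = np.arange(6, dtype=np.float64).reshape(2, 3)

# memoryview sees the element format, shape and strides through the protocol.
m = memoryview(a)
info = (m.format, m.shape, m.strides, m.ndim)

# Going back to an array goes through the same protocol and does not copy.
b = np.asarray(m)
b[0, 0] = 99.0                   # visible through `a` as well
```

A new array type that exported its data this way could, in principle, be handed to the same consumers.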

As to your specific ideas: Some are already done, some are being worked on, and some are essentially impossible without a massive (breaking) restructuring.

Thanks for the answer and for having read my specific ideas !

I totally agree with your description of the two sides of numpy (internal structure and Python API). And of course some of my ideas are just incompatible with the current internal structure of an ndarray. :slight_smile:

But I think that if some people are interested in this project, we can work together to create an alternative that could become the way to go for new programs. Of course the current numpy will last a long time, as it is widely adopted.
This API and structure change is so big that it’s pointless to try to move numpy to it. Better to build something new alongside it.

I think that, fundamentally, buffers of arbitrary elements (like structured numpy arrays) and ndarrays for maths are different problems.
They shouldn’t be handled by the same class.
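
To show the two use cases side by side, here is a minimal sketch with current numpy; the field names and values are made up for the example:

```python
import numpy as np

# A structured array: each element is a record with named, typed fields.
records = np.array(
    [("alice", 31, 1.70), ("bob", 45, 1.82)],
    dtype=[("name", "U10"), ("age", "i4"), ("height", "f8")],
)

ages = records["age"]            # a field is a view into the record buffer
mean_age = ages.mean()           # maths only makes sense per numeric field

# A plain numeric ndarray: homogeneous elements, arithmetic applies everywhere.
grid = np.ones((2, 3))
doubled = grid * 2
```

Both are `ndarray` instances today, even though the first behaves more like a table of records and the second like a mathematical object.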

That may be what made numpy grow in complexity over time, and what now makes it difficult for some extensions to handle completely (I’m thinking of Rust) and hard to extend and subclass :thinking: