Some silly type annotations

I like using type annotations to document physical units.

def get_length() -> 'μm':
    ...
def get_volume() -> 'μm³':
    ...
def get_temperature() -> '°C':
    ...
def have_a_nice_day() -> '😀':
    ...

I really like this idea. Taking it a little further, you could make this compatible with type checkers and have your units too!

from typing import Annotated

def get_length() -> Annotated[int, 'μm']:
    ...
def get_volume() -> Annotated[int, 'μm³']:
    ...
def get_temperature() -> Annotated[int, '°C']:
    ...
def have_a_nice_day() -> Annotated[str, '😀']:
    ...

Taking this even further, you could use NewType and subclassing to have the type checker actually enforce your units and detect where they are being used incompatibly, at least for simple combinations… in fact, with enough effort with generics and covariance/contravariance you might even be able to do so fully. I seem to recall a popular package out there that used annotations for this; I thought it was pint, but it doesn’t, so I’m not sure what that was.
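
For example, here is a minimal sketch of the NewType approach; the unit types are made up for illustration, and mypy is just one possible checker:

from typing import NewType

# Hypothetical unit types; a type checker treats them as distinct.
Micrometers = NewType('Micrometers', float)
Celsius = NewType('Celsius', float)

def get_length() -> Micrometers:
    return Micrometers(1.5)

def get_temperature() -> Celsius:
    return Celsius(37.0)

def double_length(x: Micrometers) -> Micrometers:
    return Micrometers(x * 2)

double_length(get_length())       # OK
double_length(get_temperature())  # mypy: incompatible type "Celsius"; expected "Micrometers"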

Makes me wonder if type annotations could be married with a package like magnitude (magnitude · PyPI) to transparently enforce proper units at runtime.
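
As a rough illustration (plain unit strings here, not magnitude’s actual API), the Annotated metadata can already be read back at runtime, which is the hook such a package would need:

from typing import Annotated, get_args, get_type_hints

def declared_unit(func):
    # Pull the unit string out of the function's Annotated return type.
    # A real integration would hand this to magnitude/pint to build a quantity.
    hints = get_type_hints(func, include_extras=True)
    return get_args(hints['return'])[1]

def get_length() -> Annotated[int, 'μm']:
    return 42

print(declared_unit(get_length))  # μm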

Now I’m going to describe another, more serious, use of type annotations.

I wanted to write a python program that could run very fast, and utilize both a CPU and a GPU. The basic technology to do this has been in place for some time: CUDA & CuPy manage the GPU, while the Numba Just-In-Time (JIT) compiler has backends for both CPUs & GPUs.

However, writing programs for GPUs was still not a fun experience. The biggest issue is that to use a graphics card, your program’s internal data must be structured in a certain way. Whereas normal object-oriented programs use arrays of structures, graphics programs should really use structures of arrays (AoS and SoA - Wikipedia). Using the SoA format is tedious, error-prone, and unfamiliar to most of the intended users (who, no offense, are not experienced programmers).
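
To make the distinction concrete, here is roughly what the two layouts look like for a neuron-style record (a sketch, not how any particular library stores things):

import numpy as np

# Array of Structures: natural for object-oriented code,
# but each field ends up scattered across memory.
class NeuronAoS:
    def __init__(self):
        self.voltage = -70.0
        self.capacitance = 1.0

neurons_aos = [NeuronAoS() for _ in range(1000)]

# Structure of Arrays: each field is one contiguous array,
# which is what GPUs and vectorized CPU code want to see.
neurons_soa = {
    'voltage':     np.full(1000, -70.0),
    'capacitance': np.full(1000, 1.0),
}
neurons_soa['voltage'] += 1.0   # updates all neurons at once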

So I wrote a program to solve this problem! I wrote an in-memory database, drawing inspiration from Django. My database implements the user’s classes, except that instead of storing the class’s data in Python’s __dict__ it stores the data in a private SoA and provides public setter/getter properties.

Here is an example usage:

db = Database()
neuron_data = db.add_class('Neuron')
neuron_data.add_attribute('voltage', initial_value = -70)
Neuron = neuron_data.get_instance_type()
my_neuron = Neuron()
print(my_neuron.voltage) # -70
# But under the hood "my_neuron" does not store the voltage!
# Instead it stores an index into the database:
print(my_neuron.get_unstable_index()) # 0
# And the database stores all of the voltages for all of the Neuron instances:
print(neuron_data.get_data('voltage')) # [-70.]

Then I made JIT compilation wrappers that understand how to use my database. Under the hood I use numba to do the actual compilation. This turned out better than I had hoped. The steps to JIT compile are as follows:

  1. You apply the @Compute decorator to your function or method.
  2. The @Compute wrapper reads your Python source code using inspect.getsource().
  3. Parse the code into an Abstract Syntax Tree (AST).
  4. Transform the AST; rewrite all references to objects which are stored in the database.
    This is where the type annotations come in! The type of a method’s self argument is obvious, but all other references must be annotated with the name of the object’s type.
  5. Compile the AST into code and run it through numba’s JIT compiler. (A rough sketch of steps 2 through 4 follows this list.)
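
Here is a rough, hypothetical sketch of what steps 2 through 4 involve; the real @Compute transformer does considerably more than this toy NodeTransformer:

import ast
import inspect
import textwrap

class RewriteSelfAttributes(ast.NodeTransformer):
    # Illustrative only: turn  self.voltage  into  Neuron_voltage_array[index]
    def visit_Attribute(self, node):
        self.generic_visit(node)
        if isinstance(node.value, ast.Name) and node.value.id == 'self':
            return ast.Subscript(
                value=ast.Name(id=f'Neuron_{node.attr}_array', ctx=ast.Load()),
                slice=ast.Name(id='index', ctx=ast.Load()),
                ctx=node.ctx)
        return node

def compute_sketch(method):
    source = textwrap.dedent(inspect.getsource(method))  # step 2
    tree = ast.parse(source)                              # step 3
    tree = RewriteSelfAttributes().visit(tree)            # step 4
    ast.fix_missing_locations(tree)
    print(ast.unparse(tree))   # step 5 would compile this and hand it to numba
    return method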

For example:

class Neuron:
    __slots__ = () # Required because all data will live in the database, not here.
    @Compute
    def advance(self):
        self.voltage += 1.0

db = Database()
neuron_data = db.add_class(Neuron)
... # The rest of the example is the same as above.

is transformed into GPU-friendly code:

@numba.cuda.jit
def advance(index, Neuron_voltage_array):
    Neuron_voltage_array[index] += 1.0

And finally, here is an example which requires type annotations:

class Neuron:
    @Compute
    def foobar(self, other_neuron: 'Neuron') -> float:
        my_self: 'Neuron' = self
        return my_self.voltage - other_neuron.voltage
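
Extrapolating from the advance() example above (this is not the library’s verbatim output), the annotation on other_neuron is what lets the transformer turn its attribute access into a second indexed lookup on the same array:

@numba.cuda.jit(device=True)
def foobar(index, other_neuron_index, Neuron_voltage_array):
    return Neuron_voltage_array[index] - Neuron_voltage_array[other_neuron_index]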


Another use for type annotations is for returning data from GPU compute kernels.
Normally, GPU kernels cannot return anything; only “device” functions can return values, and they can only return to other GPU kernels or device functions, not back to the CPU caller.

My @Compute wrapper fixes this by reading the return type annotation, allocating a buffer to hold the returned values, generating shim code that calls the user’s GPU kernel and stores each returned value in the buffer, and finally returning the whole return-value buffer (which is still located in GPU memory).
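
In plain numba.cuda terms the pattern is roughly this; the names and shim kernel are made up for illustration, not the generated code itself:

import numpy as np
from numba import cuda

@cuda.jit(device=True)
def users_function(index):
    # Stand-in for the user's device function, which returns a value.
    return index * 1.0

@cuda.jit
def _shim(out):
    # Generated wrapper kernel: store each "return value" into the buffer.
    i = cuda.grid(1)
    if i < out.shape[0]:
        out[i] = users_function(i)

n = 1024
out = cuda.device_array(n, dtype=np.float64)  # return-value buffer, lives in GPU memory
_shim.forall(n)(out)
# "out" stays on the GPU; copy back only if the CPU caller actually needs it:
host_values = out.copy_to_host()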

A bit of a n00bish side question, but aren’t widely used data structures like NumPy ndarrays, Pandas DataFrames, xarray arrays, etc. effectively SoA already, and greatly accelerated over their non-vectorized counterparts? All the expensive ops are already highly optimized, compiled C, and well supported by Numba (which Travis, also being the creator of NumPy, built with scientific computing in mind) alongside HPC/compute libraries like Dask and various GPU acceleration packages. How do the performance, usability and capability of the approach you’ve outlined (which is interesting, to be sure) compare to something more standard like that (and that I’m naturally a lot more familiar with, heh)?

I’m glad you asked!

I found, as I was writing my application, that the SoA/GPU-related code was tedious, error-prone, and touched everything that used the data. The worst part is that most of that code is really boring; most of it did not relate to the application in any way other than saying “plz use the GPU”.

The primary motivation behind the database was that no one likes programming with structures of arrays. The existing solutions did not spark joy, so I put together my own.

  • Many secondary goals materialized as I started to pull all of my data into a centralized place.
    There are a lot of features on the database which can be applied to any of its contents. New features can be added to the database and easily applied to all of my data. Some combinations of features have synergy. I don’t want to start listing specific features because that’s off topic and some would need explanations too.

Performance: @Compute should perform exactly the same as an equivalent hand-written Numba/SoA program, because under the hood it is just using Numba. Hopefully there are fewer bugs in the computer-generated code than in my old hand-written code.

Usability: This is where my database & @Compute really shine! My goal with the database is that users should not need to understand “SoA” in order to use it. To this end I’ve carefully presented two distinct sets of APIs: one using object-oriented notation, and a second which provides the raw data arrays.

Capabilities: @Compute is mainly for writing GPU kernels & device functions. It is not capable of making complex GPU programs: it lacks synchronization primitives. Also, @Compute cannot interact with lists or sparse matrices. The database, on the other hand, provides an API for getting data arrays on the GPU, so in theory you could interface your own GPU kernels with the database.

Here is a link to my project; see the module “neuwon.database”.




PS: Here is another paper cut from using numba. Naturally, my @Compute wrapper fixes this issue. Numba provides two different JIT wrappers for CPU and GPU code, and there is no easy way to convince a function decorated with one to execute on the other, so you end up with code looking like this:

import numba

def foobar(): ...
foobar_cpu = numba.jit(foobar)       # compiles foobar for the CPU
foobar_gpu = numba.cuda.jit(foobar)  # compiles foobar for the GPU

My database allows for switching between the CPU & GPU as easily as:

with database.using_memory_space('cuda'):
    ... # Inside of this code block, everything will happen on the GPU.

# Outside of the code block it reverts back to whatever it was before.

And of course the database keeps track of where each array is located and moves it between the CPU & GPU only when it really needs to (movement is lazy, performed when the data is requested).
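
That bookkeeping can be pictured as something like the following toy sketch using CuPy; the real implementation in neuwon.database is of course more involved:

import numpy as np
import cupy as cp

class LazyArray:
    def __init__(self, data):
        self._data = np.asarray(data)   # starts out on the CPU
        self._space = 'host'

    def get(self, space):
        # Return the array in the requested memory space, copying only on a change.
        if space == 'cuda' and self._space != 'cuda':
            self._data = cp.asarray(self._data)    # host -> device copy
            self._space = 'cuda'
        elif space == 'host' and self._space != 'host':
            self._data = cp.asnumpy(self._data)    # device -> host copy
            self._space = 'host'
        return self._data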
