Typing enhancements for type checking values based on physical dimensions: Sorted

Ricyteach · April 26, 2024, 3:43pm

Yesterday I had an idea to create a units library tool (or add type hinting to an existing units tool, like pint) that can detect the erroneous addition and subtraction of incompatible dimensions (of unit-bearing values) at type checking time. For the purposes of engineering/scientific computing, this could be extremely powerful. Currently all such tools I know of only detect those kinds of errors at runtime. CAVEAT: I’m only a civil engineer and am in no way a professional programmer. But otoh I’ve been writing python for 10+ years now.

I am imagining some final code using that unit library could look like this:

from my_unit_library import N, m

x = 1*m
assert str(x.dimensionality) == "[Length]"
f = 3*N
assert str(f.dimensionality) == "[Length]*[Mass]/[Time]**2"

z = x + f  # type checker throws error here because the types of the dimensionality for the two values aren't compatible

The type checker would know that the dimensions for x is [Length], and the dimensions of y is [Length]*[Mass]/[Time]**2 (which is a force), and that you can’t add those two unlike dimensionalities together.

I tried to write a proof of concept of this idea, but I think it isn’t possible with current python type checking tech.

Such a tool would need to know how to combine dimensions correctly when they are multiplied and divided. And it would need to know that order doesn’t matter when comparing dimensionality; for example, Mass X Length is the same as Length X Mass.

Thinking through it, I think all of this could be possible of there were some new type checking tools/“functions” added to typing. I’ll go over that at the end. First more about the motivation.

The idea behind all of this is one could create some kind of DimMultiply and also some kind of DimDivide type, so that you can write code like this (using 3.12 type hint syntax):

class Dimensionality[T]:
    def __mul__[O](self, other: O) -> DimMultiply[T, O]:
        ...
    def __truediv__[O](self, other: O) -> DimDivide[T, O]:
        ...

Unless I’m wrong: all physical dimension combinations in principle can be represented as a single numerator and a single denominator (both optional).

So DimDivide and DimMultiply would work something like this:

DimDivide[A, B]
# returns Tuple[Numerator[A], Denominator[B]]
DimMultiply[A, B]
# returns Tuple[Numerator[A, B]]
DimDivide[DimMultiply[A, B], C]
# returns Tuple[Numerator[A, B], Denominator[C]]

Above, Numerator and Denominator are not much more than aliases for Tuple, but with names to distinguish them as the numerator and denominator of the dimensions.

Then I could use that code like this:

class Length: ...
class Mass: ...
class Time: ...

# this would return Tuple[Numerator[Length], Denominator[Time, Time]]
Acceleration = DimDivide[Length, DimMultiply[Time, Time]]

# this would return Tuple[Numerator[Length, Mass], Denominator[Time, Time]]
Force = DimMultiply[Mass, Acceleration]

# this would return Tuple[Numerator[Length, Length]]
Area = DimMultiply[Length, Length]

# this would return Tuple[Numerator[Mass], Denominator[Length, Time, Time]]
Pressure = DimDivide[Force, Area]

In order to generalize this behavior across all kinds of dimension combinations, one would need to write some kind of DimMultiply and DimDivide function/type that magically creates the replacement types (“aliases”…??), and for the type checker to recognize the result coming from that function/type.

But unless I’m wrong, with current tools it isn’t possible to write a DimMultiply and DimDivide that will allow the type checker to understand that, for example, DimDivide[Force, Area] evaluates to an “alias” (maybe that’s the wrong word) for Tuple[Numerator[Mass], Denominator[Length, Time, Time]].

The algorithm for DimMultiply would be like this (DimDivide would be similar):

arguments: tuple_a_or_atomic, tuple_b_or_atomic
    step 1: determine if a and b are tuples or atomics; if either is atomic, swap it to a single-member tuple inside of a Numerator[]
    step 2: check to confirm that the only contents of the two tuples a and b are at most a single Numerator and a single Denominator
    step 3: combine the a and b numerators, and a and b denominators, together
    step 4: sort the numerator and the denominator
    step 5: return a tuple of the numerator and denominator

There is probably more than one “thing” missing from the type hinting system that one would need to write this algorithm. And maybe the entire thing could be simplified/changed to get rid of some of the complications in the algorithm above, so that it is more possible with python’s existing type hinting tools.

However I think at minimum we would need a new type hinting “function” I will call Sorted[].

Sorted[] would receive a tuple of types as the argument, and return them as a new tuple in a sorted order. This allows for comparing objects with a Tuple tupe containing all the same types, even if they aren’t in the same order. I don’t think it matter what the order is, it just needs to be consistent.

We need this because the type checkers (correctly) care about the order of types inside of Tuples. But when checking dimensionality compatibility, the order doesn’t matter. So I suggest to address this, you could provide the ability to sort tuples and that this ability would allow a library like this to work: it would just sort all the numerators and all the denominators so they compare consistently.

Looking at the rest of the algorithm, there are probably other things we would need to make this possible… but I think this is enough for now.

What do people think? How much of this is possible with current type hint tools? Could it be done without adding anything? I’m pretty sure the answer to that last is no.

Ricyteach · April 26, 2024, 4:15pm

BTW: another idea I had for doing this was to do away with Numerator and Denominator, and instead compose the dimensionality tuples like this:

DimDivide[A, B]
# returns Tuple[A, DIVIDE, B]
DimMultiply[A, B]
# returns Tuple[A, B]
DimDivide[DimMultiply[A, B], C]
# returns Tuple[A, B, DIVIDE, C]

I think this would require some other new type hinting functions: Join[] and Split[]. These would do what they sound like: join and split tuples (similar to joining and splitting strings).

This could help by doing away with the steps in the algorithm I gave above where you have to take things out of and put things into the Numerator and Denominator tuples… which I think would probably require the ability to do multiple unpackings of TypeVarTuples…? Which from what I understand, isn’t likely to happen soon (might be wrong).

mikeshardmind · April 26, 2024, 4:34pm

I don’t think adding a unit-aware numeric type to the standard library is likely to happen (I’d love to be proven wrong, this is my favorite thing about F#), and it will be very hard to argue that dimensional analysis should be supported in the type system in this way (there are a lot of other more broadly applicable things currently missing from python’s type system, some of which would make supporting this in the type system more feasible later on)

I prototyped something out with this a while ago to help with dimensional analysis: https://github.com/mikeshardmind/unit-aware, and you get a runtime error if you try to add/subtract things with the wrong units, units convert properly with compatible operations. It only works with coherent measurement systems (well it works for others, but you’d need to manually apply a scale factor)

In [1]: from unit_aware import Units, UnitAwareValue, UnitsVector

In [2]: UnitAwareValue(120, Units.area) + UnitAwareValue(120, Units.volume)
---------------------------------------------------------------------------
UnitMismatch                              Traceback (most recent call last)
Cell In[2], line 1
----> 1 UnitAwareValue(120, Units.area) + UnitAwareValue(120, Units.volume)

File ~\GitHub\unit-aware\unit_aware.py:289, in UnitAwareValue.__add__(self, other)
    287     if self.units == other.units:
    288         return type(self)(self.value + other.value, self.units)
--> 289     raise UnitMismatch
    290 return NotImplemented

UnitMismatch: Can't add/subtract values with different units

Is this (or something like it) something that would work for what you had in mind?

Ricyteach · April 26, 2024, 4:43pm

That’s not really what I have in mind: I’m not proposing to add unit-aware numeric types or dimensional analysis to the typing system or to the standard library.

I’m suggesting it could be possible for third parties to write such a thing if some of the more broadly applicable tools you referred to are added to the typing system… potentially Sorted[], and maybe Join[] and Split[] (like in my reply to myself above).

There are quite a few libraries now that support this at runtime, such as pint, which is probably the most popular. I’m talking about making it possible for this to work at type check time.

Ricyteach · April 26, 2024, 4:53pm

And I guess I’m asking for help in thinking through how much of this can already be accomplished with current type hint tools, and what specific tools would need to be added in order to be able to write this kind of thing.

I’m pretty sure the most important bit is the Sorted[] bit. But I think there are probably others missing too.

If these tools existed… I wonder if writing DimMultiply and DimDivide might look something like this (some of this syntax is made up!!):

DimMultiply = Tuple[
    *Join[DIVIDE,
    *Sorted[
        *Split[DIVIDE, *tuple_a][0],
        *Split[DIVIDE, *tuple_b][0]
    ],
    *Sorted[
        *Split[DIVIDE, *tuple_a][1],
        *Split[DIVIDE, *tuple_b][1]
    ]]
]
DimDivide = Tuple[
    *Join[DIVIDE,
    *Sorted[
        *Split[DIVIDE, *tuple_a][0],
        *Split[DIVIDE, *tuple_b][1]
    ],
    *Sorted[
        *Split[DIVIDE, *tuple_a][1],
        *Split[DIVIDE, *tuple_b][0]
    ]]
]

But there are lots of bugs in there.

mikeshardmind · April 26, 2024, 5:00pm

I guess what I was getting at is that I don’t think this approach to adding that to the type system is likely going to work well without other things beyond what’s in your original post.

You can think of most units of measurement as a 7-dimensional vector (corresponding to the exponent on the SI representation of that unit)

Addition and subtraction are only valid when the unit vectors are equivalent, Multiplication adds the vectors, and division subtracts them. (there’s more valid operations and transformations, but sorting, joining, and splitting should not need to come into play, nor should representing units with a numerator or denominator…)

Sorted, Join and Split were not the “more broadly applicable” typing features I was considering, but a combination of refinement types, dependent types, and higher kinded types

I’m leaving out a lot here because this is (in my estimation) 3 major typing features away from being supportable, and not all of those have syntax even proposed yet

type Area = Units, when [some expression confirming the specific dimensions]
type Mass = Units, when [some expression confirming the specific dimensions]

You wouldn’t want to be trapped in only the units that got explicitly supported (either in the type system, the libraries you use, etc), you’d want to be able to express the conversions in the type system.

pf_moore · April 26, 2024, 5:02pm

I don’t have any particular view for or against dimensional analysis applications, but the sort of type analysis time calculations you’re suggesting here sound to me quite reminiscent of C++ “template metaprogramming” - which is not intended as a compliment

I’d be concerned to see that sort of complexity get baked into Python’s typing system, which (IMO) already feels very daunting to the non-expert. On the other hand, if what you’re suggesting can be implemented in a way that’s accessible to the end user, then maybe that would be OK. But we’re getting to the point where I imagine people wanting composability, user-defined functions, etc - and that’s how C++ ended up going from templates down the metaprogramming route.

Ricyteach · April 26, 2024, 5:05pm

Both of those responses make sense to me.

In other words: this kind of thing would require a lot of complex features to be added to the python type hinting system. And that would come with downsides.

Hopefully this post will just add to the discussion of trying to balance how many features to add to python type hinting with the kinds of really great things people could do with them if they were added.

I for one would love to be able to have something like this. Even better would be if it could be made possible with a minimum of additional type hinting feature complexity.

a-reich · April 26, 2024, 6:01pm

You might be interested in PEP 646’s notes on possible application of variadic generics to type-checking arrays based on the annotated semantics of their dimensions.

I haven’t thought through your example code so I don’t really understand all the features you’re looking for, but I suspect it’s possible for a library to create a system that has many of your desired functionality just by making extensive use of the currently standardized type system.

Ricyteach · April 26, 2024, 6:04pm

Thanks; yes I have read that closely. It helps to get us there- much closer than before, anyway- quite a bit and I skipped over the variadic issues until the very very end of my fourth post (where I hand-waive away some of the unpacking syntax and ignore how you’d actually have to write DimMultiply and DimDivide using TypeVarTuple).

Still need some more features though! Again, the biggest one being, something like Sorted[]:

a: Sorted[tuple[str, float]] = (1.0, 'x')  # no error

barry-scott · April 26, 2024, 6:28pm

I recall something like this was discussed before a couple of years ago
and ran into a lot of problems. Sorry do not have a link.

A web search for “python units library” will point you to solutions that exists.
For example Pint · PyPI - it’s not type based.

Ricyteach · April 26, 2024, 6:43pm

Thank you. Yes I use pint a lot and it is great. But catching these kinds of errors at type check time would be greatly preferable/extremely powerful. Pint can only catch errors at runtime.

MegaIng · April 26, 2024, 6:52pm

It should be possible to use unary tuple arithmetic to represent the 7 dimensions requires for SI units. This examples only implements three dimensions (length, time, mass), but more dimension should be easy to add.

from __future__ import annotations
from typing import reveal_type, cast, overload

type M = tuple[tuple[None], tuple[()], tuple[()]]
type S = tuple[tuple[()], tuple[None], tuple[()]]
type KG = tuple[tuple[()], tuple[()], tuple[None]]


class Value[U]:
    def __init__(self, val: float):
        self.val = val

    def __add__(self, other: Value[U]) -> Value[U]:
        return Value(self.val + other.val)

    def __sub__(self, other: Value[U]) -> Value[U]:
        return Value(self.val - other.val)

    def __mul__[*aL, *aT, *aM, *bL, *bT, *bM](self: Value[
        tuple[tuple[*aL], tuple[*aT], tuple[*aM]]
    ], other: Value[tuple[tuple[*bL], tuple[*bT], tuple[*bM]]]) -> Value[tuple[tuple[*aL, *bL], tuple[*aT, *bT], tuple[*aM, *bM]]]:
        return Value(self.val * other.val)


a_m = Value[M](1)
b_m = Value[M](2)
a_s = Value[S](1)
b_s = Value[S](1)
reveal_type(a_m)
reveal_type(a_m + b_m) # No error
reveal_type(a_m + b_s) #  Error (works correctly already)
reveal_type(a_m * b_s) # No error, resulting unit type is `tuple[tuple[None], tuple[None], tuple[()]]`

Sadly, pyright is standard compliant and complains about tuple[*aL, *bL] because there are two unpacked TypeVarTuples in there. From what I can tell, this restriction is unnecessary here and could be replaced by a wording “Type Checkers should complain if they can’t uniquely determine how the types are split into tuples”.

Extending this to also handle divide and negative exponents would require a bit more work by choosing some kind of representation for negative numbers that can be worked with - But it should be possible adding no/few new typing features, just removing restrictions from existing once.

I am not really sure if this is something that could be used in production. It would catch the mistakes, but the error messages would be very hard to comprehend.

Ricyteach · April 26, 2024, 6:55pm

THIS IS COOL!! I will play with it.

Ricyteach · April 26, 2024, 7:00pm

And it’s a good point about the error messages. That’s not something I even thought of until now, but probably all attempts to solve this problem in this way would suffer from awful to understand error messages.

Maybe that’s a job for a plugin to solve.

Ricyteach · April 26, 2024, 7:29pm

EDIT: I just realize that I was definitely misunderstanding some things here. Should probably just delete this response.

But I’m still not sure how you could also get division working using this approach, though.

------------ORIGINAL RESPONSE BELOW, PROBABLY USELESS--------------

Cornelius: this is really neat but it only seems to work for dimensions in the numerator. I can’t see how it could work to implement division. There is no representation of the denominator, right?

BTW I am trying to think through this on my own since the “multiple unpacking” syntax isn’t supported so I might be incorrectly understanding how this works.

But if I understand correctly, Area would look like this:

type Area = tuple[tuple[None, None], tuple[()], tuple[()]]

Where the tuple[None, None] in the first position means “length squared”.

I guess you could duplicate those 3 positions to represent negative exponents (so the first 3 positions are positive exponents, the second 3 are negative exponents):

type M = tuple[tuple[None], tuple[()], tuple[()], tuple[()], tuple[()], tuple[()]]
type S = tuple[tuple[()], tuple[None], tuple[()], tuple[()], tuple[()], tuple[()]]
type KG = tuple[tuple[()], tuple[()], tuple[None], tuple[()], tuple[()], tuple[()]]
type M_inverse = tuple[tuple[()], tuple[()], tuple[()], tuple[None], tuple[()], tuple[()]]
type S_inverse = tuple[tuple[()], tuple[()], tuple[()], tuple[()], tuple[None], tuple[()]]
type KG_inverse = tuple[tuple[()], tuple[()], tuple[()], tuple[()], tuple[()], tuple[None]]

But after doing some arithmetic, I think you could end up with something like this:

tuple[tuple[None, None], tuple[()], tuple[()], tuple[None], tuple[()], tuple[()]]

Which is “Length^2/Length”. But that should be reduced to just a length:

tuple[tuple[None], tuple[()], tuple[()], tuple[()], tuple[()], tuple[()]]

So I’m not sure how introducing negative exponents can be solved with that unary tuple arithmetic approach.