Use * operator to type fixed-size collections

loic-simon · November 12, 2023, 2:01pm

Every now and then, I happen to work with fixed-size homogenous collections… and find no satisfying way to annotate them.

Let’s say I have a function to compute in-place a Fibonacci generalized sequence depending on the first two terms:

def compute_fibonacci_terms(seq, n_terms):
    a, b = seq
    for _ in range(n_terms):
        a, b = b, a + b
        seq.append(b)
    return seq

>>> compute_fibonacci_terms([3, 4], n_terms=5)
[3, 4, 7, 11, 18, 29, 47]

To annotate this function, the two choices I see are

def compute_fibonacci_terms(seq: list[int], n_terms: int) -> list[int]: ...

def compute_fibonacci_terms(seq: tuple[int, int], n_terms: int) -> list[int]: ...

The first solution allows to pass list with any number of items, resulting in a ValueError in this (admittedly contrived) example. The second solution is even worse, disallowing passing lists (which is why this function was designed that way) and rightly reporting an error on seq.append.

Even when working with tuples, the situation is not ideal when it comes to large tuples:

def get_weekday_names(locale: str) -> tuple[str, str, str, str, str, str, str]:
    ...

Here type-checking works well, but is quite cumbersome to read and error-prone.

Proposal

What I’d like to propose here is to overload the __mul__ operator of type to denote fixed size homogenous collections.

So the above functions could be written as

def compute_fibonacci_terms(seq: list[int * 2], n_terms: int) -> list[int]:
    ...

def get_weekday_names(locale: str) -> tuple[str * 7]:
    ...

At runtime, type (as well as of some typing special forms usable as type annotations) would define __mul__ to ony accept an integer and return some special typing construct, similar to typing.GenericAlias. Type checkers would only allow literal integers.

Considerations

I believe type checkers should only allow this syntax inside generic types derived of Collection ^[1].
Type-checking wise, tuple[int, int, int] and tuple[int * 3] should be considered fully equivalent.

Since type and typing special constructs cannot currently be multiplied by anything, this should be fully backwards compatible. One case that could be problematic are defered anotations, since list["LateBind" * 3] would become list["LateBindLateBindLateBind"] , but I believe type checkers can just flag this as invalid (and suggest using list["LateBind * 3"] instead).

Future improvements this may open the way for...

Typing of heterogenous sequences (list[int * 3, str, int * 2]), even it I find that less readable and probably a way less common use case;

Bounded iterable size: for example, itertools.tee could have an overload

def tee[I, N: Literal[int]](iterable: Iterable[I], n: N=2) -> tuple[I * N]: ...

or even, to get back to my example,

compute_fibonacci_terms[N: Literal[int]](seq: list[int * 2], n_terms: N) -> list[int * (N + 2)]: ...

Some way to denote “at least X” / “at most X” items ?

Has something in these lines been discussed before? I’m pretty sure I read a discussion about this issue somewhere here, but I couldn’t find it and I don’t think it really had a proposal.

NB: I hesited between Typing and Ideas for this post, I preferred the later since this has implications on runtime and does not rely on advanced typing notions, but feel free to move it!

maybe arg: int * 3 could be made sugar for arg: Collection[int * 3], but I’m not sure I’m fan of the idea. ↩︎

AlexWaygood · November 12, 2023, 2:12pm

There was some previous discussion on this problem at Feature Request: Type hint for Fixed Length Homogeneous Sequences · Issue #786 · python/typing · GitHub

Tinche · November 12, 2023, 2:39pm

This is an interesting idea, but…

I don’t think lists could ever work in this context. First, I think it’d be extremely difficult for a typechecker to know how many elements might be in a given list; these are basically dependent types, right? That’s a very advanced typing concept.

Secondly, assume you have a class that has a field of type list[int * 4] and you happen to have a list of 4 ints, the checker knows it, and you pass it in. Then a few lines later you pop an element and now it’s a list[int * 3], but the class instance still thinks it has 4 elements.

I think for this to be sound the type should be immutable, so tuples. Maybe tuple[int * n] (or tuple[int; n]) could just be a shorthand for a homogenous tuple of n elements?

loic-simon · November 12, 2023, 2:56pm

I see your point. Indeed, I did not considered the type checker work that would be needed outside of the “passing a literal list to a typed function”, but I suppose allowing list[int * 5] would not make sense within the current scope of type checkers.

I think this “syntax” may be useful even limited to frozen collections (aka tuples and frozensets?) though! This is more in line with the discussion linked by Alex above, indeed

ajoino · November 12, 2023, 3:57pm

This sounds akin to design-by-contract which I think is beyond the scope of the type system. However, there are some libraries on PyPI that implements this but I’m not sure if they do static analysis.

apalala · November 12, 2023, 5:29pm

I like the proposal. The implementation is simple, and type-checkers should detect the odd cases.

This is already detected as a typing error by mypy:

y: list[int, str] = []

error: “list” expects 1 type argument, but 2 given [type-arg]

flyinghyrax · November 13, 2023, 2:58am

A neat idea to ease an ergonomic pain-point. I’d agree that it’s tedious to write tuple[T, T, T, T]. However I’d personally prefer no more “special cases” be added until more rigorous foundations are specified for the type system in general. The usage of the subscript syntax in annotations feels overloaded and inconsistent enough as it is.

loic-simon · November 13, 2023, 7:31pm

Yeah, I totally understand that! That’s more or less the reason why I didn’t push my proposal about extracting kwargs from function signatures further. Hope this project and related debates will progress quickly and lead so exciting features!

Do you think special-casing tuple[SomeType, Literal[int]] could still be relevant? Or even tuple[SomeType, TypeVar(bound=Literal[int])], which is AFAIK not doable today!

flyinghyrax · November 13, 2023, 8:24pm

This is the part of the proposal that feels most consistent to me, at least as far as consistency with the existing tuple annotation syntax. For example:

tuple[T, U, V] - heterogenous tuple of known length
tuple[T, …] - homogenous tuple of unknown length
tuple[T, 5] - homogenous tuple of known length

(1) and (2) we already have; (3) seems like a reasonable extension of (2) and (I imagine) could be implemented as sugar for (1).

Topic		Replies	Views
How to type annotate mathematical operations that supports built-in numerics, collections and numpy arrays? Python Help help , typing	1	4501	February 5, 2022
Typing of nonhomogeneous tuples of undefined length Typing	2	266	January 18, 2024
How to type homogenous tuples of a given length Python Help typing	12	2015	July 14, 2023
More consistent typing annotations for tuple Ideas	2	351	February 2, 2024
An interesting pytype experiment, and a possible extension to strings Ideas	95	3687	February 22, 2023

Use * operator to type fixed-size collections

Proposal

Considerations

Related Topics