Use * operator to type fixed-size collections

Every now and then, I happen to work with fixed-size homogenous collections… and find no satisfying way to annotate them.

Let’s say I have a function to compute in-place a Fibonacci generalized sequence depending on the first two terms:

def compute_fibonacci_terms(seq, n_terms):
    a, b = seq
    for _ in range(n_terms):
        a, b = b, a + b
        seq.append(b)
    return seq
>>> compute_fibonacci_terms([3, 4], n_terms=5)
[3, 4, 7, 11, 18, 29, 47]

To annotate this function, the two choices I see are

def compute_fibonacci_terms(seq: list[int], n_terms: int) -> list[int]: ...
def compute_fibonacci_terms(seq: tuple[int, int], n_terms: int) -> list[int]: ...

The first solution allows to pass list with any number of items, resulting in a ValueError in this (admittedly contrived) example. The second solution is even worse, disallowing passing lists (which is why this function was designed that way) and rightly reporting an error on seq.append.


Even when working with tuples, the situation is not ideal when it comes to large tuples:

def get_weekday_names(locale: str) -> tuple[str, str, str, str, str, str, str]:
    ...

Here type-checking works well, but is quite cumbersome to read and error-prone.

Proposal

What I’d like to propose here is to overload the __mul__ operator of type to denote fixed size homogenous collections.

So the above functions could be written as

def compute_fibonacci_terms(seq: list[int * 2], n_terms: int) -> list[int]:
    ...
def get_weekday_names(locale: str) -> tuple[str * 7]:
    ...

At runtime, type (as well as of some typing special forms usable as type annotations) would define __mul__ to ony accept an integer and return some special typing construct, similar to typing.GenericAlias. Type checkers would only allow literal integers.

Considerations

I believe type checkers should only allow this syntax inside generic types derived of Collection [1].
Type-checking wise, tuple[int, int, int] and tuple[int * 3] should be considered fully equivalent.

Since type and typing special constructs cannot currently be multiplied by anything, this should be fully backwards compatible. One case that could be problematic are defered anotations, since list["LateBind" * 3] would become list["LateBindLateBindLateBind"] :sweat_smile:, but I believe type checkers can just flag this as invalid (and suggest using list["LateBind * 3"] instead).

Future improvements this may open the way for...
  • Typing of heterogenous sequences (list[int * 3, str, int * 2]), even it I find that less readable and probably a way less common use case;
  • Bounded iterable size: for example, itertools.tee could have an overload
    def tee[I, N: Literal[int]](iterable: Iterable[I], n: N=2) -> tuple[I * N]: ...
    
    or even, to get back to my example,
    compute_fibonacci_terms[N: Literal[int]](seq: list[int * 2], n_terms: N) -> list[int * (N + 2)]: ...
    
  • Some way to denote “at least X” / “at most X” items ?

Has something in these lines been discussed before? I’m pretty sure I read a discussion about this issue somewhere here, but I couldn’t find it :pensive: and I don’t think it really had a proposal.

NB: I hesited between Typing and Ideas for this post, I preferred the later since this has implications on runtime and does not rely on advanced typing notions, but feel free to move it!


  1. maybe arg: int * 3 could be made sugar for arg: Collection[int * 3], but I’m not sure I’m fan of the idea. ↩︎

There was some previous discussion on this problem at Feature Request: Type hint for Fixed Length Homogeneous Sequences · Issue #786 · python/typing · GitHub

2 Likes

This is an interesting idea, but…

I don’t think lists could ever work in this context. First, I think it’d be extremely difficult for a typechecker to know how many elements might be in a given list; these are basically dependent types, right? That’s a very advanced typing concept.

Secondly, assume you have a class that has a field of type list[int * 4] and you happen to have a list of 4 ints, the checker knows it, and you pass it in. Then a few lines later you pop an element and now it’s a list[int * 3], but the class instance still thinks it has 4 elements.

I think for this to be sound the type should be immutable, so tuples. Maybe tuple[int * n] (or tuple[int; n]) could just be a shorthand for a homogenous tuple of n elements?

2 Likes

I see your point. Indeed, I did not considered the type checker work that would be needed outside of the “passing a literal list to a typed function”, but I suppose allowing list[int * 5] would not make sense within the current scope of type checkers.

I think this “syntax” may be useful even limited to frozen collections (aka tuples and frozensets?) though! This is more in line with the discussion linked by Alex above, indeed :smile:

1 Like

This sounds akin to design-by-contract which I think is beyond the scope of the type system. However, there are some libraries on PyPI that implements this but I’m not sure if they do static analysis.

I like the proposal. The implementation is simple, and type-checkers should detect the odd cases.

This is already detected as a typing error by mypy:

y: list[int, str] = []

error: “list” expects 1 type argument, but 2 given [type-arg]

A neat idea to ease an ergonomic pain-point. I’d agree that it’s tedious to write tuple[T, T, T, T]. However I’d personally prefer no more “special cases” be added until more rigorous foundations are specified for the type system in general. The usage of the subscript syntax in annotations feels overloaded and inconsistent enough as it is.

5 Likes

Yeah, I totally understand that! That’s more or less the reason why I didn’t push my proposal about extracting kwargs from function signatures further. Hope this project and related debates will progress quickly and lead so exciting features!

Do you think special-casing tuple[SomeType, Literal[int]] could still be relevant? Or even tuple[SomeType, TypeVar(bound=Literal[int])], which is AFAIK not doable today!

This is the part of the proposal that feels most consistent to me, at least as far as consistency with the existing tuple annotation syntax. For example:

  1. tuple[T, U, V] - heterogenous tuple of known length
  2. tuple[T, …] - homogenous tuple of unknown length
  3. tuple[T, 5] - homogenous tuple of known length

(1) and (2) we already have; (3) seems like a reasonable extension of (2) and (I imagine) could be implemented as sugar for (1).