Passing class argument and checking for list type

Lecris · August 18, 2023, 9:45am

I am trying to write a normalizer that converts given data (e.g. either str or list[str]) to a given output type (let’s consider pathlib.Path and list[pathlib.Path]). The interface that I have in mind is something like:

def normalize(cls: Type[T], data: str | list[str]) -> T:
  ...

input_str = "./path"
input_list_str = [ "./path" ]

out = normalize(pathlib.Path, input_str)
# normalize(pathlib.Path, input_list_str) # Fail
normalize(list[pathlib.Path], input_str) == normalize(list[pathlib.Path], input_list_str)

but I am having trouble with introspecting if the input cls is list or a regular type object. What I could do is to use the protected typing._GenericAlias, but is there no better equivalent?

def normalize(cls: Type[T], data: str | list[str]) -> T:
  if isinstance(cls, types.GenericAlias) and cls.__origin__ is list:
    # __args__ is a tuple of type arguments; list[X] has exactly one
    obj_cls = cls.__args__[0]
    if not isinstance(data, list):
      data = [data]
    return [obj_cls(d) for d in data]
  return cls(data)

Also how can I type-hint T such that it is Any except list?

flyinghyrax · August 19, 2023, 3:10pm

This doesn’t directly solve your problem, but in case it’s useful, here is how I would annotate something like this:

from typing import TypeVar, Callable, overload

# define generic type variable
_T_data = TypeVar("_T_data")

# type alias for a function (str) -> T
StrConv = Callable[[str], _T_data]

# split type signatures, otherwise the type checker can't catch mismatched arity
# like `result: list[U] = normalize(str2u, "not a list")`
@overload
def normalize(convert: StrConv[_T_data], data: str) -> _T_data: ...

@overload
def normalize(convert: StrConv[_T_data], data: list[str]) -> list[_T_data]: ...

# in implementation, check for list vs not list
def normalize(convert: StrConv[_T_data], data):
    if isinstance(data, list):
        return list(map(convert, data))
    else:
        return convert(data)

The type checker then recognizes the two overload signatures as valid:

# (1/2) 
(function) def normalize(
  convert: StrConv[_T_data@normalize],
  data: str
) -> _T_data@normalize
# (2/2) 
(function) def normalize(
  convert: StrConv[_T_data@normalize],
  data: list[str]
) -> list[_T_data@normalize]

And it can infer the correct return types:

one = normalize(int, "1")
# (variable) one: int

one_two_three = normalize(int, ["1", "2", "3"])
# (variable) one_two_three: list[int]

# even in fun cases like this:
letters = normalize(list, "abcdef")
# (variable) letters: list[str]

print(letters)
# ['a', 'b', 'c', 'd', 'e', 'f']

However, this determines whether the output is a list based on whether the input is a list, whereas I see you want to be able to determine the output type based on the type of the first argument (converter) and not the second (data).

I’ll play around with that and see if I can come up with something.

One specific thing I can answer, on type-hinting T such that it is Any except list:

I believe this (type from set difference, I’m sure there is probably a formal technical term for it) is not possible with Python’s current type annotation system.

flyinghyrax · August 19, 2023, 4:48pm

Having experimented some more, I think I have something that more closely matches your initial requirements.

normalize(int, "1")
# 1
normalize(int, ["1", "2", "3"])
# ValueError: "can't convert list of data with non-list type parameter <class 'int'>"
normalize(list[int], "1")
# [1]
normalize(list[int], ["1", "2", "3"])
# [1, 2, 3]

It’s been a fun exercise but I’m not 100% satisfied with what I came up with.

First some things I found interesting about list[T] and Type[T] ^[1].

A parameter with type annotation Type[T] can accept - as the typing docs put it - “the class object of T” (or a subtype of T). When used as an annotation, list[T] is a type from the perspective of a static analysis tool (e.g. Pylance). But when passed as a parameter at runtime, it’s not actually a type per se:

Its class is not type
It’s not an instance of a metaclass
It isn’t type itself

At runtime we’re no longer dealing with the static type system but with Python’s actual dynamic type system, and in that scope an argument list[T] is a value (an object that is an instance of the class types.GenericAlias) and not a ‘type’ (it is not an instance of type, although calling it still constructs a plain list). ^[2]

(…in hindsight this feels obvious, but I had to walk myself there.)
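A minimal sketch of what that looks like at runtime, using only the public helpers from typing rather than the __origin__ and __args__ attributes. The variable name alias is just for illustration.

```python
import types
from typing import get_args, get_origin

alias = list[int]

# The runtime object is an instance of types.GenericAlias, not a class.
print(type(alias) is types.GenericAlias)  # True
print(alias is type)                      # False

# The public helpers recover the container and its type arguments.
print(get_origin(alias) is list)          # True
print(get_args(alias))                    # (<class 'int'>,)

# A plain class has no origin and no type arguments.
print(get_origin(int))                    # None
print(get_args(int))                      # ()
```

So checking get_origin(x) is list is one way to tell list[int] apart from a plain class such as int.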

So in some sense, I almost feel like list[int] isn’t a valid value for a parameter of Type[T], because it isn’t a class. But hey, generics are weird, so there’s another level of indirection happening that I don’t fully grok the implementation of… and by intuition it seems like the correct annotation, so here we are.

Anyway, all that said, here’s what I came up with:

from typing import (
    Type,
    TypeVar,
    get_args,   # !
    get_origin, # !
)
from types import GenericAlias

T = TypeVar("T")

def normalize(cls_ish: Type[T], data: str | list[str]) -> T:
    match cls_ish:
        case GenericAlias():
            container_type = get_origin(cls_ish)
            if container_type is not list:
                raise TypeError(f"unsupported container type {cls_ish}")
            item_type, *_ = get_args(cls_ish)
            ndata = data if isinstance(data, list) else [data]
            return container_type(map(item_type, ndata))
        case _:  # anything else is treated as a plain class
            if isinstance(data, list):
                raise ValueError(f"can't convert list of data with non-list type parameter {cls_ish}")
            return cls_ish(data)

I’ve left the Type[T] annotation as is, since functionally it does what is intended.
get_args and get_origin are fun little helper functions I didn’t know about before (see “typing — Support for type hints” in the Python 3.11.4 documentation).
It’s not incredibly robust, in particular there’s nothing constraining Type[T] to classes that accept a string as an initializer argument.
It might be nice to generalize beyond list to support say a collection type like Sequence, but this is tricky because strings are sequences of characters, which makes it hard to distinguish when data is a single value vs when data is a collection.
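To illustrate the Sequence point: a str passes an isinstance check against collections.abc.Sequence, so a generalized version would have to special-case strings before testing for a collection. The helper as_items below is hypothetical and only sketches one way to do that.

```python
from collections.abc import Sequence

def as_items(data):
    """Hypothetical helper: always return a list of items."""
    # str and bytes are checked first because they also pass the Sequence test.
    if isinstance(data, (str, bytes)):
        return [data]
    if isinstance(data, Sequence):
        return list(data)
    # Anything else is treated as a single value.
    return [data]

print(isinstance("abc", Sequence))  # True
print(as_items("./path"))           # ['./path']
print(as_items(("a", "b")))         # ['a', 'b']
```

Treating str as a scalar is a design choice here; it would break a caller who really does want a string split into characters.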

[1] …that might be obvious to some people, but weren’t to me at first.
[2] i.e. all types are objects, but not all objects are types?

Lecris · August 21, 2023, 11:37am

Thanks for the help. That is more or less what I had so far. Seems like there is no other elegant way of introspecting this.