Function that creates types compatible with static type checking

I would like to implement a function that returns a new type, and the returned type should be compatible with static type checkers. My initial idea was to give the function a -> type return annotation. However, this does not work because type checkers then consider the assigned value to be a variable, not a type. The following code snippet illustrates it:

def create_custom_type(name: str, **kwargs) -> type:
    # here, the logic that dynamically creates custom_type
    return custom_type

my_type = create_custom_type("my_type", ...)

def my_function(
    arg: my_type  # this gives a type error, but I want it to be okay!
):
    pass

my_var = my_type("some value")

my_function(my_var)  # want this to be okay!
my_function(0)  # want an argument type error here!

Is there currently a way to implement a function that creates types compatible with static type checking? If there isn't, would it be possible to add this to the typing system?

One idea would be for the function to have a return annotation of -> NewType. This way, static type checkers would know that the value assigned from the function's return should be considered a new type, and not a variable.

For such a function's return value to be visible to static type checkers, the type checkers would have to run the function. I proposed something similar here; that thread is well worth reading. In the end, the problem I wanted to solve would be best solved with a kind of type-table.

1 Like

The entire point of using a type checker is to be able to figure out whether the types are compatible, before running the code.

The entire point of using type as a constructor is to not have to have the type even exist, until after the code has started running.

These two ideas are not compatible.

3 Likes

Not too fast… :) The type checker processes type representations. The primary problem is that a type-creation function cannot really express the resulting type in its own signature - but isn't that the same as the forward-reference problem? (For which one solution is the stringification of types, PEP 563, and a later solution, deferred evaluation, has its own PEP 649.) If so, then the type checker just needs a standard way of mapping type representations to types - which already implies that it needs to be able to map representations (and types) to the functions that create those types. If this is correct, then I'm not convinced there is a real contradiction or incompatibility here…
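For comparison, a minimal sketch of the forward-reference mechanism (the names here are made up):

def make_default() -> "Widget":  # "Widget" is not defined yet; the checker resolves the string later
    return Widget()

class Widget:
    ...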

I don’t see why type checkers would need to run the function. Let me explain it a bit differently. Suppose we have:

from some_module import MyType

def my_function(arg: MyType):
    ...

my_var = MyType(...)

my_function(my_var)  # okay!
my_function(0)  # type error!

If it turns out that MyType is implemented as class MyType: ..., then current static type checkers work normally. However, if MyType is created by a function, e.g. MyType = create_function(), then it would be nice to have a way to annotate that function so static type checkers know that MyType is a new type - no different than if it were a simple class.

What I suggested was something like:

from typing import NewType

def create_type() -> NewType:
    ...

MyType = create_type()

With this, static type checkers could already allow MyType to be used in type annotations, and check that parameters annotated as MyType only receive instances of MyType. No need to run any function.

There is one limitation: the type checkers would not know how to check whether instances of MyType are being created correctly. E.g., in var = MyType("value"), is it correct to instantiate it with a string? Independent of this limitation, type checkers would already add value by being able to check the use of the instances, even if not their instantiation.
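Assuming the MyType from the snippet above, a sketch of what a checker could and could not verify under this hypothetical annotation:

def my_function(arg: MyType):
    ...

var = MyType("value")  # instantiation arguments not checked (the limitation)
my_function(var)       # checked: okay
my_function("value")   # checked: error, a plain str is not MyType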

It might also be desirable to extend this so that type checkers can be told the valid ways of instantiating MyType. But even so, I don't think this requires type checkers to run a function. For example, it could be done via a protocol, like:

from typing import NewType, Protocol

class MyTypeInit(Protocol):
    def __init__(self, value: str):
        ...

def create_type() -> NewType[MyTypeInit]:
    ...

MyType = create_type()

I guess type checkers wouldn't have an issue using the protocol to check whether the instantiation of MyType is given correct arguments.

1 Like

The checker also needs to know that it isn’t a subclass of int…

Doesn't that problem (similar to the problem of classes defined inside wrappers) already exist in the current system? And would it not also be solved by PEP 649?

Why a subclass of int? What I am thinking about is a completely new type that does not inherit from existing classes - the same way class MyType: ... does not inherit.

Because you said

my_function(0)  # type error!

But if MyType is a subclass of int, this wouldn’t be a type error.

Wouldn't (or couldn't) that naturally flow out of the creation function's definition? Isn't this proposal "just" a generalization of how namedtuple works, using arbitrary objects instead of tuples? (With "just" I'm not taking a stance on whether this is worthwhile, just on whether it's feasible.)
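For reference, namedtuple is a type-creating function that type checkers already special-case, which is what makes the analogy tempting:

from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])  # checkers synthesize a class for Point here

p = Point(1, 2)
p.x        # understood statically
Point(1)   # flagged: missing argument for field "y"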

No, my_function(0) should give a type error because it was given an int, while that parameter only accepts instances of MyType, which is not supposed to inherit from int.

Because it's "static" type checking. Anyway, I think you can achieve what you want, although it's not clear to me why you want to do this:

from typing import Callable, cast

class MyStaticType: pass  # dummy class, only for the static side

MyType = cast(Callable[..., MyStaticType], create_type(...))  # create_type is your runtime factory
my_var = MyType(...)

def my_function(arg: MyStaticType): pass  # works, as desired

my_function(my_var)  # passes, as desired
my_function(0)  # fails, as desired

But MyStaticType is an empty class, so you can’t do much with it except pass it around.

With the above solution, you simply define the __init__ you want under the MyStaticType class. I'm not sure that type checkers would (or at least should) do anything with it, though, since __init__ doesn't obey the LSP, and there's no way to know that my_var isn't being constructed via a derived class.
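To make that concrete, here is a sketch in which the cast targets type[MyStaticType] instead of Callable[..., MyStaticType], so that the checker actually consults the declared __init__ (create_type again stands in for the runtime factory):

from typing import cast

class MyStaticType:
    def __init__(self, value: str) -> None: ...

MyType = cast(type[MyStaticType], create_type())  # create_type is the assumed runtime factory

my_var = MyType("some value")  # checked against MyStaticType.__init__
my_var = MyType(0)             # flagged: int is not str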

Similarly, I don’t think your protocol having __init__ makes sense (again since __init__ does not obey LSP.)

My motivation is the functions that create types in jsonargparse. People can create different path types. For example, to create a path type that must be an existing file, writable, and either a local path or a remote fsspec path, one could do:

from jsonargparse.typing import path_type

Path_fsw = path_type("fsw")

Then this type is supposed to be used in type annotations of functions or classes. The number of possible path-type combinations is large, so it does not make sense to pre-create them all. And there are also other functions that create types, e.g. strings that must follow a regex pattern, or numbers with certain restrictions.

The cast idea seems interesting. However, asking developers to create a dummy class and then do a cast that most people will not understand is not really developer friendly. My idea is that developers should only need to call the function that creates the type.

Fair enough! Then I think your best bet is to find some really motivating examples to convince people that the two lines should be written as one line.

1 Like

Everything user-defined in Python inherits from object, which is an existing class.

The cast idea doesn't work. People would need to use one name in type annotations (MyStaticType) but a different name for creating instances (MyType). That is simply not the same as creating a type.

Yes, you need to have a pair of names since the type checker needs a static type whereas you want to dynamically create a runtime type.

I think your best bet to get around the huge number of types you say you have is to programmatically generate the Python code.
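A minimal sketch of that code-generation approach (the list of combinations is made up):

# Generate a module with one NewType per needed path-type combination.
combinations = ["fsw", "fr", "dw"]  # hypothetical set of path modes

lines = ["from typing import NewType", ""]
for combo in combinations:
    lines.append(f"Path_{combo} = NewType('Path_{combo}', str)")

with open("generated_path_types.py", "w") as f:
    f.write("\n".join(lines) + "\n")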

2 Likes

The only way for a type checker to understand a function like that (one which returns a type) is if the type system itself is a dependent type system. Python's type system is not that powerful, and IMHO I suspect it never will be, for various practical reasons.

If you want to play with dependent types, there are a few languages that support them. I might suggest taking a look at Idris. Here’s an insertion sort written in Idris.

2 Likes

Thanks for the info, this helps. Though, note that I am not proposing that the type system interpret the arguments given to the function that creates the type. Let me have a third try at explaining, this time using an example from the typing docs.

Currently there is typing.NewType which can be used like:

from typing import NewType

UserId = NewType('UserId', int)
some_id = UserId(524313)

If a function that expects a UserId argument is given an int, then the type checker will show an error.
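Concretely (lookup_user is a made-up function):

def lookup_user(user_id: UserId) -> str:
    ...

lookup_user(UserId(524313))  # okay
lookup_user(524313)          # type error: int is not UserId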

Suppose that someone needs many user-id types, and also wants the instances to be a bit more useful, e.g. with a method to get the user name and a restriction on the range of int values they can hold. This could be:

class UserIdBase(int):
    def get_user_name(self) -> str:
        ...

def create_range_user_id_type(min: int, max: int):
    class RangeUserId(UserIdBase):
        _min = min
        _max = max

        def __new__(cls, v):
            v = int(v)
            if v < cls._min or v > cls._max:
                raise TypeError(f"Outside of range: {v}")
            return super().__new__(cls, v)

    return RangeUserId

HumanUserId = create_range_user_id_type(100, 199)
PetUserId = create_range_user_id_type(200, 299)
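At runtime this works as intended; the problem is purely that static checkers cannot see it:

uid = HumanUserId(150)  # fine, within range
uid.get_user_name()     # inherited from UserIdBase
HumanUserId(500)        # raises TypeError: outside of range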

How can a return type annotation be added to create_range_user_id_type so that static type checkers can identify problems? What I was proposing is that, if it were annotated as -> NewType[int], then the following two cases would be equivalent from the perspective of static type checkers:

PetUserId = NewType('PetUserId', int)
PetUserId = create_range_user_id_type(200, 299)

It would also be possible to annotate as -> NewType[UserIdBase], in which case static type checkers would be aware of methods such as get_user_name.
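Under that hypothetical annotation, a checker would know roughly this much:

pet_id = PetUserId(250)
pet_id.get_user_name()  # known: a method of the UserIdBase supertype
pet_id + 1              # known: behaves like an int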

If the behavior is just like the already existing NewType, then something like PetUserId(500) would not be an error identified by static type checkers; it would only fail at runtime. But the fact that a dependent type system is complex and will not be implemented is no reason to deprive the Python type system of a less ambitious but still useful support for functions that create types.

I think I get it. The Protocol example posted earlier "works" in the sense that a type checker will know that the result will be Type[MyTypeInit]. However, the result (see the sketch after this list):

  • will not be considered distinct from the result of a subsequent call to the factory function.
  • will not be recognized as a valid type that can be used in subsequent annotations.
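A sketch of those two limitations using today's typing constructs (reusing the protocol from the earlier example):

from typing import Protocol

class MyTypeInit(Protocol):
    def __init__(self, value: str) -> None: ...

def create_type() -> type[MyTypeInit]: ...

TypeA = create_type()
TypeB = create_type()  # statically indistinguishable from TypeA

def f(arg: TypeA):  # flagged: TypeA is a variable, not a valid type
    ...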

I’m almost sold. But at the same time

class PetUserId(RangeUserId):  # assumes a RangeUserId base whose __init__ takes the bounds
    def __init__(self) -> None:
        super().__init__(200, 299)

has the desired effect. I concede that it is more code for the end user and less "clean". But at the same time, it works today and is only three lines. It also allows the end user to further customize the type if they want to.