Removing type checker internals from typeshed

typeshed still contains a few items in its builtins.pyi and typing.pyi stubs that don’t exist at runtime but are needed by type checkers, for example builtins.function and typing.AwaitableGenerator. We would like to remove them in the foreseeable future. The current plan is to move these definitions to a new private module, _typeshed._tc, so that type checkers that still rely on them can switch over. Of course, removing them – or at least some of them – fully from typeshed would be even better, for example by using types.FunctionType instead of builtins.function.

If you are a type checker author, please contribute to the discussion in python/typeshed issue #7580, “Remove type checker-specific symbols from builtins.pyi and typing.pyi”, and to the first PR, python/typeshed pull request #13816, “Copy typechecker-internal symbols to `_typeshed._tc`” by srittau.


We have now added the new _typeshed._type_checker_internals module to typeshed. If you are a type checker author, please update the symbols you use as follows, since we plan on removing the old symbols in the near future:

  • builtins.function → types.FunctionType
  • typing._promote → _typeshed._type_checker_internals.promote
  • typing.AwaitableGenerator → _typeshed._type_checker_internals.AwaitableGenerator
  • typing.NamedTuple → _typeshed._type_checker_internals.NamedTupleFallback
  • typing._TypedDict → _typeshed._type_checker_internals.TypedDictFallback
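
For a type checker that looks these symbols up by fully qualified name, the renames in the list above could be captured in a small lookup table. This is only a sketch: the RENAMES table and the resolve_internal_symbol helper are hypothetical names made up for illustration and are not part of typeshed or of any type checker.

```python
# Old fully qualified name -> new location, taken from the list above.
RENAMES = {
    "builtins.function": "types.FunctionType",
    "typing._promote": "_typeshed._type_checker_internals.promote",
    "typing.AwaitableGenerator": "_typeshed._type_checker_internals.AwaitableGenerator",
    "typing.NamedTuple": "_typeshed._type_checker_internals.NamedTupleFallback",
    "typing._TypedDict": "_typeshed._type_checker_internals.TypedDictFallback",
}


def resolve_internal_symbol(qualified_name: str) -> str:
    """Hypothetical helper: map an old internal symbol to its new location.

    Names that were not renamed are returned unchanged.
    """
    return RENAMES.get(qualified_name, qualified_name)
```

A table like this keeps the old names in one place, which makes it easier to drop them once the old symbols are removed from typeshed.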

We added this new module as a temporary convenience, since we assumed that just moving the symbols would be easier for type checker developers in the short term. Long term, we’d appreciate it if type checkers would move away from using these “fake” types in the _typeshed namespace. That said, we plan to keep supporting this module until type checkers have found an alternative, so no immediate action - apart from the renaming above - is needed.


I started to make the changes in pyright, but I ran into some issues. I have a couple of questions about the proposal.

  1. Historically, builtins.AwaitableGenerator has been publicly exported from builtins.pyi as a type that is decorated with @type_check_only. That means it’s not legal to use in runtime code, but it is legal to use in type annotations if they are not executed (if they are quoted or in type stubs). That makes AwaitableGenerator different from the other fake, private symbols like _promote and _TypedDict. Perhaps it would be best to leave builtins.AwaitableGenerator unmodified. It’s marked @type_check_only, so I don’t see what harm it’s doing in its current location. If it is going to be moved to a different location, perhaps it should be aliased in the existing location for compatibility?

  2. Similarly, builtins.function has historically been publicly exported from builtins.pyi and is decorated with @type_check_only. If it is equivalent to types.FunctionType (currently the two definitions differ slightly, BTW), would it make sense to have a common definition but alias it to retain backward compatibility?

  3. I don’t think it makes sense to move typing.NamedTuple or replace the current class definition with a function. It needs to remain a class because it can be used in a base class list when using the class syntax to define a named tuple. What is the motivation for modifying typing.NamedTuple? Unless there’s a really compelling reason to change it, I’m thinking that it would be best to leave it alone.

  4. @srittau, you mentioned that “long term, we’d appreciate it if type checkers would move away from using these “fake” types in the _typeshed namespace”. What alternative approach do you recommend? I don’t think it would benefit anyone (type checker maintainers or users) if knowledge of _TypedDict (or TypedDictFallback) were hard-coded in individual type checkers. We need a definition of this somewhere in typeshed’s stdlib stubs that type checkers can use.


Historically, Any was not marked as a class in typeshed, but pyright, mypy and other type checkers implemented special casing to enable it to be used as a class base anyway. TypedDict is similarly not a class in typeshed, but pyright, mypy and other type checkers allow classes to inherit from it. I think type checkers should implement similar special casing for NamedTuple.

Having NamedTuple as a class makes little sense. It is a function at runtime, and having it as a class in the stubs leads type checkers to make many incorrect inferences. For example, pyright believes that NamedTuple has a __mro__ attribute (it does not at runtime), and believes that NamedTuple does not have a __kwdefaults__ attribute (it does at runtime). It also believes that NamedTuple is a valid type expression, even though it is impossible to construct any object at runtime that is an “instance of NamedTuple” (since NamedTuple is a function), and it believes that type[NamedTuple] is a valid type expression, even though it is impossible to construct a class at runtime that has NamedTuple in its MRO (because NamedTuple is a function – classes that “inherit from NamedTuple” are in fact direct subclasses of tuple at runtime).
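
The runtime claims in the paragraph above can be checked directly. This is a minimal sketch that assumes Python 3.9 or later, where typing.NamedTuple is implemented as a function; Point is a hypothetical example class.

```python
import types
from typing import NamedTuple

# At runtime, typing.NamedTuple is a plain function object (assumes Python 3.9+).
assert isinstance(NamedTuple, types.FunctionType)

# Functions have no __mro__, but every function has a __kwdefaults__ attribute.
assert not hasattr(NamedTuple, "__mro__")
assert hasattr(NamedTuple, "__kwdefaults__")


class Point(NamedTuple):  # hypothetical example class
    x: int
    y: int


# The resulting class is a direct subclass of tuple;
# NamedTuple itself never appears in the MRO.
assert Point.__bases__ == (tuple,)
assert Point.__mro__ == (Point, tuple, object)
```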

I strongly agree that we should keep TypedDictFallback and NamedTupleFallback in the _typeshed namespace indefinitely. I stated this in python/typeshed issue #7580, “Remove type checker-specific symbols from builtins.pyi and typing.pyi”, as well, and @rchen152 agrees too.


Any and TypedDict are defined as special forms in the stubs, not as functions. Since pyright models all special forms as classes, these can be used as base classes without any special treatment. A function, on the other hand, would require significant special-casing to be allowed in a base class list. Perhaps NamedTuple could be modeled as a special form? This would be more consistent with the other examples you provided, and it would require less hoop-jumping, at least in pyright. The fallback type then could have a __call__ method that defines the callable interface of NamedTuple.

Pyright’s current implementation of named tuples also has some pretty deep reliance on NamedTuple appearing in the mro for a named tuple class. I realize that deviates from the runtime behavior. Changing it is possible, but it’s quite a bit of work — way more than just changing the location of an import.

I wasn’t aware that NamedTuple is a function at runtime. I would naively think that it would not be possible to use a function as a base class in a class definition. Clearly there’s some black magic at work here. I’m curious how that works. Does the NamedTuple object implement the required attributes and methods of a type object such that the class construction machinery is fooled into thinking that it’s actually a type object?

It’s by way of setting the __mro_entries__ attribute on the function object. When an object with that attribute appears in a base class list, Python replaces the object with the return value of that method.

This was added to support inheriting from e.g. list[int] (in PEP 560).
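
As a rough illustration of the mechanism, the sketch below attaches __mro_entries__ to an ordinary function so that it can appear in a base class list. The names record, _record_mro_entries and Settings are hypothetical and chosen only for this example; this is not how typing.NamedTuple is implemented in detail.

```python
def record(name):
    """Hypothetical factory function; callable like any other function."""
    return type(name, (dict,), {})


def _record_mro_entries(bases):
    # Called by the class creation machinery with the original bases tuple.
    # Whatever tuple is returned replaces `record` in the base class list.
    return (dict,)


# Setting the attribute on the function object is enough to opt in.
record.__mro_entries__ = _record_mro_entries


class Settings(record):  # `record` is a function, yet this is accepted
    pass


# The function was swapped out for dict before the class was built.
assert Settings.__bases__ == (dict,)
assert record not in Settings.__mro__
# The original, unresolved bases are kept in __orig_bases__ (PEP 560).
assert Settings.__orig_bases__ == (record,)
```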


I would prefer to have NamedTuple be a function in the stub, the same as it is at runtime, but I’d be okay with compromising on it being a _SpecialForm instance. I’d be able to fairly easily implement the necessary special casing in ty to treat this particular _SpecialForm instance as a function. Note that TypedDict is also a function at runtime, FWIW.

“I’d be okay with compromising on it being a _SpecialForm instance”

I think it would be better to model it as a _SpecialForm in the stubs. While it is implemented as a function, it’s not a normal function. It implements some additional magic. Also, since the implementation of NamedTuple is similar to that of TypedDict, it seems we should model the two consistently rather than using _SpecialForm for one and a function for the other.

“I strongly agree that we should keep TypedDictFallback and NamedTupleFallback in the _typeshed namespace indefinitely.”

OK, I’m glad to hear that. If that’s the case, is _type_checker_internals the sub-namespace that we want to adopt permanently? Or should these definitions go into the top-level _typeshed namespace? What’s the benefit of creating a sub-namespace here?


I think the idea here is that, unlike most types in the _typeshed namespace, the intended consumers of these types are type checkers themselves rather than typeshed developers or users of type checkers.


Some thoughts:

  1. AwaitableGenerator

I think it’s still cleaner if this class doesn’t exist in typeshed; @type_check_only functions are confusing for users and ideally limited to private implementation details. But I agree that the existence of the @type_check_only decorator makes it less urgent to move.

(Minor correction: it’s in typing, not builtins.)

  2. function

I think having a name in builtins that doesn’t actually exist is confusing for users, so I’d strongly prefer to get this out of the builtins namespace.

“currently the two definitions differ slightly, BTW”

The current differences are:

  • __kwdefaults__ has | None in its type on FunctionType but not function. This seems like an oversight we should fix.
  • __new__ is missing on function but present on FunctionType. Again, this should probably be fixed.
  • __call__ is also missing on function but present on FunctionType. Of course, functions really are callable, but it’s typed as (*args: Any, **kwargs: Any) -> Any, and I think if we add that to function mypy will start thinking any function can be called with any arguments.
  • __get__ is typed differently, with some comments about how mypy special-cases the descriptor.

Ideally we should resolve these differences and just use types.FunctionType. (Edit: removed two in https://github.com/python/typeshed/pull/14094 .) One practical issue is that stubs just use def for all functions, but builtin functions are not in fact instances of types.FunctionType, and don’t have some of the same attributes.

This leads pyright to accept the following program, which fails at runtime (because len is a builtin function and doesn’t have a __defaults__ attribute):

def f(x: function):
    print(x.__defaults__)

f(len)

Ideally this should be fixed so we represent the runtime objects more precisely, but it doesn’t feel high priority.
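
The runtime difference between Python-level functions and builtin functions can be seen with a few checks. This is a small sketch; user_defined and no_keyword_only are hypothetical example functions.

```python
import types


def user_defined(a, b=1, *, c=2):  # hypothetical example function
    return a + b + c


def no_keyword_only(a):  # hypothetical example function
    return a


# Functions written in Python are instances of types.FunctionType.
assert isinstance(user_defined, types.FunctionType)
assert user_defined.__defaults__ == (1,)
assert user_defined.__kwdefaults__ == {"c": 2}

# Without keyword-only defaults, __kwdefaults__ is None,
# which is why its type needs `| None`.
assert no_keyword_only.__kwdefaults__ is None

# Builtin functions such as len are a different type and lack these attributes.
assert not isinstance(len, types.FunctionType)
assert isinstance(len, types.BuiltinFunctionType)
assert not hasattr(len, "__defaults__")
assert not hasattr(len, "__kwdefaults__")
```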

  3. NamedTuple

I think ideally we’d have NamedTuple as a function in the stub to represent the runtime more precisely, but I’m fine with the other solutions suggested in this thread.

  4. Long-term plan to stop relying on _typeshed

One motivation for doing this is that the current type checker internal classes appear to be tuned for mypy in a way that doesn’t necessarily translate to other type checkers. For example, typing._TypedDict has several comments talking about how things have to be a certain way for a mypy plugin to work correctly.

Typeshed should ideally be type checker-agnostic, not tuned to the oddities of particular type checkers. If we kept typing._TypedDict in typeshed, how would we know that the implementation is “correct” if different type checkers might interpret it in different ways?


I suppose for your example you mean

def f(x: function):
    print(f.__kwdefaults__)

f(len)

Thanks for working on this! About the individual points:

  1. As Jelle has pointed out, @type_check_only is not terribly user friendly, which is why we usually prefix those items with an underscore, but that is not the case here. I’m also not aware of any tools supporting the decorator at the moment. If the removal of some of those items proves problematic, we can postpone it, but removing them is the ultimate goal.
  2. The same is true here. We can keep builtins.function for a limited time, but tools should stop relying on its existence.
  3. A lot has been said about NamedTuple and its strangeness, and I agree with Jelle that ideally we’d type it as a function, but I’d also be okay with it being typed as _SpecialForm. __mro_entries__ was new to me as well. Ideally type checkers would support that, but considering the esoteric nature of it, I understand why they wouldn’t. Another alternative could be making NamedTuple a class in CPython.
  4. Ideally, we’d add these symbols that are used by all type checkers to the CPython standard library. Until then, we can keep them in the _typeshed namespace indefinitely. Personally, though, I think we should minimize stub-only symbols (and the use of the _typeshed pseudo-module) if possible and practical.