Type placeholders

Daverball · March 2, 2024, 10:13am

A common thing to encounter in typed Python code is a generic, user-extensible API such as json.dumps/json.loads, where we cannot provide a more specific annotation than Any for the value type because individual users can extend the serializer/deserializer to handle additional types. Users who want more strictness from an API of this kind have no choice but to write a wrapper and force everyone to use it instead of the actual API. The other motivation for using Any is API ergonomics, so that fewer asserts are necessary, but even here some users would rather deal with a more accurate type, even if that makes the API more annoying to use.
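For reference, the wrapper approach described above might look roughly like the following sketch. The JSONValue alias and the typed_dumps/typed_loads names are hypothetical, chosen only for illustration.

```python
import json
from typing import Dict, List, Union

# Hypothetical project-specific alias: the JSON shapes this project expects.
JSONValue = Union[None, bool, int, float, str, List["JSONValue"], Dict[str, "JSONValue"]]


def typed_dumps(value: JSONValue) -> str:
    """Thin wrapper so callers get a stricter parameter type than Any."""
    return json.dumps(value)


def typed_loads(data: str) -> JSONValue:
    """Thin wrapper so callers get a stricter return type than Any."""
    return json.loads(data)
```

The downside is the one stated above: nothing stops a caller from importing json directly and bypassing the wrapper.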

I think it would be nice to provide a common interface for library authors to export a set of placeholder types that can optionally be filled in with a different type by library users. This would also reduce the need for custom type checker plugins.

from typing import Any, TypePlaceholder

JSONSerializable = TypePlaceholder("JSONSerializable", Any)
JSON = TypePlaceholder("JSON", Any)

def dumps(value: JSONSerializable) -> str: ...
def loads(value: str) -> JSON: ...

Or alternatively, for better backwards compatibility [1]:

from typing import Any, TypePlaceholder

JSONSerializable: TypePlaceholder = Any
JSON: TypePlaceholder = Any

def dumps(value: JSONSerializable) -> str: ...
def loads(value: str) -> JSON: ...

I think it would be best for type checkers to provide their own way to fill in a TypePlaceholder through their individual configuration formats, but if that’s a point of contention it could also be specified through code, although a per-project configuration option seems more sane to me (maybe we could also use py.typed for this?).

This may be less compelling for high level libraries using low level libraries, since end-users may yet again override/extend the interface they themselves already extended, but it should definitely prove useful for application authors that have no downstream dependencies.

Sharing Placeholders

Sometimes multiple libraries will talk about essentially the same placeholder type, but they don’t necessarily want to introduce an explicit dependency on one another (or the placeholder they want to refer to is trapped in a stub file and not available at runtime, but they need it to be available at runtime). One possible way to resolve this would be to allow specifying a fully qualified name for the first parameter of a TypePlaceholder.

Auditing internal/external consistency

This would also empower library authors to internally use the types they’re supposed to support by default (even if it’s just object instead of Any) and to make sure there aren’t internal inconsistencies where the implementation fails for certain types in unexpected ways that would otherwise be hidden through use of Any, all without hampering library ergonomics for end-users.

As a library user you could audit the external library against your own type definitions and make sure the library is actually able to handle the types you want it to be able to support.

[1] This way type checkers can just define TypePlaceholder = TypeAlias in their copy of typeshed to support the annotation without implementing the feature.

mikeshardmind · March 2, 2024, 10:56am

I’m not sure I understand the use case here. If you aren’t validating it, Any is the right type. Users don’t need any casts or a new placeholder type, Any already can act as a placeholder that can be replaced by the user immediately:

known_data: SomeTypedDict = json.loads(some_external_object)

If you are validating it, shouldn’t the method require knowing the structure being validated?

def parse(data: bytes, typ: type[T]) -> T:
    ...

You shouldn’t really validate after parsing a structure, but during parsing. That said, TypeGuard allows writing a function that actually checks that the value has the expected type, if you validate as a separate pass.

Daverball · March 2, 2024, 11:22am

The use-case is that you can optionally add validation to something that you otherwise wouldn’t be able to validate, because it’s user-extensible. As you pointed out you can certainly do that by writing type guards or writing a checked wrapper API, which calls the unchecked API underneath, but in both cases you have to manually use the new API and/or type guards, which is easy to forget and a bit of a pain to statically enforce/validate yourself.

So I’d compare this feature to the strictness flags the type checkers provide, where you optionally can use a more strict interpretation of a common concept, this just extends that to user-definable types.

To provide some additional motivation specifically for the JSON case: there are many APIs that use json.loads/json.dumps underneath, such as requests.Response.json(). If all those APIs used the same TypePlaceholder, you could specify one configuration option and turn all those Any returns into a more concrete type, rather than having to find all the places that could return that type and add an annotation.

mikeshardmind · March 2, 2024, 11:37am

I guess I have multiple further questions because, as explained, it seems that either those APIs should provide a more accurate type or, if they can’t because they don’t actually know the type, anything other than Any or a generic would be incorrect. Either of those options allows composing type information in the code that actually handles the types, rather than in configuration, which does not.

I’d rather not be moving in the direction of a supported plugin method and instead find better ways for people to ergonomically express their intent within the type system itself.

Daverball · March 2, 2024, 11:48am

Consider a fairly common API pattern: Some kind of registry where users can register handlers for various types. The generic API has no way of knowing what kind of handlers have been registered by downstream code, so they can’t provide a better annotation than Any for the API function that invokes those handlers.
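A minimal sketch of such a registry, assuming a hypothetical library that exposes register and serialize functions (neither name comes from a real library):

```python
from datetime import date
from typing import Any, Callable, Dict

# Hypothetical library-side registry: maps a type to a handler for that type.
_handlers: Dict[type, Callable[[Any], Any]] = {}


def register(typ: type, handler: Callable[[Any], Any]) -> None:
    _handlers[typ] = handler


def serialize(value: Any) -> Any:
    # The library cannot know which types downstream code registered,
    # so the parameter and return types stay Any.
    handler = _handlers.get(type(value))
    return handler(value) if handler is not None else value


# Downstream application code extends the registry.
register(date, lambda d: d.isoformat())
```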

But the end-user with no additional downstream dependencies has that information. One way to solve this would be to make everyone create their own registry and API instance and make those generic, so you can bind your own set of supported types, but this would change the API quite drastically and it would make it more difficult to share a registry with another library, so it doesn’t really seem like an ideal solution. Or to change the API completely and no longer use a registry approach, but then everyone has to change their code in order to get better type information.
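The "make the registry generic" alternative could be sketched as follows. The Registry class is hypothetical and only meant to show how much the API shape changes: every project has to create and bind its own instance.

```python
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class Registry(Generic[T]):
    """Hypothetical generic registry; T is the union of supported input types."""

    def __init__(self) -> None:
        self._handlers: Dict[type, Callable[[T], str]] = {}

    def register(self, typ: type, handler: Callable[[T], str]) -> None:
        self._handlers[typ] = handler

    def dump(self, value: T) -> str:
        # Look up the handler by the exact runtime type of the value.
        return self._handlers[type(value)](value)


# Each project now creates its own instance and binds T itself.
registry: Registry[int] = Registry()
registry.register(int, lambda v: f"int:{v}")
```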

So having some way to share a parameterizable type that can be set to what you know to be true for the whole project certainly seems useful to me. How exactly this should work (be it through configuration or through code) does not matter as much to me, but if it is through code, we run into scenarios where the type checker first has to scan the whole project to know the actual type of a TypePlaceholder, or where the placeholder types need to be specified in every source file that could make use of them directly or indirectly.

mikeshardmind · March 2, 2024, 12:09pm

If you have an application that defines configuration and that application uses 2 libraries that each use a shared registry, and substituting the type out from under them would not break either of them, this sounds like type variable defaults (defaulting to Any) paired with subscriptable functions would allow this to be typed properly without changing any APIs.

This can be done now in a very very roundabout way with Callable protocols as exported types in a stub, but I wouldn’t recommend that.

Daverball · March 2, 2024, 12:19pm

With any of those approaches you would still have to change your code and write static analysis to make sure you’re using the proper parametrization everywhere. The point of TypePlaceholder is to make this as easy as possible: you can change the type and immediately see what issues it uncovers, no code changes necessary. The only way to do this currently is to ship your own stubs for those libraries and manually change the types.

mikeshardmind · March 2, 2024, 12:54pm

I feel like what I’ve understood from your explanation of what you want to be typed doesn’t match something about your intent here. In what I’ve understood, to go from not specifically or generically (or “replaceable with a placeholder”) typed to being more specifically known, at least 1 party has a required code change.

Even in the best case for this proposal:

your type checker of choice has to support swapping the type
you have to maintain a separate configuration file that has effects on types of code
the library in question has to support this by setting up groups of placeholders that replace to the same thing…

So…

That last part is just type variables, possibly mixed with type aliases, which already exist and can do more to compose with the rest of the type system.
Type checkers already have to support them.
You can use a type alias to parameterize, so that all of the places you parameterize to the same type stay in sync. This is then a 1 line change to the type alias only, rather than treating configuration this way.
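As a rough illustration of the alias idea, assuming the library had already been made generic: Decoder and Cache below are hypothetical library classes, and Payload is the single application-side alias that keeps every parameterization in sync.

```python
from typing import Dict, Generic, TypeVar

T = TypeVar("T")


class Decoder(Generic[T]):  # hypothetical library class
    def decode(self, raw: T) -> T:
        return raw


class Cache(Generic[T]):  # hypothetical library class
    def __init__(self) -> None:
        self.items: Dict[str, T] = {}


# Application side: one alias, used at every parameterization site.
# Changing the project-wide type means editing only this line.
Payload = Dict[str, int]

decoder: Decoder[Payload] = Decoder()
cache: Cache[Payload] = Cache()
cache.items["a"] = decoder.decode({"x": 1})
```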

oscarbenjamin · March 2, 2024, 1:07pm

I know of examples of this. I think that this pattern though is often better handled by @singledispatch or even better would be some kind of multiple dispatch.

Do type checkers understand @singledispatch?

Does @singledispatch handle the cases you are interested in (besides lack of multiple dispatch)?
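For readers unfamiliar with it, a small functools.singledispatch example is sketched below. The encode function and its handlers are invented for illustration.

```python
from datetime import date
from functools import singledispatch


@singledispatch
def encode(value: object) -> str:
    # Fallback for types without a registered handler.
    raise TypeError(f"cannot encode {type(value).__name__}")


@encode.register
def _(value: int) -> str:
    return str(value)


@encode.register
def _(value: date) -> str:
    return value.isoformat()
```

Handlers are still registered dynamically, so a type checker looking only at the signature of encode sees the fallback's object parameter.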

Daverball · March 2, 2024, 1:17pm

Yes, an alternative way to implement this would be to add support to type checkers to substitute any globally accessible symbol in a library with a different type [1]. So in that sense this does not need a new construct.

But to me this is about signaling intent and telling users which parts of your API are fluid and could be user-configured downstream without having to completely redesign the API (which may not even be an option in case of a stubs package) and to encourage consistent use of TypePlaceholder in such APIs, rather than using plain Any.

You are correct that this would require support and some amount of coordination from the typing ecosystem and wouldn’t magically work the day it’s accepted. But you are also still assuming that changing one TypeAlias in an external library is a natural thing to do. You actually have to either change the library’s code after you install it or provide your own stubs, both of which come with their own problems. It also assumes the library authors already did the work of extracting that type into a TypeAlias, even though it’s currently just Any and there’s no incentive to do so, because the type system does not officially support substituting an entire type project-wide.

[1] Without having to ship your own complete stubs for that library that you then have to stubtest to ensure they stay in sync.

Daverball · March 2, 2024, 1:20pm

No, they don’t. Anything that dynamically registers handlers cannot be understood statically unless you first scan the entire project for every place a handler was registered, which is very expensive and can lead to deadlocks if you’re not careful.

No, it doesn’t. This is also not about writing such libraries, it’s about accurately typing code from libraries that have already been written and likely will not make any large changes at this point.

mikeshardmind · March 2, 2024, 1:31pm

Okay, so we’re 100% not on the same page here. I was suggesting the type alias to the application as a 1 line change in the application, rather than doing it via configuration, paired with the library gaining generic support rather than the library gaining replacement support as a separate construct that needs additional means of being supported.

Daverball · March 2, 2024, 1:38pm

I see, then you still have to do all that work of parametrizing those generics first. What if you’re not sure if your code would benefit from this and you don’t want to spend the next several hours refactoring the type annotations in your entire code base just to check whether going more specific than Any would be worthwhile? Not to mention that this process would be error prone and not future proof without adding additional static analysis to your codebase.

This way you also lose the ability to potentially audit how your dependencies would cope with that type being substituted. Maybe while the library is written fairly generically, it actually doesn’t handle some of the possible cases gracefully. This could be uncovered through specifying the type, although you will probably also see a lot of false positives, so I wouldn’t have those errors turned on all the time.

As a library author you could even audit your own type hints this way by checking your library with the type substituted with the union of default types that should be supported.
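A sketch of what such a self-audit might surface, assuming a hypothetical DefaultJSON union of the types a library supports by default:

```python
from typing import Dict, List, Union

# Hypothetical union of the types the library supports by default.
DefaultJSON = Union[None, bool, int, float, str, List["DefaultJSON"], Dict[str, "DefaultJSON"]]


def count_keys(value: DefaultJSON) -> int:
    # With value annotated as Any, calling value.keys() unconditionally would
    # type-check; with the union, a checker requires this isinstance guard.
    if isinstance(value, dict):
        return len(value.keys())
    return 0
```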

mikeshardmind · March 2, 2024, 2:07pm

I feel like you’re approaching something in terms of testability and type exploration here that might have some value, yet is beyond what I think is useful and definable as part of the type system.

If this already requires opt-in from libraries, it might be possible to support such type exploration with just a type alias instead, with no modifications of the type system.

# library/encoder.py
type SomeAlias = Any

Without any changes to the type system, tooling could explore with just this, right? That is, a tool could have its own configuration that allows you to check with different assumptions.

overrides = [
    "library.encoder.SomeAlias=dict[str, int]",
]

Daverball · March 2, 2024, 2:12pm

Yes, that is what I was saying in this reply:

I like TypePlaceholder slightly better because it signals intent and encourages use of this feature in libraries, but I’m not married to it. I would be happy with anything that allows this sort of thing in a more ergonomic way and encourages adoption in the typing ecosystem.

An alternative way would be the TypeAlias route: just make TypePlaceholder a valid annotation that could default to an alias to TypeAlias, so it’s backwards compatible with type checkers that don’t support it. You would lose the ability to specify a dotted name to share placeholders across libraries, but you could still import them.

kknechtel · March 2, 2024, 5:27pm

I’m afraid I don’t understand more than the very basics of the typing system, because I scarcely use it myself (basically just to create dataclasses and to provide a little documentation). But isn’t this (part of) what TypeVar is for?

Daverball · March 2, 2024, 6:07pm

Sort of. TypeVar is for enabling generics more than anything else and they have to be bound to an actual type everywhere they’re used, sometimes manually, sometimes automatically through inference. A TypePlaceholder would be a specific type, it never changes during type analysis, but it is configurable on a per-project basis.

So it’s the difference between a global variable determining some configurable portion of an API and a function parameter that you have to opt-in at every call-site if we were to use non-type system terms.
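To make the contrast concrete, here is a small sketch. The Payload alias is only a stand-in for the proposed placeholder, and the idea of a checker-level override for it is hypothetical.

```python
from typing import Any, List, TypeVar

T = TypeVar("T")


def first(items: List[T]) -> T:
    # T is solved separately at every call site, like a function parameter.
    return items[0]


# Stand-in for a placeholder: one module-level alias with a single meaning
# project-wide, like a global setting. A checker-level override (hypothetical)
# would swap Any for a concrete type without touching call sites.
Payload = Any


def load() -> Payload:
    return {"x": 1}
```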