Brainstorming: pick types

Howdy.

An important use case that I feel has been very underserved by typing is handling only a subset of a class’s fields.

For example, imagine you have a SQL table modeled as a class, with columns mapping to fields (a very common modeling approach). Your class has 30 fields, but you only need 3. Fetching all 30 is more work for the database, the networking stack, and your app, for no reason.

Another example is hitting a third-party HTTP endpoint. You’ve modeled the data coming back as a class; the class has a ton of fields, but you only need a few. Even setting aside whether you can get the API to send only a subset, you’ll be spending resources constantly decoding data you don’t want.

Enter a TypeScript concept called pick types. The idea is to create a new class from an existing class by picking a subset of the existing fields.

A hypothetical example:

from dataclasses import dataclass

@dataclass
class Model:
    an_int: int
    a_float: float

# We only want the int though

from typing import Pick  # hypothetical: typing.Pick does not exist today
loaded = fetch(Pick[Model, "an_int"])  # fetch is a stand-in loader; we pick only the "an_int" field

# loaded is an instance of Pick[Model, "an_int"]
reveal_type(loaded.an_int)   # int
reveal_type(loaded.a_float)  # error: no attribute "a_float"

The main benefit is that if you refactor the an_int field to have a different type, the picks will automatically pick that up (pun intended). So it’s DRY, succinct and type-safe.

Speaking on behalf of attrs, I think this is easily doable at runtime. The main issue would be type checking.
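To make the runtime half concrete, here is a minimal sketch using only the standard library’s dataclasses module. `make_pick` is a hypothetical helper, not an existing attrs or dataclasses API. It shows the part that is easy at runtime and, by contrast, what a type checker would still be missing:

```python
from dataclasses import dataclass, fields, make_dataclass


@dataclass
class Model:
    an_int: int
    a_float: float


def make_pick(cls: type, *names: str) -> type:
    """Hypothetical helper: build a new dataclass with only the `names` fields of `cls`.

    Runtime-only: a type checker cannot see the generated fields, which is
    the gap a typing-level Pick would have to fill. Defaults and field
    metadata are deliberately not copied in this sketch.
    """
    available = {f.name: f for f in fields(cls)}
    unknown = [n for n in names if n not in available]
    if unknown:
        # Fail fast on typos, the same protection a type checker would give.
        raise AttributeError(f"{cls.__name__} has no fields {unknown}")
    return make_dataclass(
        f"Pick{cls.__name__}",
        [(n, available[n].type) for n in names],
    )


PickedModel = make_pick(Model, "an_int")
loaded = PickedModel(an_int=1)
print(loaded.an_int)               # 1
print(hasattr(loaded, "a_float"))  # False
```

The generated class is a plain dataclass with no relation to `Model`, which is exactly where the subtyping questions start.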

I think we would need a degree of structural typing for this to be usable. Model would need to be treated as a subtype of all of its Picks, and I guess Pick[Model, "an_int"] would need to be treated as a supertype of Pick[Model, "an_int", "a_float"] (and of any other pick whose fields are a superset of this pick’s fields).
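Today, the closest approximation is to write the picks out by hand as Protocols. A sketch, where `PickInt` and `PickIntFloat` are hypothetical hand-written stand-ins for `Pick[Model, "an_int"]` and `Pick[Model, "an_int", "a_float"]`, showing both subtyping relationships described above:

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass
class Model:
    an_int: int
    a_float: float


# Hand-written stand-ins for the hypothetical Pick[Model, ...] types.
@runtime_checkable
class PickInt(Protocol):       # ~ Pick[Model, "an_int"]
    an_int: int


@runtime_checkable
class PickIntFloat(Protocol):  # ~ Pick[Model, "an_int", "a_float"]
    an_int: int
    a_float: float


def read_int(x: PickInt) -> int:
    return x.an_int


def read_both(x: PickIntFloat) -> float:
    # Statically fine: PickIntFloat has every member PickInt requires,
    # so the wider pick is a subtype of the narrower one.
    return read_int(x) + x.a_float


m = Model(an_int=1, a_float=0.5)
print(isinstance(m, PickInt), isinstance(m, PickIntFloat))  # True True
print(read_both(m))  # 1.5
```

`@runtime_checkable` only makes `isinstance` check that the attributes exist, not their types; the subtyping itself is enforced by the type checker.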

Is this a good idea for Python?


I’m not sure I like this applied to dataclasses, as it implies a lot of under-the-hood behavior that very quickly raises harder questions. How should such a new type be constructed? Is it the same dataclass, minus the fields that weren’t picked? Why are we specifying the constructed type this way rather than where it is constructed? (This one matters in particular if you have a strong answer for doing this dynamically at runtime.) Providing the pared-down type as a different type eases most of my hesitation, but it leaves open questions about subtyping and the derived type that I think need very, very good answers too. I don’t think it should be a subtype, but then what should its type hierarchy be detected as? What about successive uses of pick: are multiple new types being generated? (This is important in contrast to TS, which relies only on structural subtyping.)

Does it work on things similar to a dataclass? You mentioned attrs being able to support it. How should it behave at runtime, and where does the type actually come from at runtime?

What about dunders, or even just user-defined methods: do those need to be explicitly picked too to be part of the derived type? Is the runtime behavior allowed to differ from this, with only the picked parts known to the type checker?

I wouldn’t mind it in TypedDict as a means to create a new TypedDict with fewer fields, and TypedDicts have a closer behavioral equivalence to the objects TS’s pick types are built around, but I wonder if the TypedDict case would be better resolved in other ways. (I don’t have a strong opinion against it for TypedDicts, but I think it’s worth exploring more if that direction is desired.)


One of the suggested cases is with database use. Speaking only for myself here, I generally select what I need and just handle it, and I’d rather have a type per distinct query than a complex abstraction over something that doesn’t change frequently. This means I may not be the target audience for this and someone who is might have a better supporting argument for it.


This is a typing feature I’ve been wondering about for some time in the context of Python, since I work more with TypeScript nowadays.

Although we don’t have the object literal construct that JS/TS has, Pick, and its opposite, Omit, could be useful for reusing definitions from Protocols, ABCs, and TypedDicts that have many properties/items.
At the same time, I feel it would be a somewhat limited feature, since Python type hints do not have a “merging” type operator like TS intersections (e.g. type C = A & B).
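For TypedDicts, the runtime half of Pick and Omit is already expressible with `get_type_hints` and the functional `TypedDict` syntax. A sketch, with `pick_td` and `omit_td` as hypothetical helpers; static type checkers generally can’t follow a dynamically built TypedDict like this, which is the part a typing-level Pick/Omit would add:

```python
from typing import TypedDict, get_type_hints


class Movie(TypedDict):
    name: str
    year: int
    director: str


# Hypothetical runtime helpers, not a typing proposal: type checkers
# generally cannot analyze a TypedDict built from computed arguments.
def pick_td(td: type, *keys: str) -> type:
    hints = get_type_hints(td)
    return TypedDict(f"Pick{td.__name__}", {k: hints[k] for k in keys})


def omit_td(td: type, *keys: str) -> type:
    hints = get_type_hints(td)
    return TypedDict(f"Omit{td.__name__}", {k: v for k, v in hints.items() if k not in keys})


MovieName = pick_td(Movie, "name")
MovieNoDirector = omit_td(Movie, "director")
print(get_type_hints(MovieName))              # {'name': <class 'str'>}
print(list(get_type_hints(MovieNoDirector)))  # ['name', 'year']
```

This sketch also drops totality (`total=False`, `NotRequired`): every key in the result is required.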

Regarding your example, it seems to me that the result of Pick is being used as a value instead of a type. Was that really the intention?


Plus, just to add to the fun, Omit could cover some of the missing and difficult cases for ParamSpec.


This is exactly what I was thinking; using it with TypedDicts would be pretty neat. Thinking back on the times I’ve used TypedDict, I think the Omit type would be more useful, but both could be nice additions to TypedDict.

I imagine you could define a Protocol with the subset of the fields that you wanted to pass around. A sketch:

from dataclasses import dataclass
from typing import Protocol

@dataclass
class Movie:
    name: str
    year: int

class TimelessMovie(Protocol):
    name: str
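As a usage sketch (`title_card` is a made-up function), a function annotated with the Protocol accepts a `Movie` without `Movie` inheriting from anything:

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Movie:
    name: str
    year: int


class TimelessMovie(Protocol):
    name: str


def title_card(movie: TimelessMovie) -> str:
    # Only `name` is part of the declared shape; reading movie.year here
    # would be flagged by a type checker.
    return f"Now showing: {movie.name}"


# Movie never inherits from TimelessMovie; it matches structurally.
print(title_card(Movie(name="Metropolis", year=1927)))  # Now showing: Metropolis
```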

TypedDicts are structurally typed by design, so you don’t need to do anything special:

from typing import TypedDict

class Movie(TypedDict):
    name: str
    year: int

class TimelessMovie(TypedDict):
    name: str
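The same usage with TypedDicts, again with a made-up `title_card`:

```python
from typing import TypedDict


class Movie(TypedDict):
    name: str
    year: int


class TimelessMovie(TypedDict):
    name: str


def title_card(movie: TimelessMovie) -> str:
    return f"Now showing: {movie['name']}"


m: Movie = {"name": "Metropolis", "year": 1927}
# Accepted by type checkers: Movie has every key TimelessMovie declares,
# with the same value type, and extra keys are allowed.
print(title_card(m))  # Now showing: Metropolis
```

Passing the dict literal directly, `title_card({"name": "Metropolis", "year": 1927})`, would instead be flagged, because type checkers reject extra keys in a literal whose expected type is `TimelessMovie`.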

I was re-reading the thread and noticed this point, which gave me mixed feelings.

Since we are using TypeScript as an example: Pick and Omit are basically shortcuts for creating type aliases, defining/describing a “shape” from a subset of the properties of a type T. The resulting type alias has no relation to T, and it can’t be used for anything that a concrete class can be used for (as a value, as a base class, etc.); trying will give you an error.

I see Pick as something along the lines of TypeVar, ParamSpec and the like.

Creating a concrete class from a subset of properties from another class seems like a very broad scope for a utility type to me. I end up having the same questions that @mikeshardmind raised in the comments above.
For the use case of needing a concrete class, I believe the best option would be to use Protocols or dataclasses to explicitly describe the minimum expected shapes, or to delegate the transformation to a specialized library such as Pydantic or cattrs.

Good questions, some of which I don’t have answers for. I was also worried about things like a dataclass’s __post_init__ blowing up in picked classes due to missing fields. All these difficult questions mean we should probably reframe the discussion in different terms; I’ll post another comment below.

You’re right that the concept is more cleanly applied to TypedDicts, since those already lean on structural subtyping and are closer to TypeScript object types anyway. Unfortunately, while I think TypedDicts have niche uses, due to some usability problems elsewhere I wouldn’t build new Python features on top of them; I think we can do better.

Could you elaborate on this?

If Pick existed, I think we would make it introspectable, so a library could examine it at runtime and adapt. In that scenario, the pick type would be used as a value.
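Since Pick is hypothetical, here is a sketch of the kind of introspection a library could do, using a hand-written Protocol in its place; `select_for` and the `model` table name are made up for illustration:

```python
from typing import Protocol, get_type_hints


class ModelSubset(Protocol):
    an_int: int


def select_for(shape: type, table: str) -> str:
    """Hypothetical library hook: derive a column list from a shape's annotations.

    `table` and the column names come from code, not user input; never build
    SQL like this from untrusted strings.
    """
    columns = ", ".join(get_type_hints(shape))
    return f"SELECT {columns} FROM {table}"


print(select_for(ModelSubset, "model"))  # SELECT an_int FROM model
```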

Yeah, reading the replies to the OP is actually making me reframe my thought process here.

In the OP I mentioned attrs being able to create a Pick type at runtime; let’s set aside that line of thinking for now and think of Pick as purely a typing construct. Once that is in place, a hypothetical database library could use whatever it wanted to actually implement it.

In this world, Pick sounds like a shorthand for creating protocols. I think I really like this framing.

Continuing the example from the OP:

from dataclasses import dataclass
from typing import Protocol

@dataclass
class Model:
    an_int: int
    a_float: float


ModelSubset = Pick[Model, "an_int"]  # hypothetical typing.Pick

# this is shorthand for:
class ModelSubset(Protocol):
    an_int: int

This is neat. It solves the issue of an instance of Model being a subtype of ModelSubset (it is, structurally). It might seem like a small win, but it really isn’t:

  • it’s less verbose, so it can be used inline, especially with more fields
  • type checkers can check that an_int is an actual field on Model, protecting against typos and refactoring errors
  • type checkers can ensure ModelSubset.an_int changes type automatically when Model.an_int changes type, again ensuring safety
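Until something like Pick exists, a hand-written Protocol can silently drift away from the model it mirrors. A hypothetical test-time check, `check_pick_of` (not an existing API), approximates the second and third points at runtime:

```python
from dataclasses import dataclass
from typing import Protocol, get_type_hints


@dataclass
class Model:
    an_int: int
    a_float: float


class ModelSubset(Protocol):
    an_int: int


def check_pick_of(subset: type, model: type) -> None:
    """Hypothetical test-time check: every member of `subset` must be a field
    of `model` with the same annotated type (exact match only)."""
    model_hints = get_type_hints(model)
    for name, tp in get_type_hints(subset).items():
        if name not in model_hints:
            raise TypeError(f"{name!r} is not a field of {model.__name__}")
        if model_hints[name] != tp:
            raise TypeError(f"{name!r} is {tp!r} here but {model_hints[name]!r} on {model.__name__}")


check_pick_of(ModelSubset, Model)  # passes: an_int is an int field on Model
```

It compares annotations with `==`, so it only catches exact mismatches; a type checker would apply real subtyping rules.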

I’ll generally have all my database queries for a project in their own file. If I need to wrap them in something struct-like, I will. This gives a level of separation, so that if anything changes with the underlying database, it only needs to be handled there, without depending on a full ORM and all that entails.

I’ll happily have something struct-like (I quite like msgspec’s structs for this, as msgspec is a small dependency and plays well with serialization needs for the network) on a per-query basis, even if queries currently return the same things. The wrapping struct for a query always keeps the same identifier; I’m not trying to have fewer of them only to then hop around the code to see what else needs changing.

It’s simple, it works, and it minimizes the effect of changes, even in places where changes are expected to be infrequent (arguably especially there, since code that rarely changes tends to have small implementation details end up being relied upon).
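A minimal sketch of the per-query approach, using the standard library’s `sqlite3` and a frozen dataclass in place of msgspec structs so it runs without dependencies; the `users` table and `UserNameRow` are hypothetical:

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class UserNameRow:
    """Result shape for one specific query: only the columns it selects."""
    id: int
    name: str


def fetch_user_names(conn: sqlite3.Connection) -> list[UserNameRow]:
    # Select just the needed columns instead of every column of `users`.
    cursor = conn.execute("SELECT id, name FROM users ORDER BY id")
    return [UserNameRow(*row) for row in cursor]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT, bio TEXT)")
conn.executemany(
    "INSERT INTO users (name, email, bio) VALUES (?, ?, ?)",
    [("ada", "ada@example.com", "bio"), ("grace", "grace@example.com", "bio")],
)
print(fetch_user_names(conn))
# [UserNameRow(id=1, name='ada'), UserNameRow(id=2, name='grace')]
```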

Back to the topic at hand: something that could work here is adding Pick (I don’t think this would work well for Omit, due to dunders and so on) such that it only creates a Protocol built from what was picked, not a new type that appears to be a runtime (nominal) type. Then any runtime type that matches the protocol is valid. Paired with the work on intersections, a library like attrs could do something like:

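from typing import TypeVar

T = TypeVar("T")

# Hypothetical: typing.Pick and the `&` intersection operator do not exist yet
def pick(cls: type[T], *fields: str) -> type[AttrsBaseBehaviorProtocol & Pick[T, *fields]]:
    # implementation here may be too dynamic and require parts to be type-ignored
    return SomeNewType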

(assuming typing.Pick would be [Type, *fieldnames_as_strs])

(Edit: to be clear, I agree with the post immediately above; this shows how it could be used by a library, and how it could compose with other potential future typing features that are currently being discussed. If Pick were added but intersections weren’t, simply removing the base protocol and the intersection would show how a library could type at least the fields in a semi-dynamic manner.)

This approach allows it to work properly without as many messy questions, and without being specific to one library or only to the standard library’s dataclasses. Paired with intersections (which are being discussed but don’t have a PEP yet), it would allow keeping some base of behavior.


That’s a lot.

Personally, I don’t see how this is different from the static typing behavior needed for pandas DataFrames, where you want to statically check that the DataFrame being passed has a given set of columns.

Actually, I’ll backtrack on this, since an ORM table has its columns/fields predefined whereas a DataFrame doesn’t.

I just want to specifically address this note about structural subtyping: Python as a language was (at one time) all about “duck typing”, which is mostly structural subtyping by another name. And yet most of the current typing tools lean heavily on nominal subtyping. Protocol exists but has a number of edge cases that take years of experience to work through, and Protocol is frequently used only to duplicate an existing concrete type in order to get past a nominal typing barrier posed by a function’s arguments.

I strongly support systems that push the pendulum back in the direction of structural subtyping, which I believe to be a much more natural fit for Python, as well as in many respects a more natural and powerful approach to type checking.
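A small illustration of the duck-typing point: a function annotated with a concrete class such as `io.StringIO` would reject other readers, while a Protocol describing only what is used accepts any object with that shape. `SupportsReadStr`, `first_line`, and `Canned` are illustrative names, not library APIs:

```python
import io
from typing import Protocol


class SupportsReadStr(Protocol):
    def read(self, size: int = -1, /) -> str: ...


def first_line(src: SupportsReadStr) -> str:
    # Any object with a compatible read() is accepted; no common base class needed.
    return src.read().splitlines()[0]


class Canned:
    """An unrelated class that happens to have the right shape."""

    def read(self, size: int = -1, /) -> str:
        return "canned\ntext"


print(first_line(io.StringIO("hello\nworld")))  # hello
print(first_line(Canned()))                     # canned
```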
