Introduce Partial for TypedDict

USSX-Hares · February 6, 2024, 12:20pm

The Problem

TypedDicts are awesome when you are working with a data model you do not own (e.g., you receive some JSON and need to add or remove one specific field, preferably keeping the order of items). Sometimes, however, you want to provide only a patch, or, in other words, a partial dict. This would imply that ALL fields are NotRequired, even those marked with Required. TypeScript and some other languages have Partial semantics.

Also, this could be very helpful for dataclasses, but I assume it is harder to implement.

Example

from typing_extensions import TypedDict

class Structure(TypedDict):
    x: int
    y: int

def apply_patch(data: Structure, patch: Partial[Structure]) -> Structure:
    ...

def common_of(*items: Structure) -> Partial[Structure]:
    ...

apply_patch(Structure(x=1, y=2), { 'x': 3 })  # -> { 'x':3, 'y':2 }
common_of(Structure(x=1, y=2), Structure(x=1, y=3)) # -> { 'x': 1 }

Current Solution

Currently, there is no Partial semantics in the language.
There are two workarounds:

Use total=False. This would allow the creation of partial data but will do so in all cases, even if the complete data structure is required.
Use two different classes, one incomplete and one complete (complete derived from partial, or vice versa). This would require a ton of additional classes and sometimes would not be possible.
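
A minimal sketch of the second workaround, assuming Python 3.9+ and a hand-written PartialStructure class (a hypothetical name, since Partial[Structure] does not exist). The bodies of apply_patch and common_of are illustrative guesses at the behavior shown in the example comments above, not a reference implementation.

```python
from typing import TypedDict, cast


class Structure(TypedDict):
    x: int
    y: int


# Hypothetical hand-written twin of Structure: same keys, none required.
class PartialStructure(TypedDict, total=False):
    x: int
    y: int


def apply_patch(data: Structure, patch: PartialStructure) -> Structure:
    """Return a copy of data with the keys present in patch overridden."""
    result = data.copy()
    result.update(patch)
    return result


def common_of(*items: Structure) -> PartialStructure:
    """Return the key/value pairs shared by every item."""
    if not items:
        return {}
    first, *rest = items
    common = {
        key: value
        for key, value in first.items()
        if all(key in other and other[key] == value for other in rest)
    }
    return cast(PartialStructure, common)


patched = apply_patch(Structure(x=1, y=2), {"x": 3})
shared = common_of(Structure(x=1, y=2), Structure(x=1, y=3))
print(patched)  # {'x': 3, 'y': 2}
print(shared)   # {'x': 1}
```

Every key has to be declared twice here, which is the duplication the proposal aims to remove.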

Similar

Topic in Help: Is it possible to add a type annotation for partial/not-total typeddicts?
Question in StackOverflow: type hinting - Python typehint subset(partial) of an TypedDict - Stack Overflow

Updates

Update 1

Code snippet changed: Foo → Structure (artifact of other code snippet)

apalala · February 6, 2024, 2:00pm

The above raises the question: what is the type of Partial[Structure]? May I query x on it? May I query y?

How does the caller know what the returned “partial” actually contains? What does the calling code look like for the given examples?

Note that adding Partial[] would not prevent a client from calling:

apply_patch(Foo(x=1, y=2), { 'foo': 3 })

Also, where is Foo declared in the above?

Daverball · February 6, 2024, 2:35pm

It would be exactly the same behavior as if the TypedDict had originally been constructed with total=False (or rather as if every key had been annotated with NotRequired).

The behavior of a Required vs NotRequired key in TypedDict is already well defined, so it doesn’t make sense to discuss those semantics.

This is mostly a utility to create a new TypedDict from an existing one, where all the keys are now optional.

I don’t have a strong opinion on this, but I think this would be better suited as a subclass parameter like total. The use of Partial seems too narrow to justify yet another type modifier, unless you can provide other examples of other use-cases where it could be used in the future, like e.g. protocols.

pf_moore · February 6, 2024, 3:06pm

Why can’t this be done already, using subclassing, via:

class PartialStructure(Structure, total=False):
    pass

I’m pretty sure the runtime semantics here would^[1] be as you’d expect. Is this simply something where type checkers don’t implement the full runtime semantics?

or could, if the metaclass currently doesn’t allow this ↩︎

USSX-Hares · February 6, 2024, 3:09pm

The meaning of Partial is that all fields are optional. In other words, none, any, or all keys are present. dict() is a valid Partial[Structure], as well as { 'x': 1, 'y': 2 }.

This is important when partially processing a data structure. Here’s a use case I usually encounter.

Let’s imagine that we are loading some dict-like structure, e.g., a configuration file. This structure has entries that inherit from other entries (e.g., defaults).

conf.json

{
    "name": "Super Server",
    "default_host": { "port": 80, "protocol": "http", "bind_address": "127.0.0.1" },
    "hosts":
    [
        { "name": "main-http", "bind_address": "example.com" },
        { "name": "main-https", "bind_address": "example.com", "port": 443, "protocol": "https", "ssl_key_file": "/etc/ssl/example.com.pem" },
        { "name": "local" }
    ]
}

After loading and parsing, we want config to match the following structure:

from typing import List
from typing_extensions import NotRequired, TypedDict

class Host(TypedDict):
    name: str
    port: int
    protocol: str
    bind_address: str
    ssl_key_file: NotRequired[str]

class Config(TypedDict):
    name: str
    hosts: List[Host]

As you can see, neither the .hosts[*] entries nor .default_host match the Host structure; both are actually Partial[Host]. Yes, you can create a special PartialHost class, but if this pattern appears in several places in your code, what you actually want is a more generic solution.

A typo in a code snippet, fixed.

Partial, like TypedDict, does not prohibit extra keys. It only defines which keys may be present; none of them has to be. It’s just a typing hint that suggests the correct usage of the structure.
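
A rough sketch of the configuration use case above, assuming Python 3.9+ and that each hosts entry inherits missing keys from default_host. PartialHost, _HostRequired, and resolve_host are hypothetical names; PartialHost is the hand-written stand-in that Partial[Host] would replace. Host is spelled with a required base class instead of NotRequired only so the sketch runs without typing_extensions.

```python
from typing import List, TypedDict, cast


# Hypothetical hand-written stand-in for Partial[Host].
class PartialHost(TypedDict, total=False):
    name: str
    port: int
    protocol: str
    bind_address: str
    ssl_key_file: str


class _HostRequired(TypedDict):
    name: str
    port: int
    protocol: str
    bind_address: str


class Host(_HostRequired, total=False):
    ssl_key_file: str


def resolve_host(default: PartialHost, entry: PartialHost) -> Host:
    """Overlay entry on top of default and check that the result is complete."""
    merged = {**default, **entry}
    missing = Host.__required_keys__.difference(merged)
    if missing:
        raise ValueError(f"host is missing keys: {sorted(missing)}")
    return cast(Host, merged)


default_host: PartialHost = {"port": 80, "protocol": "http", "bind_address": "127.0.0.1"}
raw_hosts: List[PartialHost] = [
    {"name": "main-http", "bind_address": "example.com"},
    {"name": "main-https", "bind_address": "example.com", "port": 443,
     "protocol": "https", "ssl_key_file": "/etc/ssl/example.com.pem"},
    {"name": "local"},
]
hosts = [resolve_host(default_host, entry) for entry in raw_hosts]
```

The key list appears three times (PartialHost, _HostRequired, Host), and a static checker cannot verify that the copies stay in sync.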

USSX-Hares · February 6, 2024, 3:23pm

If you are designing a very generic method, this won’t work:

def apply_patch_generic[T: Structure](base: T, patch: Partial[T]) -> T:
    ...

And yes, this is a real use case when designing a data-processing framework.
I have struggled a lot with the lack of this feature when reading complex configuration files, XML documents, or OpenAPI specs.

Have a look at Google’s gerrit-repo tool: repo Manifest Format

It has an XML manifest file with three node types that matter:
remote-s define a repository group or owner, a URL prefix before the repo name
project-s define repositories to download
default defines any keys a project is missing.

I want to extract the revision of a project with a given name. To do so, I must first check whether the project defines a revision; if not, check whether a remote is defined for that project (using it, or otherwise the default remote) and whether that remote specifies a revision; and if not, take the revision from default.

And note that even though a revision is required to be present somewhere, it’s not defined where it actually is. Furthermore, default defines (basically) the same keys as project, but, unlike in project, all of its keys are optional.
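
The lookup order described above can be sketched roughly as follows, using the standard-library XML parser. The manifest content, the project and remote names, and the resolve_revision helper are all made up for illustration; only the element names and the revision/remote attributes follow the description in this post.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical manifest, reduced to the three node types that matter here.
MANIFEST = """<manifest>
  <remote name="origin" fetch="https://example.com/" />
  <remote name="mirror" fetch="https://mirror.example.com/" revision="stable" />
  <default remote="origin" revision="main" />
  <project name="app" revision="v1.2" />
  <project name="lib" remote="mirror" />
  <project name="docs" />
</manifest>"""


def resolve_revision(root: ET.Element, project_name: str) -> Optional[str]:
    """Resolve a revision: project first, then its remote, then default."""
    default = root.find("default")
    defaults = default.attrib if default is not None else {}
    for project in root.findall("project"):
        if project.get("name") != project_name:
            continue
        # 1. The project itself may define the revision.
        if "revision" in project.attrib:
            return project.attrib["revision"]
        # 2. Otherwise use the project's remote, or the default remote.
        remote_name = project.get("remote", defaults.get("remote"))
        for remote in root.findall("remote"):
            if remote.get("name") == remote_name and "revision" in remote.attrib:
                return remote.attrib["revision"]
        # 3. Finally fall back to the default node.
        return defaults.get("revision")
    raise KeyError(project_name)


root = ET.fromstring(MANIFEST)
print(resolve_revision(root, "app"))   # v1.2
print(resolve_revision(root, "lib"))   # stable
print(resolve_revision(root, "docs"))  # main
```

Each node is effectively a partial view of the same set of keys, which is why typing them today means writing several near-identical TypedDicts.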

And this is the most basic scenario I encountered recently that makes me want to have Partial-s.

Actually, this won’t work if the Structure is already defined with total=False and has some of its keys marked with Required.

apalala · February 6, 2024, 3:26pm

FWIW, I handle cases like the one described using a @dataclass with default values defined.

With that, something like this is usually good enough:

s = StructureDataClass(**json_input_dict)

(my apologies for the abstract example. I can provide an actual one if that is useful)

EDIT:

This is an example of TatSu using dataclass to deal with partials:

github.com

neogeny/TatSu/blob/master/tatsu/infos.py#L22

class UndefinedStr(str):
    pass


_undefined_str = UndefinedStr()


@dataclasses.dataclass
class ParserConfig:
    owner: Any = None
    name: str | None = 'Test'
    filename: str = ''
    encoding: str = 'utf-8'

    start: str | None = None  # FIXME
    start_rule: str | None = None  # FIXME
    rule_name: str | None = None  # Backward compatibility

    comments_re: str | None = None

Daverball · February 6, 2024, 3:26pm

total only applies to keys that were added in the subclass; it doesn’t apply to keys from parent classes. We can’t really change that at this point, since it would break existing code.
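
The runtime side of this can be checked with a small sketch, assuming Python 3.9 or later (where `__required_keys__` and `__optional_keys__` account for inheritance). The Extended class is a hypothetical addition for contrast.

```python
from typing import TypedDict


class Structure(TypedDict):
    x: int
    y: int


# total=False here only affects keys declared in this class body (none).
class PartialStructure(Structure, total=False):
    pass


# Hypothetical subclass that adds one new key under total=False.
class Extended(Structure, total=False):
    z: int


print(sorted(PartialStructure.__required_keys__))  # ['x', 'y']
print(sorted(PartialStructure.__optional_keys__))  # []
print(sorted(Extended.__required_keys__))          # ['x', 'y']
print(sorted(Extended.__optional_keys__))          # ['z']
```

The inherited keys x and y stay required in both subclasses; only the newly declared z becomes optional.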

Currently it’s also disallowed to override keys that have already been specified by parent classes, to prevent confusion^[1]. PEP 705 alleviates some of that with ReadOnly keys, which can be more freely overridden in subclasses, but it would still disallow this use-case. Going the other way is fine, though^[2].

even though structural types don’t really care about inheritance, it would still be confusing if the subclass wasn’t structurally a subtype of the parent ↩︎
i.e. start with fully optional and end up with partially or fully required ↩︎

USSX-Hares · February 6, 2024, 3:32pm

Well, I tried that too. As soon as you declare a default value in your ParserConfig, it immediately stops being Partial. The core idea behind a partial is that keys can be missing: not holding their default value, but missing. In your example, if you instantiate a partial ParserConfig, is it possible to determine whether name="Test" reflects a missing value or an explicitly defined value that happens to equal the default?
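
A small sketch of the distinction being drawn here. ConfigLike is a hypothetical, cut-down stand-in for the ParserConfig dataclass quoted earlier, keeping only the name field.

```python
from dataclasses import dataclass


@dataclass
class ConfigLike:
    name: str = "Test"


# With a dataclass, "not given" and "given, equal to the default" collapse.
explicit = ConfigLike(**{"name": "Test"})
implicit = ConfigLike(**{})
print(explicit == implicit)  # True

# With a plain (partial) dict, the two cases remain distinguishable.
patch_explicit = {"name": "Test"}
patch_missing = {}
print("name" in patch_explicit)  # True
print("name" in patch_missing)   # False
```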

Daverball · February 6, 2024, 3:50pm

Another way to go about this would be to add a synthesized property to the resulting TypedDict, so you can do Structure.partial.

I’m not a big fan of either solution, but it still seems better to me than adding an entire type modifier just for this one relatively small quality of life improvement. It would be easier to justify if we had support for NotRequired on Protocol, but even then it’s unlikely to be useful since a fully optional Protocol doesn’t really make that much sense. I can’t really think of any other cases where this could be applied.

apalala · February 6, 2024, 4:29pm

The “partial” part in that particular example is provided by the **settings argument.

I didn’t intend to solve your use case, but just to provide an example of a similar one.

In ParserConfig any key may be missing, but keys not defined in the dataclass are not allowed (at runtime).

As others have mentioned, you could get away with a Structure (total=True) and a PartialStructure (total=False), plus well-defined merge and project methods over them (merges produce total, projections produce partial).

alexmojaki · February 7, 2024, 10:14am

+1. Here’s another request for it that’s somewhat popular: Support for Partial types · Issue #13695 · python/mypy · GitHub

samuelcolvin · February 7, 2024, 12:56pm

Hi, I’d like to strongly support this.

We (pydantic) have had numerous requests for Partial support in Pydantic over the years. I’ve always refused them since there was no canonical way to represent a partial model/typeddict in typing.

I would suggest:

Adding Partial which can be used only on TypedDicts as suggested here
possibly in future extending Partial to be usable on a dataclass or anything which uses dataclass_transform - the meaning of Partial[MyDataclass] would be to make any required field optional with a default of None - e.g. field: int would become field: int | None = None

To be clear, typeddict support would be a massive win without the dataclass support.

Evidence of lots of people needing this in the real world:

Pydantic issue with 13
FastAPI issue asking for Partial
3rd party pydantic-partial package
PR in instructor to add Partial support

Daverball · February 7, 2024, 2:01pm

Partial on a dataclass seems weird to me, since it is not a structural type but a nominal one. This means that Partial would need to actually create a real new dataclass that can be instantiated; otherwise it would be completely useless.

That’s also part of why I’m not a big fan of making this a type modifier: for nominal types it doesn’t make sense unless it gives you back a runtime usable type, and for structural types you’d probably still find yourself wanting a usable type^[1].

I think this makes more sense as a type constructor, so it shouldn’t use a subscript. Unless you can provide examples where a modifier would be more useful than a constructor.

Also, just for the record, I support the addition of something like this; it would get rid of a lot of redundant TypedDict definitions. I just think the current proposed syntax is the wrong direction for this feature.

especially for TypedDict, less so for Protocol unless you wanted it to be runtime checkable or subclassable with default implementations for some of the methods, although I would question how either of those would interact with Partial ↩︎

samuelcolvin · February 7, 2024, 3:11pm

I just think the current proposed syntax is the wrong direction for this feature.

What syntax would you propose?

My proposal would be that Partial[X] returns a wrapper for X, as parameterising a generic does today; you could even make wrapper.__init__ raise an error so users are discouraged from using it at runtime.

The question then becomes “but how do I actually instantiate a partial X?” You could solve this with factories, something like:

from typing import Any, Callable, TypeVar

T = TypeVar('T')

def partial_creator_factory(t: type[T]) -> Callable[[Any], Partial[T]]:
    ...

In Pydantic, we would support Partial via TypeAdapter, e.g.:

from pydantic import TypeAdapter

ta = TypeAdapter(Partial[MyTypedDict])
my_typed_dict = ta.validate_python(...)

We’d of course also support using Partial[MyTypedDict] as a type annotation in a model or dataclass etc.

(If dataclass/dataclass_transform were supported, MyTypedDict could be replaced with MyModel or MyDataclass)

To be clear, I would love Partial for just TypedDict, I don’t want my comments about dataclasses etc. to derail the conversation or reduce the chance that Partial is accepted just for typed dicts.

Daverball · February 7, 2024, 3:24pm

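I’ve proposed a couple of options earlier in the thread: a subclass parameter like total, or a synthesized property on the resulting TypedDict, such as Structure.partial.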

I.e. it would not be its own construct, it would be an extension on TypedDict. For dataclasses I would go the same way. It will be much easier to make use of at runtime.

USSX-Hares · February 7, 2024, 3:37pm

Should we do anything at runtime at all? For me, this should act like typing generics/specials, something like this:

@_SpecialForm
def Partial(self, parameters):
    item = _type_check(parameters, f'{self} accepts only single type.')
    return _GenericAlias(self, (item,))

It’s just a special typing form, no additional logic is required at runtime. Yes, the type checkers would have to support this, but the program loading time would be almost the same ^[1].

About the same as Final[MyTypedDict] and even faster than Optional[MyTypedDict] ↩︎

Daverball · February 7, 2024, 4:01pm

Considering that Required, NotRequired, and ReadOnly all have runtime implications for TypedDict^[1], it would seem strange to me if Partial were purely a typing._GenericAlias. It would also limit its usefulness, since you would then have to write a type constructor yourself to create a runtime usable Partial.

It seems like a very high cost, considering the only valid type inside Partial would currently be a TypedDict. For nominal types the use-cases seem extremely limited, and for the only other structural type, Protocol, it would give you a type-checking-only Protocol, so it doesn’t compose with runtime_checkable or default implementations.

I realize something like TypedDict.partial would incur runtime overhead^[2], but it seems trivial compared to the additional complexity in runtime analysis you otherwise would have to deal with, especially once you cache the constructed type^[3].

I think a typing special form has to be useful in more than one place and have considerable advantages over a runtime type constructor in order to be justifiable.

they change internal attributes on the class that can be used for introspection ↩︎
although the transformation would be pretty simple, so probably not that much ↩︎
creating a typing._GenericAlias isn’t free either ↩︎

USSX-Hares · February 7, 2024, 5:09pm

I dived into the source code of typing. Actually, Required, NotRequired, and ReadOnly don’t affect runtime by themselves ^[1]. It’s _TypedDictMeta that does the heavy lifting ^[2]. Since the TypedDict class is already created, no additional work is required.

Edit #1

Could you provide an example where using MyTypedDict, MyTypedDict.partial, DerivedFromMyTypedDict or any other analogue would perform better than Partial[MyTypedDict]?

Edit #2

Speaking of runtime: when you instantiate a TypedDict subclass (e.g., MyTypedDict(x=1, y='2')), the type() and .__class__ of the resulting object are builtins.dict.
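
This is easy to confirm with a short sketch. The field types of MyTypedDict are assumed from the call shown above.

```python
from typing import TypedDict


class MyTypedDict(TypedDict):
    x: int
    y: str


value = MyTypedDict(x=1, y="2")
print(type(value) is dict)      # True
print(value.__class__ is dict)  # True

# Calling the class performs no validation: a missing required key
# is only reported by a static type checker, not at runtime.
incomplete = MyTypedDict(x=1)
print(incomplete)  # {'x': 1}
```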

github.com

python/cpython/blob/fedbf77191ea9d6515b39f958cc9e588d23517c9/Lib/typing.py#L3053

    td = _TypedDictMeta(typename, (), ns, total=total)
    td.__orig_bases__ = (TypedDict,)
    return td

_TypedDict = type.__new__(_TypedDictMeta, 'TypedDict', (), {})
TypedDict.__mro_entries__ = lambda bases: (_TypedDict,)


@_SpecialForm
def Required(self, parameters):
    """Special typing construct to mark a TypedDict key as required.

    This is mainly useful for total=False TypedDicts.

    For example::

        class Movie(TypedDict, total=False):
            title: Required[str]
            year: int

↩︎

github.com

python/cpython/blob/fedbf77191ea9d6515b39f958cc9e588d23517c9/Lib/typing.py#L2886

_NamedTuple = type.__new__(NamedTupleMeta, 'NamedTuple', (), {})

def _namedtuple_mro_entries(bases):
    assert NamedTuple in bases
    return (_NamedTuple,)

NamedTuple.__mro_entries__ = _namedtuple_mro_entries


class _TypedDictMeta(type):
    def __new__(cls, name, bases, ns, total=True):
        """Create a new typed dict class object.

        This method is called when TypedDict is subclassed,
        or when TypedDict is instantiated. This way
        TypedDict supports all three syntax forms described in its docstring.
        Subclasses and instances of TypedDict return actual dictionaries.
        """
        for base in bases:
            if type(base) is not _TypedDictMeta and base is not Generic:

↩︎

Daverball · February 7, 2024, 6:12pm

Yes, the heavy lifting is done by the metaclass, but that’s an implementation detail and doesn’t really change the end result. Same goes for TypedDict instances being a dict at runtime^[1].

The long and short of it is this: if you have code that can currently understand and analyze the type constructor returned by _TypedDictMeta in order to, e.g., generate a validator that checks whether required keys are present, you would now have to write new code that manually performs the transformation Partial implies. If it’s built into the type constructor, it works the same either way and you don’t have to change any code.
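
To illustrate the point, here is a rough sketch of a runtime type constructor. make_partial and missing_required_keys are hypothetical helpers, not an existing or proposed API. The sketch assumes Python 3.11+ if the source TypedDict uses Required/NotRequired (get_type_hints strips those markers there), and static type checkers will not understand the dynamically built class.

```python
from typing import Any, Dict, TypedDict, get_type_hints


class Structure(TypedDict):
    x: int
    y: int


def make_partial(td: type) -> type:
    """Hypothetical helper: rebuild td with the same keys, none required."""
    hints = get_type_hints(td)
    return TypedDict("Partial" + td.__name__, hints, total=False)


def missing_required_keys(td: type, data: Dict[str, Any]) -> frozenset:
    """A toy validator that relies only on standard TypedDict introspection."""
    return frozenset(td.__required_keys__) - frozenset(data)


PartialStructure = make_partial(Structure)

print(sorted(missing_required_keys(Structure, {"x": 1})))         # ['y']
print(sorted(missing_required_keys(PartialStructure, {"x": 1})))  # []
```

Because the result is an ordinary TypedDict class, missing_required_keys needs no special case for it, which is the property argued for above.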

For static analysis there’s little difference between the two options, but for runtime analysis you add more complexity that isn’t really giving you anything in return. We already have quite a few type modifiers and special forms, so I think if you’re going to add more of them, the value has to outweigh the implementation cost for every runtime library to add support for it.

this makes a lot of sense, once you remember that TypedDict is a structural type, so the nominal type would have to be something else ↩︎