Why does `typing.Union` (`types.UnionType`) not implement the `collections.abc.Collection` (or `Set`) Protocol?

randolf-scholz · September 6, 2023, 2:03pm

It seems extremely natural that the union type should support __iter__, __len__ and __contains__. What’s the design rationale behind not supporting these operations?

kpfleming · September 6, 2023, 2:04pm

A Union is an object of a type selected from the list of specified types. It’s not a container.

randolf-scholz · September 6, 2023, 2:07pm

But why not? What’s the rationale? Not allowing it seems to make introspection unnecessarily complicated.

kpfleming · September 6, 2023, 2:14pm

I don’t understand the question. Does an int support the Collection protocol? An object, with a type hint based on Union, could very well contain an int.

Also ‘introspection’ typically refers to type analysis, not content analysis. The Collection protocol is used to iterate over the contents.

Can you provide an example of how this would be beneficial to you?

randolf-scholz · September 6, 2023, 2:22pm

Also ‘introspection’ typically refers to type analysis, not content analysis. The Collection protocol is used to iterate over the contents.

The contents of a union are the constituents of the union. For example, one may want to write code with runtime behavior based on type hints, like for example @dataclass does.

From a design standpoint, what’s gained from disallowing tuple(int | str | float)? Instead, one has to use typing.get_args(int | str | float), but from a basic mathematical point of view it would be natural for a union to provide the collections.abc.Collection protocol.

AlexWaygood · September 6, 2023, 2:27pm

The object returned from typing.get_args(<union_object>) is a container that you can easily inspect.

At runtime, I don’t think it would be the worst thing in the world if you could do this:

from typing import Union

X = Union[str, int]

if str in X:
    ...

That’s not code that a type checker is ever going to be happy with, though (unless they add a bunch of extra special-casing), due to how heavily they already special-case Union. If you want to do introspection on special forms in a way that won’t make the type checker complain, I suggest you use the public-API helper function get_args. Most users of the typing module find it important to be able to write code type checkers are happy with, so I think it’s unlikely that we’ll be adding these dunders.
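For reference, a minimal sketch of the get_args route described above. The variable names are illustrative only.

```python
import typing
from typing import Union

X = Union[str, int]

# get_args returns a plain tuple of the union's members.
members = typing.get_args(X)
print(members)           # (<class 'str'>, <class 'int'>)

# Membership, length and iteration then work on the tuple, not on the union.
print(str in members)    # True
print(float in members)  # False
print(len(members))      # 2
```

Because the result is an ordinary tuple, both the runtime and a type checker treat these operations as regular tuple operations.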

randolf-scholz · September 6, 2023, 2:34pm

Wouldn’t this be easily solved if Union was its own UnionType? Like this whole typing._SpecialForm seems like the black arts from time to time.

Like, why did it need PEP 604 to allow isinstance(x, Union[A, B]), when the UnionType could simply implement a custom __instancecheck__?

kpfleming · September 6, 2023, 2:34pm

I don’t think this is correct. A Union in Python is not at all like a C/C++ union. A Python Union is a type hint indicating that the object is one of the specified types, but that’s it. It does not participate in converting the content of the object from one type to another, which can be done in C/C++, nor does it mean that the objects would share the same storage in memory.

There are no ‘constituents of the union’, in Python. There is a single object, which may be one of a limited number of types.

randolf-scholz · September 6, 2023, 2:36pm

I don’t think this is correct. A Union in Python is not at all like a C/C++ union.

It’s still a set-theoretical union of types. It has constituents, which one can acquire in an awkward way via typing.get_args.

kknechtel · September 6, 2023, 8:32pm

I can’t understand what it is that you want to allow.

You say that you think a union type should support __iter__. What should be returned? What should happen the first time that you call next on this result? The second time, etc.? According to what logic?

You say that you think a union type should support __len__. What should be the result, and why?

You say that you think a union type should support __contains__. What logic should it use in order to decide whether something is in the union?

Could you give some examples?

Did you mean "I want to annotate that some variable will store a tuple, and every element of the tuple is either an int, a str or a float"? That annotation is spelled (in a sufficiently recent version of Python) tuple[int | str | float], with square brackets. For example, in one of my 3.11 virtual environments:

>>> tuple(int | str | float)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'types.UnionType' object is not iterable
>>> tuple[int | str | float]
tuple[int | str | float]

tuple(int | str | float) means “take every element that is in the union type int | str | float, and create a tuple object, right now, that contains those elements”. It is completely unrelated to a proper type annotation, and it is also complete nonsense. Even if we agreed “okay, the union type int | str | float conceptually ‘contains’ the int type, the str type and the float type”, the result would be an instance of tuple, not any kind of type.

randolf-scholz · September 6, 2023, 8:53pm

You say that you think a union type should support __iter__. What should be returned?

The constituents of the union

You say that you think a union type should support __len__. What should be the result, and why?

The number of elements of the union

You say that you think a union type should support __contains__. What logic should it use in order to decide whether something is in the union?

If it is an element of the union: y in union_object ⟺ any(y == x for x in union_object).

Here’s an example:

import types
import typing

def int_in_union(union_object: types.UnionType) -> None:
    # if int in union_object:  # sad_trombone.mp3
    if int in typing.get_args(union_object):  # ugly
        print("int is in the union")
    else:
        print("int is not in the union")

int_in_union(float | str)  # prints "int is not in the union"

types.UnionType is a natural set-like container type. This follows immediately from the properties:

Union[x, x] = Union[x]
Union[x, y] = Union[y, x]
Union[x, Union[y, z]] = Union[Union[x, y], z] = Union[x, y, z]

So if it looks like a duck, quacks like a duck, why isn’t it a duck?
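These three properties can be checked directly against typing.Union. A quick sketch:

```python
from typing import Union

# Duplicates collapse; a union of a single type is just that type.
collapsed = Union[int, int]
print(collapsed is int)                                         # True

# Order does not matter for equality.
print(Union[int, str] == Union[str, int])                       # True

# Nested unions are flattened.
print(Union[int, Union[str, float]] == Union[int, str, float])  # True
print(Union[Union[int, str], float] == Union[int, str, float])  # True
```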

Did you mean "I want to annotate that some variable will store a tuple, and every element of the tuple is either an int, a str or a float"?

No. I want to introspect type-annotations at runtime.

It is completely unrelated to a proper type annotation

It is not supposed to be a type annotation?? This is about runtime introspection of type-hints.

Even if we agreed “okay, the union type int | str | float conceptually ‘contains’ the int type, the str type and the float type”, the result would be an instance of tuple, not any kind of type.

Yes!

bryevdv · September 6, 2023, 9:47pm

How are these supposed to work?

Union[int, str, "MyClass"] # you want the string back?

if TYPE_CHECKING:
    from foo import Bar # not otherwise imported

Union[Bar, None] # what is the first iteration value?
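For the first case, what get_args currently hands back can be checked directly: a string inside typing.Union is wrapped in a typing.ForwardRef rather than resolved. A quick sketch, with "MyClass" deliberately left undefined:

```python
import typing
from typing import Union

X = Union[int, str, "MyClass"]
args = typing.get_args(X)

# The string is stored as a ForwardRef placeholder, not resolved to a class.
print(isinstance(args[2], typing.ForwardRef))  # True
print(args[2].__forward_arg__)                 # MyClass
print(args[:2])                                # (<class 'int'>, <class 'str'>)
```

So any iteration or containment check on the union would have to decide what to do with such placeholders.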

flyinghyrax · September 6, 2023, 10:43pm

I really like this proposal and agree with it conceptually. I’ve been experimenting with introspecting GenericAlias recently and this makes a lot of sense to me.

That said… oof, that snippet @bryevdv posted makes this look kinda intractable.

sirosen · September 7, 2023, 12:24am

This request “makes sense” at an intuitive level, but I think it’s messy enough in practice to not be worth the marginal gain.

Optional, nested Unions, TypeVarTuple, NewType, and other “corner cases” appear, IMO, to be too numerous to safely add this.
Implementing __contains__ would require a lot of decisions and tradeoffs.

I’m not sure the other dunders make sense if containment isn’t included.

randolf-scholz · September 7, 2023, 9:09am

That code snippet is not executable, so it’s meaningless to ask what the first iteration value would be. The first value should be the same as what we get back if we iterate typing.get_args(union_object). Since Union is mathematically a set-like container, iteration order probably would not be guaranteed (it currently is by implementation, since Union stores a tuple but treats it like a set whenever necessary).

For instance, _UnionGenericAlias casts to set for equality comparison:

https://github.com/python/cpython/blob/3e53ac99038920550358c1ea0212c3907a8cb385/Lib/typing.py#L1535-L1538

def __eq__(self, other):
    if not isinstance(other, (_UnionGenericAlias, types.UnionType)):
        return NotImplemented
    return set(self.__args__) == set(other.__args__)

Implementing __contains__ would require a lot of decisions and tradeoffs.

Given that Union already implements set-equality, there is only one sensible definition of __contains__, which is the one that makes it compatible with __eq__, in the sense that Union1 == Union2 if and only if all(x in Union2 for x in Union1) and all(y in Union1 for y in Union2).
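To make the proposed semantics concrete, here is a rough sketch of a wrapper built on typing.get_args. The class name UnionMembers is hypothetical and not part of typing or types. It does not validate that its argument is actually a union; it only illustrates a __contains__ that agrees with the set-based __eq__ quoted above.

```python
import typing
from collections.abc import Collection, Iterator
from typing import Any, Union


class UnionMembers(Collection):
    """Hypothetical read-only, set-like view over a union's members."""

    def __init__(self, union_object: Any) -> None:
        # get_args returns () for objects without type arguments, e.g. a bare int.
        self._args = typing.get_args(union_object)

    def __contains__(self, item: object) -> bool:
        return item in self._args

    def __iter__(self) -> Iterator[Any]:
        return iter(self._args)

    def __len__(self) -> int:
        return len(self._args)


a = UnionMembers(Union[int, str])
b = UnionMembers(Union[str, int])

print(int in a)    # True
print(float in a)  # False
print(len(a))      # 2

# Mutual containment agrees with the union's own set-style equality.
mutual = all(x in b for x in a) and all(y in a for y in b)
print(mutual == (Union[int, str] == Union[str, int]))  # True
```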

bryevdv · September 7, 2023, 4:26pm

My question was not about the order, it was about deferred annotations. Here is a complete, valid, runnable script:

from __future__ import annotations

def foo(arg: Bar | None): pass

What are you hoping to see reported for “Bar” at runtime?

Maybe a sensible answer is in PEP 649, which will change how deferred annotations work, but I think the onus is on you to demonstrate that.

randolf-scholz · September 7, 2023, 4:42pm

The same thing that typing.get_args(union_object) would report. If that changes with PEP649, so be it.

bryevdv · September 7, 2023, 4:46pm

Using your code above, with deferred annotations, get_args returns nothing:

In [1]: from __future__ import annotations
   ...:
   ...: import typing
   ...:
   ...: def foo(arg: Bar | None = None):
   ...:     print(typing.get_args(arg))
   ...:
   ...: foo()
()

Edit: I guess you actually pass int | str as a parameter in your code? That’s not actually how anyone uses type annotations. Perhaps more relevantly, deferred annotations are just strings (currently, pre-PEP 649). There is no “union object” to inspect:

In [8]: def foo(arg: Union[Bar, None]): pass

In [9]: get_annotations(foo)
Out[9]: {'arg': 'Union[Bar, None]'}
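When annotations are stored as strings, typing.get_type_hints can evaluate them back into objects, provided every name is resolvable at runtime. A minimal sketch using an explicit string annotation, which is stored the same way as a deferred one (the function foo and its annotation are illustrative):

```python
import typing
from typing import Union


def foo(arg: "Union[int, None]") -> None:
    pass


# The raw annotation is just a string.
raw = foo.__annotations__["arg"]
print(raw)  # Union[int, None]

# get_type_hints evaluates the string in the function's global namespace.
hints = typing.get_type_hints(foo)
members = typing.get_args(hints["arg"])
print(members == (int, type(None)))  # True
```

If a name in the annotation is only imported under TYPE_CHECKING, as Bar is in the earlier snippet, get_type_hints raises NameError instead, so this route does not cover that case.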

randolf-scholz · September 7, 2023, 4:54pm

~~That might actually be a bug. I am not using deferred annotations a lot, but I’d expect in this case that typing.get_args would give something like ("Bar", int).~~

EDIT: deferred annotations simply turn everything into strings.

randolf-scholz · September 7, 2023, 4:59pm

Edit: I guess you actually pass int | str as a parameter in your code? That’s not actually how anyone uses type annotations.

That’s just not true lol, popular libraries like pydantic introspect type-annotations at runtime to modify runtime behavior.