Add `iter=True` to dataclass decorator

I add a __iter__ method to many of my dataclasses:

from dataclasses import astuple, dataclass

@dataclass
class Point:
    x: float
    y: float
    def __iter__(self):
        return iter(astuple(self))

Edit: the above implementation is problematic in many non-trivial cases. See this post for a correct implementation of __iter__. I see the trickiness of implementing this correctly as a mark in favor of this feature existing.

This allows for tuple unpacking:

>>> p = Point(1, 2)
>>> x, y = p

It would be nice to be able to do this instead:

from dataclasses import dataclass

@dataclass(iter=True)
class Point:
    x: float
    y: float

I also suspect that this may make the transition from named tuples to dataclasses easier for folks who use named tuples solely for the sake of tuple unpacking.

It also might be nice to be able to set iter=False in dataclass fields:

from dataclasses import dataclass, field

@dataclass(iter=True)
class Point:
    x: float
    y: float
    color: str = field(default="black", iter=False)

An example demo of an iter argument for dataclass

An example of iter supported by field

14 Likes

I was thinking about pushing back, because although the example is demonstrative, it just didn’t really seem worth it over astuple or simply writing out what you provided.

But one thing I realized that makes this (and anything dataclass-related) nice is the typing aspect.

In a world with this feature, the type checker can provide an accurate return type, such that the elements of the “deconstruction” each have the correct type (and correct count) and the code author didn’t have to repeat the field types.

With that in mind it honestly tips the scales in my book. If this makes it to a PEP I think that angle should be stressed.
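
For instance (a hypothetical sketch, since no iter parameter exists today), the checker could read the unpacked element types straight off the field annotations:

from dataclasses import dataclass

@dataclass(iter=True)  # hypothetical parameter
class Pixel:
    x: int
    intensity: float

x, intensity = Pixel(3, 0.5)  # a checker could infer x: int, intensity: float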

2 Likes

You can use namedtuple for this. It was explicitly designed to have this works-like-a-tuple behaviour.

One of the more common reasons I see given for not using namedtuple is apparently that people don’t like this unpack/iter behaviour. That suggests that many would not like this, although I suppose it is different in that it is opt-in.

I was going to say that it would be bad to add too many optional parameters to @dataclass, but I just checked and it already has 10, which is way past my threshold of sensibility, so I guess why not have 11.

2 Likes

A minor problem with namedtuple is that it makes having index or count as attribute names impossible. I wish there were a way around that; maybe there is and I’m just not aware of it.

I have used this kind of construction myself, and it can prove useful. For me the main draw was being able to call functions as f(*Point).

However, reading through the documentation of astuple, I notice it calls copy.deepcopy on all attributes, which I wouldn’t have wanted when I used this pattern, because I did care about performance and I’m careful never to modify function inputs.
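
A quick way to observe that copy (the names here are purely illustrative): the mutable value that comes back from astuple is a new object, not the one stored on the instance.

from dataclasses import astuple, dataclass, field

@dataclass
class Path:
    points: list = field(default_factory=list)

p = Path([1, 2, 3])
print(astuple(p)[0] is p.points)  # False: astuple deep-copied the list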

Additionally, if you want to provide support for *Point, would you also consider adding support for **Point? It is arguably more robust (because it doesn’t care about re-ordering of the attributes), and doesn’t have the problem of accidental unpacking.

(Using dataclasses.asdict to define the behaviour of f(**Point) is probably a bad idea, which is an argument in favour of declaring this out-of-scope.)
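
For what it’s worth, ** unpacking goes through the mapping protocol, so a sketch of opt-in support (not based on asdict) could look like this; the keys and __getitem__ methods here are my own additions, not part of the proposal:

from dataclasses import dataclass, fields

@dataclass
class Point:
    x: float
    y: float

    def keys(self):
        return [f.name for f in fields(self)]

    def __getitem__(self, key):
        return getattr(self, key)

def draw(*, x, y):
    print(f"drawing at ({x}, {y})")

draw(**Point(1, 2))  # ** unpacking calls keys() and then indexes each name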

I’m kind of neutral on this, I don’t think dataclasses needs it. There’s the question of whether the code should actually be more like the rest of dataclasses’ generated methods, so that the generated code would be more like this for your example:

def __iter__(self):
    yield self.x
    yield self.y

This is roughly what I generate in ducktools.classbuilder, although I don’t think I’ve ever ended up using this feature.


Also worth noting: I would be wary of using astuple to unpack dataclasses in general, as alongside deep-copying some objects it also converts any other dataclasses it encounters into tuples, which might not be what you wanted.

from dataclasses import dataclass, astuple

@dataclass
class A:
    a: str = "A.a"
    b: str = "A.b"

@dataclass
class B:
    a: A
    b: str = "B.a" 

print(astuple(B(A())))

Output:

(('A.a', 'A.b'), 'B.a')
2 Likes

I’m also neutral, but if we go this route I’d suggest an implementation that doesn’t use astuple (as @DavidCEllis suggests) and the option for the user to omit fields from the iteration, e.g.:

from dataclasses import dataclass, field, fields, MISSING


def iter_field(*, default=MISSING, default_factory=MISSING, init=True,
               repr=True, hash=None, compare=True, metadata=None,
               kw_only=MISSING, iterate: bool = True):
    # Record the iterate flag in the field's metadata so __iter__ can check it.
    md = dict(metadata) if metadata is not None else {}
    md['iterate'] = iterate
    return field(default=default, default_factory=default_factory, init=init,
                 repr=repr, hash=hash, compare=compare, metadata=md,
                 kw_only=kw_only)


@dataclass
class Point:
    x: float
    y: float
    z: float = iter_field(iterate=False)

    def __iter__(self):
        for f in fields(self):
            # Fields are included unless they were declared with iterate=False.
            if f.metadata.get('iterate', True):
                yield getattr(self, f.name)


p = Point(1, 2, 3)

x, y = p
print(x, y)

for v in p:
    print(v)
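
For reference, running this prints:

1 2
1
2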



If you look at the linked proposals, you’ll see they already don’t use astuple.

Not sure if this was also directed at me, but I did see that. My comment on that issue was a response to the original post giving an example that did use astuple, noting that in some cases this won’t give you what you might expect.

How commonly does this come up? I get that the OP says he’s doing it a lot, and that seems fine and legitimate to me. But is this super common? I can’t remember having written or read a dataclass with __iter__ prior to this thread.

I agree, but I’m less willing to give up just because there are already too many options.

The current ones mostly pertain to the core behaviors of a dataclass, like how hashing and equality are defined.

Maybe this is totally arbitrary, but iter feels different, and like something that’s reasonable to expect users to define explicitly if they want it.


I’m not strongly against this idea. But I think that if the proposal doesn’t have a good motivation, it will not succeed. So I’m challenging it in part to force the issue.

2 Likes

Thanks all for the replies!

The downside I see to namedtuple is that you need to be comfortable buying into the object being an actual tuple. I often don’t want many of the features that tuples provide:

>>> from typing import NamedTuple
>>> class Point(NamedTuple):
...     x: float
...     y: float
...
>>> p = Point(1, 2)
>>> q = Point(3, 4)
>>> p + q
(1, 2, 3, 4)
>>> p * 2
(1, 2, 1, 2)
>>> len(p)
2

As others have noted, using astuple isn’t quite right and I didn’t use it in my linked example code and probably should have noted its downsides in my initial post.

As @peterc noted, the astuple function deeply copies (which is unnecessary for this use case). But even weirder, it deeply converts all dataclasses to tuples, as @DavidCEllis noted.

>>> from dataclasses import astuple, dataclass
>>>
>>> @dataclass
... class A:
...     n: float
...     m: float
...
>>> @dataclass
... class B:
...     a: A
...     x: float
...     y: float
...     def __iter__(self):
...         return iter(astuple(self))
...
>>> b = B(a=A(1, 2), x=3, y=4)
>>> list(b)
[(1, 2), 3, 4]

The way to implement __iter__ correctly might look like this:

    def __iter__(self):
        for field in dataclasses.fields(self):
            yield getattr(self, field.name)

If there was a less verbose way to correctly implement this, I wouldn’t be so tempted to propose adding iter=True to dataclasses.dataclass.

1 Like

As another datapoint, I also encounter this somewhat frequently, also always with the goal of being able to unpack the class in a convenient way, e.g. Point2D, Point3D, Rectangle, Triangle are all classes where I have implemented and used this.

Ideally this would also be solved with a more powerful unpacking system that gets closer to an inline match-case pattern, e.g. allowing both Rect(xy=(x, y), wh=(w, h)) = rect and Rect(topleft=(x1, y1), bottomright=(x2, y2)) = rect, so that different rect representations can be used in a self-documenting manner. But I think making this simple use case easier is a step in the right direction.
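
The closest thing today is a match statement; here is a rough sketch with a hypothetical Rect dataclass (the assignment-target forms above are not valid Python today):

from dataclasses import dataclass

@dataclass
class Rect:
    topleft: tuple[float, float]
    bottomright: tuple[float, float]

rect = Rect((0, 0), (4, 3))

match rect:
    case Rect(topleft=(x1, y1), bottomright=(x2, y2)):
        print(x1, y1, x2, y2)  # 0 0 4 3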

I’m -.5 on the idea. Today someone wants it to be shallow, tomorrow someone may want it to be a deep copy.

I think a better alternative is adding a shallow parameter to astuple. Then people can change the behavior as they’d like.
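
As a standalone sketch of that shallow behaviour (the shallow parameter itself does not exist today):

from dataclasses import fields

def astuple_shallow(obj):
    # Top-level field values only: no deepcopy and no recursion into
    # nested dataclasses.
    return tuple(getattr(obj, f.name) for f in fields(obj))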

3 Likes

I don’t feel that this is particularly verbose? You could reduce it by a line with yield from if you wanted.

    def __iter__(self):
        yield from (getattr(self, f.name) for f in fields(self))

But as said before, this would probably be generated into the specific yield statements similar to how the other dataclasses methods are generated.


I’m actually more negative on this if the goal is to add it to dataclasses in order for type checkers to understand it, as I don’t like the idea that for type checkers to understand such a thing it has to be in the stdlib implementation.

I don’t like mixins normally but I think it’s one of the cases you can just mixin this method.

from collections.abc import Iterator
import dataclasses

class IterFields[T]:
    def __iter__(self) -> Iterator[T]:
        for field in dataclasses.fields(type(self)):
            yield getattr(self, field.name)
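
Usage would look something like this (my own example; the Iterator[T] element type is only meaningful when the fields share a type):

from dataclasses import dataclass

@dataclass
class Point(IterFields[float]):
    x: float
    y: float

x, y = Point(1.0, 2.0)  # unpacks via the inherited __iter__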

I just think that there’s not going to be enough traction here for adding iter to dataclasses generally.

1 Like

I agree.

The problem for typing fans here is that it’s unpacking which needs the special case. For a general iterable it, type checkers need to be able to infer types for a, b = it and for x in it. For that to be consistent, x, a and b must all have the same type. But tuple unpacking is often used for returning multiple values from a function, and those multiple values can have different types. So there’s a special case for a, b = some_tuple to handle that case.
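
A rough illustration of that distinction (exactly what gets inferred varies by type checker, so treat this as a sketch):

def pair() -> tuple[int, str]:
    return 1, "a"

a, b = pair()      # tuple unpacking is special-cased: a is int, b is str,
                   # and the number of targets is checked
it = iter(pair())  # a plain iterator loses the per-position types
x, y = it          # x and y both get the merged element type, and the
                   # number of targets can't be checked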

For runtime purposes,

def __iter__(self):
    for field in dataclasses.fields(self):
        yield getattr(self, field.name)

is perfectly fine. It’s not particularly verbose, and for something that’s occasionally needed, having to write it yourself is acceptable. Dataclasses are designed to allow you to add your own functionality like this.

For typing purposes, something typing-specific seems like the right approach to me. I have no particular insight into what would be a good design, but something along the lines of

@typing.unpacks_like(int, float)
@dataclass
class A:
    x: int
    y: float
    def __iter__(self):
        yield self.x
        yield self.y

seems more in line with the principles that typing is optional, and shouldn’t require extensive changes to runtime code.

3 Likes

It’s not “special in the type checker” as in you couldn’t do it today (mark yourself as returning a tuple[...]), it’s “special in the type checker” as in “I didn’t have to repeat the exact type of every field in the right order”.

Same goes for the synthesized __init__, or any of the other nice synthesized utilities around dataclasses. It’s not that we couldn’t already do it, it’s “if I have to spell this out ONE MORE TIME…”

Edit: now that I think about it, returning tuple[...] might “work” but also not be 100% correct (since a tuple is iterable but isn’t an iterator). That point could use some fixing in typing, but doesn’t change the above point.

I’d say that someone would be wrong. Iterators are for iterating, not for copying. They should never copy data, they should just iterate it.

2 Likes

OK, but that’s a typing issue that should be solved by the typing system, not by adding special cases to the runtime. And by “special cases” here, I mean an iter=True argument that does something you can’t do yourself by writing out the generated code by hand.

Your edit here is the key point. __iter__ returns an iterator, not an iterable. And there’s currently no way of expressing in the type system that a function returns an iterator which will yield precisely one int followed by one float. So if I, as a user, can’t write a class with an __iter__ that typechecks the way tuples do in unpacking, then nor should a library function be able to - regardless of whether it’s in the stdlib or not.

2 Likes