Add automatic constructor for classes

In practice most constructor implementations are trivial in that they simply map self attributes to correspondingly named parameters.

class Foo:
  x: int
  y: int
  z: int
  def __init__(self, x: int, y: int, z: int):
    self.x = x
    self.y = y
    self.z = z

The typical workflow consists of a) annotating the type of each attribute, b) creating a parameter for each attribute, and c) assigning each parameter to its attribute. Every attribute name ends up being written several times over, which is roughly 3x more typing than necessary. Several modern languages provide a way to reduce this repetitive ceremony.

Kotlin and Scala have primary constructors that function as field type annotations, parameter declarations, and implicit assignments of the arguments to the fields, all at once.

class Foo (
  var x: Int,
  var y: Int, 
  var z: Int
)

var foo = Foo(3,7,5)

TypeScript also provides a shorthand notation for this behavior:

class Foo {
  constructor(
    public x: number,
    public y: number,
    public z: number
  ) {}
}

var foo = new Foo(3,7,5)

Swift has automatic initializers for structs.

After using these languages, manually writing class constructors in Python feels like a chore.

In Python, the @dataclass decorator does allow for this:

from dataclasses import dataclass

@dataclass
class Foo:
  x: int
  y: int
  z: int

foo = Foo(5,7,3)

However, it does not offer this behavior on its own. It comes with other undesired effects, like making the object unhashable by default (and so unable to be added to sets), along with value-object notions of __eq__ and other magic methods that the user is forced to opt into by using the decorator. I may not want to represent an entity as a “data class”, but still want the automatic constructor behavior.

Requiring an import statement is also annoying, since automatic constructors are something I’d like to opt into for nearly every class I ever write, so I would have to write the import in nearly every file. It would be the norm rather than the exception. Providing this either as a globally accessible decorator or as a dedicated syntax construct would be good for having reasonable defaults.

Besides a class decorator or dedicated keyword, perhaps a decorator specifically for the __init__() method would work. (But again, without requiring an import statement please.)

class Foo:
  @auto
  def __init__(self, x: int, y: int, z: int):
    ... # automatically assigns self.x = x, self.y = y, self.z = z

Thanks to the inspect module, the functionality you mentioned is not as difficult to implement as it might seem.
You can implement the @auto decorator as shown below.

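import functools
import inspect

def auto(func):
    """Wrap __init__ so every argument is also stored as an instance attribute."""
    try:
        signature = inspect.Signature.from_callable(func)
    except ValueError:
        # No introspectable signature: fall back to keyword arguments only.
        @functools.wraps(func)
        def inner(self, **kwargs):
            for name, value in kwargs.items():
                setattr(self, name, value)
            func(self, **kwargs)  # self must be passed through to the wrapped __init__
    else:
        @functools.wraps(func)
        def inner(self, *args, **kwargs):
            # Bind the call to the signature so defaults and keywords are resolved.
            bound = signature.bind(self, *args, **kwargs)
            bound.apply_defaults()
            for name, value in bound.arguments.items():
                if name == "self":
                    continue
                setattr(self, name, value)
            func(self, *args, **kwargs)
    return inner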

Example:

class MyClass:
    @auto
    def __init__(self, a, b, /, c, d, *, f, e=None):
        print(vars(self))

my = MyClass(1, 2, 3, d=4, f=5)  # prints out {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'f': 5, 'e': None}

I’ve tested that in vscode, and it does not seem to provide any type information for the instance attributes.

@dataclass
class Foo:
    @auto
    def __init__(self, x: int): ...

    def fn(self):
        self.x # pylance shows the type as x: Any

forcing me to also include the type annotation

@dataclass
class Foo:
    x: int
    @auto
    def __init__(self, x: int): ...

    def fn(self):
        self.x # x: int correctly inferred

Your solution eliminates 1/3 of the repetitive ceremony, but there is still an extra step I’d like to remove.

I assume we could get inspect working for the class instance annotations themselves, presumably what @dataclass uses already.

This would be quite nice, but the suggestion is quite opinionated as to what the automatic constructor should look like. It’s also common to have:

self._x = x

If you want a hashable dataclass instance, you can set frozen=True by the way. As the docs for the unsafe_hash arg explain, a mutable hashable dataclass is a can of worms.
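As a short illustration of the frozen=True route (the Vector2D class is just an example name): the instance becomes hashable by value, at the price of rejecting attribute assignment.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Vector2D:
    x: float
    y: float


v = Vector2D(1.0, 2.0)
print(v in {Vector2D(1.0, 2.0)})  # True: hash and equality are both value-based

try:
    v.x = 5.0
except FrozenInstanceError:
    print("assignment rejected")  # frozen instances cannot be mutated this way
```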

Is there a problem with using dataclass(repr=False,eq=False)? If you use this a lot, you can also assign this to a variable.

I’m inclined to agree with the OP that it would be nice to have a decorator besides @dataclass. Though for different reasons, and I think importing the decorator is fine.

My personal frustration with @dataclass is that it is too useful not to use even for classes that aren’t dataclasses.

A proper ‘dataclass’ is an object with attributes and either properties or cached_properties. That’s just what the word means. A dataclass is a class for storing and accessing data.

There is (as always) a fuzzy boundary.

(For example, is a dataclass allowed to have methods that take arguments? The fact that it goes against the strict definition is probably less important than the fact that such methods are useful.)

But a class that has methods that modify itself is not a dataclass. And yet I use @dataclass for those classes anyway.

frozen=True is useful (thanks @JamesParrott).

A decorator for more general classes would be appreciated. Such a decorator could also implement default getters and setters for protected attributes such as _x and set them in the init, if we wanted that.

Is there a problem with using dataclass(repr=False,eq=False)? If you use this a lot, you can also assign this to a variable.

The problem is that it feels like a misuse of @dataclass, a hacky workaround. I’m not trying to represent a “data entity” or “value object”, I simply want a less repetitive constructor and don’t want any of the philosophical implications that “dataclass” might carry (or the magic methods included).

I could assign it to a variable, but then I have to import that variable into every file, in all of my codebases, and introduce it to all of my coworkers, who might think it so unusual as to not justify its usage and instead just accept the more repetitive manual constructors because at least they are idiomatic Python.

It’s important for languages to provide reasonable defaults out of the box if they have any chance at being widely used; and I am sure there are thousands of other developers coming from the languages I mentioned who would hope for and benefit from this as a builtin feature.

If you want a hashable dataclass instance, you can set frozen=True by the way.

This would make the object state immutable, but I often want mutable objects inside of sets, like a gameworld.characters typed as set[NPC], containing NPC instances which of course have mutable state like health: float.


Use unsafe_hash=True then, and live with it.

I think you’re reading too much into the name dataclass. It’s perfectly fine to use it to define classes that are more than just boxes around some values. You also can use the arguments to dataclass to specify exactly which dunders you want to have auto generated. By default these are __init__, __repr__, __eq__, __match_args__, and potentially __hash__; the ordering ones are opt-in via order=True. You can individually toggle each of them on or off with their corresponding argument and also have __slots__ included. If you want to use the default id based object equality you can just set eq=False and it will use object’s hash and equality methods, just like for a default class.

As annoying as it is, that is probably not a great idea to begin with. The reason that dataclass won’t generate a custom hash function if the class defines a custom equality and is mutable is that things like sets and dictionaries are implemented with the fundamental assumption that any objects you insert will have hash and equality methods that are compatible with each other and that their results don’t change while the object is in the set/dict. If you insert some of these NPC objects into a set and then change their health values it is entirely possible that you end up with e.g. duplicate copies of these objects in the set or ones missing entirely.
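The failure mode described above can be reproduced in a few lines. This is only an illustration, using a made-up NPC class with unsafe_hash=True so that the hash is derived from the mutable fields.

```python
from dataclasses import dataclass


@dataclass(unsafe_hash=True)
class NPC:
    name: str
    health: float


npc = NPC("guard", 100.0)
characters = {npc}
print(npc in characters)  # True

npc.health -= 25.0  # the hash is computed from (name, health), so it changes here
print(npc in characters)  # False: the set looks the object up under its new hash
print(any(c is npc for c in characters))  # True: it is still stored under the old hash

characters.add(npc)  # inserted a second time, under the new hash
print(len(characters))  # 2
```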

The only really safe way to add mutable objects to sets/dicts is to use the id based equality and hash methods. You can do that by either using plain classes or by setting eq=False. But then you also lose the ability to have a == b mean something other than literal object identity.

You can do that by either using plain classes

Right, I want to use plain classes, and avoid this altogether.

I think you’re reading too much into the name dataclass.

I think what @petercordia said is the common impression of dataclasses; that they are definitionally for storing and accessing data, and you generally are not supposed to imagine them as actors with methods that can mutate their own state and do things in the world. A Vector2D may be a dataclass but a Zombie probably shouldn’t be.

You also can use the arguments to dataclass to specify exactly which dunders you want to have auto generated … You can individually toggle each of them on or off with their corresponding argument

This is a lot of boilerplate, which defeats the original goal of reducing monotonous ceremony. I may as well just write the manual constructor at that point, since we are just replacing one eyesore with another, except much less idiomatic now.

Use unsafe_hash=True then, and live with it.

I still don’t think this is a satisfying solution for the other reasons mentioned.

But this is an attitude problem - Not a problem with the technical solution. The exact behavior you want is reached with @dataclass(eq=False, repr=False, match_args=False) (and I would argue that the last two are unnecessary, having those definitions is fine for debugging capabilities).

You having a strict definition of what a dataclass is supposed to be does not mean that dataclass isn’t the correct tool for this.

auto_constructor = dataclass(eq=False, repr=False, match_args=False)

Here, that’s the complete boilerplate you need to write.


Semantics matter too, and many engineers appreciate incorporating attitude. Especially in the Object-Oriented Design world where philosophizing about the conceptual nature of classes and so on, beyond the mere syntactic mechanics, is common and a motivating tenet of the paradigm. This great video on Domain-Driven-Design is almost entirely just philosophizing about how we should approach different sorts of classes in attitude and mindset https://youtu.be/xFl-QQZJFTA

auto_constructor = dataclass(eq=False, repr=False, match_args=False)
Here, that’s the complete boilerplate you need to write.

To clarify, if I don’t want any extra behavior beyond the auto initializer, as if it still were a plain class all else considered, the complete boilerplate would be considerably longer, right? Could you show that one?

Excluding that __dataclass_*__ attributes get set on the class, no, nothing extra is being done here. And I am not sure what harm you would see in those attributes being set.

Indeed. I would argue that dataclass is not supposed to exclusively define such classes, but provide a framework for reducing boilerplate common to such classes.

I agree that one shouldn’t read too much into the name dataclasses. It was a hallway conversation with Guido and a drive-by off the cuff comment from Brett that came up with the name. The name could have just as easily been autoclasses or something.

I use dataclasses all the time when I’ve got a full-blown class and I want some auto-generated methods. That’s how they should be thought of: “please add me some methods because I’m too lazy to type and maintain them myself”. And this was an explicit design goal, and why they don’t use base classes or metaclasses: you should be able to use @dataclass on any class, no matter its nature.


Totally agree that dataclasses are the right solution, but it’s worth mentioning that using dataclass breaks cooperative multiple inheritance unless you jump through difficult hoops.
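A small sketch of the inheritance issue, with made-up class names: the generated __init__ does not call super().__init__(), so a cooperative mixin later in the MRO is never initialized. Calling super().__init__() from __post_init__ is one possible workaround, not necessarily the recommended one, and it only works when the remaining initializers need no arguments.

```python
from dataclasses import dataclass


class Tracked:
    """Cooperative mixin: relies on every __init__ in the chain calling super()."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.tracked = True


@dataclass(eq=False)
class Point:
    x: int
    y: int


class TrackedPoint(Point, Tracked):
    pass


p = TrackedPoint(1, 2)
print(hasattr(p, "tracked"))  # False: Point.__init__ ends the chain


@dataclass(eq=False)
class CooperativePoint:
    x: int
    y: int

    def __post_init__(self):
        super().__init__()  # hand control to the next class in the MRO


class TrackedCooperativePoint(CooperativePoint, Tracked):
    pass


q = TrackedCooperativePoint(1, 2)
print(q.tracked)  # True
```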


Names do matter though. As programmers, names are one of the main ways we communicate our intent.

That’s why many classes have .all() and .any() methods, even if you can technically replace them with .min() and .max().

Like, could you imagine if you had to call @cached(cache=False) instead of @property? That’s how weird this situation is. You’re calling a decorator that’s named after one of the properties that you’re not using, and then disabling the features that the decorator is named after.
If I did that with a decorator or class I had defined myself, I would not get it accepted into my company’s git repo.

It’s still worth using. And I’m not going to fight to get @dataclass changed. But it feels unfair to accuse OP of an attitude problem.

Only someone who has worked a lot with @dataclass before will intuitively know what it means, even if they’re an experienced programmer. Which is bad. Unpythonic even. And unnecessary. And y’all should be able to acknowledge that, I think.

There are also solutions with a relatively low cost to this problem. Maybe the cost would still be too high, I’m not in a position to judge that, since I wouldn’t bear the cost. But the manner in which this topic is being shut down seems rather sad and frustrating to me.

Would it be possible to have a poll instead? You could give a 5-star rating to proposals, normalised as

  1. I would refuse to update my Python if this got implemented
  2. This would bother me
  3. meh
  4. I would use this
  5. I would help implement this

Then people who are dismissive can vote 3 stars and move on with their lives, and people who care can use the thread to gauge interest and to find each other.


Polls like this are almost entirely useless. How many people would ACTUALLY refuse to update Python because of one specific change? Have you ever done that, with any program? Similarly, almost nobody will actually help implement it, just because most people aren’t C programmers or aren’t comfortable contributing to the Python standard library. So all you really get is the same “-1”, “-0”, “+0”, “+1” that people sometimes use (borrowed, I believe, from the Apache voting system).

Much better to get actual responses. Polls reduce everything to a simple number and then let you make a decision with a sense that it’s less uninformed - but it isn’t. With actual posts giving arguments for and against, yes, it’s more effort to read through them than to just read off a justifying number, but it actually carries some weight.


That’s actually precisely why I think such a poll would be helpful.

I’d expect a distribution where at least 80% of people vote 3 stars. Those people now write 80% of forum posts, and those posts usually aren’t helpful after the first 3 written by ‘meh’ people. The first few posts often help clarify what someone wants, even for themselves, and point the way to good short-term solutions.
After that, though, it’s discouraging, frustrating and annoying when 80% of posts are some variation of “I don’t care, therefore on the basis of that evidence, I don’t think it’s worth the cost of implementing” or “I don’t understand what you want or what your problem is, but here’s how you solve a different problem”. That holds both for readers and for authors, I believe. I also believe it quite possible that a significant portion of the ‘meh’ crowd already realizes their response would basically be spam, and refrains from posting.

In contrast, those people who would suffer actual harm from a proposal deserve to be heard. And people who like an idea encourage each other and the author, and can use a thread like this to find each other and gauge interest.

Finally people who vote 1 star or 5 stars are extremely interesting, presuming they aren’t lying. The former because they would probably act as a veto, if the harm it would cause them would be so severe. The latter because if no-one votes 5 stars a project isn’t getting implemented. Period. No further discussion necessary, even if discussion may still continue for the sake of pleasantry.

At the risk of making this too personal, perhaps the discussion about PEP 671 (Syntax for late-bound function argument defaults) could have gone better if the people who didn’t actually care about late-bound function arguments (i.e. 3-star voters) had shut up a bit more instead of arguing about alternative ways to implement something they didn’t care about?
