Add automatic constructor for classes

Monarch · July 18, 2024, 4:59pm

I wasn’t aware of this, thank you!

But this adds another thing a user now has to be aware of just to get a simple class constructor. typing.dataclass_transform also seems to have its own arguments that you need to keep in sync with your decorator. Having the out-of-the-box support of a stdlib decorator still seems like a significantly better experience.

DavidCEllis · July 18, 2024, 5:41pm

Yes, typing.dataclass_transform works in some IDEs but not all, and not completely - although that’s largely their problem to solve^[1].

But if needing to know about and use that is enough reason to create an alias for a basic __init__ and __repr__ generating decorator then surely it’s also useful to those wanting aliases for frozen=True or slots=True or other commonly modified parameters. I think it’s more flexible to make it easy for users to create the appropriate aliases for their use case rather than just blessing specific sets of defaults.

[1] PyCharm, for instance, doesn’t yet support the base class form of class generation that @dataclass_transform is supposed to allow for. (I get a lot of nice yellow squiggles where I’ve used my own implementation that uses a base class, despite @dataclass_transform.)

ncoghlan · July 19, 2024, 2:41am

The generated dataclass repr only makes sense for value types (since different instances with the same field values share the same representation).

If the class instances retain identity based comparison semantics, they should also retain the default identity based repr.

Hence my suggestion of autoinit as the baseline decorator for just the initialiser shorthand, with no assumptions around the value semantics.

I do like the idea of an underlying ClassBuilder type that summarises the options that have been defined, though.

Monarch · July 19, 2024, 4:49am

That makes sense to me! I would love to have it

petercordia · July 19, 2024, 7:05am

You would still be able to use ClassBuilder as

@ClassBuilder(init=True, repr=True)
class MyClass:
    ...

right?

That’s probably how I would use it most of the time.

Can you conceive of a situation where it would be harmful to have an automatically generated __init__ when you haven’t defined __init__ yourself? I can’t, but then I didn’t see the problem with autorepr either until it was explained.

ChrisBarker-NOAA · July 19, 2024, 7:13am

well, not really – what __repr__ usually is, essentially, is the call signature filled in, i.e.

eval(repr(obj)) == obj

So it should match the __init__, which is what dataclass does. So all good.

However: as for this idea – I can see why folks don’t want to write the boilerplate – but for classes that aren’t mostly “data containers” I usually need a lot more logic in the __init__, and __eq__, etc, so having a simple decorator that only generates an __init__ doesn’t seem that useful – and it means you have to put your “real” logic in __post_init__() – which is not very idiomatic.

What might be nice for this use case is a more magic way to do:

class Something:
    def __init__(self, param1, param2, param3, ...):
        add_all_params_to_self(self)

I’m sure it’s possible – this is Python after all.
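One way such a helper could work is by inspecting the caller's frame. This is only a sketch: `add_all_params_to_self` is the hypothetical name from the snippet above, and it relies on frame introspection as CPython provides it. It copies only named parameters, not `*args` or `**kwargs`.

```python
import inspect


def add_all_params_to_self(self):
    # Look at the caller's frame, i.e. the __init__ that called this helper.
    frame = inspect.currentframe().f_back
    try:
        info = inspect.getargvalues(frame)
        # info.args lists the named parameters in order; the first is `self`.
        for name in info.args[1:]:
            setattr(self, name, info.locals[name])
    finally:
        del frame  # do not keep the caller's frame alive longer than needed


class Something:
    def __init__(self, param1, param2, param3=3):
        add_all_params_to_self(self)


s = Something(1, 2)
print(s.param1, s.param2, s.param3)  # 1 2 3
```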

BTW: please don’t try to design Python to make it easier for Type Checkers and IDEs – that way lies a statically typed language

pythoncontributor837 · July 19, 2024, 9:41am

The ability for users to create their own better versions of the defaults provided is rarely a compelling argument on its own against improving the defaults. There is a cost in deviating from standards: I have to make an alias, import it in every file, in every codebase I work on in the future, and burden everyone who reads my code with the task of familiarizing themselves with my proprietary constructs instead of them taking advantage of the transferable knowledge and fluency they’ve acquired over years of using the defaults. Further, it means I’m setting as a precedent in my codebase that this is the sort of thing I do anytime I think my version of the standard is slightly better. Am I just going to create an alias every single time I dislike the way something was designed in Python? If not, then why this one? It’s a matter of principle. At a certain point, you have to accept the language for what it is, and use its unsatisfying idiomatic defaults, for the overall long-term scalability and accessibility to the rest of the community. I will accept the rough edges of the language because I understand that’s the most mature course of action. But it would be great if we would just smooth out those edges for everyone by improving the defaults, and not put that burden onto users who realistically will not because they are disincentivized to do so.

pythoncontributor837 · July 19, 2024, 9:57am

I’m not sure what to make of doom-and-gloom predictions about Python adoption when it is already well ahead of the languages it’s being compared to in this post.

I was referring to the constructs being widely used, not the language. I would like a more concise way to implement constructors to become widespread in Python, and not just a strange alias that one person on the planet has in their obscure codebase that confuses all onlookers.

project-specific decorator that specializes @dataclass the way you want sounds like the right approach here.

Why couldn’t providing a more concise way to write constructors by default be the right approach? I get if this is low on the priority list, but I’m sure at some point we could get this done? For the reasons I mentioned in my last reply to @DavidCEllis, I think the recommendation of locally defined aliases is effectively saying “we are not going to improve this aspect of the language, sorry. Here is a bandaid fix you can use, if you are comfortable with the cost of having bandaids everywhere in your codebase (most users are not)”.

pythoncontributor837 · July 19, 2024, 10:18am

I would be happy with a standalone @autoinit decorator.

I do have to ask out of curiosity - and I know this is just the way Python has chosen to do things (ABC, Protocol, typing etc.) that I have to live with - but why is an import necessary at all? The @property decorator is builtin and accessible without any import. Presumably we could do the same with @autoinit?

dg-pb · July 19, 2024, 11:11am

dataclass import is probably the best that can be expected for a start. It might end up in builtins if time proves that it is super mega useful.

Of course, I might be wrong, but in my opinion, proof that it would be used as frequently as property is the bare minimum to start a conversation about its inclusion in builtins straight away.

But why not? If this is being included, why not put in the work to come up with a version that has a good flexibility-to-simplicity ratio?

Although I see the argument that a one-to-one arg_name->attr_name mapping is the most common, I am not convinced. E.g. this would be much more useful to me:

from dataclasses import autoinit

class A:
    @autoinit(a='_a')
    def __init__(self, a, b):
        pass

    @property
    def a(self):
        return self._a

a = A(1, 2)
a.a  # 1
a._a # 1
a.b  # 2
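For anyone who wants to experiment with this shape today, a method decorator along these lines can be sketched in a few lines. There is no `autoinit` in `dataclasses`; the implementation below is an assumption about how the proposed behaviour could work, not an existing API.

```python
import functools
import inspect


def autoinit(**renames):
    """Hypothetical decorator: store __init__ arguments as attributes.

    Keyword arguments map a parameter name to a different attribute name.
    """
    def decorator(init):
        sig = inspect.signature(init)

        @functools.wraps(init)
        def wrapper(self, *args, **kwargs):
            bound = sig.bind(self, *args, **kwargs)
            bound.apply_defaults()
            # The first bound argument is `self`; the rest become attributes.
            for name, value in list(bound.arguments.items())[1:]:
                setattr(self, renames.get(name, name), value)
            init(self, *args, **kwargs)  # then run the user's own body

        return wrapper

    return decorator


class A:
    @autoinit(a='_a')
    def __init__(self, a, b):
        pass

    @property
    def a(self):
        return self._a


obj = A(1, 2)
print(obj.a, obj._a, obj.b)  # 1 1 2
```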

ericvsmith · July 19, 2024, 11:30am

Remember that @dataclass won’t add an __init__ or __repr__ if you supply one:

>>> from dataclasses import dataclass
>>> @dataclass
... class Foo:
...   x: int
...   def __init__(self, x):
...     self.x = 2 * x
...
>>> f = Foo(1)
>>> f
Foo(x=2)
>>> @dataclass
... class Bar:
...   x: int
...   def __repr__(self):
...     return "I'm a Bar"
...
>>> b = Bar(1)
>>> b
I'm a Bar

So “I want __init__, but not __repr__ because I’m going to supply my own” already works. I realize this doesn’t solve all problems, but it’s worth remembering.

I think this is true of all methods, but would have to check.
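A quick check of two more cases, going by the documented behaviour of `dataclasses`: a hand-written `__eq__` is kept, while the ordering methods are the exception, since `order=True` raises `TypeError` if any of them is already defined. The class names here are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str

    def __eq__(self, other):
        # Hand-written: case-insensitive comparison.
        return isinstance(other, Token) and self.text.lower() == other.text.lower()


print(Token("A") == Token("a"))  # True: the hand-written __eq__ was kept

order_error = None
try:
    @dataclass(order=True)
    class Version:
        number: int

        def __lt__(self, other):
            return self.number < other.number
except TypeError as exc:
    order_error = exc  # ordering methods are refused rather than skipped

print(type(order_error).__name__)  # TypeError
```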

ericvsmith · July 19, 2024, 11:34am

I agree with the general sentiment, but once we’ve shipped it we can’t change the defaults. This work needs to go into the up front design.

ericvsmith · July 19, 2024, 12:01pm

A bunch of reasons. It would probably have to be written in C, which is a hassle and not worth it when dataclasses was new. And then it couldn’t be used in other implementations. Maybe if @autoinit only generated __init__, not other methods, and didn’t deal with things like field, default_factory, etc., it would be tractable.

But I use other things, like itertools, way more often than I’d use @autoinit: it’s just how Python is. If we had it to do over, I’d probably argue that @property should require an import.

DavidCEllis · July 19, 2024, 12:09pm

The import is technically necessary because the decorator is implemented in Python^[1] in the dataclasses.py module. The basic behaviour and concept (analysing annotations, generating source code and then exec on the result) are largely inherited from the attrs project others have mentioned.

If the current implementation (or something based on it) was imported by default at startup and included in builtins it would be a significant regression in Python startup time. There have been previous discussions about having something more ‘baked-in’ both for startup and class generation performance, but no consensus on what that should look like.

Personally I would disagree on this being an improvement over autogenerated __repr__ and __eq__.

It’s fairly easy to leave out a __repr__ when defining classes by hand so including one ‘by default’ means that it’s more likely I can see useful information in tracebacks where other people have used @dataclass. __eq__ by default I find most useful when writing tests on functions that have dataclasses as output - the only downside being that by default the classes are not hashable (as with any class with __eq__ defined without __hash__).
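The hashability downside is easy to see in a small example (class names are made up). With the default `eq=True` and no `frozen=True`, `__hash__` is set to `None`; a frozen dataclass gets a generated `__hash__` instead.

```python
from dataclasses import dataclass


@dataclass
class Result:
    value: int


@dataclass(frozen=True)
class FrozenResult:
    value: int


print(Result(1) == Result(1))  # True: handy for assertions in tests

try:
    hash(Result(1))
except TypeError as exc:
    print("unhashable:", exc)

print(hash(FrozenResult(1)) == hash(FrozenResult(1)))  # True
```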

I don’t consider locally defined aliases to be a band-aid. I consider it to be customising the tool for your use case.

I actually somewhat wish dataclasses was more modular so it was easier to modify the workings. The collection of attributes and generation of methods all happens within _process_class so it’s not really possible with dataclasses to separate analysis from generation.

I’d love to - for instance - have a metaclass that generated slots before a class was created^[2] and put the necessary analysis details somewhere else and allowed dataclasses to write the methods based on the pre-gathered details.

[1] As opposed to being implemented in C.
[2] dataclasses, by virtue of being a decorator, has to create an all-new class in order to place slots, which has some additional complications if anything has already made a reference to the original class.

oscarbenjamin · July 19, 2024, 12:45pm

This is why I don’t use dataclasses personally. Each dataclass adds a millisecond to startup time. When you have a lot of classes that adds up to noticeable slowdown.

As for the discussions above about alternative decorators I prefer the way that the derive approach in Rust reads over Python’s dataclasses. I think that what dataclasses do would be more easily understood if it was more like:

@derive('init', 'eq', 'repr', 'hash')
class Foo:
    ...

It is easier to understand what is happening if all options are off by default and you enable whichever sets you want rather than having to enable some and disable others or having some enabled/disabled implicitly.
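An opt-in decorator of this shape can be approximated on top of `dataclass` today. The `derive` function below is a sketch of the idea, not an existing API, and mapping `'hash'` to `unsafe_hash=True` is my assumption about the intended meaning.

```python
from dataclasses import dataclass

_FEATURES = {"init", "repr", "eq", "order", "hash"}


def derive(*features):
    """Hypothetical opt-in wrapper: everything is off unless named."""
    unknown = set(features) - _FEATURES
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    return dataclass(
        init="init" in features,
        repr="repr" in features,
        eq="eq" in features,
        order="order" in features,
        unsafe_hash="hash" in features,
    )


@derive("init", "eq", "repr", "hash")
class Foo:
    x: int


@derive("init")
class Bar:
    x: int


print(Foo(1) == Foo(1), hash(Foo(1)) == hash(Foo(1)))  # True True
print(Bar(1) == Bar(1))  # False: only __init__ was generated
```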

ericvsmith · July 19, 2024, 1:03pm

I sped this up by about 20% in 3.13. It’s still not awesome because of the expense of dynamically generated code, but it’s a nice improvement.

ncoghlan · July 19, 2024, 1:32pm

Chris Barker:

well, not really – what __repr__ usually is, essentially, is the call signature filled in, i.e.
eval(repr(obj)) == obj
So it should match the __init__, which is what dataclass does. So all good.

The quoted equivalence rule only applies for value types, where two objects are considered equal if they were created from the same arguments, even if their identities are different.

The default equality rule defined by object is based on identity: even if two objects have the same field values, they’re still not considered equivalent (e.g. type instances work that way, hence why reloading modules can give you multiple distinct runtime types with the same name and other identifying information). The default __repr__ implementation in object reflects that.
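The difference between the two kinds of type shows up in a few lines. `Session` and `Point` are made-up example classes: the first keeps the defaults inherited from `object`, the second is a value type generated by `@dataclass`.

```python
from dataclasses import dataclass


class Session:
    # Identity type: default __eq__ and __repr__ inherited from object.
    def __init__(self, user):
        self.user = user


@dataclass
class Point:
    # Value type: generated __eq__ and __repr__ based on field values.
    x: int
    y: int


s1, s2 = Session("ann"), Session("ann")
print(s1 == s2)              # False: same field values, different identity
print(repr(s1) == repr(s2))  # False: the default repr includes the address

p1, p2 = Point(1, 2), Point(1, 2)
print(p1 == p2)              # True
print(repr(p1) == repr(p2))  # True: distinct objects share one representation
```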

DavidCEllis · July 19, 2024, 2:17pm

I generally don’t have a huge issue with the actual generation of the classes (although improvements are very welcome); the bigger issue I have is that the import time itself is fairly significant. I can generate a decent number of classes in the time it takes for dataclasses to import. Perhaps this can be remedied somewhat once the new form of deferred annotations is implemented?

At one point I had a dataclass-like tool that worked on the AST and had an import hook to write the class in-place in the .pyc files but it was a nightmare to maintain and cumbersome to use (you had to remember the import hook). I’ll accept some performance cost to avoid maintaining that.^[1]

[1] It did lead me to find a bug in importlib, though (since fixed).

pythoncontributor837 · July 19, 2024, 2:21pm

I don’t think this is my use case. A more concise constructor for classes is a general feature that would be applicable to the vast majority of classes written by users. There’s nothing particular about my codebases or project domains that motivates me wanting this feature other than that I write classes a lot; I simply don’t want to write repetitive boilerplate hundreds of times a year. That the three modern languages I mentioned decided to implement this as builtin syntax demonstrates its general ergonomic appeal.

jamestwebber · July 19, 2024, 2:26pm

Is there data to support this?

Personally, if I’m not using a dataclass I’m almost certainly doing something in the constructor. So an automatic “assign inputs to attributes” method would be useless for me, unless I could wrap it.

Other languages have similar tools, but those languages have different idioms from Python, so it’s not self-evident that it makes sense to adopt this here. Sometimes programming languages are different for a reason.