Simplify class variable initialization by allowing class variable names in member function

Annoyance: init functions often immediately begin by copying the inputs into class member variables

class planet_gravity:

    def __init__(self, r, m):

        # define class variables for each input
        self.r = r
        self.m = m

        # define variables that are functions of the input variables
        self.g = 6.6E-11 * self.m / (self.r ** 2)

In addition to extra typing for each input in the init function, this repetition also makes it more difficult to change the name of a variable as there are two versions of it (e.g. r and self.r).

Similarly, when a variable in the init function has two versions (self.variable and variable), one can freely switch between the two. However, if the code is moved to another class member function, it will only work with the self.variable version, making it cumbersome to simply move code from the init function elsewhere as the init function becomes complex.

There are existing tools that can automate this process (e.g. the standard-library dataclasses module), but they are often focused on data-only classes or are quite sophisticated. A syntax for simplifying this common coding pattern would be helpful.
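For comparison, here is a minimal sketch of the same class using the standard-library dataclasses module (renamed PlanetGravity here; the sample radius and mass are arbitrary values chosen for illustration). The derived value g is declared with field(init=False) and computed in __post_init__, which the generated __init__ calls after it has assigned r and m:

```python
from dataclasses import dataclass, field


@dataclass
class PlanetGravity:
    r: float                       # radius, an __init__ parameter
    m: float                       # mass, an __init__ parameter
    g: float = field(init=False)   # derived value, not an __init__ parameter

    def __post_init__(self):
        # Called by the generated __init__ after self.r and self.m are set.
        self.g = 6.6E-11 * self.m / (self.r ** 2)


p = PlanetGravity(6.4e6, 6.0e24)
```

This removes the copy lines, but the derived computation has to move out of __init__ into a separate hook.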

Proposal: Allow inputs to class member functions to include the self modifier.

Whenever a parameter of the init function is prefixed with self., it is syntactic sugar for the following replacement:

def __init__(self, ..., self.my_variable, ...):
    pass

Is equivalent to

def __init__(self, ..., my_variable, ...):
    self.my_variable = my_variable

For example, the example at the top of the post could be simplified to

class planet_gravity:

    def __init__(self, self.r, self.m):

        # define variables that are functions of the input variables
        self.g = 6.6E-11 * self.m / (self.r ** 2)

One can consider this syntax a variation on the way function inputs traditionally work, storing the input in a variable that can be accessed by the function. With the self keyword, the variable is stored in a class member variable instead of a local variable.

Advantages:

  1. Simplifies init statements
  2. Eliminates multiple variables intended to store the same data
  3. Backward-compatible with current Python code, since a dotted name such as self.x is not currently allowed in a parameter list.
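Point 3 can be checked in today's Python by compiling the proposed signature and observing that it is rejected, so no existing code can already be using it. A small sketch (the helper name is_valid_python is illustrative):

```python
def is_valid_python(source: str) -> bool:
    """Return True if `source` compiles, False on SyntaxError."""
    try:
        compile(source, "<proposal>", "exec")
    except SyntaxError:
        return False
    return True


# The proposed form: dotted names in the parameter list.
proposed = "def __init__(self, self.r, self.m):\n    pass\n"
# The form it would desugar to.
desugared = "def __init__(self, r, m):\n    self.r = r\n    self.m = m\n"

print(is_valid_python(proposed))   # False
print(is_valid_python(desugared))  # True
```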

General use case:

This syntax is designed primarily for init functions, but could be extended to all class member functions, allowing declarations like this:

def set_my_variable_to_new_value(self, self.my_variable): pass

which would be equivalent to

def set_my_variable_to_new_value(self, my_variable):
    self.my_variable = my_variable

Explicitly specifying variable names

Given a function with signature,

def my_function(self, input1, input2, self.my_member_variable=5):

it must be possible to call the function with explicit names for the inputs. The most obvious approach is like this,

my_class.my_function(
    input1=3,
    input2=2,
    self.my_member_variable=37
)
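Under the desugaring rule given earlier, the generated parameter would be named plain my_member_variable, so in today's Python the equivalent definition and keyword call would look like the sketch below (MyClass is a hypothetical name). It shows the plain-keyword alternative to the self.my_member_variable=37 spelling:

```python
class MyClass:
    # What `def my_function(self, input1, input2, self.my_member_variable=5)`
    # would desugar to under the proposed rule.
    def my_function(self, input1, input2, my_member_variable=5):
        self.my_member_variable = my_member_variable


my_class = MyClass()
my_class.my_function(input1=3, input2=2, my_member_variable=37)
print(my_class.my_member_variable)  # 37
```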

Interested to hear what everyone thinks. I’ve always wanted something like this in Python, but maybe there are good reasons why it would be worse than the existing system.

Also, my apologies if this is an old idea, but I could not find any duplicates of it. I posted part of this idea yesterday, but realized that I had not defined how functions could be called with explicit keyword arguments such as f(self.my_variable=3), and have added this to the end of the post.


I don’t think the proposal is an improvement over using @dataclass with __post_init__.

The use of __post_init__ does have drawbacks, primarily when you’re using frozen classes. I’ve resorted to using factory functions to resolve those problems myself. So I think there is room for improvement.

But my issues with this proposal are twofold:

  1. It is too much magic.
  2. It seems to misunderstand what self is. self is not a magic keyword; it is just the conventional name of the parameter that receives the instance. The following is valid Python:
class C:
    def __init__(it):
        it.a = 1

    def b(itself):
        return itself.a + 1

c = C()
assert c.b() == 2
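The standard-library keyword module confirms this: self is not in Python's keyword list, whereas a name like class is:

```python
import keyword

print(keyword.iskeyword("self"))   # False: just a conventional parameter name
print(keyword.iskeyword("class"))  # True: a reserved keyword
```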

Well, __post_init__ breaks with inheritance - so this is clearly different.

Moreover, the OP’s proposal is universal, not just for dataclasses.

That said, for the time being I am at +0 for the proposal - but if I can’t see a real downside that might change. And competing with dataclass.__post_init__ and being better at it is not really a downside.
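One possible reading of the inheritance point (an assumption, not necessarily what was meant): the generated __init__ calls only the most derived __post_init__, so a subclass that defines its own hook must call super().__post_init__() explicitly, or the parent's derived attributes are never set. A minimal sketch with hypothetical class names:

```python
from dataclasses import dataclass, field


@dataclass
class Base:
    x: int
    doubled: int = field(init=False)

    def __post_init__(self):
        self.doubled = 2 * self.x


@dataclass
class Forgetful(Base):
    y: int = 0
    total: int = field(init=False)

    def __post_init__(self):
        # Base.__post_init__ is NOT called automatically here.
        self.total = self.x + self.y


@dataclass
class Careful(Base):
    y: int = 0
    total: int = field(init=False)

    def __post_init__(self):
        super().__post_init__()   # explicitly run the parent's hook
        self.total = self.x + self.y


print(hasattr(Forgetful(1, 2), "doubled"))  # False
print(Careful(1, 2).doubled)                # 2
```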


IMO this would add confusion and not clarity. It is, at best, syntactic sugar that doesn’t provide any tangible benefit.

That’s an interesting point about self not being a keyword, which I was not aware of.

When working on this idea, I considered other symbols to represent that a function input should be passed directly to a class member variable, including class_name.myvariable or an entirely separate symbol &my_variable, but self seemed the most obvious.

that doesn’t provide any tangible benefit.

I think the claim of “no tangible benefit”, unless you consider it subjective, is simply incorrect.

The wide adoption of dataclasses proves the contrary: eliminating these plain boilerplate assignment lines in __init__ code is welcome and positive.

There definitely is a tangible benefit there.
If you dislike the idea, you can find other arguments against it.

I myself find it odd to read, which is why I was not immediately in favour of it. But I expect downsides to be real downsides.

For one, similar proposals have shown up from time to time, but the major problem is that they would use a decorator on __init__, and there would be no simple way to distinguish parameters that should be assigned to same-named attributes from those which should not. Not without typing the parameter names again, that is, which would defeat the “don’t repeat yourself” aspect of the proposal anyway.


While this might be the most obvious approach, it leaks the implementation (explicitly assigning member variables vs. using the new mechanism) to the API of your class. That speaks against this new approach.

For simple cases, one could already use a decorator, like

class MyClass:
    @store_params
    def __init__(self, input1, input2, my_member_variable): pass

A possible implementation would be:

import inspect, functools

def store_params(func):
    s = inspect.signature(func)
    if len(s.parameters) < 1:
        raise TypeError("self parameter expected.")
    first = next(iter(s.parameters))
   
    @functools.wraps(func)
    def wrapped_func(*args, **kwargs):
        ba = s.bind(*args, **kwargs)
        ba.apply_defaults()
        self = ba.arguments[first]
        for k, v in ba.arguments.items():
            if k != first:
                setattr(self, k, v)

        return func(*args, **kwargs)
   
    return wrapped_func
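A short usage example of the store_params decorator above (the function is repeated so this block runs on its own; the default value and the total attribute are illustrative). Because functools.wraps sets __wrapped__, inspect.signature still reports the original parameter names:

```python
import functools
import inspect


# store_params as defined above, repeated so this example is self-contained.
def store_params(func):
    s = inspect.signature(func)
    if len(s.parameters) < 1:
        raise TypeError("self parameter expected.")
    first = next(iter(s.parameters))

    @functools.wraps(func)
    def wrapped_func(*args, **kwargs):
        ba = s.bind(*args, **kwargs)
        ba.apply_defaults()
        self = ba.arguments[first]
        for k, v in ba.arguments.items():
            if k != first:
                setattr(self, k, v)
        return func(*args, **kwargs)

    return wrapped_func


class MyClass:
    @store_params
    def __init__(self, input1, input2, my_member_variable=5):
        # The attributes are already set when the body runs.
        self.total = self.input1 + self.input2


obj = MyClass(1, 2)
print(obj.input1, obj.input2, obj.my_member_variable, obj.total)  # 1 2 5 3
print(inspect.signature(MyClass.__init__))
# (self, input1, input2, my_member_variable=5)
```

Note that every parameter becomes an attribute; there is no way to opt a parameter out without listing names again.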

What should happen if a name different to the first parameter is used?

def __init__(self, foo.x, bar.y): ...

SyntaxError? NameError?


What’s the execution order if the name in the parameter list is assigned to again in the function body?
Presumably, the parameter list assignments happen first.


What happens if the name referenced is a descriptor? How does that interact with type hints?


Should the self. part appear in the function signature, for example when using help() or inspect.Signature?

If someone has a clever way to lift dataclass-style implicit initializers into a language feature, I think that would be well worth discussing.

This particular incarnation isn’t it though.

It would be odd (I’d say wrong) to have a signature syntax which is only valid for __init__, so this way of phrasing it doesn’t generalize well.

# what does this do?
def foo(this.x: int = 1):
    pass

I also think this is unclear in terms of what the attribute access means even in the __init__ context. It reads as access, but it really means an assignment statement gets implicitly injected into your function. That’s, IMO, bound to confuse people.

The proposed feature gets a -1 from me, but +1 to people trying to think of ways of making initializers tidier. If there’s a way to do it that works in the broader context of the language, that would be neat.


I’m sorry to enter this thread and to propose something else. But simplifying class variable initialization at the cost of complicating the parameter list in the function definition looks like a zero-sum change. The parameter list could already be quite crowded with type hints. For me, the main point that should be addressed is the tedious repetition when there are several parameters to be copied:

def __init__(self, a, b, c):
    self.a = a
    self.b = b
    self.c = c

My perception of simplification goes more in this direction:

def __init__(self, a, b, c):
    self.{a, b, c} = a, b, c

Again, my apologies, I have not researched previous ideas and corresponding feedback in this regard.
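For what it's worth, plain tuple unpacking onto attribute targets already works today and gets part of the way there, although each name is still written twice. A sketch with a hypothetical class name:

```python
class Point3:
    def __init__(self, a, b, c):
        # Attribute references are valid targets in an unpacking assignment.
        self.a, self.b, self.c = a, b, c


p = Point3(1, 2, 3)
print(p.a, p.b, p.c)  # 1 2 3
```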


(As for the calling convention the OP proposes, I think it is a separate thing; I see no need for it, or even any utility.)

As for the decorator approach: there actually is such a decorator, and it is called “dataclass” in the stdlib. The problem with the decorator approach is that you can’t mix parameters whose values should be stored directly as attributes with other parameters that will be used in other ways. dataclass does not automate that either, and any other decorator approach would require the “exception” (either the parameters to be assigned as attributes, or the converse) to be spelled out again. And if one has to type each parameter name twice anyway, the self.parameter = parameter assignment already exists.

This proposal allows for fine-grained selection of the parameters that should be stored as attributes automatically, and we’d need special syntax for this fine-grained selection. That is the difference between this and previous discussions on similar topics.

Since we are in “Ideas”, maybe prefixing the parameter name with a different symbol could also work, since parameter names are not generic Python code:

...
def __init__(self, @orbit_radius, @mass, is_a_moon):
    self.distance_to_star = self.orbit_radius if not is_a_moon else None
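The @ prefix is not valid Python today. Read with the same desugaring as the original proposal, the example would correspond to the sketch below; the class name Planet stands in for the elided header and the sample values are arbitrary:

```python
class Planet:
    # Equivalent of `def __init__(self, @orbit_radius, @mass, is_a_moon)`:
    # the marked parameters are stored as attributes, is_a_moon stays local.
    def __init__(self, orbit_radius, mass, is_a_moon):
        self.orbit_radius = orbit_radius
        self.mass = mass
        self.distance_to_star = self.orbit_radius if not is_a_moon else None


earth = Planet(1.5e11, 6.0e24, is_a_moon=False)
moon = Planet(3.8e8, 7.3e22, is_a_moon=True)
print(earth.distance_to_star)      # 150000000000.0
print(moon.distance_to_star)       # None
print(hasattr(moon, "is_a_moon"))  # False
```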

Yes, you can:

from dataclasses import InitVar, dataclass, field


@dataclass
class C:
    a: int
    b: int = field(init=False)
    c: InitVar[int]

    def __post_init__(self, c):
        self.b = self.a + c

c = C(1, c=2)

assert repr(c) == "C(a=1, b=3)"

Hope this helps.


Without reading the whole discussion, my take on the quoted bit is that this is completely missing the mark.

There are not two “versions of a variable” - there are two completely separate values, so it makes sense for them to have different names applied to them. In the example above r marks the value of the argument passed into the __init__ method, and self.r is the value of the attribute r on the current instance.

The described use case is only one of a number of different ways arguments can be treated in a function or method, and I personally don’t think it is common enough to warrant a special-case syntax.

I could phrase it like this: The coder wants to store a variable as a class variable. In practice, they pass the variable to an initializer, which creates a new variable (literal or reference), which the coder then uses to set the class member variable. That middle step is the part this idea would try to get rid of (overlooking, for a moment, the challenges that other posters have raised.)


I’m sorry, I don’t understand what you mean by “that middle step”. In your sentence, the only middle part is “which creates a new variable (literal or reference)”, but that does not describe what is happening.

When you call a function passing an object as an argument, it is assigned another name which only exists in the scope of that function; nothing else gets created.
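A small demonstration of that point: inside the function, the parameter is just another name bound to the very same object, which the is operator confirms. Storing it on an instance likewise stores a reference, not a copy (all names here are illustrative):

```python
def receives(value):
    # `value` is a new name in this scope, bound to the caller's object.
    return value


data = [1, 2, 3]
print(receives(data) is data)  # True: same object, no copy made


class Holder:
    def __init__(self, items):
        self.items = items  # the attribute refers to the same list object


h = Holder(data)
data.append(4)
print(h.items)  # [1, 2, 3, 4]
```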

dataclass is a decorator for the class, not for the constructor, and it does much more than just assign parameters.