How to know which parameters were explicitly given to a dataclass?

I am looking for a design for the following problem. There is an executable with dozens of command line options. It is called from within Python using the subprocess module, and the user can specify the options; the specified options are then printed. Some of the options are obligatory, others are optional.

Since there are many options, I thought of using a dataclass to store them. Here is a prototype:

from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    """Settings for the executable

    The following options are available:

    setting1:    required
    setting2:    optional
    ...
    """
    setting1: int
    setting2: bool = True
    # dozens of other options follow ...

    def command_line(self) -> str:
        """Form command line options from the settings

        Returns:
            str: Command line arguments
        """
        cmd = ""
        # Pseudocode
        # if setting2 was explicitly provided by the user:
        #     cmd += ...
        # if setting3 was explicitly provided by the user:
        #     cmd += ...
        return cmd

I only want to get the arguments explicitly given by the user. The problem is that once the construction of the dataclass object finishes, we don’t know e.g. whether setting2 was provided by the user or filled in by the dataclass. As opposed to __post_init__, there is no pre-init method in dataclasses. If I define the __init__ method of Settings myself, with dozens of parameters in the parameter list, just to see which parameters were actually given by the user, it defeats the whole purpose of using a dataclass.

Do you have any ideas? Or a solution other than a dataclass? In the worst case, I will create a standard class with dozens of parameters in its __init__, but I am wondering if there is a simpler way. Even better if it interacts well with the autocompletion of IDEs (e.g. avoid putting optional arguments in `**kwargs`, as that won’t display the optional arguments).

Use None as the default, i.e. a value clearly distinct from all valid options.
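A minimal sketch of that approach; the `--name=value` flag syntax is made up for illustration:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Settings:
    setting1: int                    # required
    setting2: Optional[bool] = None  # None means "not given by the user"

    def command_line(self) -> str:
        # Emit only the options that are not the None sentinel
        parts = [f"--setting1={self.setting1}"]
        if self.setting2 is not None:
            parts.append(f"--setting2={self.setting2}")
        return " ".join(parts)

print(Settings(1).command_line())                  # --setting1=1
print(Settings(1, setting2=False).command_line())  # --setting1=1 --setting2=False
```

Note the trade-off: the real default for setting2 then has to live elsewhere (e.g. in the executable itself), since None now only means "not given".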


But I think you have a fundamental conceptual flaw here: why do you care whether the user wrote setting2=True or left it off completely? That should not make a semantic difference; if it does, your design is incorrect.


For the representation of setting2 in the dataclass, it does not matter whether the user provided it or the dataclass filled in the default. However, it does matter when printing to the log file which options were specified.

Genuinely: what is far more interesting is which settings were actually used, including the defaults (e.g. in case defaults change). If you want to keep track of user input, do that at the place where the user input is received, not after it has already been parsed into a dataclass.
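For instance, if the options arrive as a mapping before the dataclass is built, the logging can happen right there; in this sketch the dict literal stands in for whatever input source you actually have:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    setting1: int
    setting2: bool = True

# Keep the raw user input around at the place where it arrives
user_given = {"setting1": 3, "setting2": False}
settings = Settings(**user_given)  # defaults fill in anything not given

# Log exactly what the user supplied, before it disappears into the dataclass
print("user provided:", ", ".join(f"{k}={v}" for k, v in user_given.items()))
# user provided: setting1=3, setting2=False
```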

The user will hardly spot an eventual bug if all the settings are printed (note that there are a lot of them). Most often they only need to provide a few options, as the rest have meaningful default values.

I agree that it must be done before the initialization of the dataclass has finished. But where to do it if not in the __init__?

Outside the dataclass. I don’t think logging user inputs is the dataclasses job.


Me neither, that’s why I thought of simply returning the user-given options in Settings.command_line and the logger will print the text.

What about creating a function that the user interacts with, and which acts as a factory?

from typing import Optional

def settings(setting1: int, setting2: Optional[bool] = None) -> tuple[Settings, list[str]]:
    provided_options = ['setting1']  # initialize with all the required arguments
    if setting2 is None:
        setting2 = True  # provide the default value
    else:
        provided_options.append('setting2')
    return Settings(setting1, setting2=setting2), provided_options

Then Settings.command_line can be removed; it becomes a standalone function that takes the provided_options returned from settings.
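That standalone function might look like this (the `--name=value` flag syntax is made up for illustration):

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class Settings:
    setting1: int
    setting2: bool = True

def command_line(settings: Settings, provided_options: list[str]) -> str:
    # Format only the options the user actually passed to the factory
    values = asdict(settings)
    return " ".join(f"--{name}={values[name]}" for name in provided_options)

print(command_line(Settings(1, setting2=False), ["setting1", "setting2"]))
# --setting1=1 --setting2=False
```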

You could wrap the dataclass and replace its __init__ function with one that writes *args and/or **kwargs to private attributes.

Something like:

from dataclasses import dataclass
import functools

def wraps_class_init(cls):
    dc_init = cls.__init__
    @functools.wraps(dc_init)
    def new_init(self, *args, **kwargs):
        # Write to __dict__ to handle frozen dataclass
        # Will not work on slotted classes
        self.__dict__["_user_args"] = args
        self.__dict__["_user_kwargs"] = kwargs
        dc_init(self, *args, **kwargs)
    cls.__init__ = new_init
    return cls

@wraps_class_init
@dataclass(frozen=True)
class Example:
    answer: int = 42
    name: str = "Arthur"

ex1 = Example()
ex2 = Example(54, name="Zaphod")

print(ex1, ex2)
print(ex1._user_args, ex1._user_kwargs)
print(ex2._user_args, ex2._user_kwargs)

Output:

Example(answer=42, name='Arthur') Example(answer=54, name='Zaphod')
() {}
(54,) {'name': 'Zaphod'}

Edit: **kwargs in this case should be fine as the @dataclass decorator will most likely make the IDE think that the original __init__ is still in place.


I like your wraps_class_init to monkey-patch the __init__ method. And you are right, the IDE integration works fine! As I want to store the names of the options, I will restrict myself to kwargs and eliminate args.
Do you know if the problem in my post has a general name and if your solution is a known design pattern? I have the feeling that I am not the first who wanted to achieve this.

What did you mean by this?

Frozen dataclasses work by making __setattr__ raise a FrozenInstanceError. If I had used self._user_kwargs = kwargs it would fail. Writing directly to the instance dict gets around that - it’s also how the actual dataclass __init__ function works for non-slotted frozen classes.

Edit: If the class was slotted it would also need a slot for _user_kwargs and to use object.__setattr__ directly, as I believe dataclasses does in the frozen, slotted case.
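A variant of the decorator for that case might look like this; note the handwritten __slots__ reserving room for the extra attributes (which also means the slotted fields can no longer have class-level defaults):

```python
from dataclasses import dataclass
import functools

def wraps_class_init(cls):
    dc_init = cls.__init__
    @functools.wraps(dc_init)
    def new_init(self, *args, **kwargs):
        # object.__setattr__ bypasses the frozen __setattr__ and works with __slots__
        object.__setattr__(self, "_user_args", args)
        object.__setattr__(self, "_user_kwargs", kwargs)
        dc_init(self, *args, **kwargs)
    cls.__init__ = new_init
    return cls

@wraps_class_init
@dataclass(frozen=True)
class Example:
    # Manual __slots__ with extra slots for the recorded arguments
    __slots__ = ("answer", "_user_args", "_user_kwargs")
    answer: int

ex = Example(answer=54)
print(ex._user_args, ex._user_kwargs)  # () {'answer': 54}
```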

Did you create this wrapper instead of implementing this functionality in Example.__init__ to make the dataclass Example independent of the feature of storing args and kwargs?

It’s written so it will work on any dataclass without slots.

In fact it should work on any regular class with an __init__ that doesn’t define __slots__, although for normal classes I would probably write the decorator to wrap __init__ directly. This can’t be done for dataclasses, as the __init__ method doesn’t exist until the decorator runs.

Putting anything in the class body for __init__ would prevent dataclasses from writing its own __init__ function.
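This is easy to check: dataclasses only adds its generated __init__ when the class body does not already define one.

```python
from dataclasses import dataclass

@dataclass
class Manual:
    x: int = 0

    def __init__(self):
        # dataclasses sees __init__ already in the class body and leaves it alone
        self.x = 99

print(Manual().x)  # 99
```

Calling `Manual(1)` accordingly raises a TypeError, since the generated field-based __init__ was never written.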

there is no pre-init method in dataclass

There’s __new__. That’s basically what you are looking for. It can accept kwargs and store them on the object before __init__ is called.

How does it compare with David Ellis’ solution?

from dataclasses import dataclass

@dataclass
class Settings:
    v1: str
    v2: int = 0
    v3: str = 'default'

    def __new__(cls, *args, **kwargs):
        # Pass only cls: object.__new__ rejects extra arguments
        # once __new__ is overridden
        self = super().__new__(cls)
        self.__dict__.update(_user_args=args, _user_kwargs=kwargs)
        return self


Using __new__ works, but one downside over wrapping __init__ is that some tools will now see the signature of __new__ either instead of or alongside __init__.

For example, inspect.signature of the class will show the __new__ signature:

  • inspect.signature(Settings) shows (*args, **kwargs)
  • inspect.signature(Example) shows (answer: int = 42, name: str = 'Arthur') -> None

This is shown in things like help() interactively.

Another option would be to intercept the call arguments in the metaclass’s __call__ method, so it won’t matter whether the constructor defines its parameters through __new__ or __init__:

from dataclasses import dataclass

class RecordArgsMeta(type):
    def __call__(cls, *args, **kwargs):
        obj = super().__call__(*args, **kwargs)
        vars(obj)['_called_with'] = args, kwargs
        return obj

@dataclass(frozen=True)
class Settings(metaclass=RecordArgsMeta):
    setting1: int = 0
    setting2: bool = True

print(Settings(setting2=True)._called_with) # ((), {'setting2': True})

The __call__ method can be patched in through a decorator instead of a metaclass if preferred.
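One possible shape for such a decorator (the record_call_args name is just for illustration) is to rebuild the class under a metaclass that overrides __call__:

```python
from dataclasses import dataclass

def record_call_args(cls):
    class Meta(type(cls)):
        def __call__(klass, *args, **kwargs):
            obj = super().__call__(*args, **kwargs)
            vars(obj)['_called_with'] = args, kwargs
            return obj
    # Recreate the class under the new metaclass, inheriting everything else
    return Meta(cls.__name__, (cls,), {})

@record_call_args
@dataclass(frozen=True)
class Settings:
    setting1: int = 0
    setting2: bool = True

print(Settings(setting2=True)._called_with)  # ((), {'setting2': True})
```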


This still has the issue of losing the __init__ signature on inspection. inspect.signature will return the signature of __call__ in the same way that the other example would show the signature of __new__.


Good catch. Fixed then:

from functools import wraps
from inspect import signature
from dataclasses import dataclass

class RecordArgsMeta(type):
    @wraps(type.__call__)
    def __call__(cls, *args, **kwargs):
        obj = super().__call__(*args, **kwargs)
        vars(obj)['_called_with'] = args, kwargs
        return obj

@dataclass(frozen=True)
class Settings(metaclass=RecordArgsMeta):
    setting1: int = 0
    setting2: bool = True

print(Settings(setting2=True)._called_with) # ((), {'setting2': True})
print(signature(Settings)) # (setting1: int = 0, setting2: bool = True) -> None