Dataclasses - Sentinel to Stop creating "Field" instances

HPou · April 21, 2024, 4:30pm

was addressed above. The attribute is not to be managed by the dataclass machinery. __post_init__ is dataclass machinery and forces the definition of a method which may not be desired.
The value is initialized because it is either addressed during declaration or because someone uses it later in an instance. Avoiding initialization in __init__ and/or __post_init__ is exactly what is desired.

Kxnr · April 21, 2024, 4:42pm

I understand (mostly) what the goal is, I’m trying to understand why that’s the goal. Do you have an example of somewhere that the current options cause problems?

Could you clarify? I don’t understand this answer

DavidCEllis · April 21, 2024, 4:44pm

I’d be wary of using a with block if it requires sys._getframe (or the inspect equivalent) to extract __annotations__ from the frame object 1 frame up.

The documentation for sys._getframe does state:

CPython implementation detail: This function should be used for internal and specialized purposes only. It is not guaranteed to exist in all implementations of Python.

HPou · April 21, 2024, 4:52pm

Nobody has stated that there is a “problem”, only that it would be rather convenient to have attributes which are not managed by the dataclass machinery. Why does one need to define an attribute with = field(...) or be forced to mark it with InitVar and address it in __post_init__?

Such an attribute is incredibly useful from a documentation point of view and useful if it can have a default value.

As Cornelius Krupp pointed out in the first answer:

For me the most common usage is cache-like fields that get computed on demand

You can declare the attribute with a value or without a value. When initialization happens is decided by the user in the code. It can have a default value, given during declaration, or it can have an on demand calculated value. The point is NOT the initialization, that is not being addressed here.

HPou · April 21, 2024, 4:55pm

In the meantime, following Cornelius Krupp’s suggestion around the @ operator, I have implemented a decorator which supports that functionality.

It is a working first draft. It makes some naive assumptions (such as: declared attributes have a default value), but I will work on that. I simply wanted to see how it would work and I have to say that the syntax is rather appealing, a lot more than using the context manager I would say.

#!/usr/bin/env python
# -*- coding: utf-8; py-indent-offset:4 -*-
###############################################################################
from __future__ import annotations
from collections.abc import Callable

import dataclasses

# Imports meant for re-export - ignore non-used and values cannot be determined
from dataclasses import *  # noqa: F403 F401

# Specific imports for development error-checking
from dataclasses import dataclass as _dataclass, KW_ONLY

import inspect
from typing import overload


__all__ = [
    "at_dataclass",
] + dataclasses.__all__


@overload
def at_dataclass(cls: None = None, **kwargs) -> Callable[[type], type]:
    ...


@overload
def at_dataclass(cls: type, **kwargs) -> type:
    ...


class _NO_FIELD_TYPE:
    pass


NO_FIELD = _NO_FIELD_TYPE()


class _NO_INIT_TYPE:
    pass


NO_INIT = _NO_INIT_TYPE()


class _NO_INIT_FACTORY_TYPE:
    pass


NO_INIT_FACTORY = _NO_INIT_FACTORY_TYPE()


ANN_MARKER = "@"
WSPACE = " "
ANN_OPEN = "["
ANN_CLOSE = "]"


def at_dataclass(cls: type | None = None, **kwargs) -> type | Callable[[type], type]:

    # actual decorator for when cls is not None
    def _annotifier(cls: type) -> type:
        # Fetch the raw annotations without evaluating them (eval_str stays
        # False): the "@" markers are parsed from the string form produced by
        # from __future__ import annotations
        # print(f"{cls = }")
        no_fields = {}

        for name, annotation in inspect.get_annotations(cls).items():
            if not isinstance(annotation, str):
                continue

            try:
                _type, f_ann = annotation.split(maxsplit=1)
            except ValueError:
                continue  # splitting was not possible

            if not f_ann.startswith(ANN_MARKER):
                continue

            if not f_ann[1] == WSPACE:
                _, s_subannotations = f_ann.split(WSPACE, maxsplit=1)
            else:
                s_subannotations = f_ann[1:]

            if s_subannotations[1] == ANN_OPEN and s_subannotations[-1] == ANN_CLOSE:
                subannotations = eval(s_subannotations)
            else:
                subannotations = eval(f"[{s_subannotations}]")

            if NO_FIELD in subannotations:  # remove from annotations
                cls.__annotations__.pop(name)
                no_fields[name] = _type

            elif NO_INIT in subannotations:
                setattr(cls, name, field(init=False, default=getattr(cls, name)))
                cls.__annotations__[name] = _type

            elif NO_INIT_FACTORY in subannotations:
                setattr(cls, name, field(init=False, default_factory=getattr(cls, name)))
                cls.__annotations__[name] = _type

            elif KW_ONLY in subannotations:
                setattr(cls, name, field(kw_only=True, default=getattr(cls, name)))
                cls.__annotations__[name] = _type

        dataclassed = _dataclass(cls, **kwargs)  # apply std dataclass processing

        for name, _type in no_fields.items():
            dataclassed.__annotations__[name] = _type

        return dataclassed

    # decorator functionality when kwargs are used, return real deco (with closure)
    if cls is None:
        return _annotifier  # -> Callable[[type], type]

    # A cls is there, process it
    return _annotifier(cls)  # -> type


# With everything done export at_dataclass as dataclass
dataclass = at_dataclass


class Dummy:
    pass


# Small test
if __name__ == "__main__":
    from dataclasses import field, fields
    from typing import ClassVar

    @dataclass
    class A:
        cv: ClassVar[str] = "classvar"
        a: int = 5
        e: int @ KW_ONLY = 25
        b: int @ NO_FIELD = 7
        c: int @ NO_INIT = 0
        d: int @ [NO_INIT, Dummy] = 0

    a = A()

    print("=" * 80)
    print(f"{a.__annotations__ = }")

    print("=" * 80)
    print(f"{a.cv = }")
    print(f"{a.a = }")
    print(f"{a.b = }")
    for f in fields(A):
        print("-- " + "-" * 70)
        print(f"{f = }")

    print("-" * 70)

    try:
        b = A(a=1, b=2)
    except Exception as e:
        print(f"Exception: {e = }")

    try:
        b = A(a=1, c=2)
    except Exception as e:
        print(f"Exception: {e = }")

    try:
        b = A(a=1, d=2)
    except Exception as e:
        print(f"Exception: {e = }")

    try:
        b = A(1, 2)
    except Exception as e:
        print(f"Exception: {e = }")

    b = A(1, e=2)

The output of the test cases

================================================================================
a.__annotations__ = {'cv': 'ClassVar[str]', 'a': 'int', 'e': 'int', 'c': 'int', 'd': 'int', 'b': 'int'}
================================================================================
a.cv = 'classvar'
a.a = 5
a.b = 7
-- ----------------------------------------------------------------------
f = Field(name='a',type='int',default=5,default_factory=<dataclasses._MISSING_TYPE object at 0x0000016145B01DD0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD)
-- ----------------------------------------------------------------------
f = Field(name='e',type='int',default=25,default_factory=<dataclasses._MISSING_TYPE object at 0x0000016145B01DD0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=True,_field_type=_FIELD)
-- ----------------------------------------------------------------------
f = Field(name='c',type='int',default=0,default_factory=<dataclasses._MISSING_TYPE object at 0x0000016145B01DD0>,init=False,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD)
-- ----------------------------------------------------------------------
f = Field(name='d',type='int',default=0,default_factory=<dataclasses._MISSING_TYPE object at 0x0000016145B01DD0>,init=False,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD)
----------------------------------------------------------------------
Exception: e = TypeError("A.__init__() got an unexpected keyword argument 'b'")
Exception: e = TypeError("A.__init__() got an unexpected keyword argument 'c'")
Exception: e = TypeError("A.__init__() got an unexpected keyword argument 'd'")
Exception: e = TypeError('A.__init__() takes from 1 to 2 positional arguments but 3 were given')
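
The draft above finds the marker by splitting the annotation string on whitespace, which would skip a type that itself contains a space, such as dict[str, int]. A minimal sketch of one possible alternative, assuming the annotations are strings (as they are under from __future__ import annotations): parse the string with ast and look for a top-level @ node. The helper name split_marker is hypothetical and not part of the draft.

```python
import ast
from typing import Optional


def split_marker(annotation: str) -> "tuple[str, Optional[str]]":
    """Split 'TYPE @ MARKERS' into (TYPE, MARKERS); MARKERS is None if absent."""
    node = ast.parse(annotation, mode="eval").body
    # "a @ b" parses as a BinOp whose operator is MatMult
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.MatMult):
        return ast.unparse(node.left), ast.unparse(node.right)
    return annotation, None


print(split_marker("int @ NO_FIELD"))
print(split_marker("dict[str, int] @ [NO_INIT, Dummy]"))
print(split_marker("int"))
```

Because @ binds more tightly than |, a spelling such as int @ NO_INIT | Dummy() parses with | at the top level, so a fuller version would have to handle that case separately.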

pf_moore · April 21, 2024, 7:29pm

You can have that right now:

from dataclasses import dataclass

@dataclass
class C:
    x: int
    y: int

    def incrementing_sum(self):
        self.z = getattr(self, "z", 0) + 1
        return self.x + self.y + self.z

c1 = C(2,3)
c2 = C(4,5)
print(c1.incrementing_sum()) # 6
print(c1.incrementing_sum()) # 7
print(c2.incrementing_sum()) # 10
print(c2.incrementing_sum()) # 11

Clearly, that’s not what you mean, but it’s not at all clear what you do mean by your statement. Which is why people are asking for an example of the sort of code you’d like to be able to write, but cannot at the moment.

So yes, can you please give an example of a problem that needs this feature, so that we can understand what you’re asking for?

HPou · April 21, 2024, 7:43pm

I cannot have anything with the example because z is not declared, its value is simply set during the execution of a method. It is indeed not what I mean.

There isn’t a “problem” and nobody has stated that there is one.

Right now:

all declared attributes, except those marked with ClassVar will be considered by the dataclasses machinery

Afterwards:

some of the attributes, those declared as NO_FIELD (or whatever someone ends up with), are also not considered, yet they are not ClassVar, because the latter is meant not to be modified by instances (according to the Python documentation; there is no actual restriction).

asdict would return all attributes except those marked as NO_FIELD, which is what I may wish to carry across a socket communication, for example.

The NO_FIELD use case from Cornelius Krupp may be a cache value which is not meant to be transported over a socket to recreate state at the other end, but which is desirable to have declared as a class attribute, for consistency and for documentation purposes, and which may also have a default value at the start of execution (it may also have no value).
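
To make the “right now” behaviour concrete, here is a small sketch using only the standard dataclasses API. The class Point and its attribute names are made up for illustration. It shows that a ClassVar is skipped by fields and asdict, while an attribute declared with field(init=False, ...) is still a Field and still appears in both.

```python
from dataclasses import dataclass, field, fields, asdict
from typing import ClassVar, Optional


@dataclass
class Point:
    x: int
    y: int
    instances: ClassVar[int] = 0  # ignored by the dataclass machinery
    # Kept out of __init__, repr and comparisons, but still a Field
    norm_cache: Optional[float] = field(
        default=None, init=False, repr=False, compare=False
    )


p = Point(3, 4)
field_names = [f.name for f in fields(p)]
state = asdict(p)
print(field_names)
print(state)
```

The cache attribute ends up in the dictionary that would be sent over the wire, which is the behaviour the NO_FIELD marker is meant to avoid.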

pf_moore · April 21, 2024, 7:49pm

So your requirement is to be able to “declare” the attribute? OK, if that’s what you want then yes, what I showed doesn’t do that. But personally, I don’t think this requirement is important enough to need anything more than the current approach of x: str = field(init=False, ...).

blhsing · April 22, 2024, 1:40am

That doesn’t stop many existing modules in the stdlib from using sys._getframe.

Also, the caller’s frame can be obtained from a traceback object as a fallback if sys._getframe does not exist:

def _getframe(level=0):
    try:
        raise Exception
    except Exception as e:
        frame = e.__traceback__.tb_frame.f_back
        for _ in range(level):
            frame = frame.f_back
        return frame
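
A short usage sketch of the fallback above, under the assumption that it runs on an implementation where exc.__traceback__ is available (CPython here). The function names getframe_via_traceback, caller_name, outer and same_frame are hypothetical. It checks that the traceback-based helper resolves the same frames that sys._getframe would.

```python
import sys


def getframe_via_traceback(level=0):
    """Return the caller's frame, `level` frames further up, without sys._getframe."""
    try:
        raise Exception
    except Exception as exc:
        # tb_frame is this helper's own frame; f_back is the caller's frame
        frame = exc.__traceback__.tb_frame.f_back
    for _ in range(level):
        frame = frame.f_back
    return frame


def caller_name():
    # One level above caller_name itself
    return getframe_via_traceback(1).f_code.co_name


def outer():
    return caller_name()


def same_frame():
    # Only meaningful where sys._getframe exists
    return getframe_via_traceback() is sys._getframe()


print(outer())
print(same_frame())
```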

blhsing · April 22, 2024, 2:43am

Hask Pou:

Would this also be an idea or part of the idea?
@dataclass
class Foo:
    with no_init:
        a: int = 1
In this case the no_init will translate the annotation to:
a: int = field(init=False, default=1)
Imho, having a default value of field(....) is ugly and not straightforward when it comes to understanding what it is, whereas
a: int = 1
is clean and clear and only the scope of the context manager determines that it undergoes a translation to the dataclass expected syntax to avoid having a as part of __init__.

Since the default value of a field that is not declared with field is available simply as the value of the variable, one can turn the default value into a Field object if it isn’t already one, so that its init attribute can be set to False to achieve a no_init transformation:

import sys
from dataclasses import dataclass, field, Field

class NoInit:
    def __enter__(self):
        self.starting_names = set(sys._getframe(1).f_locals['__annotations__'])

    def __exit__(self, exc_type, exc_val, exc_tb):
        caller_locals = sys._getframe(1).f_locals
        annotations = caller_locals['__annotations__']
        for name in annotations.keys() - self.starting_names:
            if name not in caller_locals:
                default = field()
            elif not isinstance(default := caller_locals[name], Field):
                default = field(default=default)
            default.init = False
            caller_locals[name] = default

no_init = NoInit()

so that:

@dataclass
class Foo:
    a: str = ''
    with no_init:
        b: int = 1
        c: list = field(repr=False, default_factory=list)

print(Foo()) # outputs Foo(a='', b=1)
print(Foo('foo', 2)) # TypeError: Foo.__init__() takes from 1 to 2 positional arguments but 3 were given

This is also how KW_ONLY can be potentially turned into a context manager.

HPou · April 22, 2024, 7:57am

That approach has the attribute as part of the answer given by fields and asdict, which goes exactly against the wish: declare it but have the dataclass machinery ignore it. Without having to declare it as a ClassVar and fight another war against type checkers, which will complain if the value is set in an instance.

HPou · April 22, 2024, 8:19am

I have implemented a more complete form of the decorator to use @ as a proof-of-concept.

Github Gist: Dataclass with Field Annotations using @

This code

    @at_dataclass
    class A:
        a: int
        b: int @ KW_ONLY = 25
        c: int @ NO_INIT = 5
        d: list[str] @ NO_INIT_FACTORY = list
        e: int @ NO_INIT | Dummy() | Dummy() = 0
        f: int @ [NO_INIT, Dummy(), Dummy()] = 1
        g: int @ NO_FIELD = 7

will translate to this

    @dataclass
    class A:
        a: int
        b: int = field(kw_only=True, default=25)  # or declared after _: KW_ONLY
        c: int = field(init=False, default=5)
        d: list[str] = field(init=False, default_factory=list)
        e: Annotated[int, Dummy(), Dummy()] = field(init=False, default=0)
        f: Annotated[int, Dummy(), Dummy()] = field(init=False, default=1)
        # The NO_FIELD is not managed by `dataclass`
        # That cannot be expressed in the translation
        g: int = 7

DavidCEllis · April 22, 2024, 1:09pm

I haven’t found any stdlib cases where it’s used to pull details from a class while it’s being created, are there any examples of this I’ve missed?

Outside of tests, the most common usages I can find are tools specifically for looking at the interpreter stack, in error handling or for debugging.

The other use case I see is in attempting to retrieve the module name, usually to fix some other internal issue. You can see this already in dataclasses where there’s an attempt to get the module name in order to patch a dynamically created class and make pickle work correctly.

In these cases if sys._getframe doesn’t exist they don’t achieve what they are intended for, but they don’t cause an exception or break the core function of the module.

Ben Hsing:

Also, the caller’s frame can be obtained from a traceback object as a fallback if sys._getframe does not exist:
def _getframe(level=0):
    try:
        raise Exception
    except Exception as e:
        frame = e.__traceback__.tb_frame.f_back
        for _ in range(level):
            frame = frame.f_back
        return frame

I’m not sure I’d consider deliberately raising an exception in order to extract information about the class being created from the traceback an improvement [1].

[1] logging does have some similar looking code, but I’ll note that that code is accompanied by a pragma: no cover flag.

blhsing · April 22, 2024, 2:07pm

David Ellis:

I’m not sure I’d consider deliberately raising an exception in order to extract information about the class being created from the traceback an improvement[1].

While I can agree that most usages of sys._getframe in the standard library are for non-essential purposes, its use in the logging module to obtain the caller’s frame as well as the alternative implementation using the traceback object is exactly what I was talking about, and I fail to see why you consider catching a raised exception to extract the current frame to be a bad idea when the code uses only publicly documented features available to all implementations of the language:

github.com/python/cpython

Lib/logging/__init__.py

287d939ed

    """
    with _lock:
        _levelToName[level] = levelName
        _nameToLevel[levelName] = level

if hasattr(sys, "_getframe"):
    currentframe = lambda: sys._getframe(1)
else: #pragma: no cover
    def currentframe():
        """Return the frame object for the caller's stack frame."""
        try:
            raise Exception
        except Exception as exc:
            return exc.__traceback__.tb_frame.f_back

#
# _srcfile is used when walking the stack to check when we've got the first
# caller stack frame, by skipping frames whose filename is that of this
# module's source. It therefore should contain the filename of this module's
# source file.
#

By the way, one can also use sys.setprofile to obtain the caller’s frame, though I consider the traceback approach to be cleaner:

import sys

def _getframe(level=0):
    def profiler(frame, event, arg):
        nonlocal caller_frame
        caller_frame = frame.f_back
    caller_frame = None
    current_profiler = sys.getprofile()
    sys.setprofile(profiler)
    (lambda: 1)()
    sys.setprofile(current_profiler)
    for _ in range(level):
        caller_frame = caller_frame.f_back
    return caller_frame

Also note that the magical super() also uses the equivalent of sys._getframe in its C implementation to obtain the caller’s frame.

DavidCEllis · April 22, 2024, 3:44pm

Ben Hsing:

While I can agree that most usages of sys._getframe in the standard library are for non-essential purposes, its use in the logging module to obtain the caller’s frame as well as the alternative implementation using the traceback object is exactly what I was talking about, and I fail to see why you consider catching a raised exception to extract the current frame to be a bad idea when the code uses only publicly documented features available to all implementations of the language:

It’s not so much that I consider it a “bad idea” as I consider it to be a work-around. You’re causing an error in order to extract some information you normally wouldn’t have access to.

Out of CPython, PyPy, GraalPy, IronPython [1] and MicroPython the only one I know that doesn’t support sys._getframe is MicroPython, under which exc.__traceback__ is also an AttributeError. So in the only implementation I’m aware of where this function is unavailable, the replacement wouldn’t work [2]. I didn’t manage to find an implementation that supports one but not the other.

I think the subtler issue I have is that I don’t expect a context manager to be managing attributes outside of itself, based on location in the stack.

If it’s in the C code then I’d consider it to be covered by “internal and specialized purposes”.

[1] The collections module has a comment that implies _getframe doesn’t work for level > 0 but this seems to be outdated as it worked on testing.
[2] I know MicroPython doesn’t support dataclasses.

NeilGirdhar · April 22, 2024, 6:56pm

Some thoughts:

There should be one way to do things. Proposing an alternative way of doing something that’s already possible is only beneficial if your alternative is significantly better than what we have. I personally don’t think that annotating the type is significantly better than calling the field function.
If you really want _: NO_FIELD, then please call it NO_INIT to match the parameter to field. It makes sense, but I think it would help to show how prevalent this pattern is. Have you measured this?
I find NO_INIT_FACTORY very confusing to read. Code should be easy to read even if it takes longer to type.

MegaIng · April 22, 2024, 7:44pm

But init=False has different behavior. The behavior as described by OP currently cannot be recreated without defining __post_init__ or __init__ and adding the annotations within those functions, where static type checkers find it, but dataclasses doesn’t (or one can completely misuse ClassVar, which static type checkers would still complain about). Therefore the suggestion to name it NO_INIT is completely missing the point. NO_FIELD might not be a good name, but NO_INIT is a worse one.

The following currently throws an error:

from dataclasses import dataclass, field, asdict

@dataclass
class Foo:
    bar: int = field(init=False, repr=False, compare=False)

print(asdict(Foo()))

Because bar is being looked up on the instance. A good proxy for what OP wants is that however bar is changed to be annotated, it should no longer throw an error because the dataclasses machinery completely ignores it.
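
For comparison, the __post_init__ workaround mentioned above can be sketched as follows. The class and attribute names simply mirror the example; the point is that an attribute annotated inside __post_init__ is visible to static type checkers but unknown to fields and asdict, at the cost of having to define the method.

```python
from dataclasses import dataclass, asdict, fields


@dataclass
class Foo:
    x: int = 1

    def __post_init__(self) -> None:
        # Annotated on the instance: type checkers see it, dataclasses does not
        self.bar: int = 0


foo = Foo()
print(asdict(foo))
print([f.name for f in fields(foo)])
print(foo.bar)
```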

mikeshardmind · April 22, 2024, 8:26pm

This seems like a really niche case that should probably be outside of the scope of dataclasses. If it’s not a good fit for dataclasses, simply don’t use dataclasses, not all behaviors need to be crammed into this.

You can keep the non-data behavior in another class and compose it with multiple inheritance, this works to exclude from dataclass machinery:

>>> from dataclasses import dataclass, asdict
>>>
>>> @dataclass
... class X:
...     x: int = 1
...
>>> class Y:
...     y: int = 2
...
>>> class Z(X, Y):
...     pass
...
>>> asdict(Z())
{'x': 1}

HPou · April 22, 2024, 9:17pm

I could also do

@dataclass
class A:
    a: int = 5

A.b = 7
A.__annotations__['b'] = int

and I save myself the trouble and added step of inheritance. Does it cover what is proposed? No. Neither does using InitVar with __post_init__, using ClassVar, or creating the attribute dynamically in a method.

The proposal may or may not have merit, but the fact that one can add attributes to a class or an instance at some point in time after declaration is for sure not an argument against it.

One can add an attribute to an object without even knowing whether it is a class, or which class it is.

This is about “declaring” that my “dataclass” has (or is going to have) an attribute which is not a ClassVar and is to be ignored by the dataclass machinery because it’s not relevant to recreating the object, as expressed above, after having transmitted the object state over a socket.

It is not going to be “crammed” into dataclasses, because dataclasses have to ignore the NO_FIELD attributes as they ignore ClassVar attributes.

It’s about being explicit about it and not relying on multi-step workarounds to create an attribute.
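
To make the socket use case concrete, here is a sketch of how it can be approximated today with field metadata and a filtering helper. It is only an approximation: the attribute is still a Field, which is exactly what the proposal wants to avoid. The metadata key TRANSIENT, the helper wire_state and the class Session are all hypothetical names chosen for this illustration.

```python
import json
from dataclasses import dataclass, field, fields

TRANSIENT = "transient"  # hypothetical metadata key, not a dataclasses feature


@dataclass
class Session:
    user: str
    token: str
    _cache: dict = field(
        default_factory=dict,
        init=False,
        repr=False,
        compare=False,
        metadata={TRANSIENT: True},
    )


def wire_state(obj):
    """Return only the fields that are meant to cross the wire."""
    return {
        f.name: getattr(obj, f.name)
        for f in fields(obj)
        if not f.metadata.get(TRANSIENT)
    }


s = Session("ann", "t0k3n")
s._cache["expensive"] = 42

payload = json.dumps(wire_state(s))       # what would be sent
restored = Session(**json.loads(payload))  # what the other end rebuilds
print(payload)
print(restored)
```

The explicit helper and metadata are the kind of multi-step workaround that a declarative marker would replace.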

HPou · April 22, 2024, 10:05pm

It is not possible
NO_FIELD is not “field(init=False, …)”
Call it NO_INIT_DEFAULT_FACTORY if you wish