Dataclasses - Sentinel to Stop creating "Field" instances

HPou · April 16, 2024, 10:13pm

I was wondering whether the following would be a good idea

from dataclasses import dataclass, field, KW_ONLY, NO_FIELD

@dataclass
class Example:
    a: bool = True
    _: KW_ONLY
    kw_a: bool = False

    b: str = field(init=False, default="hello")

    _: NO_FIELD
    c: str = "hello"

After NO_FIELD the coder can define as many class attributes as needed without the attributes being considered for __init__ (and actions like astuple).

After having crafted many dataclasses, I often find myself typing x: str = field(init=False, ...), and having to spell out field(...) always feels out of place for attributes I want defined in the class but not managed by the dataclass machinery.
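For reference, a minimal sketch of what the standard library does today (the class name Current is only illustrative): a field declared with field(init=False) is dropped from __init__, but it is still a Field, so it still shows up in fields() and asdict().

```python
from dataclasses import asdict, dataclass, field, fields


@dataclass
class Current:
    a: bool = True
    b: str = field(init=False, default="hello")


obj = Current()

# b is excluded from __init__ but is still managed by the dataclass machinery.
field_names = [f.name for f in fields(obj)]
as_dict = asdict(obj)

# Passing b to the constructor is rejected because init=False.
try:
    Current(b="x")
    accepted = True
except TypeError:
    accepted = False
```

This is the gap the proposal is aimed at: init=False only affects __init__, not the other dataclass helpers.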

That was my first idea seeing how KW_ONLY is already a sentinel with an equivalent function for keyword arguments.

Something else which has been on my mind would go along these lines

from dataclasses import dataclass, field, KW_ONLY, NO_FIELD
from typing import Annotated

@dataclass
class Example:
    a: bool = True
    _: KW_ONLY
    kw_a: bool = False
    b: str = field(init=False, default="hello")
    c: Annotated[str, NO_FIELD] = "hello"

Using Annotated would allow the definition of attributes anywhere and still be clear. Incidentally, this could also be applied to KW_ONLY avoiding the definition of _: KW_ONLY, like this.

from dataclasses import dataclass, field, KW_ONLY, NO_FIELD
from typing import Annotated

@dataclass
class Example:
    a: bool = True
    kw_a: Annotated[bool, KW_ONLY] = False
    b: str = field(init=False, default="hello")
    c: Annotated[str, NO_FIELD] = "hello"

with what seems a more compact notation which does not sacrifice readability. I would even argue it improves readability by being explicit in what the character of kw_a is, a keyword only argument.

Furthermore, no-init fields with a simple default value could also be added to this scheme, and probably even default_factory ones

from dataclasses import dataclass, field, KW_ONLY, NO_FIELD, NO_INIT, NO_INIT_FACTORY
from typing import Annotated

@dataclass
class Example:
    a: bool = True
    kw_a: Annotated[bool, KW_ONLY] = False
    b: Annotated[str, NO_INIT] = "hello"
    c: Annotated[str, NO_FIELD] = "hello"
    d: Annotated[list, NO_INIT_FACTORY] = list

Just an idea around dataclasses for your consideration guys.

Best regards

MegaIng · April 16, 2024, 10:21pm

Big plus one on this being something that should be added somehow. For me the most common usage is cache-like fields that get computed on demand but should be stored.

Using _: <name> for both doesn’t work AFAIK because the later annotation overrides the earlier one in the resulting __annotations__ dict.

With regard to Annotated, I think the syntax currently is ugly, but my other proposal would help with that:

@dataclass
class Example:
    a: bool = True
    kw_a: bool @ KW_ONLY = False
    b: str @ NO_INIT = "hello"
    c: str @ NO_FIELD = "hello"
    d: list @ NO_INIT_FACTORY = list

It would also conveniently encompass the existing usage of KW_ONLY.

For Annotated you are probably going to hear the counterargument that type checkers shouldn’t care about it. IMO, this is not a hill worth fighting over. Instead the goal should be to design an alternative that everyone is happy with.

HPou · April 16, 2024, 10:30pm

I didn’t know about this @ proposal. It seems compact and very explicit. It further improves readability.

My personal taste would favour @KW_ONLY instead of @ KW_ONLY but I can for sure get used to it too.

HPou · April 16, 2024, 10:49pm

And after having read about the @ proposal and PEP-727, I come to the conclusion that one could envision this

@dataclass
class Example:
    a: bool = True
    kw_a: bool @(KW_ONLY, Doc("kw_a is a keyword_argument")) = False
    b: str @ NO_INIT = "hello"
    c: str @ NO_FIELD = "hello"
    d: list @ NO_INIT_FACTORY = list

or with square brackets which is usually applied for typing

@dataclass
class Example:
    a: bool = True
    kw_a: bool @[KW_ONLY, Doc("kw_a is a keyword_argument")] = False
    b: str @ NO_INIT = "hello"
    c: str @ NO_FIELD = "hello"
    d: list @ NO_INIT_FACTORY = list

Just a quick idea.

MegaIng · April 16, 2024, 10:50pm

My idea would be to just chain @: kw_a: bool @ KW_ONLY @ Doc("kw_a is a keyword_argument") = False. That IMO looks cleaner.

Kxnr · April 16, 2024, 11:34pm

If I understand correctly, there are two pieces here:

have attributes on a data class that aren’t set through __init__
have attributes that aren’t fields on the dataclass

I think both of these are possible already through @property, InitVar or __post_init__:

from dataclasses import dataclass, InitVar

@dataclass
class Test:
    a: InitVar[int] = 10

    def __post_init__(self, a: int) -> None:
        self.b: str = "abc"

    @property
    def c(self) -> float:
        return 1.0

Test()

None of the attributes a, b, or c are represented in astuple, asdict, or fields. The semantics aren’t exactly the same as a proposed NO_FIELD, but it seems like the resulting behavior is the same. Is there a use case that isn’t covered by one of these options that I’m missing?
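As a quick check of the claim above, here is a sketch that repeats the same Test class and inspects it with the standard helpers. Nothing beyond the standard library is assumed.

```python
from dataclasses import InitVar, asdict, astuple, dataclass, fields


@dataclass
class Test:
    a: InitVar[int] = 10

    def __post_init__(self, a: int) -> None:
        self.b: str = "abc"

    @property
    def c(self) -> float:
        return 1.0


t = Test()

# None of a, b, or c is a dataclass field, so all three helpers come back empty.
names = [f.name for f in fields(t)]
as_dict = asdict(t)
as_tuple = astuple(t)

# Unlike the proposed NO_FIELD, the InitVar is still an __init__ parameter.
t2 = Test(a=5)
```

The last line shows one difference in semantics: a is hidden from fields() but remains part of the constructor signature.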

Admittedly I don’t think I’m in the target audience for a feature like this; if I needed to consistently set field(init=False), I’d probably reach for a vanilla class rather than a dataclass.

HPou · April 17, 2024, 8:10am

A property is not an attribute, it is a descriptor. To fully replicate an attribute I would have to implement 3 methods: getter, setter and deleter as opposed to a 1-line definition.

InitVar is clearly defined as “pseudo-field” and won’t be defined as an attribute. Instead of simply defining an attribute in 1-line I would have to implement __post_init__ and manually set the attribute.

The NO_FIELD idea is about avoiding things like field(init=False, ...), InitVar and similar constructs, which don’t cover the case of an attribute that is defined in the class but not managed by the @dataclass machinery.

See, it may not be a good idea after all, but I am sure that InitVar and property are no substitutes for the idea.

HPou · April 17, 2024, 8:22am

Being the last to the party, I wouldn’t argue against chaining, but for readability I would see it under a different light: @ is the introducing factor and something like | is the separator.

@dataclass
class A:
    a: str @ NO_INIT | Doc("Non-init attr") | SOMETHING_ELSE | "Another Annotation"

chepner · April 17, 2024, 12:16pm

Hask Pou:

I was wondering whether the following would be a good idea

from dataclasses import dataclass, field, KW_ONLY, NO_FIELD

@dataclass
class Example:
    a: bool = True
    _: KW_ONLY
    kw_a: bool = False

    b: str = field(init=False, default="hello")

    _: NO_FIELD
    c: str = "hello"

If it’s not a field, what is it? If it’s a class attribute, we already have

c: ClassVar[str] = "hello"

I don’t see the need for special syntax to implicitly define a number of class variables less verbosely.
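A small sketch of the existing ClassVar behaviour, using only the standard library: the annotated name is skipped by @dataclass, so it is neither a field nor an __init__ parameter. At runtime nothing stops an instance from shadowing it, although type checkers report such an assignment.

```python
from dataclasses import dataclass, fields
from typing import ClassVar


@dataclass
class Example:
    a: bool = True
    c: ClassVar[str] = "hello"


# Only a is a field; c is ignored by the dataclass machinery.
names = [f.name for f in fields(Example)]

# c is not accepted by the generated __init__.
try:
    Example(c="x")
    accepted = True
except TypeError:
    accepted = False

inst = Example()
inst.c = "changed"  # allowed at runtime; static type checkers flag this line
```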

DavidCEllis · April 17, 2024, 12:40pm

Hask Pou:

I was wondering whether the following would be a good idea

from dataclasses import dataclass, field, KW_ONLY, NO_FIELD

@dataclass
class Example:
    a: bool = True
    _: KW_ONLY
    kw_a: bool = False

    b: str = field(init=False, default="hello")

    _: NO_FIELD
    c: str = "hello"

After NO_FIELD the coder can define as many class attributes as needed without the attributes being considered for __init__ (and actions like astuple)

This actually won’t work because you replace the original annotation KW_ONLY with NO_FIELD.

import inspect
from dataclasses import KW_ONLY, field

class NO_FIELD:
    ...

class Example:
    a: bool = True
    _: KW_ONLY
    kw_a: bool = False

    b: str = field(init=False, default="hello")

    _: NO_FIELD
    c: str = "hello"

print(inspect.get_annotations(Example))

{'a': <class 'bool'>, '_': <class '__main__.NO_FIELD'>, 'kw_a': <class 'bool'>, 'b': <class 'str'>, 'c': <class 'str'>}

As this is how dataclasses gathers the information needed to build the class, the result would be that everything after _: KW_ONLY would actually be declared with whatever NO_FIELD did.

HPou · April 17, 2024, 1:02pm

That was already pointed out above. Obviously one could also choose “__: NO_FIELD” for example.

HPou · April 17, 2024, 1:06pm

From the Python documentation:

As introduced in PEP 526, a variable annotation wrapped in ClassVar indicates that a given attribute is intended to be used as a class variable and should not be set on instances of that class.

A ClassVar should not be set. But something without the ClassVar declaration is “open” to be set. The goal is not to have it managed by the dataclass machinery. I.e:

Not part of __init__
Also not part of __post_init__
Not having to use field(...)
Not present as a result in fields
Not present as a result in asdict, astuple

DavidCEllis · April 17, 2024, 2:38pm

I must have missed that, but I’d note that while you could choose to use __ there is easily the potential to clash the names and if you added any further annotations like this you’d end up with slowly increasing levels of _____ which isn’t ideal. If the names do clash accidentally this can lead to confusing errors as @dataclass has no way to know this has happened.

So you’re looking for a type hint that is essentially equivalent functionally (ie: ignored by dataclasses) to what ClassVar does, but doesn’t imply that the attribute is a class variable not to be set on instances? Similar to not putting in any annotation, but that you still want type checked somewhere?
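To illustrate the "no annotation at all" comparison, a minimal sketch: @dataclass only looks at annotated names, so an unannotated class attribute is already invisible to it at runtime, and it can be overridden per instance. What is lost is the type hint itself.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class Example:
    a: bool = True
    c = "hello"  # no annotation, so @dataclass never sees it


inst = Example()
inst.c = "set on instance"  # plain attribute assignment, no dataclass involvement

names = [f.name for f in fields(inst)]
as_dict = asdict(inst)
```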

I’m somewhat curious both what the intended use case is and where the value is being assigned if not in __init__ or __post_init__.

I can understand the desire to have fields that are excluded from asdict and astuple. Currently I think you’d have to define a value in metadata and write your own asdict function that checked.
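A rough sketch of that metadata approach follows. The metadata key "exclude" and the helper name asdict_filtered are made up for illustration, and the helper is shallow: unlike dataclasses.asdict it does not recurse into nested dataclasses or copy values.

```python
from dataclasses import dataclass, field, fields


def asdict_filtered(obj):
    """Shallow asdict that skips fields flagged with the hypothetical 'exclude' key."""
    return {
        f.name: getattr(obj, f.name)
        for f in fields(obj)
        if not f.metadata.get("exclude", False)
    }


@dataclass
class Example:
    a: bool = True
    # Still a field, but marked so that the custom export leaves it out.
    cache: str = field(init=False, default="hello", metadata={"exclude": True})


result = asdict_filtered(Example())
```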

With regard to the use of values in Annotated instead of arguments to field you can try something like this now with a preprocessor that reads the annotations and generates Field values for you based on the annotations. [1] Example implementation if you wish to experiment. Caveat: this will likely cause linter complaints, but linters would almost certainly need special casing if dataclasses did implement something like this.

I don’t actually like using annotations for this purpose though so it’s not something I’d be in favour of adding to dataclasses itself, especially as dataclasses doesn’t currently force evaluation of string annotations.

[1] NO_FIELD functionality could potentially work as a wrapper, removing the annotations and replacing them after @dataclass has done its work.

blhsing · April 18, 2024, 3:33am

The syntax of _: NO_FIELD looks rather ugly IMHO.

Since the goal is to more easily declare a block of variables with ClassVar, I think it would look cleaner to enclose the block in a context manager that applies the ClassVar transformation to the name delta between annotations before and after the block:

import sys
from typing import ClassVar

class NoField:
    def __enter__(self):
        # Frame 1 is the class body; remember which names are already annotated.
        self.starting_names = set(sys._getframe(1).f_locals['__annotations__'])

    def __exit__(self, exc_type, exc_val, exc_tb):
        annotations = sys._getframe(1).f_locals['__annotations__']
        # Wrap every annotation added inside the block in ClassVar,
        # so that @dataclass skips it.
        for name in annotations.keys() - self.starting_names:
            annotations[name] = ClassVar[annotations[name]]

no_field = NoField()

so that:

from dataclasses import dataclass

@dataclass
class Foo:
    with no_field:
        a: int = 1

print(Foo.a) # outputs 1
print(Foo(2).a) # TypeError: Foo.__init__() takes 1 positional argument but 2 were given

Similarly, the existing sentinel KW_ONLY can be made into a context manager to support such a usage.

Kxnr · April 18, 2024, 7:15am

At the risk of veering off topic: I’ve never thought about using with in a class body. I think the previously mentioned questions about this proposal are still relevant [1], but I’m interested in trying out how using with this way does (or doesn’t) work for its own sake.

[1] How would a value not set through one of the previously mentioned mechanisms be set?

nathan-chappell · April 19, 2024, 6:16am

I like the idea of a context manager rather than a magic value that changes the semantics of everything after it’s seen. Seems way more straightforward and less surprising.

HPou · April 20, 2024, 9:19pm

The idea of the context manager is really interesting. I only see one drawback: it uses ClassVar. This means, as pointed out above, that the value should not be set on instances and type checkers will complain. It achieves the intended effect of attributes being ignored by the dataclasses machinery but it has a side effect on type checkers.

I wonder if Python needs a typing.InstanceVar.

blhsing · April 21, 2024, 6:43am

The type checkers would currently complain because they perform only static analysis and are unable to infer that the variables declared within the context manager are dynamically transformed with ClassVar.

This would become a non-issue once the proposal is officially implemented and documented, and the type checkers are updated to recognize the semantics of the context manager accordingly.

HPou · April 21, 2024, 3:38pm

Would this also be an idea or part of the idea?

@dataclass
class Foo:
    with no_init:
        a: int = 1

In this case the no_init will translate the annotation to:

a: int = field(init=False, default=1)

Imho, having a default value of field(....) is ugly and not straightforward when it comes to understanding what it is, whereas

a: int = 1

is clean and clear and only the scope of the context manager determines that it undergoes a translation to the dataclass expected syntax to avoid having a as part of __init__.
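A rough sketch of how such a no_init block could be prototyped today. NoInit and no_init are hypothetical names. The sketch relies on sys._getframe (a CPython implementation detail) and, unlike the ClassVar version earlier in the thread, it compares the class namespace before and after the block instead of the annotations, so it only catches names that are assigned a default inside the block.

```python
import sys
from dataclasses import dataclass, field, fields


class NoInit:
    def __enter__(self):
        # Frame 1 is the class body; snapshot the names defined so far.
        self._before = set(sys._getframe(1).f_locals)

    def __exit__(self, exc_type, exc_val, exc_tb):
        namespace = sys._getframe(1).f_locals
        for name in set(namespace) - self._before:
            if name.startswith("__"):
                continue  # leave interpreter-managed names alone
            # Replace the plain default with field(init=False, default=...).
            namespace[name] = field(init=False, default=namespace[name])
        return False


no_init = NoInit()


@dataclass
class Foo:
    x: int = 0
    with no_init:
        a: int = 1


try:
    Foo(a=2)
    rejected = False
except TypeError:
    rejected = True
```

With this, Foo().a is 1 and a stays a regular field for fields() and asdict(); it is only removed from __init__, which matches the translation described above.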

Kxnr · April 21, 2024, 4:18pm

It doesn’t seem that there’s a consensus on implementing this behavior, so trying to nail down syntax seems like a hypothetical exercise.

The questions I haven’t seen answered yet are:

What’s the use case for this feature that isn’t met by one of the existing methods (__post_init__, InitVar, @property)?
How would a variable using this feature be initialized, if not through __init__ or __post_init__?

To clarify my previous comment, I think that using with in a class body is an interesting piece of Python esoterica that I haven’t explored. It would also need substantial justification before being considered for any of the uses mentioned here, and I don’t know that I’d be in favor of using with this way.
