My wrapper around @dataclass that requires/enforces attribute declarations and constness

First, a quick introduction. I’m a not-so-young physicist who programs a good bit, mostly in C++. I only started looking at Python a few weeks ago. Decorators are very neat, and __setattr__ and/or properties are just super for letting one get on with the coding.

But I like clear interfaces, declared and self-documented up top, and I don’t like assignments to typos going silently unnoticed. I noticed dataclasses, which provide a great format for something like that, and even saw discussions on abusing slotted dataclasses for some amount of declaration enforcement, with some inheritance limitations I believe.

Anyway, to the point:

The readme gives the short version. A test script is included. The wiki gives some related thoughts on encapsulation, along with an included demo script for that as well.

This repo represents half of my total python portfolio, so I expect that shows. I know of a couple of issues, mentioned in the readme or issues. But, it does work (and already caught a couple of my typos in “production” code!), so I thought I’d share. Is the concept useful for python core? I don’t know. I think a more comprehensive version could deal with some of the encapsulation things in a more clean pythonic way, but I doubt I’ll be adding that personally. I think there have been some discussions and developments on that aspect already.

If you use a static type checker like mypy, then you can catch such errors as well.

With this code:

from dataclasses import dataclass, field
from typing import Final

@dataclass
class A:
    CONST: Final[int] = field(default=10, init=False)
    x: int

a = A(x=3)
a.y = "foo"
a.CONST = 5

mypy reports:

test.py:10: error: "A" has no attribute "y"
test.py:11: error: Cannot assign to final attribute "CONST"
Found 2 errors in 1 file (checked 1 source file)

Nice. Does this also work for preventing declaration of new attributes, or misspellings, within methods? Still, it clearly gets most of the point, and without a runtime penalty.

Edit: Related, does it prevent class variable shadowing by instance variables? Anyway, sounds like I should learn the linters and then consider.
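To be concrete about what I mean by shadowing, a minimal example (the names are just for illustration):

class B:
    LIMIT = 10          # class variable

b = B()
b.LIMIT = 99            # creates an instance attribute that shadows B.LIMIT
print(B.LIMIT)          # 10: the class-level value is still reachable via the class
print(b.LIMIT)          # 99: the instance sees its own copy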

I meant to say thanks for the very detailed reply.

I’m realizing that I shouldn’t take your comment as particularly dismissive of a runtime version. That can still be useful, particularly in less formal environments where one wants to inject a little protection (and protective habits) without discussion. Static typing is still optional, maybe unconventional, and relies on developer habits that are less visibly imitable. I should probably use the Final annotation from the spec instead of relying on ALL_CAPS names.

So dataclasses and Final seem not, in general, PEP-compatible; I think it’s close to impossible to define a spec-compliant implementation, and mypy doesn’t seem very close.

#!/bin/python3
from dataclasses import dataclass
from typing import Final

@dataclass
class MyClass:
    CONST: Final[int] = 5

    def SetConst(self, val):
        self.CONST = val

x = MyClass(CONST=20)
assert x.__class__.CONST == 5
assert x.CONST == 20

y = MyClass()          # First assignment to class attribute
assert y.CONST == 5
y.SetConst(20)         # First assignment to instance attribute
assert y.CONST == 20
y.SetConst(30)         # Reassign the instance attribute!!
assert y.CONST == 30

The asserts all pass and mypy allows it all. I have runtime versions that don’t allow any of the shenanigans not involving a 5, and one version (not pushed) that just blocks the last change, blindly allowing a single assignment to anything.

But PEP 591 explicitly says:
“There can be at most one final declaration per module or class for a given attribute. There can’t be separate class-level and instance-level constants with the same name.” My bold.

And it also seems to imply that assignment in methods other than __init__ is not allowed (and __post_init__?), although the wording isn’t quite tight on that.

It almost seems mypy shouldn’t allow Final with dataclasses, or dataclass needs to remove the class variable (in this case). That might be possible.

Technically mypy could simply require this syntax for a class-only const:

    CONST: Final[int] = field(init=False, default=5)

or this syntax for a keyword initializable instance-only constant:

    CONST: Final[int] = field(init=True)

and reject other combinations.

Those do create the right things, maybe not obviously. At present, adding a default to the latter syntax produces a class attribute as well as the instance attribute, so there is no PEP-compliant syntax to create an instance-only initializable constant with a default value, and maybe there shouldn’t be. Basically the original syntax is a shorthand for that, and a user might not expect the value 5 to be modifiable at all. It’s more questionable with the field syntax.
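For illustration, a minimal runtime check of that class-attribute side effect (the class names are just for the demo; this is observed dataclass behavior, not something from the PEP):

from dataclasses import dataclass, field
from typing import Final

@dataclass
class WithDefault:
    CONST: Final[int] = field(default=5, init=True)

@dataclass
class NoDefault:
    CONST: Final[int] = field(init=True)

print(hasattr(WithDefault, "CONST"))   # True: the default also becomes a class attribute
print(hasattr(NoDefault, "CONST"))     # False: instance-only, no class attribute
print(NoDefault(CONST=7).CONST)        # 7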

I’m not sure any of this can be cleaned up without asking what expected/intuitive behavior would be for a typical user. The present mypy behavior surely isn’t it.

Am I missing something here?

mypy 0.761
python 3.8.10

mypy does block a direct assignment to y.CONST at main scope.
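For completeness, the module-scope case that mypy does flag (using MyClass from above):

y = MyClass()
y.CONST = 40    # mypy: error: Cannot assign to final attribute "CONST"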

Note that mypy will detect your assignment to final if you annotate val in SetConst:

@dataclass
class MyClass:
    CONST: Final[int] = 5

    def SetConst(self, val: int):
        self.CONST = val

EDIT: To be more precise: mypy will detect the error as soon as you add any annotation to SetConst.
This would also lead to the detection of the assignment to final:

    def SetConst(self, val) -> None:
        self.CONST = val

If a method has no annotations, mypy will not consider the method at all for type checking (see “No errors reported for obviously wrong code” in the mypy docs).

You can change this behavior by running

mypy --check-untyped-defs

Nice, but I could also just check my code for assignments to CAPS names. I was trying to get something else to check things, not more things to check.

You are quite probably missing a fair amount of context around Python’s “gradual typing” approach, then. It’s a key design principle that type annotations and checking are optional, and in general runtime behaviour should not depend on type information from annotations.

You should remember that PEP 591 is a typing PEP, stating how annotations should be interpreted by type analyzers. The dataclass decorator is not a type analyzer, and indeed the documentation explicitly states that

With two exceptions described below, nothing in dataclass() examines the type specified in the variable annotation.

IMO, this is both deliberate and reasonable. One of the key points of gradual typing is that unless you use a type checker, type annotations aren’t enforced. It’s up to type checkers such as mypy to ensure that the type annotations are respected. I wouldn’t be surprised if it has trouble doing that for a highly dynamic feature like dataclasses, which generates code at runtime (indeed, I believe type checkers have special-case code to recognise dataclasses and handle them).
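A minimal illustration of that point, using nothing but the stdlib dataclass and no checker: the annotated types are simply not looked at when the instance is created.

from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int

p = Point(x="not an int", y=None)   # no runtime error
print(p)                            # Point(x='not an int', y=None)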

Isn’t this a simple consequence of the fact that type checkers, by design, don’t check functions that aren’t annotated? Maybe it’s surprising to someone who normally uses type annotations everywhere, but again, it’s a deliberate consequence of the principle that type checking is opt-in.

I’m not sure what your point here is. If you accept that dataclasses ignore annotations except to treat their existence as the trigger for treating a class variable as a field, then all of the above seems fine to me. The SetConst cases are OK because, as noted, they are not type checked due to lack of function annotations. And setting the instance variable in the constructor seems perfectly fine to me. I’d hope that mypy treats it as Final, and indeed if you annotate SetConst it seems to do so.

So at best, you seem to be suggesting there’s a bug in mypy, but I can’t actually see what that bug is…

On the other hand, if you want a variant on @dataclass that enforces type annotations at runtime, I see no problem with that. And your strictclasses may be a good implementation, I don’t know as I haven’t looked at it. But the fact that you want it and implemented it doesn’t mean the stdlib version is in any way wrong. You may want to publish your strictclasses library on PyPI, if you think others might find it useful, though. But you might need to choose a new name, as there’s already a strictclasses library on PyPI…


Yes, you are totally right. I edited my answer to mention this.

Thanks for the info about PyPI and the mypy options.

Yes, to me, having well-enforced dataclasses in casually distributable runtime scripts makes Python a winner, and it seems the mypy suggestion isn’t panning out to change that, and is a few key steps away from being able to. All fine. I certainly never thought or said that dataclass is broken by not having runtime checks. The stdlib doesn’t have to get it, or improved dataclass property compatibility, or modifications to help with this typing issue, and no, dataclass isn’t required to produce code compatible with any typing PEPs. It’s an ideas sub-forum.

The “bug”
Yes, applying the “gradual typing” view fixes a lot of this. Thank you. My oversight for sure, in spite of actually having heard about this, I’m afraid. Maybe the remaining bug aspect is less practically important, but mypy does still allow both the class and instance versions of a Final-annotated variable, and that’s against PEP 591, for what it’s worth. It seems it may never be possible to see the class value through the instance variable name, which definitely limits the practical impact.

The question
But the related/resulting conundrum is what should be done with this, and how strictly the PEP should even be followed (by checkers, not Python)?

from dataclasses import dataclass, field
from typing import Final

@dataclass
class foo:
    CONST: Final[int] = 5
    CONST2: Final[int] = field(init=True, default=5)

One should really read both of these as instance constants that are keyword-initializable in __init__ with a default of 5.

The first at least doesn’t intuitively look like an attribute that can be anything other than 5, but it can be. Should we just be happy it’s technically illegal because of shadowing and fix up the checkers? Or should we want this and ignore the technical shadowing violation?

Ok, maybe the point is I should ask in a mypy forum, as python doesn’t care what checkers do. Fair enough. Obviously I can do what I want with my checker, but I guess I was curious if there’s any thought on what is right.

So, from the patient help above, and some other digging, I worked out an alternative to achieve runtime(ish) class checks with mypy, which I’ll show for completeness, but also to highlight the remaining issues in mypy (and maybe the typing PEPs, but not Python).

The two criteria I had are that it doesn’t require any installation work from a user (no manually installing mypy), and doesn’t require any extra pre-run step either (guaranteed to happen). A bonus would be that it’s impossible to use the class without using the checks (self-enforcing). It turns out two of the three can be achieved with mypy at the cost of some boilerplate code in every main.py:

#!/bin/python3
import subprocess
import sys

def install(package, source=None):  # source lets you install from a URL
    if source is None:
        source = package
    try:
        __import__(package)
    except ImportError:
        subprocess.check_call([sys.executable, "-m", "pip", "install", source])

if __name__ == "__main__":
    install("mypy")
    from mypy import api  # type: ignore
    result = api.run([__file__, "--check-untyped-defs"])
    if not result[0].startswith("Success"):
        print(result[0])
#        exit()               # comment out for testing asserts...

from dataclasses import dataclass
from typing import Final

@dataclass
class MyClass:
    CONST: Final[int] = 5

    def SetConst(self, val):
        self.CONST = val     # This gets caught now by mypy

    def InitNewvar(self):
        self.newvar = 5      # mypy allows this!!

x = MyClass(CONST=20)
assert x.__class__.CONST == 5     # these pass
assert x.CONST == 20              # these pass

x.InitNewvar()
x.anothervar = 10            # mypy flags this.
assert x.newvar == 5         # works.

x.__dict__["CONST"] = 30     # This is also allowed, here or in the setter, but presently also in my runtime checker,
assert x.CONST == 30         # but that seems to fall under pythonic self-responsibility.

Output:

selfcheck.py:30: error: Cannot assign to final attribute "CONST"
selfcheck.py:40: error: "MyClass" has no attribute "anothervar"
Found 2 errors in 1 file (checked 1 source file)

I added gkb’s tip of “--check-untyped-defs”.

  1. PEP 591 explicitly calls the x situation a type error, because the class variable and the instance variable shouldn’t exist at the same time with Final, but that’s pretty legalistic, because you have to try hard to care.
  2. The idea that CONST: Final[int] = 5 actually defines a constructor allowing initialization to 20 feels awkward and not so “final”. It’s less awkward when the CONST: Final[int] = field(default=5) syntax is used. PEP 591 seems to have intended to prevent this sort of thing in normal classes (you can’t use a Final class variable as a default fall-back for the Final instance variable, as per point 1). And yet with dataclasses there’s really no other way to provide a default value for the constructor parameter. I think this basically has to stand as legitimate, but… It can be explicitly disabled with init=False, so maybe it’s a point for dataclasses that init=False should be the default for Final variables, so that, like in C++11, allowing the constructor to override the const default can be set up explicitly, but is not enabled by default (see the sketch after this list). Then CONST: Final[int] = 5 would not actually define a contradicting constructor. CONST: Final[int] = field(default=5, init=True) would, which is more intuitive. That’s not a typing solution though, and maybe it shouldn’t be, but it’s a code-breaking change :(.

Another thing not in the spec is whether assignment should be allowed in __post_init__. Well… the spec by default says it shouldn’t, but maybe it should.

  3. Mypy catches assignment to a non-existent attribute (anothervar) from outside the class, but it doesn’t stop it in methods of the class (newvar). This was actually a main motivation for the wrapper: to ensure that the class/instance attributes are all declared at class scope.
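As mentioned in point 2, here is a rough sketch of the init=False variant (just illustrating current dataclass behavior, not a proposed change):

from dataclasses import dataclass, field
from typing import Final

@dataclass
class A:
    CONST: Final[int] = field(default=5, init=False)

a = A()
print(a.CONST)   # 5: the default is the only value you can get
# A(CONST=20)    # TypeError: __init__() got an unexpected keyword argument 'CONST'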

So there’s that, but I also just thought I’d post the method above. (The source parameter of install() is there to allow URLs.)

EDITED I confused myself on a further error, that I wasn’t expecting, in my first draft of this post.
EDIT2 Added the bypass example using direct access to the __dict__
EDIT3 Added newvar and anothervar examples.

The first draft of the above post had a mistake.

I updated the above with an example where mypy does not prevent initialization of undeclared dataclass attributes from within a method. This was a big motivation for the wrapper, which does enforce that attributes must be declared. (Edit: it’s in the thread title; I just missed it when mypy was first suggested for preventing initialization of undeclared attributes.)

Are you looking to just prohibit assignment to these attributes after the instance is created? If so, you can use the frozen=True parameter on the decorator (for the whole class) or each field. Type checkers should understand that and properly show errors if you try to do so, and it’ll fail with an exception if you do it at runtime.
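A minimal sketch of the class-level frozen behavior (the class and field names here are just for the demo):

from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Config:
    CONST: int = 5

c = Config(CONST=7)
try:
    c.CONST = 10                  # blocked at runtime by the generated __setattr__
except FrozenInstanceError as e:
    print("frozen:", e)           # e.g. "cannot assign to field 'CONST'"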

Are you looking to just prohibit assignment to these attributes after the instance is created?

There are two separate contexts in which to answer that: A) my wrapper and what I achieve with it, and B) what I’m trying to understand as a reasonable expectation and interpretation of the existing standard.

Let’s refer to my comment above from Dec 15.

For context A, in the end, the main reason I’m still going with the wrapper is because of point 3 from that post. That’s not a PEP, but I think fits very well with the spirit of dataclasses and encourages organization of declarations.

Constness is handled by mypy, and regarding context B the “const” (Final/frozen/whatever) issues I point out relate not to anything that happens after an instance is actually created, or, to be specific, at least not after __init__ (and/or potentially __post_init__). It’s just related to how it will be created, what value you would expect it to be created with, and how the standard allows specifying that. For details, refer to issues 1 and 2 in the Dec 15 comment.

Frozen is interesting. I haven’t tried it. It doesn’t address the points above, but it does create true runtime enforcement of Final-ness in a sense. However, as far as I can read from the documentation, it only applies to the whole class, not individual fields. Like @strictdataclass, it uses a __setattr__() to do it. For what it’s worth, the @strictdataclass wrapper uses __setattr__() but is also compatible with a user-defined __setattr__() calling super(), which becomes more meaningful when only individual fields are constant.
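For what a per-field version can look like, here is a minimal sketch of the __setattr__ approach (only an illustration, not the actual @strictdataclass implementation; the _FINAL_FIELDS set is a hypothetical way to mark which names to protect):

from dataclasses import dataclass

@dataclass
class Guarded:
    CONST: int = 5
    x: int = 0

    # Not annotated, so dataclass does not treat it as a field.
    _FINAL_FIELDS = frozenset({"CONST"})

    def __setattr__(self, name, value):
        # Allow the first assignment (made by the generated __init__),
        # block any later reassignment of protected names.
        if name in self._FINAL_FIELDS and name in self.__dict__:
            raise AttributeError(f"cannot reassign final field {name!r}")
        super().__setattr__(name, value)

g = Guarded(CONST=7)
g.x = 1          # non-final fields stay writable
# g.CONST = 9    # would raise AttributeError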

I haven’t read through the details of every message in this thread, but it seems like what you really want is mostly runtime enforcement of attribute annotations in a dataclass-like class, and immutable fields. If so, that seems pretty close to what pydantic does, so you might want to take a look. It’s a pretty popular library with a very comprehensive framework for data models, validation, and related features. It lets users create “model” classes that are similar to the standard library dataclasses, but provide extra functionality such as validating that field values conform to the annotations upon initialization.

@a-reich

Thanks. Sure, I’ll look at it. Quickly looking at docs, it’s not immediately clear to me that it does anything about issue 3 of post 12, which is still the main point to me, but there’s a lot there, so I can’t say for sure.

Honestly, I just added in some constness control (unfinished still really) mainly because I could, and then thought, well, why not think about doing it in a way that is “right,” and then found these questions about what right even is or should be (issues 1 and 2) that seem really not so clear.

In the end the constness thing, aside from the class variable technicality, really boils down to the idea that maybe init=False could have been the default for Final variables, to make things intuitive. For example, in C++11, if you have a const member with a default initializer, that will be its initialized value unless you explicitly add an initializer list to the constructor. You can’t just initialize it in the constructor to parameter-provided values without a very explicit syntax, whereas in Python dataclasses, not only can you, but the constructor is set up to do that by default. When a variable is defined with something like const int a = 5, it seems to me that there’s an intuitively reasonable expectation that said variable will always be 5. Maybe that’s just me or my C++98 background. But I don’t think this dataclass case was really considered in the PEP.

Changing that default now would break things though, and I don’t see a reasonable way to enforce anything in a type checker either, so I think I’ve answered my own point. var: Final[int] = 5 provides user initialization to arbitrary values in dataclasses, and that just is what it is, regardless of type checking, at least until/unless the standard library ever gets a dataclasses2.