Automatic Class Creation During Function Call

zrothberg · February 21, 2022, 9:29am

Something around this topic has been brought up in the past, but it was specific to namedtuples and was also 5 years ago. Given that typing as a first-party construct is here to stay, I think it would be greatly beneficial if we could use automatic class initialization in function arguments (this is probably a poor way to state this).

from typing import NamedTuple
from datetime import datetime

class ArgumentGroup(NamedTuple):
    name: str
    date: datetime = datetime(year=1970, month=10, day=8)
    

def greet(start:ArgumentGroup, how_many_times:int):
    pass


greet(start=(name="Jeff"), how_many_times=10)

Basically, instead of needing to construct a class ArgumentGroup, the actual type can be inferred from the type hints and the constructor called. In order to prevent clashing with tuples, it would be required that only keyword arguments are passed. Currently this code is just illegal. While this seems fairly useless inside of one file, it has a great amount of utility for dealing with imported functions. It would prevent the need for importing dozens of classes and still allow code to use helper objects for argument grouping.

Ideally something like this could be reached via a __method__ so that arbitrary classes could implement it. It would really help with function parameter organization because it would dramatically reduce the burden of using tuples/classes to hold arguments. In particular, importing all of those classes wastes a lot of namespace. TypedDict appears to have been trying to solve a similar issue, but frankly I have neither seen nor used it often. I imagine its scope was limited because it dates from before type hints became a first-party feature.
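For comparison, something close to the proposal can be approximated in today's Python with a decorator, at the cost of passing a dict instead of the proposed (name="Jeff") syntax. The following is only a rough sketch under stated assumptions: auto_construct is a hypothetical name, not an existing library function, and the body given to greet is invented purely so the call has a visible result.

```python
import functools
import inspect
from datetime import datetime
from typing import NamedTuple, get_type_hints


def auto_construct(func):
    """Hypothetical decorator: build the annotated class from a dict argument."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in list(bound.arguments.items()):
            target = hints.get(name)
            # Convert only dict arguments whose parameter is annotated
            # with a class; everything else is passed through untouched.
            if isinstance(value, dict) and inspect.isclass(target):
                bound.arguments[name] = target(**value)
        return func(*bound.args, **bound.kwargs)

    return wrapper


class ArgumentGroup(NamedTuple):
    name: str
    date: datetime = datetime(year=1970, month=10, day=8)


@auto_construct
def greet(start: ArgumentGroup, how_many_times: int):
    # Illustrative body only; the original example used `pass`.
    return [f"Hello, {start.name}"] * how_many_times


# The caller never names or imports ArgumentGroup.
greetings = greet(start=dict(name="Jeff"), how_many_times=2)
print(greetings)
```

This differs from the proposal in two ways: the function author has to opt in with the decorator, and a plain dict stands in for the new call syntax, so a parameter that is genuinely annotated as dict would need extra handling.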

steven.daprano · February 21, 2022, 1:37pm

Hi Zachary,

I think you need to explain this proposal a bit better.

You say:

“Basically instead of needing to construct a class ArgumentGroup the actual type can be inferred from the type hints and the constructor called.”

If ArgumentGroup doesn’t exist, what constructor is going to be called?

Without ArgumentGroup, how does the interpreter look at this call:

greet(start=(name="Jeff"), how_many_times=10)

and infer that start has to be an object with two fields, “name” plus “date” that has a default value?

Also, you should consider that as written, ArgumentGroup instances have no per-instance attributes, and two shared class attributes.

There is also an argument that if our functions are so complicated that we have to wrap a bunch of parameters into “argument groups”, the function is badly designed.

Instead of making it easier to write such badly designed over-complex functions, we should encourage people to fix their design. Or at least not encourage them to use poor design.

The ArgumentGroup instances have state but no behaviour. They combine two arbitrary pieces of state, a name and a date, just for the sake of reducing the number of parameters in the greet() function. This is sometimes called the “Parameter Object” design pattern, but I think it is actually an anti-pattern. The only reason that ArgumentGroup instance exists is to hide the fact from casual readers that the greet() function takes three parameters, not two.

http://wiki.c2.com/?ParameterObject

Parameter Objects make sense if the parameters go together as an encapsulated whole. E.g. we can encapsulate four parameters

fontname, fontsize, style, colour

into a single TextStyle object. But if the parameters are arbitrarily jammed together, with a generic name like “ArgumentGroup”, that is a strong Code Smell and a sign that, just maybe, it is an anti-pattern.

So I think that I might be more open to your suggestion if the example was less of an anti-pattern. Can you give an actual real example of where this would be useful?

zrothberg · February 21, 2022, 8:10pm

Not sure what the mix-up here is, but instance=ArgumentGroup(name="Zach") does have per-instance attributes. My crappy naming may have just caused confusion.

Sorry I should have waited until I got up to post this instead of right before going to sleep. Yes I was intending to refer to something like Parameter Objects.

from typing import NamedTuple

class Point(NamedTuple):
    x: int
    y: int
    z: int

TuplePoint = tuple[int, int, int]

def distance(starting_point: Point, ending_point: Point) -> Point:

    return Point(x=starting_point.x-ending_point.x,
                 y=starting_point.y-ending_point.y,
                 z=starting_point.z-ending_point.z)

def alt_distance(starting_point: TuplePoint,
                 ending_point: TuplePoint) -> Point:

    return Point(x=starting_point[0]-ending_point[0],
                 y=starting_point[1]-ending_point[1],
                 z=starting_point[2]-ending_point[2])

Calling with NamedTuple arguments, Point(x=2, y=3, z=4) and Point(x=3, y=5, z=7):

distance(starting_point=Point(x=2, y=3, z=4),
         ending_point=Point(x=3, y=5, z=7))

alt_distance(starting_point=Point(x=2, y=3, z=4), 
             ending_point=Point(x=3, y=5, z=7))

Calling with plain tuples, (2, 3, 4) and (3, 5, 7):

# this would break, as tuples have no attributes x, y or z
distance(starting_point=(2, 3, 4),
         ending_point=(3, 5, 7))

alt_distance(starting_point=(2,3,4), 
             ending_point=(3,5,7))

Proposed new syntax:

# automatically converted to Point(x=2, y=3, z=4)
distance(starting_point=(x=2, y=3, z=4), 
         ending_point=(x=3, y=5, z=7))

# Exception: tuple does not support keyword arguments x, y, z
alt_distance(starting_point=(x=2, y=3, z=4), 
             ending_point=(x=3, y=5, z=7))

A real-world example where this would be helpful is tf Keras and Pillow. Even though Keras uses Pillow and creates PIL images, it uses an inverted coordinate order of (height, width) to better match the rest of its library. Because both libraries use simple tuples, this can get obscured, and it leads to lots of subtle bugs when you convert between the two and shapes don’t match what you expect. I am under the impression that auto-converting to a class like Point would increase the amount of code using well-defined Parameter Objects instead of simple tuples.
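To make the coordinate-order problem concrete, here is a small sketch of what named size classes could look like. WidthHeight and HeightWidth are hypothetical names invented for this illustration; neither Keras nor Pillow provides them, and the sizes are made-up values.

```python
from typing import NamedTuple


class WidthHeight(NamedTuple):
    """Size in (width, height) order, the order Pillow reports sizes in."""
    width: int
    height: int

    def to_height_width(self) -> "HeightWidth":
        return HeightWidth(height=self.height, width=self.width)


class HeightWidth(NamedTuple):
    """Size in (height, width) order, the order described above for Keras."""
    height: int
    width: int

    def to_width_height(self) -> WidthHeight:
        return WidthHeight(width=self.width, height=self.height)


pil_size = WidthHeight(width=640, height=480)
keras_size = pil_size.to_height_width()

# Field access is unambiguous even though the tuple order differs.
print(pil_size.width, keras_size.width)
print(tuple(pil_size), tuple(keras_size))
```

With bare tuples the two values are just two pairs of integers and nothing marks which order is in use; with the named classes the conversion is an explicit, searchable method call.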

This should just raise an exception and fail.

steven.daprano · February 21, 2022, 10:20pm

Ah, the mix-up is that I was not aware that typing.NamedTuple behaves like dataclasses, in that attributes that are seemingly created at the class level are actually at the instance level.

I knew that dataclasses worked like that; I didn’t know that NamedTuple did too. I don’t know how I feel about that as an API. I think it benefits lazy programmers at the expense of being outright misleading to those who know about class scope.
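A quick way to check this behaviour is to create two instances and then look at what the class itself holds. This is a minimal sketch; the two-field Point here is made up for the demonstration.

```python
from typing import NamedTuple


class Point(NamedTuple):
    x: int
    y: int = 0


a = Point(x=1)
b = Point(x=5, y=7)

# Each instance stores its own values.
print(a, b)

# On the class, the field name is a descriptor that reads a tuple slot,
# not a shared int value.
print(type(Point.x))

# Defaults are recorded separately by the class machinery.
print(Point._field_defaults)
```

So the annotations in the class body describe the fields of each instance, and the default for y is applied per instance when the argument is omitted.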

Sorry about the noise.

steven.daprano · February 21, 2022, 10:57pm

You managed to write the word “point” twelve times each in two 1-line functions (distance and alt_distance).

Not every variable or parameter needs a verbose name, especially for heavily mathematical functions and those with a lot of repetition.

How about this instead?


def distance(a: Point, b: Point) -> Point:
    """Return the difference a - b as a vector mistyped as a Point."""
    return Point(a[0] - b[0], a[1] - b[1], a[2] - b[2])

So let’s get back to the new proposal: you want to be able to write this:

distance(a=(x=45, y=46, z=47), b=(x=12, y=13, z=14))

instead of either of these:

distance(a=Point(x=45, y=46, z=47), b=Point(x=12, y=13, z=14))

distance(a=Point(45, 46, 47), b=Point(12, 13, 14))

and have the interpreter automatically cast the keyword parameters

(x=45, y=46, z=47)

(x=12, y=13, z=14)

to an implicit class constructor Point(...) based on the type hint of the distance() function declaration. Have I got that much right?

Regarding Keras and Pillow, don’t read them as “Parameter Objects” (which, as I said earlier, is a code-smell if not an anti-pattern). Read them as Points, or possibly vectors, which are well-defined mathematical and geometrical objects that go together in a meaningful way, not an arbitrary collection of parameters jammed together just for the sake of reducing the number of parameters to a function.

So in the case of Keras and Pillow, they both use points, but with different coordinate systems.

I agree that this is a case where named classes might be helpful, although I can think of a few alternatives.

But there is no reason why they can’t use a named class explicitly.

zrothberg · February 22, 2022, 1:12am

While there isn’t any technical limitation that prevents them from doing so, it is simply not done. My impression is that the issue comes down to namespace. For something like a class that has 10 functions, each one possibly having its own special class for well-defined argument groups, you are incurring an additional 10 namespace imports. Coupled with wanting to use simpler names like shape, size, point, etc., there would be lots of namespace collisions. Most people just opt to use a plain old structure like a tuple.

Given that Python now has 3+ ways of doing this exact task (NamedTuple, dataclass, TypedDict, etc.) and it’s still a problem, I don’t think it has been properly addressed. All the current solutions try to solve the problem on the definition side, when I think the problem lies on the invocation side.

I am under the impression that if we cannot find a clean and easy way to avoid extra imports, it simply won’t be used. Hence why I think a solution that automatically resolves the type is useful. My other thought was that if complete magic like the above is problematic, maybe a builtin would solve the problem. It would have the same behavior in terms of using the type hint’s class, but would be more explicit and wouldn’t require heavy changes to the AST. I know there is a general avoidance of new builtin names, though.
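One possible shape for the more explicit variant is an ordinary helper function, sketched below. build_arg is a hypothetical name and this is only an assumption about what such a builtin might do: it looks up the class annotated on a given parameter and calls it with the supplied keywords.

```python
from typing import NamedTuple, get_type_hints


def build_arg(func, param, **fields):
    """Hypothetical helper: construct the class annotated on func's parameter."""
    hints = get_type_hints(func)
    if param not in hints:
        raise TypeError(f"{func.__name__}() has no annotation for {param!r}")
    # Call whatever class the type hint names with the given keywords.
    return hints[param](**fields)


class Point(NamedTuple):
    x: int
    y: int
    z: int


def distance(a: Point, b: Point) -> Point:
    return Point(a.x - b.x, a.y - b.y, a.z - b.z)


# A calling module would need only distance and build_arg, not Point.
result = distance(a=build_arg(distance, "a", x=45, y=46, z=47),
                  b=build_arg(distance, "b", x=12, y=13, z=14))
print(result)
```

This is noticeably wordier than the proposed (x=45, y=46, z=47) form because the helper has to be told which function and parameter it is filling in, which is the information the interpreter would otherwise have to infer from the call site.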