Revisit Mutable Default Arguments

Yes, that’s right. There’s no protection against someone opening defer x expressions manually if they want to. I’m not sure why that’s a problem though?

Because those aren’t problems we’re trying to solve. We’re just focused on function defaults being mutable (including dataclass factories), so that’s the only time we need to unbox.

Since iterating doesn’t unbox, you would see the boxes, however they’re represented, say Deferred(...).

This is just my opinion, but because people have to make sense of the code, they would have to learn the rules for which parameters have been evaluated and which are still boxed. And it’s such a rare case that people will spend more time looking it up than they would save by doing it the “simple way”. Just my opinion. Maybe the complexity really pays off somewhere.

I just suggested in the previous two comments a meaning for defer [] on its own, which works for all three cases:

def f(x=defer []): ...

and

@dataclass
class C:
    x: list[Any] = defer []
    y: list[Any] = field(default=defer [], ...)

Is this clear enough, or does it need more detail? Revisit Mutable Default Arguments - #17 by NeilGirdhar

So basically, you want a magic object that can be used as a default argument value, giving exactly the same semantics as PEP 671, only with it being called a “deferred evaluation object” with weird semantics? I’m confused. How would it be usefully different?

And I’m even more confused, because you then provide additional semantics for these Deferred objects. So either they are behaving like default argument values, or they’re not. Which is it?

It would work for dataclass factories—both as the RHS, or as the argument to the default parameter of field (allowing us to theoretically deprecate default_factory).

They work for all three cases. They’re just ordinary Python objects that are unboxed when they’re a chosen default value of a parameter.

Maybe it’s worth explaining that the dataclass function treats objects of type Deferred passed as default values as if the callable within Deferred had been passed as a default factory. And this is why this solution works for that case too.

I don’t think I need to code this up to make it clearer?
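For comparison, the dataclass behaviour being described here matches what `default_factory` already provides today:

```python
from dataclasses import dataclass, field

@dataclass
class C:
    # The factory runs once per instantiation, so every C() gets its own list.
    x: list = field(default_factory=list)

a, b = C(), C()
a.x.append(1)
assert b.x == []   # b's list is untouched: no sharing between instances
```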

How does this get implemented?

Good question. In summary:

Whenever the Python interpreter sees an expression of the form defer x, it would treat it as if it were Deferred(lambda: x), thereby creating a “deferred expression object”.

When the Python interpreter processes a function call, and is filling in default values for unspecified parameters, it would check whether each default value is an instance of the type Deferred, and if so, it would unbox it first.

The dataclass function would alter its code generation so that whenever it sees a default value of type Deferred, it would treat it as a default factory whose callable is the deferred object’s boxed callable.
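A minimal sketch of those three steps in today’s Python. The names Deferred and unbox, and the decorator standing in for interpreter support, are all assumptions, not a real API:

```python
import functools
import inspect

class Deferred:
    """Boxes a zero-argument callable; `defer x` would become Deferred(lambda: x)."""
    def __init__(self, thunk):
        self.thunk = thunk

    def unbox(self):
        return self.thunk()

def unbox_deferred_defaults(func):
    """Step 2, emulated: unbox any Deferred default at call time."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        for name, value in bound.arguments.items():
            if isinstance(value, Deferred):
                bound.arguments[name] = value.unbox()
        return func(*bound.args, **bound.kwargs)

    return wrapper

@unbox_deferred_defaults
def f(x=Deferred(lambda: [])):   # stand-in for: def f(x=defer []): ...
    x.append(1)
    return x

assert f() == [1]
assert f() == [1]   # a fresh list on each call, never [1, 1]
```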

I see nothing wrong with a new keyword or symbol applying only to expressions in certain contexts. We have prior history on that with async (soft keyword to keyword) and yield (all the way up to yield from). Something like defer (I don’t like the verbosity) could apply only to function declarations for now.

I don’t think that solving the mutable-default-arguments problem requires solving the semantics of defer any_expression in every context. Yet PEP 709 sets a precedent for implementing complex semantics efficiently.

That said, I agree it would be odd to solve the argument case without solving the dataclass case, which may mean resolving defer in every case.

It should yield different values.

I agree that “magic” would be required, because the result of defer cannot be a standard Python value.

It could also be implemented by changing the default behavior through a __future__ import, so developers gradually opt in (over several years), but that wouldn’t solve the dataclass case.

What I don’t like about a defer applying to any expression is a case like:

x = defer re.search(search_re, text)

In which context does the deferred expression execute? (Although things like that already happen with context managers and yield.)

EDIT:
Deferring values doesn’t have the same homogeneous solution in every case. The syntax may be the same, but the implementation differs for default values, dataclass field values, etc.

Then yeah, we have a major problem. This can’t work in all the ways people expect it to. Based on your semantics, every time x is evaluated, it’s a brand new object, yes? Then what happens if you do this:

x = defer []
x.append(1)
x.append(2)
x.append(3)

Three temporary lists, constructed and disposed of, as if you did this?

[].append(1)
[].append(2)
[].append(3)

And if not, what? When should it evaluate and when should it not? Or does it mean that y = x has a significant side effect?

And what about this?

# at top level, not inside any function
x = defer []
print(globals()["x"])
print(globals()["x"])

Does this cause two new lists to be constructed or does it show you the deferred object?

This is the problem. It cannot be a Python object in any way. It therefore cannot be assigned to a variable. What CAN be done with it? Basically nothing. That’s why PEP 671 does not specify any kind of “deferred object”. They simply don’t work in this context.

That would be the correct semantics, yet I agree those semantics would not be helpful and would probably add confusion.

You’re not talking about my solution are you? Because that is a deferred object that does solve the problem that PEP 671 solves plus the problem of default factories.

No I’m not, because I’m still not clear on the semantics. You say that it only has meaning as a default argument value, but then it also suddenly has to have meaning in dataclasses?

I edited the linked comment if you’d care to look at it again.

The dataclass just reaches into the deferred value object and pulls out the callable so that it can treat it as if it were the default factory.

You could say it only has meaning as a default argument value. I would say that that’s the only context in which the interpreter will unbox it.

Ultimately it was really just your idea. You were the one who said that we need a deferred value that will “magically” be unboxed every time a function is called. I just crafted this solution around your idea.

Then I’m very much against it, because my entire point was that such magic is a bad thing.

Ah, I see the difference now:

  • defer should evaluate only once
  • late-binding args should evaluate on every function call

Exactly. Nailed it.
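The distinction can be emulated in today’s Python. This is only a sketch: neither defer nor late-bound defaults exist, so a None sentinel and a module-level cache stand in for them:

```python
# Late-binding (PEP 671 style): the default expression runs on EVERY call.
def late(x=None):
    if x is None:     # sentinel standing in for a late-bound `x=>[]`
        x = []        # evaluated per call: each call gets a fresh list
    x.append(1)
    return x

assert late() is not late()   # two calls, two distinct lists

# Evaluate-once: the expression runs the first time it is needed,
# then the result is cached and reused.
_cache = {}

def once(x=None):
    if x is None:
        x = _cache.setdefault("x", [])   # evaluated once, then shared
    x.append(1)
    return x

assert once() is once()       # both calls see the same list
```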

Maybe I’m dancing on my own here but I found this to be a fun exercise. Solving the general case has all kinds of complicated issues, but solving the specific case of “I want my function to default to a fresh mutable object” seems pretty useful and I think it can be solved with inspect.signature and functools.wraps. This is just a handy utility to replace if param is None: ... boilerplate.

import inspect
from functools import wraps

def mutable_default(default_factory=None, **kwargs):
    """
    Wrap a function such that default None will be replaced with a new object (defaultdict semantics).
    Use kwargs to define multiple factories based on the parameter name. If default_factory=None
    then unspecified parameters with default None will remain as None
    """
    def wrapper(func):
        sig = inspect.signature(func)
        none_args = set(kwargs)
        if default_factory is not None:
            none_args |= {kw for kw, v in sig.parameters.items() if v.default is None}
        
        @wraps(func)
        def wrapped_func(*args, **kwds):
            bound_args = sig.bind(*args, **kwds)
            unbound = none_args.difference(bound_args.arguments)
            bound_args.apply_defaults()
            bound_args.arguments.update(
                (kw, kwargs.get(kw, default_factory)()) for kw in unbound
            )
            return func(*bound_args.args, **bound_args.kwargs)

        return wrapped_func

    return wrapper

There’s a little overhead with this, but if the function is doing real work it might be negligible [1]. I’ve tested this a bit but I probably haven’t covered every way to call a function.
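A usage sketch (repeating the decorator so the snippet runs on its own; the example functions are hypothetical):

```python
import inspect
from functools import wraps

def mutable_default(default_factory=None, **kwargs):
    """Replace None defaults with a freshly constructed object on each call."""
    def wrapper(func):
        sig = inspect.signature(func)
        none_args = set(kwargs)
        if default_factory is not None:
            none_args |= {kw for kw, v in sig.parameters.items() if v.default is None}

        @wraps(func)
        def wrapped_func(*args, **kwds):
            bound_args = sig.bind(*args, **kwds)
            unbound = none_args.difference(bound_args.arguments)
            bound_args.apply_defaults()
            bound_args.arguments.update(
                (kw, kwargs.get(kw, default_factory)()) for kw in unbound
            )
            return func(*bound_args.args, **bound_args.kwargs)

        return wrapped_func

    return wrapper

@mutable_default(list)            # every None default becomes a fresh list
def append_to(item, target=None):
    target.append(item)
    return target

assert append_to(1) == [1]
assert append_to(2) == [2]            # no state shared between calls
assert append_to(3, [0]) == [0, 3]    # explicit arguments pass through

@mutable_default(seen=set)        # per-parameter factory, keyed by name
def record(value, seen=None):
    seen.add(value)
    return seen

assert record("a") == {"a"}
```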


  1. earlier iterations were faster but covered fewer use cases ↩︎

hi @apalala

I prototyped a decorator that does your rewriting at runtime:

mutable_defaults.py

import inspect
import ast


class Singleton:
    pass


singleton = Singleton()


def singleton_node():
    return ast.Attribute(
        value=ast.Call(
            func=ast.Name(id="__import__"),
            args=[ast.Constant(value="mutable_defaults")],
            keywords=[],
        ),
        attr="singleton",
    )


def mutable_defaults(func):
    source = inspect.getsource(func)
    tree: ast.FunctionDef = ast.parse(source).body[0]
    tree.decorator_list = []
    args = tree.args
    header = []

    def handle_arg(arg, default):
        header.append(
            ast.fix_missing_locations(
                ast.If(
                    test=ast.Compare(
                        ops=[ast.Is()],
                        left=ast.Name(id=arg.arg, ctx=ast.Load()),
                        comparators=[singleton_node()],
                    ),
                    body=[
                        ast.Assign(
                            targets=[ast.Name(arg.arg, ctx=ast.Store())],
                            value=default,
                            type_comment=None,
                        )
                    ],
                    orelse=[],
                )
            )
        )

    # positional arguments
    for arg, default in zip(
        [*args.posonlyargs, *args.args][-len(args.defaults) :], args.defaults
    ):
        handle_arg(arg, default)
    args.defaults = [singleton_node() for _ in args.defaults]

    # keyword arguments
    for arg, default in zip(args.kwonlyargs, args.kw_defaults):
        if default is not None:
            handle_arg(arg, default)

    args.kw_defaults = [
        singleton_node() if default is not None else None
        for default in args.kw_defaults
    ]

    tree.body[:0] = header  # splice the guard statements in front of the original body

    new_code = ast.unparse(ast.fix_missing_locations(tree))
    print(new_code)

    global_env = {}
    exec(new_code, global_env)

    new_func = global_env[tree.name]

    return new_func

example.py

from mutable_defaults import mutable_defaults


@mutable_defaults
def func(a, b=[]):
    b.append(a)
    return (a, b)


print(func(1))
print(func(2))


@mutable_defaults
def func(a, b=[], *, c={}):
    b.append(a * 2)
    c[a] = b
    print(c)


func(5)
func(6)

output:

def func(a, b=__import__('mutable_defaults').singleton):
    if b is __import__('mutable_defaults').singleton:
        b = []
    b.append(a)
    return (a, b)
(1, [1])
(2, [2])
def func(a, b=__import__('mutable_defaults').singleton, *, c=__import__('mutable_defaults').singleton):
    if b is __import__('mutable_defaults').singleton:
        b = []
    if c is __import__('mutable_defaults').singleton:
        c = {}
    b.append(a * 2)
    c[a] = b
    print(c)
{5: [10]}
{6: [12]}

some open points:

  • preserve signature
  • keep constant defaults like None or 5
  • call the original function instead of recompiling it into the new function

However, there are some limitations:

def func(a,l=[a]): # a is not defined
    pass

does not work because a has to be defined when func is defined.

I don’t know if macros could help here PEP 638 – Syntactic Macros | peps.python.org

There’s no given reason why a defer should evaluate only once.

But I’m replying mostly to remind us that functional languages solved state-preserving values a long time ago with Monads.

(Attention @Rosuav )

def make_list(basis=[]):
    basis.append(42)
    basis.append(spam)
    print(basis)

Is that enough reason to guarantee that it evaluates only once? I think it would be EXTREMELY surprising if this printed out an empty list.