Dynamic evaluation of function argument list initializer

Let’s start with a minimal example:

from time import time, sleep

t_start = time()

def log(msg: str, stamp: float = time()):
    duration = stamp - t_start
    print(f"[{duration:.2f}]", msg)

print("Hello 1")
sleep(1)
print("Hello 2")
sleep(1)
print("Hello 3")
  • I’ve made this type of “mistake” several times, and I’ve seen other people make it too. It usually takes a lot of effort to diagnose in a real code base, because people usually don’t think in this direction.
  • I also ran an informal survey, showing this piece of code to people with moderate Python experience (people who have coded for a few years and written a few thousand lines of Python). None of them spotted the problem without me pointing it out.

Now, what do you expect it to print? Is it something like below?

[0.00] Hello 1
[1.01] Hello 2
[2.01] Hello 3

Well this is what we actually get:

[0.00] Hello 1
[0.00] Hello 2
[0.00] Hello 3

If you are fairly familiar with either Python or JS, you might already see what’s wrong: the default value for the optional argument stamp is evaluated exactly once, at the moment the function is defined, not each time it is called.
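You can confirm this by inspecting the function object, where the already-evaluated default is stored:

# The default was computed once, at definition time, and frozen on the
# function object; every call without "stamp" reuses the same float.
print(log.__defaults__)  # something like (1731000000.12,)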

I do understand that this behavior is a feature, not a bug. But I think in some cases this language feature causes trouble and makes implementations unnecessarily cumbersome.

1. Dynamically updated values

An example of this is already shown above. Let’s look at another case that might confuse people:

count = 0

def log(n=count):
    print("count is", n)

for _ in range(10):
    count += 1
    log()

This will print “count is 0” ten times: n was bound to the value count had (0) at the moment the function was defined.
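The same early binding is behind the classic mutable-default pitfall:

def append_to(item, bucket=[]):  # one list, created at definition time
    bucket.append(item)
    return bucket

print(append_to(1))  # [1]
print(append_to(2))  # [1, 2] -- the same list is shared across calls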

2. Side Effects

# Suppose this is part of a library

def do_task(prerequisite = print("do some task")): ...
# "do some task" is printed as soon as the library is imported


def load_yaml_config(file = open("config.yaml")): ...
# "config.yaml" is opened regardless of whether an alternative is provided.
# The open() call happens as soon as the library is imported.


def load_json_config(config = fetch("https://example.com/config.json")): ...
# (fetch stands in for some HTTP client call)
# The HTTPS request is sent even if an alternative JSON file is provided.
# It happens as soon as the user imports the library,
# even if load_json_config() is never called.
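A minimal sketch of the usual workaround, deferring the side effect into the function body via a None default:

def load_yaml_config(file=None):
    if file is None:
        file = open("config.yaml")  # now only runs when actually needed
    ...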

3. Circular References

Sometimes we want a default value to be a not-yet-initialized object, while that object in turn depends on the function itself. This raises a NameError (“name ‘task2’ is not defined”) because the argument list is evaluated immediately:

def task1(next_task = task2): ...
def task2(next_task = task1): ...
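The usual escape hatch is, again, to resolve the reference at call time (a sketch):

def task1(next_task=None):
    if next_task is None:
        next_task = task2  # looked up at call time, when task2 exists
    ...

def task2(next_task=None):
    if next_task is None:
        next_task = task1
    ...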

4. What if anything is meaningful? (e.g. None has its own meanings)

“Enough,” you might say, “you can just set the default value to None and check it later.”

Well I guess many of you will come up with this solution for my original example:

from time import time, sleep

t_start = time()

def log(msg: str, stamp: float = None):
    if stamp is None:
        stamp = time()
    duration = stamp - t_start
    print(f"[{duration:.2f}]", msg)

Well, this solves the original problem at the cost of two extra lines of code and some (arguably) ugly indentation. But what if I now ask for one more feature: stamp = None means “do not print the stamp”?

# The solution will be like this:
class Nothing:
    pass


def log(msg: str, stamp: float | None | type[Nothing] = Nothing):
    if stamp is Nothing:
        # Stamp was not provided, use the current time
        stamp = time()
    if stamp is None:
        # User asks to omit the stamp
        print(msg)
    else:
        print(f"[{stamp - t_start:.2f}]", msg)

# Alternative, but comes with some drawbacks:
# 1. We lose type hinting for the argument "stamp"
# 2. "stamp" is now effectively keyword-only
# 3. time() is called on every call, even when the user supplies a stamp
def log(msg: str, **kwargs):
    if "stamp" in kwargs and kwargs["stamp"] is None:
        # User asks to omit the stamp
        print(msg)
        return
    stamp = kwargs.get("stamp", time())  # extra call to time() here
    print(f"[{stamp - t_start:.2f}]", msg)

Proposal: adding a “@dynamic” keyword decorator might help

This feature could only be implemented in the interpreter itself: normal decorators are invoked after the function object has been created, and by then the defaults have already been evaluated, so it is too late to do anything.

For example:

@dynamic
def log(msg: str, stamp: float = time()): ...

# time() is evaluated every time log() is called
# without a "stamp" argument supplied;
# each such call triggers a separate evaluation.
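Roughly speaking, the interpreter would have to rewrite this into the sentinel pattern below (a sketch of the semantics, not an implementation proposal):

_MISSING = object()  # hypothetical hidden sentinel generated by the interpreter

def log(msg: str, stamp: float = _MISSING):
    if stamp is _MISSING:
        stamp = time()  # the original default expression, re-run per call
    ...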

Potential Bonus Feature:

The @dynamic decorator might also help other decorators access run-time variables.

# Sometimes people would like to access "self" in a decorator applied to a
# class method.
# Currently the only solution is for the decorator factory to intercept
# the first argument ("self") sent to the decorated function.

from functools import wraps
from threading import Lock

def context(ctx):
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            with ctx:
                return fn(*args, **kwargs)
        return wrapper
    return decorator

class Queue:
    def __init__(self):
        self._lock = Lock()
        ...

    # The attribute "self" is captured from the argument list and made
    # available to preceding decorator expressions in a temporary locals().
    # That scope would be destroyed before evaluating the argument list of get().
    @dynamic(self)
    @context(self._lock)
    def get(self):
        ...

I highly recommend reading the background on previous attempts at this. Besides PEP 671, there was also a more recent thread about deferred expressions.

It’s complicated!


This can’t work as a decorator, since the default arguments have to have already been evaluated before the decorator could work. It would need interpreter support. PEP 671 has already been mentioned; it wasn’t the first proposal along these lines, and it likely won’t be the last, but broadly speaking, what you need is compiler support for it.

Thanks! Seems like this PEP closely matches the problem I described.

@pf_moore Can you comment a bit on the “bonus feature” that I described?

That’s why I said it’s a keyword decorator: it changes the behavior of the Python interpreter.

Hmm, that seems kinda separate actually. I don’t think it’s really connected to the idea of default argument evaluation.

What you’re trying to do, though, is rather awkward in terms of scoping. Remember that a decorator works roughly like this:

@some_decorator(arg1, arg2)
def some_function(spam=expression):
    function_body

# approximately equivalent to

_deco = some_decorator(arg1, arg2)
def some_function(spam=expression):
    function_body
some_function = _deco(some_function)

You have to be able to evaluate the decorator at the time the function is defined, not when it’s called. Something like self._lock might refer to many different locks depending on which object the method’s being called on.

For something like this, it really makes more sense to just keep the with block inside the function, but if you really want it to be a decorator, you would need the decorator to reach into the call stack. Not something I’d recommend.
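For reference, the straightforward version with the with block kept inside the function (a sketch):

from threading import Lock

class Queue:
    def __init__(self):
        self._lock = Lock()

    def get(self):
        with self._lock:  # one extra indent level, but ordinary scoping rules
            ...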

I have actually been wanting this kind of “dynamic” wrapper for a while (especially for context managers): a with block scoping an entire function indents everything, and sometimes you just cannot afford another level of indentation.

One solution could be functools.lazy, so that:

a = lazy(lambda: 1)
print(type(a))      # lazy
print(a)            # 1
print(a + 1)        # 2
print(type(a + 1))  # int

It could be an object which implements a full set of proxy methods, with several clearly defined exceptions.

Your initial example would work:

def log(msg, stamp=lazy(time)):
    duration = stamp - t_start
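A toy sketch of such a proxy (hypothetical; real libraries forward many more dunder methods and usually cache the computed value):

class lazy:
    def __init__(self, factory):
        self._factory = factory  # not called until the value is observed

    # Arithmetic is forwarded to the freshly computed value:
    def __add__(self, other):
        return self._factory() + other

    def __radd__(self, other):
        return other + self._factory()

    def __sub__(self, other):
        return self._factory() - other

    def __rsub__(self, other):
        return other - self._factory()

    def __eq__(self, other):
        return self._factory() == other

    def __str__(self):
        return str(self._factory())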

There are libraries for this:
  • lazyasd
  • wrapt (has a C implementation)

But I think it might be worth considering bringing something robust into the standard library.

Interesting, I never thought of this approach…

Brute force, but it gets the job done. (It seems certain observations cannot be proxied, such as is, and, or, and type.)

For the circular reference example, could it work like this?

def task1(next_task = lazy(lambda: task2)): ...
def task2(...): ...

Not all of them: and and or go through __bool__, which a proxy can implement.

So the set of mis-proxied operations is pretty much limited to is, is not, and type(). I am sure there are some others, but these are the most used ones.
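For example, with a proxy along the lines of the sketch above (which forwards __eq__):

a = lazy(lambda: 1)
one = 1
print(a == one)        # True  -- __eq__ is forwarded to the real value
print(a is one)        # False -- identity is never forwarded
print(type(a) is int)  # False -- type() bypasses all dunder methods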

Yup.

You may want to look into the sentinels PEP (661, IIRC?), for these particular cases.

I also recommend looking at pylint and flake8-bugbear, if you’re not already familiar with those tools. Both have lints for special cases in which mutable argument defaults can surprise new Python developers.


I wouldn’t tend to argue for late binding defaults starting from the premise that the current behavior is confusing and that adding late binding would improve that.

I think the early binding behavior can surprise people, but late binding defaults actually make things more confusing for beginners. Novice Python developers may not understand that some special marker significantly changes the meaning of a default. To me, the argument in favor of it is “for experienced folks, this is a useful way of encoding ideas which currently are very verbose”.
(I think 671 is really interesting, by the way. But my support for it might stem from a different perspective.)


The sentinels proposal (PEP 661) seems to provide a concept very similar to Symbols in JavaScript.

However, the case I described needs something a bit different: the placeholder sentinel must not be made available to anything other than the function itself. Since Python does not provide a way to declare private variables (in classes), the only solution is to use a function’s temporary local scope (a closure) to achieve this:

def log_factory():

    # Private to this specific log function, cannot be used by others.
    class Nothing:
        pass

    def log(msg: str, stamp: float | None | type[Nothing] = Nothing):
        ...

    return log

In other words, even if PEP 661 is implemented, this tweak is still required to ensure that a sentinel is private to a specific function.

In the meantime, PEP 671 (the => operator) should be able to eliminate the need for a sentinel placeholder in the function argument list.
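For reference, a hypothetical rendering of the original example under PEP 671’s proposed syntax (not valid in any released Python):

# => evaluates the default expression at call time, in the function's scope
def log(msg: str, stamp: float => time()):
    duration = stamp - t_start
    print(f"[{duration:.2f}]", msg)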

I’m not sure what you’re trying to describe here, but

stamp: float | None = object()

isn’t valid under type checkers.

If you want a private sentinel value, you can

class _DefaultType: pass
_default = _DefaultType()
def silly(x: int | _DefaultType = _default) -> int:
    return 0 if isinstance(x, _DefaultType) else x + 1

If you want some kind of value which can’t be accessed from the outside, you’ll want a non-Python language, since we can always inspect the signature, call stack, etc.
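For instance, the “private” default is always recoverable through introspection (using the silly example above):

import inspect

sig = inspect.signature(silly)
leaked = sig.parameters["x"].default  # the supposedly private _default
print(silly(leaked))                  # 0 -- the sentinel "leaked" out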


edit: switched to an isinstance check so that type checkers would understand it.

I’ve corrected my type hint (edit: it turns out that Literal[] can only be used with primitives, not arbitrary objects). The point is to ensure that this value is only visible to the function that declares it.

Underscore naming is only a convention that indicates a value is private; it does not prevent other modules from accessing the value, nor does it prevent another function in the same module from reusing it for a slightly different purpose.

It’s not about security for packages that are distributed as source code … It’s more about an assurance that the value is used by one single code block for one single purpose, so that whatever silly mistake is made elsewhere will not break this piece of code. See the example below.

By directly using an object as a sentinel, you can achieve cleaner and more expressive code:

class Nothing:
    pass

# Approach 1

stamp: float | None | type[Nothing] = ...
if stamp is Nothing: # Closely matches natural language
    ...

# Approach 2

stamp: float | None | Nothing = ...
if isinstance(stamp, Nothing): # Works but requires a bit more thinking
    ...                        # when someone tries to understand the code

Edit: Turns out that the typing system does not like this:

Nothing = object()

stamp: float | None | Literal[Nothing] = ...

# Pylance warning:
# Type arguments for "Literal" must be None,
#     a literal value (int, bool, str, or bytes),
#     or an enum value

I am curious whether it will be possible to use Literal[sentinel] with PEP 661 …

An example where other code can break your internal logic:

# your_lib.py
class _DefaultType: pass
_default = _DefaultType()
def silly(x: int | _DefaultType = _default) -> int:
    return 0 if isinstance(x, _DefaultType) else x + 1

# After a few hundred lines of code ...

class _DefaultType: # Another careless programmer wrote this
    ...

# Now the name _DefaultType points at this new class,
# so the isinstance() check in silly() always returns False,
# and calling silly() with no argument raises a TypeError.
# It will be amazingly tricky to diagnose the problem
# if you don't know where to look.

Or even:

# naughty.py
import your_lib

del your_lib._DefaultType

# Now silly() will raise NameError,
# although we did nothing to the function itself.

In contrast:

However careless a programmer might be, they would not do the following unless acting deliberately:

def hide():
    hidden = object()
    return lambda item: item is hidden

# check() should always return False unless hacked
check = hide()

# These must be intentional, not careless
check(check.__closure__[0].cell_contents) # True
del check.__closure__[0].cell_contents
check(None) # NameError

To me, these are not compelling arguments. In general, I don’t really follow what you mean by “careless”, since it’s a definition which somehow involves very intentional looking acts of self-sabotage. Calling deletion of a module attribute “[doing] nothing to the function” is, to me, a very, very strange definition of “doing nothing”.

Python does not try to protect you from yourself. e.g., import sys; sys.stdout = (); print("boom")


Specifically regarding name shadowing, several linters will warn about this and help you catch accidental name shadowing. If this is a genuine concern of yours, I strongly recommend looking into tooling which can help you.


In our code base, we used an even dirtier trick, which exploits the Any type to keep Mypy from getting angry at us.

from __future__ import annotations

from typing import Any, ClassVar, Final, final


@final
class Undefined:
    __instance: ClassVar[Undefined | None] = None

    def __new__(cls) -> Undefined:
        if cls.__instance is None:
            cls.__instance = super().__new__(cls)
        return cls.__instance

    def __getattribute__(self, item: str) -> Any:
        if item in ("__str__", "__repr__", "__bool__"):
            # These dunders are exempted so that using UNDEFINED as a default
            # value does not crash, e.g. as a default field in a `dataclass`.
            return super().__getattribute__(item)
        raise AttributeError(f"`{item}` is not set.")

    def __str__(self) -> str:
        return "[UNDEFINED]: Attribute is not set."

    def __repr__(self) -> str:
        return self.__str__()

    def __bool__(self) -> bool:
        return False


UNDEFINED: Final[Any] = Undefined()

This actually lets us use UNDEFINED as a default parameter for any function we want, without complicating the typing definitions.

Like:

def foo(some_string: str | None = UNDEFINED) -> None:
    ...  # Do something here

It’s still pretty dirty, but it helps us avoid tons of headaches in scenarios where None and Undefined mean different things, for example, a value that hasn’t been fetched from the source yet versus one that was fetched and is known to be missing.
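A hypothetical usage sketch of that distinction:

record = {"name": "Alice"}  # "email" was never fetched from the source

email = record.get("email", UNDEFINED)
if email is UNDEFINED:
    ...  # never fetched: go and fetch it
elif email is None:
    ...  # fetched, and the source says there is no email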

In general, expanding the usage of Python’s NotImplemented for such scenarios (or creating a similar built-in object) would be a cool idea. And no, it doesn’t lead us to the JS-style “undefined is not a function” problem: nothing ever behaves as undefined until you use it directly.


I agree. Upon posting those replies, I realized this is just some kind of weird obsession of my own, not a common demand.

Since PEP 671 is already there and would solve my problem better than what I proposed, I am now sitting back and hoping it gets delivered soon.

BTW, a general question for the devs: I am interested in learning more about CPython and hopefully contributing to it in the future. Where and how should I get started?

It’s not under active development. It stalled out due to massive resistance.