I don’t consider that lazy evaluation or deferred expression. It would be more appropriate to refer to it as Runtime Code Evaluation.
Currently, we can make use of eval:
import math
import time

x = 500_000

def factorial(x):
    t = time.time()
    expr = "math.factorial(x)"
    print("math.factorial(x) took:", time.time() - t)
    t = time.time()
    result = eval(expr, globals(), locals())
    print("eval(expr) took:", time.time() - t)
    print("Bit length of the result:", result.bit_length())

factorial(x)
But I cannot think of any situation where eval is necessary. In other words, Python, and programming languages in general, provide all the necessary features to express any possible algorithm.
Also, is this what CPython is actually doing—executing code at runtime?
If that optimization were done at the interpreter level, it would be fine. However, leaving it to the user solves nothing, as the current code cannot utilize the new deferred object.
There are cases where it is useful, but this is not the point. builtins.eval just doesn’t solve the problem at hand, which is:
d = dict()
result = d.setdefault('factorial', defer(math.factorial, 100_000))
# Expensive function unevaluated

if condition1:
    result += 1
    # Expensive function evaluated
elif condition2:
    result = defer(operator.add, result, 1)
    # Expensive function propagated with one more operation
else:
    print('result not needed')
    # Expensive function still unevaluated

new_result = d['factorial']
truetype(new_result)  # defer
# Q: Is it evaluated?
# A: If condition1 == True

# ensure that result is evaluated
defer.eval(d['factorial'])
# all `defer` objects are now evaluated and cached

# collapse `result` to actual value
result = result + 0
# or
result = defer.eval(result)
Now, yes: the code above can be restructured to behave similarly without defer.
I don’t dispute that there are ways around it.
What I argue is that it is difficult to beat the convenience, elegance, and seamlessness of the above.
And there is a lot of scope to play with this:
a) cached and non-cached defer
b) the func of a defer can have arguments bound at definition or behave as a no-arg lambda
a = 1
b = 1
d1 = uncached_defer(lambda x, y: x + y, a, b)  # arguments bound at definition
d2 = uncached_defer(lambda: a + b)             # names captured from the outer scope on evaluation
print(d1)  # 2
print(d2)  # 2
b = 3
print(d1)  # 2
print(d2)  # 4
However, the above is merely “nice”; the main argument is as follows:
As stated above, the point is to “avoid the explicit call”.
Why is it important? Why can’t one just use a lambda?
Because it isn’t seamless.
In other words, to use a lambda it needs to be explicitly implemented, while this would automatically provide seamless defer evaluation for all existing code.
Let’s take the implementation of dict.get.
In the case of a lambda, there are currently 2 options:
an additional argument flag to indicate that the default is lazy
None of these options actually supports further propagation of the “unevaluated expensive function”.
In other words, even if the default is not needed after the get call, it gets evaluated within get, while the concept at hand holds on to it until it is “used”.
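For comparison, a lambda-based lazy default has to be implemented explicitly. A minimal sketch, assuming a hypothetical helper get_lazy (not an existing API):
import math

def get_lazy(d, key, default_factory):
    # Hypothetical lambda-based lazy default for dict lookups.
    if key in d:
        return d[key]
    # The factory is called here, inside the helper, so the expensive work
    # cannot be propagated any further once get_lazy returns.
    return default_factory()

value = get_lazy({}, 'factorial', lambda: math.factorial(100_000))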
With the “concept of this thread”, on the other hand, no modifications to dict.get are necessary. If this worked, it would provide lazy defaults to every get function in the Python universe.
Note, if you want defer to be evaluated immediately, then all it takes is an extra line:
result = {}.get('key', defer(math.factorial, 100_000))
result = defer.eval(result) # if result is not `defer`, then it just passes-through
It would be resilient even to pickle:
result = defer(math.factorial, 100_000)
unpickled = pickle.loads(pickle.dumps(result))
# still unevaluated
final = unpickled + 1
# evaluated
Also, to put this in relation to a “graph evaluation builder”:
this can act as one, but it is not the best tool for that purpose.
a = defer(math.factorial, 100_000)
b = defer(operator.add, a, 1)
c = defer(sum, [b, b, b])
However, this is not ideal. A much better pattern to achieve the same result:
a = dask.delayed(math.factorial)(100_000)
b = a + 1
c = dask.delayed(sum)([b, b, b])
c = defer(c.compute)  # the bound method is captured before `c` is rebound
This way, “graph building” and “seamless evaluation entry point” are handled by 2 different tools:
a “graph building framework” that takes care of optimisations, parallel computing and similar aspects
defer, which converts the above into an object that gets evaluated “on use”
Similar to a hypothetical truetype, you would need to define defer-aware functions. For instance, you would need to implement a dict.get method that supports a defer object as the default value.
Implementing this in Python does not seem as straightforward as in CPython. As mentioned earlier, it would be ideal if CPython could defer expression evaluation until the value is needed. This would eliminate the need for trueassert, yield defer, return defer, and similar constructs.
All pure-Python code supports the current implementation.
truetype (under a different name) already exists in the current implementation.
Very few of such specialised functions are needed, namely:
truetype
explicit evaluation (defer.eval)
defer object attribute access
All of the above are already implemented (just different API).
E.g.:
def foo(bar) -> int | None:
if type(bar) is int: # "evaluates here"
# truetype(bar) == defer
return bar + 1 # adds 1 to already evaluated value
else:
return None
result = foo(defer(lambda: 1)
truetype(result) # int
print(result) # 2
So all existing pure-Python code will just “evaluate” a defer on “use” and pass it through if it is not “used”.
From the user’s perspective:
If you pass a defer into a function, be mindful that you might receive the same defer object as the return value. So if you want to ensure collapse to the actual value, just do it explicitly. E.g.:
from unknown_pure_python_library import unknown_pure_python_function
result = unknown_pure_python_function(defer(lambda: 1))
# result might be `defer` or NOT
result = defer.eval(result) # so just ensure to evaluate and collapse to actual value
4 main cases of the above:
def unknown_pure_python_function(x) -> truetype(x):
    # returns "unevaluated defer"
    return x

def unknown_pure_python_function(x) -> truetype(x):
    # returns "evaluated defer"
    type(x)  # "uses" x
    return x

def unknown_pure_python_function(x) -> int:
    # x is "used" and a new result is derived from it
    return x + 1

def unknown_pure_python_function(x) -> bool:
    # used, but a different value is returned
    tp = type(x)  # truetype(tp) == type
    return tp is int
I have been playing with this, implemented in pure Python, for quite a while now.
The issue with it is precisely that: it cannot be made into a “complete proxy” due to the inability to overload builtins.type, is, and is not.
OP has already implemented this in CPython with minimal set of needed utilities.
You can find OP’s PR above. It works and does all from my examples above for Pure Python code.
I think it is good that this thread explored the “conceptual design itself” and found that it is definitely achievable and that there are sensible options.
And in the pure-Python space this already works well to a significant degree.
Now, regardless of whether this is a favourable concept or not, the C-level issues would be the same for any similar idea. So I think getting down to those is valuable regardless of whether this is the “exact desirable concept” or not.
And there are 2 different ways to get arguments into the body of lambda/def:
via arguments
capture from outer scope at runtime
You can think of it as “implicitly evaluated on use” functools.partial. Apart from “implicit evaluation on use” it works exactly the same as partial.
from functools import partial

class uncached_defer(partial):
    pass

class cached_defer(partial):
    cache = dict()

    def __new__(cls, func, *args, **kwds):
        def wrapper(*a, **kw):
            # `self` is assigned below, before the wrapper can ever be called
            key = id(self)
            if key not in self.cache:
                self.cache[key] = func(*a, **kw)
            return self.cache[key]
        self = super().__new__(cls, wrapper, *args, **kwds)
        return self
4 cases; all work as expected:
# 1. Using function with all data bound via arguments.
def func(x):
    print('Evaluating')
    return x

d1 = uncached_defer(func, 1)
d1()  # Evaluating, return 1
d1()  # Evaluating, return 1

d2 = cached_defer(func, 1)
d2()  # Evaluating, return 1 (caches)
d2()  # return 1

# 2. Using function which looks up variables in outer scope
a = 1
d3 = uncached_defer(lambda: a)
d3()  # 1
a = 2
d3()  # 2

a = 1
d4 = cached_defer(lambda: a)
d4()  # 1 (caches)
a = 2
d4()  # 1
Try it yourself; there is nothing new here.
It is 3 because you bound the variables explicitly, but you can get 5 if you need to as well:
a = 1
b = 2
c = uncached_defer(lambda: a + b)
a = 2
b = 3
print(c) # 5
So, in terms of the behaviours above, this concept does not invent anything new.
All of the behaviour above is the same as if one were using partial.
Everything works as expected, depending on how one uses it.
So, to complete the picture, let’s add the “implicit evaluation on use” part. Here, only __add__ is implemented, but the concept is to overload all “usage” (including __call__) except “assign”.
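Below is a minimal sketch of that part, building on the cached_defer class above; implicit_defer is my own illustration, not the thread’s actual implementation:
import math

class implicit_defer(cached_defer):
    # Only __add__/__radd__ are overloaded here as a minimal illustration;
    # the full concept would overload every "usage" dunder except assignment.
    def __add__(self, other):
        return self() + other

    def __radd__(self, other):
        return other + self()

d = implicit_defer(math.factorial, 20)
print(d + 0)  # 2432902008176640000, evaluated implicitly on first "use"
print(d + 0)  # same value, served from the cache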
I gave you an example using eval, and the only difference is that we would not need to explicitly evaluate the expression. As shown in the example below, right? The syntax used is irrelevant; right now, I’m using quotes, similar to string literals.
import math
import time

x = 500_000
result = 0

def factorial(x):
    global result
    t = time.time()
    expr = "math.factorial(x)"
    print("math.factorial(x) took:", time.time() - t)
    t = time.time()
    result = result + expr  # implicit evaluation
    print("eval(expr) took:", time.time() - t)
    print("Bit length of the result:", result.bit_length())

factorial(x)
Let object E be the expression code math.factorial(x). Here are the steps explaining how the proposed feature works:
Store object E in the expr variable without evaluating it.
Evaluate the expression when you use the object in another expression.
Note that the expression code does not capture the values of the variables used in the expression; those values may change after the expression is defined and stored in the expr variable. Therefore, the user must manually decide when to use object E in another expression, which triggers evaluation of the expression code stored in object E.
Is this the proposed feature? The caching aspect is intentionally left out of the discussion for now.
Assuming the description of the proposal in my previous post is correct, we can already implement this feature in Python (as demonstrated in your examples as well):
import math
import time

x = 500_000
result = 0

class Expr:
    def __init__(self, expr):
        self.expr = expr

    def __call__(self, *args, **kwargs):
        # This method will be invoked when the object is used in an expression.
        # In this case, it will evaluate the expression.
        return eval(self.expr, {"math": math, "x": x})

    # Operator overloading to allow implicit evaluation during arithmetic operations
    def __add__(self, other):
        return self() + other  # Automatically evaluate `self.expr` and add

    def __radd__(self, other):
        return other + self()  # For reverse addition (when self is on the right side)

# Define the expression object
expr = Expr("math.factorial(x)")

def factorial(x):
    global result  # Access global result variable

    # Measure the time taken to compute the factorial
    t = time.time()
    print("math.factorial(x) took:", time.time() - t)

    # Implicitly evaluate the expression when used in arithmetic
    t = time.time()
    result = result + expr  # Implicit evaluation of `expr` during addition
    print("eval(expr) took:", time.time() - t)

    # Output the bit length of the result
    print("Bit length of the result:", result.bit_length())

factorial(x)
What is not possible to implement is telling dict.get that the variable is being ‘used’ when, in fact, it is only being returned. Returning a value or reference does not involve any operation, and therefore does not count as usage.
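To illustrate with the Expr class above, the object comes back from dict.get untouched, because merely returning it performs no operation on it:
d = {}
value = d.get('missing', Expr("math.factorial(x)"))
print(type(value))   # <class '__main__.Expr'>, still unevaluated
result = value + 0   # only now does __add__ trigger the evaluation
print(result.bit_length())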
No, it’s not the same. The defer syntax will construct a lambda function immediately. It has 2 obvious discrepancies from your eval analogy:
Syntax errors are raised at declaration time, not at run time (a small demonstration follows after this list).
Name scopes are fixed at declaration time, i.e. it will not be fooled by a name conflict when you implicitly evaluate it from another scope/module.
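A small runnable illustration of the first point, using a plain lambda as a stand-in for the proposed defer syntax:
# With eval, a malformed expression is accepted as a string and only fails
# when it is finally evaluated:
expr = "1 +* 2"
try:
    eval(expr)
except SyntaxError as e:
    print("raised at evaluation time:", e)

# With a lambda (and likewise with the proposed defer syntax), the same mistake
# is rejected at declaration time:
# expr = lambda: 1 +* 2   # uncommenting this line is an immediate SyntaxError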
If you have to find an “equivalence”, here is the strict equivalent version:
# 1. Syntax sugar version
expr => add(a, b)
# 2. Strict equivalence to (1)
expr = defer.Immutable(lambda: add(a, b))
# 3. Not an equivalence
expr = defer.Immutable(add, a, b)
The 3rd approach is not an equivalence because the variables a and b are frozen at declaration time.
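The distinction can be reproduced with standard tools, a lambda standing in for form (2) and functools.partial for form (3); this is only an illustration:
from functools import partial
from operator import add

a, b = 1, 2
late = lambda: add(a, b)    # like form (2): names resolved on evaluation
early = partial(add, a, b)  # like form (3): values frozen at declaration time

a, b = 10, 20
print(late())   # 30
print(early())  # 3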
Yes, the dependent variables may change after a defer object is constructed. The results of deferred evaluation will change correspondingly. But this does not mean you always need to manually control the timing of first observation, especially when you use defer objects as function arguments or return values.
If you have to take a “snapshot” of the current value of a variable, you can use one of these:
from operator import add

a = 1
b = 2

# Approach 1: copy-construct the argument list
c = defer(add, a, b)

# Approach 2: early-bound argument defaults
@defer
def d(a=a, b=b):
    return add(a, b)

a = 100
b = 200
print(c)  # 3
print(d)  # 3