I don’t consider that lazy evaluation or deferred expression. It would be more appropriate to refer to it as Runtime Code Evaluation.
Currently, we can make use of eval:
import math
import time

x = 500_000

def factorial(x):
    t = time.time()
    expr = "math.factorial(x)"
    print("math.factorial(x) took:", time.time() - t)
    t = time.time()
    result = eval(expr, globals(), locals())
    print("eval(expr) took:", time.time() - t)
    print("Bit length of the result:", result.bit_length())

factorial(x)
But I cannot think of any situation where eval is necessary. In other words, Python, and programming languages in general, provide all the necessary features to express any possible algorithm.
Also, is this what CPython is actually doing—executing code at runtime?
If that optimization were done at the interpreter level, it would be fine. However, leaving it to the user solves nothing, as the current code cannot utilize the new deferred object.
There are cases where it is useful, but this is not the point. builtins.eval just doesn’t solve the problem at hand, which is:
d = dict()
result = d.setdefault('factorial', defer(math.factorial, 100_000))
# Expensive function unevaluated

if condition1:
    result += 1
    # Expensive function evaluated
elif condition2:
    result = defer(operator.add, result, 1)
    # Expensive function propagated with one more operation
else:
    print('result not needed')
    # Expensive function still unevaluated

new_result = d['factorial']
truetype(new_result)  # defer
# Q: Is it evaluated?
# A: If condition1 == True

# ensure that result is evaluated
defer.eval(d['factorial'])
# all `defer` objects are now evaluated and cached

# collapse `result` to actual value
result = result + 0
# or
result = defer.eval(result)
Now, yes: the code above can be restructured to behave similarly without defer.
I don’t dispute that there are ways around it.
What I argue is that it is difficult to beat the convenience, elegance, and seamlessness of the above.
And there is a lot of scope to play with this:
a) cached and non-cached defer
b) the func of a defer can have arguments bound at definition or behave as a no-arg lambda
a = 1
b = 1
d1 = uncached_defer(lambda x, y: x + y, a, b)  # arguments bound at definition
d2 = uncached_defer(lambda: a + b)             # names captured from the outer scope on evaluation
print(d1)  # 2
print(d2)  # 2
b = 3
print(d1)  # 2
print(d2)  # 4
However, the above is merely “nice”; the main argument is as follows:
As stated above, the point is to “avoid the explicit call”.
Why is it important? Why can’t one just use a lambda?
Because it isn’t seamless.
In other words, to use a lambda it needs to be explicitly implemented, while this would automatically provide seamless defer evaluation for all existing code.
Let’s take the implementation of dict.get.
In the case of a lambda, there are currently 2 options:
an additional argument flag to indicate that the default is lazy
None of these options actually supports further propagation of the “unevaluated expensive function”.
In other words, even if the default is not needed after the get call, it gets evaluated within get, while the concept at hand holds on to it until it is “used”.
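For comparison, a lambda-based lazy default has to be implemented explicitly. A minimal sketch, assuming a hypothetical helper get_lazy (not an existing API):
import math

def get_lazy(d, key, default_factory):
    # Hypothetical lambda-based lazy default for dict lookups.
    if key in d:
        return d[key]
    # The factory is called here, inside the helper, so the expensive work
    # cannot be propagated any further once get_lazy returns.
    return default_factory()

value = get_lazy({}, 'factorial', lambda: math.factorial(100_000))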
With the “concept of this thread”, on the other hand, no modifications to dict.get are necessary. If this worked, it would provide lazy defaults to every get function in the Python universe.
Note, if you want defer to be evaluated immediately, then all it takes is an extra line:
result = {}.get('key', defer(math.factorial, 100_000))
result = defer.eval(result) # if result is not `defer`, then it just passes-through
It would be resilient even to pickle:
result = defer(math.factorial, 100_000)
unpickled = pickle.loads(pickle.dumps(result))
# still unevaluated
final = unpickled + 1
# evaluated
Also, to put this in relation to a “graph evaluation builder”:
this can act as one, but it is not the best tool for that purpose.
a = defer(math.factorial, 100_000)
b = defer(operator.add, a, 1)
c = defer(sum, [b, b, b])
However, this is not ideal. A much better pattern to achieve the same result:
a = dask.delayed(math.factorial)(100_000)
b = a + 1
c = dask.delayed(sum)([b, b, b])
c = defer(c.compute)  # the bound method is captured before `c` is rebound
This way, “graph building” and “seamless evaluation entry point” are handled by 2 different tools:
a “graph building framework” that takes care of optimisations, parallel computing and similar aspects
defer, which converts the above into an object that gets evaluated “on use”
Similar to a hypothetical truetype, you would need to define defer-aware functions. For instance, you would need to implement a dict.get method that supports a defer object as the default value.
Implementing this in Python does not seem as straightforward as in CPython. As mentioned earlier, it would be ideal if CPython could defer expression evaluation until the value is needed. This would eliminate the need for trueassert, yield defer, return defer, and similar constructs.
All pure-Python code supports the current implementation.
truetype (under a different name) already exists in the current implementation.
Very few of such specialised functions are needed, namely:
truetype
explicit evaluation (defer.eval)
defer object attribute access
All of the above are already implemented (just different API).
E.g.:
def foo(bar) -> int | None:
if type(bar) is int: # "evaluates here"
# truetype(bar) == defer
return bar + 1 # adds 1 to already evaluated value
else:
return None
result = foo(defer(lambda: 1)
truetype(result) # int
print(result) # 2
So all existing pure-Python code will just “evaluate” a defer on “use” and pass it through if it is not “used”.
From the user’s perspective:
If you pass a defer into a function, be mindful that you might receive the same defer object as the return value. So if you want to ensure collapse to the actual value, just do it explicitly. E.g.:
from unknown_pure_python_library import unknown_pure_python_function
result = unknown_pure_python_function(defer(lambda: 1))
# result might be `defer` or NOT
result = defer.eval(result) # so just ensure to evaluate and collapse to actual value
4 main cases of the above:
def unknown_pure_python_function(x) -> truetype(x):
    # returns "unevaluated defer"
    return x

def unknown_pure_python_function(x) -> truetype(x):
    # returns "evaluated defer"
    type(x)  # "uses" x
    return x

def unknown_pure_python_function(x) -> int:
    # x is "used" and a new result is derived from it
    return x + 1

def unknown_pure_python_function(x) -> bool:
    # used, but a different value is returned
    tp = type(x)  # truetype(tp) == type
    return tp is int
I have been playing with this, implemented in pure Python, for quite a while now.
The issue with it is precisely that: it cannot be made into a “complete proxy” due to the inability to overload builtins.type, is, and is not.
OP has already implemented this in CPython with minimal set of needed utilities.
You can find OP’s PR above. It works and does all from my examples above for Pure Python code.
I think it is good that this thread explored the “conceptual design itself” and found that it is definitely achievable and that there are sensible options.
And in the pure-Python space this already works well to a significant degree.
Now, regardless of whether this is a favourable concept or not, the C-level issues would be the same for any similar idea. So I think getting down to those is valuable regardless of whether this is the “exact desirable concept” or not.
And there are 2 different ways to get arguments into the body of lambda/def:
via arguments
capture from outer scope at runtime
You can think of it as “implicitly evaluated on use” functools.partial. Apart from “implicit evaluation on use” it works exactly the same as partial.
from functools import partial

class uncached_defer(partial):
    pass

class cached_defer(partial):
    cache = dict()

    def __new__(cls, func, *args, **kwds):
        def wrapper(*a, **kw):
            # `self` is assigned below, before the wrapper can ever be called
            key = id(self)
            if key not in self.cache:
                self.cache[key] = func(*a, **kw)
            return self.cache[key]
        self = super().__new__(cls, wrapper, *args, **kwds)
        return self
4 cases; all work as expected:
# 1. Using function with all data bound via arguments.
def func(x):
    print('Evaluating')
    return x

d1 = uncached_defer(func, 1)
d1()  # Evaluating, return 1
d1()  # Evaluating, return 1

d2 = cached_defer(func, 1)
d2()  # Evaluating, return 1 (caches)
d2()  # return 1

# 2. Using function which looks up variables in outer scope
a = 1
d3 = uncached_defer(lambda: a)
d3()  # 1
a = 2
d3()  # 2

a = 1
d4 = cached_defer(lambda: a)
d4()  # 1 (caches)
a = 2
d4()  # 1
Try it yourself; there is nothing new here.
It is 3 because you bound the variables explicitly, but you can get 5 if you need to as well:
a = 1
b = 2
c = uncached_defer(lambda: a + b)
a = 2
b = 3
print(c) # 5
So, in terms of the behaviours above, this concept does not invent anything new.
All of the behaviour above is the same as if one were using partial.
Everything works as expected, depending on how one uses it.
So, to complete the picture, let’s add the “implicit evaluation on use” part. Here, only __add__ is implemented, but the concept is to overload all “usage” (including __call__) except “assign”.
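Below is a minimal sketch of that part, building on the cached_defer class above; implicit_defer is my own illustration, not the thread’s actual implementation:
import math

class implicit_defer(cached_defer):
    # Only __add__/__radd__ are overloaded here as a minimal illustration;
    # the full concept would overload every "usage" dunder except assignment.
    def __add__(self, other):
        return self() + other

    def __radd__(self, other):
        return other + self()

d = implicit_defer(math.factorial, 20)
print(d + 0)  # 2432902008176640000, evaluated implicitly on first "use"
print(d + 0)  # same value, served from the cache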
I gave you an example using eval, and the only difference is that we would not need to explicitly evaluate the expression. As shown in the example below, right? The syntax used is irrelevant; right now, I’m using quotes, similar to string literals.
import math
import time

x = 500_000
result = 0

def factorial(x):
    global result
    t = time.time()
    expr = "math.factorial(x)"
    print("math.factorial(x) took:", time.time() - t)
    t = time.time()
    result = result + expr  # implicit evaluation
    print("eval(expr) took:", time.time() - t)
    print("Bit length of the result:", result.bit_length())

factorial(x)
Let object E be the expression code math.factorial(x). Here are the steps explaining how the proposed feature works:
Store object E in the expr variable without evaluating it.
Evaluate the expression when you use the object in another expression.
Note that the expression code does not capture the values of the variables used in the expression; those values may change after the expression is defined and stored in the expr variable. Therefore, the user must manually decide when to use object E in another expression, which triggers evaluation of the expression code stored in object E.
Is this the proposed feature? The caching aspect is intentionally left out of the discussion for now.
Assuming the description of the proposal in my previous post is correct, we can already implement this feature in Python (as demonstrated in your examples as well):
import math
import time

x = 500_000
result = 0

class Expr:
    def __init__(self, expr):
        self.expr = expr

    def __call__(self, *args, **kwargs):
        # This method will be invoked when the object is used in an expression.
        # In this case, it will evaluate the expression.
        return eval(self.expr, {"math": math, "x": x})

    # Operator overloading to allow implicit evaluation during arithmetic operations
    def __add__(self, other):
        return self() + other  # Automatically evaluate `self.expr` and add

    def __radd__(self, other):
        return other + self()  # For reverse addition (when self is on the right side)

# Define the expression object
expr = Expr("math.factorial(x)")

def factorial(x):
    global result  # Access global result variable

    # Measure the time taken to compute the factorial
    t = time.time()
    print("math.factorial(x) took:", time.time() - t)

    # Implicitly evaluate the expression when used in arithmetic
    t = time.time()
    result = result + expr  # Implicit evaluation of `expr` during addition
    print("eval(expr) took:", time.time() - t)

    # Output the bit length of the result
    print("Bit length of the result:", result.bit_length())

factorial(x)
What is not possible to implement is telling dict.get that the variable is being ‘used’ when, in fact, it is only being returned. Returning a value or reference does not involve any operation, and therefore does not count as usage.
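To illustrate with the Expr class above, the object comes back from dict.get untouched, because merely returning it performs no operation on it:
d = {}
value = d.get('missing', Expr("math.factorial(x)"))
print(type(value))   # <class '__main__.Expr'>, still unevaluated
result = value + 0   # only now does __add__ trigger the evaluation
print(result.bit_length())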
No, it’s not the same. The defer syntax will construct a lambda function immediately. It has 2 obvious discrepancies from your eval analogy:
Syntax errors are raised at declaration time, not at run time (a small demonstration follows after this list).
Name scopes are fixed at declaration time, i.e. it will not be fooled by a name conflict when you implicitly evaluate it from another scope/module.
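A small runnable illustration of the first point, using a plain lambda as a stand-in for the proposed defer syntax:
# With eval, a malformed expression is accepted as a string and only fails
# when it is finally evaluated:
expr = "1 +* 2"
try:
    eval(expr)
except SyntaxError as e:
    print("raised at evaluation time:", e)

# With a lambda (and likewise with the proposed defer syntax), the same mistake
# is rejected at declaration time:
# expr = lambda: 1 +* 2   # uncommenting this line is an immediate SyntaxError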
If you have to find an “equivalence”, here is the strict equivalent version:
# 1. Syntax sugar version
expr => add(a, b)
# 2. Strict equivalence to (1)
expr = defer.Immutable(lambda: add(a, b))
# 3. Not an equivalence
expr = defer.Immutable(add, a, b)
The 3rd approach is not an equivalence because the variables a and b are frozen at declaration time.
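The distinction can be reproduced with standard tools, a lambda standing in for form (2) and functools.partial for form (3); this is only an illustration:
from functools import partial
from operator import add

a, b = 1, 2
late = lambda: add(a, b)    # like form (2): names resolved on evaluation
early = partial(add, a, b)  # like form (3): values frozen at declaration time

a, b = 10, 20
print(late())   # 30
print(early())  # 3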
Yes, the dependent variables may change after a defer object is constructed. The results of deferred evaluation will change correspondingly. But this does not mean you always need to manually control the timing of first observation, especially when you use defer objects as function arguments or return values.
If you have to take a “snapshot” of the current value of a variable, you can use one of these:
from operator import add

a = 1
b = 2

# Approach 1: copy-construct the argument list
c = defer(add, a, b)

# Approach 2: early-bound argument defaults
@defer
def d(a=a, b=b):
    return add(a, b)

a = 100
b = 200
print(c)  # 3
print(d)  # 3