It doesn’t matter, so long as you can unpickle it to get back the original deferred expression.
I am up for collaboration, but then we need to:
- Agree on what we can agree
- Limit the scope to a manageable one
- Start tackling the issues that neither of us have answers to
Regarding 1. Here is what I have in mind; I have reasons for each point, which I can explain:
- signature: lazy(func, *args, **kwds)
- in line with partial. Also, various fast-track performance improvements can be borrowed from it.
- potentially serialisable: see partial.__reduce__ for example
- cached version best be a subclass, e.g. cached_lazy, instead of a keyword.
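The points above could be sketched roughly as follows. This is only an illustration under stated assumptions: the evaluate() method and the _rebuild helper are hypothetical names, and the implicit evaluation that the proposal is really about is left out entirely. It shows the partial-like signature, a __reduce__ in the spirit of partial.__reduce__, and caching done by a subclass rather than a keyword.

```python
import operator


def _rebuild(cls, func, args, kwds):
    # Hypothetical helper: recreates the object from its reduced state.
    return cls(func, *args, **kwds)


class lazy:
    """Hypothetical sketch: stores a call and runs it only on demand."""

    __slots__ = ("func", "args", "kwds")

    def __init__(self, func, /, *args, **kwds):
        self.func = func
        self.args = args
        self.kwds = kwds

    def evaluate(self):
        # Explicit evaluation; a real design would evaluate implicitly.
        return self.func(*self.args, **self.kwds)

    def __reduce__(self):
        # Same idea as partial.__reduce__: a callable plus its arguments.
        return (_rebuild, (type(self), self.func, self.args, self.kwds))


class cached_lazy(lazy):
    """Caching variant as a subclass instead of a keyword argument."""

    __slots__ = ("_result",)

    def evaluate(self):
        try:
            return self._result
        except AttributeError:
            # First evaluation: compute once and remember the value.
            self._result = super().evaluate()
            return self._result


deferred = lazy(operator.add, 1, 2)
rebuild, state = deferred.__reduce__()
clone = rebuild(*state)  # the cached value, if any, is not carried over
```

Because __reduce__ only returns the function and its arguments, pickling works whenever func itself is picklable, which is the same restriction partial has.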
Regarding 2.
- No syntax changes
Regarding 3. These I have no idea about.
- Code location? Need to find a good strategic place for it.
- is, type behaviour?
- etc…
But we need to discuss and figure out all of these points one by one until all concerns and WHYs are clear.
After/if this is implemented, you can go ahead experimenting with syntax changes, but these are not needed to implement the initial version and there is no reason to overcrowd already sufficiently involved work.
One thing that I’d find exciting is a new way to write lambda functions. But I mean real lambda functions, that are manually called, not implicitly evaluated.
Specifically because of the context of piping (another ongoing discussion in the ideas forum). And because you could generalise the concept to include 1-argument lambda functions.
For example, adapting one bit of example code from there to use the “`” notation:
Q = 'abcde'
    |> `batched(_, 2)`
    |> `map(''.join, _)`
    |> list
assert Q == ["ab", "cd", "e"]
Or instead, using @ for function composition:
Q = list @ `map(''.join, _)` @ `batched(_, 2)` ("abcde")
compared with the current lambda functions:
Q = list @ (lambda P: map(''.join, P)) @ (lambda P: batched(P, 2)) ("abcde")
compared with “normal” Python:
Q = list(map(''.join, batched("abcde", 2)))
(This is not the best example to demonstrate why piping might be useful, but I think it shows how better lambda functions could play a role in piping.)
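For comparison, the same pipeline can be approximated in today's Python with a small helper. This is a sketch, not part of any proposal: pipe is a hypothetical name, and the local batched function is a stand-in for itertools.batched (available from Python 3.12) so that the snippet also runs on older versions. Note that batched yields non-overlapping chunks.

```python
from functools import reduce
from itertools import islice


def batched(iterable, n):
    """Stand-in for itertools.batched: non-overlapping tuples of size n."""
    it = iter(iterable)
    while chunk := tuple(islice(it, n)):
        yield chunk


def pipe(value, *funcs):
    """Hypothetical helper: feed value through funcs from left to right."""
    return reduce(lambda acc, f: f(acc), funcs, value)


Q = pipe(
    "abcde",
    lambda p: batched(p, 2),
    lambda p: map("".join, p),
    list,
)
print(Q)
```

The explicit lambda wrappers are exactly the noise that a shorter lambda notation would remove.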
Another thing is that I think deferred expressions are cute, but I’m pretty sure I’d want to manually control when they’re evaluated.
E.g. I would imagine:
a = 3
b = "abcdef"
x = defer: b[a]
assert type(x) == DeferExpr
y = x * a
assert type(y) == DeferExpr
assert y() == "ddd"  # b[3] * 3
a = 4
assert y() == "eee"  # b[4] * 3 (the multiplier was bound when y was built)
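A rough emulation of this manually evaluated behaviour in current Python might look like the sketch below. It assumes that only the deferred operand stays late-bound, while the plain a in x * a is read when y is built; the DeferExpr class here is a hypothetical stand-in that supports multiplication only.

```python
class DeferExpr:
    """Hypothetical stand-in: operations build new deferred expressions."""

    def __init__(self, thunk):
        self._thunk = thunk

    def __call__(self):
        # Evaluation happens only when the object is called explicitly.
        return self._thunk()

    def __mul__(self, other):
        # "Infectious" behaviour: the result is itself deferred.
        return DeferExpr(lambda: self() * other)


a = 3
b = "abcdef"
x = DeferExpr(lambda: b[a])  # a is looked up at evaluation time
y = x * a                    # the multiplier 3 is captured right now

first = y()                  # b[3] * 3
a = 4
second = y()                 # b[4] * 3
```

Whether the multiplier should also be late-bound is a design question the snippet does not settle.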
Finally, there are late-bound function arguments, which I mentioned earlier and care a lot about.
I don’t see how a deferred expression is of any help here.
This concept is specifically about not needing to make a call to evaluate a function.
What you are referring to above is a graph building framework. See Dask Delayed — Dask documentation
These, IMO, should be independent and can be used in conjunction if needed:
- Build a delayed evaluation graph
- Wrap it in lazy for implicit evaluation if desired
Yes @dg-pb, these were possibilities linked to the OP, and are not related to the direction this has gone into (deferred expressions). That’s why I didn’t want to mention it. But @zhangyx asked, and I answered his/her question.
No, what I was saying is that there is no way for the interpreter to tell that it is not supposed to evaluate a deferred expression when it is operated by a wrapper function.
For example, in the lottery drawing code below, if the @add_one decorator were not applied, the deferred expression payout, which contains a call to an expensive function expensive_calculation, would not be evaluated as winning_amount unless the lottery function rolls a 9 from a random range between 0 and 9.
But if @add_one is applied then expensive_calculation gets called every time whether lottery rolls a 9 or not, because the deferred expression is evaluated upon its first usage in the wrapper function when it calls func(x + 1), which is undesirable:
import random

def add_one(func):
    def wrapper(x):
        return func(x + 1)
    return wrapper

@add_one
def lottery(winning_amount):
    if random.randrange(10) > 8:
        return winning_amount
    return 0

payout = `expensive_calculation()`
print(f'You won ${lottery(payout)}!')
A solution then is to wrap the argument in another deferred expression:
def add_one(func):
    def wrapper(x):
        return func(`x + 1`)
    return wrapper
But that means every argument in every call in all existing code needs to be revisited to decide whether it needs to be revised with a deferred expression, which is untenable.
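The effect can be reproduced in current Python with a small emulation. This is a sketch under assumptions: Deferred is a hypothetical stand-in that evaluates on its first arithmetic use, the lottery is reduced to its losing branch so the outcome is deterministic, and a call log replaces the expensive work.

```python
class Deferred:
    """Hypothetical stand-in for a deferred expression."""

    def __init__(self, thunk):
        self._thunk = thunk

    def __add__(self, other):
        # Arithmetic counts as a use, so the thunk runs here.
        return self._thunk() + other


calls = []


def expensive_calculation():
    calls.append("called")
    return 1_000_000


def add_one_eager(func):
    def wrapper(x):
        return func(x + 1)  # forces evaluation of x immediately
    return wrapper


def add_one_deferred(func):
    def wrapper(x):
        return func(Deferred(lambda: x + 1))  # postpones x + 1
    return wrapper


def losing_lottery(winning_amount):
    # Models the losing branch: winning_amount is never used.
    return 0


add_one_eager(losing_lottery)(Deferred(expensive_calculation))
eager_calls = len(calls)      # the expensive call already happened

calls.clear()
add_one_deferred(losing_lottery)(Deferred(expensive_calculation))
deferred_calls = len(calls)   # nothing was evaluated
```

The second wrapper avoids the wasted call, but only because its author knew to re-wrap the argument.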
I considered making deferred expressions “infectious”. That is, instead of immediate evaluation, an operation on a DeferExpr would result in a new DeferExpr.
However, this “evaluation graph” has already been addressed by third-party packages such as dask.
Bundling this feature into DeferExpr will make it too heavy to be accepted. But for sure DeferExpr will make existing packages work more like a normal variable.
BTW, you said that existing code needs to be revised because of this new feature. My answer is “yes and no” - they can opt-in to this new feature if there is enough performance gain to justify the change, but they don’t have to. The new feature will not break anything that already works.
With that said, I am open to any possibilities. My last commit to the cpython demo just added support for both “collapsing” and “non-collapsing” modes for a DeferExpr.
By the way, I am searching for other languages which have similar features to this proposal. I haven’t found any.
If there is anything you know, please feel free to share it.
Sorry, I accidentally hit cmd + enter while typing, so I deleted the half-done post. Here is the complete version:
A brief summary of new features in the demo for those who want to try it out:
- builtin function snapshot():
  Evaluates a DeferExpr immediately and returns a non-DeferExpr result. If a DeferExpr has already collapsed, uses the cached result.
  If snapshot() gets a variable that is not a DeferExpr object, returns it as-is.
- builtin function expose():
  Returns an exposed version of a DeferExpr object. The exposed version looks like the following:

  class DeferExprExposed:
      collapsible: bool   # When set to True, will cache the
                          # first evaluation and reuse it onwards.
      callable: Callable  # The expression
      result: Any         # Cached result
                          # AttributeError if not available.

  If a non-DeferExpr value is passed to expose(), it will return None.
- builtin function freeze():

  # Actual logic written in C
  # showing Python pseudo code
  def freeze(x):
      if expose(x) is not None:
          expose(x).collapsible = True
      return snapshot(x)
This is not a complete specification for the proposal. I need much more time for that.
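To make the summary concrete, here is a pure-Python emulation of the three functions. It is a sketch based only on the description above, not the demo's C implementation: the DeferExpr class is a hypothetical stand-in that doubles as its own exposed view, and nothing here evaluates implicitly.

```python
class DeferExpr:
    """Hypothetical stand-in; also plays the role of DeferExprExposed."""

    def __init__(self, func):
        self.callable = func      # the expression
        self.collapsible = False  # cache the first evaluation if True
        # self.result is deliberately absent until a value is cached.


def expose(x):
    # None signals that x is not a deferred expression.
    return x if isinstance(x, DeferExpr) else None


def snapshot(x):
    if not isinstance(x, DeferExpr):
        return x                  # ordinary values pass through as-is
    if hasattr(x, "result"):
        return x.result           # already collapsed: reuse the cache
    value = x.callable()
    if x.collapsible:
        x.result = value          # collapse on first evaluation
    return value


def freeze(x):
    exposed = expose(x)
    if exposed is not None:
        exposed.collapsible = True
    return snapshot(x)


counter = []


def roll():
    counter.append(1)
    return len(counter)


d = DeferExpr(roll)
first = snapshot(d)   # evaluated
second = snapshot(d)  # evaluated again, not collapsible yet
frozen = freeze(d)    # evaluated once more, then cached
again = snapshot(d)   # served from the cache
```

In this emulation freeze() marks the expression as collapsible before taking the snapshot, so every later snapshot returns the same value.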
An implicit preservation of deferred expressions wouldn’t work unless you have another syntax to explicitly indicate when a deferred expression is to be evaluated, in which case it will still be necessary to modify all the existing code base everywhere where actual values of deferred expressions are needed, which again is impractical.
I never said it would be practical to allow defer expressions to propagate. Nor did I propose that it work this way. This is too far off topic for this thread.
The argument you made earlier can also be used to show how many possibilities this new feature will open up. Optimizations that were otherwise unthinkable are now within your reach - that’s why you think a lot of existing code suddenly needs revising.
At this point, it seems to me that the syntax already exists, as it’s exactly what a function (or a closure) object is for. You can pass it around, and add parentheses when you need to evaluate it.
and add parentheses when you need to evaluate it
This is what makes it different. You can use it as if it was a variable, not a function.
This exact conversation has already happened many times in this thread.
I should have quoted this:
An implicit preservation of deferred expressions wouldn’t work unless you have another syntax to explicitly indicate when a deferred expression is to be evaluated
In this case you need to differentiate between a normal variable and a deferred expression, and you need a way to indicate its evaluation. Might as well use a function.
I agree that functions don’t replace deferred expressions in the case that you proposed initially, sorry for the confusion.
What if you want to return a pending deferred expression from your API? Do you want all of your users to manually check if you returned a function or a value?
What if your return value can be (1) a normal value, (2) a function, or (3) a deferred expression of a function? How do you distinguish whether a function is a direct return value or should be called to retrieve the actual value?
def lottery(sequence: list[int] | None = None):
    if sequence is None or len(sequence) == 0:
        result => random(seed)
    else:
        result = sequence.pop(0)
    return result
you need a way to indicate its evaluation
For most cases this is not necessary. A deferred variable will automatically evaluate itself when it is observed. As defined above: an observation is anything other than direct assignment or argument passing.
For example:
# Suppose these are costly operations
x => rand(seed)
y => x + 1
z => x + y ** 2
# Nothing happens till now
# Ask z to collapse upon first evaluation
# (optional, depends on use cases)
expose(z).collapsible = True
# Any of these triggers an evaluation (only the first time):
print(z) # repr(z) under the hood
str(z) # type conversion
np.array([z]) # (same) type conversion
z += 1 # numeric operation
z.add_one() # (pseudo) accessing attribute or index
In comparison, if you use lambda functions:
x = lambda: rand(seed)
y = lambda: x() + 1
z = lambda: x() + y() ** 2
# Now you need to manually check and reassign z
if callable(z):
z = z()
If you are dealing with concurrency or asyncio, and for some reason you really do not want a variable to change itself across the function scope, you can either enforce x = snapshot(x) or use type annotations in your argument list to generate a type checker warning (this is part of the original proposal).
On the user side (who created a deferred expression), they can set collapsible to True so the same deferred expression will be evaluated only once, and behaves like an immutable variable on subsequent observations.
For most other cases, you should be able to use defer expressions as if they are normal variables.
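The observation rule can be approximated today with a proxy object. The sketch below is an assumption-laden illustration, not the proposed implementation: Deferred is a hypothetical class that forwards only a handful of operations, and a fixed source function stands in for rand(seed) so the result is reproducible.

```python
class Deferred:
    """Hypothetical proxy: evaluates whenever it is observed."""

    def __init__(self, thunk):
        self._thunk = thunk

    def _observe(self):
        return self._thunk()

    def __add__(self, other):
        return self._observe() + other

    def __radd__(self, other):
        return other + self._observe()

    def __pow__(self, other):
        return self._observe() ** other

    def __str__(self):
        return str(self._observe())


evaluations = []


def source():
    # Stand-in for a costly call such as rand(seed).
    evaluations.append(1)
    return 3


x = Deferred(source)
y = Deferred(lambda: x + 1)
z = Deferred(lambda: x + y ** 2)

alias = z                   # assignment is not an observation
before = len(evaluations)   # still nothing evaluated
text = str(z)               # type conversion observes z
after = len(evaluations)    # x was evaluated once per occurrence
```

Without collapsing, x is evaluated twice for a single observation of z, once directly and once through y, which is the situation the collapsible flag is meant to address.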
That definition is far too vague. What counts as a “thing”?
Consider this function:
def add_ints(i, j):
return i + j
Presumably + counts as an operation that evaluates deferreds. But what
if this function is implemented in C in someone’s extension module? Does
it behave the same way? Does it depend on whether the implementation
uses PyObject_Add or does something lower-level?
But what
if this function is implemented in C in someone’s extension module?
Thanks for pointing this out!
By definition C-extensions do not need to explicitly check for a DeferExpr. I am investigating how much of a promise we can keep here (i.e. use Python’s C API to abstract away observation) so that minimal changes are required for extensions.
In the worst-case scenario where such a promise cannot be kept, C-extensions can use a one-liner to ensure a non-DeferExpr object. This API has already been delivered in the demo:
A special case is serialization (brought up by @dg-pb). For this case a DeferExpr object should be serialized as-is. It should be de-serialized into the same DeferExpr. I am not familiar with serialization so I need more time to investigate this (to figure out how much can be done here).
I would really appreciate insights or help from an experienced CPython dev on this! Implementing a new feature while still learning CPython’s basics is really challenging.
A misunderstanding here: I thought pickle is a binary version of dis.dis(). Turns out it does not work like that (i.e. it does not freeze the entire interpreter state, nor does it store nested function objects).
I’ll revise that part and come up with an alternate version of specification.
It should be de-serialized into the same DeferExpr. I am not very familiar with serialization so I need more time to investigate this (to figure out how much can be done here).
This will inevitably have exactly the same issues as lambda, which is not serialisable in its inline form.
I think, what you are doing would be very useful for localised user defined expressions.
E.g.:
def ternary(cond, if_true, if_false):
    return if_true if cond else if_false

d = ternary(a < 1, defer f1(), defer f2())
So ideally, if that is the concept, its performance should be competitive with the actual if_true if cond else if_false.
Then this would be an excellent tool for local ad-hoc language constructs, same as if_true if cond else if_false. I.e. you don’t bring it with you. It is very short lived.
For it to be more than that, it would need to be able to source an arbitrary function with arguments, and thus would most likely need to give up some of the performance benefits; in turn it would no longer be attractive for the case above.
A good analogy is partial vs lambda.
A lambda definition is much faster than a partial one, with similar __call__ times.
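Anyone who wants to check this on their own machine can use a quick timeit comparison like the sketch below. The add function and the iteration count are arbitrary choices for illustration, and no particular numbers are claimed here since results vary with the Python version and hardware.

```python
import timeit
from functools import partial


def add(a, b):
    return a + b


n = 100_000

# Cost of creating the callable.
create_lambda = timeit.timeit(lambda: (lambda: add(1, 2)), number=n)
create_partial = timeit.timeit(lambda: partial(add, 1, 2), number=n)

# Cost of calling an already created callable.
f_lambda = lambda: add(1, 2)
f_partial = partial(add, 1, 2)
call_lambda = timeit.timeit(f_lambda, number=n)
call_partial = timeit.timeit(f_partial, number=n)

print(f"create: lambda {create_lambda:.4f}s, partial {create_partial:.4f}s")
print(f"call:   lambda {call_lambda:.4f}s, partial {call_partial:.4f}s")
```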
You are doing lazy-lambda, and I think there is more value (although not as fun) in lazy-partial (which would inevitably be able to receive lambda as well - just would not be as performant). Furthermore, lazy-partial does not need new syntax to be useful.
If I want to have fun with ad-hoc lazily evaluated expressions, then lazy-lambda is the tool, and if I want to source a lazy object to the defaults of an unknown function and sleep well, then lazy-partial is needed.
My bet is that in the current situation lazy-partial has a chance due to its usefulness in real-life applications, while I don’t have strong arguments for lazy-lambda apart from it being super fun.
I don’t think it is not useful, I just think Python is not yet ready for lazy-lambda, given it is an exotic feature (much more so than lazy-partial), which needs a lot of time to develop into something attractive.