Builtins.lazy for lazy arguments

It is supposed to be a defaultdict of ints, where an occasional exception with a different default is needed.

This doesn’t work for what you are suggesting (which is not what I meant by my example anyway).

Neither does this.

I think I misunderstood last time what you meant. If you are referring to Backquotes for deferred expression, that actually made this work quite nicely.

Even type(deferred_object) does the right thing, thanks to modifications in C. It works predictably and intuitively for pure Python code, but as far as I know no way has been found to extend this to C extensions, which is a major dealbreaker.

I still stand by what I said. I don’t know how you misunderstood me?

To me the proper way to achieve this with a Lazy is

value = resolve(a.get(key, Lazy(lambda: math.factorial(1000))))

where resolve is a function such that resolve(x) == x if x is not a Lazy object, and resolve(x) == resolve(x()) (or, in a non-recursive variant, resolve(x) == x()) if x is a Lazy object.
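
A minimal sketch of such a pair, assuming only the semantics above (the names Lazy and resolve mirror the post; everything else is illustrative):

import math

class Lazy:
    """Wrap a zero-argument callable; evaluation is deferred until resolve()."""
    def __init__(self, func):
        self._func = func

    def __call__(self):
        return self._func()

def resolve(x):
    # Unwrap (possibly nested) Lazy objects; anything else passes through.
    while isinstance(x, Lazy):
        x = x()
    return x

a = {}  # example mapping without the key present
value = resolve(a.get("key", Lazy(lambda: math.factorial(1000))))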

Explicit is better than implicit. You will get better performance. You won’t get as many intractable bugs. You get more options, because you can choose when to resolve. And you don’t push any responsibility onto other maintainers.

Polars has Lazy objects. They work well and can significantly improve performance. You also have to .collect() them explicitly. I think they are a good example to follow.

Do you have any examples of big code projects that use lazy objects with implicit resolution?

This only works for functions that never need to evaluate the lazy object and return it as is.

I think the proposal can be better justified with usage like the examples in my previous post:

# the sun usually rises from the east
assert validate_sun_rises_from_east(), lazy(expensive_verbose_info)
# no wasted call when logging level > DEBUG
logger.debug("Verbose info: %s", lazy(expensive_verbose_info))

Ah ok, maybe I misinterpreted my misinterpretation. Never mind then.

Yup, I have explored this path. It is surely a nicer solution. It would be great if it generalized to all cases and generic wrappers/DSLs could be built with it.

Unfortunately, this does not generalize…

E.g.

def foo(a, b):
    # The function itself decides whether and when to force evaluation.
    if type(b) is lazy:
        b = b()
    return a + b

This is not a real-world case, but just an example of why external unwrapping/resolving does not work as a general solution.

Not entirely, though you do have a point.

I don’t think it’s too hard to create a Lazy class such that

resolve(x) + y == resolve(x + y)

because the Lazy object stores an initial value and a list of operations. So there are some cases that could be handled where a lazy object is modified before being returned.
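
A rough sketch of that record-and-replay idea (illustrative only; a real implementation would need many more dunder methods):

class Lazy:
    def __init__(self, func, ops=()):
        self._func = func
        self._ops = list(ops)  # deferred operations, replayed at resolve time

    def __add__(self, other):
        # Record the operation instead of performing it.
        return Lazy(self._func, self._ops + [lambda v: v + other])

    def resolve(self):
        value = self._func()
        for op in self._ops:
            value = op(value)
        return value

x = Lazy(lambda: 1)
assert x.resolve() + 2 == (x + 2).resolve()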

That doesn’t help with your logger.


@dg-pb it’s somewhat funny that this example of adding to a lazy object is actually what I was already describing in my answer to @blhsing.
Polars also does a version of this, which is actually where the speedup comes from. You start with a lazy object, assemble a whole bunch of operations, and then when you .collect(), the engine figures out an optimised path to apply those operations.
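
For readers unfamiliar with Polars, the pattern looks roughly like this (a minimal sketch using Polars’ actual API, not part of this proposal):

import polars as pl

lf = pl.LazyFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
# Nothing is computed yet; operations are only recorded.
result = (
    lf.filter(pl.col("a") > 1)
      .with_columns((pl.col("b") * 2).alias("b2"))
      .collect()  # the optimiser runs here, once
)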

Trying to come up with a constructive alternative for the logger, and situations like that:

Python already has two kinds of objects that can be lazy: dicts and iterators.

I believe it would be best to leverage these, instead of creating a whole new class of “lazy scalars”.

When you use a function, you enter a kind of contract with the author(s) of that function. If the function is designed with scalars in mind and you pass in a lazy scalar, all kinds of bugs can occur. I’m sceptical we can really get a grasp on all the ways this can go wrong until we actually try putting lazy scalars into polars/pandas/scipy/numpy/sympy/turtle/lru_cache/…

But if lazy scalars are made part of basic Python, people will expect to be able to substitute them in functions designed for real scalars.

In contrast, functions that are designed for pre-computed dicts/iterators [largely] already work for lazy dicts/iterators. Just implement a __getitem__ or a __next__ that does your expensive calculation. This is already in base Python, so we know which bugs to expect, which aren’t (m)any.

For logger, it would seem reasonable to me to request that they add support for supplying the values in a dict or in an iterator.

And maybe we can get a nice way to construct a lazy dict / iterator. That’s a solvable problem.
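
In fact, for %-style formatting, logging already accepts a single mapping as the argument, and formatting only happens when a record is actually emitted, so a lazy mapping works today. A minimal sketch (LazyValues is an illustrative name, not an existing class):

from collections.abc import Mapping
import logging

logger = logging.getLogger(__name__)

def expensive_verbose_info():
    return "..."  # stands in for the expensive call from the earlier example

class LazyValues(Mapping):
    """Mapping whose values are zero-argument callables, evaluated on access."""
    def __init__(self, **thunks):
        self._thunks = thunks

    def __getitem__(self, key):
        return self._thunks[key]()  # runs only if the message is formatted

    def __iter__(self):
        return iter(self._thunks)

    def __len__(self):
        return len(self._thunks)

# expensive_verbose_info is only called when DEBUG is actually enabled
logger.debug("Verbose info: %(info)s", LazyValues(info=expensive_verbose_info))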

The “downside” is that you could not use lazy objects for functions designed for true scalars. To me that is an advantage.

IMO, although closely related, this is a separate concept.

I think the cleaner approach is to keep these decoupled:

  1. Call graph construction, optimization, etc. (Polars/Dask-style stuff)
  2. Signalling lazy object

They can of course be used in conjunction. E.g.:

a = <call graph>
value = d.get('key', default=lazy(a.collect))

The problem with repurposing an existing type for a lazy object, as already pointed out in Builtins.lazy for lazy arguments - #67 by dg-pb, is that it would then be difficult and awkward to pass that object as an argument for its existing purpose. It doesn’t matter whether you’re repurposing an iterator or a lambda as a lazy argument: it becomes awkward whenever one actually wants to pass an iterator or a lambda as an argument.

Having a type dedicated to the very purpose of a lazy argument makes the implementation of every function supporting lazy arguments easy and unambiguous.
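
E.g., on the implementing side (illustrative, assuming a Lazy type along the lines of the sketches above):

class MyMapping:
    def __init__(self, data):
        self._data = data

    def get(self, key, default=None):
        try:
            return self._data[key]
        except KeyError:
            # One unambiguous check, no sentinels or flag parameters needed.
            return default() if isinstance(default, Lazy) else default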

a = collections.defaultdict(int)
value = a.get(key)
if value is None:
    value = math.factorial(1000)

If None is a valid value, you’d use a sentinel.
Personally I’d even use := here.
Counters are even easier as you can just use 0.
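
E.g. the walrus version of the same lookup (assuming None is not a valid value):

if (value := a.get(key)) is None:
    value = math.factorial(1000)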

This test relies on the fact that such a call can be done without any side effects.
The same test performed using .setdefault(key, lazy(...)) would necessarily mutate the dict.
Now picture this in the context of an API where the side effects are detrimental.

I just responded to this specific case to show that a programmatic test is possible.
But I completely agree with you.
A programmatic test is not really a good way to check for the existence of a feature.

The standard and reliable way to do this is to check which versions support the feature and which don’t, and to handle those cases appropriately. This is no exception.
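
E.g., a sketch of such a version gate (3.15 is purely hypothetical here; lazy is the proposed builtin):

import sys

if sys.version_info >= (3, 15):  # hypothetical release with builtins.lazy
    value = d.setdefault(key, lazy(factorial, 100_000))
else:
    value = d.setdefault(key, factorial(100_000))  # eager fallback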

Yes, as I have mentioned before, for this special case of dict.get it needs to be done with a dedicated sentinel:

MARKER = object()
if (value := a.get(key, MARKER)) is MARKER:
    ...

I like this pattern and use it when it works. Although I would use lazy instead if it were implemented, and I do so for get methods that I implement myself. E.g.:

cm = CustomChainMap(...)
value = cm.get(key, default=lazy(factorial, 100_000))

If the MARKER pattern above applied to all cases where an optional lazy argument can be useful, then there would be no need for anything else - some general convenience wrapper might be enough.

But it doesn’t scale to all cases. E.g. the setdefault you mentioned:

d = defaultdict(int, {...})

if key in d:
    value = d[key]
else:
    value = d[key] = factorial(100_000)

# or a possibly more performant version using a marker, but still not making use of the `setdefault` method.

MARKER = object()
value = d.get(key, MARKER)
if value is MARKER:
    value = d[key] = factorial(100_000)

# as opposed to

value = d.setdefault(key, lazy(factorial, 100_000))

Yes, testing for support for lazy arguments from built-in functions can be simply based on Python versions. And for user-defined functions, parameters that can be lazy should be typed as T | Lazy[T], with perhaps a convenience type alias of type MaybeLazy[T] = T | Lazy[T].
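
E.g. a sketch of that typing (assumes the 3.12+ generic syntax and a Lazy class along the lines discussed above):

from collections.abc import Callable

class Lazy[T]:
    def __init__(self, func: Callable[[], T]) -> None:
        self._func = func

    def __call__(self) -> T:
        return self._func()

type MaybeLazy[T] = T | Lazy[T]

def get_value(key: str, default: MaybeLazy[int] = 0) -> int: ...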

This is primarily aimed at cases with one to several lazy values/arguments, most likely of different kinds.

For a list of homogeneous values, iterators are a more appropriate option. E.g.:

import itertools

evaluated = [*values]
unevaluated = map(func, objects)

def any_higher_than_10(iterable):
    for v in iterable:
        if v > 10:
            return True
    return False

result = any_higher_than_10(itertools.chain(evaluated, unevaluated))

Another note is that caching would be best left out to keep it as simple as possible.
It can be done orthogonally if needed. E.g.:

def eval_and_set(func, *args):
    # Store the result wherever suits the use case, then return it.
    container.attribute = value = func(*args)
    return value

value = d.get(key, default=lazy(eval_and_set, factorial, 100_000))

# or

value = d.get(key, default=lazy(my_obj.get_height))

class Table:
    _height = None
    def get_height(self):
        if self._height is None:
            self._height = calculate_expensive_height(self)
        return self._height

Usage of the former is obvious. In the latter case, one needs to look at the docs or the docstring, or study the implementation, to be sure whether a lazy arg is supported by the function.
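
As an aside, the stdlib’s functools.cached_property already covers the latter caching pattern (reusing d, key, lazy, my_obj and calculate_expensive_height from the examples above):

from functools import cached_property

class Table:
    @cached_property
    def height(self):
        # computed on first access, then cached on the instance
        return calculate_expensive_height(self)

value = d.get(key, default=lazy(lambda: my_obj.height))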


Nothing more complicated than seeing if an argument accepts None. Just a standard investigation which I believe everyone does before using any function.

Like I suggested above, all parameters that can optionally support lazy objects should be typed with a generic alias like type MaybeLazy[T] = T | Lazy[T].

Having a language feature gated behind the use of type annotations is a major change in Python’s design philosophy. Even dataclasses can be used without type annotations, if you want to.

I’m not saying that it’s impossible (maybe typing is now ubiquitous enough that it’s time to allow features to not support untyped code), but it would need to be agreed as a matter of principle, with consensus from the core devs/SC.

Going against the “typing is optional” principle in an ordinary feature proposal is a good way to get your proposal rejected, I’m afraid.


Also, consider a function with 3 lazy arguments:

def foo(a, b, c, a_is_lazy=False, b_is_lazy=False, c_is_lazy=False):
    ...

For such a function, not only is there signature bloat, but there are also extra inconveniences in using it. I.e. if arguments are delegated, then extra checks are needed before calling, with the appropriate bools passed in.

With lazy, arguments can be wrapped in advance and will be treated appropriately when they reach the function. Otherwise, one needs to store tuples:

args = [(a, a_is_lazy), (b, b_is_lazy), (c, c_is_lazy)]

If the callables take args and kwds, then this becomes:

args = [(a, a_is_lazy, a_args, a_kwds), ...]

It is pretty much the same thing; it just makes signatures cleaner, makes usage more convenient, and offers a standardised object for such cases.
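
E.g., how the delegation could look with a standard wrapper (a sketch using the hypothetical lazy and resolve from earlier in the thread):

def foo(a, b, c):
    # Each argument resolves itself; no *_is_lazy flags in the signature.
    a, b, c = resolve(a), resolve(b), resolve(c)
    return a + b + c

foo(1, lazy(factorial, 100_000), 3)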

Wrong thread, Paul :slight_smile:

I was directly replying to a comment suggesting that you should use a type to flag parameters that can be lazy. Maybe I misunderstood the intent of the comment, but I was posting exactly where I meant to :slightly_smiling_face:

(I’ll concede that at this point I’m skimming this thread - there are too many contradictory proposals, and too much rehashing of old arguments, for me to even want to keep up with the details).