Builtins.lazy for lazy arguments

1. Backstory

There have been many attempts at deferred evaluation so far, and as far as I have seen, the following three cases are among the most common ones such attempts aim to address:

  1. lazy imports
  2. evaluation graphs
  3. lazy defaults

Although lazy imports (1) could be addressed via a deferred evaluation concept, it is most likely not the best path: imports are subject to the import machinery, and there are likely nuances that deferred evaluation could not address.

Although evaluation graphs (2) could be done via deferred evaluation, they have many possible features that would be difficult to incorporate into a deferred evaluation approach, such as (but not limited to) manipulating the graph before evaluation for optimisation purposes (e.g. dask) and parallel execution. So this too, in my opinion, is a separate case, which would most likely need its own mechanism if it were to be addressed in the standard library.

Lazy defaults (3) would be perfectly handled via deferred evaluation. However:

  1. The last (and pretty much the only proper) attempt nailed down everything that was manageable within reasonable effort and is now stuck on key issues which are not straightforward to address. See: Backquotes for deferred expression
  2. It is a major overkill to implement deferred evaluation for this sole purpose.

2. Proposal: builtins.lazy for lazy arguments

So regardless of my opinions above about (1) and (2), this proposal is to address lazy defaults (3).


The aim of this is to have a more convenient way to do:

import math

FAIL = object()
result = {}.get('a', FAIL)
if result is FAIL:
    result = math.factorial(100_000)

This, as far as I have seen, is currently the most natural and robust approach, and it can be used anywhere.


I have used this for a fair while and will continue using it as it does deal with the problem well with minimal complexity.

The suggestion is to implement builtins.lazy as:

class lazy:
    def __init__(self, func, *args, **kwds):
        self.func = func
        self.args = args
        self.kwds = kwds

    def __call__(self):
        return self.func(*self.args, **self.kwds)

So it can be recognised in various places across the standard library and open-source packages, e.g. a lazy-aware dict.get:

class dict:
    def get(self, key, default=None):
        if key in self:
            return self[key]
        elif isinstance(default, lazy):
            default = default()
        return default

result = {'key': 1}.get('key', default=lazy(math.factorial, 500_000))
print(result)    # 1

Also, it need not necessarily live in builtins; it could just as well go in functools. But if this were implemented for, say, dict.get, then builtins seems the more natural place for it.
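Since dict.get cannot be changed today, the proposed behaviour can be prototyped with a dict subclass. This is a minimal sketch; LazyDict and _MISSING are hypothetical names used only for illustration:

```python
import math

class lazy:
    """Proposed builtin: wrap a callable plus bound arguments."""
    def __init__(self, func, *args, **kwds):
        self.func = func
        self.args = args
        self.kwds = kwds

    def __call__(self):
        return self.func(*self.args, **self.kwds)

class LazyDict(dict):
    """dict whose get() evaluates a `lazy` default only on a miss."""
    _MISSING = object()

    def get(self, key, default=None):
        result = super().get(key, self._MISSING)
        if result is not self._MISSING:
            return result
        if isinstance(default, lazy):
            default = default()
        return default

d = LazyDict({'key': 1})
print(d.get('key', lazy(math.factorial, 100_000)))  # 1, factorial never runs
print(d.get('missing', lazy(math.factorial, 5)))    # 120
```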

3. Alternatives

3.1. Just use lambda

Although this works in many cases, it is not a robust approach for libraries that implement generic tools; for example, it would not be suitable for implementing defaults in dict.get. The reason is that it prohibits the default from being a lambda, which:
a) breaks backwards compatibility if applied to existing methods
b) is just not a good idea, because why should a lambda be an invalid default value? E.g.:

class dict:
    def get(self, key, default=None):
        if key in self:
            return self[key]
        elif callable(default) and default.__name__ == '<lambda>':
            default = default()
        return default

callback_dict = {'a': lambda: 1, 'b': lambda: 2}
callback = callback_dict.get('c', default=lambda: 3)
print(callback)    # 3, while I would like to get back (lambda: 3) as it is.

3.2. Let users define one for themselves. Why implement it to standard library?

A few reasons:

  1. Consistency - easy to learn and remember
  2. If it is not implemented into standard library, standard library objects will not have this (which is an important part of this)
  3. If it is not implemented in standard library and its objects, it is unlikely to become a standard practice.
  4. For some cases a pure-Python class might be a bit too slow:
d = {}
%timeit d.get('k')                       #  32 ns
%timeit lazy(math.factorial, 100_000)    # 321 ns
%timeit partial(math.factorial, 100_000) # 181 ns

So the object construction should ideally be as efficient as possible; being 10x or even 5x slower than dict.get itself might not be attractive for frequent/iterative usage with a high hit ratio.

The fastest option available is partial, which is the one I am currently using as:

from functools import partial

class lazy(partial):
    pass

However, a dedicated implementation would be significantly faster, as partial does many things that this does not require. I suspect it can be made not much slower than:

%timeit object()    # 70 ns

I would say <= 100 ns is a fairly likely outcome. The __call__ would be faster than partial's as well.
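For reference, the partial-subclass approach can be compared against plain object construction with timeit. Absolute numbers are machine-dependent, so none are shown here:

```python
import math
import timeit
from functools import partial

class lazy(partial):
    """Subclassing partial reuses its fast C-level constructor."""
    pass

# Relative construction cost; absolute numbers vary by machine.
for label, stmt in [
    ("object()", lambda: object()),
    ("partial", lambda: partial(math.factorial, 100_000)),
    ("lazy(partial)", lambda: lazy(math.factorial, 100_000)),
]:
    t = timeit.timeit(stmt, number=100_000)
    print(f"{label:>14}: {t * 1e4:.0f} ns/op")
```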

3.3. Some generic DSL

While it could be made generic via some DSL (e.g. along the lines of some ideas in DSL Operator – A different approach to DSLs:

result = lazy(t'{d.get}({key},default={lambda: math.factorial(100_000)})')

), such an approach is more suitable for domain-specific DSLs as opposed to widely used features.

It is also unlikely it would be possible to achieve good performance via this approach.

3.4. Utility function for 1 lazy default argument

This is a possibility. Instead of a “builtin flag object”, some functools.lazyargs-style utility could be made. E.g.:

class lazydefault:
    def __init__(self, arg):
        self.arg = arg
    def __call__(self, func, *args, **kwds):
        FAIL = object()
        if isinstance(self.arg, int):
            args = list(args)
            dflt = args[self.arg]
            args[self.arg] = FAIL
        else:
            dflt = kwds[self.arg]
            kwds[self.arg] = FAIL
        result = func(*args, **kwds)
        if result is FAIL:
            result = dflt()
        return result
        
lazydefault(1)({}.get, 'a', lambda: 1)

The benefit of this is that it could be used anywhere without changes to methods. However, it isn’t as convenient as the proposed approach, and its implementation is fairly hefty in comparison.

Also, this does not offer intrinsic capability to bind arguments.
Thus, even if this existed, builtins.lazy would still be complementary. E.g.:

lazydefault(1)({}.get, 'a', lazy(factorial, 100_000))

Although the performance of such an implementation is unlikely to be very good, a similar utility could be useful, as it can be applied to methods with defaults where builtins.lazy has not yet been implemented.

Finally, this approach only covers one case (“one default return value argument”), while builtins.lazy is a more general concept which, although it needs to be implemented for each specific method, supports an arbitrary number of lazy arguments and is not bound to a specific use case.

4. To sum up

This proposal offers a simple method to generalise the case of lazy arguments.
It also hints at possibility of incorporating this into existing methods of standard library objects, such as Mapping.get.

I would be interested to hear what others think about this.


Related:

I find the usefulness quite limited, because lazy can wrap only a single call. Even a small change is problematic:

v = dct.get(k, default=lazy(factorial, 100_000) + 1) # error

Expressions would require lazy-aware versions of operator.XXX (and some infix → prefix rewriting):

v = dct.get(k, default=lazy(operator.add, lazy(factorial, 100_000), 1))

I would then prefer:

v = dct[k] if k in dct else factorial(100_000) + 1

Good point, but lazy isn’t meant for that. Its sole purpose is to wrap a callable and signal that an argument is lazy. It can be used in conjunction with other tools to achieve what you have pointed out. Something along the lines of evaluation graphs would do. E.g.:

v = dct.get(k, default=lazy((dask.delayed(factorial)(100_000) + 1).compute))

Or given the fact, that this case is trivial, simple lambda can do:

v = dct.get(k, default=lazy(lambda: factorial(100_000) + 1))

That is also an option.
In this specific case one could do with something like a pipe instead (see the `functools.pipe` idea, which didn’t go through):

v = dct.get(k, default=lazy(pipe(factorial, partial(add, 1)), 1))

But the main point is that for complex cases there are many tools to be used in conjunction.

This is a good option; however, it only works for this specific case of a “default return value” argument, while lazy is more generic, e.g.:

def ifelse(cond, a, b):
    result = a if cond else b
    if isinstance(result, lazy):
        result = result()
    return result
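As a usage sketch of ifelse (the proposed lazy class is re-defined so the snippet is self-contained; expensive_branch is a hypothetical costly computation):

```python
class lazy:
    """Assumed proposed class (see section 2)."""
    def __init__(self, func, *args, **kwds):
        self.func, self.args, self.kwds = func, args, kwds
    def __call__(self):
        return self.func(*self.args, **self.kwds)

def ifelse(cond, a, b):
    result = a if cond else b
    if isinstance(result, lazy):
        result = result()
    return result

calls = []
def expensive_branch():
    calls.append(1)          # record that the branch actually ran
    return 'computed'

print(ifelse(True, 'cheap', lazy(expensive_branch)))   # cheap; branch never runs
print(ifelse(False, 'cheap', lazy(expensive_branch)))  # computed
print(len(calls))                                      # 1
```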

Also, this can be expensive for some complex objects where checking containment is costly, while SomeContainer.get is optimised via try-except or in some other appropriate manner.

Also, if default = lazy(callable) can be pre-stored and re-used, it would always be the most performant solution. Checks would most likely be type(default) is lazy as opposed to isinstance(default, lazy), so there would be very little performance cost in Python and an undetectable slowdown in C.

Another use case which, in my experience, comes up even more often than get, is setdefault. Well, I use it less than get, because I reach for other workarounds given that no elegant pattern exists; e.g. collections.defaultdict does the trick for some cases.

With lazy, I would probably stop using defaultdict altogether. One less import, and it makes things more explicit and less magical:

d = dict()
default = lazy(list)
setdefault = d.setdefault

for i in range(100):
    setdefault(i, default).append(i)

Reasonably performant and requires no imports.
Also more flexible, as it can have different defaults in different places.
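A lazy-aware setdefault does not exist today, but the idea above can be sketched with a dict subclass (LazyDict is a hypothetical name; the lazy class is assumed as proposed):

```python
class lazy:
    """Assumed proposed class (see section 2)."""
    def __init__(self, func, *args, **kwds):
        self.func, self.args, self.kwds = func, args, kwds
    def __call__(self):
        return self.func(*self.args, **self.kwds)

class LazyDict(dict):
    """Hypothetical lazy-aware setdefault for illustration."""
    def setdefault(self, key, default=None):
        if key not in self:
            if isinstance(default, lazy):
                default = default()   # fresh object per missing key
            self[key] = default
        return self[key]

d = LazyDict()
default = lazy(list)
for i in range(5):
    d.setdefault(i % 2, default).append(i)
print(d)  # {0: [0, 2, 4], 1: [1, 3]}
```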

I don’t see people currently using functools.partial for this, and I can’t imagine that the first thing people think of when they need this functionality is “I’d use partial, but that extra 100ns puts me off”. So I doubt people are going to find lazy that compelling an improvement over partial.

To put this another way, this proposal feels like a bad case of premature optimisation to me.


It isn’t really that.

I add benchmarks because speed is important to me. Although speed is not the main driving factor, there is often a sensible performance below which proposals do not make sense to me.


This is more about having a lazy flagging mechanism at a low level, which can be standard for both standard library methods and for external adaptation.

I use class lazy(partial) for stuff that I can and am quite happy with it.
And I wouldn’t propose this if I was able to use it with dict.get, had it in builtins and it was a standard for other libraries as well.

IIUC you say that this can be used to implement lazy imports, deferred evaluation, and lazy defaults, but I don’t see examples of how this mechanism can be used, leading me to think the same way as Paul does: why is this necessary when other mechanisms exist?

Can you provide code showing how you expect this mechanism to be used in code? Without examples it’s very hard for me to understand what the actual purpose of this mechanism is.

I think this proposal fails to solve the pitched problem.

One can already write

def f(x, y=lambda: []):
  ...

The primary problem is that the function doesn’t know whether y is a variable or a nullary function that needs to be called.
In places where you really really need this behaviour, you can either force the user to call the function as

f(1, lambda:[2,3])

or you put something into the function like

def f(x, y=lambda: []):
  try:
    y = y()
  except TypeError:
    pass

I’ve not yet run into a situation where those complications are truly worth it over def f(x, y=()) or def f(x, y=None).


In MATLAB it’s now possible to write the equivalent of

def f(x, y?= [x]): ...

but in Python you can neither reasonably write def f(x, y=lambda: [x]) nor def f(x, y?=[]).
Getting one-upped by MATLAB like this is embarrassing :smiling_face_with_horns:

My apologies, maybe I shouldn’t be writing backstories with more information than necessary, but to me it seemed relevant to have a full picture and reasoning.

This is only about standard mechanism for “lazy arguments” / “lazy object signaling”.

To clarify, I think that the lazy class you pitched is still a nullary function, and therefore has all the same disadvantages using lambda or partial does.

The point of the idea is that with other solutions you might not know whether the default was actually meant to be a lambda, say. Hence the introduction of a new construct to make the intent clear.


Well, you cannot, as it is not robust in places where y can be anything.

See section “3.1. Just use lambda”. I think it explains why lambda or other callable is likely to cause issues at some point in the future.

An alternative is to have an extra argument lazy=True/False, but that is much less elegant, more verbose, and less performant, as for cases with arguments one would need to bind them with partial anyway.

Also, this can already be done and rarely is.
Not sure why others don’t do it.
But I was using the extra-argument approach at first for a fairly long time and it just did not feel right.
Thus, eventually, I switched to lazy.

Been using it for a while now and it did stick well, thus thought maybe it is worth proposing for a wider use.

It is fully backwards compatible, intuitive, simple and properly solves the case - at least this is the way it feels to me personally at this point in time.

And I guess I wish I was able to use the same lazy in all appropriate places.

To me this proposal is not particularly interesting without being more developed, as it seems to me that the available mechanisms work better than this idea. I think it would better guide the discussion to think about how it would be used in practice and come up with a rough but concrete proposal. As it stands, anybody will insert whatever they think a lazy mechanism is into the discussion, and it will go a bit all over the place, I think.

Could you please elaborate?

That hasn’t happened yet, maybe let’s wait and see?

This is pretty simple and condensed into the smallest possible form. In my world this is the result of a lot of failed experimentation: some approaches too inconvenient, some too complex for the problem, some that don’t scale well, etc.

It is what it is.

Apart from coming up with something else or just not doing it at all, I don’t think there is much else to develop here.

But you still need to do something like

def f(x=lazy(list)):
  if isinstance(x, lazy):
    x = x()

right?

For me that’s not particularly interesting, because you’re not saving any boilerplate compared with using None as a default.

I get that you work with applications where using None as a default isn’t acceptable, but I think the rule of thumb is that in order to add something to the standard lib, it either has to improve the lives of lots of programmers, or improve things a lot for some programmers? Neither of those seems to apply.

I would be very excited for something that allowed us to get rid of the boilerplate.


Because this proposal, as I read it, mostly states “I want a mechanism for lazy evaluation, as long as it’s not one of the ones we have already”, and I’m perfectly happy with the ones we have, my conclusion is that the ones we have are enough. The proposal does not explain in a satisfying way how we can improve the status quo, because to me there is no real proposal. It’s not that this is the idea in its most basic form; it’s that it’s not really fleshed out beyond a wish at this point.

So which ones do we have and how are they better?

Showing why I don’t see the point based on the example the OP seems to find the most important:

FAIL = object()
result = {}.get('a', FAIL)
if result is FAIL:
    result = math.factorial(100_000)

would become?:

result = {}.get('a', lazy(math.factorial, 100_000))
if isinstance(result, lazy):
    result = result()
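Put end to end, the comparison runs as follows. This is a sketch assuming the proposed lazy class, with unwrapping done on the caller side since today's dict.get is not lazy-aware:

```python
import math

class lazy:
    """Assumed proposed class (see section 2)."""
    def __init__(self, func, *args, **kwds):
        self.func, self.args, self.kwds = func, args, kwds
    def __call__(self):
        return self.func(*self.args, **self.kwds)

# Status quo: sentinel pattern.
FAIL = object()
result = {}.get('a', FAIL)
if result is FAIL:
    result = math.factorial(10)
print(result)  # 3628800

# With lazy: the wrapper is only evaluated on a miss.
result = {}.get('a', lazy(math.factorial, 10))
if isinstance(result, lazy):
    result = result()
print(result)  # 3628800
```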