Backquotes for deferred expression

Neither repr nor str should cause c to be evaluated. In Python’s current execution model, print or any function cannot detect when a variable’s value is being used. Passing c to print() is essentially just another assignment.
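
For example (a toy illustration, not the proposal itself), a callee only ever sees an ordinary parameter binding:

def show(value):
    # By the time we get here, `value = c` has already happened; there is no
    # hook that tells this function the caller is "using" c's value.
    return value

c = 42
show(c)  # from the callee's point of view, this is just another assignment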

Yes it does.

repr is no different from str, int, float, etc. All type conversions are considered observations.

Right. So you can put it into a dictionary, but not as the key, and you can’t put it into a set. That’s not going to be confusing or anything.

What about calling id(x)? Do you get the id of the deferred object or does it evaluate and give that back?

How many C extensions need to be aware of these subtleties and will break if they don’t?

Please provide a full implementation in Python.

Below is a basic, complete, untested implementation:

import ast
import inspect

class Expression:
    def __init__(self, code: str):
        self.code = code
        self.captured_values = self._capture_values()

    def _capture_values(self):
        # Parse the expression to find variable names
        tree = ast.parse(self.code, mode='eval')
        var_names = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}

        # Get the caller's global and local variables
        # (f_back.f_back skips over _capture_values and __init__ to reach the caller)
        frame = inspect.currentframe().f_back.f_back
        global_vars = frame.f_globals
        local_vars = frame.f_locals

        # Capture the current values of variables
        captured = {}
        for var in var_names:
            if var in local_vars:
                captured[var] = self._dereference(local_vars[var])
            elif var in global_vars:
                captured[var] = self._dereference(global_vars[var])
        return captured

    def _dereference(self, value):
        # Handle dereferencing lists, dictionaries, etc.
        if isinstance(value, (list, dict)):
            return value.copy()  # Return a shallow copy
        elif isinstance(value, (int, float, str, bool, type(None))):
            return value  # Immutable, safe to return as-is
        else:
            return repr(value)  # For objects, return string representation

    def evaluate(self):
        # Evaluate the expression using stored captured values
        return eval(self.code, {}, self.captured_values)

    def __str__(self):
        return str(self.evaluate())


# Example Usage:
x = 10
y = [1, 2, 3]
z = {"a": 5, "b": 8}

code = "x + sum(y) + z['a']"
expression = Expression(code)

x = 100
y = [10, 20, 30]
z = {"a0": 50, "b0": 80}

print(expression.captured_values)  # Captured values: {'x': 10, 'y': [1, 2, 3], 'z': {'a': 5, 'b': 8}}
# print(expression.evaluate())      # Compute the expression explicitly: 21
print(expression)             # Implicit evaluation: 21

If the proposed feature creates an object, it can definitely be implemented in Python. There is no need to implement it in CPython just for a prototype.
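
For instance, a minimal pure-Python sketch can already demonstrate the basic laziness (the Deferred class and its methods below are made up for illustration, not the proposed syntax):

class Deferred:
    # Hypothetical prototype: wrap a zero-argument callable and evaluate it
    # the first time the value is observed.
    def __init__(self, thunk):
        self._thunk = thunk
        self._evaluated = False
        self._value = None

    def _observe(self):
        if not self._evaluated:
            self._value = self._thunk()
            self._evaluated = True
        return self._value

    def __repr__(self):
        return repr(self._observe())

    def __add__(self, other):
        return self._observe() + other

    def __eq__(self, other):
        return self._observe() == other


x = 10
d = Deferred(lambda: x + 1)
x = 20
print(d + 0)  # 21 -- the lambda only runs at this first observation

Of course, a proxy like this cannot intercept type() or the is operator, which is where the pure-Python approach hits its limits.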

Proxy objects written in pure Python have their limitations. You can already find existing Python wheels that do the job well, and I prefer not to reinvent those wheels.

With that said, some operations cannot be proxied by a Python object, namely type() and the is operator. I’ve just updated the demo to support the is operator. You may find the relevant changes here.

REPL example with the latest CPython demo
Python 3.14.0a1+ (heads/feat/defer-expr:688df27f60, Dec  1 2024, 14:02:53) [Clang 16.0.0 (clang-1600.0.26.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> x => True
>>> x is True
True
>>> x is False
False
>>> x is None
False
>>> type(x)
<class 'bool'>
>>> x is x
True
>>> @defer
... def x():
...     global b
...     b = not b
...     return b
...     
>>> b = True
>>> x, x
(False, True)
>>> x is x
False
>>> type(x)
<class 'bool'>
>>> x is True, x is True
(True, False)
>>> x is x
False

That’s a good catch. I’ve updated my demo and it now works transparently on id():

REPL DEMO
Python 3.14.0a1+ (heads/feat/defer-expr:39136869fb, Dec  1 2024, 14:18:59) [Clang 16.0.0 (clang-1600.0.26.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> o = object()
>>> x => o
>>> id(o), id(x), x is o
(4369729536, 4369729536, True)
>>> x = defer.Mutable(object)
>>> id(x), id(x)
(4369729632, 4369729648)
>>> x is x
False

Exactly. The desired behavior is to invoke defer.snapshot() (or PyDefer_Observe() in C) when a defer object is used as a dict key or set element. I am still working on this; the current behavior of the demo is not correct. This is another reason why the defer object needs support inside the CPython core.

REPL DEMO (incorrect behavior)
>>> x = defer.Mutable(lambda: i)
>>> i = 0
>>> d = {x: "hello defer"}
>>> d
{0: 'hello defer'}
>>> i = 1
>>> d
{1: 'hello defer'}
>>> i = 2
>>> d
{2: 'hello defer'}
>>> 0 in d
False
>>> 1 in d
False
>>> 2 in d
False
# Desired output should be [ None ]
>>> [defer.reveal(k) for k in d.keys()]
[<DeferExposed object at 0x104ae12a0>]
>>> s = { x }
>>> [defer.reveal(i) for i in s]
[<DeferExposed object at 0x1047a6470>]
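
As a rough pure-Python approximation of the intended snapshot-on-hash behavior (the class below is only an illustration; the real mechanism would live inside the dict and set implementations):

_UNSET = object()

class SnapshotOnHash:
    # Illustrative only: take a snapshot of the deferred value the first time
    # the object is hashed or compared, so dict/set lookups see a stable key
    # instead of a live proxy.
    def __init__(self, thunk):
        self._thunk = thunk
        self._snapshot = _UNSET

    def _snapshot_value(self):
        if self._snapshot is _UNSET:
            self._snapshot = self._thunk()
        return self._snapshot

    def __hash__(self):
        return hash(self._snapshot_value())

    def __eq__(self, other):
        return self._snapshot_value() == other


i = 0
x = SnapshotOnHash(lambda: i)
d = {x: "hello defer"}   # snapshot is taken here; the key behaves like 0
i = 2
print(0 in d, 2 in d)    # True False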

I mentioned some possible solutions for the compatibility issues in this post. The ideal outcome would be to find a way to hide the changes inside Python’s C API (e.g. Py_Is(), Py_IS_TYPE(), etc.) so that most 3rd-party C bindings are compatible out of the box.

At what point will a be evaluated, and why?

def func(a):
    return a

b = func(a=expr)

In your case the defer object will remain unevaluated. Returning it as-is does not trigger an evaluation.

Thank you! Then my statement is correct:

How will this work with mappings written in pure Python? Or are builtin dicts (and lists?) privileged in this regard? I can foresee a bunch of other cases where special methods and other code would need to be able to preserve deferred objects. One example is debuggers: it’d be very bad if attaching one silently started converting deferreds when it tried to examine locals. I think there needs to be a way to write code which manipulates deferred objects while preserving them, and therefore some way to handle backwards compatibility there.

“Most” is going to be a VERY hard sell. How many of the biggest and most popular frameworks will not work with your proposal? And will they (a) throw an exception, (b) segfault, (c) silently misbehave, or (d) do strange and inexplicable things depending on the order you do things?

I don’t think you’ve seen ANY support from core devs here, and I suspect that this sort of deep incompatibility may be part of the reason for that. Unless you feel like maintaining a fork of Python that is significantly less useful than the core (given that a lot of the Python ecosystem’s value comes from third-party modules), you’re going to have to figure this out.

I am looking for a way to test defer objects against 3rd-party wheels. This will involve compiling them locally against the modified Python headers. It seems like a really challenging task, and I’d appreciate it if someone could help with this.

As far as I know, there are two kinds of outcomes when passing a defer object into an incompatible C function:

  1. The function throws a TypeError because it does not recognize type Py_DeferObject.
  2. The function works incorrectly because it does not use Python’s C API. For example, it writes op1 == op2 instead of Py_Is(op1, op2).

I do not think it could end up with segfaults.

With that said, there is still a way to make all existing C bindings work with the new feature: by following approach 2 of the solution I mentioned.

True. This is very unfortunate. I will keep polishing the details and try to come up with a more consolidated specification which addresses most of the concerns.

I hope some core devs will be interested in sponsoring me once I have that done.

Thanks for pointing this out! I am not familiar with debuggers. Can you share a minimal demo where defer objects might cause problems in pdb?

I’m watching the discussion and I’m still waiting for a clear specification (in words, not by examples) of how the feature is intended to work[1]. More precisely, a specification that will answer the questions “how many popular frameworks will not work with this feature?” and “what will they do?”, so that you don’t have to ask those questions of the implementation author, like you just did.

Until there’s a spec that is at least that detailed, the proposal isn’t ready to be turned into a PEP. And until it’s turned into a PEP, it’s not going to become part of core Python. And if I’m honest, I suspect that it won’t be possible to write such a spec without it becoming clear that the proposal isn’t actually going to work. I’m willing to be proved wrong, though.

Full disclosure: even if all that happened, I would still have reservations about this feature. But I might be willing to offer cautious support.


  1. intended to work, as opposed to how the current implementation happens to work - that’s an important distinction which isn’t getting as much attention as it deserves, IMO ↩︎

This isn’t something you should be testing, it’s something you should be specifying. Your current approach of “implement something that seems like it should work and then see how it performs in real-world code” is never going to deliver a workable proposal - people will always be able to invent edge cases, and you can’t propose a language feature where your response to such questions is “hmm, I don’t know, let’s give it a try”.

There’s a lot missing here for me to consider it worth looking into in more depth, but the idea of deferring expressions/scopes/etc. until needed is not a new one, and it is definitely something that could benefit from more detailed specification.

I think you can solve the hash issue (and id too) by specifying that attempting to hash a deferred means the interpreter should evaluate it before doing so. Separately from this, there could be a means to use the proxy object directly as a key in its own right, rather than having the result of the deferred expression be the key (similar to what exists for weakrefs here, but a little different, to ensure old code doesn’t need to be aware of the new construct).

Making the proxy object special does mean it can’t be implemented in pure Python.
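
The key-wrapper half of that idea can at least be sketched in pure Python; DeferKey below is a hypothetical name, loosely analogous to weakref.ref, and not part of any existing proposal:

class DeferKey:
    # Hypothetical wrapper: hash and compare by the identity of the proxy
    # itself, so storing it in a dict or set never observes (evaluates)
    # the deferred expression.
    __slots__ = ("proxy",)

    def __init__(self, proxy):
        self.proxy = proxy

    def __hash__(self):
        return object.__hash__(self.proxy)

    def __eq__(self, other):
        return isinstance(other, DeferKey) and self.proxy is other.proxy


some_deferred = object()  # stand-in for a deferred proxy object
registry = {DeferKey(some_deferred): "metadata"}  # no evaluation is triggered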

I am struggling a lot with this. I think I need a better understanding of how existing wheels “play with” Python object pointers so I can make my specification more realistic (i.e. not requiring too much change to existing code bases, and technically possible). But reading through the source code of all 3rd-party bindings and understanding their needs is beyond my capability.

For example, we may specify: "all C bindings which are incompatible with defer objects shall throw a TypeError upon encountering one". However, we have no control over the behavior of 3rd-party C bindings, which means such a specification will look more like a "recommendation": no enforcement is possible.

You should specify behaviour in terms of the CPython C API, in the first instance. That is manageable. Once you’ve done that, you need to look at whether the change breaks existing users of the API - it will, so document how existing code will break, and what it needs to do to fix the breakage. Once that’s done, we need to assess whether the amount of work required of 3rd-party modules to fix the breakage is justified by the benefits the new feature provides. That final part will always be a judgement call, but it’s something that can be done if the impact on the C API is properly specified.

It’s not your job to fix (or even understand) all of the 3rd party libraries out there. But it is your job to explain to them what they’ll need to do to fix their code when you break it :slightly_smiling_face:

One other thing you need to consider. As a Python language change, this doesn’t just affect CPython. It will need to be implemented in other implementations as well. So that makes it even more critical that the Python level semantics are clear, and well specified, in terms that don’t rely on the C API to understand.

I’ve been watching this thread here and there, and I’m wondering if this works with logging.

There are times I want to log something large which can take a lot of time to stringify.

Take this example.

import json
import logging
var = ["stuff"] * 300000
logger = logging.getLogger(__name__)

# This will JSON stringify the thing right away, even though the logger won't output it. 
logger.debug(json.dumps(var))

# Hopefully with defer we can avoid the expensive JSON? 
logger.debug(defer(json.dumps, var))

(please excuse any syntax errors, I’m on my phone)

In its current state this works with 99% of pure Python code and a large part of C code.

So to answer your question: yes, this would work with logging, as it is written in pure Python.

Of course, it is possible to make it not work. But that would require adding a new style (in a hacky manner) to logging._STYLES which implements formatting using C code that would fail.

In short, yes, it would work.
The main motivation is to handle cases exactly like this.
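
You can already get a feel for why this helps in today’s Python, because logging only stringifies its arguments when a record is actually emitted; LazyJSON below is just an illustrative stand-in for a deferred expression, not part of the proposal:

import json
import logging

class LazyJSON:
    # Illustrative stand-in: json.dumps only runs if the logging machinery
    # actually formats the message.
    def __init__(self, obj):
        self.obj = obj

    def __str__(self):
        return json.dumps(self.obj)

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
var = ["stuff"] * 300000

logger.debug("payload: %s", LazyJSON(var))  # DEBUG is disabled: __str__ never runs
logger.info("payload: %s", LazyJSON(var))   # INFO is enabled: the JSON is built here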

Hi, it’s been a while since I last posted. I’ve been too busy to push forward the deferred object idea.

That said, I do have an idea that might make deferred objects truly magical: using memory-protection APIs to trap pointer accesses, and registering a segfault callback to evaluate and fill in deferred results. This would work the same for both Python and C code, without requiring any change to existing code.

And this seems to be widely supported:

  • POSIX provides mprotect().
  • Windows provides VirtualProtect() (according to ChatGPT).
  • Bare-metal platforms typically provide an MMU/MPU.

I will try to find some time to play with this idea.
