Proposal: Reactive (event-based) Python

In spreadsheets, a cell can contain a value (text or a number) or a formula. A cell value is a formula if it is preceded by an equals sign (e.g. ‘=SUM(A2:A10)’). In a formula, you can reference other cells, call functions, and even use conditionals (like ‘=IF(C2="Yes",1,2)’). The main advantage is that when one of the referenced cells changes value, all the formulae that depend on it automatically recalculate, and the results cascade down.

My idea is to create a class of python variables – let’s call it Formula – and implement a similar functionality but in a more pythonic way. For example:

>>> a = 10
>>> b = 20
>>> c: Formula = "a + b"
>>> print(c)
30
>>> a = 30    # <- change value
>>> print(c)
50            # <- formula auto-updates
>>> print(c.formula)
a + b

Another example might be (assuming there were a few other variables):

...
>>> z: Formula = "sum([a, b, c, d, e, f])"  # <- contains function
>>> print(z)
567

If any of the referenced variables were changed through assignment, the value of z would automatically change. This is similar to the notion of “reactive” or “bound” variables in UI frameworks like React or Vue.js.

Note that in the above example, z references c, which itself is a formula. So changes to c would cascade to z automatically.

As always, there are edge cases to consider. Invalid values, like adding strings and numbers, are an obvious one. But also notice that BOTH c and z reference a. So a change to a would update c and then eventually cascade to z, which may end up being evaluated twice depending on the evaluation order. An optimization might be to detect these cases and evaluate each formula just once. In the spreadsheet world, circular references are also detected and flagged as errors. This could be done at assignment time or at runtime during evaluation.
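A minimal sketch of how the evaluate-once and circular-reference checks could work, assuming dependencies are tracked as a plain dict from each formula name to the names it reads (the `deps` dict and `recalc_order` helper are hypothetical). It uses the standard library's `graphlib.TopologicalSorter`, which orders nodes so that inputs come before dependents and raises `graphlib.CycleError` on a loop:

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dependency map: formula name -> names it reads.
# c = a + b and z = sum([a, b, c, d, e, f]) from the examples above.
deps = {
    "c": {"a", "b"},
    "z": {"a", "b", "c", "d", "e", "f"},
}


def recalc_order(deps, changed):
    """Formulas to re-evaluate after `changed` is assigned, each exactly once,
    ordered so every formula runs after the formulas it reads."""
    # Invert the graph: name -> formulas that read it.
    dependents = {}
    for formula, inputs in deps.items():
        for name in inputs:
            dependents.setdefault(name, set()).add(formula)
    # Collect every formula downstream of the changed name.
    dirty, stack = set(), [changed]
    while stack:
        for formula in dependents.get(stack.pop(), ()):
            if formula not in dirty:
                dirty.add(formula)
                stack.append(formula)
    # Sort only the dirty formulas; static_order() raises CycleError on a loop.
    subgraph = {f: deps[f] & dirty for f in dirty}
    return list(TopologicalSorter(subgraph).static_order())


print(recalc_order(deps, "a"))  # ['c', 'z']
```

Here z is re-evaluated once, after c. Running `TopologicalSorter(deps).prepare()` over the whole graph at assignment time would report a cycle earlier, instead of during evaluation.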

A simpler way to define these could be as formula strings, or x-strings (since f is already taken), so you could have something like:

z = x"sum([a, b, c, d, e, f])"

As an abstraction, formula evaluation could be extended to event dispatching. Supporting bound variables would make it easy to create event-based applications and also pub/sub type models when a value changes.

For pub/sub, multiple “watchers” could subscribe to a single value and then automatically get notified when the source value changes. This could be used, for example, to dispatch multiple events when an HTTP request comes in, by watching the variable that stores the request. Or you could easily build a UI dashboard that updates its values automatically whenever the underlying data changes, without having to write any additional code.
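A bare-bones sketch of the watcher idea, assuming a hypothetical `Observable` wrapper object rather than a plain variable (plain name assignment can't be intercepted today). Each watcher is called with the old and new value whenever the value actually changes:

```python
from typing import Any, Callable

Watcher = Callable[[Any, Any], None]


class Observable:
    """Hypothetical value holder that notifies watchers on reassignment."""

    def __init__(self, value: Any = None) -> None:
        self._value = value
        self._watchers: list[Watcher] = []

    def subscribe(self, watcher: Watcher) -> Callable[[], None]:
        self._watchers.append(watcher)
        # Return an unsubscribe function so a watcher can detach later.
        return lambda: self._watchers.remove(watcher)

    @property
    def value(self) -> Any:
        return self._value

    @value.setter
    def value(self, new: Any) -> None:
        old, self._value = self._value, new
        if new != old:  # change detection: skip no-op assignments
            for watcher in list(self._watchers):
                watcher(old, new)


request = Observable()
seen = []
request.subscribe(lambda old, new: seen.append(("logger", new)))
request.subscribe(lambda old, new: seen.append(("router", new)))
request.value = "GET /index.html"
print(seen)  # [('logger', 'GET /index.html'), ('router', 'GET /index.html')]
```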

Dispatching of events to support evaluation could be performed synchronously or asynchronously, as needed. The abstraction could allow a plugin system with stages like:

  • subscription
  • change detection
  • evaluation
  • transform
  • update
  • dispatch/pub
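One way the staged plugin idea could look, sketched under the assumption that each stage is simply an ordered list of hooks that take and return an event dict (`Pipeline`, `hook`, and the dict keys are hypothetical names). Registering an extra hook on any stage is also how you could patch into the event stream to watch, debug, or override what happens:

```python
from collections import defaultdict
from typing import Callable

# Stage names mirror the list above; "dispatch" stands for dispatch/pub.
STAGES = ("subscription", "change detection", "evaluation",
          "transform", "update", "dispatch")

Hook = Callable[[dict], dict]


class Pipeline:
    def __init__(self) -> None:
        self._hooks: dict[str, list[Hook]] = defaultdict(list)

    def hook(self, stage: str):
        """Decorator that registers a hook for one stage."""
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage!r}")

        def register(fn: Hook) -> Hook:
            self._hooks[stage].append(fn)
            return fn

        return register

    def run(self, event: dict) -> dict:
        # Stages run in fixed order; hooks within a stage run in registration order.
        for stage in STAGES:
            for fn in self._hooks[stage]:
                event = fn(event)
        return event


pipeline = Pipeline()
trace = []


@pipeline.hook("evaluation")
def evaluate(event):
    event["value"] = event["a"] + event["b"]
    return event


@pipeline.hook("dispatch")
def publish(event):
    trace.append(("c", event["value"]))
    return event


print(pipeline.run({"a": 30, "b": 20})["value"])  # 50
```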

Under the hood, you could patch into the event stream and watch, debug, or override what is happening.

If you want to follow the spreadsheet model further, a variable can reference a different tab or have a named alias so the reference doesn’t need to be A10. In the python world, we might allow referencing values in other modules or use namespaces or aliases. But these could come later.

I started a POC implementation of the basic concept using SymPy for symbolic evaluation. I started off with eval and even went down the rabbit hole of parsing the formula string into an AST. But I needed an existing symbolic math library to make it behave more like a spreadsheet, and SymPy offered a good shortcut.

The problem is, the syntax quickly became too cumbersome to implement cleanly. Python doesn’t let you override plain assignment of a string to a name, so you have to come up with strange workaround syntax. I have tried overriding class setters and getters, but am not happy with how quickly the syntax gets complicated. This feels like something fundamental that should be very clean and simple to understand, like the concept of a spreadsheet cell. Adding too many steps to get it going makes it harder to approach.

The easiest, cleanest version would be the x-string prefix, but that would require a change to the underlying parsing and evaluation machinery, hence this posting.

Is this way off-base? Any alternatives or existing solutions? Drop it? Keep going?

All feedback appreciated.

P.S. Am happy to post up what I’ve done so far, if it helps. It’s not complete and is pretty brittle in its current state. I thought I’d post to Ideas first (as recommended in PEP 1) before going much further. If more appropriate to post to a different topic, please advise. Thanks.

Hi, yes, this is something people have been trying to tackle for a long time.

The concept is simple.

One can simply use lambda or partial, but then explicit calls are needed.

Making it work without explicit calls is a steep climb.
See the “Backquotes for deferred expression” thread for the latest progress.

By now, the concept is pretty clear, and the implementation is 99% functional for pure Python code. But it is stuck on how to make it work for C-level code without breaking a lot of stuff, which is a tricky problem.

This is absolutely implementable with attributes of an object using, e.g., an overridden __setattr__ or, more simply, @property (see this interesting Reddit post for a very cursed implementation).

I don’t think there is much value in generalizing this to all variables in all scopes and namespaces. Just keep the variables you want to be tied together close to each other in the same object and you can get a decently manageable system.
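A small sketch of that suggestion, mirroring the a/b/c example from the original post (the class names are made up for illustration). The @property version recomputes c on every read (pull); the __setattr__ version recomputes c whenever a or b is assigned (push):

```python
class PullSheet:
    def __init__(self, a: int = 10, b: int = 20) -> None:
        self.a = a
        self.b = b

    @property
    def c(self) -> int:
        # Recomputed on every access, so it always reflects the current a and b.
        return self.a + self.b


class PushSheet:
    def __init__(self, a: int = 10, b: int = 20) -> None:
        self.a = a
        self.b = b  # __setattr__ runs here and computes c for the first time

    def __setattr__(self, name, value):
        super().__setattr__(name, value)
        # Recompute c eagerly once both inputs exist.
        if name in ("a", "b") and hasattr(self, "a") and hasattr(self, "b"):
            super().__setattr__("c", self.a + self.b)


for sheet in (PullSheet(), PushSheet()):
    sheet.a = 30
    print(sheet.c)  # 50
```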


A quick search turned up an old spreadsheet recipe that implements your eval approach. It works fine but isn’t safe if the spreadsheet inputs are untrusted.

from math import sin, pi

class SpreadSheet:

    tools = dict(sin=sin, pi=pi, len=len)  # Put your math library here

    def __init__(self):
        self._cells = {}

    def __getitem__(self, key):
        return eval(self._cells[key], self.tools, self)

    def __setitem__(self, key, formula):
        self._cells[key] = formula

    def getformula(self, key):
        return self._cells[key]


if __name__ == "__main__":

    Sheet1 = SpreadSheet()
    Sheet1["a1"] = "5"
    Sheet1["a2"] = "a1*6"
    Sheet1["a3"] = "a2*7"
    print(Sheet1["a3"])

    Sheet1["a2"] = "a1*8"
    print(Sheet1["a3"])

    Sheet1["a1"] = "10"
    print(Sheet1["a3"])

    Sheet1["b1"] = "sin(pi/4)"
    print(Sheet1["b1"])
    print(Sheet1.getformula("b1"))

Check out ipyflow and marimo.

c: Formula = "a + b" doesn’t feel very Pythonic—it’s more like Excel. Why are you using strings for this?

Thank you for the link. The DeferExpr mentioned there is in the same neighborhood as what I had in mind. In my own POC, I ended up overloading => for assignment as well. I also started down the eval() route, but it seemed too dangerous and open-ended, so I headed toward SymPy.

The backtick proposal in the link will likely break a lot of Markdown. F-string-like syntax would be much cleaner, but obviously a bigger change.

My proposal is to combine late evaluation with event handling and pub/sub. This would mean building something like a DAG (directed acyclic graph) of dependencies and then emitting events as you traverse the graph to publish changes. I think both need to be there so you not only have spreadsheet-like syntax, but can also monitor changes and trigger events based on them. To make it happen, you need an event dispatcher sitting behind the scenes, monitoring and dispatching changes.
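A minimal push-based sketch of that idea, assuming each cell keeps explicit lists of its inputs, its dependents, and its listeners (`Cell`, `set`, and `on_change` are hypothetical names, not an existing API). Assigning a value recomputes downstream formulas immediately and publishes an event for each cell whose value changed. It propagates depth-first, so in a diamond (z reads both a and c, and c reads a) z is computed twice, and with a different dependent order it could briefly publish a stale value; processing the dirty cells in topological order would avoid both:

```python
from typing import Any, Callable, Iterable, Optional


class Cell:
    def __init__(self, value: Any = None,
                 formula: Optional[Callable[..., Any]] = None,
                 inputs: Iterable["Cell"] = ()) -> None:
        self.formula = formula
        self.inputs = list(inputs)
        self.dependents: list["Cell"] = []
        self.listeners: list[Callable[[Any], None]] = []
        for cell in self.inputs:
            cell.dependents.append(self)  # register the reverse edge
        self.value = self._compute() if formula else value

    def _compute(self) -> Any:
        return self.formula(*(cell.value for cell in self.inputs))

    def on_change(self, listener: Callable[[Any], None]) -> None:
        self.listeners.append(listener)

    def set(self, value: Any) -> None:
        self.value = value
        self._publish()
        self._propagate()

    def _publish(self) -> None:
        for listener in self.listeners:
            listener(self.value)

    def _propagate(self) -> None:
        # Push: eagerly recompute every dependent, then recurse downstream.
        for dep in self.dependents:
            new = dep._compute()
            if new != dep.value:  # only publish real changes
                dep.value = new
                dep._publish()
                dep._propagate()


a, b = Cell(10), Cell(20)
c = Cell(formula=lambda x, y: x + y, inputs=[a, b])  # c = a + b
z = Cell(formula=lambda x, y: x + y, inputs=[a, c])  # z = a + c
events = []
c.on_change(lambda v: events.append(("c", v)))
z.on_change(lambda v: events.append(("z", v)))
a.set(30)
print(events)  # [('c', 50), ('z', 80)]
```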

It would automatically give you cascading, so a cell expression could depend on another cell expression and so forth.

There’s also the question of push vs. pull. If an upstream value changes, should it “push” the change to all dependent expressions (cascading all the way down), or should an expression “pull” the latest upstream values each time you access it? My own sense is it should be “push”, so that changes trigger event publishing.

Maybe I’m missing it, but I didn’t see events addressed in the DeferExpr implementation. Also, it seems to be mostly “pull” based.

The motivating use case is really simple: a spreadsheet with a chart based on cell values. Change a single number, and all the cells, charts, and everything else that depends on it auto-update. With push, this means no performance penalty at runtime when accessing the values, and no significant change to Python internals. Obviously, the underlying pub/sub mechanism could be used for a lot of other use cases.

c: Formula = "a + b" doesn’t feel very Pythonic—it’s more like Excel. Why are you using strings for this?

Mainly to distinguish variables that are formulae from those that should be evaluated immediately, without breaking Python. An alternative example might be:

c = binding(a + b)

But the interpreter would evaluate a+b before passing it to binding().
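Short of new syntax, the usual workaround is to pass a zero-argument lambda, whose names are looked up each time it is called rather than when it is defined. A sketch with a hypothetical `binding` class:

```python
class binding:
    """Hypothetical wrapper that defers evaluation by storing a zero-argument callable."""

    def __init__(self, thunk):
        self._thunk = thunk

    @property
    def value(self):
        # The lambda body runs now, so it sees the current values of its names.
        return self._thunk()

    def __repr__(self):
        return repr(self.value)


a, b = 10, 20
eager = a + b               # evaluated once, right now
c = binding(lambda: a + b)  # a and b are looked up on each access
a = 30
print(eager, c.value)       # 30 50
```

The cost is the explicit `lambda:` (and `.value`) noise, which is what the string or x-string spelling is trying to avoid.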

Having it be a string signals that it shouldn’t be evaluated immediately. Like:

c = binding "a + b"

Or something like that. Another option is to override certain operators (like ‘=>’), but again, you don’t want the right-hand side evaluated immediately. The cleanest would be f-string-like syntax, but that might require deep changes to the Python core.

Would this approach work?

sheet = [
    [1, 2, 3, 4],
    [5, 6, 7, 8]
]

c = lambda: sheet[0][0] + sheet[0][1]
print(c())

sheet[0][0] = 3

print(c())

Check out ipyflow and marimo.

Thank you! Those are both really close to what I had in mind, but I want it in regular Python. I also wanted to expose the evaluation engine’s events so they could be used for display and debugging, as well as for an in-app pub/sub dispatcher.

I’ll dig into both and see if they already offer that. Thanks again.

Would this approach work?

It could work, but I imagine having cascaded dependencies could become unwieldy really quickly. Also, it would put the burden of evaluation on the print(c()) call (“pull”), whereas it would be more efficient if it was done at assignment time (sheet[0][0] = 3) i.e. “push”.

That way, you could implement pub/sub and not have every subscriber pay the penalty of running potentially nested lambda evaluations.

I had seen that, and I did implement an eval version first, but switched to sympy for the same safety reasons you point out.

Also:

Sheet1["a1"] = "5"

seems a bit unnatural.

Interesting / clever things can be done with classes. e.g.

def main(): 
    s = Spreadsheet()
    s.a = 7 
    s.b = 3 
    s.c = s.a + s.b
    print(s.c)  # → 10  
    s.a.value = 12
    print(s.c)  # → 15

This is implemented by overloading the __setattr__ / __getattr__ dunders:

class Value:
    def __init__(self, val):
        self._val = val

    @property
    def value(self):
        return self._val

    @value.setter
    def value(self, new_val):
        self._val = new_val

    def __add__(self, other):
        return Formula(lambda: self.value + (other.value if isinstance(other, Value) else other))

    def __radd__(self, other):
        return Formula(lambda: (other.value if isinstance(other, Value) else other) + self.value)

    def __repr__(self):
        return str(self.value)

class Formula:
    def __init__(self, func):
        self._func = func

    def evaluate(self):
        return self._func()

    def __repr__(self):
        return str(self.evaluate())

class Spreadsheet:
    def __init__(self):
        self._cells = {}
        self._formulas = {}

    def __setattr__(self, name, value):
        if name.startswith('_'):
            super().__setattr__(name, value)
        elif isinstance(value, (int, float)):
            self._cells[name] = Value(value)
            if name in self._formulas:
                del self._formulas[name]
        elif isinstance(value, Value):
            self._cells[name] = value
            if name in self._formulas:
                del self._formulas[name]
        elif isinstance(value, Formula):
            self._formulas[name] = value
        else:
            raise TypeError("Unsupported value type. Use int, float, Value, or Formula.")

    def __getattr__(self, name):
        if name in self._formulas:
            return self._formulas[name].evaluate()
        elif name in self._cells:
            return self._cells[name]
        raise AttributeError(f"Unknown attribute: {name}")

(Value doesn’t strictly need value to be a property but it will likely be useful if the concept is expanded.)

You can leverage a caching mechanism if the formula is complex; otherwise, it won’t be any faster than simply executing the formula repeatedly.

For example, when updating a cell, you’ll need to look up formulas where that cell is used. Keep in mind, there’s no lookup faster than a simple a + b.
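A sketch of the caching approach, assuming a hypothetical `CachedSheet` that keeps a reverse index from each name to the formulas that read it. Assigning a value only invalidates cached results downstream; formulas are recomputed lazily on the next read (pull), so repeated reads in between cost a dict lookup. There is no cycle detection here:

```python
from typing import Any, Callable


class CachedSheet:
    def __init__(self) -> None:
        self._values: dict[str, Any] = {}
        self._formulas: dict[str, tuple[Callable[..., Any], tuple[str, ...]]] = {}
        self._cache: dict[str, Any] = {}
        self._readers: dict[str, set[str]] = {}  # name -> formulas that read it

    def set_value(self, name: str, value: Any) -> None:
        self._values[name] = value
        self._invalidate(name)

    def set_formula(self, name: str, func: Callable[..., Any], *inputs: str) -> None:
        self._formulas[name] = (func, inputs)
        for source in inputs:
            self._readers.setdefault(source, set()).add(name)
        self._invalidate(name)

    def get(self, name: str) -> Any:
        if name in self._values:
            return self._values[name]
        if name not in self._cache:  # recompute only after invalidation
            func, inputs = self._formulas[name]
            self._cache[name] = func(*(self.get(i) for i in inputs))
        return self._cache[name]

    def _invalidate(self, name: str) -> None:
        # Drop this name's cached result and everything downstream of it.
        self._cache.pop(name, None)
        for reader in self._readers.get(name, ()):
            self._invalidate(reader)


sheet = CachedSheet()
sheet.set_value("a", 10)
sheet.set_value("b", 20)
sheet.set_formula("c", lambda a, b: a + b, "a", "b")
sheet.set_formula("z", lambda a, c: a + c, "a", "c")
print(sheet.get("z"))  # 40
sheet.set_value("a", 30)
print(sheet.get("z"))  # 80
```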

It’s only an efficiency question if you can absolutely guarantee that the calculations are stable. If they’re not, the question of push vs pull becomes one of intended semantics and behaviour, so it’s going to depend on what you’re seeking to accomplish with this.

Another reactive API worth mentioning is Reactive Functions and Expressions (param v2.2.0), which users of Panel (a Python dashboarding library similar to Dash, Streamlit, Gradio, etc.) can use to simplify writing their interactive code. A contributor wrote a pretty good blog post about it: rx Marks the Spot: A Prescription for Reactive Python, by Andrew Huang on Medium.

This provides a good entry into the reactive features. Combining this with a safe version of sympy + recursively evaluating and updating automatically should get it pretty close.

The only thing missing is the syntax: having to override “=>” instead of “=”, or to use f-string-like prefixes. But this gets it much closer to what I started off with. Thanks!

This is very close to what I implemented, but I was trying to avoid having to make it explicit. Instead of:

    s = Spreadsheet() 
    s.a = 7 
    s.b = 3 
    s.c = s.a + s.b

It would be more like:

    # s = Spreadsheet()  - this would be implicit in the module
    a = 7 
    b = 3 
    c = "a + b" 

The last line would need to be a formula, so if a changed, c would change as well. An alternative could be:

from foo import Formula
...
    c = Formula("a + b")

Also, to make the variables reactive, each change would trigger a downstream “push” update, so c doesn’t need to be evaluated on __getattr__ access. By cascading these updates, you could make a Formula depend on another Formula, and so forth.
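A sketch of what a `Formula("a + b")` class could do with the string, assuming the referenced variables live in an explicit namespace dict rather than being tracked by the interpreter (`Formula`, `inputs`, `evaluate`, and `SAFE_BUILTINS` are hypothetical names). `ast.parse` finds the names the expression reads once, up front; a push-based dispatcher could use that set to register the formula as a dependent. Evaluation here is still pull-based, nested formulas are evaluated recursively (a circular reference ends in `RecursionError`), and restricting builtins does not make `eval` safe for untrusted input:

```python
import ast
from typing import Any

# Builtins the formulas may call; everything else is hidden from eval.
# This is NOT a sandbox: never evaluate untrusted formula strings.
SAFE_BUILTINS = {"sum": sum, "min": min, "max": max, "abs": abs, "len": len}


class Formula:
    def __init__(self, source: str) -> None:
        self.formula = source
        tree = ast.parse(source, mode="eval")
        # Every name the expression reads, e.g. {"a", "b"} for "a + b".
        self.inputs = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
        self._code = compile(tree, "<formula>", "eval")

    def evaluate(self, namespace: dict[str, Any]) -> Any:
        env: dict[str, Any] = {"__builtins__": SAFE_BUILTINS}
        for name in self.inputs.intersection(namespace):
            value = namespace[name]
            # Cascade: a referenced Formula is evaluated first.
            env[name] = value.evaluate(namespace) if isinstance(value, Formula) else value
        return eval(self._code, env)


ns = {"a": 10, "b": 20}
ns["c"] = Formula("a + b")
ns["z"] = Formula("sum([a, b, c])")
print(ns["z"].evaluate(ns))  # 60
ns["a"] = 30
print(ns["z"].evaluate(ns))  # 100
print(ns["c"].formula)       # a + b
```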

The param library, suggested in this thread may be able to handle the reactive update part.

And this is exactly the part that is a bad idea. To quote PEP 20: explicit is better than implicit.

Also, you are requesting a pretty significant change to the python language for something that is both niche and can already be done.

You need to make a very good argument why saving this s. is a major benefit.
