In spreadsheets, a cell can contain a value (text or string), or a formula. A cell value is a formula if preceded by an equal sign (i.e. ‘=SUM(A2:A10)
’). In a formula, you can reference other cells, use function calls, and even have conditionals (like ‘=IF(C2=”Yes”,1,2)
’. The main advantage is that when one of the referred variables changes value, all the formulae that depend on it automatically recalculate, and the results cascade on down.
My idea is to create a class of python variables – let’s call it Formula
– and implement a similar functionality but in a more pythonic way. For example:
>>> a = 10
>>> b = 20
>>> c: Formula = "a + b"
>>> print(c)
30
>>> a = 30 # <- change value
>>> print(c)
50 # <- formula auto-updates
>>> print(c.formula)
a + b
Another example might be (assuming there were a few other variables):
...
>>> z: Formula = "sum([a, b, c, d, e, f])" # <- contains function
>>> print(z)
567
If any of the referenced variables were changed through assignment, the value of z
would automatically change. An example of this is the notion of “reactive” or “bound” variables in UI frameworks like React or Vue.js.
Note, that in the above example, z
references c
which itself was a formula. So changes to c
would cascade to z
automatically.
As always, there are edge cases to consider. Invalid values, like adding strings and numbers, obviously. But also notice that BOTH c
and z
reference a
. So a change to a
would update c
then evantually cascade to z
which may end up getting evaluated twice depending on the evaluation order. An optimization might be to catch these and do them just once. In the spreadsheet universe, they also check for circular references and signal them as errors. This could be done at assignment or at runtime during evaluation.
A simpler way to define these could be as a formula-string or x
strings (since f
is already taken), so you could have something like:
z = x"sum([a, b, c, d, e, f])"
As an abstraction, formula evaluation could be extended to event dispatching. Supporting bound variables would make it easy to create event-based applications and also pub/sub type models when a value changes.
For pub/sub, multiple “watchers” could subscribe to a single value then automatically get notified when the source value changes. This could be used, for example, to dispatch multiple events when an http networking request comes in by watching the variable that stores the request. Or you could easily build a UI dashboard that updates its values automatically whenever the underlying data changes, without having to write any additional code.
Dispatching of events to support evaluation could be performed sync or async, as needed. The abstraction could allow a plugin system with stages like:
- subcsription
- change detection
- evaluation
- transform
- update
- dispatch/pub
Under the hood, you could patch into the event stream and watch, debug, or override what is happening.
If you want to follow the spreadsheet model further, a variable can reference a different tab or have a named alias so the reference doesn’t need to be A10
. In the python world, we might allow referencing values in other modules or use namespaces or aliases. But these could come later.
I started a POC implementation of the basic concept using SymPy for symbolic evaluation. Started off with eval
first and even went down the rabbit-hole of AST parsing the string formula. But I needed a bootstrapped symbolic math library, to make it look more like spreadsheets, and SymPy offered a good shortcut.
Problem is, the syntax quickly became too cumbersome to implement cleanly. There are issues overriding the assigment operator to a string so you have to come up with strange workaround syntax. Have tried overriding class setters and getters, but am not happy with how the syntax gets complicated pretty quickly. This feels like something fundamental that should be very cleann and simple to understand, like the concept of a spreadsheet cell. Adding too many steps to get it going makes it harder to approach.
The easiest, cleanest version would be the x
operator, but that would require a change to the underlying plumbing evaluation engine, hence this posting.
Is this way off-base? Any alternatives or existing solutions? Drop it? Keep going?
All feedback appreciated.
P.S. Am happy to post up what I’ve done so far, if it helps. It’s not complete and is pretty brittle in its current state. I thought I’d post to Ideas first (as recommended in PEP 1) before going much further. If more appropriate to post to a different topic, please advise. Thanks.