DSL Operator – A different approach to DSLs

As I mentioned in the thread on reviving PEP 335, I’ve been thinking about alternative ways of facilitating DSLs in Python that don’t involve overriding operators.

The basic idea is to have a way of marking an expression to be evaluated in a completely customisable way.

Suppose we could write something like this:

import my_whizzy_computer_algebra_system as cas

my_expr = cas$(x**2 + 3*x - 5)

This would get translated into the equivalent of

my_expr = cas.sub(cas.add(cas.pow(cas.name("x"), cas.int(2)),
  cas.mul(cas.int(3), cas.name("x"))), cas.int(5))

The cas object can then define these functions to do whatever it wants – build a parse tree, evaluate them in a special way, etc.
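For concreteness, here is a minimal sketch (not a real CAS) of the kind of receiver object that could accept the translated calls; each hook just builds a plain tuple-based parse tree:

```python
# Hypothetical receiver for the cas$(...) expansion: each method builds
# one node of a tuple-based parse tree instead of computing a value.
class TreeBuilder:
    def name(self, s):
        return ("name", s)

    def int(self, v):
        return ("int", v)

    def add(self, a, b):
        return ("add", a, b)

    def sub(self, a, b):
        return ("sub", a, b)

    def mul(self, a, b):
        return ("mul", a, b)

    def pow(self, a, b):
        return ("pow", a, b)


cas = TreeBuilder()

# The expansion that cas$(x**2 + 3*x - 5) would hypothetically produce:
my_expr = cas.sub(cas.add(cas.pow(cas.name("x"), cas.int(2)),
                          cas.mul(cas.int(3), cas.name("x"))), cas.int(5))
```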

What think ye all?

5 Likes

What exactly would a custom expression evaluator look like? I’m guessing a function with access to the local variables but then what?

def cas(expression, namespace):
    # What is `expression`? A string? Some AST-like thing?
    # What would go here?
1 Like

One feature that I think would be an essential convenience is automatic variable value pick-up, so that variables are not passed in as cas.name("x") but rather as cas.value(x), where x is loaded in the standard manner.


I think there needs to be a way to do both: a “Named Atom” and a “Value Atom”.

A wrapper class would actually override the operators, so that any non-builtin algebra logic is covered.
The $(...) bracket should ensure that no modification escapes its scope.
(And it should maybe be ${} or $[] to avoid confusion with the conventional () grouping.)

This translates to

my_expr = cas.out(cas.in_(x)**2 + 3*cas.in_(x) - 5)

Am I right? Otherwise you have to manage the operator precedence within your class, which quickly becomes a nightmare.

EDIT: I was mistaken; the idea is to defer the set of instructions, ordered by Python precedence, to some external module, I guess.

This boils down to a macro language, while the given example seems no more powerful than what something like the SQLAlchemy expression language can already do without macros:

@cas
def myexpr(x,y):
  return x ** 2 + y * 3 - 5
# or
myexpr = cas.x ** 2 + cas.y * 3 - 5
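The second form can be sketched today with a small symbol class that overloads the operators, roughly what sympy and SQLAlchemy do, heavily simplified here; Sym objects record each operation as a string instead of computing a value:

```python
# Heavily simplified sketch of the operator-overloading approach:
# Sym objects record each operation textually rather than evaluating it.
class Sym:
    def __init__(self, text):
        self.text = text

    def __pow__(self, other):
        return Sym(f"({self.text} ** {other})")

    def __mul__(self, other):
        return Sym(f"({self.text} * {other})")

    def __add__(self, other):
        return Sym(f"({self.text} + {other.text})")

    def __sub__(self, other):
        return Sym(f"({self.text} - {other})")


class SymbolNamespace:
    """Attribute access conjures a fresh symbol, like `cas.x` above."""
    def __getattr__(self, name):
        return Sym(name)


cas = SymbolNamespace()
myexpr = cas.x ** 2 + cas.y * 3 - 5
print(myexpr.text)  # (((x ** 2) + (y * 3)) - 5)
```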


3 Likes

These approaches are already used: e.g. jax uses the former and sympy uses approaches more like the latter. One thing that cannot be done this way is intercepting builtin operations, e.g.:

In [1]: import sympy.abc as cas

In [2]: expr = cas.x**2 + cas.y**2 - 1/3

In [3]: expr
Out[3]: 
 2    2                    
x  + y  - 0.333333333333333

It would be nice to be able to control the division e.g. to preserve the exact fraction 1/3 rather than have it evaluate to a float. SageMath uses a “pre-parser” for this sort of thing.

Greg’s suggestion would mean that:

expr = cas$(x**2 + 3*x - 1/3)

would become

expr = cas.sub(cas.add(
  cas.pow(cas.name("x"), cas.int(2)),
  cas.mul(cas.int(3), cas.name("x"))),
  cas.truediv(cas.int(1), cas.int(3)))

Then cas.truediv can control how to evaluate 1/3. Similarly it could intercept something like 100**100**100.
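To illustrate, here is a sketch of a hypothetical receiver whose truediv keeps the fraction exact; ExactCas and its methods are assumptions mirroring the expansion above, using the stdlib fractions module:

```python
from fractions import Fraction

# Hypothetical macro target: because the expansion routes / through
# cas.truediv, the object can keep 1/3 exact instead of letting Python
# evaluate it eagerly to a binary float.
class ExactCas:
    def int(self, v):
        return Fraction(v)

    def truediv(self, a, b):
        return Fraction(a) / Fraction(b)


cas = ExactCas()

# What cas$(1/3) would hypothetically expand to:
third = cas.truediv(cas.int(1), cas.int(3))
print(third)  # 1/3
```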

Another example is if you want to have things like if, e.g.:

expr = cas$(x if y > x else y)

There is currently no way to avoid eager evaluation with something like that because the expression y > x cannot remain symbolically unevaluated e.g. with sympy:

In [7]: x if x > y else y
...
TypeError: cannot determine truth value of Relational: x > y

Similar problems of eager evaluation arise with operators like and and or and builtin functions like max.
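The failure mode can be reproduced without sympy: any symbolic comparison object hits Python's truth test, which a conditional expression forces immediately:

```python
# Minimal demonstration of the eager-evaluation problem: a symbolic
# comparison cannot defer Python's truth test, because `a if cond else b`
# calls __bool__ on cond right away.
class Relational:
    def __bool__(self):
        raise TypeError("cannot determine truth value of a symbolic comparison")


cond = Relational()
try:
    chosen = "x" if cond else "y"
except TypeError as exc:
    chosen = str(exc)

print(chosen)  # cannot determine truth value of a symbolic comparison
```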

Some applications like numba.jit use ast-manipulation to avoid the eager evaluation problems but ast-manipulation is fragile. A macro-based approach as suggested by Greg can provide a more robust and ergonomic version of ast-manipulation where you can build your own tree rather than needing to use the unstable trees produced by the ast module. You would need more than just inline expressions for it to be usable by the likes of numba but the suggestion as it stands would work for e.g. numexpr.

There is some overlap here with t-strings. One thing that t-strings provide that would also be needed here is the ability to interpolate local variables into the expression rather than conjure up new symbols:

e1 = cas$(x**2 - 1)
e2 = cas$({e1}**2 + 2*y)

With t-strings I think you could do that as

e1 = cas(t'x**2 - 1')
e2 = cas(t'{e1}**2 + 2*y')

Conceptually and in terms of implementation I prefer the “this is a symbolic macro” approach rather than “this is parsing a string”. Already the syntax highlighting looks off with t-strings because it is rendering as a string and the runtime implementation that needs to parse the strings would be less efficient.

2 Likes

Note: DSL = domain specific language.

id$( <sub-domain is here> )

One idea that I was playing with is that it would be good for the full syntax of Python to be available to “sub-domains”.

It can be restricted of course, but the point is not to write something new, but to make use of existing toolkit, while gaining maximum potential flexibility.

One obvious way to do it would be to make use of ast. e.g.:

import ast
a = 1
tree = ast.parse_dsl$( a + b )
# tree:
# Expr(
#     value=BinOp(
#         left=Name(id='a', ctx=Load()),
#         op=Add(),
#         right=Name(id='b', ctx=Load())))

An implementation then could be:

class my_dsl(ast.parse_dsl):
    def __init__(self, namespace):
        print(f'{namespace=}')
        self.namespace = namespace
    def process_Expr(self, value):
        return self.namespace[value.left.id]

my_dsl$( a + b )
# namespace={'a': 1}
# 1

So it could be a customisable ast.parse with an API similar to ast.NodeTransformer, which additionally “interpolates local variables”.
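A version of this is possible today without new syntax, if you accept passing the expression as a string and the namespace explicitly; the sketch below walks the stdlib AST and resolves names from a dict (standing in for the proposed interpolated locals), handling only Name, Constant and Add:

```python
import ast

# Sketch of a customisable evaluator over the stdlib AST: names are
# resolved from an explicit namespace dict. Only Name, Constant and
# addition are handled; everything else raises.
def eval_dsl(source, namespace):
    tree = ast.parse(source, mode="eval")
    return _eval(tree.body, namespace)


def _eval(node, namespace):
    if isinstance(node, ast.Name):
        return namespace[node.id]
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        return _eval(node.left, namespace) + _eval(node.right, namespace)
    raise NotImplementedError(ast.dump(node))


a = 1
print(eval_dsl("a + b", {"a": a, "b": 2}))  # 3
```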

Just for completeness: jax requires that the inputs already be jax arrays in order to trace the operation graph, so the variables have to be wrapped beforehand in every case (as for sympy).

Regarding interpolation, I was thinking that the $ symbol on its own could be used to insert sub-expressions that are evaluated using normal Python semantics, e.g.

c = 42
expr = cas$(x**2 + 3*x + $c)

which would give the same result as

expr = cas$(x**2 + 3*x + 42)

I don’t really understand the motivation for allowing DSLs in Python. I understand why you’d want to write a DSL in Python, but just wanting to do something is not a good motivation for adding it to the language.

Could you motivate why this would be a good thing for Python beyond a more natural syntax for DSLs? Is there some benefit to the language in general beyond DSLs?

2 Likes

The only currently missing part is the extraction of the operands and chained operations from a code expression; thus the most atomic missing step is a decomposition like:

args, ops = ${ a + b | c[d] }

# would be equivalent to :
# args = [a, b, c, d]
# ops = "(x0.__add__(x1)).__or__(x2.__getitem__(x3))"  #option1
# ops = lambda x0, x1, x2, x3 : x0 + x1 | x2[x3]       #option2
# ops = <any object containing the 'operations tree'>  #option3
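Option 2 can be checked by hand with concrete values; note that subscription maps to `__getitem__`, and that `|` binds looser than `+`:

```python
# Manual version of option 2 for ${ a + b | c[d] }, with concrete values
# chosen to verify the precedence: (a + b) | c[d].
a, b, d = 1, 2, 0
c = [4]
args = [a, b, c, d]
ops = lambda x0, x1, x2, x3: x0 + x1 | x2[x3]
print(ops(*args))  # (1 + 2) | 4 == 7
```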

Not sure I can, this is more about exploring what could be done, than what should be done, I assume.

1 Like

IMO this is the key point here - all of this can already be done if you’re willing to accept the input as a string, which you then parse. But there’s a very real sense in which people prefer the idea that “the input is a Python expression”. The rest is just what you choose to do with that input.

So maybe the key thing here is to have some form of construct that acts as a form of literal for an AST object? So $(expr) will give the AST object that you would get via ast.parse("expr"). Maybe with a similar construct for a suite of statements:

parse_suite as my_ast:
    x = 1
    fun(x)

(Syntax is to be argued about as much as anyone cares to :slightly_smiling_face:)

There may be details to work out around scoping and variable access, but at its most basic, would “ast expressions” be the core building block here?
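For the expression case, the spelled-out equivalent today would presumably be ast.parse in "eval" mode, which yields the bare expression node:

```python
import ast

# What a hypothetical $(x + 1) expression literal would presumably be
# shorthand for: the AST of the expression, without Module/Expr wrappers.
expr_ast = ast.parse("x + 1", mode="eval").body
```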

5 Likes

I like this. In essence it could theoretically take advantage of full Python syntax and employ it for DSLs in both eval and exec.

1 Like

I’m curious how this aligns with and differs from PEP 638 – Syntactic Macros | peps.python.org.

It’s basically the same - TBH, I’d forgotten about PEP 638. That PEP is still in draft mode, maybe the right thing to do here is simply for someone to either take it over from @markshannon, or work with him if he’s still interested in moving it forward.

3 Likes

While the two ideas sound similar they are also different. I see them aiming at different applications although there is overlap.

PEP 638 is about transforming fully arbitrary Python code and so requires working with the ast module’s representation. The ast module is not so nice to work with because the trees are complicated to support full Python syntax, are unstable across Python versions, and have awkward interfaces for manipulation. PEP 638 does propose making changes to the ast module so it could be improved but the idea here bypasses the ast module altogether.

The idea as demonstrated in the OP here allows direct evaluation or building your own tree/graph which is more useful if you only want to support a limited part of the full syntax of Python. Building your own tree means you can use your own data structures and don’t have to screen for arbitrary tree nodes. You can also limit the syntax to a stable subset of Python and then not need to adapt to every new Python version changing the ast, byte code etc.

Here is an example to show possible differences. The decimal module allows creating contexts with different precision for arithmetic. You can use e.g.

from decimal import Context, Decimal
ctx = Context()
ctx.prec = 10
z = ctx.add(1, 2)

You can also add two decimals together like a + b rather than using ctx.add but then you are using the global precision and you might not want to mess with the global precision. With the idea here you could do e.g.:

x = Decimal(2)
z = ctx$($x**2 + 0.1)

Now ctx$ can translate + to ctx.add to ensure that the correct precision is used and ideally could also turn the literal 0.1 into an exact Decimal('0.1'). To make this work you can simply define ctx like:

class Context:
    def add(...): ...
    def pow(...): ...
    ...

With an ast tree you instead get handed an arbitrary tree and need to traverse it to figure out what is what. The literal 0.1 will have already been turned into a float so you can’t get that back.
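Today that expansion has to be written by hand; with decimal's real Context API (which spells the method `power`, not `pow`) it looks like:

```python
from decimal import Context, Decimal

# Hand-written form of what ctx$($x**2 + 0.1) would hypothetically expand
# to: the literal is kept exact as Decimal('0.1') rather than a binary
# float, and all arithmetic uses ctx's precision.
ctx = Context(prec=10)
x = Decimal(2)
z = ctx.add(ctx.power(x, 2), Decimal("0.1"))
print(z)  # 4.1
```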

The ast is also more complicated in structure than you would really want:

>>> t = ast.parse('0.1')
>>> t
<ast.Module object at 0x7b8d893c8e10>
>>> t.body
[<ast.Expr object at 0x7b8d893c8d90>]
>>> t.body[0]
<ast.Expr object at 0x7b8d893c8d90>
>>> t.body[0]._fields
('value',)
>>> t.body[0].value
<ast.Constant object at 0x7b8d893c9310>
>>> t.body[0].value._fields
('value', 'kind')
>>> t.body[0].value.value
0.1

If you wanted to make a tree that was nice to manipulate for simple cases, and DSLs and so on then I think you would use something a bit simpler than this ast representation.

This is one of several threads recently – including those on boolean operator overloading – in which I understand what people are wanting to do but I’m very much puzzled about why they want to do it.

pyparsing and sqlalchemy, among others, demonstrate that you can already build a sophisticated DSL with the available tools.

Creating tooling for even more DSL customization starts to raise questions for me. It seems that proponents of these ideas want to reuse the Python parser, but without the restrictions that come from their code being evaluated as Python. (e.g. operator short circuiting)
So it should be Python code when parsed, but it should not be Python code when evaluated.

Wouldn’t t-strings or writing parsers for these other languages be a better solution? The contents would explicitly not be Python, so you’d have no issues with Python having rules about things like operator precedence.

I know this all may read as hostile, so I will reiterate: I’m confused by these threads. Could someone help clarify why they want these things parsed as Python, rather than something else?

4 Likes

I think “motivation” and “rationale” sections of PEP 638 sum it up well.



My current take is that it would be good to digest this idea together with PEP 638 and see if both can be achieved at the same time or maybe designed in a way so that they are complementary with each other.

My personal applications that led me here pretty much cover both:

  1. flexible expression graph backend that can be re-used for many different things
  2. macros / DSL entry points that capture local variables
1 Like