DSL Operator – A different approach to DSLs

Yes, they can be used, but they are not made for this. And using them for the applications being tackled here would result in a lot of repetitive complexity. Not to mention performance, which would inevitably be much poorer than what could be achieved here. See: Linked Booleans Logics (rethinking PEP 505) - #28 by dg-pb


I think this can be likened to regular expressions versus writing custom pattern matching for every new pattern in Pure Python.

If there is a class of problems that could potentially benefit from a standardised, robust and performant toolkit, then, depending on circumstances, it might be a good idea to invest some time.

1 Like

What if… there were ast-strings, such as :

astr = $" a + b | c[d] "

astr.ast # some AbstractSyntaxTree container
astr.roots # [a, b, c, d]

Where the linter treats the astr as Python code, not a str. The astr combines str properties (for parser compatibility) with code introspection consistency (no more unused variable warnings, etc…), and provides the AST and the list of roots. The deferred evaluation is eventually done with, e.g.:

astr.ast.to_func(*astr.roots)

This would also make it possible to apply a wrapper to every root, or to do AST processing, before eventual evaluation.
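
For illustration, a rough approximation of this behaviour is already possible with today's ast module (AstString, roots and to_func below are hypothetical names mirroring the proposal, not an existing API):

import ast

class AstString:
    # Rough stand-in for the proposed $"..." ast-string (hypothetical API).
    def __init__(self, source):
        self.source = source
        # Parse once, in expression mode, so the AST is available up front.
        self.ast = ast.parse(source, mode="eval")
        # "Roots" here are the free names, in order of first appearance.
        self.roots = []
        for node in ast.walk(self.ast):
            if isinstance(node, ast.Name) and node.id not in self.roots:
                self.roots.append(node.id)

    def to_func(self):
        # Compile the expression into a function taking the roots as arguments.
        src = f"lambda {', '.join(self.roots)}: ({self.source})"
        return eval(compile(ast.parse(src, mode="eval"), "<astr>", "eval"))

astr = AstString("a + b | c[d]")
print(astr.roots)          # ['a', 'b', 'c', 'd']
f = astr.to_func()
print(f(1, 2, {0: 4}, 0))  # (1 + 2) | {0: 4}[0]  ->  7
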
Any opinion?

Side note: I read “Python is not homoiconic” in PEP 638; I found it insightful.

Okay, maybe I’m being dense but I’m still not getting it. PEP 638 is about being able to define preprocessing of code before execution. I find it immensely interesting and would definitely experiment with the feature if implemented.

Given the age of it, I do wonder if the PEP should be withdrawn if it’s not active, just to make the signal around it clearer, but I’d also be happy to see it revived and put back in motion towards submission.

The note about homoiconic languages looks to me like basically calling out lisp, since that’s a major (and, in that community, beloved) lisp characteristic. Every lisp program defines macros and makes its own language variant. Everything about that PEP makes sense to me.

Syntactic macros aren’t the same thing as DSL definition. And I’m still getting some signal that this isn’t a thread which is attempting to inject new energy into PEP 638. It seems to be at least partly about using the Python parser for things which aren’t actually Python code. (Per the thread title, it seems to be a thread about DSLs!)

Writing a parser (e.g. using pyparsing, lark, etc.) isn’t made for parsing DSLs?

Maybe a different, more direct question: What is “this”?


I might bow out of this thread. I voiced some confusion and the explanation I’m getting is just confusing me more. I’m clearly looking at things from a very different angle from the folks who are excited about these ideas.

If the true core of this conversation is “we would like syntactic macros”, then I get it. I somewhat agree, but I do worry about the downside risk that Python code in aggregate becomes harder to understand. If that’s not the topic of conversation and it’s not about DSL definition (which I consider to be a fully separate topic from macros), then I’m just totally lost and I’m not sure anyone should spend much energy trying to help me understand.

1 Like

Syntactic macros, None-aware operators, DSLs, chained functions, deferred expressions…
There is, somewhere, not “one problem” but a whole class of diverse problems that keeps appearing in recurrent proposals, mostly rejected or deferred as too “niche”. The point is, perhaps there is a common solution to all of these… and it seems to me the topics are now converging to highlight a Python blind spot regarding proper AST management ~ homoiconicity.

2 Likes

I am not very sure about anything at this point really (except that this is interesting and could be worthwhile to continue). I need to digest PEP 638 properly and put everything into perspective - a lot of stuff to consider by now.

However, the way I see it now:

PEP 638:
CODE → PARSE → AST → MACRO → NEW_AST → EVAL

This proposal does not need the tail (mostly due to the performance penalty and to avoid doing unnecessary work), so all that is left is:
CODE → PARSE → EVAL

As I said, maybe it is possible to split it into 2 steps:

  1. CODE → PARSE(ENGINE)
  2. ENGINE → RESULT

In the case of macros, the engine could do AST transforms as per PEP 638, and in the case of more straightforward DSLs, it could just evaluate straight away.
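
To make the split concrete, here is a minimal sketch of what the two steps could look like (MacroEngine and DirectEngine are purely illustrative names, not a proposed API):

import ast

class MacroEngine:
    # Step 2 as PEP 638-style AST rewriting, followed by evaluation.
    def run(self, tree, namespace):
        new_tree = self.transform(tree)             # AST -> MACRO -> NEW_AST
        return eval(compile(new_tree, "<dsl>", "eval"), namespace)

    def transform(self, tree):
        return tree                                 # identity transform in this sketch

class DirectEngine:
    # Step 2 as straight evaluation: no intermediate AST rewriting.
    def run(self, tree, namespace):
        return eval(compile(tree, "<dsl>", "eval"), namespace)

tree = ast.parse("a + b", mode="eval")              # step 1: CODE -> PARSE
print(MacroEngine().run(tree, {"a": 1, "b": 2}))    # step 2: ENGINE -> RESULT, prints 3
print(DirectEngine().run(tree, {"a": 1, "b": 2}))   # prints 3
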

I have a similar insight here.


I think there is one more thread worth linking to: Make the PEG parser available to all Python users

Of course, the parser itself would not be made available, but it would be possible to utilise it for DSLs that make different use of Python syntax.

Inspired by PEP 638 I’ve written my own version of a macro processor. It avoids changing the interpreter by using a source encoding. Macros are:

[x =] name!<delimiter>Some Text<delimiter>[:
    Some Body
   ]

Expression and statement macros with or without bodies.

The Text or Body need not be valid Python and the delimiters are very flexible. The Body just needs to be indented. Here are some working examples:

        # super switch, match with addons
        switch! x with logic:
            case! Y == 'abc' capture Y:
                print('match and case', Y)
            case! _:
                print('Default')
        # inspired by Rust
        macro_rules! abc as EXPRESSION:
            case! _0:
                _0 + 7
            case! _0, _1:
                _0 + _1
            case! _0, _1, _2:
                (_0 + _1) * _2
            case! _0, _1, _2, _3:
                _0 + _1 + _2 + _3
        x = 4
        y = abc!(1, 2, 3)
        y = abc![1, x/2]
        y = abc!{7}
        # lisp s-expression to list.
        l = lisp!$1 2 3 4 5 6

In the final example the delimiters are $ and \n. The case! macro is never defined. All the examples in the PEP work. I’ve also used it to write multi-line lambda functions.

I’ve not published it as I regard it as a toy and a learning experience for me; some bits are fragile and it’s not properly tested.

It has limitations:

  • errors refer to locations in the generated code, not the source code, which makes debugging difficult.
  • comments are best avoided.
  • using macros in other macros may not always work.

However it does allow non valid python to be mixed with valid python.

I haven’t tried the OP’s idea but see no reason for it not to work.

For expression macros the processor works by the following steps (see the sketch after this list):

  • simple text processing to turn name! into unique_name(name, delimiter, parameter, delimiter) where all arguments are strings. This is as foolproof as I can make it. The result compiles to a single AST node.
  • convert all code to an AST.
  • process macros in the AST, replace the AST function node, and define any necessary new functions.
  • convert back to Python.
  • pass to the compiler.
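
A heavily simplified sketch of that expression-macro pipeline, with a trivial stand-in macro rather than the macro_rules! semantics above (all the names here are mine, and real delimiter handling is far more involved):

import ast
import re

# 1. Text pass: rewrite  name!(text)  into a plain call with string arguments,
#    so the result is guaranteed to parse as a single AST node.
MACRO_CALL = re.compile(r"(\w+)!\((.*?)\)")

def text_pass(source):
    return MACRO_CALL.sub(lambda m: f'__macro__("{m.group(1)}", "{m.group(2)}")', source)

# 3. AST pass: replace each __macro__(...) call node with its expansion.
class Expander(ast.NodeTransformer):
    def visit_Call(self, node):
        self.generic_visit(node)
        if isinstance(node.func, ast.Name) and node.func.id == "__macro__":
            name, text = node.args[0].value, node.args[1].value
            if name == "abc":  # toy macro: just sum the arguments
                return ast.parse(f"({text.replace(',', ' + ')})", mode="eval").body
        return node

source = "y = abc!(1, 2, 3)"
tree = ast.parse(text_pass(source))                        # 2. convert all code to an AST
tree = ast.fix_missing_locations(Expander().visit(tree))   # 3. expand macros
print(ast.unparse(tree))                                   # 4. back to Python: y = 1 + 2 + 3
namespace = {}
exec(compile(tree, "<macro>", "exec"), namespace)          # 5. pass to the compiler
print(namespace["y"])                                      # 6
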

Statement macros resort to some different tricks to capture the body.

It should be built into the compiler but I gave up trying to understand and modify the parser.

All the hard work is done by manipulating the AST, which makes it all possible.

It’s tested with 3.11 on Windows 10.

1 Like

I have never properly looked at macros and am not familiar with macros in Rust.

And I am having a bit of a hard time following PEP 638. I imagine it is much more straightforward to someone familiar with macros in Rust or other languages that build on similar concepts.

What would really help is more elaborate end-to-end examples with intermediate and final results being printed.

Could you please add print(x); print(y); print(l) in your examples? That would be helpful for someone who is looking at this for the first time.

Thanks for this explanation. I disagree, but now I understand what we’re on about, so I’m at least able to engage productively.

General purpose language features should be applicable to a wide variety of domains. And the design process can and should often take various ideas and find their commonalities.

Macros would drastically change what it means to read, write, and debug Python programs. They’re powerful but almost frighteningly so – as it stands today I’ve had to unwind some pretty tangled code over the years. Macros give less disciplined developers tremendous latitude to create a mini language which uniquely “fits their brain” rather than the more general and classic “Python fits your brain” (generic “you”).

So surely they would be applicable to many, perhaps even all of these problem spaces. But I don’t think that means that these problems themselves are all good motivators for macros, or all would be considered solved if we had macros. For example, I can’t imagine that the (seemingly eternal) discussions about deferred expressions and multiline lambdas would suddenly stop and go away if we had macros. Nor do I think that the language should stop evolving new syntax and features like None coalescing operators (whether or not I support PEP 505 is not pertinent here). Because a macro definition is akin to a function definition, calls for unified “in language” solutions would remain.

If you want to build a proof of concept, you can probably write a decorator which reads AST from the decorated function’s source, reshapes it, and then formats and evaluates the result. In spite of the criticisms in this thread of the ast module, I find it quite pleasant to use and have been using it since around 3.5 to build custom linters. It changes when new syntax is added but PoC code could just choose to only support the latest CPython.
I believe that asking for that module to change how it evolves as syntax changes would be improper. It should track changes in the language.
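
As a rough illustration of that kind of proof of concept (deliberately naive; the @reshaped decorator and the AST transform below are made up, and, as noted later in the thread, this breaks as soon as the source is unavailable):

import ast
import inspect
import textwrap

class SwapAddForMult(ast.NodeTransformer):
    # Placeholder transform: turn every + into *, just to show reshaping.
    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node

def reshaped(func):
    # Re-parse the decorated function's source, rewrite its AST, rebuild it.
    source = textwrap.dedent(inspect.getsource(func))
    tree = ast.parse(source)
    tree.body[0].decorator_list = []                  # avoid re-applying the decorator
    tree = ast.fix_missing_locations(SwapAddForMult().visit(tree))
    namespace = {}
    exec(compile(tree, "<reshaped>", "exec"), func.__globals__, namespace)
    return namespace[func.__name__]

@reshaped
def combine(a, b):
    return a + b

print(combine(3, 4))   # 12, not 7: the + was rewritten to *
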

You say that you see these threads as converging on homoiconicity – which in our context is to say macros. I don’t see that. I see threads wandering from topic to topic without spending enough time in one place to let the ideas mature into a clear proposal for a change to the language. I’m not seeing people build a proof of concept and demonstrate how it’s powerful and flexible and solves many problems. And exposing that proof of concept to constructive criticism.

Remember that changing a mature language with a dramatic addition is incredibly hard and takes years of work. You need a very concise and clear abstract for what you’re doing and why. And if it isn’t concise and clear enough, you need to put in the time to sweat the details and make it concise and clear. I hope I’ve been able to help push in that direction.

3 Likes

With this I agree. Python’s ast is not too bad. And if it can be re-used, then a large part of the new feature already “fits in one’s brain”, plus the added benefits of a mature and feature-rich expression graph.

2 Likes

Also, for None-Aware DSL building, expression graph construction might be needed. E.g.:

result = DSL$( a[b] | c[d] )

If doing bottom-up, it is impossible to know which atoms need to be wrapped to get:

result = func(a)[b] | func(c)[d]

Thus, for some applications, such as sympy-like simple graph building, the bottom-up-eval approach works well and is optimal; however, for more complex cases top-down graph analysis might be inevitable.
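
A small sketch of that top-down analysis using the ast module (func and WrapSubscriptBases are hypothetical names; a real DSL$ prefix does not exist):

import ast

class WrapSubscriptBases(ast.NodeTransformer):
    # Wrap the base of every subscript:  a[b]  ->  func(a)[b]
    def visit_Subscript(self, node):
        self.generic_visit(node)
        node.value = ast.Call(
            func=ast.Name(id="func", ctx=ast.Load()),
            args=[node.value],
            keywords=[],
        )
        return node

tree = ast.parse("a[b] | c[d]", mode="eval")
tree = ast.fix_missing_locations(WrapSubscriptBases().visit(tree))
print(ast.unparse(tree))   # func(a)[b] | func(c)[d]
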

Also, regarding “what is wrong with simply parsing a string?”.

I could not put it more simply than programming languages - What exactly does homoiconicity mean? - Stack Overflow:

“… which you could parse and feed to a compiler, but that’s awkward, because it’s a string rather than a structured term.”

Also, there is the issue of picking up variables. t-strings work here, but it gets awkward in other aspects.

This has been done multiple times; I provided a somewhat generalized example of this before.

But this approach has fundamental issues that make it unusable for production or even true public interfaces:

  • It’s too fragile. It doesn’t interact well with other metaprogramming features.
  • It’s complex. No, what you described doesn’t quite work; you need to also compile the surrounding context, not just the function on its own.
  • It’s too fragile, again. It doesn’t work if the source is not available, which is pretty easy to achieve by accident.

Of course, these can be worked around, but until this[1] gets added to the language, none of these will be used in a serious context. I do agree that ast is very powerful and useful and have used it myself for similar stuff before.

This is at least partially because these topics get shut down by experienced community members for various reasons, and sometimes these reasons are justified. A few examples of provided reasons:

  • DSLs are fundamentally not something that should be added to Python. This is actually a stance that Guido expressed at some point, and that you also expressed in, I guess, a slightly weaker tone.
  • Macros specifically are too powerful.
  • Defer expressions can either never work as you want, or they aren’t powerful enough.
  • This is unpythonic and makes code harder to read (i.e. the primary criticisms of PEP 505).
  • Very similar to the previous point, this is too unintuitive.

I believe that if you don’t agree with the fundamental idea of “DSLs are a useful thing to add to Python as first-class citizens”, there is little that a more concise and clear abstract is going to help. In the end, it boils down to this question, and many[2] users of Python agree with the idea that DSLs are useful, even if they don’t fully realize that this is their position - that is why these kinds of requests pop up again and again.


  1. For some definition of this. Honestly, t-strings might be enough ↩︎

  2. I have no clear idea on how many. Might be a loud minority, might be a significant fraction, might be the majority. It’s definitely a minority among the core devs. ↩︎

3 Likes

This is a good point - macros won’t stop people wanting new syntax, although they will mean that many syntax proposals will now have to answer the question “why can’t this be a library on PyPI?” But just as people still want functionality added to the stdlib (and sometimes succeed in getting it), so will people still want new syntax.

I’d characterise that somewhat differently - when difficult questions are raised[1], no-one is actually willing (or able) to come up with an answer to those questions, and the topic goes nowhere as a result.

I feel like this reflects a similar view to your previous comment about threads being shut down. It’s important to remember that proposals are precisely about changing the mind of someone who doesn’t necessarily agree with the proposal (specifically, the SC or PEP delegate). While it may not be important to get Stephen to agree with the value of adding first-class DSL support to Python, it could still be good practice to persuade him, because it will help you refine your arguments into something that will convince the SC.

Ultimately, that’s the whole point of the Ideas category, and of the process of gaining community consensus. It’s not about collecting a bunch of people who support an idea[2] - instead, it’s about refining your arguments in order to persuade people who don’t initially support you.

Indeed. And the reason they repeatedly fail to get traction is that no-one has ever taken the problem of persuading the people who don’t agree with that idea seriously (or if they have, they haven’t been able to find a sufficiently persuasive argument).


  1. often but not always by experienced Python developers who have seen many similar discussions over the years and know some of the problems that need to be addressed ↩︎

  2. Python’s development process is most definitely not a popularity contest, or even a democracy where getting enough votes is sufficient for a proposal to get accepted ↩︎

3 Likes

Except… I love DSLs!

I use the sqlalchemy ORM at work, I’m a pretty heavy user of click’s decorator API for CLIs (and even built a single-module knock-off on top of argparse in a project where we didn’t want to add dependencies), and I find pyparsing very pleasant to use.
I even like a lot about pydantic – though I’m probably less of an enthusiast for it than some other folks.

DSL support in Python is already here, as long as you work within the constraints of “it still has to be valid Python”.

I’m continually wowed by the community’s ability to find novel and clever solutions. There’s a lot of great prior art out there.

So when I point at writing a parser as the solution for “if you don’t want it to be Python code anymore, do this”, I do so in the context that you can do a lot before you reach that point.

I’m not unconvinced about the value of DSLs. I’m unconvinced that there is a singular problem under discussion here.

2 Likes

Yes, but there is a difference between convincing someone about the details of an approach, or that a new feature is useful, and starting a discussion about the fundamental nature of Python as a language going forward.

Making DSLs first-class citizens is a major shift in the language, probably about as fundamental as indentation-based syntax and at least on par with the walrus operator and pattern matching. That doesn’t mean it’s impossible to convince people of this, but it’s a daunting task that is going to take years, and very few people are up to it. Essentially what we need is a core dev, or ideally an SC member [1], clearly behind the idea. And this is noticeable when creating a new thread on a topic like this - it is obvious that it will never go anywhere, even if no one directly shuts it down, and people lose interest.

TBH, I am myself also only a +0. I believe that DSLs are very useful and that they should be added in some form (I am a fan of regex, and a co-maintainer of lark), but I agree with many of the concerns and don’t really have a good answer.


  1. Not because of additional power, but because of clout and trust from other core devs ↩︎

Except I wouldn’t describe any of these as DSLs, which tbf is a bit of a subjective definition.

But especially pyparsing is the avoidance of creating a DSL by instead using (IMO) slightly less readable Python syntax. A regex or BNF would be more readable most of the time, but instead pyparsing uses parser combinators to stay within valid Python syntax.

(I argue that you are unconvinced about the value of DSLs as first class citizens)

This is somewhat fair I guess, but the point is that many different requests would be solved by first class DSLs.

Your perspective on this seems totally fair to me. But, if I’m reading you right, we’re (both legitimately) calling different things DSLs.

I’m content to call pyparsing’s combinators a DSL. I can write something like Atom | BinaryOp and I know that it’s cleverly using bitwise-or but it doesn’t look like that. It successfully provides an abstraction which looks like “this or that” in a grammar.
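
For instance (a tiny example using pyparsing’s combinators; the grammar itself is made up):

from pyparsing import Word, alphas, nums

# "a word or a number" reads like a grammar rule, but | is plain operator overloading.
atom = Word(alphas) | Word(nums)

print(atom.parse_string("spam"))   # ['spam']
print(atom.parse_string("42"))     # ['42']
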

Is this not a DSL because it’s constrained by the syntactic rules of Python? i.e. In your view, to be a true DSL it needs to be able to redefine what happens during parsing? (e.g. to make adjacency imply function application, like Ruby?)

If that’s what counts as a DSL, then yeah, I’m not a fan of that. I’d like such things to be declared separately from my Python code. As strings (like the lark readme suggests!) or in separate files.

2 Likes

So I think key dimensions of DSLs in Python so far are:

  1. Grammar
    a) Unconstrained
    b) Python-like
  2. Parameter passing
    a) Explicit-input
    b) Implicit-input

Unconstrained grammar with explicit-inputs

I am happy to use lark or my custom parser and provide arguments manually. E.g. re


Unconstrained grammar with implicit-inputs

I think t-strings will provide a “good enough” toolkit. E.g.: parse(t'{a} ^^^ {b}').

Things can get a bit messy here, but in reality I personally don’t have many applications for which this would be optimal (apart from those discussed as part of t-strings).

One example that I really liked and plan to adapt is GitHub - pgjones/sql-tstring: SQL-tString allows for f-string like construction of sql queries. I have been using sqlalchemy and many other different tools that abstract the syntaxes of other languages, and I am slowly coming to the conclusion that such obfuscation doesn’t benefit me much - I need to learn a ton of new packages and slowly start to forget the SQL query language, which inevitably I need to take time to remember again. But that is another story…


Python-like grammar with explicit-inputs

I am quite content with: eval('A | B', {'A': my_type(), 'B': my_type()})

Cases where this is the optimal solution are rarely performance critical. E.g. I use this for a command-line argument to specify which tests to run with a high degree of customisation. I think it is lovely when this can be done - a very low-maintenance solution (compared to writing a custom grammar with integrated logic), and it is very quick to get my head around the implementation even after a long time without looking at it.
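
A compact sketch of that pattern (the Tag class and the tag names are made up for illustration):

class Tag:
    # Selector atom supporting | and & so the command-line string reads like Python.
    def __init__(self, names):
        self.names = frozenset(names)
    def __or__(self, other):
        return Tag(self.names | other.names)
    def __and__(self, other):
        return Tag(self.names & other.names)

# Explicit inputs: the only names eval() can see are the ones provided here.
namespace = {"slow": Tag({"t3", "t9"}), "network": Tag({"t3", "t5"})}

selection = eval("slow & network", namespace)
print(sorted(selection.names))   # ['t3']
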


Python-like grammar with implicit-inputs

I think this is the target of this discussion.

For “Python-like grammar with implicit-inputs” t-strings can be used.

But many target applications are “quite close to the metal”, thus they would ideally feel that way and have the qualities of such.

Thus the following make t-strings not very suitable:

  1. They are not suited for parsing a “Python-like grammar”, and a lot of awkward steps are needed to achieve that.
  2. String input does not feel natural, and the need to wrap inputs in curly braces hurts readability for syntax which is Python-like.
  3. Performance. This would be tens of times slower than the straightforward graph-construction approach that, say, sympy uses.

Note: Julia provides a whole (very cryptic) interface for metaprogramming.

I tried to read it but couldn’t find a part where it separates the AST on one hand and the operand references on the other. Yet I am quite sure it is the simplest and most powerful way to go.

1 Like