PEP 750: Tag Strings For Writing Domain-Specific Languages

I’m mostly just looking at the Jinja example @pawamoy posted above, but basically, behaviour corresponding to Jinja templates would mean having something like template"Hello, {name}" evaluate to an object with a .render(name=xxx) method. So you’d be able to do something like

message = template"Hello, {name}"
name = input("What is your name? ")
print(message.render(name=name))

Maybe I’m missing something, and there’s no way of evaluating an Interpolation object with a custom locals() dictionary. If so, then this example isn’t possible. But that also calls into question how the PEP intends to address the sort of problems Jinja2 does, as stated in the “Motivation” section (which is the point @pawamoy was making).

5 Likes

There is, because the template function here actually gets passed multiple args, the first being "Hello, " and the second being InterpolationConcrete(getvalue=<function <interpolation> at 0x2084d50>, expr='name', conv=None, format_spec=None). So you don’t actually need to read name directly from the current scope if you use .expr instead.
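
For illustration, a tag function can branch on which kind of argument it received; a minimal sketch, assuming the draft’s Decoded/Interpolation calling convention as described above (names and details may differ in the final PEP):

def template(*args):
    for arg in args:
        if isinstance(arg, str):         # decoded static text, e.g. "Hello, "
            print("static:", arg)
        else:                            # interpolation object
            print("expr:", arg.expr)     # source text of the expression, e.g. 'name'
            # arg.getvalue() would instead evaluate it in the defining scope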

My inclination would be to keep this, but to have already captured the value of name if it existed, and have getvalue return the already-evaluated value or re-raise the error when called. Currently, getvalue goes out and looks up the value at the time it is called.

But if the implementation wants to ignore the value and use the name, then it can.

2 Likes

Maintainer of Jinja here. There are a few defining aspects of Jinja that I’m unsure about with tagged strings.

Jinja templates are arbitrary strings written ahead of time. For example, they’re often used for static site generators, where the user writes templates for their pages as individual files, never touching Python. The fact that the templates are separate from Python also allows them to be rendered by other implementations in other programming languages. We would need some sort of parse_tagged_string(string) function to take an arbitrary string and turn it into that sequence of decoded | interpolation objects.
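
A rough sketch of what such a function might look like (purely illustrative: real tag strings would produce richer interpolation objects, and this naive regex ignores nesting, escapes, and format specs):

import re

def parse_tagged_string(source):
    # Split an arbitrary template string into static text and expression parts.
    parts = []
    pos = 0
    for match in re.finditer(r"\{([^{}]+)\}", source):
        if match.start() > pos:
            parts.append(("static", source[pos:match.start()]))
        parts.append(("expr", match.group(1)))
        pos = match.end()
    if pos < len(source):
        parts.append(("static", source[pos:]))
    return parts

parse_tagged_string("Hello, {name}!")
# [('static', 'Hello, '), ('expr', 'name'), ('static', '!')]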

Assuming we could turn an arbitrary file into a tagged string and then use the tag function to further parse it, we could do what Jinja does and compile to Python, cache the template and compiled bytecode, reference one template from another to extend/include/import, etc.

Lazy evaluation is a must not only for deferring the entire call, but also to support control structures like if and for. You don’t want to evaluate expressions within control blocks unless the blocks are actually entered during that render.

You also want to be able to pass in different values for different renders of the same template. I’m not clear how you’d store a tagged string for rendering multiple times, or how you’d pass in different values to the lazy expressions for each render.

DSLs would require the ability to define tokens, parse them, and do things based on them. But tagged strings only identify “static string” parts and “Python expression” parts. It’s certainly nice that you don’t have to parse the contents of the expression parts (Jinja has to do this and basically parses a subset of Python.) But you still have to define the syntax, parsing, and execution for the static parts.

Again assuming we have parse_tagged_string, we also need to be able to inspect the expressions to make sure they’re safe. Jinja has a sandboxed environment that effectively allows rendering templates from an untrusted source, by disallowing arbitrary attribute access, etc. So we’d need to be able to verify that hello {world.__code__.__globals__["eval"]("evil")} (or whatever that common breakout example is) isn’t written, so that we don’t execute it.
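
As a very rough illustration of that kind of inspection (nothing like a complete sandbox; Jinja’s actual sandbox does much more), one could walk the expression’s AST and reject dunder attribute access:

import ast

def is_safe_expr(expr: str) -> bool:
    # Reject any expression that accesses a dunder attribute.
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            return False
    return True

is_safe_expr("world.__code__")  # False
is_safe_expr("user.name")       # True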

I think tagged strings are still valuable as a tool to perform processing on values before string interpolation, such as escaping for HTML or SQL. But I’m not sure I understand how I’d make something like Jinja with them, at least in a significantly easier or cleaner way.
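
That escaping use case at least seems straightforward; a sketch, assuming the Decoded-is-a-str calling convention used elsewhere in this thread:

from html import escape

def html(*args):
    # Pass static template text through; escape interpolated values.
    parts = []
    for arg in args:
        if isinstance(arg, str):        # static template text (trusted)
            parts.append(arg)
        else:                           # interpolated value: escape it
            parts.append(escape(str(arg.getvalue())))
    return "".join(parts)

# html"<p>{user_input}</p>" would then render with user_input escaped.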

19 Likes

Thanks @pf_moore and @steve.dower. So we could do something like this:

class Template:
    def __init__(self, *args) -> None:
        # args is the mixed sequence of Decoded strings and Interpolation
        # objects passed to the tag function.
        self.args = args

    def render(self, **context) -> str:
        parts = []
        for arg in self.args:
            if isinstance(arg, Decoded):
                # Static template text: use as-is.
                parts.append(str(arg))
            else:
                # Interpolation: re-evaluate its source text against the
                # caller-supplied context rather than the defining scope.
                parts.append(str(eval(arg.expr, context)))
        return "".join(parts)


def template(*args):
    return Template(*args)


message = template"Hey {name or 'you'}"
name = input("What is your name? ")
print(message.render(name=name))

…but that requires using eval, so the “only” benefit of the template tag string would be the automatic parsing of strings vs. interpolated values (which is already quite nice by itself).

3 Likes

Agreed. My feeling is that the mention of Jinja in the PEP probably shouldn’t be taken as anything too specific, but it did generate some interesting discussion about how tags might be defined in real-life applications. I’m still not sure if there’s a case for lazy evaluation in tagged strings (“replicating Jinja” clearly isn’t one :slightly_smiling_face:), but I am still convinced that very few[1] people would use a tag that required you to wrap all your expressions in lambda.

I still wish the PEP had some better examples of actual use cases, though.


  1. I was going to say “no-one”, but I don’t want to annoy @charliermarsh again :slightly_smiling_face: ↩︎

4 Likes

I can give you another example of this which is that I expect that SymPy would use this as a way of parsing expression strings.

Currently you can do e.g.:

In [1]: from sympy import symbols, S

In [2]: x, y = symbols('x, y')

In [3]: e = x**2 + 2*x/3 + 1

In [4]: e
Out[4]: 
 2   2⋅x    
x  + ─── + 1
      3 

A significant awkwardness comes from dividing integers with / returning a float when we typically want exact rational numbers:

In [14]: x + 1/2
Out[14]: x + 0.5

You have to be careful to avoid writing 1/2 or 2/3 in the code. The S function helps in two different ways:

In [8]: S("x^2 + 2*x + 1/2")
Out[8]: 
 2         1
x  + 2⋅x + ─
           2

In [9]: x**2 + 2*x + S(1)/2
Out[9]: 
 2         1
x  + 2⋅x + ─
           2

Using S(1)/2 is awkward in larger expressions and it is easy to forget in one place and have a float embedded somewhere. Also it is awkward to combine local variables with string parsing:

S("x^2 + 1/2 + sin(e)", locals={'e': e})

With the PEP you could turn S into a tag prefix and then:

expr2 = S"x^2 + 2x + 1/2 + sin({e})"

That makes it possible to combine string parsing with retrieving local variables in a way that should be easy to understand. You also have a way to distinguish between a symbol e in the expression vs a reference to the local variable {e} in context. I have seen people doing lots of strange things with globals etc. to try to work around the current limitations, so it is clear that something better is desired.
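
A hedged sketch of how such an S tag might be implemented under the PEP’s calling convention (the name S_tag and the placeholder scheme are mine, purely illustrative):

from sympy import sympify

def S_tag(*args):
    # Rebuild the expression text, replacing each interpolation with a
    # placeholder name bound to its (eagerly fetched) Python value.
    text, names = [], {}
    for i, arg in enumerate(args):
        if isinstance(arg, str):                 # static expression text
            text.append(arg)
        else:                                    # interpolated local, e.g. {e}
            placeholder = f"_interp{i}"
            names[placeholder] = arg.getvalue()
            text.append(placeholder)
    return sympify("".join(text), locals=names, convert_xor=True)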

6 Likes

Very interesting proposal!

When reading this PEP, I also went back to skim PEP 501 and realised I had never actually merged @nhumrich’s updates from last year. I have now fixed that oversight, so the fully rendered version of the “template literal strings” proposal should appear on peps.python.org in the not too distant future. (Edit: the rendered version is live)

Functionally, I’m obviously in favour of the general idea, and have been for a long time. The first thread referenced in PEP 501 is actually the PEP 498 thread, since it started out as a PEP 498 competitor (and only later morphed into an idea that built on f-strings instead of competing with them).

Syntactically and semantically, I’m not sure replacing a first class object created via a dedicated string prefix (types.TemplateLiteral and t respectively in the 2023 update to PEP 501) with a particular call signature accessed via novel syntax would end up being a net win.

Using the introductory example from PEP 750, the PEP 501 equivalent would be:

def greet(template: TemplateLiteral):
    def render_template(parts):
        salutation, recipient, *_ = parts
        return f"{salutation.title().strip()} {recipient.upper()}!"
    return template.render(render_template=render_template)

name = "World"
greeting = greet(t"hello {name}")
assert greeting == "Hello WORLD!"

Edit: fixed the example greet implementation to actually match the PEP 750 example

Since template renderers are just callables that take a template literal as an argument (it doesn’t even have to be their first argument), there are no lexer-based restrictions on how we would refer to them. Dotted names et al would all just work, since it is only the t prefix that would need special handling when lexing.

As first class objects, they’re also able to natively support template concatenation and repetition.

Syntax highlighters would only need the minimal update to recognise t as a valid string prefix, while type checkers would only need to know that a t-string defines a TemplateLiteral object instead of a regular string.

Template literal support can also be added to existing methods that accept strings (which is particularly important for potential use cases like logging) rather than needing to define new APIs that fit the tagged string function signature.

Several of the other differences folks in this thread have been requesting (like eagerly evaluating the interpolated expressions by default) are also part of PEP 501, but I see those compile time details of how the template is decomposed from source code to runtime objects as less fundamental than the core structural difference between “t-strings always emit a first-class TemplateLiteral object, which may then be passed to a rendering function as a regular parameter” and “tagged strings are an alternate call syntax that pass the component parts of the template literal to the callable named by the string prefix”.

I do agree that PEP 750 is a generalisation of PEP 501, since given PEP 750 you could write a t tagged string function that emitted a TemplateLiteral object. I’m just not sure it’s a generalisation that increases the expressiveness of the syntax over passing TemplateLiteral objects to regular functions.

One PEP 501 idea that PEP 750 does give me is that TemplateLiteral.render should probably accept a render_text callback (in addition to the already defined render_field and render_template callbacks), similar to the way PEP 750 makes it straightforward to customise the rendering of both the text portions and the interpolated fields based on the parameter types passed to the callable.

(Edit 2) As a general usage note: template literals are definitely syntactically noisier than tagged strings (e.g. html(t"<h1 {attributes}>Hello!</h1>") vs html"<h1 {attributes}>Hello!</h1>"). However, they’re also more explicit about what is actually happening at runtime (a function call to produce a particular kind of object from the given template string).

13 Likes

We chose to strip extra detail out of the PEP and put it in a secondary location.

I think we chose wrong. :grinning: :grinning: I’ll try to reply and point to the other material as appropriate. For example: write yourself an HTML template system.

As a note, we have an implementation that we’re gradually building to be a full-featured choice, along the lines of htm or lit-html.

Full disclosure: I’m interested in component-driven development for Python, so I have a certain bias and viewpoint. My direction is intermediate VDOMs for re-execution and caching.

4 Likes

If it helps, our HTML template tutorial shows building a system with an intermediate AST. Jim has written prototypes which cache these, as does htm.py. I’m working on a VDOM representation of actual renders. Plus ideas beyond that.

As an aside: I’m hoping that different “DSLs” (HTML, CSS, SVG, SQL, etc.) could come up with protocols for intermediate representations. Then, standard tooling could be made as plugins, as one sees in Babel/Rollup/etc. We’d also be less focused on specific implementations.

1 Like

If the tag name lookup is going to use a different set of namespaces than standard name lookup, that’s something that this PEP would need to specify, no?

All I’m saying is that there would seem to be a way out if, at some unspecified point in the future, there came a need for an additional built-in prefix, because clearly everything is special-cased right now. That exact example might not make sense because some def f(...): ... could then secretly shadow f-strings.

But I think the litmus test ought to be: can the existing prefixes be implemented using the current PEP? If so, then it feels like there is the ability to alter how they’re special-cased in the future (i.e. in terms of the PEP), so that an addition could be made backwards compatibly.

…some def f(...): ... could then secretly shadow f-strings.

Speaking of which: even if this is “disallowed” in the sense that f-strings (and other existing prefixes) take precedence because they’re handled further up the chain, I don’t love that it’s not obviously the case from looking at the syntax, and I don’t happen to know the full set of existing prefixes offhand.

I believe f could be, but r and b and u definitely cannot. b"{" and r"{" and u"{" are all currently syntactically valid, but none of those would be valid tag strings (the { would be seen as a placeholder missing its end delimiter).

2 Likes

I am really struggling to even get past the hello world example in the abstract. I’m so confused by the scoping of variables. If greeting and name are defined in different scopes, what happens? Does greeting just slurp up whatever’s in the caller’s local namespace? If so, this would be valid code?

from somewhere_else import greeting

print(greeting)  # Surprise NameError?
name = "Bob"  # Seemingly, but not really, unused variable
print(greeting)  # Seemingly, but not really, constant print statement

And what problem does this solve that a slightly more verbose function definition (where name is explicitly passed into greeting as a parameter) and/or the massively underrated string.Template() class can’t handle more clearly?
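
For reference, the string.Template version of the hello-world example keeps both the substitution point and the values explicit:

from string import Template

greeting = Template("hello $name")
print(greeting.substitute(name="Bob"))  # hello Bob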

At work I have to work with a Java derivative called Groovy which I would describe as the most needlessly confusing language I have ever used. Lazy evaluation is a flagship feature of Groovy and also a flagship reason why Groovy is so impossible. The not knowing if/when something is being evaluated has you perpetually questioning your own sanity and your trust of any line of code with an expression in it and you resort to smothering every string substitution in redundant String.copy() calls just to be sure that the value of a string can’t change its mind.

If you take away the lazy evaluation from this idea though, what are you left with? It looks like just a way to call a function on a string without typing ( and )?

11 Likes

It’s a good point. Being able to defer evaluation until actually emitting text to, say, WSGI or ASGI is an important usage, but without requiring someone to explicitly write this out, such as a series of yields in a generator.

That is, for some markup language with tag function markup, the idea we are trying to support here is that we would want for nested markup

markup"{top} ... {middle} ... {bottom}"

that top can be emitted (and therefore evaluated) before middle and then bottom.
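
To make that concrete, a sketch of a tag that streams its parts lazily (assuming the Decoded-is-a-str convention used elsewhere in this thread):

def markup(*args):
    def stream():
        for arg in args:
            if isinstance(arg, str):        # static markup text
                yield arg
            else:                           # interpolation: evaluated only
                yield str(arg.getvalue())   # when this part is emitted
    return stream()

# e.g. for chunk in markup"{top} ... {middle} ... {bottom}": write(chunk)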

I do think other use cases for deferred evaluation make sense, which is why we chose a uniform model with Interpolation.getvalue providing access to the lambda-wrapped expression.

The scoping takes place at the tag strings declaration point. So you can think of:

name = "World"
greeting = greet"hello {name}"
assert greeting == "Hello WORLD!"

as doing the following (greatly simplified to get the scoping across):

name = "World"
args = "hello ", lambda: name
greeting = greet(*args)
assert greeting == "Hello WORLD!"

There are lambdas being created for you to make the values lazy, so the lambdas follow the scoping rules you would expect.
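
In particular, each lambda closes over the variable itself, not a snapshot of its value, so ordinary late-binding closure behaviour applies:

name = "World"
getvalue = lambda: name   # what the tag machinery creates behind the scenes
name = "Everyone"
print(getvalue())         # Everyone -- the rebinding is visible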

1 Like

This is a very common pattern seen in libraries like numexpr, namely numexpr/numexpr/necompiler.py at master · pydata/numexpr · GitHub, as used by Pandas/Numpy, or Patsy for design matrices support in stats, patsy/patsy/eval.py at master · pydata/patsy · GitHub or its successor Formulaic, formulaic/formulaic/utils/context.py at main · matthewwardrop/formulaic · GitHub

As seen with these libraries for their respective DSLs, it’s possible to capture Python variables with dynamic scope using sys._getframe (or equivalently with inspect). However, there are many reasons we would prefer lexical scope to dynamic scope, including proper support in nested functions and comprehensions.
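
The dynamic-scope pattern those libraries use looks roughly like this (illustrative only; as noted, it breaks down in ways lexical scoping would not):

import sys

def evaluate(expr_str):
    # Peek at the caller's frame to resolve names dynamically, as
    # numexpr/patsy-style evaluators do.
    caller = sys._getframe(1)
    return eval(expr_str, caller.f_globals, caller.f_locals)

a, b = 2, 3
print(evaluate("a + b"))  # 5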

1 Like

This code snippet needs to be updated both to the latest PEP 501 (so t instead of i) and to what we are proposing with PEP 750 (in particular, we later refined the idea of Decoded, but here it requires an explicit decode step). But it does show how this could be done.

Right, this context-sensitive rendering is a key aspect we want to support for such templating, vs needing to use explicit escaping as seen in, say, Jinja. The other thing that comes to mind, and should be doable with t-strings, is support for memoizable parsing to an AST to provide the necessary context: in this position in HTML it’s in a child element, in another it’s an attribute, for example.

1 Like

That’s unfortunate. It would lead to everything used in type hints and all tags being imported as objects, while everything else is imported via the module (for those of us who import the module rather than the object, and I have strong opinions as to why you should do that :grin:). Now, match statements have a somewhat similar restriction in terms of capture variable versus reference to a type to match against, so this isn’t unheard of.

It’s not enough to make me -0 on this, but it does make me slightly sad.

So this would also work then?

greeting = greet"hello {name}"
name = "World"
assert greeting == "Hello WORLD!"

And that’s where the laziness is significant since without it, referencing name before defining it is a NameError?

Still really struggling to digest why that is supposed be beneficial though…

1 Like

The best approach is to use a function to wrap the tag string’s evaluation with any desired local, but lexically scoped, variables, rather than using exec-like semantics.

So write it like this:

def message(name: str) -> str:
    return template"Hello, {name}"

name = ...
print(message(name))

The key insight here is that we think that logic like this should be managed by the usual Python functionality, including the use of functions/classes for building out reusable templates for the target DSL.

However, it’s possible to exec Interpolation.expr such that it can access the lexically scoped variables closed over by Interpolation.getvalue, but since it’s done with exec, it also has access to any desired globals or locals:
tagstr-site/src/tagstr_site/rewrite.py at main · pauleveritt/tagstr-site · GitHub (note this example of how to do the “lambda trick” is not completely updated to PEP 750).

I don’t know if this is a pattern I would recommend, but I wouldn’t rule it out given that the lambda-wrapped expression is from source code, not arbitrary input, and maybe it could be useful for some applications.