PEP 750: Tag Strings For Writing Domain-Specific Languages

Maintainer of Jinja here. There are a few defining aspects of Jinja that I’m unsure how to support with tagged strings.

Jinja templates are arbitrary strings written ahead of time. For example, they’re often used for static site generators, where the user writes templates for their pages as individual files, never touching Python. The fact that the templates are separate from Python also allows them to be rendered by other implementations in other programming languages. We would need some sort of parse_tagged_string(string) function to take an arbitrary string and turn it into that sequence of decoded | interpolation objects.
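To make that gap concrete, here is a rough sketch of what such a function might look like. The name parse_tagged_string comes from the paragraph above; the Interpolation class is a hypothetical stand-in, not the PEP's actual type. The stdlib string.Formatter is used only as a convenient way to split on braces, so this handles simple names rather than full Python expressions.

```python
from dataclasses import dataclass
from string import Formatter


@dataclass
class Interpolation:
    # Hypothetical stand-in for the PEP's interpolation object.
    # It only records the expression source text; nothing is evaluated.
    expr: str


def parse_tagged_string(source: str) -> list:
    """Split an arbitrary string into static text and Interpolation parts."""
    parts = []
    for literal, field, _spec, _conv in Formatter().parse(source):
        if literal:
            parts.append(literal)
        if field is not None:
            parts.append(Interpolation(field))
    return parts


# The template text could just as well come from a file on disk.
parts = parse_tagged_string("Hello {name}, welcome to {site}!")
```

The point is that the string arrives at runtime, not as a literal in Python source, which is the case a tag function on its own does not cover.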

Assuming we could turn an arbitrary file into a tagged string and then use the tag function to further parse it, we could do what Jinja does and compile to Python, cache the template and compiled bytecode, reference one template from another to extend/include/import, etc.

Lazy evaluation is a must not only for deferring the entire call, but also to support control structures like if and for. You don’t want to evaluate expressions within control blocks unless the blocks are actually entered during that render.

You also want to be able to pass in different values for different renders of the same template. I’m not clear how you’d store a tagged string for rendering multiple times, or how you’d pass in different values to the lazy expressions for each render.
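One way to picture both requirements is a stored template whose expressions are callables that take a per-render context. Nothing is evaluated until render time, and a block that is not entered never runs its expressions. Everything here (If, render, the context dict) is a hypothetical sketch, not something PEP 750 or Jinja provides.

```python
from dataclasses import dataclass
from typing import Any, Callable

Context = dict[str, Any]
Expr = Callable[[Context], Any]  # a lazy expression, evaluated only when called


@dataclass
class If:
    test: Expr
    body: list  # nested parts, rendered only when test(ctx) is truthy


def render(parts: list, ctx: Context) -> str:
    out = []
    for part in parts:
        if isinstance(part, str):
            out.append(part)  # static text
        elif isinstance(part, If):
            if part.test(ctx):  # body expressions run only if the block is entered
                out.append(render(part.body, ctx))
        else:
            out.append(str(part(ctx)))  # lazy expression, evaluated now
    return "".join(out)


# Built once, rendered many times with different values.
template = [
    "Hello ",
    lambda ctx: ctx["name"],
    If(lambda ctx: ctx.get("admin"), [" (admin of ", lambda ctx: ctx["site"], ")"]),
]

first = render(template, {"name": "Ada", "admin": True, "site": "example.org"})
second = render(template, {"name": "Bob"})  # "site" is missing but never looked up
```

The second render succeeds even though its context has no "site" key, because the expression inside the skipped block is never called.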

DSLs would require the ability to define tokens, parse them, and do things based on them. But tagged strings only identify “static string” parts and “Python expression” parts. It’s certainly nice that you don’t have to parse the contents of the expression parts (Jinja has to do this and basically parses a subset of Python.) But you still have to define the syntax, parsing, and execution for the static parts.

Again assuming we have parse_tagged_string, we also need to be able to inspect the expressions to make sure they’re safe. Jinja has a sandboxed environment that effectively allows rendering templates from untrusted sources, by disallowing arbitrary attribute access, etc. So we’d need to be able to check that something like hello {world.__code__.__globals__["eval"]("evil")} (or whatever that common breakout example is) hasn’t been written, so that we don’t execute it.
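As an illustration of the inspection step only, the expression source could be checked with the stdlib ast module before anything is evaluated. The function name is_safe_expression and the "no underscore-prefixed attributes" rule are assumptions made for this sketch. A static check like this is easy to bypass (for example through getattr), so it is nowhere near a full sandbox.

```python
import ast


def is_safe_expression(expr: str) -> bool:
    """Reject expressions that access underscore-prefixed attributes."""
    try:
        tree = ast.parse(expr, mode="eval")
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if isinstance(node, ast.Attribute) and node.attr.startswith("_"):
            return False
    return True


breakout = 'world.__code__.__globals__["eval"]("evil")'
blocked = not is_safe_expression(breakout)
allowed = is_safe_expression("user.name")
```

This only works if the tag function can see the expression text (or its AST) rather than just an opaque callable.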

I think tagged strings are still valuable as a tool to perform processing on values before string interpolation, such as escaping for HTML or SQL. But I’m not sure I understand how I’d make something like Jinja with them, at least in a significantly easier or cleaner way.
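For contrast, the value-processing case is short. In this sketch the Value class is a hypothetical marker for an interpolated value, not the PEP's type, and html_tag is an illustrative name. Static parts are trusted as written and only interpolated values are escaped, using the stdlib html.escape.

```python
from html import escape


class Value:
    """Hypothetical marker for an interpolated value (not the PEP's type)."""

    def __init__(self, value):
        self.value = value


def html_tag(*parts) -> str:
    # Static strings pass through unchanged; interpolated values are escaped.
    return "".join(
        escape(str(p.value)) if isinstance(p, Value) else p for p in parts
    )


result = html_tag("<p>Hello ", Value("<script>alert(1)</script>"), "</p>")
```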
