PEP 750: Tag Strings For Writing Domain-Specific Languages

godlygeek · August 11, 2024, 9:53pm

It’s similar, yeah. The important part of my idea is that tag callables should be declared explicitly, rather than detected implicitly, because this allows for fewer mistakes, better error messages, and an extension point that could allow the stuff inside the quotes to be parsed differently in the future based on the needs of a specific tag.

Nietanod · August 11, 2024, 9:58pm

Do you think it is more appropriate to use a class or a callable?
I prefer the class, because you can add or overwrite special methods.

My main idea is a translation system, where you can initialise the output language and override the __str__ method so that your strings are output in the specified language.

godlygeek · August 11, 2024, 10:25pm

Either one can address all of the concerns I’ve pointed out. The crucial part is that there needs to be some sort of explicit registration of tags (by inheriting from a specific base class, or by calling some function (maybe a decorator?) to register the tag), and that there needs to be some way to request different handling from the interpreter. With a registration decorator, that might be keyword arguments passed at registration time; with a base class, it might be methods that the interpreter can call to decide how to prepare the arguments that should be passed to the tag handler.

Nietanod · August 11, 2024, 10:31pm

You mean something like this ?

@tag("tag_name")
class CustomClass:
    def __init__(self, *args):
        ...

And then tag_name"hello" can be read as CustomClass("hello")

dg-pb · August 11, 2024, 10:37pm

Love this! Recently started making more and more use of string.Formatter to simplify messy (and sometimes protect insecure) DSLs.

I quite like the ideas of a registry and/or more explicit invocation syntax.

My preference would be to have more control over namespaces.

Tagging is potentially a good idea, so that only tagged functions are called.

Also, making use of factories could help re-use tags with different parameterisations. E.g.:

Factory would automatically tag all of its methods on instantiation.

@tag.Factory
class Speaker:
    def __init__(self, lang='fr'):
        self.lang = lang

    def greet(self, *args):
        salutation = 'Hello' if self.lang == 'en' else 'Bonjour'
        recipient, *_ = args
        getvalue, *_ = recipient
        return f"{salutation} {getvalue().upper()}!"

fr = Speaker('fr')
en = Speaker('en')

name = "World"
fr.greet"{name}"    # "Bonjour WORLD!"
en.greet"{name}"    # "Hello WORLD!"

This could provide namespace separation and have a more modular feel, as opposed to being intertwined with the whole.

charliermarsh · August 11, 2024, 10:48pm

Predictably I don’t really agree. No one ever? In my experience, using lambdas for lazy evaluation is fairly normal… It’s what I would do if I needed lazy evaluation! It’s what I’d expect from colleagues etc. in code review if they needed lazy evaluation! I believe I’ve used lambdas in this way to implement DSLs in Python in the past too.

(As an aside: many of the most popular use-cases for this feature in JavaScript leverage lambdas, like styled-components, which allows you to pass in props when generating styles. So I’d guess that some of the use-cases we’d see here will expect heavy use of lambdas too? But I don’t think you’re arguing against the use of lambdas in that sense, more as a form of deferred evaluation.)

I sense strong opinions on this though, so I’m happy to just refocus on the argument that lazy evaluation by default is…

Strictly less flexible (since you can implement lazy evaluation on top of eager evaluation with existing language features).
Strictly less intuitive given that f-strings exist (and are eagerly evaluated).
Very limiting for static analysis.
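As a rough illustration of the first point, here is how a caller could layer deferral on top of an eagerly evaluated callable using only a lambda. `eager_join` is a hypothetical stand-in for an eager tag, called with ordinary call syntax since tag strings do not exist yet.

```python
from typing import Any, List

evaluations: List[str] = []  # records each time the expensive expression runs


def expensive() -> int:
    evaluations.append("ran")
    return 42


def eager_join(*parts: Any) -> str:
    # Stand-in for an eager tag: every part is already a plain value.
    return "".join(str(p) for p in parts)


# Eager: expensive() runs right here, as it would inside an f-string.
now = eager_join("value: ", expensive())

# Deferred: the caller opts in with one lambda; nothing runs until it is called.
later = lambda: eager_join("value: ", expensive())
runs_before_call = len(evaluations)  # 1
text = later()
runs_after_call = len(evaluations)   # 2
```

The deferral is visible at the call site, which is the property being argued for here.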

decorator-factory · August 11, 2024, 10:49pm

Sorry if this was discussed elsewhere. But will expressions like await foo() and (yield 5) be allowed inside of {}? They are allowed in f-strings, but would not be possible in the currently proposed interface.

Maybe it’s a good idea to disallow them. It seems pretty messy to include I/O inside of a string literal, and the use case for yield here is also unclear. You can always do bar = await foo() and then my_tag"Foo is: {bar}".

MegaIng · August 11, 2024, 11:02pm

No, it is exactly as flexible, since either form can easily implement the other. It’s just that your suggestion of using an extra lambda: requires more noise at the call site (the more often used side), while the current proposal requires a teeny-tiny amount of extra work on the callee side.

oscarbenjamin · August 11, 2024, 11:20pm

I don’t like the idea that this 3rd party code controls whether elements of my tag string use deferred evaluation or not. Punting the responsibility to the author of the tag function does not mean that we should not discuss how this would likely work here. If the suggestion is that the author of the tag function will likely want to do things like if callable(obj) then to me that means that the design is bad.

charliermarsh · August 11, 2024, 11:45pm

Apologies, you’re correct that you could implement a tag that performs eager evaluation! But I think it’s strictly less flexible from a user perspective with existing language features. As a user, if eager is the default, you know that your expressions are evaluated eagerly, and there’s an existing language feature for conveying deferred evaluation. But if lazy is the default, you have no idea if your expressions will be evaluated eagerly or lazily, and there’s no existing language feature to convey eager evaluation (apart from evaluating the expression upfront).

I honestly think the burden of proof should be on lazy evaluation to prove that it is worthwhile, since it’s an explicit departure from how f-strings work today (which is the clearest mental analog for users) and from how this feature works in other languages. I think I’ll just wait to see some examples to motivate it before commenting further.

Isn’t this true in the spec today, though? The tag implementation could eagerly evaluate your expression, or it could not.

oscarbenjamin · August 12, 2024, 12:08am

When you say “the spec today” I presume that you are referring to the unaccepted PEP. I have not said that I agree with the lazy evaluation part of this. Quoting my first comment from above:

I don’t think that it is good as a language feature if 3rd party code decides whether evaluation is lazy or not. That distinction needs to be clear when looking at some code. Any proposals here that try to punt on this by saying that “the tag authors can document this” need to acknowledge the substantial problems that this will create for the basic understandability of simple code.

yoavdw · August 12, 2024, 1:29am

With all this talk about f-strings also being eagerly evaluated, combined with the explicit intention of this PEP to adhere closely to PEP 701 and how f-strings work, I think the best course of action here is:

Accept a version of this PEP with no support for lazy evaluation.
In the future, discuss a separate PEP about lazy evaluation of f-strings AND tag strings, in a consistent way.

Summertime · August 12, 2024, 9:12am

An idea for best of both worlds.

An @eager_tag decorator that goes through the arguments and immediately calls each Interpolation.getvalue() function, rewrapping the results to keep the API the same.
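A sketch of how such a decorator might work, assuming a simplified `Interpolation` stand-in with only `getvalue` and `expr` fields (the real type in the draft has more). `eager_tag` and `collect` are hypothetical names, and the tag is invoked with ordinary call syntax.

```python
import functools
from typing import Any, Callable, NamedTuple


class Interpolation(NamedTuple):
    # Simplified stand-in for the draft PEP's Interpolation.
    getvalue: Callable[[], Any]
    expr: str


def eager_tag(func: Callable[..., Any]) -> Callable[..., Any]:
    """Evaluate every interpolation before the wrapped tag sees it."""
    @functools.wraps(func)
    def wrapper(*args: Any) -> Any:
        resolved = []
        for arg in args:
            if isinstance(arg, Interpolation):
                value = arg.getvalue()  # evaluated immediately
                # Rewrap so the tag still receives the same API.
                arg = arg._replace(getvalue=lambda value=value: value)
            resolved.append(arg)
        return func(*resolved)
    return wrapper


@eager_tag
def collect(*args: Any) -> list:
    return [a for a in args if isinstance(a, Interpolation)]


x = 1
(interp,) = collect("x is ", Interpolation(lambda: x, "x"))
x = 2
frozen = interp.getvalue()  # 1: the value was captured when the tag ran
```

Without the decorator, `interp.getvalue()` would see the rebinding and return 2.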

A type checker can then, when checking for the type of the tag, notice that it has been decorated with a decorator with the type of @eager_tag.

In turn, the checker can treat the binding as if it was an f-string if it is labeled eager, or as if it was a lambda that may be called at any time in the future if not labeled eager

Of course, something in typing should be made so people can make their own type-checker-understandable eager decorators or eager tags.

And finally this would also confer the information to those who use IDEs (Those without IDEs are less lucky, but as one of them, I’m usually going through documentation a mile a minute anyways, in that case I personally don’t think it would be much more of an additional mental cost)

On the other hand, wouldn’t value = lambda: tag'a{b}c' work for most people’s lazy needs?

At most, value = strcall(lambda: tag'a{b}c'), messy, but only needs one lambda, not a lambda per argument

DanCardin · August 12, 2024, 1:20pm

Wouldn’t it not, though? It would require f or rf or whatever yet-undefined builtin Python variant prefix to be in scope in order to use PEP 750. Perhaps this is just how it is implemented today, but it seems at least possible in my mind to special-case the built-in combinations as secretly-only-in-scope-for-tag-strings identifiers that are no longer special-cased at the parser level?

I very much like the idea behind this. The stuff like print"foo" and the various other examples seem concerning, and it’s not clear to me that one really loses anything by “manual registration” as exemplified here. There’s slightly more ceremony, but it really doesn’t feel like these ought to be defined ad hoc anyway.

With that said, I can’t think of any other syntactical features that require the use of some builtin sentinel type in order to function. Defining some __tag_string__ dunder method seems more consistent with Python’s object model, but it would also lose the ability to customize behavior by defining other dunder attributes, like you suggest in point 3.

pawamoy · August 12, 2024, 1:28pm

I found the Motivation section of the PEP confusing.

Templating in Python is currently achieved using packages like Jinja2 which bring their own templating languages for generating dynamic content. […]

Likewise, the inability to intercept interpolated values […]

Tag strings address both these problems […]

It mentions Jinja, stating a few issues with it and similar solutions, and then says tag strings will address these issues. But I don’t see how one can defer evaluation of a string with interpolated values with the solution suggested in the PEP.

The PEP does mention lazy evaluation, and I hope it would show a quick example such as (IIUC):

class LazyTagString:
    def __init__(self, *args: Decoded | Interpolation) -> None:
        self.args = args

    def __add__(self, other):
        return LazyTagString(*self.args, *other.args)

    def __str__(self) -> str:
        return "".join(str(arg) if isinstance(arg, Decoded) else str(arg[0]()) for arg in self.args)

def lazy(*args: Decoded | Interpolation) -> LazyTagString:
    return LazyTagString(*args)

name1 = "you"
name2 = "all"
concatenated = lazy"hello {name1}, " + lazy"and hello {name2}!"
print(concatenated)

But that doesn’t address at all what Jinja and the likes let you do, i.e. declare a template as a regular string, and evaluate it (render it) later. Example:

user_config = get_user_config()
print(user_config["greet_format"])  # prints 'hello {name}!'

# with Jinja (assuming {{name}} instead of {name})
env.from_string(user_config["greet_format"]).render(name="you")

# with tag strings
???

Will the standard library or the language itself provide a utility to parse a regular string into a sequence of Decoded and Interpolation instances? Would such a utility then use the scope from which it is called to create the getvalue lambdas of the interpolated values?
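For comparison, the standard library can already split a regular format string into literal and field parts with `string.Formatter.parse()`. The sketch below builds on that; `Field`, `parse_template` and `render` are hypothetical names, fields are looked up in an explicit mapping rather than in the caller's scope, and only simple names (not arbitrary expressions) are handled.

```python
from string import Formatter
from typing import Any, List, Mapping, NamedTuple, Optional, Union


class Field(NamedTuple):
    # Hypothetical record for one {name!conv:spec} replacement field.
    name: str
    conversion: Optional[str]
    format_spec: str


def parse_template(template: str) -> List[Union[str, Field]]:
    """Split a regular string into literal text and replacement fields."""
    parts: List[Union[str, Field]] = []
    for literal, name, spec, conv in Formatter().parse(template):
        if literal:
            parts.append(literal)
        if name is not None:
            parts.append(Field(name, conv, spec or ""))
    return parts


def render(parts: List[Union[str, Field]], context: Mapping[str, Any]) -> str:
    """Render parsed parts against an explicit context, Jinja-style."""
    formatter = Formatter()
    out = []
    for part in parts:
        if isinstance(part, Field):
            value = formatter.convert_field(context[part.name], part.conversion)
            out.append(format(value, part.format_spec))
        else:
            out.append(part)
    return "".join(out)


parts = parse_template("hello {name}!")
greeting = render(parts, {"name": "you"})  # "hello you!"
```

This covers the "template stored as data, rendered later" case, but it does not evaluate expressions or capture any scope, which is the part the questions above are about.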

I know that the standard library already provides string.Template, but then I’m not sure I understand why PEP 750 mentions Jinja2, since it can’t really be compared to tag strings.

Tag strings wouldn’t allow you to declare your “templates” in an isolated module, since they need the context (in Jinja terms) to be available in the same scope.

# mypkg/templates.py
def mytag(*args):
    # returns a lazy object that can be str()'d
    ...

home_page = mytag"""...{name}..."""

# elsewhere
from mypkg.templates import home_page

@app.get("/")
def home():
    name = get_username()
    return str(home_page)  # NameError IIUC

(Other than this, tag strings sound like a good and powerful idea! Great work on the PEP)

steve.dower · August 12, 2024, 2:42pm

There are a few topics going on, so I’ll respond to the topic rather than individuals.

Eager evaluation of interpolated values (i.e. the x+y in tag"{x+y}") is so much simpler to implement that we’d want to do that if at all possible. Deferring evaluation actually means evaluating in a different context, not just at a different time. If you think it’s easy, please go and help us get deferred evaluation of annotations sorted out, because it turns out that it’s so difficult we’ve delayed that for many releases at this stage. Eager evaluation would mean that the value of the expression is calculated in the context of the function, as if the author had written it as an assignment on the previous line.^[1]

Arbitrary expressions as tags sounds pretty difficult. I’ll gladly defer entirely to @pablogsal on this one, and if he’s not enthusiastic about it, then I’m not either.

Nominal subtyping has no precedent in Python for this kind of thing. We’d do it with a protocol, which means a __something__ dunder method that is called when the object is used as a tag. If the method doesn’t exist, you’ll get a TypeError at runtime.^[2]
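A rough emulation of that protocol-style dispatch in current Python. The dunder name `__tag_string__` is borrowed from earlier in this thread and is purely hypothetical, as is the `apply_tag` helper that stands in for what the interpreter would do.

```python
from typing import Any


def apply_tag(obj: Any, *args: Any) -> Any:
    # Look the method up on the type, as Python does for other special methods.
    method = getattr(type(obj), "__tag_string__", None)
    if method is None:
        raise TypeError(
            f"{type(obj).__name__!r} object cannot be used as a string tag"
        )
    return method(obj, *args)


class Upper:
    def __tag_string__(self, *args: Any) -> str:
        return "".join(str(a) for a in args).upper()


shouted = apply_tag(Upper(), "a", 1)  # "A1"

try:
    apply_tag(print, "foo")  # print has no __tag_string__
except TypeError as exc:
    error_message = str(exc)
```

Any object can opt in by defining the method, with no base class or registry involved.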

Backticks are traditionally banned from use in future language features, due to the small symbol. No reader should need to distinguish ` from ' at a glance. It’s entirely possible that the prevailing opinion on this has changed, but it’s certainly going to be easier to stick to the letter prefixes and regular quotes.

Non-string results seem fine to me. I quite like Paul’s example of a decimal literal, and I can see great convenience in constructing XML literals for ElementTree, AST literals, or similar DSLs which would be wasted if we were to require string results. I expect we’re quite likely to see these used for strings with no substitutions as well, such as the py" example mentioned, and so it may be useful to address how a tagged string with no interpolations is meant to be handled, and whether (if it remains a normal callable) tag objects should be expected to be applied to string variables as well (e.g. do/should both xml"<a></a>" and xml("<a></a>") behave the same?)

[1] Using tag"{lambda: x+y}" with a tag that knows how to handle it is just fine. The semantics are complicated, and so the caller needs to know about it.
[2] Type checkers can do whatever they want to warn you ahead of time, but that’s not how we implement the actual functionality.

pf_moore · August 12, 2024, 3:03pm

I thought the point here is that there’s already an implementation of the PEP, which (I assume) implements the lazy evaluation capabilities that are being discussed here. So I don’t think arguments that implementing lazy evaluation is hard make sense (except in terms of “the code to do this is hard so there’s a maintenance cost”).

It may well be that the semantics that are currently implemented have problems - that’s something that could be demonstrated using the current implementation.

Note: I still don’t have a strong opinion on whether lazy evaluation should be the default. But I will note that if the PEP does want to provide capabilities similar to Jinja, then deferred evaluation is needed (to allow a tag that produces an object that can be rendered at some future time, with a supplied context).

charliermarsh · August 12, 2024, 3:20pm

Do you mind spelling this out in just a little more detail?

steve.dower · August 12, 2024, 3:32pm

The current implementation seems to capture the source variables rather than the result of the expression, which of course leads to this behaviour:

def ident(o):
    return o

items = [ident"{x}" for x in range(10)]
strs = [o.getvalue() for o in items]
strs
[9, 9, 9, 9, 9, 9, 9, 9, 9, 9]

I didn’t go too far into edge cases around comprehensions, async, and the like, but I would assume they exist based on this implementation.

Compare to this behaviour, which is what I mean by eagerly evaluating each expression:

class ident:
    def __init__(self, x):
        self.x = x
    def getvalue(self):
        return self.x

items = [ident(x) for x in range(10)]
strs = [o.getvalue() for o in items]
strs
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
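The same difference can be reproduced in current Python with plain lambdas, assuming the implementation's getvalue behaves like a closure over the loop variable:

```python
# Late binding: each lambda closes over the variable x, not its value,
# so every call sees the final value once the comprehension has finished.
late = [lambda: x for x in range(10)]
late_values = [f() for f in late]    # [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]

# Early binding: the default argument captures the value at each iteration,
# which matches what eager evaluation of the expression would produce.
early = [lambda x=x: x for x in range(10)]
early_values = [f() for f in early]  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```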

(Edit again). And the most compelling example I have for doing things eagerly:

def ident(o):
    return o

with lock:
    s = ident"{calculate_under_lock()}"

s.getvalue()

How do you communicate to users that their expressions aren’t calculated when/where they think they will be? The complexity goes way up for each use, for not a lot of real benefit. Evaluating the expressions in the context they are written saves all the trouble.

pf_moore · August 12, 2024, 3:36pm

I’m mostly just looking at the Jinja example @pawamoy posted above, but basically, behaviour corresponding to Jinja templates would mean having something like template"Hello, {name}" evaluate to an object with a .render(name=xxx) method. So you’d be able to do something like

message = template"Hello, {name}"
name = input("What is your name? ")
print(message.render(name=name))

Maybe I’m missing something, and there’s no way of evaluating an Interpolation object with a custom locals() dictionary. If so, then this example isn’t possible. But that also calls into question how the PEP intends to address the sort of problems Jinja2 does, as stated in the “Motivation” section (which is the point @pawamoy was making).
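One conceivable way to get that behaviour, sketched under heavy assumptions: if the interpolation exposes the expression's source text (the draft appears to have an `expr` field for this), a tag could ignore getvalue and re-evaluate that text against a supplied context. The `Interpolation` stand-in, the `Template` class and the use of `eval` are all illustrative, and `eval` on untrusted text is unsafe.

```python
from typing import Any, Callable, NamedTuple


class Interpolation(NamedTuple):
    # Simplified stand-in for the draft PEP's Interpolation.
    getvalue: Callable[[], Any]
    expr: str


class Template:
    def __init__(self, *args: Any) -> None:
        self.args = args

    def render(self, **context: Any) -> str:
        out = []
        for arg in self.args:
            if isinstance(arg, Interpolation):
                # Ignore getvalue; evaluate the source text with the
                # caller-supplied names as the local namespace.
                out.append(str(eval(arg.expr, {"__builtins__": {}}, context)))
            else:
                out.append(str(arg))
        return "".join(out)


def _not_defined_yet() -> Any:
    raise NameError("name 'name' is not defined")


# Emulates template"Hello, {name}" written before `name` exists.
message = Template("Hello, ", Interpolation(_not_defined_yet, "name"))
rendered = message.render(name="Ada")  # "Hello, Ada"
```

Whether relying on the expression text like this is intended by the PEP is exactly the open question here.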