PEP 750: Tag Strings For Writing Domain-Specific Languages

So obviously the lambda-wrapped expression of Interpolation.getvalue can be captured and used later, with possible confusion arising from how lexical scope closes over such names. Such power should be used carefully. Tag functions are more like decorators. Decorators can radically modify the behavior of the decorated function. They are still extremely useful, however, even though they require care in their design and implementation.

In this particular case, I don’t think it’s idiomatic, even if the exposure is very limited, because you wouldn’t expose getvalue in code that uses the tag string.

The more usual way to write it would be for the tag function either to call getvalue immediately or not at all; and failing that, for the user of the tag string to do something with it (or not). This avoids the whole class of problems.
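A minimal sketch of the “call getvalue immediately” style, simulating the PEP’s tag-function protocol with plain Python objects (the Interpolation stand-in and the render function here are my own assumptions for illustration, since tag-string syntax isn’t available in current Python):

```python
# Hypothetical stand-in for the PEP's Interpolation object.
class Interpolation:
    def __init__(self, getvalue, expr):
        self.getvalue = getvalue  # zero-arg callable producing the value
        self.expr = expr          # source text of the expression

def render(*args):
    # An "eager" tag function: it calls getvalue immediately and never
    # stores the callable, so no closure escapes the tag function.
    return "".join(
        str(arg.getvalue()) if isinstance(arg, Interpolation) else arg
        for arg in args
    )

name = "world"
result = render("hello ", Interpolation(lambda: name, "name"))
# result == "hello world"
```

Because getvalue is consumed inside the tag function, there is no object for caller code to capture and evaluate later.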

We want to support deferred evaluation because it allows for writing a tag function like struct_log, which is like a lazy f-string, except that it evaluates its interpolations only if rendered, either with __str__ (so it works with the standard logging.LogRecord.getMessage) or with a custom formatter to directly support JSON struct logging. See https://github.com/python/peps/pull/3858/#issuecomment-2252818275 which explores this idea. Now, instead of developers using f-strings in logging because it’s convenient, they can use an equally convenient variant - but one that does the right thing.
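A rough sketch of the struct_log idea, under the assumption that the tag function receives deferred zero-arg callables (the LazyMessage class and its parts layout are invented here for illustration, not the PEP’s actual API):

```python
import logging

class LazyMessage:
    """Evaluates its interpolations only if the message is rendered."""
    def __init__(self, parts):
        # parts: static strings and zero-arg callables (deferred values)
        self.parts = parts

    def __str__(self):
        # logging.LogRecord.getMessage() calls str() on the message,
        # so interpolations are evaluated only when a handler emits it.
        return "".join(p() if callable(p) else p for p in self.parts)

user = "alice"
msg = LazyMessage(["login by ", lambda: user])
logging.getLogger("demo").debug(msg)  # below default level: never rendered
print(str(msg))                       # rendered on demand
```

A JSON struct-logging formatter could walk the same parts list instead of calling __str__, which is the “custom formatter” half of the idea.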

Fortunately we were able to make use of the substantial work done with respect to annotations, specifically annotation scope, to get this to work correctly.

The actual implementation by Dave Peck is straightforward.

We particularly appreciate Jelle Zijlstra’s help in identifying this problem as part of the initial PEP editing process; PEP 750: Tag Strings For Writing Domain-Specific Languages by jimbaker · Pull Request #3858 · python/peps · GitHub

I too am not convinced it’s significantly better than a function. What am I missing, apart from not having to type parentheses? Functions are clear and well understood. (I am coming at this from a teacher’s perspective.)

13 Likes

While it isn’t shown in any of the PEP 501 examples (they’re too simple), I think context aware parsing would end up occurring via classes that accepted template literal references and then called render with the rendering callbacks pointed at bound methods on the class instance. You could do the same thing with closures, but I’d expect the code to be clearer and easier to reuse with instance methods (just as it is in PEP 750).

I spent some time last night trying to work out what would need to change in the PEP 750 HTML example in a PEP 501 world, and the PEPs honestly feel pretty isomorphic to me (once the render_text callback is added to 501), since the only fundamental difference is whether the signature of rendering functions is a structured TemplateLiteral object or a flat list of fields containing the same information.

While there are currently other differences in exactly how the two PEPs represent the template info, the Decoded and Interpolated protocols could be used inside a TemplateLiteral rather than being passed as a flat list, and Interpolated objects could definitely be evaluated eagerly rather than lazily, so I see all of those as cases where whichever design decision we make can apply to either surface syntax.

Ironically, my own thinking at the surface syntax level mirrors the primary feedback I got back when I first proposed interpolation templates: given that tagged strings are a generalisation of template literals, much as template strings are a generalisation of formatted strings, perhaps we should pursue the narrower proposal first and see if we genuinely feel the absence of the more general proposal?

The other situation that feels somewhat analogous is when we decided that adding matrix multiplication was a better idea than adding support for custom infix operators. I’ve long been convinced that template literal support would be valuable. I’m far from convinced that we need a novel call syntax solely for rendering template strings to other objects (saving three characters per call site: the parentheses on the function call and the t prefix on the template literal).

4 Likes

I’m finally back from vacation and have had a chance to read [1] the PEP and this thread. I appreciate the deep thought and effort that went into the PEP and reference implementation, and I’ve been talking with Jim about this idea on and off for a few years, but I have to say with all due respect, that I’m unconvinced this is a good idea.

For background, I pioneered internationalization in one of the first Python applications to support it (GNU Mailman). For that, I wrote string.Template (and PEP 292) and sys._getframe(), maintained pygettext before GNU gettext supported Python, and am the author of the flufl.i18n library for sophisticated i18n support in Python programs.

Early on there were conversations about using i as a “tag” [2] for translatable strings. This preceded f-strings by several years, and would have been difficult to implement at the time, so for that reason and others, I adopted the C/gettext convention of using the _() function. Note though that for i18n’d Python programs using flufl.i18n this is just a convention and not a requirement. Likewise, string.Template’s convention (and thus default) of using $placeholder syntax was a deliberate choice for simplicity, as described in PEP 292, and is also customizable by subclassing.

So putting all that together, what I find lacking in PEP 750 is a strong argument explaining why function calls are not sufficient for the use cases this PEP intends to enable.

I have to also admit to cringing at the syntax, and worry deeply about the cognitive load this will add to readability [3]. I think it will also be difficult to reason about such code. I don’t think the PEP addresses this sufficiently in the “How to Teach This” section. And I think history bears out my suspicion that you won’t be able to ignore this new syntax until you need it, because if it’s possible to do, a lot of code will find clever new ways to use it and the syntax will creep into a lot of code we all end up having to read.

From my experience with i18n, I also don’t think you will actually be able to use PEP 750 tag strings for i18n, for the simple reason that string fragments cannot be translated. At least, not for every human language you would want to support. The only way to do that is to translate full sentences, because placeholder positions can change in the translated string, and because there are some languages where sentence fragments simply do not make sense. I could be misunderstanding how tag strings work, but suppose you wanted to use this for i18n and actually wrote an i tag for it. If you had a string like i"The $ordinal test message $name" [4], and your translation layer and your human translators had access only to fragments rather than the full source string, the source string would effectively be untranslatable.

I’ll also mention that flufl.i18n supports deferred translations which I think – but am not positive – are equivalent to the PEP’s lazy evaluation proposal.

This post is already probably too long, so I’ll leave it there for now.


  1. er, um, detailed skim ↩︎

  2. in the parlance of this PEP ↩︎

  3. I say this as someone who also cringes a bit at some of the typing syntax such as PEP 695 subscripts ↩︎

  4. from an example in the flufl.i18n user guide ↩︎

30 Likes

On the topic of control flow: f-strings and tagged strings don’t need explicit additional support:

items = ['a','b','c']
print(f'''<!doctype html><title>a</title>{
    ''.join(f'<div><p>{item}</p></div>' for item in items)
    if items else
    f'<p>No items!</p>'
}<footer></footer>''')

And just like inline-if in Python, the path not taken does not get evaluated. Ergonomics aren’t great though.

Though I feel that if one is reaching for try-to-not-use-this-syntax syntax (lambda/inline-if) to do relatively trivial stuff, then maybe those approaches are not a good fit for Python.

Interpolation.expr mischief

If Interpolation.expr and deferred evaluation are added, I’d be tempted to try to make the following work

html'''<!doctype html><title>a</title>
    {item for item in items}
        <div><p>{item}</p></div>
    {endfor}
<footer></footer>'''

Which I feel would be horrible to implement, but also painful not to have as part of every tag function.

Just to be clear on my assessment: it can likely be done but I think it won’t be pretty and it has maintenance considerations.

For context, the last time we hacked the lexer to have some grammatically semantic behaviour (the field was literally called “async_hacks”) it proved how tricky that can be to maintain and how many hidden bugs can result. That hack has been making our lives more difficult from time to time when implementing lexer improvements, and we only recently removed it, so I would prefer not to make the same mistake if possible.

2 Likes

Would “strings that you somehow parse later” count as somewhat deferred?

Using an existing templating engine

Passing strings as deferred to be parsed but also using eager evaluation and custom processing in the tag function.

from templite import Templite

def templite(*args):
    output = []
    last_interpolation = None
    for arg in args:
        if hasattr(arg, "getvalue"):
            if arg.expr == "special_sauce" and arg.conv == "replace" and last_interpolation:
                value = f"<div><p>{last_interpolation.getvalue()}? That's good stuff!</p></div>"
            elif arg.conv == "deferred":
                value = f"{{{{{arg.getvalue()}}}}}"
            else:
                value = arg.getvalue()
            last_interpolation = arg
        else:
            value = arg
        output.append(value)
    return Templite("".join(output))

special_sauce = "<div><p>YUMMY</p></div>"
t = templite'''<!doctype html><title>a</title>
    <h1>{"deferred_name":deferred}</h1>
    {"{% for item in items %}"}
        {special_sauce:keep}
        <div><p>{"{{item|upper}}"}</p></div>
        {special_sauce:replace}
    {"{% endfor%}"}
<footer></footer>'''

deferred_name = "TITLE"
print(t.render({
            "upper": str.upper,
            "items": ["Python", "Geometry", "Juggling"],
            **globals(),
        }))

Which outputs:

<!doctype html><title>a</title>
    <h1>TITLE</h1>
    
        <div><p>YUMMY</p></div>
        <div><p>PYTHON</p></div>
        <div><p>PYTHON? That's good stuff!</p></div>
    
        <div><p>YUMMY</p></div>
        <div><p>GEOMETRY</p></div>
        <div><p>GEOMETRY? That's good stuff!</p></div>
    
        <div><p>YUMMY</p></div>
        <div><p>JUGGLING</p></div>
        <div><p>JUGGLING? That's good stuff!</p></div>
    
<footer></footer>

Thanks for the thoughtful reply. We moved more explanation to companion material, particularly for HTML. For your “strong argument explaining why function calls” part, would it help if I explained what I’m interested in doing with this?

I expected this, but unless it’s made a requirement (i.e. “getvalue only works during execution of the tag function”), we have to account for developers capturing the object and calling getvalue later on.

Obviously my examples are not realistic. But most of the rest of the discussion here has been around producing the final result lazily, which has very much been intended to call getvalue long after the tag function has run.

The equivalence to decorators here is in the behaviour of additional parameters beyond the function itself.[1] Those behave exactly like normal parameters, which makes clear the scope that they’ll be evaluated in and the point of execution they’ll be evaluated at. I would prefer tag function interpolated expressions behave like this. The current behaviour is not equivalent to decorators, and that’s the problem.


  1. That is, the a in @decorate(a) // def f(...), rather than the f. ↩︎

4 Likes

I think we need to remember the “consenting adults” principle here. The question shouldn’t be “is it possible to do horrible things with this?” but rather “does this enable useful behaviour that is otherwise tricky or impossible?”

If it doesn’t enable anything new, we shouldn’t do it. Only if it does, do we need to consider whether the downsides outweigh the benefits - and while “people can write unmaintainable code or confusing APIs with this” is a downside, it’s generally not a major one unless there are non-obvious issues with otherwise reasonable looking code.

The latter question doesn’t help its case, because it’s very easy to get the equivalent behaviour: put lambda: at the start of the expression.

t"{x + y}" # evaluate "x + y" and pass the result into the tag function
t"{(lambda: x + y)}" # pass a callable to evaluate "x + y" later into the tag function

Of course, there are differences in the proposed semantics, in that lambda has consistent, well-understood semantics surrounding its closure while the current proposal does not (but could).

Reframing the question the other way, how would you make [t"{x}" for x in range(10)] capture each value of x rather than just the final one, and why is it obvious that you need to do it? I’d argue that in this case, eagerly evaluating the expression enables useful behaviour that is otherwise tricky, and is not obviously opt-in (because this looks like an f-string, and f-strings evaluate eagerly).
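The comprehension question maps directly onto lambda’s well-known late-binding behaviour. A small illustration in plain Python (this is not PEP machinery, just the closure semantics a deferred t-string would inherit):

```python
# Deferred: every closure sees the comprehension variable's final value.
deferred = [lambda: x for x in range(10)]
assert [f() for f in deferred] == [9] * 10

# Eager capture via a default argument: each callable keeps its own value.
captured = [lambda v=x: v for x in range(10)]
assert [f() for f in captured] == list(range(10))
```

The default-argument trick is the standard opt-in workaround, which is exactly the kind of subtlety that makes eager evaluation the less surprising default.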

If the intent is “people should call getvalue straight away,” then what is lost by having already evaluated the expression? They can still ignore the result and just use the expression text if that’s what they wanted - it’s not substituted eagerly.

And we end up with semantics closer to a function call than an inner function definition, which are much simpler to reason about and debug.

7 Likes

With the implementation of greet given in the PEP, this example would raise a NameError("name 'name' is not defined") on line 1[1]. However, with an implementation like (but better than 🙂) the following, the assert would hold[2]:

class Greeting:
    def __init__(self, getvalue):
        self.getvalue = getvalue

    @property
    def _value(self):
        if not hasattr(self, '_real_value'):
            self._real_value = self.getvalue()
        return self._real_value
        
    def __str__(self):
        return str(self._value)

    def __eq__(self, other):
        if isinstance(other, str):
            return str(self) == other
        return NotImplemented


def greet(*args):
    """Tag function to return a greeting with an upper-case recipient."""
    salutation, recipient, *_ = args
    getvalue, *_ = recipient
    return Greeting(lambda: f"{salutation.title().strip()} {getvalue().upper()}!")

Lazy evaluation is an option available to tag functions that return something other than a str, not a requirement of the syntax.


  1. Tested in one of the playgrounds linked above ↩︎

  2. Also tested in the same playground with different variable names 🙂 ↩︎

This is a recurring suggestion. Am I missing something, and this is actually a common API? Because AFAIK, no public framework or library uses this pattern for anything.

And if it isn’t being used, but you think it’s an acceptable alternative to this kind of deferred evaluation, can you tell my why it’s not being used anywhere?

2 Likes

I guess it turns out that deferred evaluation isn’t really necessary much of the time 😉 Every time I’ve needed it, I’ve used a plain callable, and a lambda is how you define a plain callable in-line.

Can you show examples of the other patterns that are currently being used? I can’t actually think of another way to do it in Python that isn’t significantly more complicated (such as requiring a particular class instance or an iterable rather than a callable object).

And those ways will work equally well here as eagerly evaluated expressions - they must, since the rest of Python already operates on eagerly evaluated expressions.

2 Likes

No, which is why I am in favor of adding new, nice syntax using tag strings. IMO it’s OK to say “deferred evaluation is unnecessary”, but please stop suggesting that lambda: is an acceptable convention, because it isn’t; otherwise it would already be used. As to why it isn’t an acceptable convention, I am sure we could have a long discussion about it, but I don’t think it’s really on-topic here.

The key difference is that one can write in the target DSL, but with interpolations.

For HTML, there’s generally a straightforward mapping of a tag - name, attributes, children - to a functional representation. Libraries often have very well designed internal DSLs - using Python directly to express their ideas. So Pandas or Z3Py work really well, although there are some hard edges, such as the use of & and | in Pandas expressions - and the fact that their operator precedence is different from that of and and or.

But consider other languages like SQL, regexes, Wilkinson notation (seen notably in R formulas), etc. Obviously any language can be written as a composition of functions, or perhaps using a builder approach. But how should this be written? With tag strings, you can directly use the target DSL, generally with existing parsers by substituting in suitable placeholders for the interpolations, parse to AST, then walk/compile it. And that chain (or at least much of it) can be fast, in part because of memoizing, much as we see with built-in DSLs like Python’s re.
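As an illustration of the “substitute placeholders, then reuse an existing parser” approach, here is a sketch of mine using Python’s own ast module to stand in for a DSL parser (the parts layout and placeholder naming scheme are assumptions for the example):

```python
import ast

def parse_with_placeholders(parts):
    """Replace each interpolation with a placeholder identifier, then
    hand the assembled text to an existing parser."""
    text, deferred = [], []
    for part in parts:
        if isinstance(part, str):
            text.append(part)
        else:
            deferred.append(part)                   # zero-arg callable
            text.append(f"_ph{len(deferred) - 1}")  # placeholder name
    tree = ast.parse("".join(text), mode="eval")    # the "existing parser"
    return tree, deferred

tree, deferred = parse_with_placeholders(["1 + ", lambda: 41])
# tree is a normal Python AST in which _ph0 marks the interpolation site
```

A real tag function would then walk or compile the AST, splicing the deferred values in at the placeholder sites; memoizing on the static parts is what makes the chain fast, as the post notes.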

Very much so!

Is my assessment of the PEP correct in this regard? If so, I think translators will be frustrated by PEP 750… unless the proposal can be augmented to also give access to the full, original source string.
Perhaps a third type of object in the *args to the tag function, or a Decoded at the start of *args that contains the full source string. A unique type might be better for structural pattern matching purposes. For the original source string, I wouldn’t do the raw.encode("utf-8").decode("unicode-escape") dance, I’d just return exactly what you got and let the tag function do the encode/decode if needed (e.g. an i18n implementation would likely just hand over the source string to the appropriate catalog and then do whatever it wanted with the translated string).
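If the tag function does receive the static parts plus each interpolation’s expression text, an i18n implementation could rebuild the full-sentence catalog key itself. A sketch (the Interp class is a hypothetical stand-in for the PEP’s Interpolation object, and the $placeholder form follows PEP 292):

```python
class Interp:
    """Hypothetical stand-in for the PEP's Interpolation object."""
    def __init__(self, expr):
        self.expr = expr  # source text of the interpolated expression

def source_string(parts):
    # Rebuild the full template in PEP 292 $placeholder form so translators
    # always see a complete sentence, never fragments.
    return "".join(
        part if isinstance(part, str) else "$" + part.expr
        for part in parts
    )

key = source_string(["The ", Interp("ordinal"), " test message ", Interp("name")])
# key == "The $ordinal test message $name" - usable as a catalog lookup key
```

This only works when the expressions are simple names, which is another reason an i18n tag might want direct access to the original source string rather than reconstructing it.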

Do you think that, because tag functions are just “regular” functions, existing functions that never expected to be used as tag functions could end up being misused somehow? Would it make sense to be explicit about what could be used as a tag function and what cannot, e.g. through the use of a pre-defined @tag_function decorator perhaps?

1 Like

I’m going to repeat myself here, but evidence over the years has shown that “just add lambda: at the start to get lazy evaluation” is not a practical answer. People don’t, or won’t, do that. It’s not logical (the major objection is that “it looks ugly”) but aesthetics play a big part in API design, and ignoring that won’t make it go away.

I don’t know (either how you’d do it, or why it’s obvious). Artificial examples like this are great for exploring edge cases of semantics, but they don’t capture the question of “what do people want to do with this syntax”.

As usual here, I’m advocating starting from use cases, rather than technical details. I’m still looking myself for use cases where lazy evaluation is useful, but I can imagine they exist. My personal go-to example for this is SQL, where I’d want sql"select field + {offset} from tbl" to translate to a prepared statement that encapsulated "select field + ? from tbl" along with something that pulled the bind variable from the offset variable in the surrounding code. But I haven’t worked it through far enough yet to decide whether I’d rather have offset lazily evaluated. I think so (so that the SQL statement is reusable) but I’d need to think about how I’d use the result - and I haven’t had time to do that yet.
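For concreteness, here is what that translation might amount to with the stdlib sqlite3 module (the table and data are made up, and the deferred get_offset callable is one guess at how the bind variable could be pulled from the surrounding code; this shows only the target prepared-statement form, not the tag function itself):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table tbl (field integer)")
con.execute("insert into tbl values (10)")

# What sql"select field + {offset} from tbl" might compile down to:
query = "select field + ? from tbl"  # placeholder instead of interpolation
get_offset = lambda: offset          # deferred lookup of the bind variable

offset = 5
rows = con.execute(query, (get_offset(),)).fetchall()
# rows == [(15,)]
```

The reusability question then becomes whether get_offset re-reads the variable on every execution (lazy) or whether the value was baked in when the statement was built (eager).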

1 Like

The fully worked example with the proposed semantics that wouldn’t work under my semantics is this:

stmt = sql"select field + {offset} from tbl"
for offset in list_of_offsets:
    db.exec(stmt)

Anything involving passing offset explicitly is still possible. Anything involving picking up offset from a different scope is still not possible. The only thing the semantics here enable are to instantiate the tagged DSL string and then implicitly substitute a parameter from the original scope the tagged string was defined in at the time it is used rather than the time it was defined. This is an entirely new kind of variable passing for Python. (An additional feature is the ability to defer an entire function call until it’s used, which I agree is nice, but also somewhat unprecedented and so deserves better than just “it’s nice”.)

We went quite deep into this a few years ago and the ugliness concerns were secondary to the recipient having to explicitly realise the value (i.e. call it). The only acceptable design was to automatically fetch the lazy value when needed, but it turns out we can’t really do that (see also the lazy imports discussions), and so it went no further.

3 Likes