PEP 750: Tag Strings For Writing Domain-Specific Languages

Please don’t significantly edit messages after people have responded to them. I am not going to engage further; I don’t think there is any value in that.


Sorry; I wanted to edit for clarity.

I apologize if editing previous posts is unwelcome here. I would generally only do it when clarification seems necessary but I’ll refrain in the future.

I think your point still stands? (To be clear: I originally mentioned the explicit lack of support for laziness in the PEP in my third paragraph, not my second. The original second paragraph didn’t seem to add anything, so I removed it. I also modified the first sentence to be less glib; it’s not a workaround depending on what you’re looking for.)

I am excited about this PEP :smiley:

Yet, a few things seem worth fixing or clarifying:

  • (1) What about hashability of Template and Interpolation instances? (see the post by @Ilotoki0804) Considering the equality rules defined in the PEP, I suppose that Template hashability should depend on the hashability of its component Interpolation instances, and their hashability should depend on the hashability of all four of their attributes. Another option is to give up hashability entirely. Yet another option is to drop the current definition of equality, replacing it with equality (and hashing) based on object identity (the default behavior for user-defined classes).
  • (2) Regarding the debug specifier (=): the PEP says that t'{expr=}' is treated as t'expr={expr}', but to be consistent with f-strings it should be treated as t'expr={expr!r}'. EDIT: in fact, it is more subtle…
  • (3) What about Template.__str__()? I suppose it behaves like __repr__(), i.e. – in particular – it does not provide any “default” way to render the template in an f-string-like manner (that would pose the risk that an unwary programmer could effectively obtain f-string-like behavior where some t-string escaping is required for security, with a false sense of safety – “I used t-strings, so I am safe, am I not?”).

I’m finding terms like “proper laziness” and “true laziness” a bit
unhelpful. It’s clear that different people mean different things by
“lazy”, but it’s not clear that one meaning is any more “true” than another.

If I understand correctly, the distinction is about lexical vs. dynamic
scoping. In the PEP, the template arguments are evaluated eagerly and
lexically scoped, whereas some people want them to be not only evaluated
lazily but also dynamically scoped.

Dynamically scoped lazy evaluation is a concept that doesn’t really
exist in Python right now. It feels like something that should be doable
without requiring new syntax, but there isn’t quite enough introspection
capability for it. We have globals() and locals(), but nothing that
captures the entire lexical environment including intermediate scopes.

Suppose we had a function, let’s call it environment(n), that returns a
mapping you can look up to find the value of a name as though it were
written into the code n frames up from the point where environment() was
called. Then the functionality of dynamically-scoped templates could be
implemented using code that parses an ordinary format string and looks
up the names appropriately.
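For illustration, here’s a rough approximation of that idea (environment() here is hypothetical, built on the CPython-specific sys._getframe; as noted above, it can only see a frame’s locals and globals, not intermediate enclosing scopes):

import sys
from collections import ChainMap

def environment(n=0):
    # Merge the locals and globals of the frame n levels above the
    # point where environment() was called. This is NOT the full
    # lexical environment described above: intermediate scopes stay
    # invisible unless already captured as closure cells, which is
    # exactly the missing introspection capability.
    frame = sys._getframe(n + 1)
    return ChainMap(frame.f_locals, frame.f_globals)

def render(format_string):
    # Parse an ordinary format string and look the names up as though
    # it were written at render()'s call site.
    return format_string.format_map(environment(1))

def demo():
    name = "world"
    return render("hello {name}")

assert demo() == "hello world"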

I would be far more supportive of a PEP for such an environment function
than any kind of template string syntax. It’s a smaller unit of
functionality that has many more potential uses, and doesn’t require any
new syntax.


With a couple of tweaks, the PEP 501 template rendering algorithm works for already composed PEP 750 templates:

def format_template(t):
    rendered_segments = []
    for segment in t.args:
        match segment:
            case str() as s:
                # Literal text segment: use as-is
                rendered_segments.append(s)
            case Interpolation() as field:
                # Interpolation: apply its conversion and format spec
                rendered_segments.append(field.format_value())
    return "".join(rendered_segments)

(assuming the field formatting helper I suggested above is included - you can write your own either way, it’s just annoying)
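(For reference, a hand-rolled version of that helper might look roughly like this, assuming the draft attribute names value, conv, and format_spec – you would then call format_value(field) instead of field.format_value():)

def format_value(field):
    # Apply the conversion (if any), then the format spec, mirroring
    # what str.format() does for each replacement field.
    value = field.value
    if field.conv == "a":
        value = ascii(value)
    elif field.conv == "r":
        value = repr(value)
    elif field.conv == "s":
        value = str(value)
    return format(value, field.format_spec)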

Omitting an obviously accessible implementation of this is intentional, since it’s a potential security trap for some expected template use cases (writing this renderer is a good learning activity, but using it bypasses the intended security benefits of using structured templates to separate trusted and untrusted data segments).

The harder part of your question is specifying that the interpolation values should be filled in later. This is the problem I was writing about in PEP 750: Tag Strings For Writing Domain-Specific Languages - #224 by ncoghlan

(I don’t think the exact solution to this problem needs to be in PEP 750 itself. I do think the PEP should point out that it does provide the pieces needed to solve the problem later)

Edit: some thoughts on what solving the problem later might look like:

  • a fast, non-customisable alternative to string.Formatter in templatelib that produces a template with strings as the values and quoted strings as the expression fields (as if you had written a template with a quoted string in every field)
  • something like the replace_values method in my linked message

In that case, there’s no point in using lambdas in the template string?

It seems like the lambda workaround doesn’t solve anything.


A useful approach could be to make it possible to have “unbound” replacement fields – perhaps by specifying them just as escaped text, i.e., as part of the literal string segments (using the {{ and }} escaped delimiters):

title = "Professor"
# Below `{title}` is a normally bound replacement field,
# and `{{name}}` is "unbound" (i.e., just escaped):
raw = t"select title, name from lecturers where title={title} and name={{name}}"
# At this point the rendering result would be (obviously) *wrong* (but see below...):
assert sql(raw) == "select title, name from lecturers where title='Professor' and name={{name}}"

assert len(raw.args) == 3
assert isinstance(raw.args[0], str)
assert isinstance(raw.args[1], Interpolation)
assert isinstance(raw.args[2], str)
assert raw.args[1].value == "Professor"

# And later...
raw_complete = raw.bind(name="John Doe")
# Now the result is correct:
assert sql(raw_complete) == "select title, name from lecturers where title='Professor' and name='John Doe'"

assert len(raw_complete.args) == 5
assert isinstance(raw_complete.args[0], str)
assert isinstance(raw_complete.args[1], Interpolation)
assert isinstance(raw_complete.args[2], str)
assert isinstance(raw_complete.args[3], Interpolation)
assert isinstance(raw_complete.args[4], str)
assert raw_complete.args[1].value == "Professor"
assert raw_complete.args[3].value == "John Doe"

…or perhaps by specifying them using some dedicated form of an unbound replacement field (but this would require extending the PEP, of course), e.g.:

# Below `{title}` is a normally bound replacement field,
# and `{:name}` is an unbound replacement field (!):
raw = t"select title, name from lecturers where title={title} and name={:name}"
# This will raise an error (not all fields have been bound yet!):
sql(raw)

assert len(raw.args) == 5
assert isinstance(raw.args[0], str)
assert isinstance(raw.args[1], Interpolation)
assert isinstance(raw.args[2], str)
assert isinstance(raw.args[3], UnboundInterpolation)  # new type...
assert isinstance(raw.args[4], str)
assert raw.args[1].value == "Professor"
assert raw.args[3].field_name == "name"  # ...with attributes specific to it

# And later...
raw_complete = raw.bind(name="John Doe")
assert sql(raw_complete) == "select title, name from lecturers where title='Professor' and name='John Doe'"

assert len(raw_complete.args) == 5
assert isinstance(raw_complete.args[0], str)
assert isinstance(raw_complete.args[1], Interpolation)
assert isinstance(raw_complete.args[2], str)
assert isinstance(raw_complete.args[3], Interpolation)
assert isinstance(raw_complete.args[4], str)
assert raw_complete.args[1].value == "Professor"
assert raw_complete.args[3].value == "John Doe"

EDIT: improved and more comprehensive proposals are in my later post.

On reading your post, it occurred to me that we wouldn’t need new syntax for that; it could just be a convention on the template processor side that takes advantage of two features of the existing syntax:

  1. ... is a valid Python expression
  2. :some arbitrary text:<the actual format string> is a valid format string definition

This means

with_placeholders = t"select title, name from lecturers where title={...:title} and name={...:name}"

would put Ellipsis in the interpolation field values, "..." in the expression fields, and "title" and "name" respectively in the format_spec field.

Given that convention, you could write a post-processor that moved the implicitly quoted string portion into the value field when the field value was a literal ellipsis:

def parse_placeholders(t: Template) -> Template:
    segments = []
    for segment in t.args:
        match segment:
            case str() as s:
                segments.append(s)
            # A bare `Ellipsis` in a class pattern would capture rather
            # than compare, so check for it in a guard instead:
            case Interpolation(value, "...", conv, format_spec) if value is Ellipsis:
                name, _, format_spec = format_spec.partition(":")
                expr = f"...:{name}"
                field = Interpolation(name, expr, conv, format_spec)
                segments.append(field)
            case Interpolation() as field:
                segments.append(field)
    return Template(*segments)

Couldn’t this simply be implemented using string methods, corresponding to str.format() and str.format_map()? (We can still bikeshed over the names.)

assert "Hello {name}!".template(name="World") == t"Hello {"World"}!"
assert "Hello {name}!".template_map({"name": "World"}) == t"Hello {"World"}!"

Interesting idea!

Though, IMHO, a more useful tool than the parse_placeholders() you proposed would be something along the lines of the following:

from typing import Any

from templatelib import Interpolation, Template  # draft PEP 750 API

def bind_template_fields(t: Template, **fields_to_bind: Any) -> Template:
    segments = []
    for segment in t.args:
        match segment:
            case str() as s:
                segments.append(s)
            # Guard on `is Ellipsis` (a bare `Ellipsis` in the pattern
            # would capture, not compare). The format_spec slot holds
            # the "name[!conv][:format_spec]" text:
            case Interpolation(placeholder, "...", None, expr) as unbound_field if placeholder is Ellipsis:
                field_spec, _, format_spec = expr.partition(":")
                field_name, _, conv = field_spec.partition("!")
                if field_name not in fields_to_bind:
                    segments.append(unbound_field)
                    continue
                if not conv:
                    conv = None
                elif conv not in ('a', 'r', 's'):
                    raise ValueError("invalid conversion character: "
                                     "expected 's', 'r', or 'a'")
                value = fields_to_bind[field_name]
                field = Interpolation(value, expr, conv, format_spec)
                segments.append(field)
            case Interpolation() as field:
                segments.append(field)
    return Template(*segments)

That is, it would produce a new Template object with the given field values “bound” to it – as if they were there from the beginning.

And it could be even more useful, if it was available as a Template’s method – perhaps named bind?


EDIT: improved and more comprehensive proposals are in my later post.

Another possibility would be to make the Template’s constructor behave like this:

assert t"Hello {"World"}!" == Template("Hello {name}!", name="World")
assert t"Hello {"World"}!" == Template("Hello {name}!", {"name": "World"})

Then the current constructor would become a classmethod from_segments():

assert t"Hello {"World"}!" == Template.from_segments(
    "Hello",
    Interpolation("World", '"World"', None, ""),
    "!",
)

(Then, probably, it would also be worth renaming the attribute args to segments…)


EDIT: for a more comprehensive proposal, see my later post.


PS Note that the ideas from the above two posts could co-exist.


Agree, we still need to add this to the spec. (github issue) I think it will fall out in the straightforward way (Template is hashable if and only if all interpolation values are also hashable) but we’re waiting to make sure nothing comes up in the prototype cpython branch.
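(In code terms, a purely illustrative sketch of that rule – not the actual implementation – might be:)

# Illustrative only: hashing mirrors the PEP's equality components, so an
# unhashable interpolation value makes the whole Template unhashable.
class Interpolation:
    def __hash__(self):
        return hash((self.value, self.expr, self.conv, self.format_spec))

class Template:
    def __hash__(self):
        return hash(tuple(self.args))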

Ah, that’s a good catch. Thanks! (github issue).

Agree, we need to explicitly mention this in the spec. (github issue). And yes, it’s just __repr__() for the exact reasons you suggest.


Hah, using ellipsis in this way is a fun and very clever hack!

Likewise for bind_template_fields()!

I just added a small new example along these lines to the pep750-examples repo.

In particular, it defines a Binder class that takes a Template in the constructor and provides a bind(self, **kwargs) -> Template method similar to bind_template_fields(). Rather than using Ellipsis, I use Cornelius’ suggestion elsewhere of quoting interpolations.

This test passes:

def test_binder():
    template: Template = t"The {'cheese'} costs ${'amount':,.2f}"
    binder = Binder(template)
    bound = binder.bind(cheese="Roquefort", amount=15.7)
    cheese = "Roquefort"
    amount = 15.7
    assert bound == t"The {cheese} costs ${amount:,.2f}"

There’s also a related Formatter class that provides a format() method; sort of an imperfect answer to str.format().


Just a follow-up: I added a github issue to track this.

Because t-strings, like f-strings, eagerly evaluate their interpolations, I tend to think of them less as “partially evaluated f-strings” and more as “evaluated f-strings before rendering to string”. But that’s subtle. I suppose it does place them in a somewhat new corner of the (growing) Venn Diagram of approaches to string formatting in Python.


Yes. Sorry; I should have mentioned that.

Stepping back: if what we want is to re-use templates multiple times with different interpolated values, the cleanest approach is just to wrap our t-string in a callable. No need for lambdas in the t-string itself:

from templatelib import Template

def cheese(name: str, category: str) -> Template:
    return t"{name} is {category}"

roquefort: Template = cheese("Roquefort", "blue")
limburger: Template = cheese("Limburger", "stinky")

# This assert passes
name = "Roquefort"
category = "blue"
assert roquefort == t"{name} is {category}"

That’s maybe not so interesting, and it’s basically no different from how f-strings get “reused” in Python today.

I think lambdas (or callables in general) in interpolations have a different set of uses. For instance, imagine you want to define a template but only later want to decide which parts of it to render to string. We could implement a format_some() method that takes as input a “selector” and a t-string:

template: Template = t"{(lambda: 'roquefort'):blue} {(lambda: 'limburger'):stinky}"
assert format_some("blue", template) == "roquefort ***"  # the second lambda isn't called
assert format_some("stinky", template) == "*** limburger"  # the first isn't called

This might be useful in a logging pipeline, for instance.
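A minimal sketch of what that format_some() could look like against the draft API used in this thread (template.args holding str and Interpolation segments):

from templatelib import Interpolation, Template  # draft PEP 750 API

def format_some(selector, template):
    # Render only the interpolations whose format spec matches the
    # selector, calling their callables; mask everything else as "***"
    # without ever invoking it.
    parts = []
    for segment in template.args:
        match segment:
            case str() as s:
                parts.append(s)
            case Interpolation(value, _, _, format_spec) if format_spec == selector:
                parts.append(str(value() if callable(value) else value))
            case Interpolation():
                parts.append("***")
    return "".join(parts)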

I find lambdas in interpolations to be awkward syntax, but referring to callables directly seems like it might be common. Of course, t-strings are only as good as the code that does something with them (like converting them into strings, or parsing them into ASTs); the code that processes the t-string needs to expect a callable in Interpolation.value and do something useful with it.

If we’re looking for something closer to str.format(), t-strings (like f-strings) offer no direct analogue. t-strings, like f-strings, eagerly evaluate their interpolations and have lexical scope. Because strings sent to str.format() are just strings, they can refer to any name, including names not in scope. (I do like the fun hacks others have cooked up elsewhere in the thread.)

Finally, an earlier version of this PEP introduced the idea of “implicit lambda wrapping”: that is, wrapping all interpolations in lambda functions without requiring the syntax. We ultimately rejected this approach as too problematic, although it did enable a number of potential use cases that the current PEP does not.


I don’t know why we want lazy evaluation for t-strings, but if you don’t mind, here is a hack that even works for f-strings – and it even works on PyPy:

class LazyFormatter:
    def __init__(self, func):
        # Capture the code object of a zero-argument lambda wrapping an
        # f-string; its free names become global lookups at eval() time.
        self.code = func.__code__

    def format(self, **kwargs):
        return eval(self.code, kwargs, {})

    def format_map(self, mapping):
        return eval(self.code, mapping, {})

>>> lazy_formatter = LazyFormatter(lambda: f"hello {name:{spec}}")
>>> lazy_formatter.format(name="world", spec="s")
'hello world'
>>> lazy_formatter.format(name=42, spec=".2f")
'hello 42.00'

Uninteresting is fine with me - being able to solve a problem with boring code reduces the risk of bugs :slightly_smiling_face:

I like this, not just because it solves the “how do I create a template with parameters” problem, but also because it made me stop and re-think my understanding of what a template is and how it’s created. I think I was focused too closely on the idea that t-strings were the only way of creating templates, when in fact templates are a perfectly normal Python type, and t-strings are simply the literal form of a template. No one asks “how can I make [a, b, c] lazily evaluate the variables a, b and c?” – instead, you just write a function that returns the list you want. Template literals (t-strings) are the same as list comprehensions in that sense.

+1 on having this example somewhere in the “how to teach this” part of the PEP.


Another option to handle delayed substitutions is to capture evaluation errors in the Interpolation object and reraise them if the .value attribute is accessed. That way the errors still bubble out in almost the same way, but if the template wants to handle them (or ignore the captured value entirely), it can, and use the .expr attribute instead:

(Made up code, untested)

>>> fmt = t"{a} + {b} = {a+b}"
>>> fmt.args[1].value
NameError: a
>>> fmt.args[1].expr
'a'

>>> def apply(tmpl, **args):
...     for a in tmpl.args[1::2]:
...         a.value = eval(a.expr, args)

>>> apply(fmt, a=1, b=2)
>>> fmt.args[1].value
1
>>> fmt.args[3].value
2
>>> fmt.args[5].value
3

I’m not suggesting that that apply function I just invented should be part of the standard, only that the fact that an error occurs while evaluating an expression need not prevent the Template from being created.
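One way to prototype that capture-and-reraise behaviour outside the PEP (a sketch, with a hypothetical CapturedValue wrapper standing in for the Interpolation machinery):

class CapturedValue:
    def __init__(self, thunk):
        # Evaluate eagerly, but stash any exception instead of raising.
        self._error = None
        try:
            self._value = thunk()
        except Exception as exc:
            self._error = exc

    @property
    def value(self):
        # Re-raise the captured error only when the value is accessed.
        if self._error is not None:
            raise self._error
        return self._value

# `a` and `b` are undefined here, so the NameError is captured...
captured = CapturedValue(lambda: a + b)
# ...and only surfaces on access: captured.value -> NameError: name 'a' is not defined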

Contrast with the current behaviour (by my reading):

>>> fmt = t"{a} + {b} = {a+b}"
NameError: a
>>> fmt
NameError: fmt

Note that I don’t think this is a good solution. It lacks the benefit that proper delayed substitution would have, i.e. you can’t do more complex calculations on these values. This ability to do more complex evaluations while actually using Python syntax was, in my understanding, the primary driving factor of this PEP; otherwise, normal string syntax with simple custom parsers would be enough.

All of these “solutions” being suggested here are just band-aids for a missing fundamental feature.