PEP 750: Template Strings (new updates)

I’m not a domain expert… not really a web guy. But in talking to people who do this all the time, quoting possibly-tainted values when spitting out web pages is a crucial feature. All three have built-in filters for escaping HTML and URLs. I observe Mako gave these filters super short names (h and u respectively), I’m guessing because you use them so often.
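For anyone following along who (like me) isn't a web person: the kind of escaping those h and u filters perform can be sketched with just the standard library. These stand-ins use html.escape and urllib.parse.quote; the real template libraries have their own implementations.

```python
# Rough stand-ins for the h (HTML-escape) and u (URL-escape) filters,
# built from the standard library; real template libraries have their own
# implementations with more options.
import html
import urllib.parse

def h(value):
    """HTML-escape a possibly-tainted value before embedding it in markup."""
    return html.escape(str(value), quote=True)

def u(value):
    """URL-escape a value before embedding it in a URL component."""
    return urllib.parse.quote(str(value), safe="")

tainted = '<script>alert("hi")</script>'
assert h(tainted) == '&lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;'
assert u("a b/c") == "a%20b%2Fc"
```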

1 Like

It seems to me that what goes inside of an interpolation is an expression, and if you want to filter something, that should be part of the expression syntax. There are already clear ways to spell that in standard expressions, e.g. function calls or method calls (in some cases).

I don’t like the idea of taking an existing operator and changing its meaning completely. (There’s a reason the existing extra notation uses characters that are not operators.)

If you want a filter operator it should be a proposal for an additional expression operator, not just in the context of t-strings.

9 Likes

Python has a long history of overloading its operators. I observe that today’s Python ships with several distinct meanings for the | operator: “boolean or”, “union of sets”, and the recently-added “create union of types”. Also, pathlib.Path overloads the / operator to mean “smart-concatenate elements of a path”, a wildly different meaning from division, and % has meant “perform value substitution inside a string” for decades. So I suggest that Python programmers are mentally flexible enough to understand context-specific meanings for operators.

Also, | has meant “pipe things together” for 40+ years in the UNIX and DOS shells. Again, this spelling was so obvious to the developers of three separate template libraries that they all used it. Each of these libraries could have spelled apply-filter as

    filter(expression)

but felt it was conceptually important enough for some reason to use the spelling

    expression | filter

On the other hand, it’s worth noting the ticklish problem of using an existing legal operator to mean “filter” here. What if the user innocently wants to evaluate this expression in a template?

set1 | set2

The template might interpret that to mean “evaluate set1, then run it through the filter set2”, oops!

Obviously picking a different operator solves this problem. But I feel like we’re running out of ASCII punctuation that looks nice and would be unambiguous here.

I observe you could solve this problem a couple other ways. For example, you could declare that you can only specify filters after the colon, and if you don’t need a format-spec just leave it blank:

my_value :| filter1 | filter2

Alternatively, if you want the filter syntax to take precedence inside template strings, users could use a normal | operator by putting parentheses around the expression:

(set1 | set2)

I fear I have no more to offer this conversation. As mentioned I’m not really an expert in this area. Maybe y’all could rope in some sort of domain expert, Armin Ronacher or somebody.

[Edit: oops, you concatenate pathlib.Path elements with the / operator, not the | operator. D’oh!]

2 Likes

At least in the case of Django templates, the reason is pretty clear: the design does not support expressions at all, just “variables” (dotted names). The only operation is “filter”, and by analogy with the Unix shell they decided to use |. The target audience for this notation is specifically not Python users. (For example, “content managers” with little or no programming experience.)

I presume the others were inspired by Django templates.

If we’re looking for a filter operator that doesn’t conflict with expressions, we could extend !r, !s, and !a with !identifier, keeping the original three as shorthands for !repr, !str and !ascii, respectively.

That would work in f-strings too. But I would recommend making that a separate PEP.
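To make the idea concrete: the first asserts below show today's f-string conversions, and apply_converter is my own hypothetical sketch of what an extended !identifier might desugar to; it is not part of any PEP.

```python
# Today's f-string conversions, plus a hypothetical apply_converter showing
# what an extended !identifier spelling might desugar to (a sketch, not
# part of any PEP).
text = "café"

assert f"{text!r}" == repr(text)   # !r, shorthand for a hypothetical !repr
assert f"{text!s}" == str(text)    # !s, shorthand for !str
assert f"{text!a}" == ascii(text)  # !a, shorthand for !ascii

def apply_converter(value, name):
    # A made-up lookup table; a real design would resolve names differently.
    converters = {"repr": repr, "str": str, "ascii": ascii, "upper": str.upper}
    return converters[name](value)

# A hypothetical f"{text!upper}" might then behave like:
assert apply_converter(text, "upper") == "CAFÉ"
```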

I guess I don’t know the true design history, but my assumption has always been that | filter was a convenience for template authors who are not developers, where take_this | do_this | then_do_this is easier to read in an otherwise plain-text document than then_do_this(do_this(take_this)). Given that PEP 750 does not provide a way to write templates separate from Python code, we can assume anyone writing templates is also familiar with writing Python.

Also, I’m not convinced I would add filter syntax if I was rewriting Jinja today. It’s caused confusion about operator precedence; many people are uncertain/surprised how the expression a + b | c + d evaluates. It still requires understanding Python syntax to pass arguments. And given that so much of the rest of Jinja looks like (and is) Python anyway, having a second way to apply functions doesn’t make the template as a whole particularly more readable. Similarly for the a is test syntax that converts to test(a), and how those tests can also be used in filters.
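To make the ambiguity concrete, here are the two candidate readings of a + b | c + d written as explicit Python calls, with c as a stand-in filter. (This only illustrates why people are surprised; it makes no claim about which reading Jinja actually implements.)

```python
# The two candidate readings of `a + b | c + d`, written as explicit calls.
def c(x):
    return x * 10   # stand-in filter

a, b, d = 1, 2, 3

tight = a + c(b) + d     # filter binds tightly: applies to b alone
loose = c(a + b) + d     # filter applies to the whole left-hand sum

assert tight == 24       # 1 + 20 + 3
assert loose == 33       # 30 + 3
assert tight != loose    # the two readings genuinely differ
```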

6 Likes

Do you also propose to support

  • dotted identifiers,
  • multiple !identifiers to apply multiple filters (presumably left-to-right), and
  • arguments to the filter, perhaps spelled !identifier('arguments', 'here', 33)?

Also, just to touch on this aspect: with f-strings, the conversion is applied before the format. It works for me if the filters are applied before the format here too. I believe the templating libraries don’t have the equivalent of a “format spec” for their expansions–they just use filters to format the value–so they don’t express an opinion here. Also, they have some filters that definitely expect to operate on non-string values, which suggests they’d have to be called before the format. (On the other hand, I suppose template strings don’t actually have an opinion about whether you apply the conversion/filter or the format first–the code rendering the template could do whatever it wants.)
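The f-string ordering is easy to demonstrate: the conversion runs first, and the format spec is then applied to the converted (string) result:

```python
# In f-strings the conversion runs first; the format spec applies to the
# converted result.
word = "hi"

via_fstring = f"{word!r:>10}"          # convert with repr(), then right-pad
by_hand = format(repr(word), ">10")    # the equivalent explicit calls

assert via_fstring == by_hand == "      'hi'"
```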

Finally, if it were me, I’d be sorely tempted to reserve all one-character strings after the ! for future predefined converters. So !q wouldn’t work even if you had def q(s): ... available.

Those are all excellent questions for the team working on that PEP — not for me nor for the PEP 750 team. :slight_smile:

2 Likes

The neatest variant on this that has occurred to me is the version we had in the last pre-withdrawal iteration of PEP 501:

The operation isn’t really specific to template strings, so this seemed like a better approach to me than putting it on the interpolation fields or in a new library module.

1 Like

This is the position PEP 750 takes by default.

Since the handling of the format strings is up to the template processor, it can decide to apply its own filters. Each : after the first also isn’t special, so a template processor can define filter handling this way:

my_value:|preprocessing_filter:format_spec:|postprocessing_filter

Substitution fields in format specs are eagerly evaluated, so there are also multiple ways to handle filters with arguments (either passing the entire filter in via a substitution field, or the individual arguments to the filter).

The leading : also avoids any potential confusion with | as a set union or bitwise numeric operator.
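The f-string analog shows what eager evaluation of substitution fields inside a format spec looks like: the inner fields are resolved first, producing the final spec string.

```python
# Nested substitution fields in a format spec are evaluated eagerly:
# {width} and {precision} are resolved first, producing the spec "8.3".
value = 3.14159
width = 8
precision = 3

assert f"{value:{width}.{precision}}" == format(value, "8.3")
assert format(value, "8.3") == "    3.14"
```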

2 Likes

Adding my support for this PEP. I developed a workaround for the lack of this feature in Logfire, a structured logging library that uses OpenTelemetry. In particular, these two lines of code are equivalent:

logfire.info("Hello {name}", name=name)
logfire.info(f"Hello {name}")

Both emit something like the following data among other things:

{
  "span_name": "Hello {name}",
  "message": "Hello Bob",
  "attributes": {
    "name": "Bob"
  }
}

If name contains something that looks sensitive it will be redacted by default, and if it’s too long it’ll be truncated.

The documentation of this feature is here: Add Logfire Manual Tracing - Pydantic Logfire Documentation

This works by using my library executing to analyze the source code and bytecode to obtain the AST node of the method call. The code which processes this to format the code and extract the attributes is here.

There are a few notable problems with this:

  1. The underlying implementation is very dark magic.
  2. The source code has to be available.
  3. Values inside {} have to be evaluated a second time, the first time being for the f-string whose value is discarded.

PEP 750 perfectly solves all these problems; it’s exactly what’s needed. Users can just replace f with t. In particular I’m very glad that this proposes new syntax (like PEP 501) instead of the older version with arbitrary callables/prefixes, which would have been more cumbersome to use.

Logfire also integrates with the stdlib logging module, so existing logging calls can emit a Logfire log. This works well if the user writes e.g. logger.info('Hello %s', name) instead of logger.info(f'Hello {name}'). In the latter case we just receive the formatted string so we don’t have structured data. We could use the same dark magic to inspect the original calls, I just haven’t gotten around to it. But it would be really great if logger.info(t'Hello {name}') (i.e. using a t-string) was commonplace, i.e. if logging made it ‘just work’ by default and kept the Template in the log record.

BTW this isn’t the first time I’ve worked around this, I also previously wrote a library which converted f-strings to a class very similar to Template: GitHub - oughtinc/fvalues

6 Likes

I don’t have an opinion on whether this is a worthwhile addition overall, but I do think the proposed changes are more complex and less efficient than they need to be.

Here are my suggestions:

Remove args

The args property seems to add no value; it just gets in the way of accessing strings and interpolations.

From a usability perspective args complicates the interface, and leads users to more awkward code.

Much of the example code and explanation goes into explaining how the str and Interpolation values are interleaved. If args were removed, then all that could be removed.

For example, the code for implementing f-strings with t-strings includes this:

    for arg in template.args:
        match arg:
            case str() as s:
                parts.append(s)
            case Interpolation(value, _, conv, format_spec):
                value = convert(value, conv)
                value = format(value, format_spec)
                parts.append(value)

without args it would be:

    for s, i in zip(template.strings, template.interpolations):
        parts.append(s)
        value = convert(i.value, i.conv)
        value = format(value, i.format_spec)
        parts.append(value)
    parts.append(template.strings[-1])

with no need for instance checks.

In general:

    for arg in template.args:
        if isinstance(arg, str):
            process_str(arg)
        else:
            process_interpolation(arg)

becomes

    for s, i in zip(template.strings, template.interpolations):
        process_str(s)
        process_interpolation(i)
    process_str(template.strings[-1])
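Here is the zip-based style as a complete runnable sketch, using minimal stand-in classes rather than the real PEP 750 types:

```python
# The zip-based iteration style as a self-contained sketch, using minimal
# stand-in classes (not the real PEP 750 types).
from dataclasses import dataclass

@dataclass
class Interpolation:
    value: object

@dataclass
class Template:
    strings: tuple         # always exactly one more element than interpolations
    interpolations: tuple

# Stand-in for t"Hello {name}!" with name = "Bob"
tmpl = Template(("Hello ", "!"), (Interpolation("Bob"),))

parts = []
for s, i in zip(tmpl.strings, tmpl.interpolations):
    parts.append(s)
    parts.append(str(i.value))
parts.append(tmpl.strings[-1])   # the trailing string zip() never reaches

assert "".join(parts) == "Hello Bob!"
```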

Performance

The strings property of any template is a tuple of compile-time constant strings,
and is thus itself a compile-time constant.

Rather than constructing the template from 2n+1 objects (n+1 strings and n interpolations) using args, it can be created from n+1 objects: n interpolations and 1 strings tuple.

Typing

The term “concrete type” is used in a few places.
I don’t know what that means, unless it means “class”, in which case just use “class”.

The PEP claims that “Template.strings and Template.interpolations” provide “strongly-typed” access.
It is Python, everything is strongly typed. It should be “statically typed”.
Although, I would argue that args is also statically typed, just not conveniently so.
If args is removed then this is all moot and can be removed.

The templatelib module

I don’t see why a new module is necessary.
If the Template and Interpolation classes need to be accessed by name, then add them to the types module.
2 lines of Python instead of 50 lines of C.

The Template and Interpolation classes

Why does Template class need an __init__ method? The PEP has no examples of it being used.

Why define __hash__ and __eq__? Given templates are syntactic constructs, it seems surprising that two template strings in different parts of a program would be regarded as equal.
Supporting hashing and equality of potentially mutable objects seems fragile.
It also makes computing the hash and equality a lot more expensive than using simple id comparison.

Finally, would it make sense to merge the conv and the format_spec attributes of the Interpolation class?
They are both strings, come from a single string in the source, and are almost never processed independently.

10 Likes

Unfortunately, you’d need itertools.zip_longest and None checks, or else you may cut off the final string (as your example does) or cause an AttributeError (as the example would if the last (absent) interpolation was None)[1]. That extra append looks like a wart and a bug magnet. It’s about break-even, no matter which way we go here. (Though I think I’d prefer the separate lists as well.)

Agreed, and agreed on the types module and the simpler type implementations.

The simpler implementations is helpful as it clarifies that these types are primarily to transfer data from the user’s source code to the string processor, and not for more general interop between separate parts of the application. We expect template strings to be very quickly passed to a processor, which may then return its own type that is for interop (e.g. it might make the t-string into a parameterized SQL statement object), but there shouldn’t be any need to encourage keeping the templated strings themselves around any longer. (This of course doesn’t prevent processors from passing them to helper functions. And caching can only realistically be performed on the original text, ignoring the value of the interpolations that were just passed in, so making the whole template hashable is unnecessary.)


  1. Edit: Just noticed the extra append. ↩︎

3 Likes

I agree, the extra append is a bit ugly.

Maybe the last string could be a separate attribute?
Then strings and interpolations would be the same length, and the code would become:

for s, i in zip(template.strings, template.interpolations):
    process_str(s)
    process_interpolation(i)
process_str(template.closing_string)

It looks a bit better, but maybe no less error prone.

1 Like

If we make templates directly iterable with the correct behaviour, I think we would avoid the bug magnet:

def __iter__(self):
    yield from zip(self.strings, self.interpolations)
    yield self.strings[-1], None

Consumers would need to handle the interpolation potentially being None, but that’s a noisier mistake than forgetting the trailing string.

3 Likes

That None check would be annoying and expensive, given that you have to do it for each iteration – you don’t know when you’re at the end until you are.

But all the other approaches are annoying and expensive too, or bug magnets. I honestly don’t know what to do about this.

An earlier design didn’t require literal strings and interpolations to alternate – for t"prefix{x}{y}suffix" you’d get “prefix”, interpolation(x), interpolation(y), “suffix” as the four items, and for t"{x}{y}" you’d just get two interpolations.

Currently the other PEP authors (IIRC – more from personal communication than from reading the latest version of the PEP) seem to like the alternation mostly because you can do certain things more efficiently, in case you only care about the interpolations, or you only care about the literal strings. That feels like premature optimization to me, but it’s hard to change your mind about such an API detail once the PEP is live.

Maybe we should build two working prototypes so that we can experiment implementing various realistic examples both ways and observe more objectively which API style feels more natural.

2 Likes

While I don’t love the idea, writing the loop out explicitly and then dealing with the final string segment would still be available as a way to avoid checking “Is this the final iteration?” on each pass.

Offering both “obvious but slower than it could be” and “faster, but more verbose and error-prone” iteration patterns is clearly dubious from a “one obvious way to do it” perspective, though :frowning:

1 Like

seem to like the alternation mostly because you can do certain things more efficiently

OOI, what things can be done more efficiently?

Just to bring the discussion from CONSIDER removing the section on interleaving from the spec entirely, or only mentioning it in passing outside of the spec itself · Issue #30 · davepeck/peps · GitHub here, it is mostly about strings being a memoization key.


@dkp:

After discussion, @jimbaker and I agree that interleaving needs to be described, if for no other reason than to explain why Template("Red ", "Leicester") == Template("Red Leicester").

@effigies:

I think you could describe constraints on implementations in terms of __hash__ and __eq__, and how the initial implementation does/intends to satisfy them, without setting that specific detail in stone to the point where developers would feel comfortable relying on that behavior.

If, for some reason, you wanted to change the implementation, you would then need to expose a .args property that recreated this detail to avoid breaking people.

@dkp:

The other consideration is that interleaving explains why t"{a}".strings == ("", "") whereas t"{a}{b}".strings == ("", "", ""). We want to keep that distinction because strings so structured is likely to be useful as a memoization key.

@effigies:

My feeling is still that .strings is a better interface to encourage dependency on than .args[::2] but it’s really just a nagging gut feeling now. It seems difficult to have a structural signature that doesn’t invite the interleaved implementation. Ultimately, if people depend on template.strings == (a, b, c), that is only slightly less binding on reimplementations than template.args[::2] == [a, b, c].

That was definitely where we started!

But as we’ve spent more time with the API I think we’ve come to a more nuanced understanding. At this point, I tend to think of alternation as a natural outcome of some basic observations:

  1. For starters, we have to decide what t"".args should be. Since t"stilton".args seems to naturally be ("stilton",), it feels natural for t"".args == ("",) rather than (). As a consequence, args will never be empty, and args can and will contain empty strings.

  2. We want to support template equality since, for instance, devs will probably expect assert t"red " + t"windsor" == t"red windsor" to hold. Following that trail, we think the following asserts should hold:

assert Template("red ", "windsor") == Template("red windsor")
assert Template(Interpolation(42)) == Template("", Interpolation(42)) == Template(Interpolation(42), "")

When implementing __eq__(), it becomes clear that to support these asserts a correct implementation effectively must coalesce these sequences into alternating strings/interpolations in order to do its work.
  3. Finally, it feels natural to say that two Templates are equal if and only if their args are equal. This leads us directly to args themselves being alternating.

Developers can mostly ignore this, since the PEP states that Template’s constructor always performs coalescing/alternation. It’s fine to send any sequence of strings and interpolations into that constructor if you’re using it directly.
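For illustration, here's my rough reconstruction of the coalescing described above, using a stand-in Interpolation class; this is a sketch based on the post, not the PEP's actual implementation:

```python
# A rough reconstruction of the coalescing the constructor is said to
# perform: merge adjacent strings and pad with empty strings so that args
# strictly alternates str / Interpolation (stand-in class, not the real one).
from dataclasses import dataclass

@dataclass(frozen=True)
class Interpolation:
    value: object

def coalesce(*parts):
    args = [""]                          # always start with a string
    for part in parts:
        if isinstance(part, str):
            if isinstance(args[-1], str):
                args[-1] += part         # merge adjacent strings
            else:
                args.append(part)
        else:
            if not isinstance(args[-1], str):
                args.append("")          # separate adjacent interpolations
            args.append(part)
    if not isinstance(args[-1], str):
        args.append("")                  # always end with a string
    return tuple(args)

assert coalesce() == ("",)                                        # like t""
assert coalesce("red ", "windsor") == ("red windsor",)
assert coalesce(Interpolation(42)) == ("", Interpolation(42), "")
assert coalesce("", Interpolation(42)) == coalesce(Interpolation(42), "")
```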

(As for perf: the fact that an often-useful cache key (the tuple of static string parts) falls out of this seems like a nice downstream consequence. The fact that the static string parts alone help distinguish between t"{1}" and t"{1}{2}" because it’s ("", "") vs ("", "", "") feels nice to me, too.)

I’m sure we could make up an equally plausible argument for dropping empty strings from args; doing concatenation “right” shouldn’t be any harder. But let’s assume you’re on to something with this.

I suspect the hardest part is to figure out how to make life easier for less sophisticated users (who don’t use caching, just doing an immediate processing of the template into something useful for the application). They need to be able to iterate over something, and it feels a bit awkward to have to use a for-loop over args that type-checks each item (whether using isinstance() or match).

Alternate proposals so far seem to have in common that they iterate over (string, interpolation) tuples and then have to process the final string specially – either by seeing a final pair (string, None), or by having to make a separate extra call process_string(the_last_string) – the latter either just being strings[-1] or a separate field. Especially the latter feels very awkward, and easy to forget.

Let me put another alternative on the table: iterate over (interpolation, string) tuples instead, and receive (None, string) for the first tuple (which is guaranteed to exist). This tries to address the likely potential bug of skipping the final string by putting the exceptional case first – this forces users writing this code to get the first iteration right before their code works. Basically you have to write the None check to get even the most trivial example (t"") to work, e.g.:

for interp, string in tmpl.pairs:
    if interp is not None:
        process_interp(interp)
    process_string(string)

And users prepared to do something extra for performance can simply treat the first item special – and you already guarantee that there always is a first item:

it = iter(tmpl.pairs)
_, string = next(it)
process_string(string)
for interp, string in it:
    process_interp(interp)
    process_string(string)

I’m flexible about the details – basically we should probably expose tmpl.strings and tmpl.interpolations (both tuples!) as the low-level API (with one more string than interpolations, so if you use this you don’t need to check for None, you just need to be careful of the end case), and e.g. tmpl.pairs being a property returning an iterator as I described above – or possibly this could be done as iterating over the template itself (for interp, string in tmpl: ...).

Pseudo-code:

def __iter__(self):
    yield None, self.strings[0]
    yield from zip(self.interpolations, self.strings[1:])

I’m sure the slice can be optimized out by using another iterator, but I’m out of time.
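For what it's worth, the slice can indeed be avoided with itertools.islice; here's a sketch using stand-in values (plain strings in place of real Interpolation objects):

```python
# Avoiding the strings[1:] slice with itertools.islice, which iterates
# lazily over the same tuple (stand-in classes, not the real PEP 750 types).
from itertools import islice

class Template:
    def __init__(self, strings, interpolations):
        self.strings = strings              # one more element than interpolations
        self.interpolations = interpolations

    def __iter__(self):
        yield None, self.strings[0]         # the guaranteed first pair
        yield from zip(self.interpolations, islice(self.strings, 1, None))

tmpl = Template(("a", "b", "c"), ("X", "Y"))
assert list(tmpl) == [(None, "a"), ("X", "b"), ("Y", "c")]
```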

PS. I’ve come to realize that __eq__ and __hash__ are needed (or at least very useful) to create a cache for pre-processed templates.

4 Likes