PEP 750: Template Strings (new updates)

I’m not a domain expert… not really a web guy. But in talking to people who do this all the time, quoting possibly-tainted values when spitting out web pages is a crucial feature. All three template libraries have built-in filters for escaping HTML and URLs. I observe that Mako gave these filters super-short names (h and u respectively), I’m guessing because you use them so often.
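
For reference, the stdlib already exposes the operations such filters typically wrap; a minimal illustration (the variable name is mine):

    import html
    import urllib.parse

    user_input = '<script>alert("x")</script>'
    # HTML-escaping untrusted text before it lands in a page:
    print(html.escape(user_input))      # &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;
    # Percent-encoding a value before it lands in a URL:
    print(urllib.parse.quote("a b&c"))  # a%20b%26c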

1 Like

It seems to me that what goes inside of an interpolation is an expression, and if you want to filter something, that should be part of the expression syntax. There are already clear ways to spell that in standard expressions, e.g. function calls or method calls (in some cases).

I don’t like the idea of taking an existing operator and changing its meaning completely. (There’s a reason the existing extra notation uses characters that are not operators.)

If you want a filter operator it should be a proposal for an additional expression operator, not just in the context of t-strings.

8 Likes

Python has a long history of overloading its operators. I observe that today’s Python ships with several distinct meanings for the | operator: “bitwise or”, “union of sets”, and the recently added “create union of types”. Also, pathlib.Path overloads the / operator to mean “smart-concatenate elements of a path”, a wildly different meaning from division, and % has meant “perform value substitution inside a string” for decades. So I suggest that Python programmers are mentally flexible enough to understand context-specific meanings for operators.
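
A quick illustration of those coexisting meanings in today’s Python (the path output assumes a POSIX system):

    from pathlib import Path

    print(3 | 5)                  # 7 - bitwise or on ints
    print({1, 2} | {2, 3})        # {1, 2, 3} - set union
    print(int | None)             # int | None - union of types (3.10+)
    print(Path("/usr") / "bin")   # /usr/bin - path joining
    print("Hello, %s" % "world")  # Hello, world - old-style string substitution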

Also, | has meant “pipe things together” for 40+ years in the UNIX and DOS shells. Again, this spelling was so obvious to the developers of three separate template libraries that they all used it. Each of these libraries could have spelled apply-filter as

    filter(expression)

but felt it was conceptually important enough for some reason to use the spelling

    expression | filter

On the other hand, it’s worth noting the ticklish problem of using an existing legal operator to mean “filter” here. What if the user innocently wants to evaluate this expression in a template?

set1 | set2

The template might interpret that to mean “evaluate set1, then run it through the filter set2”, oops!

Obviously picking a different operator solves this problem. But I feel like we’re running out of ASCII punctuation that looks nice and would be unambiguous here.

I observe you could solve this problem a couple of other ways. For example, you could declare that filters can only be specified after the colon, and if you don’t need a format spec you just leave it blank:

my_value :| filter1 | filter2

Alternatively, if you want the filter syntax to take precedence inside template strings, users could use a normal | operator by putting parentheses around the expression:

(set1 | set2)

I fear I have no more to offer this conversation. As mentioned I’m not really an expert in this area. Maybe y’all could rope in some sort of domain expert, Armin Ronacher or somebody.

[Edit: oops, you concatenate pathlib.Path elements with the / operator, not the | operator. D’oh!]

At least in the case of Django templates, the reason is pretty clear: the design does not support expressions at all, just “variables” (dotted names). The only operation is “filter”, and by analogy with the Unix shell they decided to use |. The target audience for this notation is specifically not Python users. (For example, “content managers” with little or no programming experience.)

I presume the others were inspired by Django templates.

If we’re looking for a filter operator that doesn’t conflict with expressions, we could extend !r, !s, and !a with !identifier, keeping the original three as shorthands for !repr, !str and !ascii, respectively.

That would work in f-strings too. But I would recommend making that a separate PEP.
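
For context, here is roughly how the existing three conversions already reach a template processor under PEP 750; a minimal sketch assuming Python 3.14’s string.templatelib, where render and CONVERTERS are illustrative names of mine:

    from string.templatelib import Template

    # !r, !s and !a arrive as the one-character strings "r", "s" and "a"
    # (or None when no conversion was written) on each interpolation.
    CONVERTERS = {"r": repr, "s": str, "a": ascii, None: lambda v: v}

    def render(template: Template) -> str:
        out = []
        for part in template:            # static strings and interpolations, interleaved
            if isinstance(part, str):
                out.append(part)
            else:
                converted = CONVERTERS[part.conversion](part.value)
                out.append(format(converted, part.format_spec))
        return "".join(out)

    name = "héllo"
    print(render(t"{name!a}"))   # 'h\xe9llo'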

I guess I don’t know the true design history, but my assumption has always been that | filter was a convenience for template authors who are not developers, where take_this | do_this | then_do_this is easier to read in a document that is otherwise mostly text than then_do_this(do_this(take_this)). Given that PEP 750 does not provide a way to write templates separate from Python code, we can assume anyone writing templates is also familiar with writing Python.

Also, I’m not convinced I would add filter syntax if I were rewriting Jinja today. It’s caused confusion about operator precedence; many people are uncertain or surprised about how the expression a + b | c + d evaluates. It still requires understanding Python syntax to pass arguments. And given that so much of the rest of Jinja looks like (and is) Python anyway, having a second way to apply functions doesn’t make the template as a whole particularly more readable. The same goes for the a is test syntax that converts to test(a), and for the way those tests can also be used in filters.
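
For comparison, in plain Python the grouping is unambiguous (if not obvious): + binds more tightly than |, so a + b | c + d means (a + b) | (c + d). Shown here with ints, where | is bitwise or:

    a, b, c, d = 1, 2, 4, 8
    print(a + b | c + d)      # 15, i.e. (1 + 2) | (4 + 8)
    print((a + b) | (c + d))  # 15 as well - the parentheses change nothing here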

6 Likes

Do you also propose to support

  • dotted identifiers,
  • multiple !identifiers to apply multiple filters (presumably left-to-right), and
  • arguments to the filter, perhaps spelled !identifier('arguments', 'here', 33) ?

Also, just to touch on this aspect: with f-strings, the conversion is applied before the format. It works for me if the filters are applied before the format here too. I believe the templating libraries don’t have the equivalent of a “format spec” for their expansions; they just use filters to format the value, so they don’t express an opinion here. Also, they have some filters that definitely expect to operate on non-string values, which suggests they’d have to be called before the format. (On the other hand, I suppose template strings don’t actually have an opinion about whether you apply the conversion/filter or the format first; the code rendering the template could do whatever it wants.)
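
The f-string behaviour being referenced, for concreteness:

    value = "hi"
    # !r converts first (repr(value) == "'hi'", four characters), then the
    # format spec right-aligns that four-character string in a field of width 10.
    print(f"{value!r:>10}")   # six spaces, then 'hi' with its quotes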

Finally, if it were me, I’d be sorely tempted to reserve all one-character strings after the ! for future predefined converters. So !q wouldn’t work even if you had def q(s): ... available.

Those are all excellent questions for the team working on that PEP — not for me nor for the PEP 750 team. :slight_smile:

2 Likes

The neatest variant on this that has occurred to me is the version we had in the last pre-withdrawal iteration of PEP 501.

The operation isn’t really specific to template strings, so this seemed like a better approach to me than putting it on the interpolation fields or in a new library module.

This is the position PEP 750 takes by default.

Since the handling of format specs is up to the template processor, it can decide to apply its own filters. Each : after the first also isn’t special, so a template processor can define filter handling this way:

my_value:|preprocessing_filter:format_spec:|postprocessing_filter
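
A minimal sketch of a processor that interprets that convention, assuming Python 3.14 t-strings; render, FILTERS, and the specific filter names are illustrative, only the :| convention comes from the post above:

    from string.templatelib import Template

    FILTERS = {"strip": str.strip, "upper": str.upper}   # illustrative registry

    def render(template: Template) -> str:
        out = []
        for part in template:
            if isinstance(part, str):
                out.append(part)
                continue
            pre, post, spec = [], [], ""
            target = pre
            for segment in part.format_spec.split(":"):
                if segment.startswith("|"):
                    target.append(FILTERS[segment[1:]])
                else:
                    spec = segment
                    target = post          # later |segments run after formatting
            value = part.value
            for f in pre:
                value = f(value)           # pre-processing filters
            text = format(value, spec)
            for f in post:
                text = f(text)             # post-processing filters
            out.append(text)
        return "".join(out)

    name = "  bob  "
    print(render(t"Hello {name:|strip:>5:|upper}"))   # Hello   BOB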

Substitution fields in format specs are eagerly evaluated, so there are also multiple ways to handle filters with arguments (either passing the entire filter in via a substitution field, or the individual arguments to the filter).

The leading : also avoids any potential confusion with | as a set union or bitwise numeric operator.

2 Likes

Adding my support for this PEP. I developed a workaround for the lack of this feature in Logfire, a structured logging library that uses OpenTelemetry. In particular, these two lines of code are equivalent:

logfire.info("Hello {name}", name=name)
logfire.info(f"Hello {name}")

Both emit something like the following data among other things:

{
  "span_name": "Hello {name}",
  "message": "Hello Bob",
  "attributes": {
    "name": "Bob"
  }
}

If name contains something that looks sensitive it will be redacted by default, and if it’s too long it’ll be truncated.

The documentation of this feature is here: Add Logfire Manual Tracing - Pydantic Logfire Documentation

This works by using my library, executing, to analyze the source code and bytecode and obtain the AST node of the method call. The code that processes this to format the code and extract the attributes is here.

There are a few notable problems with this:

  1. The underlying implementation is very dark magic.
  2. The source code has to be available.
  3. Values inside {} have to be evaluated a second time, the first time being for the f-string whose value is discarded.

PEP 750 solves all these problems perfectly; it’s exactly what’s needed. Users can just replace f with t. In particular, I’m very glad that this proposes a new syntax (like PEP 501) instead of the older version with arbitrary callables/prefixes, which would have been more cumbersome to use.
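
To make point 3 concrete: with a t-string, each interpolation is evaluated exactly once, and both the source expression and the resulting value are captured on the Template, so a consumer never has to re-evaluate anything (sketch assumes Python 3.14; expensive_lookup is a made-up name):

    calls = 0

    def expensive_lookup():
        global calls
        calls += 1
        return "Bob"

    template = t"Hello {expensive_lookup()}"
    interp = template.interpolations[0]
    print(interp.expression)   # expensive_lookup()
    print(interp.value)        # Bob
    print(calls)               # 1 - evaluated once, when the t-string was built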

Logfire also integrates with the stdlib logging module, so existing logging calls can emit a Logfire log. This works well if the user writes e.g. logger.info('Hello %s', name) instead of logger.info(f'Hello {name}'). In the latter case we just receive the formatted string so we don’t have structured data. We could use the same dark magic to inspect the original calls, I just haven’t gotten around to it. But it would be really great if logger.info(t'Hello {name}') (i.e. using a t-string) was commonplace, i.e. if logging made it ‘just work’ by default and kept the Template in the log record.
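
A hedged sketch of what that could look like today; this is not Logfire’s or logging’s actual behaviour, TemplateFilter and template_attributes are names I made up, and the API use assumes Python 3.14’s string.templatelib:

    import logging
    from string.templatelib import Template

    class TemplateFilter(logging.Filter):
        """Pull structured data out of a t-string message, then flatten it."""
        def filter(self, record: logging.LogRecord) -> bool:
            if isinstance(record.msg, Template):
                template = record.msg
                # Keep {expression: value} pairs as structured attributes.
                record.template_attributes = {
                    i.expression: i.value for i in template.interpolations
                }
                # Render a plain string so ordinary handlers still work
                # (conversions ignored for brevity).
                record.msg = "".join(
                    part if isinstance(part, str)
                    else format(part.value, part.format_spec)
                    for part in template
                )
            return True

    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger("demo")
    logger.addFilter(TemplateFilter())

    name = "Bob"
    logger.info(t"Hello {name}")   # message "Hello Bob", plus {"name": "Bob"}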

BTW this isn’t the first time I’ve worked around this; I also previously wrote a library which converted f-strings to a class very similar to Template: GitHub - oughtinc/fvalues

5 Likes