PEP 750: Template Strings (new updates)

I personally think so.

I don’t think so as it isn’t a subclass of str.

Or from string import templatelib which also disambiguates everything coming from that module.

4 Likes

Naming precision / readability aside, I just learned that there is already string.Template, which is currently even called “template strings” in the docs. Then introducing another concept “template strings” under the name string.templatelib.Template is a recipe for confusion.

There should be different names in both the class names and in conceptual naming (how to unambiguously call these when speaking about them, something like “template strings” / “$-template strings” / “f-string templates”, … ).

Suggestion:

Conceptual naming for the new templating mechanism: Let's start with the analogy to f-strings. f"…" is an f-string literal that is evaluated to a str. It's conventionally called an "f-string". t"…" is formally a t-string literal (or maybe template-string literal) that is evaluated to a string template, which can later be turned into a str. I believe calling t"…" a "template string" in analogy to "f-string" is reasonable ("t-string" would also be possible but is less clear, and we also have precedent for full-word prefixes in "raw string" - r"…"). The "template string" is evaluated to a "string template"; class names for that could be StrTemplate or StringTemplate.

The existing string.Template mechanism should then be reworded to “$-string templates” or similar to free the conceptual name “template string” for the new mechanism. I believe we fundamentally want “template string” for the new concept and string.Template is niche enough so that this rewording is possible.

3 Likes

This is the terminology used in the PEP’s Specification section (but not the title), and I believe the reference implementation: https://peps.python.org/pep-0750/#template-string-literals

Perhaps the PEP title could be updated, if deemed valuable?


I think colloquially, most people talk about these things as f-strings, t-strings, and $-strings (dollar-strings). There’s not too much confusion about them, although string formatting syntax rivals arg parsing libraries in the stdlib :smiley:

3 Likes

In my experience I don’t think I’ve ever used string.Template and thus would not refer to it by any name. So for me, having two things named Template isn’t confusing, since I only expect to see f- and t-strings. But I do no i18n work.

1 Like

Have to pile on: there’s absolutely a chance of confusion when there’s already “Template Strings” in Python since all the way back in 2.4, which by the way uses a class named Template. Of course code can disambiguate this, but that’s not the point. I know not many people use these $-strings (huh, not heard that term before), but they’re all over the internet in (often low quality) tutorials which try to describe all the string formatting approaches “just for completeness”. We’ll have some fun at first when the new breed of AI-first programmers ask their IDE about template strings and the Template class, when the body of knowledge includes the previous 20+ years’ worth of data in which that term meant something a bit different from what it’s going to mean now. This is just a request to mitigate some confusion by tweaking terminology.

4 Likes

Hi, I’m a maintainer of Ruff.

We started working on the new Python 3.14 features, and I read the template literal PEP. Thank you for writing this proposal. I’m very excited for template literals. They’ll unlock so many new possibilities for safer string operations in Python and I’m especially happy about the possibility that Ruff can start formatting embedded languages.

I have a few clarifications/remarks regarding the PEP. I’m sorry that I’m only reaching out after the PEP has been accepted. I wasn’t aware that it was close to being accepted and what the implications were for us. Feel free to ping me or any other Ruff maintainer on future PEPs if you want feedback from tooling authors.

I understand from the PEP that Interpolation.expression stores the string representation of an interpolation expression. Is my understanding correct that it stores a + b for t"test {a + b}" or does Python perform any form of normalization on the expression string? I’m asking because including the raw expression string as it appears in the source has the downside that formatting any expression part is a semantic change, preventing formatters from formatting the expression parts. A similar problem exists with debug expressions in f-strings today, but it’s less of a concern because they’re not commonly used. Is there a strong need to access the raw expressions, or is it just for debugging, in which case the trade-off might be worth reconsidering?
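The analogous problem can already be demonstrated with f-strings today: the `=` debug specifier echoes the expression source verbatim, so any reformatting of the expression is a visible behaviour change. A minimal demonstration:

```python
a = b = 1

# The text before "=" inside the replacement field is preserved exactly
# as written, extra whitespace included, so reformatting "a   + b" to
# "a + b" would change the program's output.
print(f"{a   + b=}")  # a   + b=2
```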

If t-strings prove popular, it may be useful to have a way to describe the “kind” of content found in a template string: “sql”, “html”, “css”, etc. This could enable powerful new features in tools such as linters, formatters, type checkers, and IDEs. (Imagine, for example, black formatting HTML in t-strings, or mypy checking whether a given attribute is valid for an HTML tag.) While exciting, this PEP does not propose any specific mechanism. It is our hope that, over time, the community will develop conventions for this purpose.

I’m excited about bringing this functionality to Ruff’s formatter and our type checker. The way this convention works in the web ecosystem is that there are some known template literal tags, e.g. graphql and gql that mark a template string as graphql, regardless of where the tag function is imported from (source). To my knowledge, this works pretty well, and I’m confident that it will work for Python too. However, I think the fact that “tagging” the kind of Python template literals is lazy makes this more challenging and has the risk that tools may not always recognize the right kind.

What I mean by this is that JavaScript’s syntax requires authors to specify the kind when writing the template literal, e.g., sql`test`. Python’s proposal is more flexible in that regard. Creating the template literal doesn’t yet define the template’s kind. The processing function the literal is passed to defines the kind, and the distance between where the literal is defined and where the processing function is called can be arbitrarily large.

Ideally, the processing function is called directly on the template string. Static analysis tools can easily detect this.

result = connection.select(sql(t"SELECT * from table where X = {x}"))

However, the proposed syntax allows APIs to e.g. hide the explicit sql invocation to simplify the API:

result = connection.select(t"SELECT * from table where X = {x}")

and a library could define multiple methods accepting sql template literals

result = connection.select(t"SELECT * from table where X = {x}")
result = connection.insert(t"INSERT ({x}, 1) into table")
result = connection.delete(t"DELETE from table where X = {x}")

and a user might wrap some library functions:

result = my_fancy_query(connection, t"SELECT * from table where X = {x}")

or use temporary variables

if condition:
    query = t"SELECT * from users where id = {id}"
else:
    query = t"SELECT * from deleted_users where id = {id}"

result = connection.select(query)

Identifying the right kind for those invocations is harder for static analysis tools because it requires tracking the template literal through multiple layers unless the tool uses a hardcoded or customizable list of known functions (which are cumbersome to maintain). It may be possible to rely on types to identify the kind of a literal, but querying type information isn’t feasible for most formatters (because they don’t have the capability or prefer not to because it’s expensive and requires cross-module analysis).

Have there been other ideas on how static analysis tools (including formatters) would detect the template literal’s kind, or is the hope that the community will adopt the convention of always wrapping the template literal with the processing function?

7 Likes

I wouldn’t be at all disappointed if linting results are limited for cases where the t-string is far removed from its use.

My expectation was that, other than syntactic validity of the substitutions, linters would only handle other contents of a t-string if its use were specifically recognised. So either a tool that handles type annotations, or a specific list of functions (which I’ve likely manually enabled, as the maintainer of my project).

2 Likes

Based on the implementation:

>>> a=b=1
>>> t = t"test {a   + b}"
>>> t.interpolations[0].expression
'a   + b'

This matches the PEP, which states:

The expression attribute is the original text of the interpolation

This could perhaps be an opt-in setting? I imagine that there will be many cases where whitespace in the expression doesn’t matter.

2 Likes

There was some discussion of this earlier, with the use of typing.Annotated suggested, e.g. def select(self, query: Annotated[Template, 'sql']): .... Though, as Steve suggests, another viable option is an explicitly configured set of e.g. functions taking SQL-like t-strings in a tool’s settings.
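To make the Annotated idea concrete, here is a minimal sketch of how a type-aware tool could read the kind back out of such an annotation. Note that `Template` is a stand-in class here (the real `string.templatelib.Template` only exists from Python 3.14), and the `"sql"` metadata convention is hypothetical, not part of the PEP:

```python
from typing import Annotated, get_type_hints

class Template:  # stand-in for string.templatelib.Template (3.14+)
    pass

# Hypothetical convention: annotate the parameter with the embedded
# language's kind so type-aware tools can recognise it.
def select(query: Annotated[Template, "sql"]) -> None:
    ...

# A tool with access to type information could recover the kind
# from the annotation metadata:
hints = get_type_hints(select, include_extras=True)
kind = hints["query"].__metadata__[0]
print(kind)  # sql
```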


1 Like

I wouldn’t be at all disappointed if linting results are limited for cases where the t-string is far removed from its use.

This makes sense to me, but I’m not sure if I’d think about it when designing my APIs. But maybe it’s something we can teach.

This could perhaps be an opt-in setting? I imagine that there will be many cases where whitespace in the expression doesn’t matter.

I’d find it somewhat disappointing if a newly introduced syntax couldn’t be safely formatted by default just to support some less common use case, unless these are very important use cases, in which case we simply can’t format expressions. This is especially true considering that it would have to be either a global opt-in or opt-out, because tools can’t easily identify the template literal kind (which would enable the option of an allow or deny list). Is there a strong need to retain the raw expressions as strings that justifies the trade-off that they can’t be safely formatted?

There was some discussion of this earlier, with the use of typing.Annotated suggested, e.g. def select(self, query: Annotated[Template, 'sql']): .... Though as Steve suggests, another viable option is an explicitly configured set of e.g. functions taking SQL-like t-strings in a tool’s settings.

That’s an interesting idea, and it would certainly help type checkers. I’m not sure if it will be feasible (because of performance) for formatters to make use of type annotations.

3 Likes

I’m not one of the authors, but the rationale in the PEP is:

We expect that the expression attribute will not be used in most template processing code. It is provided for completeness and for use in debugging and introspection.

Given this, I think it would be reasonable for Ruff to globally format the expression parts of t-strings, as an out-of-band activity.

I can’t immediately think of precedent for normalising source code in the way you propose in the language, and I think it could be surprising: as .expression is a string, rather than e.g. an AST representation, I wouldn’t expect any normalisation.
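As an illustration of what such out-of-band handling could look like, a consumer that doesn’t care about whitespace could canonicalise the raw expression text itself with the stdlib ast module (a sketch of one option, not something the PEP mandates):

```python
import ast

# Round-tripping the raw expression text through the AST canonicalises
# whitespace without changing the meaning of the expression.
raw = "a   + b"
normalised = ast.unparse(ast.parse(raw, mode="eval"))
print(normalised)  # a + b
```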


2 Likes

That mirrors concerns I raised previously about this suggested approach, and those and similar concerns are probably part of the reason why this part of the PEP was dropped entirely.

Early in the life of the PEP I proposed an orthogonal feature that could be used for all string/byte literals, so it would be useful for all kinds of strings, not just template strings[1].

Namely string/language tags:

t:sql'SELECT * FROM {table};'
# adding a language tag to a plain string requires the `u` prefix
u:sql'SELECT * FROM foo;'

The compiler could also add this tag to the resulting object in a dunder attribute like e.g. __language_tag__ so it could be used at runtime to change the behavior of functions that accept these objects based on the given tag.
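As a sketch of the runtime side of this idea (the __language_tag__ attribute is hypothetical and not part of any accepted proposal), a processing function could dispatch on such a compiler-attached tag:

```python
class Tagged(str):
    """Stand-in for a string object the compiler has tagged with a language."""
    def __new__(cls, value, tag):
        obj = super().__new__(cls, value)
        obj.__language_tag__ = tag  # hypothetical dunder from the proposal
        return obj

def process(s):
    # Dispatch on the (hypothetical) language tag at runtime; untagged
    # strings fall back to plain-text handling.
    tag = getattr(s, "__language_tag__", None)
    if tag == "sql":
        return f"rendered as SQL: {s}"
    return f"rendered as plain text: {s}"

print(process(Tagged("SELECT * FROM foo;", "sql")))
```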

If someone feels very strongly about improving the ability for tools to detect what kinds of data is contained within a string/bytes/template literal for the purposes of syntax highlighting and formatting, I would encourage them to lead standardization efforts.

This could be something as simple as introducing new pragma style comments at the beginning, which when successful could pave the way for a syntax extension, like the one I proposed, that makes it part of the language and more ergonomic to use.


  1. You don’t need a template for e.g. a fully static SQL query, but you would still like it to be inline-rendered/formatted as SQL ↩︎

2 Likes

I agree that it’s reasonable for linters to autoformat t-string interpolation expressions in the same way they format f-string interpolation fields. The specification of preserving whitespace in the runtime representation is a matter of not wanting to define a whitespace normalisation algorithm, rather than expecting consumers to care much either way.

Since the interpolation text is a Python expression, if a template processor treats the internal whitespace as semantically significant in a way that autoformatters might distort, it’s the template processor that would be doing something a bit odd.

For the syntax highlighting, the PEP did actually start out as a tagged string proposal, but over the course of the discussions, the authors ultimately decided that a single syntactic prefix with regular callables as template processors fit in better with Python as a whole, and to defer describing the formatting of the templates themselves. That said, two plausible conventions did get mentioned during the discussions:

  • immediate rendering, like sql(t"SELECT {column} FROM {table}")
  • type hinting, like query: Template["sql"] = t"SELECT {column} FROM {table}"
3 Likes

Indeed, and it was largely due to how arbitrary tags would either pollute the namespace or be polluted by the namespace.

There’s no way to reliably interpret sql as a tag independent from the library that provides the implementation, which is just as difficult for a non-introspective tool to determine as the current approach.

2 Likes

Thanks, this is useful. It would be great to document the assumption that processors should treat the internal whitespace as semantically insignificant (doesn’t have to be part of the PEP but maybe something to consider for the final documentation).

Does this assumption also extend to non-whitespace formatting? Examples here are:

  • The quotes from inner strings
  • Escapes used in inner strings
  • Trailing commas (e.g. from a multiline list)
1 Like

I realize that ship has probably sailed, but couldn’t a dedicated syntax (here, inspired by reStructuredText roles) have worked for this?

import mypkg

evil = "<script>alert('evil')</script>"
template = :mypkg.safe_html:"<p>{evil}</p>"
assert template == "<p>&lt;script&gt;alert('evil')&lt;/script&gt;</p>"

The critical part is that you have to fill the namespace in some way other than using import, class or def, which are the standard tools we have for putting callable names into the current namespace. Adding a new mechanism might have helped with some problems, but it doesn’t make it easier for formatters (or users) to figure out what the name means, so it doesn’t solve the current issue.

Why wouldn’t import, class or def work for this?

It wouldn’t be any different from mypkg.safe_html(t"<p>{evil}</p>"), or x = mypkg.safe_html; x(t"..."), ultimately. Either the analyser knows what mypkg.safe_html does, or it doesn’t know, and whether the syntax uses colons or parentheses doesn’t matter.

The degenerate example is this:

if random() < 0.5:
    safe_html = safe_xml
else:
    safe_html = safe_html5

return :safe_html:"<p>{evil}</p>"

How is a tool like ruff supposed to know how to validate the string in the last line? Without a deeper level of flow analysis and external information, it can’t, and a tool that has the additional information can probably [learn to] handle a type annotation on a function call.

Alternatively, a different syntax for putting safe_html into the set of valid string prefixes could at least prevent the use of random to choose it, forcing it to be static. This gives a tool like ruff more chance of knowing what the format should be, though it doesn’t entirely solve the issue. Hence, we didn’t go with either of these ideas.

2 Likes