PEP 750: disallow str + Template

dvarrazzo · May 1, 2025, 1:24am

Hello, it’s still Daniele here, the guy who has been making Python talk to Postgres since 2010.

The discussion PEP 750: Please add Template.join() was based on the idea that if Template + str and str + Template are deemed safe operations, then a possible t"".join() would be safe to implement on top of them.

It turns out that Template + str is an insecure operation, which lets any unsafe input become trusted template text:

>>> evil = "<evil>"
>>> t"<good>" + evil
Template(strings=('<good><evil>',), interpolations=())

What was the rationale to allow Template.__add__() to accept a string? Is there any reference?

Can this footgun be defused before releasing the feature in Python 3.14?

It doesn’t seem to be related to implicit concatenation of string and t-string literals, as was proposed in the other thread: dis shows that this is handled by the parser and doesn’t need __add__:

>>> dis.dis("""
... template = t"foo {name} bar"  "baz"
... """)
  0           RESUME                   0

  2           LOAD_CONST               3 (('foo ', ' barbaz'))
              LOAD_NAME                0 (name)
              LOAD_CONST               1 ('name')
              BUILD_INTERPOLATION      2
              BUILD_TUPLE              1
              BUILD_TEMPLATE
              STORE_NAME               1 (template)
              LOAD_CONST               2 (None)
              RETURN_VALUE
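The same compile-time merging can be observed with ordinary string literals on any Python 3 version. This is a small check that is not from the thread; it uses only the standard `ast` module and assumes that t-string literals follow the same parser path, as the `dis` output above suggests:

```python
import ast

# Adjacent literals are merged by the parser: the AST holds a single constant.
implicit = ast.parse('"foo " "bar"', mode="eval").body

# An explicit + stays in the AST as a binary operation, which is dispatched
# through __add__ (the compiler may fold constants later; the parser does not).
explicit = ast.parse('"foo " + "bar"', mode="eval").body

print(type(implicit).__name__, repr(implicit.value))
print(type(explicit).__name__)
```

Only the second form goes through `__add__`, so restricting `__add__` would leave implicit concatenation of literals untouched.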

Just trying to figure out t-string safety implications before building a database driver on top of it. If a user takes a field name from unsafe input and wants to build a query:

field = input()

currently they have to wrap it in a sql.Identifier object before being able to merge it into a query:

query = SQL("SELECT {field} FROM table WHERE id = %s").format(field=Identifier(field))
cur.execute(query, [id,])

If anyone forgot the Identifier() wrapper and passed field=field, the field name would be inserted as a harmless, well-escaped string literal: SELECT 'field_name' FROM table....

With t-strings the same operation could be, for example:

cur.execute(t"SELECT {field:i} FROM table WHERE id = {id}")

This is safe for the same reason as above: if anyone forgot the :i “identifier” format, the field name would be passed as a literal.

But someone can write too easily:

cur.execute(t"SELECT " + field + t" FROM table WHERE id = {id}")

which is a door open to injection. This is not worse than just using strings to compose queries, which we have actively discouraged for years, but definitely not better and definitely worse than the safety we currently have in place thanks to the psycopg.sql objects.
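A minimal sketch of why the concatenated form is dangerous, runnable on any Python 3. `Tmpl`, `Interp` and `render` are hypothetical stand-ins (not the PEP 750 classes and not psycopg code), and turning every interpolation into a `%s` parameter is a simplification of what a real driver would do:

```python
from dataclasses import dataclass
from typing import Any, List, Tuple, Union


@dataclass(frozen=True)
class Interp:
    """Hypothetical stand-in for an interpolation: data the driver must escape."""
    value: Any


@dataclass(frozen=True)
class Tmpl:
    """Hypothetical stand-in for a template: trusted text plus interpolations."""
    parts: Tuple[Union[str, Interp], ...]

    def __add__(self, other):
        # Mirrors the behaviour under discussion: a plain str is absorbed
        # as trusted static text.
        if isinstance(other, str):
            return Tmpl(self.parts + (other,))
        if isinstance(other, Tmpl):
            return Tmpl(self.parts + other.parts)
        return NotImplemented


def render(t: Tmpl) -> Tuple[str, List[Any]]:
    """Static text goes into the query verbatim; interpolations become parameters."""
    sql, params = [], []
    for part in t.parts:
        if isinstance(part, Interp):
            sql.append("%s")
            params.append(part.value)
        else:
            sql.append(part)
    return "".join(sql), params


field = "1; DROP TABLE users; --"  # pretend this came from input()

# Interpolated: the value travels as a parameter, never as SQL text.
safe = render(Tmpl(("SELECT ", Interp(field), " FROM t WHERE id = ", Interp(42))))

# Concatenated: the value is indistinguishable from the trusted query text.
unsafe = render(Tmpl(("SELECT ",)) + field + Tmpl((" FROM t WHERE id = ", Interp(42))))

print(safe)
print(unsafe)
```

In the second call the consumer has no way to tell that part of the static text came from user input, which is the core of the concern.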

I am afraid this veers towards being a deal breaker.

MegaIng · May 1, 2025, 2:03am

It was mentioned here: PEP 750: Tag Strings For Writing Domain-Specific Languages - #196 by jimbaker

And has been discussed in some depth starting here: PEP 750: Tag Strings For Writing Domain-Specific Languages - #207 by nhumrich

With a resolution of this subdiscussion AFAICT at PEP 750: Tag Strings For Writing Domain-Specific Languages - #226 by dkp

The final resulting note in the rejected ideas section doesn’t mention security concerns: PEP 750 – Template Strings | peps.python.org

It also doesn’t distinguish between implicit literal-only and explicit concatenation.

dvarrazzo · May 1, 2025, 2:26am

Thank you very much for the references. Much appreciated.

dvarrazzo · May 1, 2025, 2:49am

So, the PEP says:

In the end, we decided that the surprise to developers of a new string type not supporting concatenation was likely to be greater than the theoretical harm caused by supporting it. (Developers concatenate f-strings all the time, after all, and while we are sure there are cases where this introduces bugs, it’s not clear that those bugs outweigh the benefits of supporting concatenation.)

This seems pretty misguided to me. Developers concatenate f-strings and strings all the time because… they are exactly the same, strings, expressed in two different syntactic forms. Strings and templates are two different objects.

it’s not clear that those bugs outweigh the benefits of supporting concatenation

It is dramatically clear to me.

AA-Turner · May 1, 2025, 2:59am

As a process note, the PEP has been accepted as-is, including explicit (+) concatenation. The reference implementation has now been merged, and will be released in 3.14.0 beta 1, on Tuesday.

If you have severe concerns, the governance authority that can sanction removing + concatenation is the Steering Council. I would suggest, should you wish to pursue this, writing up a detailed summary of the concerns with explicit concatenation in an issue on the Steering Council repository, requesting a decision from them. That is the most productive course of action given the context.

A

dvarrazzo · May 1, 2025, 3:28pm

Thank you very much, @AA-Turner. I have opened an issue with the Steering Council:

github.com/python/steering-council

PEP 750: danger with explicit concatenation

opened 03:26PM - 01 May 25 UTC

dvarrazzo

Dear Steering Council,

to introduce myself, I have been involved in the [Psycopg project](https://www.psycopg.org/) (the de facto standard PostgreSQL driver for Python) since 2005, I have been the main maintainer of psycopg2 since 2010, and, in 2020, I designed and implemented [Psycopg 3](https://www.psycopg.org/psycopg3/docs/), in order to use new Python features (typing, async), making new choices based on the experience gathered in the previous 15 years.

As you can imagine, one of the main preoccupations of our project is safety: how to enable end users to craft any statement they need to execute on a database while guarding them from unsafe input. Therefore, we have always stressed [using the best safety practices](https://www.psycopg.org/psycopg3/docs/basic/params.html) when dealing with untrusted user input.

I have recently come across the efforts of [PEP 750](https://peps.python.org/pep-0750/) to provide a template string object in Python, and I found it extremely fitting for the project. A few days ago I have implemented [a first version](https://github.com/psycopg/psycopg/pull/1054) of template strings execution, and I must say that it is the most important and refreshing change I have seen in the language, positively affecting our project.

However, during [a discussion](https://discuss.python.org/t/pep-750-please-add-template-join/90202/1) about whether to include the implementation of a `Template.join()` method (which would be [very desirable](https://github.com/psycopg/psycopg/blob/646f2023b24744bd32a96d1a621e8fed8cc48409/tests/test_tstring.py#L171-L187) in my opinion, but this is a different matter), it dawned on me that [explicit string concatenation](https://peps.python.org/pep-0750/#template-string-concatenation) is a very dangerous feature, and it pretty much defeats entirely the safety that the PEP declares as being one of the design goals.

Because templates and strings can be concatenated without any safety check, it is very easy to include insecure input in a template. Taking the same example from the [PEP's Motivation](https://peps.python.org/pep-0750/#motivation) section:

```python
evil = "<script>alert('evil')</script>"
template = t"<p>" + evil + t"</p>"
assert html(template) == "<p><script>alert('evil')</script></p>"  # Will fail
```

The hypothetical `html()` function will receive a `Template` object on which _it cannot put any trust_. The design of the Template object _makes any insecure input string instantly secure_.

Please note that this design goes pretty much in the opposite direction of the [`LiteralString`](https://typing.python.org/en/latest/spec/literal.html#literalstring) defined in [PEP 675](https://peps.python.org/pep-0675/): concatenating a safe `LiteralString` and an unsafe `str` produces an unsafe `str`. This is bizarre:

- with a `LiteralString`, anything safe, when it comes in contact with something unsafe, becomes unsafe.
- with template strings, anything unsafe, when it comes in contact with something safe, becomes safe!

This, I am afraid, is not a well-thought-out design from the safety point of view.

`LiteralString`, in our project, is so relevant that [it is actually the only string accepted](https://www.psycopg.org/psycopg3/docs/advanced/typing.html#checking-literal-strings-in-queries) by the `execute()` function: we [define our `Query` type](https://github.com/psycopg/psycopg/blob/b799f50ae4905539037cdeabcf571441d62ec0fc/psycopg/psycopg/abc.py#L29) as:

```python
Query: TypeAlias = Union[LiteralString, bytes, sql.SQL, sql.Composed]
```

What we require is for a statement to be either a literal string or to have been produced by composition of the [`psycopg.sql`](https://www.psycopg.org/psycopg3/docs/api/sql.html) family of objects, which are designed to compose the different parts of a SQL statement employing the correct escaping method (and whose use would become largely marginal with a good template string solution).

While [there isn't widespread support](https://github.com/python/mypy/issues/12554) for this feature yet in type checkers, this is our formal requirement for a query, and even if, every now and then, [some user is confused by linter errors](https://github.com/psycopg/psycopg/issues?q=is%3Aissue%20LiteralString), it is no problem to explain our design.

I have [looked for information](https://discuss.python.org/t/pep-750-disallow-str-template/90281/1) about the rationale of the current design and I have been [kindly provided some references](https://discuss.python.org/t/pep-750-disallow-str-template/90281/2). There were [some discussions](https://discuss.python.org/t/pep-750-tag-strings-for-writing-domain-specific-languages/60408/207) mixing explicit and implicit concatenation, with a [final resolution](https://discuss.python.org/t/pep-750-tag-strings-for-writing-domain-specific-languages/60408/226) stating:

> Added full support for both explicit and implicit concatenation. `template+template`, `template+str`, and `str+template` are all supported. Concatenation always results in a `Template`.
>
> In the end, we decided the arguments in favor of allowing concatenation outweighed the potential disadvantages. We’ve updated the “rejected ideas” section of the PEP to describe this.

There are indeed [explanations in the PEP](https://peps.python.org/pep-0750/#disallowing-string-concatenation) stating:

> In the end, we decided that the surprise to developers of a new string type not supporting concatenation was likely to be greater than the theoretical harm caused by supporting it. (Developers concatenate f-strings all the time, after all, and while we are sure there are cases where this introduces bugs, it’s not clear that those bugs outweigh the benefits of supporting concatenation.)

This statement is misguided. People can concatenate f-strings and normal strings without a problem because 1) they are the same type and 2) there is no safety semantics behind `str`. The type of bug that can be caused by disallowing `str + Template` is an immediate `TypeError`; the type of bug that can be caused by allowing it is a safety bug.

Using the current design, accepting a template string in a query cannot be considered safe. We are back to the point of people being able to compose queries such as:

```python
name = input()
cur.execute(t"INSERT INTO names VALUES ('" + name + "')")
```

and no runtime or static checker should have any problem with it.

This is less safe than `LiteralString` or `sql.SQL` objects, which require an active action from the user to allow a `str` to be part of a statement, signifying that the author has taken their measures to prevent problems:

```python
snip: str
cur.execute(sql.SQL("SELECT * FROM table WHERE ") + sql.SQL(snip))  # I know what I am doing
cur.execute("SELECT * FROM table WHERE " + cast(LiteralString, snip))  # I know what I am doing
cur.execute(t"SELECT * FROM table WHERE " + Template(snip))  # I know what I am doing
cur.execute(t"SELECT * FROM table WHERE " + snip)  # This might be an error
```

This goes in the opposite direction of where we want to go, in terms of safety. Therefore, we cannot, in our conscience, allow the use of template strings as query input, and, [despite the initial enthusiasm](https://github.com/psycopg/psycopg/discussions/1044), we will prefer to not merge the feature.

I understand that [the template string branch](https://github.com/python/cpython/pull/132662) was merged to the Python 3.14 branch only yesterday; version 3.14a7 didn't include the feature, and 3.14b1 is due to be released in a few days, after which no change would be accepted. I believe we are still in time to fix this design.

Thank you very much.

-- Daniele

dkp · May 1, 2025, 3:35pm

Thank you for bringing this to our attention, @dvarrazzo

After discussion, we agree that this footgun should be eliminated by removing Template.__radd__ and removing support for Template + str in Template.__add__. We will provide PRs for both the PEP and for cpython and propose their adoption to the SC.

A simple argument in favor:

One way to look at Template is that it’s a language-provided tool for tracking trusted and untrusted parts of strings.
We don’t know whether we should trust an arbitrary str. But __add__ in the current spec always treats arbitrary strings as trusted. That seems wrong.

The change is restrictive, rather than additive, from the current spec.

If we make this change:

developers can still concatenate by first marking their string as trusted with Template(my_str), or marking as untrusted with Template(Interpolation(my_str))
we will continue to support Template + Template, which is always safe
we will continue to support implicit str + Template and Template + str since here the str is a presumed-safe literal (and, as you note, goes through a different mechanism).
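The restricted behaviour listed above can be sketched with stand-in classes that run on any Python 3. `StrictTmpl` and `Interp` are hypothetical names used only for illustration; they are not the `string.templatelib` types and do not claim to match the eventual CPython implementation:

```python
from dataclasses import dataclass
from typing import Any, Tuple, Union


@dataclass(frozen=True)
class Interp:
    """Hypothetical stand-in for an interpolation (untrusted data)."""
    value: Any


@dataclass(frozen=True)
class StrictTmpl:
    """Hypothetical stand-in for a template that refuses bare strings."""
    parts: Tuple[Union[str, Interp], ...]

    def __add__(self, other):
        # Only template + template is accepted; a bare str is rejected.
        if isinstance(other, StrictTmpl):
            return StrictTmpl(self.parts + other.parts)
        return NotImplemented

    # No __radd__ is defined, so str + StrictTmpl fails with TypeError too.


head = StrictTmpl(("SELECT * FROM t WHERE ",))
snip = "name = 'x'"

try:
    head + snip
except TypeError as exc:
    print("rejected:", exc)

# The developer has to state which of the two meanings is intended.
trusted = head + StrictTmpl((snip,))            # "I trust this text"
untrusted = head + StrictTmpl((Interp(snip),))  # "treat this as data"
```

The failure mode for a forgotten wrapper becomes an immediate `TypeError` instead of a silently trusted string.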

I went back through the full discussion history (thanks @MegaIng!) and could find no examples that would be negatively impacted by these changes.

But if, for some reason, this update is not adopted, where does that leave us? You say:

Daniele Varrazzo:

But someone can write too easily:
cur.execute(t"SELECT " + field + t" FROM table WHERE id = {id}")
which is a door open to injection. This is not worse than just using strings to compose queries, which we have actively discouraged for years, but definitely not better and definitely worse than the safety we currently have in place thanks to the psycopg.sql objects.

Maybe I’m missing something, but doesn’t the same footgun exist with SQL()? That is, can’t I write query = SQL("SELECT " + field + "FROM table ...") and break security guarantees there, too?

dkp · May 1, 2025, 3:37pm

Ah, looks like we crossed streams. I’ll comment on that issue, too. Thanks!

dvarrazzo · May 1, 2025, 3:46pm

Hello Dave,

This is great news! Thank you very much for considering this improvement

The input of sql.SQL is a LiteralString, therefore a good type checker would pick up on it. Currently, I understand, Mypy doesn’t, but Pyre and Pyright do.

It’s not perfect, and it cannot be checked at runtime, but we do what we can…
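A small sketch of what that static check looks like. `run_query` and `untrusted_field` are hypothetical stand-ins, not psycopg functions, and the fallback import is only there so the example also runs on interpreters older than 3.11:

```python
from typing import cast

try:
    from typing import LiteralString  # Python 3.11+
except ImportError:
    LiteralString = str  # type: ignore[misc,assignment]  # fallback for older versions


def run_query(query: LiteralString) -> str:
    """Hypothetical driver entry point that is annotated to accept only literals."""
    return query


def untrusted_field() -> str:
    """Hypothetical source of user input; its result is typed as plain str."""
    return "name"


field = untrusted_field()

# Built only from literals: accepted by a checker that implements LiteralString.
ok = run_query("SELECT name FROM t")

# literal + str is a plain str: such a checker reports an error on this call,
# yet nothing stops it at runtime.
flagged = run_query("SELECT " + field + " FROM t")

# An explicit cast is the "I know what I am doing" opt-in.
opted_in = run_query("SELECT " + cast(LiteralString, field) + " FROM t")
```

All three calls run identically; the difference exists only for the type checker, which is the limitation described above.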

dkp · May 1, 2025, 3:50pm

Sure. I suppose tools could provide a similar lint rule for t-string concat and, as currently spec’d, t-strings would be effectively no worse (but also no better!) than sql.SQL.

dvarrazzo · May 1, 2025, 3:54pm

AFAICS Template.strings should be a tuple[LiteralString, ...], yes.