PEP 750: disallow str + Template

Hello, it’s still Daniele here, the guy who is making Python talking with Postgres since 2010 :waving_hand:

The discussion PEP 750: Please add Template.join() was based on the idea that if Template + str and str + Template are deemed safe operation, then an eventual t"".join() would be safe to implement on top of them.

Turns out that Template + str is an insecure operation, allowing to authorize whatever unsafe input:

>>> evil = "<evil>"
>>> t"<good>" + evil
Template(strings=('<good><evil>',), interpolations=())

What was the rationale to allow Template.__add__() to accept a string? Is there any reference?

Can this footgun be defused before releasing the feature in Python 3.14?

It doesn’t seem -as proposed in the other thread- related to implicit concatenation of string and t-string literals, dis says that it’s handled by the parser and doesn’t seem to need __add__:

>>> dis.dis("""
... template = t"foo {name} bar"  "baz"
... """)
  0           RESUME                   0

  2           LOAD_CONST               3 (('foo ', ' barbaz'))
              LOAD_NAME                0 (name)
              LOAD_CONST               1 ('name')
              BUILD_INTERPOLATION      2
              BUILD_TUPLE              1
              BUILD_TEMPLATE
              STORE_NAME               1 (template)
              LOAD_CONST               2 (None)
              RETURN_VALUE

Just trying to figure out t-string safety implications before building a database driver on top of it. If an user takes a field name from unsafe input and wants to build a query:

field = input()

currently they have to use a sql.Identifier object before being able to merge it to a string:

query = SQL("SELECT {field} FROM table WHERE id = %s").format(field=Identifier(field))
cur.execute(query, [id,])

If anyone forgot the Identifier() wrapper and passed field=field, the field name would be inserted as harmless and well escape string literal: SELECT 'field_name' FROM table....

With t-strings the same operation could be, for example:

cur.execute(t"SELECT {field:i} FROM table WHERE id = {id}")

This is safe for the same reason as above: if anyone forgot the :i “identifier” format, the field name would be passed as literal

But someone can write too easily:

cur.execute(t"SELECT " + field + t" FROM table WHERE id = {id}")

which is a door open to injection. This is not worse than just using strings to compose queries, which we have actively discouraged for years, but definitely not better and definitely worse than the safety we currently have in place thanks to the psycopg.sql objects.

I am afraid this veers towards being a deal breaker.

17 Likes

It was mentioned here: PEP 750: Tag Strings For Writing Domain-Specific Languages - #196 by jimbaker

And has been discussed in some depth starting here: PEP 750: Tag Strings For Writing Domain-Specific Languages - #207 by nhumrich

With a resolution of this subdiscussion AFAICT at PEP 750: Tag Strings For Writing Domain-Specific Languages - #226 by dkp

The final resulting note in the rejected ideas section doesn’t mention security concerns: PEP 750 – Template Strings | peps.python.org

It also doesn’t distinguish between implicit literal-only and explicit concatenation.

2 Likes

Thank you very much for the references. Very appreciated.

So, the pep says:

In the end, we decided that the surprise to developers of a new string type not supporting concatenation was likely to be greater than the theoretical harm caused by supporting it. (Developers concatenate f-strings all the time, after all, and while we are sure there are cases where this introduces bugs, it’s not clear that those bugs outweigh the benefits of supporting concatenation.)

This seems pretty misguided to me. Developers concatenate f-strings and strings all the time because… they are exactly the same, strings, expressed in two different syntactic forms. Strings and templates are two different objects.

it’s not clear that those bugs outweigh the benefits of supporting concatenation

It is dramatically clear to me.

7 Likes

As a process note, the PEP has been accepted as-is, including explicit (+) concatenation. The reference implementation has now been merged, and will be released in 3.14.0 beta 1, on Tuesday.

If you have severe concerns, the governance authority that can sanction removing + concatenation is the Steering Council. I would suggest, should you wish to pursue this, writing up a detailed summary of the concerns with explicit concatenation in an issue on the Steering Council repository, requesting a decision from them. That is the most productive course of action given the context.

A

8 Likes

Thank you very much, @AA-Turner. I have opened an issue to the Steering Council

2 Likes

Thank you for bringing this to our attention, @dvarrazzo

After discussion, we agree that this footgun should be eliminated by removing Template.__radd__ and removing support for Template + str in Template.__add__. We will provide PRs for both the PEP and for cpython and propose their adoption to the SC.

A simple argument in favor:

  1. One way to look at Template is that it’s a language-provided tool for tracking trusted and untrusted parts of strings
  2. We don’t know whether we should trust an arbitrary str. But __add__ in the current spec always treats arbitrary strings as trusted. That seems wrong.

The change is restrictive, rather than additive, from the current spec.

If we make this change:

  • developers can still concatenate by first marking their string as trusted with Template(my_str), or marking as untrusted with Template(Interpolation(my_str))
  • we will continue to support Template + Template, which is always safe
  • we will continue to support implicit str + Template and Template + str since here the str is a presumed-safe literal (and, as you note, goes through a different mechanism).

I went back through the full discussion history (thanks @MegaIng!) and could find no examples that would be negatively impacted by these changes.

But if, for some reason, this update is not adopted, where does that leave us? You say:

Maybe I’m missing something, but doesn’t the same footgun exist with SQL()? That is, can’t I write query = SQL("SELECT " + field + "FROM table ...") and break security guarantees there, too?

5 Likes

Ah, looks like we crossed streams. :slight_smile: I’ll comment on that issue, too. Thanks!

1 Like

Hello Dave,

This is great news! Thank you very much for considering this improvement :slight_smile:

The input of sql.SQL is a LiteralString, therefore a good type checker would pick up on it. Currently, I undersand, Mypy doesn,'t, but Pyre and Pyright do.

It’s not perfect, and it cannot be checked at runtime, but we do what we can…

2 Likes

Sure. I suppose tools could provide a similar lint rule for t-string concat and, as currently spec’d, t-strings would be effectively no worse (but also no better!) than sql.SQL.

AFAICS Template.strings should be a tuple[LiteralString, ...], yes.