Concerned about Virality of T-Strings (PEP 750)

I know that PEP 750 is already accepted and it’s already in the Python beta, so there shouldn’t be any changes at this point; however, I am concerned with one specific feature of PEP 750 that I never fully realized until I started implementing it: the “string virality” portion.

Context: I am working on a SQL library for escaping SQL using t-strings (Python 3.14b1).

When writing my test cases, I was splitting my t-strings across lines so that they are more readable, for example:

sql_string = (t"SELECT id,name "
              t"FROM users "
              f"WHERE name={col}")

except… note that I accidentally put an f on the last line instead of a t. This caused the user-supplied value to become part of the “template” rather than an interpolation, and opened me up to SQL injection. The main idea with t-strings is that I can enforce at the user API level that I receive a Template (or some other object) and disallow raw strings in order to prevent injection.
If I were to accidentally pass sql(f"some string") to my library, I would receive a type error. However, if I accidentally append an f-string onto a t-string, I completely open myself up to SQL injection. No amount of typing or API design can prevent it.
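To make the failure mode concrete, here is a minimal sketch that runs on any Python 3 version. The `Interpolation` and `Template` classes are simplified, hypothetical stand-ins that only mimic the strings/interpolations shape of `string.templatelib`, and `sql()` is an illustrative function, not any real library's API. The "tainted" value is built by hand to mirror what this thread reports for 3.14b1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interpolation:
    """Stand-in for string.templatelib.Interpolation (hypothetical, simplified)."""
    value: object
    expression: str = ""


class Template:
    """Stand-in for string.templatelib.Template (hypothetical, simplified)."""

    def __init__(self, *parts):
        strings, interpolations, buffer = [], [], ""
        for part in parts:
            if isinstance(part, str):
                buffer += part  # adjacent static text is merged
            else:
                strings.append(buffer)
                interpolations.append(part)
                buffer = ""
        strings.append(buffer)
        self.strings = tuple(strings)
        self.interpolations = tuple(interpolations)


def sql(query):
    """Render a template as (text, params), using ? placeholders."""
    if not isinstance(query, Template):
        raise TypeError("sql() requires a Template, got " + type(query).__name__)
    text = query.strings[0]
    params = []
    for interpolation, following in zip(query.interpolations, query.strings[1:]):
        text += "?" + following
        params.append(interpolation.value)
    return text, tuple(params)


col = "x' OR '1'='1"

# Shape of t"SELECT id,name FROM users WHERE name={col}"
safe = Template("SELECT id,name FROM users WHERE name=", Interpolation(col, "col"))

# Shape of t"SELECT id,name FROM users " f"WHERE name={col}" as reported in
# this thread for 3.14b1: the f-string is formatted first and becomes static text.
tainted = Template("SELECT id,name FROM users " + f"WHERE name={col}")

safe_text, safe_params = sql(safe)
tainted_text, tainted_params = sql(tainted)
```

The type check rejects a plain f-string, but the tainted template passes it: the user value arrives inside the static text with no interpolation left to parameterize.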

I know this is correct behavior according to the spec. However, this feels like it defeats the security aspect of the PEP. I guess this could be solved with linters? But I wonder if it would be more correct to not allow t-strings to be concatenated with normal strings. Is it too late to do anything about this?

14 Likes

See PEP 750: disallow str + Template and the resulting PEP 750: restrict support for `Template` + `str` addition by davepeck · Pull Request #4395 · python/peps · GitHub

4 Likes

Apologies for missing earlier discussions. Thanks for the references.

That link has discussion about removing support for explicit addition, Template.__add__(str), but the mistake at the top of this thread was implicit concatenation in the parser, which is a different path. Did implicit concatenation also get removed?

3 Likes

Pinging @dkp

Good observation, thank you.

The amended version of the PEP suggests implicit concatenation between template and str, but it’s not immediately clear whether the expression resulting from an f-string would be considered such. I have added a comment referencing this thread.

This simple experiment with Python 3.14 b1 shows that the concatenation of a t-string with an f-string should fail:

>>> import dis
>>> dis.dis("""
... t"t1 {t2}" f"f1 {f2}"
... """)
  0           RESUME                   0

  2           LOAD_CONST               0 ('t1 ')
              LOAD_CONST               1 ('f1 ')
              LOAD_NAME                0 (f2)
              FORMAT_SIMPLE
              BINARY_OP               13 (+=)
              BUILD_TUPLE              2
              LOAD_NAME                1 (t2)
              LOAD_CONST               2 ('t2')
              BUILD_INTERPOLATION      2
              BUILD_TUPLE              1
              BUILD_TEMPLATE
              RETURN_VALUE

The f-string and the t-string are joined with a binary op +=, which the PEP authors are suggesting for removal. Therefore this would fail at runtime if the PEP amendment is accepted and __add__/__radd__ are removed.

A better iteration would be to make it fail at parsing time, but at least the danger exposed by this thread should already be defused by disallowing explicit concatenation.


Edit: actually, no. I misread the disassembly. The expression currently becomes part of the template.

We’re still waiting for SC response to our open GitHub issue.

The request in the issue is to remove Template + str (and str + Template) concatenation, but to retain implicit concatenation. This includes implicit concatenation with both string literal and f-string literal; either we keep both forms of implicit concatenation (and let linters complain about t"" f"") or we revisit and remove both cases (removing the plausibly useful and safe t"" "" case).
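As a rough illustration of the linter option, here is a deliberately naive, regex-based sketch that flags runs of adjacent literals mixing a t prefix with any other prefix. The function name and the regex are assumptions made for this example. It ignores triple-quoted strings, quotes nested inside {...} fields, comments, and backslash continuations, and it treats any whitespace-only gap as concatenation. A real linter would work on the token stream or AST instead.

```python
import re

# Naive single-line literal: optional 0-2 letter prefix, then a '...' or "..." body.
_LITERAL = re.compile(r"""(?<![A-Za-z0-9_])([A-Za-z]{0,2})(["'])(?:\\.|(?!\2).)*\2""")


def find_mixed_concatenation(source):
    """Return [(line, prefixes)] for adjacent literals mixing t and non-t prefixes."""
    findings = []
    run, run_line, pos = [], 0, 0

    def flush():
        has_t = any("t" in p.lower() for p in run)
        has_other = any("t" not in p.lower() for p in run)
        if len(run) > 1 and has_t and has_other:
            findings.append((run_line, list(run)))

    for match in _LITERAL.finditer(source):
        gap = source[pos:match.start()]
        if run and gap.strip() == "":
            run.append(match.group(1))  # only whitespace between the literals
        else:
            flush()
            run = [match.group(1)]
            run_line = source.count("\n", 0, match.start()) + 1
        pos = match.end()
    flush()
    return findings


SAMPLE = '''\
sql_string = (t"SELECT id,name "
              t"FROM users "
              f"WHERE name={col}")
ok = (t"a " t"b")
plain = ("a " "b")
'''

report = find_mixed_concatenation(SAMPLE)
```

On the sample, only the first statement is reported; the all-t and all-plain runs pass. Note that this sketch also flags t"" "", which matches the stricter of the two options above.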

3 Likes

This is the key point - it can’t be prevented.

At some point, it’s up to the developer to get their code right. We can’t hand-hold all the way to a final working product (it’s been tried many times, and we’re in the midst of an AI-driven attempt to do it, and most of us recognise all the limitations).

Linters can definitely help. Your library can also help by reporting that a t-string was passed with no substitutions,[1] or possibly even warn that a value was provided directly[2] and recommend that all values, even constants, should be specified by interpolation.
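A library-side check along these lines might look like the sketch below. It is duck-typed against the `.strings` and `.interpolations` attributes that PEP 750 templates expose, so `SimpleNamespace` objects stand in for real templates here. The function name and both regex heuristics are assumptions for illustration, and they will produce false positives and false negatives on real SQL.

```python
import re
from types import SimpleNamespace

# Heuristics (assumptions): a quoted literal, or a comparison against a bare number.
_QUOTED_LITERAL = re.compile(r"'[^']*'")
_NUMERIC_COMPARISON = re.compile(r"[=<>]\s*\d+")


def audit_template(template):
    """Return a list of human-readable warnings for a template-like object."""
    findings = []
    if not template.interpolations:
        findings.append("template has no interpolations; was an f-string used?")
    for static in template.strings:
        if _QUOTED_LITERAL.search(static) or _NUMERIC_COMPARISON.search(static):
            findings.append("value written directly into static text: %r" % static)
    return findings


# Stand-in for a template whose value ended up in the static text.
suspicious = SimpleNamespace(
    strings=("SELECT id,name FROM users WHERE name='bob'",),
    interpolations=(),
)

# Stand-in for a template where the value is a real interpolation.
clean = SimpleNamespace(
    strings=("SELECT id,name FROM users WHERE name=", ""),
    interpolations=(object(),),
)

suspicious_findings = audit_template(suspicious)
clean_findings = audit_template(clean)
```

A library could surface these findings through `warnings.warn` or raise on them in a strict mode; which one is appropriate depends on how often constants legitimately appear in queries.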

Not at all, because no matter what approach is taken here developers can still write bugs and those bugs may cause their applications to be exploitable.

With my PSRT hat on, if this was reported as a “security issue” in Python we’d reject it immediately and point towards the application that has the bug. There’s nothing inherently insecure about allowing implicit string concatenation here, since it’s well-defined.[3]

The entire point of a programming language is to give developers ways to write bugs. Without that ability, it wouldn’t be a language - it would be a finished product.

That said, it’s not too late to change it. But the only change I’d suggest is to disable concatenation entirely (and hence break line wrapping as you started from) until we can figure out how to do it safely (this is what we did for backslashes in f-strings and it worked out perfectly). But since there really aren’t many alternatives here, and they aren’t mutually exclusive, I don’t think we gain anything by buying more time.


  1. Which I know only works for this particular example, but it helps as much as anything else we might do. ↩︎

  2. Since you have the semantic understanding of a SQL statement, but Python does not, so you know what a constant is. ↩︎

  3. Though personally I’m always a bit uncomfortable with implicit string concatenation. Still, the same issue exists with explicit concatenation, so that’s irrelevant to the issue. ↩︎

7 Likes

This seems like perhaps the only remaining footgun, and one with very little usefulness to boot. I’m a little baffled by the opposition. In what other ways can you imagine a user accidentally intermixing strings with t-strings with ease, provided that the SC approves dropping +? Which is to say, why is this the precise point where the developer must “get their code right”? Unless someone can provide a very good reason for being able to “cross” the string boundary syntactically, why not just get rid of both explicit and implicit concatenation? I also fail to see what’s to be gained from removing implicit concatenation of t-strings themselves.

19 Likes

Confirmed that it’s on our agenda, but we’ve obviously taken a couple of weeks off for PyCon.

6 Likes

I think implicit concatenation between two t-strings is still useful, but I agree that implicit concatenation between plain string literals and t-string literals is questionable too. Although technically safe, it creates a point of potential confusion, where people are going to ask why you can implicitly concatenate a plain string, but not an f-string.

Rather than create this potential confusion, just make the rules the same as for the + operator, or as for implicit concatenation of bytes literals: disallow mixing types. I.e. you can’t mix strings and bytes, so you also shouldn’t be able to mix strings and templates. That’s much easier to explain to people than “f-strings may be insecure”.
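For comparison, the existing str/bytes rule can be checked on any Python 3 version with `compile()`. The helper name below is just for this example.

```python
def is_syntax_error(source):
    """Return True if the expression fails to compile with a SyntaxError."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return True
    return False


mixed = is_syntax_error('"text " b"bytes"')   # str next to bytes
all_str = is_syntax_error('"text " "more"')   # str next to str
all_bytes = is_syntax_error('b"a" b"b"')      # bytes next to bytes
```

Mixing the two kinds is rejected at compile time, while same-kind runs are accepted, which is the behavior being proposed for t-strings next to non-t-strings.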

This also avoids another potential footgun, although this one has little to no security implications and already exists for f-strings:

foo = 1
bar = 2
t'{foo}' '{bar}'  # foo is interpolated, but bar is not

15 Likes

Is this actually consistent with how things have been treated though? int was broken in a patch release because of web applications not doing safe user input handling that prevented pathological exploitable cases in their app. I personally think that was a mistake, and that it should have been fixed in the app, because the old behavior was consistent with the language’s goal of having arbitrary sized ints (maybe that’s a mistake, but that’s neither here nor there for this).

However, part of the goals of t-strings are to have safe templating not subject to the issues of using normal strings as templates. Implicit concatenation seems at odds with the goal. This seems worse than something it was deemed reasonable to break in a patch release.

I would feel much better about changing this now, before it’s in a stable release, than exposing users to this footgun, and I certainly don’t want a repeat of breaking a builtin later on.

12 Likes

Hm. I agree with the OP that t"" f"" is harmful, and I feel that t"" "" is also harmful – IMO it’s just as easy to forget the ‘t’ as it is to type ‘f’ instead.

I would be happier if implicit concatenation of t"" with a non-t-string was forbidden. Making you write t"" t"" feels safer, and no burden.

I actually feel more comfortable with keeping t"" + f"" which looks more likely to be intentional, though I don’t feel strongly about it.

28 Likes

I agree that implicit concatenation is a footgun, and easily resolved by adding a t prefix to your literal strings, but addition has the benefit of brevity when you’ve got string variables. If we lose concatenation, t'my {template} str' + my_str has to become appendstr(t'my {template} str', my_str) or something of the sort.

My personal preference would be: Disable implicit concatenation, and let linters warn about explicit concatenation with literals.

While we haven’t discussed this as a SC yet, my personal take is that it would be fine to be conservative in our initial t-strings implementation and disallow implicit concatenation of t"" and any other form of strings. If that means this text from the PEP:

The Template type supports the __add__() and __radd__() methods between two Template instances and between a Template instance and a str .

gets shorter and we disallow Template + str add/radd in 3.14… Is that really a problem?

Loosening this restriction in the future is easy: something that was a SyntaxError before would simply no longer be one (like the parentheses around except types in PEP 758).

Tightening such a restriction after we’ve shipped the feature is not… as code would exist that depends on it not being a SyntaxError.

food for thought.

23 Likes

It can be Template(*(t'my {template} str'), my_str), which is practical enough to handle for people writing a templates handler (e.g. the author of a sql or html library), and impractical enough for users of such libraries to write it accidentally.
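A sketch of that approach as a helper follows. It uses the real `string.templatelib` when available (Python 3.14+) and otherwise falls back to simplified stand-in classes so it still runs on older versions. The stand-ins and the `append_str` name are assumptions for illustration, and the base template is built through the constructor rather than a t-string literal so the file parses everywhere.

```python
try:
    from string.templatelib import Interpolation, Template  # Python 3.14+
except ImportError:
    class Interpolation:
        """Simplified stand-in (hypothetical) for older Pythons."""

        def __init__(self, value, expression=""):
            self.value = value
            self.expression = expression

    class Template:
        """Simplified stand-in (hypothetical) for older Pythons."""

        def __init__(self, *parts):
            strings, interpolations, buffer = [], [], ""
            for part in parts:
                if isinstance(part, str):
                    buffer += part  # adjacent static text is merged
                else:
                    strings.append(buffer)
                    interpolations.append(part)
                    buffer = ""
            strings.append(buffer)
            self.strings = tuple(strings)
            self.interpolations = tuple(interpolations)

        def __iter__(self):
            # Yield non-empty strings and interpolations in order.
            for index, static in enumerate(self.strings):
                if static:
                    yield static
                if index < len(self.interpolations):
                    yield self.interpolations[index]


def append_str(template, suffix):
    """Return a new template with a trusted str appended as static text."""
    if not isinstance(suffix, str):
        raise TypeError("suffix must be a str")
    return Template(*template, suffix)


template_value = "x"
# Same shape as t'my {template_value} str'
base = Template("my ", Interpolation(template_value, "template_value"), " str")
combined = append_str(base, " and a suffix")
```

The appended text becomes static template text, so this helper should only ever receive trusted strings; that is the part a library author takes responsibility for.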

2 Likes

We’re getting off topic, but the difference here is that int is used deep in the standard library in places that users can’t possibly patch or protect against, and also the int restriction requires a per-app option to disable it, while compile-time string concatenation does not.

A user who writes a bug with implicit string concatenation can fix it directly. A user whose application breaks because one of their libraries calls a standard library function that needs an arbitrary limit applied but doesn’t, cannot. (Yes, I know that’s a complicated sentence. The reasoning is complicated, which is why it doesn’t apply directly here.)

This sounds fine to me. I think concatenation is important, but implicit concatenation less so.

Agreed. It became an unexpected mini-footgun with f-strings as well, because auto-formatters either needed to gain complexity/risk and become accurate parsers of f-string syntax and Python expressions :weary_cat: or just give up on ever reflowing string contents across an f-string boundary.

I like implicit concatenation. It’s just a lot more complicated once “strings” potentially participating have internal meanings.

3 Likes

That text relates to explicit concatenation, not implicit. The part to change is immediately before it:

Python’s implicit concatenation syntax is also supported. The following code will work as expected:

name = "World"
assert (t"Hello " t"World").strings == ("Hello World",)
assert ("Hello " t"World").strings == ("Hello World",)

Which either becomes “…is not supported…” or “…is only supported when all literals have a t prefix…” (and the example updated appropriately).

2 Likes