Concerned about Virality of T-Strings (PEP 750)

More OT: Even implicit concatenation is a footgun. IIRC while at Google we had code containing long lists of comma-separated string literals (e.g. a list of countries) and devs would occasionally update the list but forget to add the comma. For this reason Blaze’s Starlark doesn’t support implicit concatenation at all.
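
For anyone who hasn't been bitten by this, here is a minimal sketch of the missing-comma bug described above. The country list is made up for illustration:

```python
# The comma after "Canada" was forgotten, so Python silently joins the
# two adjacent string literals into one list element.
countries = [
    "Brazil",
    "Canada"
    "Denmark",
    "Egypt",
]

print(len(countries))  # 3, not the intended 4
print(countries[1])    # CanadaDenmark
```

No error or warning is raised; the list is simply one element short.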


I don’t see how that’s any different here if a template can be concatenated at compile time with an f-string that might contain a user-provided value. That said, I also think it was always inherently unsafe for web apps to pass unrestricted user input anywhere before basic safety checks, unless the function was documented as safe for arbitrary input.


Revisiting older conversations about Template concatenation, my read is that there are roughly three different perspectives. From most to least permissive:

  1. The developer experience view. Reflected in the current PEP and implementation. A Template literal looks like any other (f-)string. Python developers expect string-like values to combine with other string-like values. Prohibiting explicit or implicit concatenation of Template and str would lead to needless surprise/frustration. (A variant holds that implicit concatenation can often be a footgun, so we’ll avoid extending it further.)

  2. The security view. Templates are a language level feature to help prevent injection vulnerabilities. Template.strings is synonymous with “trusted”; Template.interpolations with “untrusted”. Concatenating a Template with an arbitrary str is unsafe because we can’t know the str’s trust level, so the operation should be disallowed. Developers can mark a string as trusted with Template(my_str) or untrusted with Template(Interpolation(my_str)). The current PEP treats all str as trusted by default; from the security view, that is a clear footgun.

  3. The DSL view. Templates are building blocks for domain-specific languages. Interesting template processing code parses template content against some backing grammar. Some grammars allow concatenating conforming strings; others do not. As a result, even allowing Template + Template is probably a mistake. Code that processes a Template can instead return a domain type with __add__/__radd__ when concatenation is appropriate, or rely on other composition mechanisms.
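
To make the security view (2) concrete, here is a toy model of its rules. It deliberately does not use string.templatelib, so it runs on older Python versions; ToyTemplate, Untrusted, trusted() and untrusted() are hypothetical names invented for this sketch and are not the PEP 750 API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Untrusted:
    """Stand-in for an interpolation: a value a processor must escape."""
    value: str


@dataclass(frozen=True)
class ToyTemplate:
    """Stand-in for a template: plain str parts are trusted."""
    parts: tuple = ()

    def __add__(self, other):
        # Template + Template keeps every part's trust level intact.
        if isinstance(other, ToyTemplate):
            return ToyTemplate(self.parts + other.parts)
        # Anything else (including str) is refused, which Python turns
        # into a TypeError because str has no matching __radd__.
        return NotImplemented


def trusted(text: str) -> ToyTemplate:
    """Explicitly mark a string as trusted."""
    return ToyTemplate((text,))


def untrusted(text: str) -> ToyTemplate:
    """Explicitly mark a string as untrusted."""
    return ToyTemplate((Untrusted(text),))


user_input = "<script>"
page = trusted("<p>") + untrusted(user_input) + trusted("</p>")

try:
    page + user_input  # trust level of a bare str is unknown
    bare_str_allowed = True
except TypeError:
    bare_str_allowed = False
```

Under this model the caller has to state the trust level of every bare string before it can be combined with a template, which is the behavior option 2 argues for.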

I suppose these views aren’t so far apart: if you take a more restrictive view than what ships in 3.14, you can configure a lint rule. Likewise, if you take a more permissive view, you can write a tiny helper function. For what it’s worth, I personally align with bucket (2): I think it’s likely to strike the right balance for the 80% case and lead to the least overall dev confusion and the fewest grudgingly configured extra lint rules.


As the maintainer of MarkupSafe, I support your option 2, the security view.

MarkupSafe’s model is very similar. escape(str) always produces a processed string (t"str"), and Markup(str) produces a trusted string (Template(str)). Then all string operations on Markup escape their arguments.

  • Unsafe: str + str
  • Safe: escape(str)
  • Safe: escape(str) + str or str + escape(str), argument to __add__/__radd__ is automatically escaped
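
A rough stdlib-only sketch of that model, for readers who haven't used MarkupSafe. This is not MarkupSafe's actual implementation: SafeMarkup and toy_escape are hypothetical stand-ins for Markup and escape, and only + is handled here:

```python
import html


class SafeMarkup(str):
    """Hypothetical stand-in for a trusted-markup string type."""

    def __add__(self, other):
        # The right-hand argument is escaped unless it is already safe.
        return SafeMarkup(str(self) + str(toy_escape(other)))

    def __radd__(self, other):
        # Python tries this first for str + SafeMarkup, because SafeMarkup
        # is a str subclass that provides the reflected method.
        return SafeMarkup(str(toy_escape(other)) + str(self))


def toy_escape(value):
    """Return value unchanged if already safe, otherwise HTML-escape it."""
    if isinstance(value, SafeMarkup):
        return value
    return SafeMarkup(html.escape(str(value)))


name = "<script>"
right = toy_escape("Hello, ") + name  # name is escaped by __add__
left = name + toy_escape("!")         # name is escaped by __radd__
```

Covering only + is what makes this a toy: a real implementation also has to handle join, format, % and the rest of the str surface, which is the downside discussed next.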

The downside is MarkupSafe needs to implement most dunder and str methods in order to catch all the ways strings can be manipulated, and there are non-trivial aspects to that. That’s sidestepped in t-strings because Template is not a string class.

Again, I agree with option 2. I think you should modify the implementation to disallow Template + str and similar combinations, but keep Template + Template because it doesn’t lose any safe/unsafe information.


I am curious where this is going. It seems many people feel that implicit concatenation between Template and string is a footgun, and should be avoided. Is there a reason this isn’t being considered? How can I help progress this idea?

At least according to the PEP, security (preventing injection) seems to be the main motivator for PEP 750. So why is the security view (#2) not the primary view?

The linting argument is somewhat valid, but I would argue that you could already get all the guarantees that PEP 750 provides by using types (with Literal) and linting. If PEP 750 only exists to allow linters to catch possible injection, I feel it has missed the mark.


See the github issue.

You are not seeing any progress only because the Steering Council hasn’t made a ruling yet. They were occupied with PyCon and probably have more time now to make a decision in the coming week(s).


Thank you!