PEP 750: Tag Strings For Writing Domain-Specific Languages

+1 on this idea. It seems like a nice way of making lazy evaluation available, without making it mandatory. My only question is what would this look like to the renderer? Choosing whether or not to use !() is a decision made by the caller, so renderers need to be able to handle both possibilities without doing any explicit checks, otherwise there’s a really nasty coupling between the interface and the implementation.

3 Likes

(I just wanna say that I appreciate that my desire to have the discussion/PEP mention editor support has blossomed, after worrying that the lazy/eager discussion might take the entire stage (rather than the center stage). I hope this means we can have something that is the best of both worlds (at the risk of derailing into neither of either world). Thank y’all for discussing/considering this aspect as well :heart:)

Processing conversion specifiers is already up to template renderers in PEP 750 (they receive it as a string, same as the format specifier). PEP 501 currently evaluates them eagerly, but (assuming @nhumrich approves), I plan to amend it to follow PEP 750 in this regard (but relax the syntax to allow more than just a, r, and s).

To help with conventional processing, PEP 501 is going to propose that the format built-in gain lazy conversion support by accepting another optional parameter for the conversion specifier. I’m also inclined to offer a standard conversion API somewhere but am still considering potential spellings (adding operator.convert_field is my current thought, since a static method on types.TemplateLiteralField would be annoying to access when using the structural types rather than the concrete ones). This idea would be equally applicable to PEP 750.
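For illustration, here’s a minimal sketch of what such a conversion helper could look like. The standalone convert_field name and signature are assumptions based on the discussion above, not a finalized API; the s/r/a behaviour matches what f-strings do today (and what string.Formatter.convert_field already implements in the stdlib):

```python
# Hypothetical helper mirroring the proposed operator.convert_field.
# The s/r/a cases match existing f-string conversion semantics.
def convert_field(value, conversion=None):
    if conversion is None:
        return value
    if conversion == "s":
        return str(value)
    if conversion == "r":
        return repr(value)
    if conversion == "a":
        return ascii(value)
    raise ValueError(f"Unknown conversion specifier {conversion!r}")

print(convert_field("café", "a"))  # prints 'caf\xe9'
```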

1 Like

Actually, yeah, this is pretty nice. While broadly I’m not a fan of type annotations the way they’ve turned out, I am a big fan of inferring “types” from usage, so using “it gets assigned to an HTML specific type” or “it gets passed to a function that expects an HTML type” as the signal for an editor to treat it as HTML works for me.

But that now suggests that we may have separate typing and language PEPs - the first providing a way for editors to recognise DSLs stored in string literals, and the second providing a way for the language to capture interpolations without immediately formatting them. It does appear to nullify the motivation of PEP 750, but I think we’re basically okay with “f-strings but generic” as a motivation anyway.

Strictly speaking, you wouldn’t tie the two PEPs together. But using the same “hypothetical” SQL statement function in both, to allow tagging a string literal as “this is a SQL statement” and to allow substitutions “following SQL statement rules”, ought to make it pretty clear that the most value comes from having both together.

So I’m leaning towards +1 on the “one generic interpolation for strings and arbitrary specific types for string literals” idea(s). Interested to hear if there are other concerns though (apart from the “harder for a simple highlighter to handle”, which I’m sympathetic towards, but I expect that would be the case anyway for any format other than copying Markdown - the quoting rules alone will make it incredibly hard to have “simple” highlighting for any structured string literal).

4 Likes

I have a negative take on the annotation-based Template[L] proposal, essentially because:

It should also be emphasized that Python will remain a dynamically typed language, and the authors have no desire to ever make type hints mandatory, even by convention.

PEP 484 - Type Hints - Rationale (emphasis original)

(Not that such a statement is a guarantee, but I strongly agree with the sentiment.[1])

I really like the look of it, and it seems very ergonomic. A feature that requires annotation syntax in a library is one thing, but adding one to the core language would be something else entirely, and a trend I do not like.

It’s very possible I’ve misunderstood or overlooked a relevant part of the proposal, in which case please accept my apologies for the digression.


  1. I use type hints in all Python I write and have for years, but I feel the optionality is important. ↩︎

2 Likes

First, I’m camping this week, and I’ve only had intermittent internet access. This is probably a good thing for my vacation, however!

It’s not required to specialize Template, but it’s possible to do so. In addition, the actual argument that is passed to the template function is only an object that supports the Template protocol as specified in the typing module; Template is not the concrete type. We have deliberately kept this separate so it’s possible to use type annotations (such as for structural typing) if desired via protocols like Interpolation and Decoded, but it’s not required.

Lastly, while I think a definition like

def html(template: Template[HTML]) -> HTML:
    ...

is aesthetically pleasing in its symmetry, there’s no requirement that this be done, nor the specific Annotation setup that was used.

In the PEP update, we will hopefully address this by describing what is required to use templates (which is fairly minimal), and by sketching out best practices for our goal of working better with DSLs.

Our goal in all of this is to provide a delightful developer experience, so hopefully we can continue to work towards this.

4 Likes

Thank you for your response Jim! That addresses my concern and I look forward to following the updates. :slightly_smiling_face:

Enjoy your vacation!

The work you’ve all been doing in that regard has been wonderful, including the advocacy for the value of delayed interpolation in general.

I suspect once your PEP 750 update and my PEP 501 updates are done, the main difference between the two PEPs (aside from cosmetic naming details) will just be whether we say “let’s start with t-strings now, and consider tagged strings later after we have more experience with t-strings[1]” (PEP 501) or “let’s just go with tagged strings immediately” (PEP 750).


  1. The ironic echoes between the PEP relationship here and the one between PEP 498 and PEP 501 several years ago are not lost on me. History doesn’t repeat, but it does rhyme :slight_smile: ↩︎

4 Likes

Hi everybody, thanks for all the comments. We catalogued everything, talked it over, and made some decisions about changes. To make it easier: here’s an explainer. Next up: update the PEP.

Don’t wanna click? Here’s the top-line.

  • No more lazy evaluation of interpolations
  • No more tag functions, no more “tag”
  • Instead, a single t prefix which passes a Template to your function
  • Template.source, which holds the full string
  • A normative section on how to signal the DSL to tools
  • Better examples and explanations of the need
22 Likes

Apologies for the tangent, but where does this word “explainer” originate? My English language dictionary says it’s a person who explains (a commentator or interpreter), but that doesn’t seem to fit with your usage of the term.

I think it can also mean a document that explains something. See the second definition in explainer - Wiktionary, the free dictionary.

4 Likes

Nice! I had already noted in PEP 501 that it and PEP 750 had become very close in how they would work (since we adopted most of the design after 750 was published), and this largely brings the syntactic proposals together.

I’ll rework it to focus on the remaining major difference, which is that PEP 501 goes a bit further than PEP 750 does where conversion specifiers are concerned:

  • !() to call the expression at rendering time
  • !!custom for template renderers to define their own custom conversion specifiers

My main rationale for proposing this is that if we don’t standardise mechanisms for these up front, I’d expect to see a variety of ad hoc renderer specific solutions jamming this info into the already customisable format specifier field.

Standardisation also means we can provide API support for handling them in custom renderers.

1 Like

Isn’t the entire point of a t-string that renderers get to define their own custom conversions? If you just want the normal __format__() conversion, why not use an f-string?

I think it’d be a shame if the supported syntax of f-strings and t-strings diverged, even though the interpretation/behaviour should. (IOW, I think we should be adding !() to f-strings as well.)

3 Likes

It depends on how you teach it. PEP 501 explicitly builds up from the notion that format(t"some format string") means the same thing as f"some format string" (just with “interpolate values” and “render to a formatted string” as separate steps), so any template can be converted to a text string that way (it’s also what TemplateLiteral.__str__ would do internally, whereas TemplateLiteral.__repr__ would print the equivalent TemplateLiteral constructor call).
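As a toy illustration of that equivalence (t-strings don’t exist yet, so a template is modelled here as a list of text parts and (value, format_spec) pairs - an assumption for demonstration only, not either PEP’s API):

```python
# Toy model: strings are literal text, 2-tuples are interpolations
# carrying (value, format_spec). Rendering the template this way
# gives the same result as the corresponding f-string.
def format_template(parts):
    return "".join(
        format(p[0], p[1]) if isinstance(p, tuple) else p
        for p in parts
    )

value = 3.14159
rendered = format_template(["pi is about ", (value, ".2f")])
assert rendered == f"pi is about {value:.2f}"
print(rendered)  # pi is about 3.14
```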

You’re right that if that’s all you want, an f-string will be faster and clearer (and I’d expect linters to warn about a redundant template literal if they encounter format(t"some format string") in linted code).

As far as !!custom conversion specifiers go, I expect to see custom template renderers fall into three distinct categories:

  • individual field formatting is the same as f-strings; the custom renderer behaviour lies elsewhere (e.g. shlex.sh wrapping the post-conversion-and-formatting field values in shlex.quote)
  • individual field formatting is mostly the same as f-strings, but there are some additional renderer-specific directives (this is where !!custom is intended to be useful: allowing that level of customisation, while still keeping the default __str__, __format__, and render methods on the literal instance working)
  • individual field formatting is nothing like f-strings. The conversion specifier and field specifier are both processed by the renderer without reference to their conventional meanings (so the default __str__, __format__, and render methods on the literal instance won’t be useful). Such templates would be passed directly to an object that actually understands them rather than being kept around as raw template literals (this is the level where PEP 750’s original tagged strings proposal seemed to expect all custom renderers to operate)
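A sketch of the first category, using shlex.quote. Since t-strings aren’t available yet, the template is modelled as a list of text parts and (value, format_spec) pairs - an illustrative assumption, not the actual shlex.sh API:

```python
import shlex

# Each interpolated value is formatted exactly as an f-string would
# format it, then shell-quoted; literal template text passes through.
def sh(parts):
    out = []
    for part in parts:
        if isinstance(part, tuple):
            value, spec = part
            out.append(shlex.quote(format(value, spec)))
        else:
            out.append(part)
    return "".join(out)

filename = "my file; rm -rf /"
print(sh(["cat ", (filename, "")]))  # cat 'my file; rm -rf /'
```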

Yeah, I just convinced myself of that too. The extra conversion specifier syntax isn’t useful in f-strings, but it means the equivalence between f-strings and formatted t-strings is complete in both directions (the current unpublished draft of PEP 501 has t-strings supporting some conversion specifiers that f-strings don’t, which means there would be some cases where changing format(t"some format string") to f"some format string" would fail to compile instead of producing the same result).

First, let’s look at what PEP 750 supports, as revised to use a template string approach (or t-strings). Template functions take a template defined by a t-string, and return some object relevant to the domain specific language (DSL). Our goals remain the same:

  • Support using DSLs within Python, with a Pythonic syntax. Such DSLs include HTML and SQL.

  • Developer experience is considered for both template function writers as well as users of template functions.

  • Minimize opportunities for security holes, specifically injection attacks. In particular, t-strings are source code for the DSL.

As seen in this discussion, we believe we addressed these goals, including by refining our approach (t-strings instead of tag strings, removal of deferred evaluation of interpolation values, typing considerations, etc). Most DSLs - certainly HTML and SQL - require context sensitivity to appropriately fill (or render) interpolations, especially when considering the nesting enabled by PEP 701. This can be accomplished by the following:

  • Parse the provided template, including a mapping of interpolations to placeholders, to an AST.

  • Walk the AST, filling any interpolations with respect to this context; or alternatively compile/transpile code to do the same, for potentially greater efficiency.

A straightforward example: interpolations should be filled differently when used as an attribute of an HTML element vs as a child text node. (We are keeping it simple by not building a DOM; a DOM can of course help with context, but one still needs to construct it from the t-string, so the parse must happen for that abstraction anyway.)
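A minimal sketch of that context sensitivity using html.escape (the fill_attribute/fill_text names are illustrative, not part of either PEP):

```python
from html import escape

def fill_attribute(value):
    # quote=True also escapes " and ', so the value is safe inside
    # a quoted attribute
    return escape(str(value), quote=True)

def fill_text(value):
    # a child text node only needs &, < and > escaped
    return escape(str(value), quote=False)

payload = '"><script>alert(1)</script>'
print(f'<div title="{fill_attribute(payload)}">{fill_text(payload)}</div>')
```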

With this in mind, I will now review the current PEP 501 in PEP 501 – General purpose template literal strings | peps.python.org

Rendering templates

Prior to the recent update of PEP 501 to use classes derived from PEP 750 (PEP 501 – General purpose template literal strings | peps.python.org), the core functionality provided for working with templates was the equivalent of the current TemplateLiteral.render. This function is reminiscent of WSGI in that it uses a callback approach, in this case three callbacks (this aspect has not changed in the latest version of PEP 501). First, the render_text (default str) and render_field (default format) callbacks are called successively; then the overall render_template callback is called.

The problem is that this bottom-up process is unsuitable for nearly all DSLs, with the possible exception of shell and similarly simple languages that can work with a plain text-substitution-plus-quoting model. To work with DSLs, it’s necessary to do one of two things:

  1. render_text and render_field are passed identity functions as their callbacks; render_template is then given a list of the TemplateLiteralField (= InterpolationConcrete in PEP 750) and TemplateLiteralText (= DecodedConcrete) objects. It can iterate over this list again.
  2. Using bound methods, it should be possible to use some sort of continuation scheme to avoid this extra iteration. However this results in significant extra complexity for the template function developer, thereby impacting their development experience.
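A toy model of that callback flow, and of option 1 in particular. The render signature here is an approximation of PEP 501’s proposal for illustration only; strings represent literal text and 1-tuples wrap interpolated values:

```python
# Bottom-up rendering: render_text and render_field process each part,
# then render_template combines the per-part results.
def render(parts, *, render_text=str, render_field=format,
           render_template="".join):
    rendered = [
        render_field(part) if isinstance(part, tuple) else render_text(part)
        for part in parts
    ]
    return render_template(rendered)

template = ["SELECT * FROM t WHERE x = ", (42,)]

# Option 1: identity callbacks hand the raw parts straight through,
# so render_template can iterate them again with full context.
raw = render(template,
             render_text=lambda t: t,
             render_field=lambda f: f,
             render_template=list)
print(raw)  # ['SELECT * FROM t WHERE x = ', (42,)]
```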

Given this limitation, and PEP 501’s recent updates, this render method is no longer necessary.

Concatenation of template strings

As seen in the current implementation of TemplateLiteral.__add__ and TemplateLiteral.__radd__, regular strings can be added as text to any template. As mentioned earlier, such strings should be considered source code for the target DSL. This introduces a potential injection vulnerability that can be hard to detect. Such support should be removed.

In addition, arguably one should not concatenate source code at all in this way. A classic example in JavaScript (run on Node) illustrates this point:

> function square(x) {
... return x * x
... }
undefined
> square(5)
25
> square + square
'function square(x) {\nreturn x * x \n}function square(x) {\nreturn x * x \n}'

One can also multiply the square function (which returns NaN), etc. While concatenation may suggest itself in SQL, say by adding a where clause, it is easy to lose track of the required syntax, such as spacing. It also complicates how IDEs might provide support for typing the DSL source code, especially with respect to using +=.

Instead, one can simply use interpolations to recursively compose the desired source code.
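A sketch of that compositional style, with templates modelled as nested lists and 1-tuples marking interpolated values (an assumption for illustration; real t-strings would nest via interpolation):

```python
# Interpolating one template into another nests it; flattening walks
# the nesting, so the DSL renderer still sees every text part and
# value with its structure intact (no string concatenation involved).
def flatten(template):
    for part in template:
        if isinstance(part, list):   # a nested sub-template
            yield from flatten(part)
        else:
            yield part

where_clause = ["WHERE name = ", ("Alice",)]
query = ["SELECT * FROM users ", where_clause]
print(list(flatten(query)))
# ['SELECT * FROM users ', 'WHERE name = ', ('Alice',)]
```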

Therefore, I suggest removing these methods - they promote a complicated composition scheme that often does not work for DSLs. In addition, removing these methods further simplifies the proposed Python equivalents of C code for PEP 501 by removing the need for a complicated merging process.

TemplateLiteral.__format__ injection attack

In order to support the near equivalence of format(t'...') and f'...', a __format__ method is provided. However, this is also a potential vector for an injection attack as follows:

  1. Suppose some variable x is bound to a user-provided malicious value, e.g. ; drop student_tables; or cat /etc/passwd (complexified as necessary to get through).
  2. Further suppose that y is t'...{x}...' and y is used in some function that produces HTML, SQL, etc., not via a template function but via the default __format__. One example that might slip through, though of course we can complexify as necessary: vulnerable_function(f'{y}').
  3. Bang.
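A toy reproduction of the attack. ToyTemplate is a stand-in for a TemplateLiteral with a default __format__, not the PEP’s actual class:

```python
# A default __format__ flattens the template with plain str
# interpolation, discarding the structure a SQL-aware renderer
# would have needed in order to quote the malicious value.
class ToyTemplate:
    def __init__(self, *parts):
        self.parts = parts

    def __format__(self, spec):
        return "".join(str(p) for p in self.parts)

x = "1; drop student_tables;"       # user-provided malicious value
y = ToyTemplate("SELECT * FROM grades WHERE id = ", x)
print(f"{y}")  # SELECT * FROM grades WHERE id = 1; drop student_tables;
```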

This method should therefore be removed. Templates should support repr output, and possibly some sort of pretty printing. But we cannot use the default Template.render, which applies f-string-style formatting (namely str as the default render_text callback).

!!custom rendering

Such support is redundant: simply write t'{custom(...)}'. A similar observation appears in PEP 498, which nonetheless decided to accept the existing conversion support, much as PEP 750 does. However, we do not need to extend it further with this proposed change. See PEP 498 – Literal String Interpolation | peps.python.org

In addition, the relaxed parser support does allow arbitrary composition of !!custom with !r, but the result is difficult to follow, and prolific use of conversion specifiers such as !()!custom!r may read as “line noise”.

!() lazy evaluation support

As mentioned earlier with respect to removing deferred evaluation of interpolations in PEP 750, this is unnecessary. One can simply wrap the interpolation in any number of ways, including through frameworks; a prominent example is Django’s QuerySet, which is lazy. Such support can also benefit from static type analysis.

In addition, PEP 750 provided further analysis, thanks to the review from Jelle Zijlstra and reference implementation work by Dave Peck, to support annotation scoping. !() would presumably need similar support for any class variables that use t-strings.

3 Likes

Agreed with this. Keeping !s and !r (and… sigh… !a) is convenient and consistent (for bypassing DSL-specific processing), but the only reason to have a different custom conversion is to tell the DSL how something should be processed/formatted. A DSL-specific type called normally (as in t'{custom(...)}') or a DSL-specific format (as in t'{value:cdata}') should work just fine.

I’m neutral on this one (or perhaps “torn” is a better word). Telling a DSL “this should be called before rendering” is generic enough to apply equally to a range of languages, including f-strings, and is more reliable than a wrapper object that overrides __format__.[1]

Because you’re explicitly passing in a callable, it doesn’t need annotation scope - that’s special because it’s specified as eagerly evaluated but we want to lazily evaluate it. These would just follow normal (i.e. lambda, or inner function) rules (and so you get all the risks we went through earlier, but because it’s opt-in then it’s okay).

The main thing I like about !() is that it encouraged the use of __call__ for this. Otherwise we end up with some additional protocol for lazy evaluation, which is too big to smuggle in via this PEP (and I’m not overly thrilled about anyway).

Anyway, I’m just thinking out loud here. If !() doesn’t make it, I’ll be just as happy as if it does. But I don’t think there’s another way to specify “the real value will be provided by this object later” that would satisfy me. :man_shrugging:


  1. Which is fine for an f-string, where __format__ is the defined protocol, but not generic enough for a DSL where you don’t know whether it’s going to call str, repr or otherwise pull out the contents and render it itself. ↩︎

2 Likes

Some additional concerns:

  • Discoverability of this feature. Searching for !() mostly brings up Rust macros and the boolean not operator in the top Google results. Obviously this could change once the feature is available.
  • If we ever did support Mark Shannon’s work on syntactic macros with that Rust-like syntax, that would preclude its use in this position (PEP 638 – Syntactic Macros | peps.python.org). Of course PEP 638 can also be used to support more powerful schemes than the originally proposed fexpr/call-by-expression approach (and its limited support for metaprogramming by working with __code__).

With that in mind, it’s quite reasonable to use some sort of descriptor-based approach instead:

class Wrapper:
    def __init__(self, callable):
        self._callable = callable

    @property
    def value(self):
        return self._callable()

and then use it like so

x = Wrapper(lambda: 42)
do_something_with(t'{x.value}')

More importantly, there are existing libraries/frameworks that make similar use of descriptors. One example is Param, as used in Panel, which provides more powerful capabilities with descriptors, such as its dynamic parameters and reactive parameters, including incrementally computing the expression graph.

1 Like

I still need to think about it some more, but I’m leaning towards accepting all of @jimbaker’s concerns with the PEP 501 amendments as valid, which would leave bikeshedding over names as the only remaining differences between the PEPs.

I’ll note that the suggested property based wrapper would still eagerly evaluate at interpolation time (since the attribute lookup is also eager), but a format based lazy evaluation wrapper would work:

class on_format:
    def __init__(self, callable):
        self._callable = callable

    def __format__(self, spec):
        return format(self._callable(), spec)

do_something_with(t'{on_format(lambda: 42)}')

Template renderers could also easily define their own wrapper class for deferred evaluation rather than relying on an existing one.

Edit: now that I’ve been talked back out of making changes to conversion specifiers, I’m also back to thinking they should be applied eagerly by the compiler so template renderers never see them outside the full source string (they just see the resulting strings in the individual interpolation fields). The original version of PEP 750 needed lazy conversion specifiers because it had lazy field evaluation, but that isn’t true anymore.

1 Like

I was talking about t-strings yesterday with a co-worker, so I made a use case.
And tested it on GitHub - lysnikolaou/cpython at tag-strings-rebased

2 Likes

We’re excited to announce a revised version of PEP 750, now called “Template Strings”, based on the feedback we received from everyone here and throughout the Python community. Thank you all!

Quite a lot has changed since the previous version of the PEP. If you’re interested, now is a great time to re-read.

The PEP also references a small set of example code and tests. If you’d like to play with an early cpython build that supports PEP 750, the examples repo might be a good place to start.

9 Likes