+1 on this idea. It seems like a nice way of making lazy evaluation available without making it mandatory. My only question is what this would look like to the renderer. Choosing whether or not to use `!()` is a decision made by the caller, so renderers need to be able to handle both possibilities without doing any explicit checks; otherwise there’s a really nasty coupling between the interface and the implementation.
(I just wanna say that I appreciate that my desire to have the discussion/PEP mention editor support has blossomed, after worrying that the lazy/eager discussion might take the entire stage (rather than just center stage). I hope this means we can have something that is the best of both worlds (at the risk of derailing into neither world). Thank y’all for discussing/considering this aspect as well.)
Processing conversion specifiers is already up to template renderers in PEP 750 (they receive the conversion specifier as a string, same as the format specifier). PEP 501 currently evaluates them eagerly, but (assuming @nhumrich approves) I plan to amend it to follow PEP 750 in this regard (while relaxing the syntax to allow more than just `a`, `r`, and `s`).
To help with conventional processing, PEP 501 is going to propose that the `format` built-in gain lazy conversion support by accepting another optional parameter for the conversion specifier. I’m also inclined to offer a standard conversion API somewhere, but am still considering potential spellings (adding `operator.convert_field` is my current thought, since a static method on `types.TemplateLiteralField` would be annoying to access when using the structural types rather than the concrete ones). This idea would be equally applicable to PEP 750.
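For illustration, a standard conversion helper along these lines might be sketched as follows. The name `convert_field` matches the suggestion above, but the signature, error handling, and placement are all assumptions, not anything either PEP specifies:

```python
# Hypothetical sketch of a standard conversion API along the lines of the
# operator.convert_field idea; the exact signature is an assumption.
_CONVERTERS = {None: lambda value: value, "s": str, "r": repr, "a": ascii}

def convert_field(value, conversion=None):
    """Apply an f-string style conversion specifier (!s, !r, !a) to a value."""
    try:
        converter = _CONVERTERS[conversion]
    except KeyError:
        raise ValueError(f"unknown conversion specifier {conversion!r}") from None
    return converter(value)
```

With this, a renderer could write `convert_field(field.value, field.conversion)` instead of hand-rolling the `str`/`repr`/`ascii` dispatch, and a relaxed specifier syntax could extend the table.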
Actually, yeah, this is pretty nice. While broadly I’m not a fan of type annotations the way they’ve turned out, I am a big fan of inferring “types” from usage, so using “it gets assigned to an HTML specific type” or “it gets passed to a function that expects an HTML type” as the signal for an editor to treat it as HTML works for me.
But that now suggests that we may have separate typing and language PEPs - the first providing a way for editors to recognise DSLs stored in string literals, and the second providing a way for the language to capture interpolations without immediately formatting them. It does appear to nullify the motivation of PEP 750, but I think we’re basically okay with “f-strings but generic” as a motivation anyway.
Strictly speaking, you wouldn’t tie the two PEPs together. But using the same “hypothetical” SQL statement function in both, to allow tagging a string literal as “this is a SQL statement” and to allow substitutions “following SQL statement rules”, ought to make it pretty clear that the most value comes from having both together.
So I’m leaning towards +1 on the “one generic interpolation for strings and arbitrary specific types for string literals” idea(s). Interested to hear if there are other concerns though (apart from the “harder for a simple highlighter to handle”, which I’m sympathetic towards, but I expect that would be the case anyway for any format other than copying Markdown - the quoting rules alone will make it incredibly hard to have “simple” highlighting for any structured string literal).
I have a negative take on the annotation-based `Template[L]` proposal, essentially because:
It should also be emphasized that Python will remain a dynamically typed language, and the authors have no desire to ever make type hints mandatory, even by convention.
— PEP 484 - Type Hints - Rationale (emphasis original)
(Not that such a statement is a guarantee, but I strongly agree with the sentiment.[1])
I really like the look of it, and it seems very ergonomic. A feature that requires annotation syntax in a library is one thing, but adding it to the core language would be something else, and I don’t like that trend.
It’s very possible I’ve misunderstood or overlooked a relevant part of the proposal, in which case please accept my apologies for the digression.
I use type hints in all Python I write and have for years, but I feel the optionality is important. ↩︎
First, I’m camping this week, and I’ve only had intermittent internet access. This is probably a good thing for my vacation, however!
It’s not required to specialize `Template`, but it’s possible to do so. In addition, the actual argument that is passed to the template function is only an object that supports the `Template` protocol as specified in the `typing` module; `Template` is not the concrete type. We have kept this separate in general so it’s possible to use type annotations (such as for structural typing) if desired via protocols like `Interpolation` and `Decoded`, but it’s not required.
Lastly, while I think a definition like

```python
def html(template: Template[HTML]) -> HTML:
    ...
```

is aesthetically pleasing in its symmetry, there’s no requirement that this be done, nor the specific `Annotation` setup that was used.
In the PEP update, we will hopefully address this, by describing what is required to use templates, which is fairly minimal; and sketch out what might be best practices for our goal of better working with DSLs.
Our goal in all of this is to provide a delightful developer experience, so hopefully we can continue to work towards this.
Thank you for your response Jim! That addresses my concern and I look forward to following the updates.
Enjoy your vacation!
The work you’ve all been doing in that regard has been wonderful, including the advocacy for the value of delayed interpolation in general.
I suspect once your PEP 750 update and my PEP 501 updates are done, the main difference between the two PEPs (aside from cosmetic naming details) will just be whether we say “let’s start with t-strings now, and consider tagged strings later after we have more experience with t-strings[1]” (PEP 501) or “let’s just go with tagged strings immediately” (PEP 750).
The ironic echoes between the PEP relationship here and the one between PEP 498 and PEP 501 several years ago are not lost on me. History doesn’t repeat, but it does rhyme. ↩︎
Hi everybody, thanks for all the comments. We catalogued everything, talked it over, and made some decisions about changes. To make it easier: here’s an explainer. Next up: update the PEP.
Don’t wanna click? Here’s the top-line.
- No more lazy evaluation of interpolations
- No more tag functions, no more “tag”
- Instead, a single `t` prefix which passes a `Template` to your function
- `Template.source`, which has the full string
- A normative section on how to signal the DSL to tools
- Better examples and explanations of the need
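As a rough sketch of the revised shape, a `Template` carries the full source string plus its parts. Only `source` is named above; the `parts` attribute and the hand-built construction below are assumptions standing in for what the compiler would produce from a t-string:

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical stand-in for the proposed Template type; the compiler would
# build one from a t-string, but here we construct it by hand.
@dataclass(frozen=True)
class Template:
    source: str   # the full original string, e.g. 'hello {name}'
    parts: tuple  # alternating literal text and interpolated values (assumed)

def shout(template: Template) -> str:
    # A trivial "template function": join the parts and upper-case the result.
    return "".join(str(part) for part in template.parts).upper()
```

A template function then receives the `Template` and returns whatever domain object it likes, e.g. `shout(Template("hello {name}", ("hello ", "world")))`.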
Apologies for the tangent, but where does this word “explainer” originate? My English language dictionary says it’s a person who explains (a commentator or interpreter), but that doesn’t seem to fit with your usage of the term.
I think it can also mean a document that explains something. See the second definition in explainer - Wiktionary, the free dictionary.
Nice! I had already noted in PEP 501 that it and PEP 750 had become very close in how they would work (since we adopted most of the design after 750 was published), and this largely brings the syntactic proposals together.
I’ll rework it to focus on the remaining major difference, which is that PEP 501 goes a bit further than PEP 750 does where conversion specifiers are concerned:
- `!()` to call the expression at rendering time
- `!!custom` for template renderers to define their own custom conversion specifiers
My main rationale for proposing this is that if we don’t standardise mechanisms for these up front, I’d expect to see a variety of ad hoc renderer specific solutions jamming this info into the already customisable format specifier field.
Standardisation also means we can provide API support for handling them in custom renderers.
Isn’t the entire point of a t-string that renderers get to define their own custom conversion? If you’re just having a normal `__format__()` conversion, then use an f-string?
I think it’d be a shame if the supported syntax of f-strings and t-strings diverged, even though the interpretation/behaviour should. (IOW, I think we should be adding `!()` to f-strings as well.)
It depends on how you teach it. PEP 501 explicitly builds up from the notion that `format(t"some format string")` means the same thing as `f"some format string"` (just with “interpolate values” and “render to a formatted string” as separate steps), so any template can be converted to a text string that way (it’s also what `TemplateLiteral.__str__` would do internally, whereas `TemplateLiteral.__repr__` would print the equivalent `TemplateLiteral` constructor call).

You’re right that if that’s all you want, an f-string will be faster and clearer (and I’d expect linters to warn about a redundant template literal if they encounter `format(t"some format string")` in linted code).
As far as `!!custom` conversion specifiers go, I expect to see custom template renderers fall into three distinct categories:

- individual field formatting is the same as f-strings, and the custom renderer behaviour lies elsewhere (e.g. `shlex.sh` wrapping the post-conversion-and-formatting field values in `shlex.quote`)
- individual field formatting is mostly the same as f-strings, but there are some additional renderer-specific directives (this is where `!!custom` is intended to be useful: allowing that level of customisation, while still keeping the default `__str__`, `__format__`, and `render` methods on the literal instance working)
- individual field formatting is nothing like f-strings. The conversion specifier and field specifier are both processed by the renderer without reference to their conventional meanings (so the default `__str__`, `__format__`, and `render` methods on the literal instance won’t be useful). Such templates would be passed directly to an object that actually understands them rather than being kept around as raw template literals (this is the level where PEP 750’s original tagged strings proposal seemed to expect all custom renderers to operate)
Yeah, I just convinced myself of that too. The extra conversion specifier syntax isn’t useful in f-strings, but it means the equivalence between f-strings and formatted t-strings is complete in both directions (the current unpublished draft of PEP 501 has t-strings supporting some conversion specifiers that f-strings don’t, which means there would be some cases where changing `format(t"some format string")` to `f"some format string"` would fail to compile instead of producing the same result).
First, let’s look at what PEP 750 supports, as revised to use a template string approach (or t-strings). Template functions take a template defined by a t-string and return some object relevant to the domain specific language (DSL). Our goals remain the same:

- Support using DSLs within Python, with a Pythonic syntax. Such DSLs include HTML and SQL.
- Developer experience is considered for both template function writers and users of template functions.
- Minimize opportunities for security holes, specifically injection attacks. In particular, t-strings are source code for the DSL.
As seen in this discussion, we believe we addressed these goals, including by refining our approach (t-strings instead of tag strings, removal of deferred evaluation of interpolation values, typing considerations, etc). Most DSLs - certainly HTML and SQL - require context sensitivity to appropriately fill (or render) interpolations, especially when considering the nesting enabled by PEP 701. This can be accomplished by the following:
- Parse the provided template, including a mapping of interpolations to placeholders, into an AST.
- Walk the AST, filling any interpolations with respect to this context; or alternatively compile/transpile code to do the same for potentially greater efficiency.
A straightforward example is to consider that interpolations should be filled differently if used as an attribute of an HTML element vs. as a child text element. (We are keeping it simple by not considering building a DOM; of course a DOM can help here with context, but one still needs to get a DOM from the t-string; the parse must be done for that abstraction.)
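To make the attribute-vs-text distinction concrete, here is a minimal sketch. A real renderer would derive the context from the AST walk described above; passing it in explicitly is a simplification for illustration:

```python
import html

def fill(value, context):
    """Escape an interpolated value differently depending on where the
    placeholder sits in the parsed HTML template (assumed contexts:
    "attribute" or "text")."""
    text = str(value)
    if context == "attribute":
        return html.escape(text, quote=True)   # also escapes " and '
    if context == "text":
        return html.escape(text, quote=False)  # only & < > need escaping
    raise ValueError(f"unknown context {context!r}")
```

The same malicious value renders harmlessly in either position only because the renderer knows which position it is in; a context-free substitution could not make that call.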
With this in mind, I will now review the current PEP 501 (PEP 501 – General purpose template literal strings | peps.python.org).
Rendering templates
Prior to the recent update of PEP 501 to use classes derived from PEP 750 (PEP 501 – General purpose template literal strings | peps.python.org), the core functionality provided to work with templates was the equivalent of the current `TemplateLiteral.render`. This function is reminiscent of WSGI in that it uses a callback approach, in this case three callbacks (this aspect has not changed in the latest version of PEP 501). First, the callbacks `render_text` (default is `str`) and `render_field` (default is `format`) are called successively; then the overall callback `render_template` is called.
The problem is that this bottom-up process is unsuitable for nearly all DSLs, except possibly shell and other similarly simple languages that can work with a simple text-substitution-with-quoting model. In order to work with DSLs, it’s necessary to do one of two things:
- `render_text` and `render_field` are passed identity functions for their callbacks; `render_template` is then given a list of the `TemplateLiteralField` (= `InterpolationConcrete` in PEP 750) and `TemplateLiteralText` (= `DecodedConcrete`) instances. It can iterate over this list again.
- Using bound methods, it should be possible to use some sort of continuation scheme to avoid this extra iteration. However, this results in significant extra complexity for the template function developer, thereby impacting their development experience.
Given this limitation, the `render` method is no longer necessary in PEP 501 after its recent updates.
Concatenation of template strings
As seen in the current implementation of `TemplateLiteral.__add__` and `TemplateLiteral.__radd__`, regular strings can be added as text to any template. As mentioned earlier, such strings should be considered source code for the target DSL. This introduces a potential injection vulnerability that can be hard to detect. Such support should be removed.
In addition, arguably one should not concatenate source code at all in this way. A classic example in JavaScript (run on Node) illustrates this point:
```js
> function square(x) {
... return x * x
... }
undefined
> square(5)
25
> square + square
'function square(x) {\nreturn x * x \n}function square(x) {\nreturn x * x \n}'
```
One can also multiply the `square` function (returning a `NaN`), etc. While concatenation may suggest itself in SQL, say by adding a WHERE clause, it is easy to lose track of the required syntax, such as spacing. This also complicates how IDEs might provide support for typing the DSL source code, especially with respect to using `+=`.
Instead, one can simply use interpolations to compose the desired source code recursively.
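That recursive composition can be sketched as follows, with nested templates represented as lists of parts (an assumed representation; real templates would be the proposed template objects):

```python
# A nested template appears as an interpolated value; the renderer recurses
# into it instead of anyone ever concatenating source text by hand.
def flatten(parts):
    for part in parts:
        if isinstance(part, list):   # an interpolated sub-template
            yield from flatten(part)
        else:
            yield part

where_clause = ["WHERE name = ", "?"]
query = list(flatten(["SELECT * FROM students ", where_clause]))
```

The sub-template travels as a unit, so spacing and quoting stay the responsibility of whoever wrote each template, not of a `+` operator.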
Therefore, I suggest removing these methods - they promote a complicated composition scheme that often does not work for DSLs. In addition, removing these methods further simplifies the proposed Python equivalents of C code for PEP 501 by removing the need for a complicated merging process.
`TemplateLiteral.__format__` injection attack
In order to support the near equivalence of `format(t'...')` and `f'...'`, a `__format__` method is provided. However, this is also a potential vector for an injection attack, as follows:
- Suppose that there is some variable `x` bound to a user-provided malicious value, e.g. `; drop student_tables;` or `cat /etc/passwd` (complexified as necessary to get through).
- Further suppose that `y` is `t'...{x}...'` and `y` is used in some function that produces HTML, SQL, etc., but without a template function, instead using the default `__format__`. One example that might slip through, though of course we can complexify as necessary: `vulnerable_function(f'{y}')`.
- Bang.
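A minimal demonstration of the hazard, using a stand-in class (the real `TemplateLiteral.__format__` is more involved, but the failure mode is the same: f-string style rendering with no DSL-aware escaping):

```python
# Stand-in for a template type whose __format__ falls back to plain
# f-string style rendering of its parts, with no DSL-aware escaping.
class NaiveTemplate:
    def __init__(self, *parts):
        self.parts = parts

    def __format__(self, spec):
        return "".join(format(part) for part in self.parts)

x = "; drop student_tables;"             # user-provided malicious value
y = NaiveTemplate("SELECT name FROM grades", x)
sql = f"{y}"                             # malicious value passes through verbatim
```

Because `f"{y}"` routes through `__format__`, the template never reaches a DSL-aware renderer, and the injection survives intact.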
This method should therefore be removed. Templates should support `repr` output, and possibly some sort of pretty printing. But we cannot use the default `Template.render`, which uses f-string formatting, namely the default `render_text` callback of `str`.
`!!custom` rendering
Such support is redundant. Simply do `t'{custom(...)}'`. A similar observation appears in PEP 498, but it decided to accept the existing conversion support, much like PEP 750. However, we do not need to extend this further with this proposed change. See PEP 498 – Literal String Interpolation | peps.python.org
In addition, the relaxation of the parser support does allow for arbitrary composition of `!!custom` with `!r`, but it makes the result difficult to follow. Prolific use of conversion specifiers such as `!()!custom!r` may also make templates difficult to read (“line noise”).
`!()` lazy evaluation support
As mentioned earlier with respect to removing deferred evaluation of interpolations for PEP 750, this is not necessary. One can simply wrap the interpolation in a number of ways, including through frameworks. A prominent example is Django’s `QuerySet`, which is lazy. Such support can also enjoy static type analysis.

In addition, PEP 750 provided additional analysis, thanks to the review from Jelle Zijlstra and reference implementation work by Dave Peck, to support annotation scope. `!()` would presumably need similar support for any class variables that use t-strings.
Agreed with this. Keeping `!s` and `!r` (and… sigh… `!a`) is convenient and consistent (for bypassing DSL-specific processing), but the only reason to have a different custom conversion is to tell the DSL how something should be processed/formatted. A DSL-specific type called normally (as in `t'{custom(...)}'`) or a DSL-specific format (as in `t'{value:cdata}'`) should work just fine.
I’m neutral on this one (or perhaps “torn” is a better word). Telling a DSL “this should be called before rendering” is generic enough to apply equally to a range of languages, including f-strings, and is more reliable than a wrapper object that overrides `__format__`.[1]
Because you’re explicitly passing in a callable, it doesn’t need annotation scope - that’s special because it’s specified as eagerly evaluated but we want to lazily evaluate it. These would just follow normal (i.e. lambda, or inner function) rules (and so you get all the risks we went through earlier, but because it’s opt-in then it’s okay).
The main thing I like about `!()` is that it encourages the use of `__call__` for this. Otherwise we end up with some additional protocol for lazy evaluation, which is too big to smuggle in via this PEP (and which I’m not overly thrilled about anyway).
Anyway, I’m just thinking out loud here. If `!()` doesn’t make it, I’ll be just as happy as if it does. But I don’t think there’s another way to specify “the real value will be provided by this object later” that would satisfy me.
Which is fine for an f-string, where `__format__` is the defined protocol, but not generic enough for a DSL where you don’t know whether it’s going to call `str`, `repr`, or otherwise pull out the contents and render it itself. ↩︎
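The `__call__`-based idea can be sketched as a renderer that invokes any callable interpolation at rendering time, which is the behaviour `!()` would opt a field into (the list-of-parts representation is an assumption):

```python
# Fields marked with !() would arrive as zero-argument callables; everything
# else is rendered eagerly. This sketch just checks callability directly.
def render_lazy(parts):
    return "".join(
        str(part() if callable(part) else part)
        for part in parts
    )

message = render_lazy(["answer: ", lambda: 6 * 7])
```

Because the caller explicitly passes a callable (a lambda or inner function), normal closure rules apply, with no need for a new lazy-evaluation protocol.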
Some additional concerns:
- Discoverability of this feature. Searching for `!()` brings up Rust macros and uses of boolean not in the top results with Google. Obviously this aspect could change with this feature being available.
- If we ever did support Mark Shannon’s work on syntactic macros with that Rust-like syntax, that would preclude its use in this position (PEP 638 – Syntactic Macros | peps.python.org). Of course PEP 638 can also be used to support more powerful schemes than the originally proposed fexpr/call-by-expression approach (and its limited support for metaprogramming by working with `__code__`).
With that in mind, it’s quite reasonable to use some sort of descriptor-based approach instead:
```python
class Wrapper:
    def __init__(self, callable):
        self._callable = callable

    @property
    def value(self):
        return self._callable()
```

and then use it like so:

```python
x = Wrapper(lambda: 42)
do_something_with(t'{x.value}')
```
More importantly, there are existing libraries/frameworks that make similar use of descriptors. One example is Param, as used in Panel, which provides more powerful capabilities with descriptors, such as its dynamic parameters and reactive parameters, including incrementally computing the expression graph.
I still need to think about it some more, but I’m leaning towards accepting all of @jimbaker’s concerns with the PEP 501 amendments as valid, which would leave bikeshedding over names as the only remaining differences between the PEPs.
I’ll note that the suggested property-based wrapper would still eagerly evaluate at interpolation time (since the attribute lookup is also eager), but a format-based lazy evaluation wrapper would work:
```python
class on_format:
    def __init__(self, callable):
        self._callable = callable

    def __format__(self, spec):
        return format(self._callable(), spec)

do_something_with(t'{on_format(lambda: 42)}')
```
Template renderers could also easily define their own wrapper class for deferred evaluation rather than relying on an existing one.
Edit: now that I’ve been talked back out of making changes to conversion specifiers, I’m also back to thinking they should be applied eagerly by the compiler so template renderers never see them outside the full source string (they just see the resulting strings in the individual interpolation fields). The original version of PEP 750 needed lazy conversion specifiers because it had lazy field evaluation, but that isn’t true anymore.