First, let’s look at what PEP 750 supports, as revised to use a template string approach (t-strings). Template functions take a template defined by a t-string and return some object relevant to the domain-specific language (DSL). Our goals remain the same:
- Support using DSLs within Python, with a Pythonic syntax. Such DSLs include HTML and SQL.
- Consider the developer experience of both template function writers and users of template functions.
- Minimize opportunities for security holes, specifically injection attacks. In particular, t-strings are source code for the DSL.
As seen in this discussion, we believe we have addressed these goals, including by refining our approach (t-strings instead of tag strings, removal of deferred evaluation of interpolation values, typing considerations, etc.). Most DSLs, certainly HTML and SQL, require context sensitivity to appropriately fill (or render) interpolations, especially when considering the nesting enabled by PEP 701. This can be accomplished as follows:
- Parse the provided template, including a mapping of interpolations to placeholders, into an AST.
- Walk the AST and fill in each interpolation with respect to its context; alternatively, compile or transpile code that does the same, potentially for greater efficiency.
A straightforward example: interpolations should be filled differently when used as an attribute of an HTML element than when used as a child text element. (We are keeping it simple by not considering building a DOM. A DOM can of course help with context, but one still needs to get the DOM from the t-string, and that abstraction requires the same parse.)
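To make the parse-then-fill steps concrete, here is a minimal sketch of context-sensitive filling for HTML. It is not PEP 750's API: `Template`, `render_html`, `_Filler`, and the `__slotN__` placeholder format are all hypothetical, and so that the sketch runs without t-string syntax, the template is built by hand from its static strings and values. The standard library's `html.parser.HTMLParser` stands in for a real AST builder, and only two contexts are handled: attribute values and child text.

```python
import html
import re
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass(frozen=True)
class Template:
    """Hypothetical stand-in for a t-string: len(strings) == len(values) + 1."""
    strings: tuple[str, ...]
    values: tuple[object, ...]


# Hypothetical placeholder format; assumed never to occur in the static text.
_SLOT = re.compile(r"__slot(\d+)__")


class _Filler(HTMLParser):
    """Walks parser events and fills each placeholder according to its context."""

    def __init__(self, values: tuple[object, ...]) -> None:
        super().__init__()  # convert_charrefs=True: text arrives unescaped
        self.values = values
        self.out: list[str] = []

    def _fill(self, text: str, quote: bool) -> str:
        # re.split with a capture group alternates static text and slot numbers;
        # both are escaped (static text because the parser unescaped it).
        pieces = _SLOT.split(text)
        return "".join(
            html.escape(str(self.values[int(p)]) if i % 2 else p, quote=quote)
            for i, p in enumerate(pieces)
        )

    def handle_starttag(self, tag, attrs):
        parts = [tag]
        for name, value in attrs:
            if value is None:
                parts.append(name)
            else:
                # Attribute context: always quote, and escape quote characters.
                parts.append(f'{name}="{self._fill(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Child-text context: escaping &, < and > is sufficient.
        self.out.append(self._fill(data, quote=False))


def render_html(template: Template) -> str:
    # Step 1: join static strings with numbered placeholders so the parser sees
    # well-formed source; the number maps each placeholder back to its value.
    source = "".join(
        s + (f"__slot{i}__" if i < len(template.values) else "")
        for i, s in enumerate(template.strings)
    )
    # Step 2: parse, then fill each placeholder for the context it landed in.
    filler = _Filler(template.values)
    filler.feed(source)
    filler.close()
    return "".join(filler.out)


page = Template(('<a href="', '">', "</a>"), ('x" onclick="evil()', "<b>hi</b>"))
print(render_html(page))
# <a href="x&quot; onclick=&quot;evil()">&lt;b&gt;hi&lt;/b&gt;</a>
```

Because the placeholder is parsed as an attribute value and re-emitted with quotes, an unquoted position such as `<a title=...>` is filled safely too. A real implementation would also need collision-free placeholders and handling for tag names, attribute names, `<script>`, and comments.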
With this in mind, I will now review the current version of PEP 501 (PEP 501 – General purpose template literal strings | peps.python.org).
Rendering templates
Prior to the recent update of PEP 501 (PEP 501 – General purpose template literal strings | peps.python.org) to use classes derived from PEP 750, the core functionality it provided for working with templates was the equivalent of the current TemplateLiteral.render. This method is reminiscent of WSGI in that it uses a callback approach, in this case with three callbacks (this aspect has not changed in the latest version of PEP 501). First, the render_text (default str) and render_field (default format) callbacks are called successively; then the overall render_template callback is called.
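The following is a simplified model of that three-callback protocol, written only to show the data flow. `Text`, `Field`, `render`, and `escape_field` are hypothetical names, not PEP 501's code. Because each field is rendered in isolation, an HTML-escaping `render_field` cannot tell that its value lands in an unquoted attribute.

```python
import html
from dataclasses import dataclass
from typing import Any, Callable, Sequence, Union


@dataclass(frozen=True)
class Text:
    """Hypothetical stand-in for TemplateLiteralText."""
    text: str


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for TemplateLiteralField (value already evaluated)."""
    value: Any
    format_spec: str = ""


def render(
    segments: Sequence[Union[Text, Field]],
    *,
    render_text: Callable[[str], Any] = str,
    render_field: Callable[[Any, str], Any] = format,
    render_template: Callable[[list], Any] = "".join,
) -> Any:
    """Simplified model of the three-callback protocol."""
    # Bottom-up: every part is rendered on its own, with no view of the DSL
    # context around it; only the already-rendered parts are combined.
    parts = [
        render_text(seg.text) if isinstance(seg, Text)
        else render_field(seg.value, seg.format_spec)
        for seg in segments
    ]
    return render_template(parts)


def escape_field(value: Any, format_spec: str) -> str:
    # HTML-escaping field callback: it sees the value and spec, never the position.
    return html.escape(format(value, format_spec))


segments = [Text("<a title="), Field("x onmouseover=alert(1)"), Text(">link</a>")]
print(render(segments, render_field=escape_field))
# <a title=x onmouseover=alert(1)>link</a>
```

Nothing in the value needs escaping, yet the output gains an `onmouseover` attribute: the field callback never learns where its result will be used.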
The problem is that this bottom-up process is not suitable for nearly all DSLs, except possibly shell and other similarly simple languages that can work with a plain text-substitution model plus quoting. In order to work with DSLs, it’s necessary to do one of two things:
- Pass identity functions as the render_text and render_field callbacks; render_template is then given a list of the TemplateLiteralField (= InterpolationConcrete in PEP 750) and TemplateLiteralText (= DecodedConcrete) objects, which it can iterate over again.
- Use bound methods; it should be possible to build some sort of continuation scheme that avoids this extra iteration. However, this results in significant extra complexity for the template function developer, hurting their development experience.
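A minimal sketch of the first option, using a simplified model of the protocol (`Text`, `Field`, `render`, and `html_template` are hypothetical): the text and field callbacks merely rewrap each part, so `render_template` receives the whole sequence and walks it a second time, tracking context as it goes. The `<`/`>` scan is a deliberately crude stand-in for a real parser.

```python
import html
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Text:
    """Hypothetical stand-in for TemplateLiteralText."""
    text: str


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for TemplateLiteralField."""
    value: Any
    format_spec: str = ""


def render(segments, *, render_text=str, render_field=format, render_template="".join):
    """Simplified model of the three-callback protocol."""
    parts = [
        render_text(s.text) if isinstance(s, Text) else render_field(s.value, s.format_spec)
        for s in segments
    ]
    return render_template(parts)


def html_template(parts) -> str:
    """Second pass over the rewrapped parts, now with the whole sequence in view."""
    out, in_tag, quoted = [], False, False
    for part in parts:
        if isinstance(part, Text):
            out.append(part.text)
            # Crude context tracking: inside a tag if the last "<" comes after
            # the last ">"; text containing neither leaves the state unchanged.
            lt, gt = part.text.rfind("<"), part.text.rfind(">")
            if lt != gt:
                in_tag = lt > gt
            # Did the template author already open a quoted attribute value?
            quoted = part.text.endswith(('"', "'"))
        else:
            value = html.escape(format(part.value, part.format_spec))
            # Unquoted attribute position: add quotes so a space cannot start a new attribute.
            out.append(f'"{value}"' if in_tag and not quoted else value)
    return "".join(out)


segments = [Text("<a title="), Field("x onmouseover=alert(1)"), Text(">link</a>")]
# The Text and Field constructors act as identity-like callbacks that rewrap each part.
print(render(segments, render_text=Text, render_field=Field, render_template=html_template))
# <a title="x onmouseover=alert(1)">link</a>
```

Even this toy version has to duplicate the traversal that render already performed, which is the extra iteration the continuation scheme would try to avoid.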
Given this limitation, and given PEP 501’s recent updates, this render method is no longer necessary.
Concatenation of template strings
As seen in the current implementation of TemplateLiteral.__add__ and TemplateLiteral.__radd__, regular strings can be added as text to any template. As mentioned earlier, such text should be considered source code for the target DSL. This introduces a potential injection vulnerability that can be hard to detect. Such support should be removed.
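A small sketch of the hazard, assuming a hypothetical `Template` stand-in whose `__add__` merges a plain `str` into the final static string (the behavior described above) and a hypothetical `to_sql` that trusts static strings as SQL and binds values as `?` parameters:

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Template:
    """Hypothetical stand-in: len(strings) == len(values) + 1."""
    strings: tuple[str, ...]
    values: tuple[Any, ...]

    def __add__(self, other: object) -> "Template":
        # Mimics the concatenation behavior under discussion: a plain str is
        # merged into the last static string, i.e. it becomes DSL source code.
        if isinstance(other, str):
            return Template(self.strings[:-1] + (self.strings[-1] + other,), self.values)
        return NotImplemented


def to_sql(t: Template) -> tuple[str, list[Any]]:
    # Static strings are trusted as SQL; every value becomes a "?" parameter.
    return "?".join(t.strings), list(t.values)


query = Template(("SELECT * FROM students WHERE name = ", ""), ("Robert",))
print(to_sql(query))
# ('SELECT * FROM students WHERE name = ?', ['Robert'])

user_suffix = "; DROP TABLE students"  # attacker-controlled plain str
print(to_sql(query + user_suffix))
# ('SELECT * FROM students WHERE name = ?; DROP TABLE students', ['Robert'])
```

The interpolated value stays safely parameterized, but the concatenated suffix is silently promoted to SQL source.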
In addition, arguably one should not concatenate source code in this way at all. A classic example in JavaScript (run on Node) illustrates this point:
> function square(x) {
... return x * x
... }
undefined
> square(5)
25
> square + square
'function square(x) {\nreturn x * x \n}function square(x) {\nreturn x * x \n}'
One can also multiply the square function (which returns NaN), etc. While concatenation may suggest itself in SQL, say to add a WHERE clause, it is easy to lose track of the required syntax, such as spacing. It also complicates how IDEs might support typing the DSL source code, especially with respect to using +=.
Instead, one can simply use interpolations to compose the desired source code recursively.
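As a sketch of recursive composition (hypothetical `Template` stand-in and `to_sql` renderer, not any proposed API): a WHERE clause is written as its own template and interpolated into the outer one, roughly what `t'SELECT * FROM students {where}'` would capture. The renderer splices nested templates in as source and binds every other value as a parameter, so no plain `str` ever becomes SQL.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Template:
    """Hypothetical stand-in: len(strings) == len(values) + 1."""
    strings: tuple[str, ...]
    values: tuple[Any, ...]


def to_sql(t: Template) -> tuple[str, list[Any]]:
    """Render to (sql, params); nested Templates are spliced in as SQL source."""
    sql, params = [t.strings[0]], []
    for value, s in zip(t.values, t.strings[1:]):
        if isinstance(value, Template):
            # Recursive composition: the nested part was itself written as a
            # template, so its static text is trusted and its values stay data.
            inner_sql, inner_params = to_sql(value)
            sql.append(inner_sql)
            params.extend(inner_params)
        else:
            # Any other value is data and becomes a bound parameter.
            sql.append("?")
            params.append(value)
        sql.append(s)
    return "".join(sql), params


name = "Robert'; DROP TABLE students;--"
where = Template(("WHERE name = ", ""), (name,))
query = Template(("SELECT * FROM students ", ""), (where,))
print(to_sql(query))
# ('SELECT * FROM students WHERE name = ?', ["Robert'; DROP TABLE students;--"])
```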
Therefore, I suggest removing these methods: they promote a complicated composition scheme that often does not work for DSLs. In addition, removing them further simplifies the proposed Python equivalents of the C code for PEP 501 by eliminating the need for a complicated merging process.
TemplateLiteral.__format__ injection attack
In order to support the near equivalence of format(t'...') and f'...', a __format__ method is provided. However, this is also a potential vector for an injection attack, as follows:
- Suppose that there is some variable x bound to a user-provided malicious value, e.g. ; drop student_tables; or cat /etc/passwd (made more elaborate as necessary to get through).
- Further suppose that y is t'...{x}...' and that y is used in some function that produces HTML, SQL, etc., not through a template function but through the default __format__. One example that might slip through (again, it can be made more elaborate as necessary): vulnerable_function(f'{y}').
- Bang.
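The following sketch shows why a default `__format__` is a problem. `Template` is a hypothetical stand-in whose `__format__` mimics f-string semantics; once `f'{y}'` is evaluated, the result is a plain `str` in which the malicious value is indistinguishable from the static SQL text.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Template:
    """Hypothetical stand-in for a template with an f-string-like __format__."""
    strings: tuple[str, ...]
    values: tuple[Any, ...]

    def __format__(self, format_spec: str) -> str:
        # Each value is formatted and spliced in verbatim, as an f-string
        # would do, so code and data are merged into one plain str.
        out = [self.strings[0]]
        for value, s in zip(self.values, self.strings[1:]):
            out.append(format(value, ""))
            out.append(s)
        return format("".join(out), format_spec)


x = "; drop student_tables;"  # user-provided malicious value
y = Template(("SELECT * FROM students WHERE name = ", ""), (x,))
flattened = f"{y}"  # what vulnerable_function would receive
print(flattened)
# SELECT * FROM students WHERE name = ; drop student_tables;
```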
This method should therefore be removed. Templates should support repr output, and possibly some sort of pretty printing. But we cannot use the default Template.render for this, since it produces f-string formatting through its default callbacks (str for render_text and format for render_field).
!!custom rendering
Such support is redundant: simply write t'{custom(...)}'. A similar observation appears in PEP 498, which nonetheless decided to keep the existing conversion support, much like PEP 750. However, there is no need to extend this further with the proposed change. See PEP 498 – Literal String Interpolation | peps.python.org.
In addition, the relaxed parser support does allow arbitrary composition of !!custom with !r, but this is difficult to follow. Prolific use of conversion specifiers such as !()!custom!r may also make code hard to read (“line noise”).
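To illustrate the redundancy with ordinary f-strings (the expression part of an interpolation is evaluated the same way in a t-string): a hypothetical `shout` function called inside the interpolation gives what a `!!shout` conversion would, and it already composes with `!r` without any new syntax.

```python
def shout(value: object) -> str:
    """Hypothetical custom conversion: upper-cases the str() of a value."""
    return str(value).upper()


name = "world"
# A custom conversion is just a call inside the interpolation expression...
plain = f"hello {shout(name)}"
# ...and it composes with the built-in !r conversion with no extra syntax.
composed = f"hello {shout(name)!r}"
print(plain)     # hello WORLD
print(composed)  # hello 'WORLD'
```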
!() lazy evaluation support
As mentioned earlier with respect to removing deferred evaluation of interpolations from PEP 750, this is not necessary. One can simply wrap the interpolated value in a number of ways, including through frameworks; a prominent example is Django’s QuerySet, which is lazy. Such wrappers can also benefit from static type analysis.
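One way to get laziness without new syntax, sketched under assumptions: the template captures a zero-argument function (roughly `t'answer: {expensive}'`), and a hypothetical `render` calls callables at render time. Calling callables is this sketch's own convention, not something PEP 750 specifies; the benefit is that an annotation such as `Callable[[], str]` on the wrapped value remains visible to static type checkers.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Template:
    """Hypothetical stand-in: len(strings) == len(values) + 1."""
    strings: tuple[str, ...]
    values: tuple[Any, ...]


def render(t: Template) -> str:
    # Assumed convention: zero-argument callables are invoked at render
    # time, so evaluation is deferred until the template is actually used.
    out = [t.strings[0]]
    for value, s in zip(t.values, t.strings[1:]):
        out.append(str(value() if callable(value) else value))
        out.append(s)
    return "".join(out)


calls: list[str] = []


def expensive() -> str:
    calls.append("expensive")  # record when evaluation actually happens
    return "42"


t = Template(("answer: ", ""), (expensive,))  # roughly t'answer: {expensive}'
assert calls == []             # nothing evaluated yet
print(render(t))               # answer: 42
assert calls == ["expensive"]  # evaluated once, at render time
```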
In addition, PEP 750 provided further analysis to support annotation scopes, thanks to the review by Jelle Zijlstra and the reference implementation work by Dave Peck. !() would presumably need similar support for any class variables that use t-strings.