PEP 750: Tag Strings For Writing Domain-Specific Languages

sirosen · October 20, 2024, 10:15am

This feels like a job for linters to catch. I would definitely add a rule for this to my linter.

As long as the AST contains this information, it can be done easily.

(I’m just "yes and…"ing your post!)

Really good point about dedent not being usable. I think that raises this from a “consistency” issue to an ergonomic issue which could impact users.

Overall an exciting PEP. I feel that logging will benefit in particular, but there are a variety of other use cases I’m looking forward to.

ncoghlan · October 20, 2024, 12:28pm

It isn’t literal f-strings that are the point of concern, it’s string variables that contain unescaped untrusted input (f-strings are just a likely way for that to happen). Only the following are categorically safe:

Implicit concatenation of template literals with each other
Implicit concatenation of template literals with static string literals (not f-strings)
Explicit concatenation of template literals (using + or +=)

Anything else poses a risk of untrusted user input ending up in an unescaped section of the resulting template.

Type checkers can mitigate that risk for the Template constructor by typing the inputs as literal strings or interpolation fields, rather than allowing untrusted strings for the text portions (runtime checks can’t tell the difference).

For concatenation though, we can entirely refuse the temptation to guess in the face of ambiguity and require that people explicitly write either some_template + t"additional literal text" or else some_template + t"{some_untrusted_var}" (assuming concatenation is allowed at all).
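
The "refuse to guess" rule above can be sketched with simplified stand-in classes. The `Template` and `Interpolation` below are assumptions made up for illustration, not the PEP's actual types, and the sketch shows the stricter behaviour being suggested here rather than anything the PEP specifies.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Interpolation:
    # Hypothetical stand-in: a value plus the source expression text.
    value: Any
    expr: str


class Template:
    # Hypothetical stand-in holding static text and Interpolation segments.
    def __init__(self, *args: "str | Interpolation") -> None:
        self.args = args

    def __add__(self, other: object) -> "Template":
        # Refuse to guess: a bare str could be trusted text or untrusted input.
        if not isinstance(other, Template):
            return NotImplemented
        return Template(*self.args, *other.args)


greeting = Template("Hello, ")
user_input = "<script>"

# The caller states intent: the untrusted value becomes an interpolation field.
combined = greeting + Template(Interpolation(user_input, "user_input"))

try:
    greeting + user_input  # ambiguous, so it is rejected with TypeError
except TypeError:
    rejected = True
else:
    rejected = False
```

With this design the only way to append a plain string is to wrap it in a template first, which makes the trusted/untrusted decision visible at the call site.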

ncoghlan · October 21, 2024, 4:03am

When writing "Feature Proposal: Multi-String Replacement Using a Dictionary in the .replace() Method - #10 by blhsing", it occurred to me that it would be interesting if either string.Formatter or the literal Template type offered a way to dynamically convert runtime format strings to interpolation template instances.

The code to implement a basic version of that with only name lookups (and without dynamic field format definitions) isn’t overly complicated:

def template_from_format_map(
    fmt: LiteralString, values: Mapping[str, Any]
) -> Template:
    parser = string.Formatter()
    segments = []
    for prefix, name, field_fmt, field_conv in parser.parse(fmt):
        segments.append(prefix)
        if name is not None:
            field = Interpolation(values[name], name, field_conv, field_fmt)
            segments.append(field)
    return Template(*segments)

(The format string would be typed as LiteralString instead of str as a reminder that any static template content should always come from a trusted source, whether that’s an actual literal string, or something that has been explicitly cast to one)

Allowing the same level of dynamic reference flexibility as str.format and str.format_map is substantially more complicated.

Adding an actual string.Formatter.as_template() method would likely be confusing (given the ambiguity between the new Template literal type and string.Template), but I think we could unambiguously offer a string.Formatter.get_segments method that worked for both the default formatting and any custom subclasses (since it would only be calling subclass APIs that string.Formatter.vformat already calls - presumably _vformat would be redesigned to call self.get_segments instead of calling self.parse directly):

def get_segments(self,
    fmt: LiteralString, args: Sequence[Any], values: Mapping[str, Any]
) -> Sequence[LiteralString|Interpolation]:
    segments = []
    for prefix, value_ref, field_fmt, field_conv in self.parse(fmt):
        segments.append(prefix)
        if value_ref is not None:
            # This branch would do everything `_vformat` does, but
            # writing that out would make this example far too long
            value, _lookup_key = self.get_field(value_ref, args, values)           
            field = Interpolation(value, value_ref, field_conv, field_fmt)
            segments.append(field)
    return segments

That way, any str.format_map call could be turned into an interpolation template instance by replacing:

formatted = pattern.format_map(values)

with:

segments = string.Formatter().get_segments(pattern, (), values)
template = Template(*segments)
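
For anyone who wants to experiment with the idea, here is a minimal runnable sketch. It assumes a hypothetical `SegmentFormatter` subclass (no `get_segments` method exists in the stdlib) and a simplified stand-in `Interpolation` dataclass; only `string.Formatter.parse` and `string.Formatter.get_field` are real APIs.

```python
import string
from dataclasses import dataclass
from typing import Any, Mapping, Optional, Sequence


@dataclass(frozen=True)
class Interpolation:
    # Hypothetical stand-in for the PEP's Interpolation type.
    value: Any
    expr: str
    conv: Optional[str] = None
    format_spec: str = ""


class SegmentFormatter(string.Formatter):
    # Hypothetical subclass; get_segments is an assumption, not a stdlib method.
    def get_segments(
        self, fmt: str, args: Sequence[Any], values: Mapping[str, Any]
    ) -> list:
        segments = []
        for prefix, value_ref, field_fmt, field_conv in self.parse(fmt):
            segments.append(prefix)
            if value_ref is not None:
                # get_field resolves names, attribute access and indexing.
                value, _key = self.get_field(value_ref, args, values)
                segments.append(
                    Interpolation(value, value_ref, field_conv, field_fmt)
                )
        return segments


pattern = "Hello {user[name]}, you have {count:>3} items"
values = {"user": {"name": "Ada"}, "count": 7}
segments = SegmentFormatter().get_segments(pattern, (), values)
```

Because the lookup goes through `get_field`, references such as `{user[name]}` resolve the same way they do for `str.format_map`, while the format spec is carried along unrendered.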

Along similar lines, while it definitely doesn’t need to be in the initial design, we may also eventually want to add a replace_values method to template instances to make it easier to use statically defined templates for dynamic formatting tasks:

def replace_values(self, values: Sequence[Any]) -> Self:
    value_iter = iter(values)
    def update_segments():
        for segment in self.args:
            match segment:
                case str() as s:
                    updated = s
                case Interpolation(_, _, conv, fmt_spec):
                    value = next(value_iter)
                    updated = Interpolation(value, repr(value), conv, fmt_spec)
            yield updated
    return type(self)(*update_segments())

Tangent: string.Formatter.convert_field() may be worth mentioning in the PEP as the current dynamic stdlib implementation of the standard conversion specifiers (it’s an instance method rather than a static method, as it’s designed to allow Formatter subclasses to override it)
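
As a quick illustration of that existing stdlib hook, the following sketch calls `convert_field` directly with each of the standard conversion codes. The variable names are made up for the example.

```python
import string

formatter = string.Formatter()

# "r", "s" and "a" map to repr(), str() and ascii(); None means no conversion.
as_repr = formatter.convert_field("café", "r")
as_str = formatter.convert_field(42, "s")
as_ascii = formatter.convert_field("café", "a")
unchanged = formatter.convert_field(42, None)
```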

anentropic · October 21, 2024, 2:02pm

Yes please to this feature!

Especially if it helps IDEs and type-checkers to prompt and validate the formatting params.

Would love to see a related solution for deferred formatting cases (“lazy evaluation”) i.e. where you’d currently define a string with {x} vars and then later call format(...) on it.

The description under "Approaches to Lazy Evaluation" doesn’t really capture the times when I want to use that approach.

For me it’s usually nothing to do with the interpolation being expensive to calculate, but rather a desire to define the template once (without the values in scope) and then reuse it in multiple places with different context values in scope.

e.g. I often define things like external API url template, or AWS CDK resource name template, as a constant in some central file like Django settings.py and then call format on that from various places.
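
A minimal sketch of that "define once, format in many places" pattern with today's `str.format` might look like this. The constant name, URL and helper function are hypothetical and exist only for the example.

```python
# Hypothetical constant, e.g. defined once in a central settings module.
USER_URL_TEMPLATE = "https://api.example.com/v1/users/{user_id}?page={page}"


def user_url(user_id: int, page: int = 1) -> str:
    # The values only come into scope here, long after the template was defined.
    return USER_URL_TEMPLATE.format(user_id=user_id, page=page)


first = user_url(42)
second = user_url(7, page=3)
```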

I don’t love the idea of passing a separate callable for every var in the string. Also I am already a little uncertain about the behaviour of the proposed workaround:

name = "World"
template = t"Hello {lambda: name}"

The example is too simple to illustrate the scoping rules for name, and allowing lambda here is a completely new feature - doing this in an f-string in 3.13 is a syntax error.
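
The f-string behaviour is easy to check by compiling both spellings. This is only a sketch: `compiles` is a helper name invented for the example, and the explanation in the comments (the colon being read as the start of a format spec) is my understanding of why the bare form is rejected.

```python
def compiles(source: str) -> bool:
    """Return True if the expression source compiles without a SyntaxError."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# Bare lambda: the colon is taken as the start of a format spec, so this fails.
bare = compiles('f"Hello {lambda: name}"')

# Parenthesized lambda: accepted as an ordinary expression.
parenthesized = compiles('f"Hello {(lambda: name)}"')

name = "World"
# Calling the lambda inside the replacement field gives the usual result.
rendered = f"Hello {(lambda: name)()}"
```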

What I want is something like:

template = t"Hello {lambda: name}, you have {lambda: x} things"

def format_template(name: str, x: int) -> str:
    return template

I guess this is supposed to work in the current proposal but it’s not very clear from current wording.

If so then adding the lambdas is quite verbose and boilerplate-y, but workable.

Can I also use f-string qualifiers on the lambda return value? e.g. !r for repr etc

template = t"An error occurred: {lambda: error!r}"

def format_template(error: Exception) -> str:
    return template

If not then we no longer have parity with f-strings when using the proposed lazy evaluation workaround, which seems a shame.

dkp · October 21, 2024, 4:58pm

All,

Thank you for the continued feedback. We made another round of updates to the PEP, including:

Added a new top-level templatelib to house Template and Interpolation.
Added full support for both explicit and implicit concatenation. template+template, template+str, and str+template are all supported. Concatenation always results in a Template. In the end, we decided the arguments in favor of allowing concatenation outweighed the potential disadvantages. We’ve updated the “rejected ideas” section of the PEP to describe this.
Rewrote the “How to Teach” section.
Fixed several bugs and omissions that y’all caught (thank you).

See the documentation preview for the latest.

(I’ll also reply separately to specific comments above. Thanks again!)

dkp · October 21, 2024, 5:18pm

Sorry, this was a bug in the PEP. I’ve corrected this to template = t"Hello {(lambda: name)}" to match expected f-string syntax.

Depending on your needs, move your template inside format_template() and you’re good to go. (And keep in mind that return template returns a Template instance, not a str.)

This said, “true” laziness of the sort that you’re looking for is no longer supported in this PEP. (There’s some good discussion in this thread, particularly around issues with static analysis, that led to its rejection.)

dkp · October 21, 2024, 5:21pm

Yes, this is my future hope too! There’s a nod to it at the very end of the PEP: basically, the PEP doesn’t specify any mechanism to describe what “kind” of template a given template is, but we hope the tooling community will converge around a standard set of behaviors here.

dkp · October 21, 2024, 5:22pm

You seem to be looking at an outdated version of the PEP. See here for the latest:

dkp · October 21, 2024, 5:26pm

We’ve updated the PEP to describe Template and Interpolation as immutable.

dkp · October 21, 2024, 5:28pm

Ah, that’s a great use case. It fits nicely with the “structured logging” example in the PEP.

dkp · October 21, 2024, 5:30pm

Huzzah; we came to the same conclusion and updated the PEP. Yes, allowing concatenation may introduce a category of bugs, but (a) then again, those bugs exist with f-strings today, and (b) disallowing it seems likely to introduce more confusion than not.

Oh wow. Worlds collide. So glad to hear you’ve enjoyed it; thank you.

dkp · October 21, 2024, 5:34pm

I added a section about this. I could personally go either way (make conv part of Interpolation, or remove it and make Interpolation.value post-conversion). Overall, we landed on keeping it in.

I would like to see something like convert() find its way into the standard library, I think? Would hate for everyone to have to copy/pasta it.

I think this would be nice, too, and maybe solves where convert() should live…

AA-Turner · October 21, 2024, 6:12pm

The rationale for this is to ‘avoid polluting the types module with seemingly unrelated types’. I’m not sure that this holds though, as types is where types built in to the interpreter go, and t-strings are a new syntax construct.

If you have other reasons for not using types, fair enough, but the current justification feels a little weak to me.

A

MegaIng · October 21, 2024, 6:21pm

This shows a clear failure to understand the use case proposed. No, this does not work for the kind of things people want lazy evaluation for. IMO, the proposal to use lambda: in template strings as shown in the PEP is fundamentally useless. Just say explicitly that lazy evaluation is not a supported use case, don’t propose non-functional workarounds.

pf_moore · October 21, 2024, 6:40pm

Apologies if I’m confusing two different things here. I feel like the whole idea of “laziness” is a minefield because people assume different things. So I’d like to ask what may be an obvious question, in the interests of being explicit. I suspect that to get what I’m asking would need “true laziness”, but I don’t think of it as laziness, just as “how string formatting works”.

In my head, template strings are conceptually similar to format strings, it’s just that they stop short of rendering to a string, leaving the rendering to a helper function. Is that a fair analogy? Given that you have an example of implementing f-strings with templates in the PEP, it seems close enough.

But that leaves me wondering - how do I implement the equivalent of str.format or str.format_map using templates? It feels like a very obvious thing to want to do. What I mean by this is:

template = t"Hello, {name} - welcome to {location}!"
print(my_format(template, name="Joe", location="paradise"))
print(my_format_map(template, {"name": "Joe", "location": "paradise"}))

I guess that at least in part, the answer is “you don’t need a template for that, you just use a normal string and the format/format_map methods”. But that feels oddly unsettling, because templates are concrete objects that somehow encapsulate the formatting process, but then you can’t render them to strings with values that you calculate at rendering time.

As I say, it may be just that I’m misinterpreting what templates can or should be used for. And that’s fine, but it still suggests that there’s a potential for confusion here that needs to be faced, and dealt with, in the “how to teach this” section. The section currently says

At a glance, template strings look just like f-strings. Their syntax is familiar and the scoping rules remain the same.

That’s true, and it’s a good thing. But there’s also an “uncanny valley” type of effect, where they are enough like f-strings that people can assume more similarities than there actually are, and maybe the str.format{_map} analogy is an example of that?

MegaIng · October 21, 2024, 6:46pm

This is not possible (unless you count something like t"Hello, {'name'}"). It would have been possible with proper lazy templates or one of the suggestions I made, but the community and PEP authors decided against it - as I said at the time, this means the value of these templates is very debatable IMO and I personally am probably not going to end up using it. I am weakly in favor of this PEP still in the hope of a later PEP fixing this mistake and allowing some kind of lazy evaluation/template filling in a clean way.
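
To make the t"Hello, {'name'}" trick concrete, here is a rough sketch in which each interpolation's value is a key name that gets looked up at render time. `Template` and `Interpolation` are simplified stand-ins invented for the example (so it runs without t-string syntax), and `my_format_map` is the hypothetical helper from the question above.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class Interpolation:
    # Hypothetical stand-in: evaluated value plus the source expression text.
    value: Any
    expr: str


class Template:
    # Hypothetical stand-in holding static text and Interpolation segments.
    def __init__(self, *args: "str | Interpolation") -> None:
        self.args = args


# Stand-in for: t"Hello, {'name'} - welcome to {'location'}!"
template = Template(
    "Hello, ",
    Interpolation("name", "'name'"),
    " - welcome to ",
    Interpolation("location", "'location'"),
    "!",
)


def my_format_map(template: Template, values: Mapping[str, Any]) -> str:
    parts = []
    for segment in template.args:
        if isinstance(segment, Interpolation):
            # The interpolated value is treated as a key, not as the content.
            parts.append(str(values[segment.value]))
        else:
            parts.append(segment)
    return "".join(parts)


result = my_format_map(template, {"name": "Joe", "location": "paradise"})
```

The cost of this approach is that every field has to be written as a quoted string, so the template no longer reads like an f-string.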

pf_moore · October 21, 2024, 6:46pm

I think that’s a bit harsh. It’s a reasonable compromise if you’re thinking in terms of lazy evaluation. But it doesn’t work if you’re thinking about something like str.format, which feels nothing like lazy evaluation.

What the PEP doesn’t do is address how we should think of str.format when looking at templates as analogous to partially-evaluated f-strings.

MegaIng · October 21, 2024, 6:48pm

IMO no, it’s not, and it won’t end up being used. I am willing to bet money on this. The result of the PEP in its current form will be that lazy evaluation just is not a supported use case - that is fine, but it’s IMO annoying that the PEP acts like they are supporting it.

dkp · October 21, 2024, 6:58pm

This use case you are thinking of is not supported by the PEP, as I mentioned explicitly in the next paragraph and as we describe in the “rejected ideas” section of the PEP.

The workaround is imperfect. We can say it’s no workaround whatsoever since the PEP doesn’t support the feature we want. But because “laziness” is a loaded term in this discussion – people can and have used it to mean quite different things even in this single thread – I want to make sure we’re covering all bases here.

Hope that clarifies things!

dkp · October 21, 2024, 7:03pm

I agree.

I think we need to at least mention the relationship (or lack thereof) somewhere in the spec.

I’m not sure where the right place to land here is.

For now, the answer is either use format/format_map and ignore t-strings, or take the unsatisfying (but all-too-common in other languages like JS, and probably in Python where f-strings are concerned) step of wrapping your template in a callable that takes as args the values you wish to interpolate.
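
The "wrap it in a callable" option can be sketched with a plain f-string, which runs on current Python. The function name `welcome` is made up for the example; with t-strings the body would presumably return a `Template` from a t"..." literal instead of a `str`.

```python
def welcome(name: str, location: str) -> str:
    # The literal lives inside the function, so the values are in scope
    # each time it is evaluated.
    return f"Hello, {name} - welcome to {location}!"


first = welcome("Joe", "paradise")
second = welcome(name="Ada", location="the thread")
```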