Introduction of convenience accessors on Template including strings, interpolations, and values
Discussion of how t-strings and old-style format strings do and don’t relate, along with new example code to take an old-style format string and convert it into a Template
Many smaller bug fixes and improvements (see the PR for details)
I think .args as an interleaved list is an implementation detail that should not be baked into the PEP.
For example, a template processor could cache the static parts of the template and only reprocess the dynamic parts when the template is evaluated with different values.
This seems like it is now redundant with .strings, which would allow you to keep it as an implementation detail and not encourage users to depend on it or require other implementations to maintain compatibility.
The rejection of tb-strings is done in-passing, without justification:
Like f-strings, t-strings may not be combined with the b or u prefixes.
For the same reason that we don’t support bytes.format(), you may not combine 'f' with 'b' string literals. The primary problem is that an object’s __format__() method may return Unicode data that is not compatible with a bytes string.
This restriction does not apply to a template, since it is the function the template is passed to that needs to decide what to do with it. I’m assuming that you don’t want to complicate this PEP by actually defining a Template[bytes], but it would be nice to see this explicitly declared out-of-scope or in Rejected Ideas (and my preference would be out-of-scope, since it seems like a natural extension in the future).
For the “structured logging” example, it appears to me that in order to comply with a predefined structure, users have to give their local variables exactly the names the structure expects. Could there be a bit more flexibility (e.g. a way to specify a different name for a slot)?
In addition, a slightly off-topic question: if a logger expects a specific type for a specific key, is there a way to support static type checking inside templates? E.g. if a structured logger expects type(timestamp) == float, is there a way to warn the user when they pass in a timestamp variable of type str?
That’s an interesting point about slots. Short answer: no. t-strings are a step up from f-strings and as such, just use normal scope rules.
The encapsulation you’re looking for is functions that act like components, mediating the input/output for a t-string. Here’s an example from a predecessor project.
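Roughly, the component pattern looks like this (hypothetical names throughout; an f-string stands in for the t-string since the new syntax needs a new interpreter). The keyword-only parameters fix the slot names, so the caller’s local variables can be named anything:

```python
# Hypothetical "component" function: inside it, the local names match
# the structure's slots, so a t-string like t"{method} {path}" would
# line up; callers map their own locals onto the slots explicitly.
def request_log(*, method, path, status):
    return {
        "method": method,
        "path": path,
        "status": status,
        "message": f"{method} {path} -> {status}",
    }

# Caller's local names need not match the structure's slot names.
verb = "GET"
url = "/index.html"
code = 200
entry = request_log(method=verb, path=url, status=code)
```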
Structured logging allows developers to log data in both a human-readable format and a structured format (like JSON) using only a single logging call.
I don’t like this paragraph. It is talking about producing both a plaintext log AND a structured log.
Structured logging is not about human readability. For example, an OTLP Event/LogRecord is serialized as protobuf, which is not human-readable.
Some other examples:
Structured logging is the process of producing, transmitting, and storing log messages in a format that’s easily machine-readable, such as JSON. The main advantage here is that by ensuring logs are structured consistently, you’ll get faster and more accurate automated processing and analysis.
Structured logging involves recording log events in a well-defined, machine-readable format such as JSON. Instead of writing log messages as plain text, log data is organized into key-value pairs, making it easier to search, filter, and analyze.
This is why I dislike “Structured logging allows developers to log data in both a human-readable format and …”
Additionally, mixing a plaintext log and a JSON log on one line doesn’t really count as structured logging:
JSON/XML/protobuf/etc. parsers don’t understand the >>> separator.
You can write a plaintext log to the console and send a structured log to a log transfer agent in a single logging call, but mixing the two styles into one string is not good practice, and not good as an example.
We chose this approach because the venerable Python logging cookbook — part of the official Python docs — already takes it: its default “structured logging” example emits a single string with a human readable portion, a separator (>>>>), and a JSON-structured section.
I think the second “approach” to structured logging in PEP is nicer in that it cleanly separates human-readable from structured output; devs can ignore the human-readable stuff entirely if they want.
“Structured logging allows developers to log data in both a human-readable format and…”
I could see altering this sentence to “Structured logging allows developers to log data in a machine-readable format”, but the reason we opted for its current wording was mostly to keep in line with the existing cookbook example.
The example code contains more than just a definition of what structured logs are.
We should not change the definition of “structured logging” just to stay in line with the example; that confuses readers. I don’t want to teach Python users to use a technical term the wrong way.
By the way, the cookbook seems a bit old. Observability has spread at a very rapid pace over the past five years, and along with it, best practices for structured logging have become established.
Common practice for structured logging puts the human-readable message in the structured log’s fields, like {"message": "message 1", "snowman": "\u2603", "set_value": [1, 2, 3]}.
Structured logging allows developers to log data in machine-readable
formats like JSON. With t-strings, developers can easily log structured data
alongside human-readable messages using just a single log statement.
Thanks for considering the tb-string case, and I’m glad you also feel that it’s a plausible extension! One point that I brought up in passing in the (closed) GitHub thread that I want to re-raise:
I think the implementation of a tb-string could very easily be close enough to what you have proposed here that, instead of being two separate types Template and BytesTemplate, it would make sense to use Template[str] and Template[bytes]. In that case, it might save one future headache by renaming .strings to .literals, which will be less jarring to have type tuple[bytes, ...].
def from_format(fmt: str, /, *args: object, **kwargs: object) -> Template:
"""Parse `fmt` and return a `Template` instance."""
...
I wondered how such an implementation could be realised, but it seems like string.Formatter().parse output will give the raw material needed to instantiate a Template in such a way that the examples here (“Interleaving of Template.args”) make sense…?
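For what it’s worth, here is a sketch along those lines. It uses `string.Formatter().parse` to recover literals and fields; the `Interp` namedtuple is a stand-in for the PEP’s `Interpolation` (which may not be importable at runtime), and only simple positional and keyword field names are handled:

```python
from collections import namedtuple
from string import Formatter

# Stand-in for the PEP's Interpolation type (field names assumed).
Interp = namedtuple("Interp", ["value", "expr", "conv", "format_spec"])

def from_format(fmt, /, *args, **kwargs):
    """Parse `fmt` as str.format would and return the interleaved
    (string, interpolation, string, ...) list described under
    "Interleaving of Template.args"."""
    parts = []
    auto = 0          # next auto-numbered positional field
    pending = ""      # literal text accumulated so far
    for literal, field, spec, conv in Formatter().parse(fmt):
        pending += literal
        if field is None:
            continue
        if field == "":                 # "{}" -> auto-numbered
            value = args[auto]
            auto += 1
        elif field.isdigit():           # "{0}" -> explicit position
            value = args[int(field)]
        else:                           # "{name}" -> keyword
            value = kwargs[field]
        parts.append(pending)
        parts.append(Interp(value, field, conv, spec or ""))
        pending = ""
    parts.append(pending)               # trailing literal (maybe "")
    return parts
```

The result always starts and ends with a string and alternates strings with interpolations, so it has odd length, matching the interleaving the examples describe.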
I’d like to express my support, as I would like to use template strings to build SQL queries and have recently written SQL-tString in anticipation. This library currently supports:
from sql_tstring import sql
a = 2
query, values = sql("SELECT a, b, c FROM tbl WHERE a = {a}", locals())
assert query == "SELECT a, b, c FROM tbl WHERE a = ?"
assert values == [2]
With this PEP it can become:
query, values = sql(t"SELECT a, b, c FROM tbl WHERE a = {a}")
Which will be much easier to use and explain, as well as actually being a supported syntax rather than a “hack” of sorts as it is now.
What makes SQL-tString useful to me is that it also accepts values that rewrite the query. For example, the special value Absent results in the expression (and, if the clause is then empty, the whole clause) being absent (removed) from the resultant query:
from sql_tstring import Absent, sql
a = Absent
query, values = sql("SELECT a, b, c FROM tbl WHERE a = {a} AND b = 2", locals())
assert query == "SELECT a, b, c FROM tbl WHERE b = 2"
assert values == []
I find this technique much better than any of the existing SQL-building tools I’ve found, as it requires writing SQL rather than a pseudo-SQL language.
I definitely appreciate the new convenience accessors, but I found it puzzling that one was still lacking: a way to get the value of an interpolation after conv and fmt_spec have been applied. Multiple examples in the spec end up having to reuse the f() convenience function to handle the formatting; this strongly suggests that authors will have to write this convenience function and use it most of the time as well.
It is the normal case that you’ll want to apply those in every string interpolation; fancier interpolations that do something unusual will (based on JS experience) likely be the exception. When those fancier interpolations do occur, they’ll be part of a more complex interpolation anyway, and those authors can more easily eat the complexity of understanding which value to use. Requiring authors of simpler interpolations that just produce strings to remember to apply conv and fmt_spec every time is, I think, asking for those specifiers to just not work by default.
My suggestion would be to move the current .value property to a less-conveniently-named property, like .unformatted_value, and then let .value be the automatically-formatted version. In the absence of conv or fmt_spec, this would continue to be the object value that the expression evaluated to, but when either of those are specified, it would become a str, formatted appropriately.
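For context, the f() helper the spec examples reuse amounts to roughly the following (sketched with a stand-in Interpolation type; attribute names assumed). This is also the step the proposed formatted .value property would perform automatically:

```python
from collections import namedtuple

# Stand-in for the PEP's Interpolation (attribute names assumed).
Interp = namedtuple("Interp", ["value", "conv", "format_spec"])

def f(interp):
    """Apply the !conversion and :format_spec to an interpolation's
    value, the way an f-string would."""
    value = interp.value
    if interp.conv == "r":
        value = repr(value)
    elif interp.conv == "s":
        value = str(value)
    elif interp.conv == "a":
        value = ascii(value)
    return format(value, interp.format_spec)
```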
Can you include more rationale / reasoning for making the debug specifier ({foo=}) bake into the string? It seems a bit counterintuitive given that !r/s/a don’t get “baked into” the value.
Also, it’s not 100% clear how the debug specifier interacts with Interpolation.expr (is the equals sign included? is the whitespace before the equals sign included?).
Why not include the equals sign (plus trailing whitespace) as Interpolation.debug or something, and make it the responsibility of the formatting code to deal with it?
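To make the suggestion concrete, here is a sketch of how formatting code could consume such a hypothetical Interpolation.debug attribute (none of these names are in the PEP; the namedtuple is a stand-in):

```python
from collections import namedtuple

# Stand-in Interpolation with a hypothetical `debug` attribute holding
# the "expr=" text, equals sign and whitespace included, or None.
Interp = namedtuple("Interp", ["value", "expr", "debug"])

def render(parts):
    """Render an interleaved (string, interpolation, ...) list,
    reproducing f-string {x=} behavior when debug text is present."""
    out = []
    for p in parts:
        if isinstance(p, str):
            out.append(p)
        else:
            out.append(p.debug or "")
            # {x=} defaults to repr, like f-strings do.
            out.append(repr(p.value) if p.debug else str(p.value))
    return "".join(out)
```

Formatting code that doesn’t care about debug text can simply ignore the attribute, which is the flexibility being asked for.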
In general, it looks like a very good addition to the Python language. Thank you for this PEP!
I agree. The interleaving, the odd length of args, and the concatenation of the last string of template1 with the first string of template2 all sound like implementation details. Maybe they can be removed from the PEP text?
Moreover, I’m not sure that direct indexing is the best interface for end users of templates.