Add support for t-strings in the stdlib logging library

Deferring the evaluation of log messages by passing a message template and args is recommended to avoid formatting messages until they’re actually emitted, as well as allow handlers to access the unformatted args.

Unfortunately, since the c-style % formatting is less capable (and less-loved) than f-strings and str.format(), many developers pass a single pre-formatted string to the logging calls instead of using a %-style template and arguments to format into it. This prevents the deferring of the formatting until output time, as well as doesn’t allow the arguments to be set on the log record for any of the handlers to use.

Extending the stdlib logging library to handle t-strings natively would make it easier to properly use all the features of the logging system without sacrificing the convenience that leads many developers to use f-strings instead.

More details available in the GitHub issue here: Add support for t-strings in the stdlib logging library · Issue #134394 · python/cpython · GitHub

10 Likes

T-strings do not defer evaluation. As soon as the code with the t-string is reached, the Template is created and evaluates its dynamic parts. This wouldn’t add any performance benefit to logging beyond a minor deferral of string conversion and concatenation.

2 Likes

The string conversion and concatenation is what I’m referring to by deferred evaluation. Instead of having to strinigify and format all the variables at call time, you only have to do it if and when the record is formatted. It’s one of the reasons why the existing template + args syntax is useful (besides preserving the unformatted args for the handlers to potentially use).

Consider a contrived example:

class ExpensiveRepr:
    def __repr__(self):
        time.sleep(5)
        return super().__repr__()

When logging out these objects:

>>> import logging, timeout
>>> logging.basicConfig(level=logging.ERROR)
>>>
>>> er = ExpensiveRepr()
>>>
>>> # current %s-style formatting (debug not enabled --> no repr --> fast)
>>> timeit.timeit("logging.debug('%s', er)", number=1, globals=globals())
2.4950015358626842e-05
>>> # proposed t-string method (debug not enabled --> no repr --> fast)
>>> timeit.timeit("logging.debug(t'{er}')", number=1, globals=globals())
3.0764000257477164e-05
>>> # f-string (repr is called regardless of log level --> slow)
>>> timeit.timeit("logging.debug(f'{er}')", number=1, globals=globals())
5.005317411007127
5 Likes

Simplest solution that I have seen for this:

class LazyString:
    def __init__(self, func):
        self.func = func
    def __str__(self):
        return str(self.func())
timeit.timeit("logging.debug(LazyString(ExpensiveRepr.__repr__))", number=1, globals=globals())
6.563961505889893e-06
2 Likes

f-strings are sufficiently fast that I just use them when logging. Especially if there’s > 10% chance that the line will be logged.

I do think it might be useful to have the expression values available.

6 Likes

I am +1000 in favor of the proposal! Frankly, I planned to propose it myself :slight_smile:

There would finally exist a way to use the modern {}-based formatting capabilities without resigning from the traditional logging resource-saving mechanism of deferred conversion and formatting.

Currently, having to use the old % formatting for logging everywhere just to make use of that standard logging mechanism, as in:

logger.debug('Using %r while x=%r', obj, x)
raise ValueError(f'wrong {x=}')

…is tedious.

Some people, therefore, tend to write:

logger.debug(f'Using {obj!r} while {x=}')
raise ValueError(f'wrong {x=}')

…but then they lose the benefit from that standard mechanism (which sometimes, depending on the logging configuration, e.g., prevents log entries from being suppressed based on detection of repeated occurrences of particular message templates; in the case of the system I develop it would result in highly increased counts of log messages and, consequently, higher license fees for using log storage/analysis systems…).

The proposed feature would make it easy to use the modern {}-based formatting capabilities consistently:

logger.debug(t'Using {obj!r} while {x=}')
raise ValueError(f'wrong {x=}')
5 Likes

See the style argument for logging formatters. the default is %, { has been there for a very long time for str.format semantics, as has $ for string.Template support. You can set the formatter for your loggers (I would suggest not doing this to the root logger though) to { and just use it as if you were using str.format, you don’t need t-string support to use braces here, and the extra intermediary object doesn’t serve a useful purpose.

Edit: to be clear, this isn’t the end of what’s required, the docs cover both this and some compatibility concerns that have kept it from being what is done by default that would apply to t-strings too

t-strings won’t play nicely with existing logging api, such as logging filters

As logging — Logging facility for Python — Python 3.13.3 documentation states:

This only applies to fmt and datefmt (e.g. '%(message)s' versus '{message}'), not to the actual log messages passed to the logging methods.

See also: Logging Cookbook — Python 3.13.3 documentation

There is a reason the modern {}-based format capabilities were not adopted as widely as one could assume they could be: backward (in)compatibility (with the lack of a transition path able to keep things both backward compatible and elegant+concise).

Extending the logging machinery to support t-strings could be the missing element which would make it possible.

2 Likes

To add to this, a concrete example:

>>> import logging
>>> logging.basicConfig(style="{", level=logging.INFO)
>>> logging.info("%s", "test")  # still works, even with style="{"
INFO:root:test
>>> logging.info("{}", "test")  # still fails
--- Logging error ---
Traceback (most recent call last):
  File ".../python3.13/logging/__init__.py", line 1150, in emit
    msg = self.format(record)
  File "/.../python3.13/logging/__init__.py", line 998, in format
    return fmt.format(record)
           ~~~~~~~~~~^^^^^^^^
  File "/.../python3.13/logging/__init__.py", line 711, in format
    record.message = record.getMessage()
                     ~~~~~~~~~~~~~~~~~^^
  File ".../python3.13/logging/__init__.py", line 400, in getMessage
    msg = msg % self.args
          ~~~~^~~~~~~~~~~
TypeError: not all arguments converted during string formatting
1 Like

Right, you have to modify the logrecord factory as well, but that’s pointed to there.

I’m not sure how you could ever extend the logging machinery by default to be compatible with t-strings without breaking all of the existing logging machinery in the wild that expects strings. t-strings are very intentionally not strings and separate from them.

The existing logging machinery expects strings to come back from LogRecord.getMessage(). With the proposed feature, this is still the case. The only thing this changes is that LogRecord.msg (which is typically the message template, ie. the "%s" when using logging.info("%s", "test")) is now a t-string instead of a normal template string. Furthermore, it should already be expected that the LogRecord.msg can be whatever random object the user provides, since the logging docs state that you can provide any arbitrary object there.

Therefore, unless something is improperly using LogRecord.msg or overriding getMessage(), there should be no changes required to any libraries.

The linked GitHub issue links to a proof of concept PR and the tstringlogger PyPI package that patches the functionality onto the existing stdlib logging. In both cases, I haven’t observed any breakage in using the logging interface, or in libraries that provide extra logging functionality.

3 Likes

This right here is the problem. Overriding getMessage is supported and explicitly documented, including what behaviors are expected. The only contract the user overriding has had is ensuring str() is called on the message and that % formatting still works. You’re adding a new requirement that t-string formatting still works to what all subclasses overriding this have to do (otherwise anyone overriding getMessage breaks t-string users who expected this to “just work”), which is a breaking change.

There’s a lot more that could be done with t-strings. Format strings are not limited to the f-string style, so you can do interesting things like:

from tstring import render
def hello(name):
    print(f"hello {name}")

def test_lazy():
    who = 'bob'
    flavor = 'spicy'
    embedx = t'Call function {hello:!fn} {who} {flavor}'
    who = 'jane'
    r = render(embedx)
    assert r ==  "Call function hello jane spicy"

tstring-util · PyPI

Or maybe
deferred = t"my message{‘symbol’:defer}" where symbol would not be evaluated until the logger actually needed to generate a record

This approach is a breaking change. This is better illustrated by the big comment block in collections.abc; Any time you have an API where users are expected to supply a subclass, additive behaviors that they need to continue to uphold for the ecosystem of that type to work are breaking changes.

You may still make this work by converting a t-string to a string + args somewhere else in the process without it breaking, and if you can do so, that would be an improvement for logging IMO

2 Likes

If code is expecting that str(record.msg) is a normal string template (and not a t-string), then unfortunately yes, this change would break that code.

I did a search on GitHub to get a sense for how widespread this is with: language:python (( /(?-i)def getMessage\(self\):/ AND /(?-i)str\(self.msg\)/ ) OR /(?-i)str\(record.msg\)/ ) NOT is:archived NOT is:fork NOT path:*/logging/__init__.py . This looks for str(self.msg) + getMessage() to catch subclasses that rely on it, or str(record.msg) to catch formatters/filterers using it. It also excludes forks and archived projects, and excludes /logging/__init__.py to exclude cases where the Python stdlib has been vendored into the project.

It returned ~1.1k instances at time of posting. Obviously this search isn’t perfect (and wouldn’t be a perfect indicator even if it was), but I think it’s useful to get a sense of how much code relies on this. From going through the first several pages, it seems like most of the results are legitimate usages that would break.

I guess this proposal would have to be gated behind a flag or something so that projects that control their entire logging stack (handlers, formatters, etc.) could take advantage of it, with the fallback being that the t-string is immediately converted to a normal string (with no args) in cases where that flag isn’t set. This would allow producers of logs to always be able to use t-strings, but allow consumers to opt-in to accepting them.

Not sure whether it eliminates the backward compatibility problem entirely, but for sure it minimizes its potential surface:

  • in contrast to the current version of the PR proposed by @pR0Ps, let’s do not touch the LogRecord.getMessage() method at all
  • instead, let’s modify only LogRecord.__init__() – somewhat along the following lines…
from string.templatelib import Template as TStringTemplate, Interpolation
...

class LogRecord:
    ...
    def __init__(self, name, level, pathname, lineno,
                 msg, args, exc_info, func=None, sinfo=None, **kwargs):
        """
        Initialize a logging record with interesting information.
        """
        ct = time.time_ns()
        self.name = name
        if isinstance(msg, TStringTemplate):
            if args:
                raise ValueError(
                    "no positional args allowed when t-string is used",
                )
            msg = _TStringMsgWrapper(msg)
        self.msg = msg
        ... # the rest of the `LogRecord` definition...

...

class _TStringMsgWrapper:

    def __init__(self, msg):
        self._msg = msg

    def __repr__(self):
        return f"{self.__class__.__qualname__}({self._msg!r})"

    def __str__(self):
        return "".join(
            (
                _str_formatter.format_field(
                    _str_formatter.convert_field(x.value, x.conversion),
                    x.format_spec,
                )
            ) if isinstance(x, Interpolation)
            else x
            for x in self._msg
        )

As an extension to this idea, if str(record.msg) was wrapped so that it ignored any args to format into it, this would allow for record.args to still be added in the same way as they are currently.

Implementation
def _tstring_to_logstring(t_string: TStringTemplate):
    # format a t-string as a log message (aka as if it was an f-string)
    return "".join(
        (
            _str_formatter.format_field(
                _str_formatter.convert_field(x.value, x.conversion),
                x.format_spec
            )
        ) if isinstance(x, Interpolation)
        else x
        for x in t_string
    )

class FormattedTString(str):
    # subclass of str that discards args when using %-based formatting
    def __mod__(self, _):
        return self

class TStringLogMsg:
    # t-string wrapper that returns a FormattedTString from __str__
    # note: would be better as a subclass, but TStringTemplates cannot be subclassed
    def __init__(self, msg):
        self._msg = msg
    def __repr__(self):
        return f"{self.__class__.__qualname__}({self._msg!r})"
    def __str__(self):
        return FormattedTString(_tstring_to_logstring(self._msg))

Then, in the LogRecord.__init__, extract the args (to make them available as record.args) and wrap the msg in a TStringLogMsg:

         self.name = name
-        self.msg = msg
+        if isinstance(msg, TStringTemplate):
+            args = (
+                {x.expression: x.value for x in msg if isinstance(x, Interpolation)},
+                *args
+            )
+            msg = TStringMsg(msg)
+        self.msg = msg
         #
         # The following statement allows passing of a dictionary as a sole

Result:

  • str(record.msg) returns a str object that can be used as a %-based template without breaking
  • str(record.msg) % record.args returns the formatted log message
  • Therefore no changes to getMessage are required, therefore no changes to subclasses that override it.

Downsides:

  • Probably too complex/unexpected to be a realistic option
  • Doesn’t handle cases where code modifies the record.msg (ie. record.msg = "prefix: " + str(record.msg)), though this could probably be worked around by making .msg a @property at the cost of yet more complexity

I like this proposal a lot, actually I though the lazy evaluation applied to logging was one of the main benefits of the t-strings, I’m surprised to see people thinking otherwise.

f-strings may be fast enough as mentioned here, but they are often used with expensive expressions inside that totally undermine their performance. The official python logging and also many linterns recommendation is to use “%” formatting and completely avoid “f-strings” for this precise reason.

From pylint:

logging-format-interpolation (W1202):

Use % formatting in logging functions and pass the % parameters as arguments Used when a logging statement has a call form of “logging.(format_string.format(format_args…))”. Such calls should use % formatting instead, but leave interpolation to the logging function by passing the parameters as arguments.

Python logging docs :

Optimization

Formatting of message arguments is deferred until it cannot be avoided. However, computing the arguments passed to the logging method can also be expensive, and you may want to avoid doing it if the logger will just throw away your event. To decide what to do, you can call the isEnabledFor() method which takes a level argument and returns true if the event would be created by the Logger for that level of call. You can write code like this:

if logger.isEnabledFor(logging.DEBUG):
    logger.debug('Message with %s, %s', expensive_func1(),
                                        expensive_func2())

Following this recommendations results in very ugly code in my opinion, and I would much rather write this instead:

logger.debug(t'Message with {expensive_func1()}, {expensive_func2()}')

But not this:

logger.debug(f'Message with {expensive_func1()}, {expensive_func2()}')

As it will evaluate both functions.

Being able to deactivate them by simply changing the logger level makes them usable in performance critical code (that is set to debug only). For now the only way of lazy evaluating is using it with the if statement or other methods mentioned that won’t look as nice as just swapping the f-string by a t-string.

As an additional benefit, using t-strings may fix the recurring issue of the programs that fail precisely evaluating the f-string during a log that is precisely there to catch a error. So the real error is masked by a f-string evaluation error and you don’t have the log of the real one.

For example this may fail:
logger.debug(f'Message with {expensive_func1()}, {func_that_may_fail()}')

While this may handle the issue gracefully:

logger.debug(t'Message with {expensive_func1()}, {func_that_may_fail()}')

Am I missing something about t-strings?

Template strings do not lazily evaluate interpolations.

3 Likes

Omg I if missed something, literally the key point, thanks for the clarification.

Now I don’t see the value of t-strings in logging, and also part of the value of t-strings just vanished for me, but I still see that they may have another applications.