Safer logging methods for f-strings and new-style formatting

ariebovenberg · February 19, 2022, 12:24pm

(I was advised to move this discussion from the Python bug tracker to here)

Logging f-strings is quite popular, but is potentially unsafe with the current logging methods. There are different ways to solve this problem – while keeping new-style formatting.

The problem with logging f-strings

Consider the examples below:

logger.info('look: %s', untrusted_string)                # OK
logger.info('look: %(foo)s', {'foo', untrusted_string})  # OK
logger.info(f'look: {untrusted_string}')                 # OK
logger.info(f'look: {untrusted_string}', some_dict)      # DANGER!

In the last example, untrusted_string will be interpreted as a string template by the logger.
If it has the value %(foo)999999999s (where foo is present in some_dict),
logging gets stuck trying to add over a gigabyte of whitespace to a string.
In other words: a Denial-of-Service attack.

PEP675 and `typing.overload` to the rescue

Assuming you wouldn’t want to ban logging of f-strings entirely (it is readable and often faster), you’d want the logging API to detect and disallow combining f-strings with logger arguments (i.e. the 4th line in the code sample above).

One way to do this is to use LiteralString (from PEP675) and typing.overload to define better logger methods (the debug variant show below):

# Case for handling any (potentially untrusted) string. 
# Additional arguments are explicitly forbidden.
@overload
def debugf(self, msg: str, exc_info=False):
    ...

# Case for performing formatting by the logger.
# Untrusted strings are explicitly forbidden.
@overload
def debugf(self, msg: LiteralString, *args, exc_info=False, **kwargs):
    ...

This new API would disallow dangerous situations. Anybody using a type checker or IDE would get notified.

Why the f suffix? Since changing the existing methods is off the table, f suffixes are a nice and common way to indicate it is formatting-related.

Big bonus: switching `logging` to the modern formatting style

An additional advantage to a new set of logger methods is that it’s possible to migrate away from the old %-style string formatting. The new logging methods could support:

logger.debugf("hello {planet}", planet='world')

Even better: the implementation for this already exists in the logging cookbook, so it’s already guaranteed to work idiomatically with the stdlib .

Additional links

Discussion #46200 on bugs.python.org

CAM-Gerlach · February 20, 2022, 3:51pm

I’m not a core dev, but I have seen a consistently high level of demand for more modern formatting options for logging (which appears to be a large driver of why users preformat in the first place), and this will only get worse as more and more users are much more familiar with .format() and f-strings than legacy printf-style formatting (if they’re familiar with the latter at all), and more features get added to format() and not to %, like PEP 682 (Signed Zero Formatting). Given the compat issues or other issues with alternate approaches, adding a new set of functions seems like the most obvious viable option, as has been discussed before.

As for the type safety issues, there has been a lot of reticence to add typing to the standard library, to avoid the perception of requiring contributors to use it or imposing any runtime burden on all Python users. Could you describe what effects, if any, your proposal would have on users not interested in using static typing? Could the typing.overload implementation that provides type safety be outsourced to a typeshed stub to defer these two concerns, while keeping the actual standard library runtime code fully compatible with it and thus still retaining its benefits for users that are using static typing (as it won’t benefit users that don’t use typecheckers anyway)?

ariebovenberg · February 20, 2022, 6:41pm

@CAM-Gerlach indeed I’m aware I’m not the first to suggest this – although I do think the security concerns were not highlighted before.

Regarding the type annotations, I expect they can be fully outsourced to stubs. The code change would be to wrap msg in BraceMessage (from the logging cookbook):

def debugf(self, msg, *args, exc_info=False, **kwargs):
    if self.isEnabledFor(DEBUG):
        self._log(DEBUG, BraceMessage(msg, args, kwargs), exc_info=exc_info)

I can already imagine two possible improvements to the example above:

We could omit the BraceMessage wrapper if no args or kwargs are given. This would improve performance for the f-string case.
We could wrap strings without args/kwargs in something like UnformattableString to ensure it will never be %-formatted, even if logging Handlers or Filters add arguments. This would provide some measure of safety even for non-typing users.

ericvsmith · February 20, 2022, 6:51pm

Just for the record, f-strings are faster than %-formatting, but %-formatting is faster than .format (mostly, I think, to looking up the name “format” on the string object).

So while I would probably use a logging interface that used .format for some of my logging, I’d probably still continue to use f-strings if I knew that the string was ultimately going to be logged, and if the string didn’t contain any user-provided data.

The big win for using .format is that in addition to str, float, int (and maybe some other built in types that I probably missed), you can provide format specifiers for other types, including user types. In particular, datetime is a huge win, and I assume means we won’t need the datefmt argument to formatters any more (though I haven’t dug in to this in detail).

ariebovenberg · February 20, 2022, 8:05pm

Some other things that crossed my mind:

Should the ‘old’ logging methods be deprecated? I think so. If not, I doubt any f-string users would make the switch on their own (becuase the use of old methods wouldn’t trigger typing errors).
Instead of overload, a separate method is possible: debugs() where the s-suffix stands for “static” (i.e. no formatting will be applied). An advantage of this would be slightly better performance (no args/kwargs checks) and additional safety for non-typing users. One could argue whether it’s better to have 2 explicitly separate methods for logging with/without formatting.

vsajip1 · February 26, 2022, 8:13am

I think there’s not a sufficient reason to deprecate the existing methods. There’s a large body of existing third-party code using them, which might never be changed, nor need to change.

ariebovenberg · February 26, 2022, 2:32pm

@vsajip1 I definitely agree that removal is not a good idea.

With “deprecated” I meant to mark the old methods as ‘superseded’, not to remove them. Similar to the approach taken with superseded modules.

gpshead · March 19, 2022, 1:37am

I don’t like the idea of new logging methods in the logging module or on logger instances that look similar to the original ones. logging.infof(f'The {foo} barred') is easier to mistype as info and is repetitive to read or write with too many f's in it. Adding parallel methods here just adds confusion and adds forever APIs…

Lets assume PEP-675, and do the legwork to see how feasible it is to just define the existing logging APIs as that str with no args vs LiteralString with args. Code being statically analyzed by a type checker is generally modern, maintained, and amenable to being updated when checkers point out design flaws. So I predict there won’t be an issue with “breaking” existing code (it’s an analysis time failure only unless someone seriously uses a runtime type checker everywhere - but even those could be taught) as anything type checked that is found to be constructing format strings to pass to logging functions likely has a genuine red flag worthy issue.

gpshead · March 19, 2022, 2:09am

If we want a genuinely new API for logging literals, we could go a bit towards the wild side and do something I’m not convinced will go over well among Pythonistas:

    LOG(INFO) << f"The {foo} barred"
    LOG(DEBUG) << f"Some complicated {computed()} reason."
    LOG(ERROR) << "Nobody expects the Spanish Inquisition!"

That syntax inspired loosely by the Google C++ logging framework’s API. But while it could be implemented today, would not be efficient as written. Thus I wouldn’t propose it as is.

What we’d want for a new style of syntax to support logging is at least a way to allow such a construct to prevent the f-str or other parameter computation from ever happening when the log level won’t be emitted. The equivalent of this kind of logic in a single statement if we were to assume that LOG(level) for example raised if level wasn’t configured go anywhere:

try:
    LOG(DEBUG) << f"A very {verbose_slow()} thing."
except DiscardedLogError:
    pass

which may be better thought of as:

if DEBUG not in logger.ignored_levels:
    logger.log(DEBUG, f"...")

logically equivalent regardless of how this got implemented. The point being that doing that kind of smarts would require new, efficient for logging’s purpose, Python internals tricks. Because you need all of that magic hidden by a simple easy to read & write line syntax - never requiring conditionals of exceptions directly.

There’s another desire I hear at work among logging users: To see the string prior to formatting for aggregation of the same log messages across processes potentially running different versions of the code, so regardless of file and line number data which they already have. % formatting allows this today so they’ve been using it as such. We currently have no way for an original raw f-str source to be captured and exposed to a lower level API. Hacky extremely slow inspect based possibilities exist, but nobody should accept that pain.

This is an area where enhancing Python itself to expose an f-str prior to str rendering computation happens would be an interesting feature. It’d either need to expose it somehow, or save state and allow deferring the f-str rendering work. Much more than what lamda: f"{thing} says {name}" would provide.

Lots of possibilities to think about if we don’t assume it must fit within CPython as we know it today.

From a practicality beats purity standpoint, I expect PEP-675 enforced LiteralString on top of logging APIs as they are may allow some big users to move on and adopt f-str using logging as is. But that obviously cannot satisfy all needs and is going to leave some complaining.

I’m just trying to seed ideas…

smontanaro · March 19, 2022, 2:46am

Probably a dumb idea, but thinking out of the box, what you really want is for the f-string evaluation to be deferred until the logger method decide it wants that. If you had a d-string syntax, the bytecode compiler could effectively construct a function which is passed to the logger method and called when required. So, when the bytecode compiler sees something like

logger.info(d"A very {verbose_slow()} thing.")

it would pass a zero arg lambda function to logger.info:

logger.info(lambda: f"A very {verbose_slow()} thing.")

It would recognize the message is a function and call it instead of treating it like a string.

It’s definitely not well thought out, but I thought I’d toss it out there. Just in case it makes even a smidgen of sense.

ariebovenberg · March 19, 2022, 10:52am

@gpshead interesting! I do see how separate logging methods would be confusing – especially if you’d just want to log f-strings.

Fixing this only in the type stubs makes sense, but note that this would cause a contradiction with the logging docs. It is currently mentioned that any object with __str__ may be logged (link).
I’d be fine with deprecating this behavior in the type stubs, while leaving the functionality intact so nobody’s code breaks.

@smontanaro lazily evaluated f-strings would solve a lot of problems here indeed. Makes sense to me.

gpshead · March 19, 2022, 7:57pm

Yep, it’d be breaking to do that in the stdlib itself so I wouldn’t suggest that. Just that type checkers could optionally be taught to treat the logging logger functions in a special manner. There is definitely code depending on being able to pass any object with a __str__ into logging methods. And no doubt some of it wants to pass type checkers.

pf_moore · March 19, 2022, 7:59pm

That seems dangerous, there’s a number of other logging frameworks on PyPI, how would they get access to the same capabilities? (Or are you suggesting that they wouldn’t, and the stdlib logger would be in a privileged position?)

gpshead · March 19, 2022, 8:00pm

The new string prefix (d-str or whatnot) for deferring their computation has come up before as well. I don’t love the idea of growing a our plethora of string prefix letters. Adding another one is a cognitive burden, teaching burden, chance for a mistake, or annoying thing to remember to change when refactoring between APIs. Though a case can be made.

The semantics of

info(d"A very {verbose_slow()} {thing}.")

vs the semantics of today’s

info("A very %s %s.", verbose_slow(), thing)

are challenging. if the call to verbose_slow() is deferred… that is a potentially major change in side effects. Sure, people shouldn’t write code that relies on side effects of things being called to generate a log message. But someone will. Possibly without knowing or realizing it. Quite nasty to debug.

info(lambda: f"A {verbose_slow()} {thing}")

Doesn’t accurately capture what we’d want a d-str to do. Some context to show why:

def func(thing):
    for suffix in range(3):
        thing += str(suffix)
        info(lambda: f"A {thing} goes here")

What that does depends on when the deferred rendering call happens. If not rendered before info() returns, all renderings would contain the final iteration’s thing. We tell people not to use outer loop variables in nested functions for a reason. Would a d-str face the same fate of a reference to the local namespace that could change? Or does a d-str perform name lookups at creation time and carry those with it?

I’d lean towards the latter. Making the lambda analogy incorrect. d-str at that point is presumably a str like class. You could implement a poor performing one in Python today as a proof of concept probably looking like this:

class DeferredFStr(str):
    def __init__(self, *args, **kwargs):
        super.__init__(*args, **kwargs)
        # ... extract {expressions} from our value ...
        expressions = [ ... ]
        # ... extract all names from those expressions ...
        names = [ ... ]
        # ... populate our own dict of names -> values from our callers frame ...
        # (gross inspect or co_frame traversal code?)
        self._ns = { ... }

    def __str__(self):
        return exec(f"f{super()!r}", locals=self._ns, globals=self._ns)

    def __repr__(self):
        return f"d{super()!r}"

Using a subclass for that may be a bad idea. Subclassing str both opens doors while probably causes awkward bugs. A container is safer, but more annoying to get existing type checked code demanding a str rather than something with a __str__ method to accept. Regardless, an actual implementation that does not vacuum all performance out of the program would be in C and need to be tied in with existing f-str parsing internals as that is something that shouldn’t be duplicated, meaning this would be a language level thing rather than mere extension defined type level.

At which point you can instead ask for other language level things like knowing ahead of time if a f-str statement is going to be consumed or thrown away so you can avoid computing it entirely, as well as a way to preserve an unrendered f-str along with its rendered value when used with whatever new thing allows that. How that could or should look is still something my mind hasn’t worked out.

gpshead · March 19, 2022, 8:10pm

type checkers aren’t in the stdlib so this wouldn’t be either, it’d be a feature entirely up to the typing community and type checkers. I suspect an overloaded (msg: LiteralString) + (msg: str, *args: Sequence) annotation on the logging methods is not feasible to be the typeshed default logging.pyi stub behavior.

Testing that concept for feasibility in huge codebases using a variety of type checkers will be interesting future data.

smontanaro · March 19, 2022, 8:41pm

gpshead:

info(lambda: f"A {verbose_slow()} {thing}")
Doesn’t accurately capture what we’d want a d-str to do. Some context to show why:
def func(thing):
    for suffix in range(3):
        thing += str(suffix)
        info(lambda: f"A {thing} goes here")
What that does depends on when the deferred rendering call happens. If not rendered before info() returns, all renderings would contain the final iteration’s thing.

Hmmm… I was only thinking about its application to the logging example (where presumably it’s evaluated during the info(...) call or discarded), but I can see that deferring indefinitely could produce strange results. Given that so-called d-strings don’t yet exist, I suppose semantics are in the eye of the beholder. You could say, “Use it or lose it, buddy. The results of storing it beyond the return of the potentially calling function are undefined.”

ericvsmith · March 19, 2022, 10:13pm

PEP 501 is all about delayed f-strings, and mentions logging.

asvetlov · March 19, 2022, 10:43pm

5 cents. It would be nice to restore the unsubstituted logging template in the logging handler.

At least https://sentry.io/ groups logs by the log message.
The following calls are grouped, shown as two events for the same sentry issue, etc.

log.info("Log %s", 1)
log.info("Log %s", 2)

In opposite, if log arguments are substituted, the Sentry processes it as different issues. The following is not auto-grouped by default:

log.info("Log 1")
log.info("Log 2")

It leads to ineffective UX and expensive usage costs IIRC.

The doc describes the behavior: Grouping Issues | Sentry Documentation

EpicWink · March 20, 2022, 12:02am

As someone who manages an analysis system which collectively has tends of thousands of log-call lines (and an approximate 10 million calls per day), it’s my experience that almost all expensive logs have multiple lines (statements) before the log to construct the log data (eg mean, relative values, etc) anyway, and are best optimised through if __debug__

For the actual logs, all our custom __str__/__repr__ methods are cheap to call (as much as possible, effectively slotted dataclasses). The most effort goes into swapping out NumPy array __str__/__repr__ to something simple (eg dtype, shape, memory address)

potiuk · June 18, 2022, 8:34pm

I really love the *f() idea. The “%” is indeed ugly and always when I see it looks like someone who does not know that we can now do better and nicers (i.e. somoene used to Python 2.7 or 3.5) wrote it. Of course it’s not true, but I think we should actually do everything to make it actually become true.

The ugliness is not the most important point though. There are is one more that is more important - context/ meta-data of the logs is lost when f-strings are used…

There are two important cases:

Local manipulation of the logs.

In Apache Airflow we are using “secret_masker”. This is a logging filter applied to the logs that finds out all potential secrets logged and replaces them with ***. Having f-strings, the only way we can actually do that is to get the “content” of secrets we want to mask and search the whole logging message to see if they are not in the logging message. This is SLOW and brittle. If we could use only debugf/infof style of messages, finding out if you are using a secret in one of your parameters, could be WAY faster - you could make (and we already do that as well) it much faster by analyzing just names of the parameters (password, secret, key, etc. etc. ) and “walking” through the usual structures you can expect (list, dicts etc). It would be possible for us to enforce debugf/infof use all over the code to make it happen.

Secret masking is just one example. There are other usages that might be more powerful locally.

But there is one, even more important reason why debugf/infof are better than f-strings.

Remote logging and log analysis meta-data.

Whenever your loggger uses some kind of external log aggregation/analysis (sentry, elasticsearch, cloud watch, Google Cloud logging, Sumo Logic, you name it ) then using f-strings looses all the metadata that are valuable, as well as does not alllow to easily aggregate those logs. Say you have similar log generated in few places: f"important-message: {param}" where param is “x”, and “y” respectively - there is no way it will be aggregated as the same message with differen params. Where infof(“important-message: {param}”, param=“x”) and infof(“important-message: {param}”, param=“y”) can not only be aggregated as “same message” but also you can know that the param was “x” and “y” respectively and your elasticsearch or other log system can store it as metadata and allow to query, aggregate, corrrelate and analyse.

Recently there is an important effort to standardise such telemetry - Open Telemetry. And logging is in “beta” there but clearly once it is out, it will become an important intermediary to send the logs to any of sentry/es/cloud loggings-). We are including OpenTelemetry in Airflow and we are already thinking how lack of logging meta-data might make it far less powerful.

I’d really love to see the infof/debugf idea to be accepted. Happy to help if needed.

Safer logging methods for f-strings and new-style formatting

The problem with logging f-strings

PEP675 and typing.overload to the rescue

Big bonus: switching logging to the modern formatting style

Additional links

PEP675 and `typing.overload` to the rescue

Big bonus: switching `logging` to the modern formatting style