Safer logging methods for f-strings and new-style formatting

potiuk · June 18, 2022, 8:34pm

I really love the *f() idea. The “%” style is indeed ugly, and whenever I see it, it looks as if it was written by someone who does not know that we can now do better and nicer (i.e. someone used to Python 2.7 or 3.5). Of course that’s not true, but I think we should do everything we can to make it true.

The ugliness is not the most important point though. There is one more that is more important - the context/metadata of the logs is lost when f-strings are used…

There are two important cases:

Local manipulation of the logs.

In Apache Airflow we are using “secret_masker”. This is a logging filter applied to the logs that finds all potential secrets being logged and replaces them with ***. With f-strings, the only way we can do that is to take the “content” of the secrets we want to mask and search the whole logging message for them. This is SLOW and brittle. If we could use only debugf/infof-style messages, finding out whether one of your parameters is a secret could be WAY faster - you could (and we already do that as well) analyze just the names of the parameters (password, secret, key, etc.) and “walk” through the usual structures you can expect (lists, dicts, etc.). It would be possible for us to enforce debugf/infof use all over the code to make it happen.
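
To make the name-based approach concrete, here is a minimal sketch. It is not Airflow's actual secret_masker code, and `infof` here is only a hypothetical stand-in for the proposed method; the list of sensitive name fragments and the helper names (`looks_sensitive`, `mask`, `render`) are assumptions made for illustration.

```python
import logging

# Assumed list of name fragments that mark a parameter as sensitive.
SENSITIVE_FRAGMENTS = ("password", "secret", "key", "token")


def looks_sensitive(name):
    """Return True if a parameter or dict key name suggests a secret."""
    lowered = str(name).lower()
    return any(fragment in lowered for fragment in SENSITIVE_FRAGMENTS)


def mask(value, name=""):
    """Mask a value based on its name, walking into dicts, lists and tuples."""
    if name and looks_sensitive(name):
        return "***"
    if isinstance(value, dict):
        return {k: mask(v, k) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        # Rebuild the same container type with masked items.
        return type(value)(mask(item) for item in value)
    return value


def render(template, **params):
    """Format the template after masking parameters by name."""
    return template.format(**{k: mask(v, k) for k, v in params.items()})


def infof(logger, template, **params):
    """Hypothetical infof(): mask by parameter name, then log at INFO."""
    if logger.isEnabledFor(logging.INFO):
        logger.info(render(template, **params))
```

With this shape, `infof(log, "connecting as {user} with {password}", user="bob", password="hunter2")` never needs to know the secret's value or scan the final message; the parameter name alone is enough. A substring match on names is a crude heuristic (it would also mask a parameter called `monkey`), so a real implementation would likely want a stricter rule.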

Secret masking is just one example. There are other usages that might be more powerful locally.

But there is one, even more important reason why debugf/infof are better than f-strings.

Remote logging and log analysis meta-data.

Whenever your logger uses some kind of external log aggregation/analysis (Sentry, Elasticsearch, CloudWatch, Google Cloud Logging, Sumo Logic, you name it), using f-strings loses all the valuable metadata and does not allow those logs to be aggregated easily. Say you have a similar log generated in a few places: f"important-message: {param}" where param is “x” and “y” respectively - there is no way it will be aggregated as the same message with different params. Whereas infof("important-message: {param}", param="x") and infof("important-message: {param}", param="y") can not only be aggregated as the “same message”, but you also know that the param was “x” and “y” respectively, and your Elasticsearch or other log system can store it as metadata and allow you to query, aggregate, correlate and analyse.
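
As a rough sketch of what that could look like with today's standard library: the hypothetical `infof` below attaches the unformatted template and the parameters to the record through the standard `extra` argument, and an assumed `JsonFormatter` emits them next to the rendered message. The names `infof`, `JsonFormatter`, `demo`, and the `template`/`params` fields are illustrative only, not a proposed API.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Emit the rendered message together with its template and params."""

    def format(self, record):
        payload = {
            "message": record.getMessage(),
            # Fall back to the raw message when no template was attached.
            "template": getattr(record, "template", str(record.msg)),
            "params": getattr(record, "params", {}),
        }
        return json.dumps(payload, default=str, sort_keys=True)


def infof(logger, template, **params):
    """Hypothetical infof(): keep template and params as record metadata."""
    if logger.isEnabledFor(logging.INFO):
        logger.info(
            template.format(**params),
            extra={"template": template, "params": params},
        )


def demo():
    """Log the same template twice and return the parsed JSON lines."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("infof_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)
    try:
        infof(logger, "important-message: {param}", param="x")
        infof(logger, "important-message: {param}", param="y")
    finally:
        logger.removeHandler(handler)
    return [json.loads(line) for line in stream.getvalue().splitlines()]
```

Both records returned by `demo()` share the same `template` value, so a backend can group them as one message, while `params` keeps “x” and “y” as separately queryable fields. With an f-string, only the already-rendered `message` would reach the handler.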

Recently there has been an important effort to standardise such telemetry - OpenTelemetry. Logging is in “beta” there, but clearly once it is out, it will become an important intermediary for sending logs to any of Sentry/ES/cloud logging services. We are including OpenTelemetry in Airflow and we are already thinking about how the lack of logging metadata might make it far less powerful.

I’d really love to see the infof/debugf idea accepted. Happy to help if needed.