Preserve unpacked `extra` in LogRecord

adriangb · May 9, 2022, 6:29pm

The logging module has some support for structured logs via the extra parameter (logger.info(..., extra={"foo": "bar"})), but it is handicapped by the unpacking of extra into LogRecord.__dict__.

Two issues this leads to:

Every library needs to write code to separate user defined attributes from built-in ones (eg: python-json-logger or opentelmetry-sdk)
Keys that are not valid Python identifiers still get upacked, which is in general not great.

Since a large fraction of logging is now structured and meant to ingestion into various logging services (vs. being read from the console by a human), good support for structured logs is more important than ever.

I think the easiest step forward would be to check if there is an "__extra__" (or something like this) key and, if not, safe a reference to whatever the user passed in.
Then filters / formatters / etc. could access LogRecord.__extra__ without logic to separate out attributes and without having to access LogRecord.__dict__ directly to support non-valid identifiers.

vsajip · June 20, 2022, 9:43am

What about if you pass extra={'structured_data': {'foo': 'bar'}}, for instance? Then a Formatter can certainly format stuff however it wants, and doesn’t need to work at separating out built-in attributes from user-specified ones.

adriangb · June 20, 2022, 2:23pm

I’ve used the approach before. There’s two issues you run into:

We all have to agree on structured_data as the key. If you switch logging formaters, you have to replace every logging call.
It adds extra boilerplate to every logging call (and I’d argue the extra= is already quite boilerplatey) and instantiation of a dict (small but non-zero performance penalty).

vsajip · June 20, 2022, 3:33pm

I don’t know to what extent these are showstoppers. When it comes to structured logging, people use different libraries anyway (e.g. structlog), so the ship has perhaps sailed to standardize anything, anyway, outside of a specific group of collaborators who can agree a common approach. And the logging cookbook describes yet another approach to structured logging. As to performance, returning LogRecords from filters is probably a bigger performance hit than instantiating a dictionary You know what they say about premature optimization, right?

adriangb · June 20, 2022, 4:43pm

I think the first point (coming up with an agreed upon key that is not codified anywhere) is kind of a show stopper because it makes it very difficult to build any sort of ecosystem / interoperability. Sure it will work within a group of collaborators / one org but not for the wider ecosystem.

I don’t think it’s fair to point at another relatively unrelated change made to justify a solution that requires an extra instantiation. That said, I do fundamentally agree with you: the creation of an extra dict is not argument enough for a new feature, at least from a performance standpoint.

I have to disagree on this. I think we should be able to evolve the standard library to continue to provide value into the future. That’s why I focused a large part of the feature request on the increasing popularity of structured logging. There is a long precedent for the standard library learning from external modules, and I think this is a situation where we should do just that. For example, structlog allows the syntax logger.info("msg", user="foo"). I would greatly prefer that to the extra= syntax but since that is not possible I am trying to find a way to get as close as we can to what structlog provides without breaking existing use cases. In particular, I think that LogRecord.extra (or some other name) would make life a lot easier for python-json-logger or opentelemetry-sdk to process and interact with structured log records.

vsajip · June 20, 2022, 5:04pm

That wasn’t my point - I was just kidding, hence the The serious point is that the performance cost of an instantiation is really not something to be worried about, given that logging is generally “fast enough”. But you get that, as you say as much in the same paragraph.

Notice that any change has to be backwards-compatible, and that is the challenge. Also, I am not really sure to what extent people are relying on the stdlib adding features in this area. I’m not saying that a lot of logging isn’t structured - I know it is - just that it isn’t clear there’s a burning need to make improvements in this area. Is there some sort of forum where structured logging people hang out where this can be discussed in a focused way? If so, have you or anyone else started a discussion about this which I can look at? This forum might be too general.

adriangb · June 20, 2022, 5:24pm

I don’t think there is such a forum, at least not that I know of. Maybe we can include @hynek in this discussion?

I admit I haven’t done any sort of polling in this area, but I’d think that given popularity of structured logging and the existence of libraries like python-json-logger (which has ~ the same number of as structlog) there is a desire from the community to be able to do structured logging within the standard library.

Agreed! This is going to be the hard part. Do you think that adding LogRecord.extra would be backwards compatible? It would look something like (somewhere in LogRecord or Logger):

if "extra" not in extra:
    extra["extra"] = extra

aathan · June 20, 2022, 7:57pm

Adrian, you may want to read this thread as well: LogRecord construction: Why is extra not passed? - #12 by aathan

Like you, I would like to see LogRecord construction be more flexible than the current system – specifically, that LogRecord creation at the logging site should accessible to users of the logging framework in ways that are on par with the logging framework itself.

In fact, the suggestion I made would allow you to do what you want.

Also I completely agree with your sentiments about improving stdlib and Python’s logging module. Structured logging is an important modern facility. Rust’s Tracing crate may be of interest there.

aathan · June 20, 2022, 8:06pm

PS: In case you have not already done so, a way to achieve your goal is to monkeypatch Logger.makeRecord() or by subclassing Logger. If you want me to I’ll post a snippet; but I’m pretty sure you’ve thought of that.

In general it seems to me that instead of putting the “extra” handling in Logger.makeRecord() this should be left to the LogRecord factory or its dunder init perhaps with the inclusion of a utility method to do the current dunder dict key/value copying. Then, libraries could construct LogRecords as they see fit and be able to do so in the context of the actual logging callsite.

adriangb · June 28, 2022, 8:18pm

@vsajip just checking back in on this, I think this is the key question here

vsajip · June 28, 2022, 9:11pm

I’ve been away and not been able to spend any time on this, but I hope to have some free time next week.

adriangb · June 28, 2022, 9:22pm

No rush, sorry to ping you while you were away

Topic		Replies	Views
LogRecord construction: Why is extra not passed? Ideas	12	799	June 20, 2022
Logging attributes standardization Ideas	19	1401	July 19, 2023
Diagnostic logging: reinventing the wheel? Python Help	3	352	November 23, 2023
Wrong documentation of logging.setLogRecordFactory? Documentation	2	490	January 13, 2023
F-string, maintainability vs pep8 Python Help	8	796	September 3, 2023

Preserve unpacked `extra` in LogRecord

Related Topics