LogRecord construction: Why is extra not passed?

aathan · June 20, 2022, 8:24am

I’m rather frustrated by the fact that Logger’s level funcs (def debug/warning/info/etc() → def log() → def _log()) does not handle **kwargs more loosely, so that those are passed to the LogRecord factory; which is a problem because extra is also not passed to the factory. This leaves the factory with no opportunity to receive extra information, from which to scaffold itself, directly from log.levelfunc(…) calls during construction. Logger.makeRecord() could pass extra to the factory, leaving it up to the LogRecord class init to set its own attributes. What motivates the current design? Why not forward **kwargs more permisively, at least?

vsajip · June 20, 2022, 9:21am

What exactly is frustrating about it? What is the exact use case you are experiencing problems in?

aathan · June 20, 2022, 9:58am

Let me restate/repeat perhaps more clearly. I fired that last message off a bit huridely:

If you look at the implementation of Logger.makeRecord(), it does not pass the parameter called “extra” to the LogRecord factory. Rather, it takes it upon itself to set the dunder dict members of LogRecord from extra. An alternative design would be to pass extra to the logrecord factory and have it be responsible for processing it.

Further, any **kwargs that get passed to Logger.debug() or Logger.info() etc are suppressed on their way through the Logger implementation.

This means that a logging facility enhancement that relies on LogRecord to perform some scafolding of itself, using extra information passed on the logging line, is not possible. In other words, LogRecord itself has no chance to receive information directly from a logging invocation.

Allowing **kwargs through to the LogRecord factory, or at least passing extra, would be helpful in this regard.

Perhaps there are good reasons for the current design that does neither of those things, so I was wondering if they could be articulated.

Monkeypatching makeRecord is an option, but should that really be resorted to?

A usecase for this is in order to allow log.warning(…, color=red) or for log.info(…, a=1,b=2,…) where additional arguments to the logging function are preserved by the LogRecord according to their natural expression on the logging line (vs enclosure in extra). But even if enclosure in extra is a desired design feature, at least allowing extra to be processed by the LogRecord factory would be nice. The contents of “extra” or forward **kwargs could then be used to trigger preparation of data enclosures in the LogRecord by its factory. Instead the current design forces all such enclosures to be done at the log.info(…) invocation itself. Example, the existing design makes something like log.info(…,save_context=True) difficult (where what context is saved by save_context is defined by the LogRecord factory).

vsajip · June 20, 2022, 10:26am

There are many ways of designing things. I designed things the way they are because it seemed a reasonable way of doing it at the time. As for aesthetics, those are personal and subjective. You can’t please everyone, so there’s no point trying.

This seems to be an aesthetic argument. The arguments are available, whether under extra or not.

The logging package is meant for logging, not general-purpose data processing. So your use case, while it might be theoretically something you might want, doesn’t seem to really be needed in logging, and the lack of the design you want doesn’t appear to have hampered usage of logging in the many years since it was introduced. Perhaps you can explain if something you want to do in logging isn’t possible with the current design.

Backward compatibility is taken seriously in Python. Random changes to designs which work, but don’t fit specific people’s tastes, aren’t really an option. I’m all ears if something functionally can’t be done with logging because of a design flaw, or if there is an actual bug. I realise this is the “Ideas” category, and it’s fine for you to bring up ideas here, but bear in mind that any idea which requires backwards-incompatible changes isn’t going to fly, and even ideas which don’t break backward compatibility might not fly if stdlib maintainers don’t feel comfortable with them. (I’ve been maintaining logging for almost 20 years, and accepted plenty of ideas and rejected plenty of ideas over that time - feel free to look at issues and PRs related to logging to see what makes the cut and what doesn’t.)

aathan · June 20, 2022, 11:02am

I appreciate your time as an expert who can cut through the fog. I’ll have a look at the PRs as you suggest as time allows.

This seems to be an aesthetic argument.

It is not meant to be an aesthetic argument. I ran into specific difficulties, the remedy of which required me to monkeypatch Logger.makeRecord() so that it would pass the information it has to the LogRecord factory. I was not able to immediately come up with an alternative design.

In other words, I came here specifically because I’ve found it surprisingly difficult to do something that seemed at first to be simple. But maybe I’m exaggerating a bit: The monkeypatching route worked fine. In that sense, perhaps the aesthetics are whether you intend for the Logging package to require that (or subclassing Logger, and monkeypatching the entire class; or referencing MyLogger everywhere in configs); or perhaps I’ve over looked a simpler approach to achieve my goals.

By the way: Let me also clarify that I’m not criticizing. I’m starting from the assumption that, not being Python logging guru, I’m unaware of something (such as some specific Pythonic or historical reason, or methodology).

The arguments are available, whether under extra or not.

Given your clear expertise in this area, I’m wondering if we’re misunderstanding each other because I don’t believe this is an accurate statement.

While I would agree “args” gets passed to the logrecord factory, what does not get passed all the way through to it are either the additional kwargs to the logging function, nor "extra’ itself. Passing those through would provide the opportunity for LogRecord to construct itself using hints passed through those parameters. Perhaps your design assumes this shouldn’t be done, and is strictly a function of Logger.

not general-purpose data processing

Totally. I am seeking to centralize the inclusion into log records (for logging purposes, not for data processing purposes) additional contextual information declaratively; both in terms of log data and routing/display hints for the log record.

Anyway, the short form issue is: Neither **kwargs nor “extra” are passed to the LogRecord factory but it seems the could be. Assuming backwards compatibility could be addressed, would you reject such an enhancement to Logger outright?

vsajip · June 20, 2022, 11:23am

Aren’t there cookbook examples showing how to get context-sensitive information, such as is passed in extra, into log output? What specifically couldn’t you do? Leave aside the detail of whether something is passed into a particular function or not: I’m saying that you can pass arbitrary information into a logging call and have it appear in logging output in a format controlled by you - isn’t that the case?

aathan · June 20, 2022, 4:15pm

setLogRecordFactory() is hamstrung for no good reason.
This should be easy to understand.
If you do not understand that fact, that’s unfortunate for Python.

Not every desirable instrumentation of a LogRecord that must be done at its creation point consists of hanging a scalar known at the log.info() etc callsite. Every other mechanism for achieving that goal could much more easily be achieved via setLogRecordFactory(), which (by its very name) is specifically intended for this act.

EDIT: One last possibly illuminating comment: Sometimes the data you pass into log.info() is not itself, directly, the data you want to log, but is instead information meant to modulate the behavior of the logging system itself and/or meant to signal the need to marshal additional data from disparate points inconvenient at the callsite itself. loglevel is a good example of such a datum. LogRecord is the one cookie that is guaranteed to pass through all loggers and all handlers from the callsite forward. Sometimes that instrumentation depends on the other args/content of the logging call, and must be done at the callsite (not later, in say a Handler or Formatter). This can be for externalities such as being in an async environment. Something like extra=dict(additionalStuff=makeAdditionalStuff()) is not only unnecessarily wordy but not enough (because makeAdditionalStuff() doesn’t see the rest of the parameters and cannot act on the LogRecord itself).

Because neither kwargs nor “extra” are passed to the log record factory there is no mechanism to signal the factory other than via other arguments (such as args), all of which already have other purposes.

If you still don’t get it, one day I’ll submit a PR that you can reject.

vsajip · June 20, 2022, 5:12pm

I asked a simple question:

You don’t appear to have answered this question.

loglevel is a particular case because it is the key mechanism for determining verbosity. That aside, I don’t agree that the other things are in scope for logging. It’s certainly not the case that logging’s behaviour is intended to be modulated by all manner of user-defined data passed into a logging call. There are existing mechanisms for modulation: levels, filters, adapters.

What would be the point? First answer the question about what can’t be done (in a specific real-world scenario, not a theoretical/hypothetical situation) with things the way they are now. Not “can’t do it the way I’d like to”, but “can’t do it”. Otherwise, it’s an aesthetic difference of opinion rather than a functional flaw we’re talking about.

aathan · June 20, 2022, 6:50pm

First answer the question about what can’t be done [without hypotheticals]

None of this is (or has been) hypothetical.

Think gevent. IO pre-emption.

Leaf loggers a.b.c.d.1 and a.b.c.d.2 and a.b.c.e.3 are created and assigned based on dynamic behaviors of the system. Certain handlers that can be hung off the logger at b care about (debugging, tracing, security, etc) information available at 1, 2, 3 and other leafy areas of the system. Not all handlers at b exist during all runs or at all times.

Clearly the code in 1,2,3 should not be encumbered by the act of gathering the data b’s handler wants to log when it happens to be attached to the system. However, the code in 1,2,3 certainly knows that a particular log site is a certain kind of logsite and can declare that (similar to stack_info=True). The information that should be gathered depends on *args.

log.warning(msg,*args, interesting_place=True)

b’s filters won’t see the LogRecord until b is reached, possibly after lots of other handlers and filters run. In an asynchronous environment, sometimes this will be too late to capture system-wide contextual information available at the moment of the logging call.

Hanging filters off of d, e, 1, 2 or 3 because the handler at b exists is possible, but cumbersome and breaks encapsulation. Due to async, this would anyway not be ideal. Signaling the LogRecord factory via bitfields in the logevel or via elements of *args is possible ~. LoggingAdapters do not change the shape of the problem, they are only sugar and knowing to create them suffers from the same encapsulation issue. LoggingAdapter.process() doesn’t get *args.

~ please don’t argue not wanting to do that is an aesthetic issue. We’re all adults here.

Meanwhile, setLogRecordFactory() is available. The example for its use talks about chaining. Clearly it will be invoked universally at just the right time. What an awesomely powerful tool that allows capturing information system-wide at every log() site!

But woops, for no apparent reason, it can’ see either extra or kwargs.

I’m saying that you can pass arbitrary information into a logging call and have it appear in logging output in a format controlled by you - isn’t that the case?

You don’t appear to have answered this question.

I’ll answer you following this paragraph.

Likewise, you have not answered me as to why it is a bad idea to pass extra and or kwargs to the LogRecord factory. Before you continue to ask me questions, please answer that one simple question too. Why is it a bad idea for LogRecord factory to receive either/both kwargs and extra? Not hypothetically. Not by resorting to backwards compatibility (because compatibility is solvable).

As far as your question goes: Yes you can pass information in the logging call that then appears in the logging output. Based on the example at the top I’m sure you can understand why that’s only tangentially relevant.

That aside, I don’t agree that the other things are in scope for logging. It’s certainly not the case that logging’s behaviour is intended to be modulated by all manner of user-defined data passed into a logging call. There are existing mechanisms for modulation: levels, filters, adapters.

The very fact that levels, filters and adapters exist is counterfactual to your important opinion as the maintainer. They specifically act on all manner of user-defined data passed into a logging call. It is precisely the nature of all of those features to act on arbitrary elements of what’s passed into logging calls.

Anyway, I would ask you to consider simply: Why not my suggestion? and Doesn’t it make setLogRecordFactory() better?

TL;DR I’m pointing out a hole in the existing mechanisms. More specifically, I’m pointing out that there is an existing mechanism (setLogRecordFactory()) that is trivially hamstrung for apparently no good reason. We shouldn’t need pages and pages of debate on that point.

Thanks for building the module.

vsajip · June 20, 2022, 7:22pm

OK, thanks for the detailed explanation of your use case.

It’s not necessarily a bad idea to do that. However, it doesn’t work that way at the moment, so a reason has to be provided which justifies the change. As maintainer, I have to be comfortable with maintaining it (and any other change) for the foreseeable future. Your use case is (on the face of it) so off the mainstream, recommended way of using logging that I can’t yet see the need to modify how the stdlib works to accommodate that use case. Specifically, adding and removing handlers continuously during program execution is not a recommended practice, as it makes system behaviour hard to reason about (all the more so in a multi-threaded or async environment).

I don’t understand what this means. Obviously, I intended levels, filters etc. as defined mechanisms to modulate system behaviour. That doesn’t mean that other mechanisms aren’t technically possible, but that also doesn’t mean that other mechanisms have to be provided. If the existing mechanisms can’t do something, that’s different.

I’m not sure I can. Because I wouldn’t configure logging the way you seem to have (with handlers coming and going), I can’t tell whether there is an XY problem here or something else.

I tend to be fairly conservative about changes, because of the ongoing maintenance responsibility: I have to be sure that any change won’t cause me headaches further down the line.

aathan · June 20, 2022, 7:44pm

I implore you not to be so dismissive. The important features of the use case I described is squarely aimed at and addressed by (but for the extant simple omission) the LogHandler factory functionality in the Logging package.

I did not describe “continuously” changing handlers during a run (though, attaching a handler when a debug condition is set can happen) what I described instead is a declaratively configurable system that for particular runs, or under specific conditions, has the configuration I described.

The key element in my description is not the insertion of the handler at b, but only that it exists and that there are leaf loggers the log site of which that shouldn’t care about its existence.

Dynamic creation of leaf logger paths happens in many many mainstream environments, such as when specific worker threads or request handlers log to a logger named according to the inbound connection. flask.server.component.customer.

I’m sure I don’t need to remind you that Python’s Logging module, being in stdlib, is often used as part of other foundational frameworks (as in this case).

Such frameworks and libraries may be used in production (where a.b’s handler is not configured), tracing/staging (where a.b’s handler is configured), and dev (where even more handlers at a.b.c.d etc are configured). They may run in customer environments (where certain data shouldn’t be gathered) or in provider environments (where that data is safe to gather). Etc.

So, and with all due respect: It seems to me you are trying very hard to make a pretzel out of a simple suggestion maybe because of the frustrated tone of my initial 2am posting.

As you stated, passing at least extra if not kwargs to the LogRecord factory is not necessarily a bad idea. In fact, you should go so far as to say it is a good idea where a very small code change would yield metric ton of utility

All the best.

aathan · June 20, 2022, 7:51pm

Hmm, not sure if you deleted my final reply or the system is about to duplicate it.

I did not describe a system that dynamically and continuously reconfigures loggers.

Rather, I described a system that is itself configurable with respect to logging (i.e., on that uses the Logging system’s configuration capabilities).

Many mainstream systems create leaf loggers on the basis of context (e.g., request handlers in an RPC environement) so that specific customer logs are grouped and segregated.

Meanwhile the handlers attached closer to the top of the logging tree may be configured based on the environment (production, testing etc).

So, please don’t be dismissive of such a simple change that has so much utility; particulary when it’s likely to be a 2-3 line change with very small error surface.

All the best.

davidism · June 20, 2022, 8:31pm

Too heated