-X importtrace to supplement -X importtime for loaded modules

Hi! I sent this proposal to the Python Ideas mailing list and was redirected here, so apologies (and thanks) to anyone who’s already read this :slight_smile:

I’d like to propose an adjacent interpreter flag to -X importtime: -X importtrace (open to alternative naming).

While -X importtime is incredibly useful for analyzing module import times, by design it doesn’t log anything when an imported module has already been loaded. -X importtrace would add a line of output for every import of an already-loaded module:

>>> import uuid
import time: cached    | cached     |   _io
import time: cached    | cached     |   _io
import time: cached    | cached     |   os
import time: cached    | cached     |   sys
import time: cached    | cached     |   enum
import time: cached    | cached     |     _io
import time: cached    | cached     |     _io
import time: cached    | cached     |     collections
import time: cached    | cached     |     os
import time: cached    | cached     |     re
import time: cached    | cached     |     sys
import time: cached    | cached     |     functools
import time: cached    | cached     |     itertools
import time:       151 |        151 |     _wmi
import time:     18290 |      18440 |   platform
import time:       372 |        372 |   _uuid
import time:     10955 |      29766 | uuid

In codebases with convoluted or poorly managed import graphs (and, consequently, workloads that suffer from long import times), the ability to record every path to an expensive dependency, not just the first one imported, can speed up refactoring and make this type of issue easier to identify at scale. More generally, this flag would provide a more efficient way to track runtime dependencies.

As a proof of concept, I was able to hack this functionality into -X importtime by adding a couple of lines to import_ensure_initialized in Python/import.c (hence the output above). A separate flag is probably desirable to preserve backwards compatibility. Maybe -X importtrace would show only cached imports, and you’d supply both flags to get the full output, for maximum flexibility?

Looking forward to your feedback,
Noah


I’ve had situations where this would have been very handy; given the implementation is quite simple, I’d support adding it.

Over on the mailing list @methane suggested -X importtime=2 to avoid needing a second -X flag; I like that a lot as well, since this is a minor extension of importtime; it’s like importtime verbosity level 2.

I’ve started drafting the PR and there are two small issues with this approach:

  1. Users might already be spelling the option -X importtime=true (or with any other string), since the value is currently unchecked. Rejecting those values would technically break backwards compatibility; we’d be retroactively constraining valid option usage.

  2. It’s unclear how to validate the corresponding environment variable, PYTHONPROFILEIMPORTTIME: should we error if it isn’t “1” or “2”? Should we only enable cached import tracing if the value is “2”?

While I initially thought augmenting this flag was the best approach, I think adding a new one could markedly simplify these issues :frowning:

Since the docs don’t say that -X importtime=<anything> is valid, it seems that changing the semantics is somewhat defensible. To be concrete:

  • -X importtime or -X importtime=1 for the current behavior
  • -X importtime=2 for the new, “show already loaded” behavior
  • -X importtime=<anything else> is an error
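To make the three cases concrete, here is a minimal Python sketch of that validation. The real check would live in the interpreter's C startup code; `parse_importtime` is a hypothetical helper, and the only existing API it relies on is the CPython-specific `sys._xoptions` mapping, which stores `True` for a bare `-X importtime` and the string after `=` otherwise.

```python
import sys


def parse_importtime(value):
    """Map an -X importtime value to a verbosity level (hypothetical helper)."""
    if value is None:
        return 0  # option not given
    if value is True or value == "1":
        return 1  # current behavior
    if value == "2":
        return 2  # also show already-loaded modules
    raise ValueError(f"invalid -X importtime value: {value!r} (expected 1 or 2)")


if __name__ == "__main__":
    # getattr guards against implementations without sys._xoptions.
    xoptions = getattr(sys, "_xoptions", {})
    print(parse_importtime(xoptions.get("importtime")))
```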

I am less sure what would be reasonable for the environment variable, though. The docs currently say:

If this environment variable is set to a non-empty string, Python will show how long each import takes.

Retroactively making this only support "1" or "2" seems aggressive, but maybe it’s fine? Other suggestions:

  • "2" is special, but any other value gives the old behavior
  • A separate environment variable

We can provide extended behavior for a special value (e.g. 2) without restricting the valid values to 1 and 2. Since this is a debug-only tool, I don’t think it’s a big deal if someone who for some reason is already setting it to 2 now gets some additional output. But I don’t think we should make values other than 1 or 2 an error. This may seem weird, but practically I think it’s fine and a reasonable concession to backward compatibility.
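As a sketch of that lenient rule for the environment variable (the function name is hypothetical, and this only illustrates the suggestion, not an actual implementation):

```python
import os


def importtime_level_from_env(environ=os.environ):
    """Return 0 (off), 1 (current output), or 2 (also show cached imports)."""
    value = environ.get("PYTHONPROFILEIMPORTTIME", "")
    if not value:
        return 0  # unset or empty: feature disabled, as documented today
    # "2" is special; every other non-empty string keeps the old behavior.
    return 2 if value == "2" else 1
```

With this rule, an existing setup that exports `PYTHONPROFILEIMPORTTIME=yes` keeps working unchanged, and only `2` opts in to the extra output.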
