Guidance on "importability" of installed modules?

ferdnyc · April 3, 2023, 12:41pm

I’m not really sure if this is the correct category, correct venue, or even correct approach to raising this question, but it felt at least not obviously in-appropriate to go this route.

Is there any… formal/official “guidance” available anywhere (perhaps in a FAQ / documentation, a mailing-list thread, a ranting screed someone crayoned to the door of a bathroom stall at a conference somewhere… whatever) as to how “importability” should be handled in Python modules? (Ones installed into the interpreter’s site-packages, particularly.)

Some background

I discovered to my dismay, recently, that the sphinx_theme_builder package contains modules (well, at least one module) that take an unusual approach to implementing dependency management. On import, the module sphinx_theme_builder._internal.cli.__init__.py will do a check for the (external, third-party) dependencies of its CLI mode, and if any are missing, it will display a very snazzily formatted, colorized error message…and then call sys.exit(1).

I discovered this in the course of doing a pydoc -k search for some keyword I don’t recall, when I noticed that my search was abruptly terminating with a non-zero exit code. But the same thing happens if you just run import sphinx_theme_builder._internal.cli in an interactive Python session — instant termination.

The good-faith attempt to address

That just doesn’t sit well with me. At least as a matter of opinion (something that, I stipulate, carries less that no weight), I feel that any and all modules installed into a Python distribution should be at least importable… importable, that is, without taking down the Python interpreter as a side-effect.

We certainly tolerate plenty of other, often ugly, side-effects from module imports, but I feel (again: meaningless) that a line worth drawing in the sand is: At no point should importing a module cause the interpreter to exit. Any module, under any circumstance.

So, I attempted to raise that concern with the sphinx_theme_builder developers, demonstrating with this REPL listing, verbatim:

$ python3
Python 3.11.2 (main, Feb  8 2023, 00:00:00) [GCC 12.2.1 20221121 (Red Hat 12.2.1-4)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sphinx_theme_builder._internal.cli
error: missing-command-line-dependencies

× Could not import a package that is required for the `stb` command line.
╰─> No module named 'sphinx_autobuild'

note: The optional cli dependencies of this package are missing.
hint: During installation, make sure to include the `[cli]`. For example:
      pip install "sphinx-theme-builder[cli]"
link: 
https://sphinx-theme-builder.rtfd.io/errors/#missing-command-line-dependencies

$

The response

In reply, a project maintainer posted a single, pointed question: “Why are you importing an _internal submodule directly?”

Not an unfair question, perhaps, but also sort of missing the point, since I’d mentioned previously (there as well as here) that, for one thing, this was affecting the ability of pydoc -k to fully scan the collection of installed modules.

The appeal to authority (this post)

Which is kind of where things are at. IMHO the fact that the module is internal is immaterial — it’s installed into Python as part of the package, and if nothing else there are tools in the broader ecosystem that rely on being able to import every installed module as part of their basic function.

And that’s before you factor in the best interests of curious users, inquisitive language-learners, and even potential sphinx_theme_builder contributors, all of whom might have valid reasons for experimentally importing a module, despite it being internal, despite it not being intended to be imported that way. “Simply because it’s there.”

…But other than my (stipulated: meaningless) arguments about what I believe, are there any resources I can point to that help make the case for “don’t exit on import”?

sinoroc · April 3, 2023, 12:59pm

This topic is out of scope for #packaging in my opinion. Maybe #users would be a better fit. If there is a discussion to have about side effects when importing a package or module, then it should be had with the whole Python community, it is not something specific to packaging.

My take, is that importing should avoid side effects as much as possible. Not all side effects are avoidable, not all side effects have an impact big enough to be worth avoiding. This particular example seems a bit hostile to me and the intended effect does not seem worth it, but I do not know the back story.

hroncok · April 3, 2023, 1:30pm

In this at least, I suppose raising ImportError is a nicer thing to do. Making the module call exit directly is indeed weird, but I don’t think an authority exists that could “ban” that.

abessman · April 3, 2023, 1:50pm

A CLI tool terminating early due to a missing dependence is uncontroversial. If that module is normally only imported by the CLI tool, I don’t think there is anything wrong with this design. The name sphinx_theme_builder._internal.cli implies this is the case.

If, on the other hand, it’s possible to trigger sys.exit() unexpectedly through the package’s public API, that’s clearly a bug.

pradyunsg · April 3, 2023, 2:05pm

Oh hai! I wasn’t expecting that this would end up on discuss.python.org.

My opinion on this is pretty much the same as what @abessman stated above. Beyond that, you’ll notice that the package uses objects from (some of) the optional dependencies in type annotations. I could use from __future__ import annotations to avoid that specific issue. Beyond that, this import check also ensures that users don’t end up in a situation where the user upgrades the package (which changes its dependencies) but does not install new optional dependencies; and then attempts to use the CLI (a user who did this in the past is me ).

From my POV, this is only used in the CLI, so it’s relatively safe to do this https://github.com/pradyunsg/sphinx-theme-builder/blob/0f72fe5eec5e381eb6499cfbee095c051fcd6d8a/pyproject.toml#L54

To build on what @hroncok has said, there’s no real “central authority” involved on these things. I did start by raising an importerror here, but it isn’t presented as cleanly as all the other error messages presented by the tool right now; and is inconsistent with the extensive effort I’d put in toward extensive and good-looking error messaging within this package (this was the experimentation ground where I drafted the improvements to pip’s error messaging FWIW) that guide users toward specific actions to resolve issues and give them sufficient context about the errors.

I’m open to discussion on doing this differently and that’s why I asked what I asked. IMO, the discussion has to start from what usecases/workflows something breaks, and whether it’s worth supporting them. If we take it from a philosophical direction, it’s gonna be much harder to get a solution if we don’t share the same philosophies.

pf_moore · April 3, 2023, 2:47pm

I don’t have much to add, other than to say that the use case given of pydoc -k is something I’d initially argue is exposing a design flaw in pydoc - importing modules can run arbitrary code for better or worse, and as such it can do things like exit the interpreter. If pydoc isn’t prepared to cope with that, it needs to document its limitations more clearly.

Maybe there’s a way that pydoc could co-operate with packages to determine what is safe to import (“pydoc won’t import modules named with an initial underscore” might be a reasonable starting point). But as a general rule, I’m inclined to say “don’t import arbitrary modules without knowing what they do” is reasonable advice, similar to “don’t run arbitrary untrusted code”.

kknechtel · April 3, 2023, 7:12pm

Two points:

Anything named with a single leading underscore, by convention, is intended as “private”. Especially if it has a name like _internal, and especially especially if you don’t see it mentioned in documentation that otherwise looks comprehensive. While it’s true that Python doesn’t have any compile-time concept of privacy, and that runtime privacy guards are trivially worked around - with great power comes great responsibility.
Every Python source file conceptually represents a module. Something has to be the entry point for a program. That means that not every module is going to be imported, and not every module is going to be designed to be imported.

Yes, the if __name__ == '__main__': idiom exists, and can prevent code from running when the file is imported as a module. However, this is… only important when it’s relevant. That is to say: why should we want to import the file in question? Certainly that code serves a documentation purpose in itself. I’ve also seen people use if __name__ == '__main__': around informal test code for a module that will normally be important and does not represent a normal entry point. (Please do strongly consider using a proper test harness, though.)

But for something that’s already known to be an internal implementation detail, there’s no reason to expect that importing it would expose any useful functionality. The only code that’s intended to import this file, if any, is elsewhere within sphinx_theme_builder. (For example, it might be used specifically by a wrapper script generated by the build process, as one of the “console entry points” described in setup.py or pyproject.toml.)

That’s by design.

On the contrary, it’s essential. They don’t want pydoc to generate documentation for that part, because the end user is not supposed to try to use it directly.

Contributors can modify the source code to work around the problem, since their contributions are going to consist of modifications to the source code anyway. Curious users and inquisitive language-learners are just not going to get the value out of this that you seem to expect. Again: suppose there weren’t such a check and the module imported successfully. Exactly what do you expect that you would be able to do with it, and why would you have this expectation? For pedagogical purposes, how is this useful above and beyond using the documented parts, reading the documentation, and learning the language fundamentals by following tutorials etc.?

Even this arguably has precedent in the standard library:

>>> import argparse
>>> argparse.ArgumentParser().parse_args('x')
usage: [-h]
: error: unrecognized arguments: x

and there the REPL quits. This is particularly annoying because the argparse documentation even claims:

Sometimes it may be useful to have an ArgumentParser parse arguments other than those of sys.argv. This can be accomplished by passing a list of strings to parse_args(). This is useful for testing at the interactive prompt:

(italic emphasis mine)

But I suppose that’s another discussion.

chepner · April 3, 2023, 10:17pm

Karl Knechtel:

Even this arguably has precedent in the standard library:
>>> import argparse
>>> argparse.ArgumentParser().parse_args('x')
usage: [-h]
: error: unrecognized arguments: x
and there the REPL quits.

That popped into my mind as well. However, there is a distinction to draw here: simply importing argparse cannot cause your script to exit. You have to call parse_args, and it is at least documented as being able to exit the interpreter:

When it encounters such an error, it exits and prints the error along with a usage message:

I, too, would prefer that it raise an exception (other than SystemExit) instead of calling sys.exit, but at least the caller has the option of catching it:

try:
    args = parser.parse_args()
except SystemExit:
    # try to recover
    ....

ferdnyc · April 4, 2023, 12:34pm

Yeah, sorry… y’all don’t know me, but for those who do, it’s no surprise that I kinda suck.

If it’s any consolation, the responses here have won me over to the idea that it’s overreach to mandate something like “don’t exit on import”. Better to solve it elsewhere.

I looked into the actual issue some more. The irony is that pydoc has a safeimport() which it uses for actual documentation lookup — a request for pydoc.render_doc('sphinx_theme_builder._internal.cli') won’t exit, because it uses safeimport():

>>> import pydoc
>>> pydoc.render_doc('sphinx_theme_builder._internal.cli')
error: missing-command-line-dependencies

× Could not import a package that is required for the `stb` command line.
╰─> No module named 'sphinx_autobuild'

note: The optional cli dependencies of this package are missing.
hint: During installation, make sure to include the `[cli]`. For example:
      pip install "sphinx-theme-builder[cli]"
link: 
https://sphinx-theme-builder.rtfd.io/errors/#missing-command-line-dependencies
Traceback (most recent call last):
  File "/usr/lib/python3.11/site-packages/sphinx_theme_builder/_internal/cli/__init__.py", line 18, in <module>
    import sphinx_autobuild  # type: ignore  # noqa
    ^^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'sphinx_autobuild'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib64/python3.11/pydoc.py", line 442, in safeimport
    module = __import__(path)
             ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/sphinx_theme_builder/_internal/cli/__init__.py", line 37, in <module>
    sys.exit(1)
SystemExit: 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.11/pydoc.py", line 1758, in render_doc
    object, name = resolve(thing, forceload)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/pydoc.py", line 1742, in resolve
    object = locate(thing, forceload)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/pydoc.py", line 1719, in locate
    nextmodule = safeimport('.'.join(parts[:n+1]), forceload)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/pydoc.py", line 457, in safeimport
    raise ErrorDuringImport(path, sys.exc_info())
pydoc.ErrorDuringImport: problem in sphinx_theme_builder._internal.cli - SystemExit: 1
>>>

But the pydoc -k shell command uses pkgutil.walk_packages() to scan the available package set, and that doesn’t import safely. The function is documented as having to import every package (for subpackage discovery), and because the sys.exit() is in a package __init__.py, it gets triggered.

Still, pkgutil.walk_packages() could be made safe in the same manner that pydoc.safeimport() is safe. It would have to be done in pkgutil itself, though, because handling it later is too late:

# Will error out long before it reaches sphinx_theme_builder
packages = [p for p in pkgutil.walk_packages()]

# Will exit when it hits sphinx_theme_builder._internal.cli
def ignore(e): pass
packages = [p for p in pkgutil.walk_packages(onerror=ignore)]

# Won't exit, but hitting sphinx_theme_builder._internal.cli
# will kill the iterator
packages = pkgutil.walk_packages(onerror=ignore)
while True:
    try:
        p = next(packages)
        # Do something with p
    except SystemExit:
        continue
    except StopIteration:
        break

If pkgutil.walk_packages() explicitly catches SystemExit and forwards it along to the onerror handler instead of raising it, then the walk is able to continue. I’ll file an issue.

It feels a bit like adding a special case for just one package, but presumably there will be others in the future. (If not already.)

sinoroc · April 4, 2023, 12:58pm

Independent from this particular example, I feel the more general question deserves some attention.

I now recall that it was discussed quite recently – but indirectly – in the context of PEP 690 – Lazy Imports:

@ferdnyc, if you are really interested in this topic, you might find some elements of answers in those threads. But I doubt you will find anything like a “formal guidance”, I do not think such a thing exists.

ferdnyc · April 4, 2023, 1:08pm

Still, it can’t help but be informative, particularly ahead of reporting this as a pkgutil issue (a report I’m currently in the middle of editing) / trying to make pkgutil.walk_packages() safe. I’ll take a look and incorporate any relevant context, thanks!

(Edit: OK, I didn’t realize you were pointing me at discussions with a combined length of 292 comments.Thanks… I think? )

pradyunsg · April 4, 2023, 2:13pm

I, too, would prefer that it raise an exception (other than SystemExit) instead of calling sys.exit

sys.exit does raise SystemExit.

ferdnyc · April 4, 2023, 2:30pm

Right; that’s kind of the distinction that makes all the difference, to me. Plus, you can work around those by:

Setting add_help=False and implementing your own help, to avoid the exit-after-help issue
Calling parser.parse_known_args() instead of parse_args(), to avoid the exit-on-unknown-argument issue

But, it’s possible to avoid the exit-on-import, too, with:

try:
   __import__('sphinx_theme_builder._internal.cli')
except SystemExit:
   pass

…Which is basically what pydoc.safeimport() does (except, it raises its own ErrorDuringImport exception instead), and what I intend to propose be added to pkgutil.walk_packages() to implement similar protections.

pkgutil already does this:

        if info.ispkg:
            try:
                __import__(info.name)
            except ImportError:
                if onerror is not None:
                    onerror(info.name)
            except Exception:
                if onerror is not None:
                    onerror(info.name)
                else:
                    raise
            else:
                path = getattr(sys.modules[info.name], '__path__', None) or []

But that won’t protect against SystemExit, it needs to be caught explicitly.

ferdnyc · April 4, 2023, 2:39pm

Right, Clint was saying he’d prefer argparse.ArgumentParser.parse_args() raise an exception that’s not SystemExit, on encountering an unknown argument.