Introduce generic filters to filter auto-completion matches in rlcompleter

When a module imports other modules, auto-completion also suggests those imported modules. For instance, after import re, the suggestions for re.<TAB> include re.functools. This is not a good idea because 1) it misleads users into thinking they can use it (imported modules should never be relied upon implicitly unless they are re-exported or documented as such), and 2) one might wonder “is this a function?”

Instead of special-casing imports, I want to introduce filters. rlcompleter has two matching strategies, one for global names and one for attributes, and they operate on different subjects: in the first case, names are looked up in a namespace with __getitem__, while in the second, they are looked up on an object with getattr.

Before adding a candidate to the list of matches, the completer would then decide whether it should be accepted. The approach is inspired by logging filters. For now, I’d like to keep the interface CPython-only.
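A minimal sketch of how such a hook could work, in the spirit of logging filters. Every name here (AttributeFilter, FilteringCompleter, add_filter, _accept, DropModules) is hypothetical and not part of rlcompleter; only the filter() signature mirrors _FilterImport.filter from this post. Attribute completion is simplified to a plain name before the dot.

```python
import functools
import types


class AttributeFilter:
    """Hypothetical base class: filter() returns True to keep a candidate."""

    def filter(self, instance, name, value, text, /, **options):
        # Default: accept everything, so existing behaviour is unchanged.
        return True


class FilteringCompleter:
    """Hypothetical completer that consults filters before accepting a match.

    Only attribute completion is sketched; global-name completion would
    look values up with namespace[name] instead of getattr().
    """

    def __init__(self, namespace):
        self.namespace = namespace
        self.attribute_filters = []  # no filters: nothing is filtered out

    def add_filter(self, attribute_filter):
        self.attribute_filters.append(attribute_filter)

    def _accept(self, instance, name, value, text):
        # As with logging.Filterer, every filter must accept the candidate.
        return all(
            f.filter(instance, name, value, text) for f in self.attribute_filters
        )

    def attr_matches(self, expr, text=""):
        """Complete `expr.text`, where expr is a plain name in the namespace."""
        instance = self.namespace[expr]
        matches = []
        for name in sorted(dir(instance)):
            if not name.startswith(text):
                continue
            # Roughly like rlcompleter: hide underscore-prefixed names
            # unless the typed text itself starts with "_".
            if name.startswith("_") and not text.startswith("_"):
                continue
            value = getattr(instance, name, None)
            if self._accept(instance, name, value, text):
                matches.append(f"{expr}.{name}")
        return matches


class DropModules(AttributeFilter):
    """Example filter: never suggest attributes whose value is a module."""

    def filter(self, instance, name, value, text, /, **options):
        return not isinstance(value, types.ModuleType)


fake = types.ModuleType("fake")
fake.helper = len
fake.functools = functools

completer = FilteringCompleter({"fake": fake})
before = completer.attr_matches("fake")  # ['fake.functools', 'fake.helper']
completer.add_filter(DropModules())
after = completer.attr_matches("fake")  # ['fake.helper']
```

Keeping the filter list empty by default means a completer that never registers a filter behaves exactly as before.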

Filtering imported modules would then be achieved as follows:

class _FilterImport(AttributeFilter):
    """Filter imported modules.

    Modules imported by a module are filtered out if:

    - they are built-in modules imported by a non-built-in module,
    - they are non-built-in modules imported by a built-in module, or
    - they are modules imported from another package.

    Imported modules are always auto-completed if they are explicitly exported.

    The filter is conservative in the sense that if some condition cannot be
    verified due to the lack of some expected attribute, the imported module
    is still auto-completed.
    """

    def filter(self, instance, name, value, text, /, **options):
        import types

        if (
            # non-modules are not processed by this filter
            not isinstance(instance, types.ModuleType)
            # non-submodules are not processed by this filter
            or not isinstance(value, types.ModuleType)
        ):
            return True
        try:
            is_re_exported = name in instance.__all__
        except (AttributeError, NotImplementedError, TypeError, ValueError):
            # be conservative if __all__ is not a container or does not exist
            return True
        if is_re_exported:
            # imported modules explicitly re-exported are auto-completed
            return True
        return self.filter_imported_module(instance, value)

    def filter_imported_module(self, module, submodule):
        import os

        spec_mod = getattr(module, '__spec__', None)
        spec_sub = getattr(submodule, '__spec__', None)
        if spec_mod is None or spec_sub is None:
            # Be conservative for modules without '__spec__'.
            return True
        if self.is_builtin_module_spec(spec_mod):
            # Built-in modules should not re-export non-built-in modules.
            # XXX: What about custom interpreters and custom extensions?
            return self.is_builtin_module_spec(spec_sub)
        if self.is_builtin_module_spec(spec_sub):
            # Non-built-in modules should not re-export built-in modules.
            # XXX: What about custom interpreters and custom extensions?
            return False
        # Pure Python *packages* should auto-complete their submodules but not
        # their imported modules. Only modules in the same package are shown.
        modfile = getattr(module, '__file__', None)
        subfile = getattr(submodule, '__file__', None)
        if not modfile or not subfile:
            # conservative if we cannot determine the modules' __file__
            return True
        # importable submodules are also completed, even if they are private
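        abs_modfile = os.path.abspath(modfile)
        abs_subfile = os.path.abspath(subfile)
        # Append a trailing separator so that e.g. '.../re' does not match
        # a sibling file such as '.../reprlib.py'.
        pkg_dir = os.path.join(os.path.dirname(abs_modfile), '')
        return abs_subfile.startswith(pkg_dir)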

    def is_builtin_module_spec(self, spec):
        return getattr(spec, 'origin', None) == 'built-in'

Here, instance is the object being completed (what instance evaluates to when typing instance.<TAB>) and name is a candidate suggestion. I have a branch ready for a PR, but I’d like to hear others’ thoughts first (maybe I missed something).
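One subtle point in the same-package check at the end of filter_imported_module: a bare str.startswith() against the package directory also accepts sibling files whose names merely share a prefix, e.g. .../reprlib.py for a package rooted at .../re. A minimal sketch of a separator-aware comparison, using made-up paths; the helper name is_inside_package_dir is hypothetical.

```python
import os


def is_inside_package_dir(package_file, module_file):
    """Return True if module_file lives under the directory of package_file.

    Hypothetical helper; both paths are normalized with os.path.abspath.
    """
    pkg_dir = os.path.dirname(os.path.abspath(package_file))
    # os.path.join(pkg_dir, "") appends a trailing separator, so that
    # ".../re" does not match ".../reprlib.py".
    return os.path.abspath(module_file).startswith(os.path.join(pkg_dir, ""))


# Made-up paths standing in for a package and some modules.
pkg = os.path.join(os.sep, "lib", "re", "__init__.py")
sub = os.path.join(os.sep, "lib", "re", "_parser.py")
sibling = os.path.join(os.sep, "lib", "reprlib.py")
imported = os.path.join(os.sep, "lib", "functools.py")

# The naive prefix test wrongly treats the sibling as part of the package.
naive = os.path.abspath(sibling).startswith(os.path.dirname(os.path.abspath(pkg)))
```

If symlinked installs matter, os.path.realpath could be used instead of os.path.abspath; that is a judgment call, not something the proposal specifies.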

[Screenshot: completion suggestions with filters]

[Screenshot: completion suggestions without filters]

Without filters, one can see that re.copyreg, re.enum and re.functools are added, which is not really a good idea.
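To reproduce that list without a REPL, one can look for public attributes of re whose values are modules. This is a quick check, not the proposed filter; the exact names depend on the Python version (re became a package in 3.11), so no output is shown here. The helper name module_valued_attributes is made up for this example.

```python
import re
import types


def module_valued_attributes(module):
    """Public attribute names of `module` whose values are themselves modules."""
    return sorted(
        name
        for name in dir(module)
        if not name.startswith("_")
        and isinstance(getattr(module, name, None), types.ModuleType)
    )


imported_modules = module_valued_attributes(re)
print(imported_modules)
```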

PoC

Python’s built-in REPL is only one of many ways to explore a namespace (and is probably used orders of magnitude less than, say, PyCharm’s completion), so this seems like an odd place to address namespace pollution. Unless, perhaps, you’re hoping this will snowball into IDEs doing the same?

Could we just clean up the offending namespaces instead? I.e., in the re module, replace import functools with import functools as _functools. A GitHub search indicates that people rarely rely on these implicit attributes: only two cases would break.

Sorry, I should have been more precise: the abstract interface would be implemented in rlcompleter, and _pyrepl would be able to implement its own filtering logic. Existing subclasses of rlcompleter.Completer would not be affected, since they would use the default filtering logic (namely, nothing is filtered out).

It wouldn’t help in this case because rlcompleter also autocompletes private names.
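Both the underscore-alias suggestion and this objection can be checked with the real rlcompleter.Completer and its documented complete(text, state) method: with nothing typed after the dot, underscore-prefixed names are skipped, but they are listed once the user types an underscore. The module mod and its attributes are made up for the example, as is the helper all_completions.

```python
import functools
import rlcompleter
import types

# A made-up module exposing functools both publicly and under an alias.
mod = types.ModuleType("mod")
mod.functools = functools
mod._functools = functools

completer = rlcompleter.Completer({"mod": mod})


def all_completions(completer, text):
    """Collect every candidate by calling complete() with increasing state."""
    results = []
    state = 0
    while (match := completer.complete(text, state)) is not None:
        results.append(match)
        state += 1
    return results


after_dot = all_completions(completer, "mod.")  # hides mod._functools
after_underscore = all_completions(completer, "mod._")  # shows mod._functools
```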

I’ve never had/noticed this problem in my IDE, and I hope I never do.
Thanks for working on unseen maintenance 🙂

Here’s a possible way to simplify the logic for users:

  • if the namespace has __all__, default autocompletion only looks at that; otherwise, it works as it does today (hiding underscore-prefixed names)
  • a new shortcut (Ctrl+Tab?) completes everything in dir(), without even hiding underscore-prefixed names
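A minimal sketch of that rule, assuming a hypothetical helper completion_candidates(obj, show_everything) where show_everything stands in for the proposed extra shortcut. This is not how rlcompleter or _pyrepl currently behave.

```python
import functools
import types


def completion_candidates(obj, show_everything=False):
    """Hypothetical candidate names for `obj.<TAB>` under the __all__ proposal."""
    if show_everything:
        # Proposed extra shortcut: everything dir() reports, underscores included.
        return sorted(dir(obj))
    exported = getattr(obj, "__all__", None)
    if exported is not None:
        # The namespace declares its public API: complete only those names.
        return sorted(exported)
    # Fallback, roughly today's default: hide underscore-prefixed names.
    return sorted(name for name in dir(obj) if not name.startswith("_"))


pkg = types.ModuleType("pkg")
pkg.__all__ = ["public_func"]
pkg.public_func = len
pkg.functools = functools  # imported but not exported
pkg._helper = len

plain = types.ModuleType("plain")  # no __all__
plain.func = len
plain._private = len

default = completion_candidates(pkg)  # ['public_func']
everything = completion_candidates(pkg, show_everything=True)
fallback = completion_candidates(plain)  # ['func']
```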

Adding a new shortcut (and advertising it to users) would probably be more work for pyrepl devs (sorry!), but I think it beats adding yet another “private/public” distinction.