Add a protected decorator to typing

kkpattern · July 12, 2024, 4:42pm

Sorry if this has already been discussed before. I would like to propose adding a protected decorator to typing.

Use case

Currently, we can already express a method as protected by naming it with a _ prefix. However, we cannot distinguish a “public” protected method (a stable API) from an “internal” protected method in this way.

Say I’m a library author writing a remote job system. I want library users to inherit from a JobBase class to implement their jobs. To do some extra initialization, users can override the _post_init method. The method is named _post_init because it’s a protected method: we don’t want users to call it, otherwise the instance may be initialized incorrectly. JobBase also has a method named _cleanup that does some internal cleanup after the job is done. But we don’t want users to override _cleanup, because it’s an “internal” method, meaning it may be refactored away in a future version of the library. We also want to avoid users overriding this method without calling super()._cleanup(), which would cause resource leaks.

Users cannot easily tell which of these two methods is “public” and which is “internal” because both start with a _.

A real-world example is that mkdocstrings filters out all methods that start with a _ by default. If we want it to generate docs for the protected methods, we have to remove the filter. But then all the “internal” methods will also be included in the generated docs, which is not ideal.

Similar to the final decorator, we can add a protected decorator to typing. Then we can write the JobBase like this:

from typing import protected

class JobBase:
    @protected
    def post_init(self) -> None:
        """Override this method to do some extra initialization."""
        pass

    def _cleanup(self) -> None:
        # User-land code should not override or call this method.
        ...

Why not use `final`

An “internal” protected method is not necessarily a final method. The library itself may override it in a subclass of JobBase like:

class ProcessJobBase(JobBase):
    def _cleanup(self) -> None:
        ...

The library authors can safely refactor all the _cleanup methods away as long as no user-land code overrides them.

Why not use private method

The same as above. An “internal” protected method is not necessarily a private method. Also, some developers may not be used to naming all internal methods with the double _ prefix.

Implementation

Like the final decorator, we can add a __protected__ attribute to the wrapped method. The type checker can issue an error if a protected method is called externally.
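As a rough runtime sketch of the idea above: typing.final sets a `__final__` attribute on the decorated object in recent Python versions, and a protected decorator could do the same with `__protected__`. The `protected` function below is a local stand-in, not an existing `typing` export, and nothing here enforces anything by itself; a type checker would still have to implement the check.

```python
from typing import Callable, TypeVar

F = TypeVar("F", bound=Callable[..., object])


def protected(func: F) -> F:
    """Hypothetical stand-in for the proposed typing.protected decorator."""
    try:
        # Marker that a type checker plugin or doc tool could look for.
        func.__protected__ = True  # type: ignore[attr-defined]
    except (AttributeError, TypeError):
        # Some objects reject new attributes; skip them, as typing.final does.
        pass
    return func


class JobBase:
    @protected
    def post_init(self) -> None:
        """Override this method to do some extra initialization."""
```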

Is this idea worth a PEP? If so, I’d be happy to prepare one and try to do a reference mypy implementation. Thanks.

erictraut · July 12, 2024, 5:34pm

As you indicated, there’s an existing convention whereby methods with a single underscore are interpreted as protected. Pyright (and pylance, the language server built on pyright) honor this convention and enforce protected semantics if you enable the reportPrivateUsage diagnostic check. Protected methods can be overridden, referenced, and called by subclasses, but if they are referenced by code outside of the class or its subclasses, a diagnostic is reported.

Here’s what this looks like in practice. Code sample in pyright playground.
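The playground sample itself is not reproduced in this text. A minimal sketch of the kind of code involved, with hypothetical class names, might look like this; the comments describe what pyright is expected to do with reportPrivateUsage enabled, based on the description above.

```python
class Base:
    def _hook(self) -> str:
        return "base"


class Child(Base):
    def run(self) -> str:
        # Allowed: a subclass may call (and override) a protected method.
        return self._hook()


child = Child()
print(child.run())

# Expected to be reported under reportPrivateUsage, because _hook is
# referenced outside Base and its subclasses. It still runs fine.
print(child._hook())
```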

Are you proposing that this existing behavior should be modified? This behavior is pretty well established, so I think it would be disruptive to change it.

Or perhaps you’re saying that the existing conventions and behaviors are OK, but you want to support another way to mark methods as protected even if their names don’t begin with an underscore? If so, would the semantics be the same as the existing conventions? If so, does this proposal provide some new utility beyond what’s available today with the underscore convention?

You mentioned the mkdocstrings filter as a motivation. I’m not familiar with mkdocstrings, but is there a way to filter based on attributes other than the name? If so, you could perhaps use a custom decorator to add an attribute to the methods that you want to consider “protected but documented” versus those that should be “protected but not documented”.
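To make the custom-decorator idea concrete, here is a small sketch of attribute-based filtering. The `documented` decorator, the `__documented__` attribute, and `names_to_document` are all hypothetical; this does not use or describe any mkdocstrings API, it only shows that such a filter needs nothing beyond an attribute on the function.

```python
import inspect


def documented(func):
    """Hypothetical marker: protected, but part of the documented API."""
    func.__documented__ = True
    return func


class JobBase:
    @documented
    def _post_init(self) -> None:
        """Override this method to do some extra initialization."""

    def _cleanup(self) -> None:
        """Internal; may be refactored away."""

    def run(self) -> None:
        """Public entry point."""


def names_to_document(cls: type) -> list[str]:
    """Keep public names plus underscore names that carry the marker."""
    keep = []
    for name, member in inspect.getmembers(cls, inspect.isfunction):
        if not name.startswith("_") or getattr(member, "__documented__", False):
            keep.append(name)
    return keep


print(names_to_document(JobBase))
```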

Do you have any other real-world examples for why it would be desirable to mark methods as protected even if their name doesn’t reflect that fact?

I’ll note that in languages that include a protected keyword, it’s common for linters to enforce a rule that all protected methods must be named starting with an underscore.

kkpattern · July 12, 2024, 6:03pm

Or perhaps you’re saying that the existing conventions and behaviors are OK, but you want to support another way to mark methods as protected even if their names don’t begin with an underscore? If so, would the semantics be the same as the existing conventions? If so, does this proposal provide some new utility beyond what’s available today with the underscore convention?

Yes. I think the existing conventions are OK. I want another way to mark methods as protected even if their names don’t begin with an underscore, mainly because I want to express that “this method is protected but also a stable API that won’t be refactored away without a major version bump”.

Do you have any other real-world examples for why it would be desirable to mark methods as protected even if their name doesn’t reflect that fact?

I came up with this idea when maintaining an internal library inside our team. For example, I have a base class in the library:

class UIViewBase:
    def _do_refresh(self):
        self._in_refresh = True
        self._refresh()
        self._in_refresh = False

    def _refresh(self):
        pass

The user should override the _refresh method to implement the custom UI update logic. They should never override the _do_refresh method because it’s internal and can be refactored away in the future. Both _refresh and _do_refresh are protected and shouldn’t be called from outside.

However, users cannot easily tell which is the internal API and which one is the stable/public API.

If we can mark a method as protected even if its name doesn’t start with an underscore, then the above code can be written as:

class UIViewBase:
    def _do_refresh(self):
        self._in_refresh = True
        self.refresh()
        self._in_refresh = False

    @protected
    def refresh(self):
        pass

Then, we can tell the library users to never override a method with an underscore prefix name.

Jelle · July 12, 2024, 6:24pm

It sounds like you want a concept of “protected” that allows a method to be overridden in external code, but not called in user code. That could be added, but I’m not sure the complexity is worth it; there’s a lot of potential for variations.

MegaIng · July 12, 2024, 6:59pm

You could potentially use name mangling here and name the internal method __do_refresh, with two underscores. This would IMO pretty strongly signal that this method should be neither called nor overridden by subclasses, but its drawback is that runtime inspection would potentially be confusing.

kkpattern · July 12, 2024, 7:27pm

We don’t want to prevent all subclasses from overriding these internal methods. The subclasses from the library itself can override them. For example, a ListViewBase class may override the _do_refresh method to do some special optimization.

We don’t mind subclasses inside the library overriding these methods because if we need to refactor _do_refresh (to add a new positional parameter, for example), we can make sure we update every single overridden _do_refresh.

But we cannot change user-land code, so we want to make sure no user-land code overrides them.

We can use a docstring or a custom decorator to express the stable/internal APIs like this:

class ViewBase:
    @internal_api
    def _do_refresh(self):
        ...

    @public_api
    def _refresh(self):
        ...

But it’s much better if the user can tell that a method is a public API just by looking at its name (no underscore prefix).

Thuna · July 12, 2024, 8:36pm

I don’t have any particular opinions on the proposal but I am wondering how someone who doesn’t use any sort of (relevant) tooling would be able to tell whether any given object is internal or not if the naming convention becomes unreliable. I guess the assumption is that people will be using some kind of a tool?

kkpattern · July 15, 2024, 1:34pm

The plan is that if we can mark a method as protected using a decorator, in addition to the underscore name prefix, then library authors can tell users that all methods starting with an underscore are internal API and should not be called or overridden. The internal APIs may even be filtered out of the docs entirely, as mkdocstrings does by default. We can use type checkers to make sure no protected methods are called outside the class.

I think I can try to do a reference implementation in mypy to see if this will add too much complication to typing and type checkers.

pawamoy · July 28, 2024, 4:08pm

I’m unfamiliar with the semantics of “protected”. In my experience, conventions in Python suggest __special__, _private and __class_private. Both private and class private names are what I usually call “internal API”.

The distinction of “can be called / overridden” and “can only be overridden” is new and interesting to me. I myself wouldn’t prefix any of these names with an underscore since they are exposed to users, and they would consequently be considered “public API”.

If I still wanted to prevent users from actually calling some methods, I guess I would use a decorator that inspects previous frames to raise an error if the call did not originate from a sibling method (or something like this, even though it sounds like an ugly hack, or maybe there are libraries that do this more elegantly/efficiently).
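A rough sketch of what such a frame-inspecting decorator could look like, under a simple heuristic: the call is allowed only if the caller's frame has a local named `self` that is the same object. The decorator name is hypothetical, `sys._getframe` is a CPython implementation detail, and the check is easy to defeat, so this is an illustration of the hack rather than a recommendation.

```python
import functools
import sys


def protected(func):
    """Hypothetical runtime guard: allow calls only from methods of the same instance."""

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        caller = sys._getframe(1)  # frame of whoever called the wrapper
        # Heuristic: the caller counts as a sibling method if its own
        # local `self` is this very object.
        if caller.f_locals.get("self") is not self:
            raise RuntimeError(f"{func.__name__}() is protected")
        return func(self, *args, **kwargs)

    return wrapper


class UIViewBase:
    def do_refresh(self):
        # Sibling method: this call passes the check.
        return self.refresh()

    @protected
    def refresh(self):
        return "refreshed"
```

Calling `UIViewBase().do_refresh()` goes through, while calling `refresh()` directly from outside the class raises RuntimeError. An override that calls `super().refresh()` also passes, since its frame has the same `self`.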

IMO this subject boils down to: “let’s standardize public/internal API concepts for Python and equip the standard lib with decorators or other tools to support the standard”. Definitely not an easy task.

As for mkdocstrings, it uses Griffe to extract data from sources, and Griffe supports extensions, so extensions could be written to support any kind of decorators (or other techniques used to declare internal or public APIs), whether they are standard or third-party.

Kevdog824 · July 28, 2024, 5:46pm

I like the proposal and, in fact, have thought of making a similar proposal myself. I always hated the “_” prefix convention (My personal opinion is that it is ugly but that’s neither here nor there).

One more note I would make is that, like the final decorator which has a Final annotation for fields, we might also want to add Private[T], Protected[T], etc. annotations for fields. Admittedly, I haven’t given a ton of consideration to the side-effects or feasibility of applying access modification annotations to class fields so take the suggestion with a grain of salt. Arguably, you could use @property with @protected instead to make a protected field but property methods sometimes don’t play nicely with other libraries that do background “dynamic magic” like pydantic so implementing it this way may be a pain point for developers.

NeilGirdhar · July 29, 2024, 1:09pm

Right, this makes sense, and as you say, that’s what the leading underscore means.

This doesn’t make sense to me. If you remove the leading underscore, you’re telling users that they can call that method, which isn’t what you want, right? I don’t think you should be removing the leading underscore.

And here you say that both _do_refresh and _refresh might be overridden in contradiction to the above where you want only _refresh to be overridden.

In my opinion, the Pythonic way of indicating that you don’t want something to be overridden is to use the final decorator.

In summary, there are currently two concepts in Python:

protected (in the sense that they shouldn’t be called by users) methods, which are indicated with a leading underscore, and
final (in the sense that they shouldn’t be overridden) methods, which are indicated with a decorator.
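The two existing concepts can be combined on one class. A minimal sketch reusing the UIViewBase example from earlier in the thread (MyView is a hypothetical user subclass):

```python
from typing import final


class UIViewBase:
    @final
    def _do_refresh(self) -> None:
        # Leading underscore: not for outside callers.
        # @final: not for overriding, by anyone.
        self._refresh()

    def _refresh(self) -> None:
        # Leading underscore only: subclasses may override it,
        # outside code should not call it.
        pass


class MyView(UIViewBase):
    def _refresh(self) -> None:
        self.refreshed = True

    # Defining _do_refresh here would be reported by a type checker as
    # overriding a final method; Python itself would not stop it.
```

Note that @final applies to library subclasses as well, which is the limitation raised in the original post.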

It seems like your proposal wants to add an alternative way of specifying protected methods, but I still don’t see why the leading underscore doesn’t work here? The idea that you’re guiding users on what to override is an orthogonal concept, and the only guidance is currently given by the final decorator or docstrings.

Incidentally, one concept that would be useful in my opinion, would be the guidance that a method implements an augmenting pattern. In your example, _refresh is an augmenting method, and any time it’s overridden, the overriding method should call super (or else behavior may be lost in case of multiple inheritance). Other methods like __enter__, __exit__, and __init__ are augmenting methods. I’ve held off on proposing this because people don’t even have the patience to mark things with @override generally, but it would be useful in your example because it would serve to indicate (at least intuitively) that _refresh collects augmented behaviors from subclasses. It would also allow type checkers to verify that you’re calling super. (Pyright can already optionally check that __init__ calls super.)
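To illustrate why an augmenting method needs the super call, here is a small sketch with hypothetical mixin classes. Each override appends its name to a log and then delegates along the MRO.

```python
class ViewBase:
    def _refresh(self, log: list[str]) -> None:
        log.append("ViewBase")


class ScrollMixin(ViewBase):
    def _refresh(self, log: list[str]) -> None:
        log.append("ScrollMixin")
        # Omitting this call would silently skip ThemeMixin and ViewBase
        # when _refresh runs on a MyView instance.
        super()._refresh(log)


class ThemeMixin(ViewBase):
    def _refresh(self, log: list[str]) -> None:
        log.append("ThemeMixin")
        super()._refresh(log)


class MyView(ScrollMixin, ThemeMixin):
    def _refresh(self, log: list[str]) -> None:
        log.append("MyView")
        super()._refresh(log)


calls: list[str] = []
MyView()._refresh(calls)
print(calls)
```

The log follows the MRO of MyView (MyView, ScrollMixin, ThemeMixin, ViewBase), so every class contributes only as long as each override keeps delegating.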

Love it or hate it, it’s idiomatic Python. That’s why we don’t have Protected[T]; we just use leading underscores. I think I just accepted Python’s way of doing things over time.

kkpattern · July 29, 2024, 1:49pm

That’s the reason why I want the protected decorator: I can remove the leading underscore, and the type checker still knows it’s a protected method and will prevent users from calling it.

The internal refers to the entire library. If a method is internal, only classes inside the library can override it. If a method is public, user land code outside the library can also override it.

The reason why I want to distinguish between the code inside and outside the library is that when refactoring, I can guarantee that I will update all the code inside the library, but I have no way to modify the user’s code.
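Without type checker support, this inside/outside distinction could be approximated at runtime. The sketch below assumes "inside the library" means "defined in a module under the same top-level package as the base class"; the `internal` decorator and the `__internal__` attribute are hypothetical names, and the check runs when a subclass is created.

```python
# Top-level package of the library, derived from this module's own name.
_LIBRARY_PACKAGE = __name__.partition(".")[0]


def internal(func):
    """Hypothetical marker: only library code may override this method."""
    func.__internal__ = True
    return func


class UIViewBase:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.__module__.partition(".")[0] == _LIBRARY_PACKAGE:
            return  # defined inside the library: any override is allowed
        # User-land subclass: reject overrides of anything marked internal
        # anywhere up the inheritance chain.
        for name in vars(cls):
            for base in cls.__mro__[1:]:
                if getattr(vars(base).get(name), "__internal__", False):
                    raise TypeError(
                        f"{cls.__name__} overrides internal method {name!r}"
                    )

    @internal
    def _do_refresh(self):
        self.refresh()

    def refresh(self):
        pass


class ListViewBase(UIViewBase):
    # Library subclass: overriding the internal method is fine.
    def _do_refresh(self):
        super()._do_refresh()
```

A class defined in another package that overrides only refresh is accepted, while one that defines _do_refresh raises TypeError at class creation time. This catches the mistake later than a type checker would, and it still leaves calling the method unguarded.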

NeilGirdhar · July 29, 2024, 2:03pm

Right, so I personally think it would be the path of least resistance to simply accept the idiomatic Python indicator, which is the leading underscore. There were some Python design choices that I didn’t like at first, but I just accepted because, ultimately, writing good Python code means writing code that is understood by other Python programmers, and that means writing idiomatic code.

Okay, so essentially two levels of “protection”. That’s interesting. You may want to choose a more descriptive name then like @library_overrideable or @not_user_overrideable.

I think you should keep the leading underscore for both cases though since you don’t want these methods to be called by external (to the class or its children) code.

Right, your desire makes sense.

The alternative would be to explain in the docstring what you mean. Or else, to have your own decorator. I recognize that you wouldn’t get the type checker support that you want though.

I think, as with a lot of such proposals, it would probably strengthen the case if you could find other libraries that could also make use of these two levels of protection. Essentially, how common is this?

xmw · July 29, 2024, 10:34pm

One current way to achieve the same effect is to separately distribute .pyi stub-only packages. If you have a library which looks like this,

# lib/module.py

class A:
    _protected_public_API_var: int
    _protected_implementation_details_var: int

    def _protected_public_API_method(self) -> int:
        return 0

    def _protected_implementation_details_method(self) -> int:
        return 0

then you can distribute a stubs package which looks like this, which supersedes any inline package based on import resolution ordering (#4) so that type-checkers will warn upon accessing restricted symbols in the following interface:

# lib-stubs/module.pyi
from typing import Final, Never

class A:
    _protected_public_API_var: int
    _protected_implementation_details_var: Final[Never]

    def _protected_public_API_method(self) -> int: ...
    _protected_implementation_details_method: Final[Never]

As long as library users install this stubs package, and library developers don’t install it, then everyone will be happy. Of course, this entails a maintenance burden for keeping the stubs package in sync with the runtime.

As for other languages, Java’s access modifiers and Rust’s pub(...) offer similar fine-grained levels of visibility, indicating that it’s useful to control access to a symbol based on the fully-qualified-name of the access request location (which is distinct from access of protected members in a class, as classes can be subclassed in third-party packages). The implication is that it’s safe to internally refactor things in a library that third parties should not have access to (a.k.a. implementation details), because third-party accesses’ fully-qualified-names aren’t on the library’s package path.

I’m not sure about a @protected decorator - I’d prefer to see a more general solution like what Java and Rust have.
