Use name mangling in the standard library where appropriate / more consistently

A question was just asked on Stack Overflow wherein someone accidentally encountered a name collision while trying to inherit from threading.Thread. The code attempted to use its own internal “_started” attribute for its own purposes, which turns out to conflict with a name used in the base class implementation.

To the best of my understanding, this class is intended to be overridden in user code (to implement run, rather than passing a callable to the constructor). Certainly user-defined classes are ordinarily supposed to be able to add attributes to a class. The expected way to learn what attributes are already assigned and have meaning (and thus cannot be set willy-nilly for the derived class’ own purposes) is to read the documentation; but of course leading-underscore attributes aren’t expected to be documented.

Shouldn’t attributes like this, in general - i.e., implementation details of standard library classes that are intended for subclassing by the user - consistently take advantage of name mangling (i.e. use a double leading underscore instead)? The purpose of that feature is exactly to avoid this sort of collision, right? (And if it were intended for subclasses to know about and work with the existing attribute, it should be documented and not use a leading underscore, right?)

It doesn’t need to be. If you’re subclassing and want to be sure you don’t collide, you can use name mangling yourself. Unlike some other features (like the protocols surrounding super() and the passing on of kwargs), this requires no coordination between the classes.
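A minimal sketch of the subclass-side approach described above. FlagThread and has_run are hypothetical names chosen for illustration. Because the subclass writes __started, Python stores the attribute as _FlagThread__started, so it cannot interfere with whatever _started means to threading.Thread.

```python
import threading


class FlagThread(threading.Thread):
    """Hypothetical subclass that keeps its own 'started' flag."""

    def __init__(self):
        super().__init__()
        # Inside the class body, __started is compiled as _FlagThread__started,
        # so it is a different attribute from the base class's _started.
        self.__started = False

    def run(self):
        self.__started = True

    def has_run(self):
        return self.__started


t = FlagThread()
t.start()
t.join()
print(t.has_run())
print("_FlagThread__started" in vars(t))
```

The base class needs no changes for this to work, which is the "no coordination" point being made.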

I’m not so sure. Classes intended to be subclassed ought to document their “protected” attributes.

C# distinguishes between ‘private’ and ‘protected’. ‘private’ means that only that class can see it, whereas ‘protected’ means that only that class and its subclasses can see it.

That being the case, does a single underscore in Python indicate private or protected?

A single underscore means protected in the C# sense and also in the Python sense: you should not use these attributes or methods outside the class definition and its subclasses, since they are meant for class-internal use.

A double underscore prefix gets mangled and thus can only be used directly by that class. Subclasses won’t directly be able to use such attributes or methods. This is private in the C# sense and also in the Python sense. See “9. Classes” in the Python 3.11.5 documentation for details on how these are handled in Python.

Example:

>>> class C:
...     _protected = 1
...     __private = 2
...
>>> o = C()
>>> o._protected
1
>>> o.__private
Traceback (most recent call last):
  File "<console>", line 1, in <module>
AttributeError: 'C' object has no attribute '__private'
>>> o._C__private
2

Note that none of this is enforced by Python. It’s only by convention and agreeing on how API contracts work.

I think that most C# programmers would say that “in the C# sense” requires the language to actually enforce the restriction.

I also don’t think it’s fair to compare protected in C# (similarly Java and C++) to the single-underscore convention, because the “class internal use” idea in Python’s convention isn’t as formally defined a concept. Languages that implement a protected keyword, by so doing, promote a viewpoint that derived classes have a more compelling interest in the implementation details of the base than client code does. The point I’m trying to make isn’t just about conventions vs. enforcement, or even the philosophy behind offering such features per se, but rather the philosophy of what subclassing entails.

Java also has a concept of restricting access to the same package as the current class (C# might too? I don’t remember), which I suppose illustrates that there is some variation in ways of understanding the meaning of “internal” in this context.

Personally I only ever use _, but then I find that I don’t design APIs where I expect users to inherit from my classes. I barely use inheritance in my own code! (Working with protocols is just more convenient for me most of the time, and I quite often find that inheritance is just not the strategy I want for code reuse, if there actually is code to reuse between the classes. And then, of course, there are the cases where I only ever end up with one implementation, even if I expected otherwise to start.)

I think the exception here is that protected (Python’s single _) can legitimately be used in subclasses, so for the benefit of subclasses (if subclassing is an intended use) all single-underscore attributes should be documented. This is the responsibility of the base class (if it advertises subclassing as an option), not of the subclass. (IOW I disagree with what Chris wrote.)

In this specific case, it’s not clear to me whether people who subclass threading.Thread are intended to use its _started attribute in their own logic, or if it was only called that due to an attitude of defaulting to single-underscore names for implementation details.

If they are intended to use it, then, simply put, I agree with @guido. Not just if the class “advertises subclassing as an option”, but specifically if _started should be “available to” subclasses.

If they’re not, then I propose that the standard library should use name mangling in such a case. As it happens, the intuition that thinks of this as representing “private” leads to that conclusion naturally; but I want to show that thinking about it purely in terms of what Python actually does, and without using that theory or dogma about “access levels”, leads to the same conclusion.

I’ve seen a lot of advice that says we should only use the double-underscore name mangling when it’s really necessary. But if clients of threading aren’t intended to work with _started in derived classes, and everyone is following this style guide, then collisions of this sort are only to be expected. The standard library authors wouldn’t write __started because they don’t have a subclass that needs to avoid treading on the base class attribute, and the clients wouldn’t write __started because they have no reason to anticipate the collision. If they both wrote __started “defensively” then the name mangling would save them. But since they both write _started “idiomatically”, the client encounters an error.

An experienced developer might manage to avoid a long debugging session by inferring “oh, that clearly can’t be my code doing that, so it must be the library’s code, so it must be due to a name collision”. But clearly there’s potential for the error to be quite confusing. In the motivating case, for example, the user wanted the derived class _started to be a simple boolean, whereas the base class apparently uses an instance of some class with an is_set method (some kind of semaphore perhaps?).
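The failure mode described above can be reproduced with a small stand-in. Base here is a hypothetical class, not threading.Thread itself; using threading.Event for its _started attribute is an assumption made only because Event has an is_set method, matching the description. The derived class overwrites that attribute with a plain boolean, and the error then surfaces inside the base class's code.

```python
import threading


class Base:
    """Hypothetical stand-in for a library class; not threading.Thread itself."""

    def __init__(self):
        # The library's own bookkeeping, named with a single underscore.
        self._started = threading.Event()

    def start(self):
        if self._started.is_set():
            raise RuntimeError("can only be started once")
        self._started.set()


class Derived(Base):
    def __init__(self):
        super().__init__()
        # The user's own flag. It silently replaces the Event set up by Base.
        self._started = False


error = None
try:
    Derived().start()
except AttributeError as exc:
    error = exc
    print(type(exc).__name__, exc)
```

The traceback points into the base class's start method rather than at the assignment in Derived, which is what makes this kind of collision confusing to debug.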

Of course, either party can avoid the name collision by using a double underscore: if it’s only used by one party then the names are actually different to begin with, and if it’s used by both then the name mangling does what it’s designed to do. But if the style expectation is to use a single underscore by default, I argue that it’s the library’s responsibility to break that style guide to preempt the problem. Library code should work smoothly when users do normal things. A convention that says “prefer a single underscore, except when you inherit from a library class” seems needlessly complex to me.
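Both cases from the paragraph above can be sketched with hypothetical classes (LibBase, PlainUser and ManglingUser are illustrative names, not standard library code). Here the "library" side uses a double underscore, and one subclass uses a single underscore while the other also uses a double underscore. In each case the instance ends up with two distinct attributes.

```python
import threading


class LibBase:
    """Hypothetical library class using a mangled name for its bookkeeping."""

    def __init__(self):
        self.__started = threading.Event()  # stored as _LibBase__started

    def start(self):
        self.__started.set()

    def is_started(self):
        return self.__started.is_set()


class PlainUser(LibBase):
    def __init__(self):
        super().__init__()
        self._started = False  # single underscore: already a different name


class ManglingUser(LibBase):
    def __init__(self):
        super().__init__()
        self.__started = False  # stored as _ManglingUser__started


for cls in (PlainUser, ManglingUser):
    obj = cls()
    obj.start()
    # List the attribute names actually stored on the instance.
    print(cls.__name__, obj.is_started(), sorted(vars(obj)))
```

One caveat: mangling is based only on the class name, so a base and a derived class that happen to share the same name would still collide.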

Can you (or someone else) do some research trying to find out whether Thread._started is intended to be shared with subclasses or not? And what about other single-_ attributes? That seems to be the logical next step.

If the conclusion is that _started and other protected attributes were intended as “private”, not “protected”, we should research whether renaming these will break existing code. Since this class has existed for a long time, if there is functionality that’s useful but only accessible through “private” attributes, it may be too late to rename them, and our only choice is to document them. We could still propose to deprecate them, but that will have to follow the standard deprecation process.

_started is used in Thread subclasses _MainThread and _DummyThread and in a test for _DummyThread.

So can we conclude that _started should also be shareable with user subclasses?