Naming consistency for various types

A number of classes self-report names which are not an importable name for that type.
The inconsistency means that these classes can’t be pickled, that typeshed can’t match their display name, and that the mismatch can cause minor confusion. I’m interested in getting these to be consistent where possible, so this thread is for any discussion before I go anywhere with that. The issue can be solved either by adding an importable alias for the type or by changing the name the type reports for itself. Both approaches have been used in similar situations in the past.
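As a minimal sketch of the runtime effect, using types.FunctionType (one of the types listed further down) as the example: the class reports a module and name under which it cannot be looked up, so pickling the class itself fails.

```python
import pickle
import types

cls = types.FunctionType

# The name the class reports for itself, built from its own attributes.
reported = f"{cls.__module__}.{cls.__qualname__}"
print(reported)

# pickle stores classes by reference: it looks the reported name up in
# the reported module, and fails when nothing is found there.
try:
    pickle.dumps(cls)
except pickle.PicklingError as exc:
    pickled = False
    print("not picklable:", exc)
else:
    pickled = True
```

Note that this is about pickling the class object, not instances of it.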

Here’s all the ones I am aware of, broken up by category. I’m not expecting or proposing to create consistent naming for all of these, just trying to be complete:

These are types defined in C where the assigned name doesn’t match the import name:

  • _json.make_encoder calls itself _json.Encoder
  • _json.make_scanner calls itself _json.Scanner
  • pyexpat.XMLParserType / xml.parsers.expat.XMLParserType calls itself pyexpat.xmlparser
  • signal.ItimerError calls itself signal.itimer_error
  • _tkinter.TkappType calls itself _tkinter.tkapp
  • _tkinter.TkttType calls itself _tkinter.tktimertoken

The json types make sense to me as Encoder/Scanner and I think those should be added as a new alias. For the others, the current importable name seems more standard and I’d lean towards updating the internal name to match.

Several mismatches related to the new _interpqueues and _interpreters modules:

  • _interpqueues.QueueError calls itself test.support.interpreters.QueueError
  • _interpqueues.QueueNotFoundError calls itself test.support.interpreters.QueueNotFoundError
  • _interpreters.InterpreterError calls itself interpreters.InterpreterError
  • _interpreters.InterpreterNotFoundError calls itself interpreters.InterpreterNotFoundError
  • _interpreters.NotShareableError calls itself interpreters.NotShareableError

The interpreters.* errors won’t be an issue anymore once an interpreters module is added and they can be imported from there. I’m not certain what’s going on with the naming on the queue errors.

These are named tuples. They’re all private, and probably low importance but low risk to harmonize the names.

  • functools._CacheInfo calls itself functools.CacheInfo
  • shutil._ntuple_diskusage calls itself shutil.usage
  • urllib.parse._DefragResultBase calls itself urllib.parse.DefragResult
  • urllib.parse._ParseResultBase calls itself urllib.parse.ParseResult
  • urllib.parse._SplitResultBase calls itself urllib.parse.SplitResult

After that, there’s a bunch of builtin types which can’t be imported from builtins but have been given an official importable location in types. Making their self-name match an importable name is probably somewhere between touchy and impossible because of how old and important they are. But it would be nice.

  • types.AsyncGeneratorType calls itself builtins.async_generator
  • types.BuiltinFunctionType / types.BuiltinMethodType calls itself builtins.builtin_function_or_method
  • types.CellType calls itself builtins.cell
  • types.ClassMethodDescriptorType calls itself builtins.classmethod_descriptor
  • types.CodeType calls itself builtins.code
  • types.CoroutineType calls itself builtins.coroutine
  • types.FrameType calls itself builtins.frame
  • types.FunctionType / types.LambdaType calls itself builtins.function
  • types.GeneratorType calls itself builtins.generator
  • types.GetSetDescriptorType calls itself builtins.getset_descriptor
  • types.MappingProxyType calls itself builtins.mappingproxy
  • types.MemberDescriptorType calls itself builtins.member_descriptor
  • types.MethodDescriptorType calls itself builtins.method_descriptor
  • types.MethodType calls itself builtins.method
  • types.MethodWrapperType calls itself builtins.method-wrapper
  • types.ModuleType calls itself builtins.module
  • types.TracebackType calls itself builtins.traceback
  • types.WrapperDescriptorType calls itself builtins.wrapper_descriptor
  • types.CapsuleType calls itself builtins.PyCapsule

These are also builtins that are not importable by the name they report, but they have special handling in pickle to make them pickleable:

  • types.EllipsisType calls itself builtins.ellipsis
  • types.NoneType calls itself builtins.NoneType
  • types.NotImplementedType calls itself builtins.NotImplementedType
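A quick way to observe that special handling: none of the three reported names exists in builtins, yet each class survives a pickle round trip. The classes are obtained with type() here so the sketch does not depend on the types module aliases.

```python
import builtins
import pickle

results = []
for singleton in (None, NotImplemented, Ellipsis):
    cls = type(singleton)
    # Is the reported name actually an attribute of builtins?
    importable = hasattr(builtins, cls.__name__)
    # Does the class itself survive pickling despite that?
    round_trips = pickle.loads(pickle.dumps(cls)) is cls
    results.append((cls.__name__, importable, round_trips))
    print(cls.__name__, "importable:", importable, "round-trips:", round_trips)
```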

The final category is a bunch of types in ctypes.wintypes which are created using ctypes.POINTER(). I won’t list them all here, but a representative example is ctypes.wintypes.PULONG which calls itself ctypes.wintypes.LP_c_ulong.
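The pointer case can be reproduced without ctypes.wintypes. In this sketch `PULONG` is a local alias standing in for the wintypes attribute, on the assumption that wintypes builds it the same way:

```python
import ctypes

# ctypes.POINTER() generates the class and chooses its name itself,
# derived from the pointed-to type.
ptr_type = ctypes.POINTER(ctypes.c_ulong)

# Local stand-in for an alias like ctypes.wintypes.PULONG (assumption:
# wintypes assigns the result of POINTER() to a differently named attribute).
PULONG = ptr_type

print(PULONG.__name__)  # the generated name, not "PULONG"
```

Because the name is generated by POINTER(), any alias assigned afterwards is invisible to the class.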

An additional, slightly related category is named tuples which are never assigned to an importable name because they’re defined inline with a class definition that inherits from them. I only noticed these because of work validating inheritance in typeshed.

  • ssl._ASN1Object inherits from a namedtuple that calls itself ssl._ASN1Object; typeshed calls it ssl._ASN1ObjectBase
  • tokenize.TokenInfo inherits from a namedtuple that calls itself tokenize.TokenInfo; typeshed calls it tokenize._TokenInfo
  • platform.uname_result inherits from a namedtuple that calls itself platform.uname_result_base; typeshed does not currently represent this base.
  • tkinter._VersionInfoType inherits from a namedtuple that calls itself tkinter._VersionInfoType; typeshed does not currently represent this base.
  • doctest.TestResults inherits from a namedtuple that calls itself doctest.TestResults; typeshed does not currently represent this base.

If they were given a name other than that of the class they provide a base for, typeshed’s representation could be a little closer. platform.uname_result_base is different, but it’s still tricky because it’s not a private name. Even so, these won’t be importable regardless of their name, so they’ll never be pickle-able and that small runtime effect is irrelevant here.
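To make the inline-base situation concrete, here is a short inspection of tokenize.TokenInfo from the list above. It only reads attributes at runtime and changes nothing:

```python
import tokenize

cls = tokenize.TokenInfo
base = cls.__bases__[0]  # the anonymous namedtuple defined inline

# Two distinct classes that report the same name.
print(base is cls)
print(base.__name__, cls.__name__)

# Looking the name up in the module finds the subclass, never the base.
print(getattr(tokenize, base.__name__) is base)
```

The base is reachable only through the subclass, which is why no choice of name would make it importable.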

For comparison, these are previous mismatches that have been resolved which I know of:

  • _thread.LockType calls itself _thread.lock (_thread.lock was added in 3.13)
  • _ssl.SSLSession / ssl.SSLSession used to call itself _ssl.Session (changed to _ssl.SSLSession in 3.10)
  • _thread._ExceptHookArgs used to call itself _thread.ExceptHookArgs (changed in 3.10)
  • _ctypes.CFuncPtr used to call itself _ctypes.PyCFuncPtr (changed in 3.10)
  • weakref.CallableProxyType used to call itself builtins.weakcallableproxy (changed in 3.10)
  • weakref.ProxyType used to call itself builtins.weakproxy (changed in 3.10)
  • weakref.ReferenceType / weakref.ref used to call itself builtins.weakref (changed to weakref.ReferenceType in 3.10)
  • contextvars.Context used to call itself builtins.Context (changed in 3.10)
  • contextvars.ContextVar used to call itself builtins.ContextVar (changed in 3.10)
  • contextvars.Token used to call itself builtins.Token (changed in 3.10)
  • _struct.Struct / struct.Struct used to call itself builtins.Struct (changed to _struct.Struct in 3.9)

Thanks for the great summary!

For these specifically, there’s some previous discussion in python/cpython issue #100129, “Change names of builtin types exposed in the types module”.

FWIW these feel like typeshed bugs to me: even if we can’t get the name in the stub exactly the same as the name the base class claims to have at runtime, we should at least try to incorporate the base class into the MRO in the stub with a slightly different name, unless it’s very hard to do so

Agreed; I mostly mentioned it that way here to make it clear that there’s not an alternate name for the same class already in use.

Some of these might be intentional.
Some types shouldn’t be pickled (like a JSON encoder or a tk app).
In a lot of these cases it seems to me that typeshed should use a protocol with only the documented attributes. Some classes are private implementation details and subject to change, especially if there’s a leading underscore in the name. Making such classes easier to work with might be leading users into a trap.

Maybe, but there’s little harm in it. We’re talking about pickling the class itself here, not an instance of it. At any rate, I’m not personally concerned with whether or not someone can pickle the class itself (and I’m not sure why someone would want to), but that’s one of the few real runtime effects of the name inconsistency so I think it merits pointing out.

Ultimately I am more concerned with two things: the potential confusion of names that don’t match and the effects of (in)consistency on automated validation of typeshed (aka stubtest). Stubtest requires more exceptions and special handling when CPython itself is inconsistent, and I think that automated validation of the stubs is very important for maintaining their quality and fidelity to runtime. Where private implementation details do make it into typeshed, stubtest is one of the best tools for ensuring that any relevant changes are tracked.
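To illustrate the kind of check that gets harder, here is a rough sketch of scanning a module for classes whose reported name does not resolve back to them. This is not how stubtest works; `reported_name_resolves` and `find_name_mismatches` are hypothetical helpers written for this example.

```python
import importlib
import types


def reported_name_resolves(cls):
    """Return True if importing cls.__module__ and walking
    cls.__qualname__ leads back to cls itself."""
    try:
        obj = importlib.import_module(cls.__module__)
        for part in cls.__qualname__.split("."):
            obj = getattr(obj, part)
    except (ImportError, AttributeError, TypeError, ValueError):
        return False
    return obj is cls


def find_name_mismatches(module):
    """Map each attribute of module that is a class to its reported
    name, for classes whose reported name does not resolve."""
    mismatches = {}
    for attr, value in vars(module).items():
        if isinstance(value, type) and not reported_name_resolves(value):
            mismatches[attr] = f"{value.__module__}.{value.__qualname__}"
    return mismatches


for attr, reported in sorted(find_name_mismatches(types).items()):
    print(f"types.{attr} calls itself {reported}")
```

Every entry such a scan reports is a case where a validation tool has to carry an exception or special handling.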

I don’t believe that having a __name__ which differs from the name used to expose a class is a meaningful speedbump against bad decisions.
