Add a mixin class to allow module constants with custom repr and docstrings

I would like to propose a small class (it goes, probably, to the collections module), that will customize repr and docstring values for module constants (like math.pi or sys.maxsize). Draft implementation is available to play here: Add collections.NamedConst mixin class by skirpichev · Pull Request #22 · skirpichev/cpython · GitHub.

This will allow to use custom docstrings for constants (see issue #104042):

>>> import inspect, math
>>> math.e
math.e
>>> math.e.__doc__
'Euler number (natural logarithm base).'
>>> type(math.e)
<class 'collections._NamedFloat'>
>>> isinstance(math.e, float)
True

Cryptic literal values (like 6.283185307179586 or 9223372036854775807) also prevent using them as defaults for arguments. We already have some places in the stdlib, where such literal values are exposed and it’s obviously not a good idea to add more of them (e.g. to solve issue #89381), like this:

>>> import inspect
>>> inspect.signature(list.index)
<Signature (self, value, start=0, stop=9223372036854775807, /)>
>>> import math
>>> def mylog(x, base=math.e, /):
...     return math.log(x, base)
...     
>>> inspect.signature(mylog)
<Signature (x, base=2.718281828459045, /)>

With a custom repr for a constant, using such default values look more readable:

>>> import inspect, math
>>> s = inspect.signature(math.log); s
<Signature (x, base=math.e, /)>
>>> s.parameters['base'].default
math.e
>>> type(_)
<class 'collections._NamedFloat'>

Note, that such approach is already used in some places of the stdlib, e.g. the _NamedIntConstant class of the re module. Runtime cost of this should be negligible, performance impact will affect mostly module initialization.

I think it’s worth to mention the related proposal (issue #61005) to preserve original (as preferred by programmer in sources) representation for default values. Thanks to introspection capabilities of the inspect module this seems doable:

>>> import ast, inspect
>>> def foo(x=2**5 - 1):
...     pass
...     
>>> s = inspect.getsource(foo)
>>> t = ast.parse(s)
>>> t.body[0].args.defaults[0].lineno
1
>>> t.body[0].args.defaults[0].col_offset
10
>>> t.body[0].args.defaults[0].end_col_offset
18
>>> s.splitlines()[1-1][10:18]
'2**5 - 1'

But this seems to be a too limited solution. In general, meaning of constants can’t be easily inferred from their values (will 2**63-1 be more descriptive than 9223372036854775807?) or names (can you guess that math.tau is?).

3 Likes

Hmm, have you considered an enumeration? A lot of constants - and those far less recognizable than tau and MAXINT - have been seamlessly converted into enum values.

>>> import socket
>>> socket.SOCK_STREAM
<SocketKind.SOCK_STREAM: 1>

You can still use them exactly like normal numbers (42 + socket.SOCK_STREAM == 43) but they have better reprs.

2 Likes

Making these constants a float/int subclass is potentially a breaking change or at least forces the slow paths in many codes optimized for exact floats/ints.

I think a more backwards-compatible approach to better help messages is to add some doc/name mappings to the respective modules, e.g.:

__constant_docs__  = {'pi': '...', 'e': '...', 'tau': '...'}
__constant_names__ = {id(pi): 'math.pi', id(e): 'math.e', id(tau): 'math.tau'}

Then have pydoc/help consult __constant_docs__, and have sys.displayhook/inspect.Signature/pprint look up __constant_names__ so that at least in REPL, signature inspection and pretty print, math.e can be displayed as math.e. The downside is that repr(math.e) will remain '2.718281828459045' without a much more invasive change to PyObject_Repr (which can be done but at the cost of adding unnecessary overhead to most other repr calls).

2 Likes

Perhaps, but one that’s been deemed acceptable for many other stdlib modules.

1 Like

You meant something like this:

# mymath.py

import enum, math
class consts(float, enum.Enum):
    pi = math.pi
    e = math.e

pi = consts.pi
e = consts.e

def log(x, base=e, /):
    return math.log(x, base)

if __name__ == '__main__':
    import inspect
    print(inspect.signature(log))

?

$ python mymath.py 
(x, base=<consts.e: 2.718281828459045>, /)

Yes, it’s better than raw floats and I think it’s same performance-wise. But cons:

  1. You can’t customize docstring for the constant in this way.
  2. IMO, repr looks less readable than e or math.e.

As I said before, I don’t think that customized docstring is a breaking change. For the repr — yes, it might break some doctests, for instance this (why you would like to test that?):

>>> math.pi
3.141592653589793

But I don’t see other good way to make int/float/complex defaults useful in libraries. Signatures like foo(spam=2.718281828459045) looks fine only in toy projects.

Even arithmetic dunders for builtin floats/ints has no dedicated paths for exact floats/ints.

We have such case e.g. in builtin sum(), but I believe this is rather a historical artifact.

This cure looks worse than disease. Lets not introduce differences in the stdlib behavior, that depend on using REPL, IDLE or something else. inspect.Signature() should be same.

2 Likes

For example, BINARY_OP would specialize to BINARY_OP_MULTIPLY_FLOAT after an iteration if and only if both operands are exact floats:

So a loop like this would see performance degradation when math.pi is turned into a _NamedFloat:

for d in degrees:
    radians.append(d * math.pi / 180)

I agree that it isn’t really a breaking change then, since now come to think of it no reputable code should require exact floats to work. Those that look for exact floats should always do it for optimizations only and fallbacks should always be available for float subclasses.

1 Like

I’m not a fan of the super-short e outside of a clearly mathematical constant, but I could possibly get behind math.e as the repr. Fortunately it’s easy to achieve:

>>> class consts(float, enum.Enum):
...     pi = math.pi
...     e = math.e
...     def __repr__(self):
...         return "math." + self.name
...         
>>> consts.pi
math.pi

and since that is done once for all of the constants,[1] it’d be easy enough to iterate on that and refine the description, while keeping them all consistent.

If there IS a doctest like that, I would expect it to be a test for the math module itself, and therefore, subject to change along with any change that turns math.pi into an enum. And IMO the improvement to the repr, if considered worth doing at all, should easily be enough to justify this. (“If” because I think there’ll be some constants for which it is, and others for which it isn’t. Not every named constant necessarily needs to become an enum.)

Very small degradation. I highly doubt it’ll be enough to be worth worrying about.

# timeme.py
import math, enum
class consts(float, enum.Enum):
	pi = math.pi
	e = math.e
	def __repr__(self):
		return "math." + self.name
pi = consts.pi
e = consts.e
$ python3 -m timeit -s 'import math, timeme' '[d * math.pi / 180 for d in range(0, 360, 5)]'
200000 loops, best of 5: 1.48 usec per loop
$ python3 -m timeit -s 'import math, timeme' '[d * timeme.pi / 180 for d in range(0, 360, 5)]'
100000 loops, best of 5: 2.02 usec per loop

We’re talking microseconds here, for the whole list comp. I don’t think the difference is significant :slight_smile:


  1. or, once for all the ints and once for all the floats ↩︎

1 Like

Another (minor) problem is that non-stdlib code that uses one of these named constants in untyped code would cause type errors in typed code relying on the type checker deducing the type. i.e.

def func(v=math.e):
    print(v)

reveal_type(func) # def func(v: _NamedFloat) -> None

func(v=1.23) # Type Error

This change would be forcing people to go add :float/:int in (probably) older code. I’m not saying that’s necessarily a bad thing (I think more typed code is better), but I think it is worth considering.

2 Likes

Is there a reason we couldn’t use Annotated for this?

# math.py
from typing import Annotated, Final

float_e = Annotated[float, "Euler's number (natural logarithm base)."]
float_pi = Annotated[float, 'pi (16 decimal places).']
float_tau = Annotated[float, 'tau (16 decimal places).']

e: Final[float_e]
pi: Final[float_pi]
tau: Final[float_tau]

You would lose the __doc__ attribute, but it wouldn’t mess with the underlying type and it would allow you to inspect the annotation with get_type_hints.

(This was in response to Daniel btw, this only applies to stubs)

2 Likes

True, and with an “anonymous” class per constant decorated by @global_enum that’s even easier, plus it allows docstrings:

@enum.global_enum
class _const(float, enum.Enum):
    "Base for natural logarithm."
    e = 2.718
1 Like

Ah, I’d forgotten about global_enum. That’s definitely easier if you’re only doing one, though if there’s a bunch of them, I’d still be inclined to have a single class, since then it’s just one more line per constant. Yay for flexibility!

1 Like

I like this idea because it reuses an existing data model to describe a name without changing the type of the object the name is bound to, but we’ll need to standardize on how a name-bound docstring is separated from the rest of the existing annotations for a name, mostly likely through a standard Doc type as described in many other threads on this similar topic, e.g.:

float_pi = Annotated[float, Doc('pi (16 decimal places).')]
pi: Final[float_pi]

Of course this looks rather verbose but that’s what many other threads discussing a dedicated syntax for annotations are for.

2 Likes

As was brought up in the proposal for typing.Doc, this doesn’t work well with environments that use -OO to minimize memory use.

Typing also isn’t documentation.

2 Likes

But once Doc becomes standardized as part of the language/stdlib it can become a formal form of documentation and may make sense for it to be special-cased from certain levels of optimizations.

1 Like

I was just reading aboutt __objclass__ last night and really thought this was going to work but unfortunately help() seems to look up through getattr where as type(my_module).__dict__ shows the descriptors

It also doesn’t appear to work on a non module class either

from types import ModuleType
import sys

class ImmutableClassVarWithDoc:

    def __init__(self, value, doc):
        self.value = value
        self.__doc__ = doc

    def __set_name__(self, owner, name):
        self.__objclass__ = owner

    def __get__(self, instance, owner):
        return self.value

class CustomModule(ModuleType):

   foo = ImmutableClassVarWithDoc(1, "my doc string")

sys.modules[__name__].__class__ = CustomModule

class Bob:

   foo = ImmutableClassVarWithDoc(1, "my doc string")
1 Like

I agree but also disagree. I’ve found well typed code to be much more helpful than documentation pages many times since it’s so closely bound to the actual codebase.

I have tried as hard as I can in my code to try and bring typing and docs as close together as possible since, when using a language server or type checker, most of your interaction with the code will be through that.

But bringing paradigms like __doc__ and typing together are way outside the scope of this discussion lol

2 Likes

With this customization, you re-invented my wheelproposal, isn’t?

Does it allow separate docstrings for constants? I think you need a separate enum for each one. And docstring will be overflowed with enum-related generic info.

1 Like

What I’m saying is that an enum is the easiest way to achieve this.

As far as a more readable default value for a signature goes, I like your solution of extracting the expression that defines a default value from the source code much better than your proposal of relying on overriding __repr__ of named constants because more often than not the expression used to calculate a default value serves as a more meaningful documentation, and the expression gets lost if your named constant is operated on in any way. Extracting the expression of a default parameter not only works for named constants, but also for unnamed ones such as:

def action(timeout=5 * 60): # defaulting to a 5-minute timeout
    ...
2 Likes

Fun fact: This could be achieved as a side effect of PEP 671 late-bound defaults. That would have a performance impact, but since the main purpose of late-bound defaults is for things that can’t be evaluated eagerly, the documentation must by definition use the unevaluated expression. So you could do something like def action(timeout=>5 * 60) and then the docs would say 5 * 60 instead of 300. Personally I don’t think that that’s a strong argument in favour of PEP 671, but it is there.

3 Likes