Docstrings for dunder methods

The __add__ method for NumPy arrays performs elementwise addition. For Python lists, it performs concatenation. However, the docstrings don’t tell us this. They describe how __add__ is invoked rather than what it actually does. I’ve seen this be a problem for beginners over and over again.

>>> from numpy import ndarray
>>> help(ndarray.__add__)
Help on wrapper_descriptor:

__add__(self, value, /)
    Return self+value.

>>> help(list.__add__)
Help on wrapper_descriptor:

__add__(self, value, /)
    Return self+value.
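
The same thing can be checked without NumPy. The following is only a minimal sketch using built-in types; the particular set of types and the variable names are my own choice for illustration. It collects the `__add__` docstring of several built-ins and counts how many distinct texts there are.

```python
# Collect the __add__ docstring reported for several built-in types.
# The choice of types is arbitrary; all of them implement "+" in C.
builtin_types = [int, float, list, tuple, str, bytes]
docs = {t.__name__: t.__add__.__doc__ for t in builtin_types}

for name, doc in docs.items():
    print(f"{name}.__add__: {doc!r}")

# If every type shares the generic slot docstring, this set has one element,
# even though "+" means addition for numbers and concatenation for sequences.
distinct_docs = set(docs.values())
print("distinct docstrings:", len(distinct_docs))
```

On the CPython versions discussed in this thread I would expect a single shared docstring, which is exactly the problem being described.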

Currently, pure Python classes can provide custom docstrings for dunder methods. Offhand, I’m not sure how to do this for C types. It might need new Argument Clinic magic to create an alternate docstring, plus a way for a wrapper_descriptor or inspect to find it.

class A:
    def __add__(self, other):
        'Concatenate two A instances to make a new A.'

>>> help(A.__add__)
Help on function __add__ in module __main__:

__add__(self, other)
    Concatenate two A instances to make a new A.
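
As a rough illustration of why the pure Python case works, the sketch below contrasts the two kinds of objects involved. The class body is the one from above with a placeholder return value added (an assumption, just so the method does something); `inspect.getdoc` and the `types` names are standard library.

```python
import inspect
import types


class A:
    def __add__(self, other):
        'Concatenate two A instances to make a new A.'
        return A()  # placeholder behaviour, only here to make the method complete


# A.__add__ is an ordinary function object, so it carries its own docstring.
py_doc = inspect.getdoc(A.__add__)
is_plain_function = isinstance(A.__add__, types.FunctionType)

# list.__add__ is a slot wrapper created by the interpreter for a C slot.
is_slot_wrapper = isinstance(list.__add__, types.WrapperDescriptorType)

print(py_doc)
print(is_plain_function, is_slot_wrapper)
```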

Related:
The brand new @critical_section Argument Clinic directive is currently being used to help implement free-threading. @colesbury has floated the idea of adding type slot support to Argument Clinic, to further ease implementation of PEP 703. With type slot support in place, it would be easy to add custom docstrings for dunder methods in C types.

IIUC, it’s impossible right now. All wrapper descriptors for a given dunder (e.g. __add__ on different types) share a pointer to the same wrapperbase struct, and the docstring lives in that struct.
We could extend PyWrapperDescrObject with a doc field to support this kind of overloading.
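
From the Python side, the limitation shows up as a read-only `__doc__`. This is a small sketch, not part of any proposal; `try_set_doc` is a hypothetical helper name I made up for the demonstration.

```python
def try_set_doc(descr, text):
    """Try to assign a docstring; return (succeeded, exception_or_None)."""
    try:
        descr.__doc__ = text
    except (AttributeError, TypeError) as exc:
        # The wrapper descriptor exposes __doc__ through a getter only.
        return False, exc
    return True, None


original = list.__add__.__doc__
ok, error = try_set_doc(list.__add__, "Concatenate two lists.")

print("assignment succeeded:", ok)
print("error:", error)
print("docstring unchanged:", list.__add__.__doc__ == original)
```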

Quick draw
>>> help(int.__add__)
Help on wrapper_descriptor:

__add__(self, value, /) unbound builtins.int method
    Return self+value.

>>> help(list.__add__)
Help on wrapper_descriptor:

__add__(...) unbound builtins.list method
    Boo!

diff --git a/Include/cpython/descrobject.h b/Include/cpython/descrobject.h
index bbad8b59c2..25545fc9e4 100644
--- a/Include/cpython/descrobject.h
+++ b/Include/cpython/descrobject.h
@@ -55,6 +55,7 @@ typedef struct {
     PyDescr_COMMON;
     struct wrapperbase *d_base;
     void *d_wrapped; /* This can be any function pointer */
+    const char* doc;
 } PyWrapperDescrObject;
 
 PyAPI_FUNC(PyObject *) PyDescr_NewWrapper(PyTypeObject *,
diff --git a/Objects/descrobject.c b/Objects/descrobject.c
index df546a090c..cabd01e8d9 100644
--- a/Objects/descrobject.c
+++ b/Objects/descrobject.c
@@ -687,7 +687,7 @@ static PyObject *
 wrapperdescr_get_doc(PyObject *self, void *closure)
 {
     PyWrapperDescrObject *descr = (PyWrapperDescrObject *)self;
-    return _PyType_GetDocFromInternalDoc(descr->d_base->name, descr->d_base->doc);
+    return _PyType_GetDocFromInternalDoc(descr->d_base->name, descr->doc);
 }
 
 static PyObject *
@@ -695,7 +695,7 @@ wrapperdescr_get_text_signature(PyObject *self, void *closure)
 {
     PyWrapperDescrObject *descr = (PyWrapperDescrObject *)self;
     return _PyType_GetTextSignatureFromInternalDoc(descr->d_base->name,
-                                                   descr->d_base->doc, 0);
+                                                   descr->doc, 0);
 }
 
 static PyGetSetDef wrapperdescr_getset[] = {
@@ -1019,6 +1019,7 @@ PyDescr_NewWrapper(PyTypeObject *type, struct wrapperbase *base, void *wrapped)
     if (descr != NULL) {
         descr->d_base = base;
         descr->d_wrapped = wrapped;
+        descr->doc = base->doc;
     }
     return (PyObject *)descr;
 }
@@ -1402,7 +1403,7 @@ static PyObject *
 wrapper_doc(PyObject *self, void *Py_UNUSED(ignored))
 {
     wrapperobject *wp = (wrapperobject *)self;
-    return _PyType_GetDocFromInternalDoc(wp->descr->d_base->name, wp->descr->d_base->doc);
+    return _PyType_GetDocFromInternalDoc(wp->descr->d_base->name, wp->descr->doc);
 }
 
 static PyObject *
@@ -1410,7 +1411,7 @@ wrapper_text_signature(PyObject *self, void *Py_UNUSED(ignored))
 {
     wrapperobject *wp = (wrapperobject *)self;
     return _PyType_GetTextSignatureFromInternalDoc(wp->descr->d_base->name,
-                                                   wp->descr->d_base->doc, 0);
+                                                   wp->descr->doc, 0);
 }
 
 static PyObject *
diff --git a/Objects/object.c b/Objects/object.c
index df14fe0c6f..4e1de930a7 100644
--- a/Objects/object.c
+++ b/Objects/object.c
@@ -2337,6 +2337,11 @@ _PyTypes_InitTypes(PyInterpreterState *interp)
         }
     }
 
+    PyObject *dict = PyType_GetDict(&PyList_Type);
+    PyObject *descr = PyDict_GetItem(dict, &_Py_ID(__add__));
+    Py_DECREF(dict);
+    ((PyWrapperDescrObject *)descr)->doc = "Boo!";
+
     // Must be after static types are initialized
     if (_Py_initialize_generic(interp) < 0) {
         return _PyStatus_ERR("Can't initialize generic types");

Probably, first there should be a way to do this without AC…

Sorry to resurrect an old thread, but I wanted to say that I agree it would be great to have better docstrings for the built-in types.

I started poking at this a little bit this morning, and I put together a quick little test implementation that adds docstrings to str’s dunder methods using the strategy @skirpichev suggested above (thanks!) and a script in Tools to attach the new docstrings to the built-in types at startup.

I don’t want to worry about the particular phrasing of the new docstrings yet, but rather just to get a sense of whether this is something worth devoting time to (and if there are better ideas for how to organize the code).

For reference:

First part of help(str) before
class str(object)
 |  str(object='') -> str
 |  str(bytes_or_buffer[, encoding[, errors]]) -> str
 |  
 |  Create a new string object from the given object. If encoding or
 |  errors is specified, then the object must expose a data buffer
 |  that will be decoded using the given encoding and error handler.
 |  Otherwise, returns the result of object.__str__() (if defined)
 |  or repr(object).
 |  encoding defaults to sys.getdefaultencoding().
 |  errors defaults to 'strict'.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __format__(self, format_spec, /)
 |      Return a formatted version of the string as described by format_spec.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(self, key, /)
 |      Return self[key].
 |  
 |  __getnewargs__(...)
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __hash__(self, /)
 |      Return hash(self).
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  __le__(self, value, /)
 |      Return self<=value.
 |  
 |  __len__(self, /)
 |      Return len(self).
 |  
 |  __lt__(self, value, /)
 |      Return self<value.
 |  
 |  __mod__(self, value, /)
 |      Return self%value.
 |  
 |  __mul__(self, value, /)
 |      Return self*value.
 |  
 |  __ne__(self, value, /)
 |      Return self!=value.
 |  
First part of help(str) after
class str(object)
 |  str(object='') -> str
 |  str(bytes_or_buffer[, encoding[, errors]]) -> str
 |
 |  Create a new string object from the given object. If encoding or
 |  errors is specified, then the object must expose a data buffer
 |  that will be decoded using the given encoding and error handler.
 |  Otherwise, returns the result of object.__str__() (if defined)
 |  or repr(object).
 |  encoding defaults to 'utf-8'.
 |  errors defaults to 'strict'.
 |
 |  Methods defined here:
 |
 |  __add__(self, value, /)
 |      Implements self + value.
 |
 |      Return the concatenation of self and value.
 |
 |  __contains__(self, value, /)
 |      Implements value in self.
 |
 |      Return True if the given value is a substring of self, and False
 |      otherwise.
 |
 |      Equivalent to self.find(value) >= 0.
 |
 |  __eq__(self, value, /)
 |      Return self==value.
 |
 |  __format__(self, format_spec, /)
 |      Return a formatted version of the string as described by format_spec.
 |
 |  __ge__(self, value, /)
 |      Implements self >= value.
 |
 |      Comparisons of strings use lexicographical ordering.  self and value are
 |      compared character by character using their Unicode code points.  If all
 |      compared characters are equal, the longer string is considered to be
 |      greater.
 |
 |  __getitem__(self, key, /)
 |      Implements self[key].
 |
 |      If key is an integer, return a length-1 string containing the
 |      character at that index in self.
 |
 |      If key is a slice, return the corresponding substring of self.
 |
 |  __getnewargs__(self, /)
 |
 |  __gt__(self, value, /)
 |      Implements self > value.
 |
 |      Comparisons of strings use lexicographical ordering.  self and value are
 |      compared character by character using their Unicode code points.  If all
 |      compared characters are equal, the longer string is considered to be
 |      greater.
 |
 |  __hash__(self, /)
 |      Return hash(self).
 |
 |  __iter__(self, /)
 |      Implements iter(self).
 |
 |      Return an iterator over the characters of self.  The iterator yields
 |      successive length-1 substrings, each corresponding to a single Unicode
 |      character in self.
 |
 |  __le__(self, value, /)
 |      Implements self <= value.
 |
 |      Comparisons of strings use lexicographical ordering.  self and value are
 |      compared character by character using their Unicode code points.  If all
 |      compared characters are equal, the longer string is considered to be
 |      greater.
 |
 |  __len__(self, /)
 |      Implements len(self).
 |
 |      Return the number of Unicode characters in self.
 |
 |  __lt__(self, value, /)
 |      Implements self < value.
 |
 |      Comparisons of strings use lexicographical ordering.  self and value are
 |      compared character by character using their Unicode code points.  If all
 |      compared characters are equal, the longer string is considered to be
 |      greater.
 |
 |  __mod__(self, value, /)
 |      Implements self % value.
 |
 |      Return a new string printf-style string formatting, replacing '%'
 |      conversion specifications in self with zero or more elements in the
 |      given value.  Alternative options for string formatting include
 |      str.format, or template strings.
 |
 |  __mul__(self, value, /)
 |      Implements self * value.
 |
 |      Return a new string containing the contents of self repeated a number of
 |      times given by value, which must be an integer.
 |
 |  __ne__(self, value, /)
 |      Return self!=value.

This was raised in 2014 in “Wrapper descriptor (slot) methods have fixed docstrings” (python/cpython issue #65508).

I think we should do this, both (a) providing support for custom docstrings for C slot methods, and (b) extending AC to make this simpler within CPython.

I’ve glanced through the test implementation you link – this won’t work, because non-CPython extension modules need the ability to override the wrapper docstrings, rather than making it part of static type initialisation.

Perhaps we could have a PyDescr_SetDoc() API to do this? cc @encukou for thoughts/comments

A

I don’t think it’s a problem. Assuming that PyWrapperDescrObject was adjusted like above, you *can* change it the same way in an extension. (Though a dedicated API would make this much more convenient.) I doubt it’s a good approach if we were designing the API from scratch. (I.e. why not have an entry for docs in PyType_Slot? The difference from PyMethodDef is that we have a sane default docstring if this field is not provided, i.e. is NULL.) But if that option is not on the table, I don’t think we can have something very different from the proposed solution.

@adqm, I think you should try to integrate this into AC instead of inventing a new tool. But let’s wait and see whether core devs come up with a better design.

AFAIK so far we’ve treated the descriptors as immutable after they’re created (modulo the fact that you can fudge the struct directly), so I’d not be comfortable adding a PyDescr_SetDoc.

The public API will almost certainly be different from what the built-in types use, so it would make some sense to tackle them separately.
I guess AC is the way (for built-in types). If it can generate what add_operators does statically, it might even improve startup time.

This has already been possible since Python 2.4. For example, list.__getitem__ and dict.__getitem__ have different docstrings:

>>> list.__getitem__.__doc__
'Return self[index].'
>>> dict.__getitem__.__doc__
'Return self[key].'

I believe this uses METH_COEXIST in a regular method function, rather than using the slot. Would it make sense to do this more generally simply to add/override docstrings? Would there be a performance penalty?

I’d guess not, judging by the example above (__getitem__ is a very performance-critical method). I also did some quick benchmarking on a toy extension type and saw no measurable difference.

patch + benchmark
# bench.py
import pyperf
from operator import add
from example import xxx

x, y = map(xxx, [123, 321])
runner = pyperf.Runner()
s = repr(x) + " + " + repr(y)
runner.bench_func(s, add, x, y)

(This benchmark does show a difference between, e.g., static and heap types.)

Patch:

diff --git a/example.c b/example.c
index 6d46cce..ea73088 100644
--- a/example.c
+++ b/example.c
@@ -62,6 +62,12 @@ static PyNumberMethods xxx_as_number = {
     .nb_add = add,
 };
 
+static PyMethodDef xxx_methods[] = {
+    {"__add__", add, METH_O | METH_COEXIST,
+     "__add__(self, other, /)\n--\n\nBoo!"},
+    {NULL, NULL}
+};
+
 PyTypeObject XXX_Type = {
     PyVarObject_HEAD_INIT(NULL, 0)
     .tp_name = "xxx",
@@ -70,6 +76,7 @@ PyTypeObject XXX_Type = {
     .tp_repr = repr,
     .tp_as_number = &xxx_as_number,
     .tp_flags = Py_TPFLAGS_DEFAULT,
+    .tp_methods = xxx_methods,
 };
 
 static int

I doubt this will work for some special snowflakes (e.g. __pow__). Still, it is worth documenting, as it could be helpful in most cases.

METH_COEXIST was added as a way to speed up the direct call of dunder methods in Python code. But it seems to have a serious impact on the subclasses.
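
For anyone wanting to see the distinction from Python, here is a minimal sketch that classifies a few dunders by the kind of descriptor found on the type. The selection of dunders and the `kind` helper are my own, purely for illustration; the sketch assumes a CPython build where list and dict define __getitem__ as regular methods, as in the example above.

```python
import types


def kind(descr):
    """Name the descriptor flavour: regular C method or generated slot wrapper."""
    if isinstance(descr, types.MethodDescriptorType):
        return "method_descriptor"
    if isinstance(descr, types.WrapperDescriptorType):
        return "wrapper_descriptor"
    return type(descr).__name__


# list and dict define __getitem__ as a regular method alongside the slot,
# while tuple.__getitem__ and list.__add__ are plain slot wrappers.
kinds = {
    "list.__getitem__": kind(list.__getitem__),
    "dict.__getitem__": kind(dict.__getitem__),
    "tuple.__getitem__": kind(tuple.__getitem__),
    "list.__add__": kind(list.__add__),
}

for name, value in kinds.items():
    print(f"{name}: {value}")
```

Only the entries that are regular methods carry their own docstring; the slot wrappers fall back to the shared generic text.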