Signatures, a call to action

Thanks for moving this forward. I hope the **kwargs gets completed soonish.

It was back in 2014 that Nick Coghlan added these comments throughout the code base:

/* AC: cannot convert yet, waiting for *args support */

The world has been waiting for this for a long time.

Yes. Exactly this. We see this occur over and over.

Here is a concrete proposal to kick off the conversation about how to complete Argument Clinic and stop having to live with an incomplete tool.

Given that our documentation DSL can already describe optional arguments, as in getattr(object, name[, default]), and that the type annotation DSL can already describe the function using overloads, let’s modify Argument Clinic to describe, and generate code for, a union of signatures:

/*[clinic input]
getattr as builtin_getattr2

    object: object
    name: str
    /

getattr as builtin_getattr3

    object: object
    name: str
    default: object
    /

Get a named attribute from an object; getattr(x, 'y') is equivalent to x.y.

When a default argument is given, it is returned when the attribute doesn't
exist; without it, an exception is raised in that case.
[clinic start generated code -- signature 1]*/

static PyObject *
builtin_getattr2(PyObject *module, PyObject *object, const char *name)
/*[clinic end generated code: output=b1b433b9e51356f5 input=bed4ca14e29c20d1]*/

/*[clinic start generated code -- signature 2]*/

static PyObject *
builtin_getattr3(PyObject *module, PyObject *object, const char *name, PyObject *default_value)
/*[clinic end generated code: output=b1b433b9e51356f5 input=bed4ca14e29c20d1]*/

Having a union of signatures would work super well for functions where None can’t be used as a default, like getattr() and dict.pop(), and for cases where None is merely undesirable, such as type(object) vs. type(name, bases, dict, **kwds) or range(stop) vs. range(start, stop[, step]).


Then does issue #101123 (“invalid signature for math.hypot”, python/cpython on GitHub) make sense to you? The inspect module can represent this signature, and AC can handle this function.

That’s the PR I was referring to above, @pablogsal. But please take a look at the generated code for the *args-only function in PR #101124 (first commit): I don’t understand why math_hypot_impl there has the PyCFunction type instead of _PyCFunctionFast. This introduces an extra slowdown for the converted function for no reason. I think it’s an AC bug.
Edit: OK, it seems there is already issue #90370 (“Avoid temporary `varargs` tuple creation in argument passing”, python/cpython on GitHub).


To my mind that is an argument for None, not against.

We tend to use None because it is usually a good sentinel - it
isn’t in the valid domain for the parameter (here, base).

Accepting 0 instead of None is a worse choice to my eye, because
it’s a number, and base is numeric. None is obviously a sentinel
value, because it isn’t numeric.

Now, with log() maybe 0 is obviously invalid, but only if you think
about it, i.e. considering the semantics of the log() function
specifically. What about a more esoteric function? Having the “use the
default” placeholder be obviously outside the natural domain (numbers) is
a better thing IMO.

Of course, I’m also no great fan of -1 with str.split; to my mind it
has always represented some low level int-only C level API interface
from ancient times rather than a good choice for meaning “no limit”.

We’re diverting away from signatures a bit here.

Cheers,
Cameron Simpson cs@cskk.id.au


What about sys.maxsize? It seems to be better.

index(self, value, start=0, stop=sys.maxsize, /)

That would indeed be much better. (Though I’d personally prefer a
sentinel like None - too late now.)

This, to my mind, argues for a better way to express the origin of a
default value.

To take a Python-side example from my personal stuff, I’ve got Python
methods with docstrings like this:

 @fmtdoc
 def tar(
     *srcpaths: List[str],
     chdirpath='.',
     output,
     tar_exe=TAR_EXE,
     bcount=DEFAULT_BCOUNT
 ):
   ''' Tar up the contents of `srcpaths` to `output`.

       Parameters:
       * `srcpaths`: source filesystem paths
       * `chdirpath`: optional directory to which to `chdir` before accessing `srcpaths`
       * `tar_exe`: optional `tar` executable, default from `TAR_EXE`: `{TAR_EXE}`
       * `bcount`: blocking factor in 512-byte units,
         default from `DEFAULT_BCOUNT`: `{DEFAULT_BCOUNT}`
   '''

where the @fmtdoc decorator formats the docstring so that the
docstring says:

 tar(*srcpaths: List[str], chdirpath='.', output, tar_exe='tar', bcount=2048)
     Tar up the contents of `srcpaths` to `output`.

     Parameters:
     * `srcpaths`: source filesystem paths
     * `chdirpath`: optional directory to which to `chdir` before accessing `srcpaths`
     * `tar_exe`: optional `tar` executable, default from `TAR_EXE`: `tar`
     * `bcount`: blocking factor in 512-byte units,
       default from `DEFAULT_BCOUNT`: `2048`
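For readers who haven’t seen such a decorator, here is a hypothetical minimal stand-in for `@fmtdoc`. It is not the actual implementation. It assumes the decorator simply runs `str.format_map()` over the docstring using the function’s module globals, so `{TAR_EXE}` picks up the module constant. The constants and the empty function body are illustrative.

```python
TAR_EXE = "tar"
DEFAULT_BCOUNT = 2048


def fmtdoc(func):
    """Hypothetical minimal @fmtdoc: fill {NAME} fields in the docstring from module globals.

    Caveat: any other literal braces in the docstring would need doubling ({{ }}).
    """
    if func.__doc__:  # docstrings are None under python -OO
        func.__doc__ = func.__doc__.format_map(func.__globals__)
    return func


@fmtdoc
def tar(*srcpaths, output, chdirpath=".", tar_exe=TAR_EXE, bcount=DEFAULT_BCOUNT):
    """Tar up the contents of `srcpaths` to `output`.

    * `tar_exe`: optional `tar` executable, default from `TAR_EXE`: `{TAR_EXE}`
    * `bcount`: blocking factor in 512-byte units, default from `DEFAULT_BCOUNT`: `{DEFAULT_BCOUNT}`
    """


print(tar.__doc__)
```

This only rewrites the prose. The signature line that help() prints still shows the resolved values `'tar'` and `2048`.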

Now obviously this is Python side (and I’m using markdown in the
docstring to boot) but I’m arguing here that maybe the signature
documentation stuff could do with a mechanism to embed a chosen
representation of the default value (with or without the explicit
“resolved” value).

In the help text above, the generated function signature line still has
magic values embedded in it, but wouldn’t it be better to have a
mechanism so that it could have meaningful terms there instead? A
hand-hacked example:

 tar(*srcpaths: List[str], chdirpath='.', output, tar_exe=TAR_EXE 'tar', bcount=DEFAULT_BCOUNT 2048)

Another example is os.open. In the
documentation the signature is:

os.open(path, flags, mode=0o777, *, dir_fd=None)

but the help function displays:

>>> help(os.open)
Help on built-in function open in module nt:

open(path, flags, mode=511, *, dir_fd=None)
   ...

which is unintuitive.

Aye, again a hook to “present the default value nicely” could be of help
here.
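As a rough illustration of what such a hook could look like in pure Python today: `inspect.Parameter` renders defaults with `repr()`, so a default whose `repr()` is the preferred spelling changes what `help()` shows. `OctalInt` and `open_like()` are hypothetical names for this sketch. A real hook would more likely live in `inspect` itself than in the default’s type.

```python
import inspect


class OctalInt(int):
    """Hypothetical int subclass that behaves like int but reprs in octal."""

    def __repr__(self):
        return oct(self)


def open_like(path, flags, mode=OctalInt(0o777), *, dir_fd=None):
    """Stand-in with the same parameter list as os.open."""


print(inspect.signature(open_like))  # (path, flags, mode=0o777, *, dir_fd=None)
print(OctalInt(0o777) == 511)        # True: still an ordinary int value
```

The trade-off is that the default’s type changes to an int subclass. That cost argues for a presentation hook kept separate from the value.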

Cheers,
Cameron Simpson cs@cskk.id.au


Indeed, and I in fact just made the same point over on the other thread before I’d read this one:


It looks like this is already the case, and the __text_signature__ reflects this, but the help() rendering gets this wrong.

>>> list.index.__text_signature__
'($self, value, start=0, stop=sys.maxsize, /)'

Presumably this is because repr(sys.maxsize) is the decimal value?


In case it’s interesting as related art: in PyO3 we use Rust’s proc macros to do the equivalent of AC’s work of generating argument-parsing code and signatures for documentation. Users can override what gets emitted for __text_signature__. See “Function signatures” in the PyO3 user guide.

At the moment we don’t try to generate default values in the __text_signature__, instead we just emit ... for all defaults. I believe using ... for defaults is commonplace in .pyi files.


One thing that I have found lacking in __text_signature__ is that type annotations are not currently supported. I think builtins and PyO3 alike would benefit greatly if they were. I’ve been meaning to propose this here but haven’t had the time to get around to it.

“Signature unions” would also be very nice to support, though it’s unclear to me how the existing __text_signature__ and inspect.signature() could be expanded to support them. Perhaps this would need a new API, inspect.overloads()?

This is an excellent idea!

It is likely the simplest way to address the needs of modern tooling. I’m thinking that this is what we want:

>>> signature(type)
[<Signature (object, /)>, <Signature (name, bases, dict, /, **kwds)>]

Having a union of signatures would solve most of our remaining problems. We’ve been waiting ten years for this. For functions like type(), range(), and getattr(), this is the only way forward.

I’m not sure. It looks like neither inspect nor IPython is using __text_signature__.

In [1]: import inspect

In [2]: list.index.__text_signature__
Out[2]: '($self, value, start=0, stop=sys.maxsize, /)'

In [3]: inspect.signature(list.index)
Out[3]: <Signature (self, value, start=0, stop=9223372036854775807, /)>

In [4]: list.index?
Signature: list.index(self, value, start=0, stop=9223372036854775807, /)
Docstring:
Return first index of value.

Raises ValueError if the value is not present.
Type:      method_descriptor

Why do you think so? sys.maxsize got evaluated here, in the wrap_value method.

I agree that the current Signature object is not able to represent more complex signature of some functions implemented in C, and it is not good for some Python functions. The right solution is to extend the representation of signatures, not to limit the interface of functions.

There are several technical issues:

  1. Returning a sequence of Signature object from inspect.signature() is a breaking change. I agree that it is inevitable.

  2. A syntax using square brackets for optional arguments without a clear default value is well known, but it is incompatible with Python syntax. Argument Clinic supports it (it is widely used in the curses module), but the inspect module does not. Such syntax is difficult to parse. So I think that internally such cases should be represented as a merge of simpler signatures, e.g. (key, /) and (key, default, /). But pydoc could detect the common parts and output them as (key[, default], /).

  3. In the case of list.index() and os.open(), the problem is in how default values are represented in the Signature object. list.index.__text_signature__ is '($self, value, start=0, stop=sys.maxsize, /)', but inspect evaluates sys.maxsize and stores the default value in the Signature object as the integer 9223372036854775807, because the Signature object can only contain default values, not their string representations. pydoc uses inspect.signature() to parse __text_signature__ and reconstruct the string representation of the signature, but loses the original representation in the process. For Python functions, the text representation is not available in the first place.
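Point 3 is easy to reproduce with a plain Python function. The hypothetical `index()` below mirrors the parameters of list.index. The source says `stop=sys.maxsize`, but `inspect.signature()` only ever sees the evaluated integer, so the symbolic spelling is gone.

```python
import inspect
import sys


def index(value, start=0, stop=sys.maxsize, /):
    """Hypothetical stand-in mirroring the parameters of list.index."""


sig = inspect.signature(index)
# The Parameter holds only the evaluated int; the name "sys.maxsize" is lost
print(sig.parameters["stop"].default == sys.maxsize)  # True
print(sig)  # e.g. (value, start=0, stop=9223372036854775807, /) on a 64-bit build
```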


It would sound better to deprecate inspect.signature() while introducing a new inspect.signatures() (note plural).


Most functions have a single signature. I think it could return something like a MultiSignature. Some properties and methods should be common to Signature and MultiSignature, but with different behavior. For example, str() would return a multiline string.

I’m not sure what “different behavior” means, but it’s important not to break existing usage of inspect.signature for already supported signatures. For example, inspect.signature(...).parameters should still return an ordered mapping of names to parameters, not something else.


It would sound better to deprecate inspect.signature() while introducing a new inspect.signatures() (note plural).

I like this idea. Could you get the signatures from the typing overloads? I think that’s what David proposed above with inspect.overloads. Because, it appears that the information needed to resolve problem functions mentioned in the linked thread (dict.pop, type, range, min) are properly annotated in the typeshed: dict.pop, type, range, min.


One way to avoid breakage would be to add a new function inspect.signatures() that always returns a list:

>>> signatures(type)
[<Signature (object, /)>, <Signature (name, bases, dict, /, **kwds)>]

Legacy tooling can continue to use the old function which would continue to raise an exception for complex signatures. Presumably, tool makers (PyCharm, etc) would quickly move to the new function.

Another way to go is to add a link field to Signature objects giving something like:

>>> signature(type)
<Signature (object, /, alt=<Signature (name, bases, dict, /, **kwds)>)>

Existing tooling could ignore the link field at first and gradually grow smarter.


Sorry for the thread exhumation, I only just now saw this. What follows is a little historical context.

The initial motivation for Argument Clinic was to make it easy to add Python-visible signatures to a function. I figured adding them by hand would take forever and be error-prone. And they’d go out of date, as core devs modified the argument parsing for a function but forgot (yet again) to update the “signature” to match. Making Argument Clinic generate the argument-parsing code itself seemed like a win; the signatures it generated were guaranteed to be accurate and up-to-date. (Argument Clinic hides the signature at the start of the docstring, in a place inspect.signature can find it, which meant built-in functions would finally have signatures, and pydoc signature information would be automatically up-to-date.)

Also, automatically generating the argument parsing code seemed like a boon to development and maintenance. It’d make it easier to add new extension functions in the future. I figured Clinic would initially be a puzzle to new core devs; instead of having to figure out how to use the somewhat-awkward Python argument-parsing functions, they’d have to figure out how to use the also-somewhat-awkward Argument Clinic. But they’d have plenty of examples, letting them copy-paste-modify their way to early success. And of course it’s a boon of convenience for the core dev programmer, having their function prototype magically generated for them, with its arguments magically appearing in the right types at runtime.

But that’s why Clinic was only designed to accommodate functions with signatures you could express in Python: generally, functions already using PyArg_ParseTuple or PyArg_ParseTupleAndKeywords. (Maybe even the ancient PyArg_Parse.) It really was all about smuggling the signatures for extension functions into the runtime so inspect.signature could read them. And it just wasn’t designed to permit expressing functions with signatures you couldn’t write in Python.

There are loads of functions in Python that have signatures you can’t really express in Python. IIRC the most common of these are functions with a parameter whose default value can’t be represented in Python; internally, the C variable the argument is slotted into is pre-initialized with NULL, and there’s no way in Python to pass in a C NULL. But there are other functions that play games, e.g. by using other internal default values that couldn’t be represented in Python, or by counting their parameters and only allowing discrete numbers of values. By the time I shipped Clinic, I’d tried to support the latter with “argument groups”, which weren’t directly expressible in Python but could be rendered in a readable fashion by pydoc using square brackets. I dimly recall Clinic also handles functions with these internal NULL defaults by similarly marking them as optional using square brackets. These aren’t proper Python signatures, though; IIRC I had to teach inspect.signature to ignore those square brackets. (The only way an inspect.Parameter object can represent the state “this parameter is optional” is by setting the default attribute to the parameter’s default value.)
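To make the “counting their parameters” case concrete, here is a hypothetical pure-Python `my_range()`. Like range(), it accepts one, two or three positional arguments and tells them apart by counting. The best `inspect.signature()` can report for it is `(*args)`. A union of signatures would restore that lost information.

```python
import inspect


def my_range(*args):
    """Hypothetical stand-in: range(stop) or range(start, stop[, step])."""
    if len(args) == 1:
        start, stop, step = 0, args[0], 1
    elif len(args) in (2, 3):
        start, stop = args[0], args[1]
        step = args[2] if len(args) == 3 else 1
    else:
        raise TypeError(f"expected 1 to 3 arguments, got {len(args)}")
    return range(start, stop, step)


print(list(my_range(3)))            # [0, 1, 2]
print(list(my_range(1, 7, 2)))      # [1, 3, 5]
print(inspect.signature(my_range))  # (*args)
```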

After Argument Clinic shipped, the community responded by trying to force a lot of square pegs into round holes, adding Argument Clinic support to functions that weren’t really good candidates for it. Initially it seemed like a weird game, like contributors were trying to figure out ways to trick Clinic into generating code that parsed a reasonable simulacrum of the original function’s behavior, if you squinted your eyes just right. I found this low-key irritating, but I didn’t have the energy to fight about it. And I couldn’t really get mad that they seemed to want to make CPython better, make it easier to maintain, etc.

Eventually the effort to use Argument Clinic for as many functions as possible became a performance win, because it let us quickly innovate in faster argument parsing. You could write a new argument-passing convention and only have to modify one spot: once you updated the Argument Clinic code generator to understand your new calling convention, you got the rest of the entire CPython source tree for free. I believe Victor’s “fast call” implementation was far easier to implement because of the work the community did in converting a significant portion of CPython extension functions to Clinic.

It wouldn’t be fair to say I haven’t given any thought to how to extend Argument Clinic to handle more types of Python functions. But it’s also fair to say I’ve done nothing about it in a very long time, and I have no plans to return to the project. Finally, it’s not accurate to say that nothing has been done about it for more than a decade, as Argument Clinic didn’t land in the Python repo until October 2013.


I’m sure you have considered and rejected this idea without even mentioning it, but for what it’s worth: I have a small module for function overloading in my projects, and there I use the fact that Signature is not iterable to make an iterable OverloadedSignature subclass, which yields the sequence of overloaded plain Signature instances when iterated. This even plays very nicely with stringification and help() output.

It’s similar to the link field idea by @rhettinger but somewhat easier to use, I think.

Stripping out all the bits to do with my specific code, it’s something like this:

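from inspect import Signature

class MultiSignature(Signature):
    __slots__ = 'signatures'

    def __len__(self):
        return len(self.signatures)

    def __iter__(self):
        # Iterating yields the plain Signature objects, one per alternative
        return iter(self.signatures)

    def __str__(self):
        # One signature per line, which reads well in help() output
        return '\n'.join(map(str, self))

    def __repr__(self):
        return f'<Signature with {len(self.signatures)} alternative(s)>'

# func, alt and other are placeholder functions whose signatures get combined
def func(x, /): ...
def alt(x, base, /): ...
def other(x, /, *, base): ...

sig = MultiSignature.from_callable(func)
sig.signatures = tuple(map(Signature.from_callable, [func, alt, other]))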

I started working on it. @rhettinger, I look forward to your help in reviewing the new docstrings for dict, range and other builtins.

It turns out that the difference between the two implementations (a list of signatures or the MultiSignature class) is small and does not matter much compared with the other changes needed.


I’m thinking (as an outcome of this thread) about a PEP that would expose the __text_signature__ string attribute (undocumented and private so far) for extension modules, with some minor additions to its format specification (so far, return annotations).

Maybe we could hit two rabbits in one shot and include basic support for multiple signatures?

The proposed extension of the __text_signature__ format (perhaps renamed to __text_signatures__ in this case) is trivial: let’s just list all the different signatures one by one, separated by newlines. Each signature has a parameter list enclosed in round brackets, optionally followed by ‘->’ and an expression (for the return annotation). For example, in the log() case this would look like:

($module, x, /)
($module, x, base, /)

The attribute will be created from the docstring just as we do it now (modulo renaming). Later we could modify the AC to support generation of the docstring from the clinic input, proposed above by Raymond, add the inspect.signatures(), etc.

Fancy syntax using square brackets for optional arguments might look more economical, but I suspect it’s more suitable for typeless languages. Also, the above proposal is almost compatible with Python syntax: after simple tweaks, every valid text signature will be ast.parse()-able.
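Here is a minimal sketch of those “simple tweaks” for the newline-separated format above. Each line gets three steps: drop the leading `$module`/`$self` marker, wrap the line in a dummy `def`, and let `ast.parse()` handle the parameter list and any `-> annotation`. The helper name `parse_text_signatures()` is hypothetical. Because defaults stay as source text via `ast.unparse()` (Python 3.9+), spellings like `sys.maxsize` survive.

```python
import ast
import re


def parse_text_signatures(text):
    """Hypothetical parser: one dict per newline-separated text signature."""
    sigs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Tweak 1: drop the "$module"/"$self" marker, which is not valid Python
        line = re.sub(r"^\(\s*\$\w+\s*,?\s*", "(", line.strip())
        # Tweak 2: wrap in a dummy def so ast.parse() sees a complete statement;
        # an optional "-> annotation" suffix parses as the return annotation
        func = ast.parse(f"def _sig{line}: pass").body[0]
        args = func.args
        sigs.append({
            "posonly": [a.arg for a in args.posonlyargs],
            "params": [a.arg for a in args.args],
            "defaults": [ast.unparse(d) for d in args.defaults],
            "returns": ast.unparse(func.returns) if func.returns else None,
        })
    return sigs


for sig in parse_text_signatures("($module, x, /)\n($module, x, base, /)"):
    print(sig)
```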