Signatures, a call to action

rhettinger · February 6, 2023, 3:23am

I saw the other thread on math.log() and wanted to start a new one to shift the focus to what I
think is the underlying problem that needs to be solved.

If the playful storytelling style doesn’t fit your tastes, please try to look past the style and
focus on the substance of the post. I tried rewriting this a few times but found that the parallel
construction form of comparison and contrast best communicated where work needs to be done.

There once was a little scripting language called Python, and it became very popular because it met the user need of “fit in your head”. It had functions like range(), min(), max(), getattr(), type(),
math.log(), and methods like dict.pop(), str.split(), str.index(), and list.index(). People were
happy and there was much rejoicing.

A need arose to document this lovely language, but doing so involved creating a parallel language to describe it. Fortunately, there was a well known, well understood, and widely adopted notation involving square brackets for optional values. It accurately modeled the Python language:

range(stop)
range(start, stop[, step])
min(iterable[, key])
min(arg1, arg2, *args[, key])
getattr(object, name[, default])
math.log(x[, base])
dict.pop(key[, default])
str.index(sub[, start[, end]])
str.split([sep[, maxsplit]])
type(object)
type(name, bases, dict)

Occasionally, the documentation language had to list an entry twice to cover the union of two
calling patterns. Otherwise, there was peace and harmony throughout the land.

The Python language grammar was described by yet another language, EBNF. This was a standard but was tough for readers to follow and was an awkward fit. It also constrained Python in ways that got in the way of meeting user needs. Something had to give. Either Python had to change or EBNF had to be replaced by something more expressive. In the end, EBNF was replaced by PEG, allowing the language to grow more naturally and providing better readability for those wanting to understand the grammar. The world was in harmony once again.

A need arose for yet another parallel language, this time to describe type signatures. This ground had been previously explored in formal mathematics, in fully typed languages, and in gradually typed languages such as TypeScript. Following those leads, Python gained an annotation language. At times, the fit was uncomfortable, but each time it was the typed language that adapted rather than Python itself. It grew “|” to replace “Union” and “Self” to replace awkward type variable constructions. The challenging cases listed above were handled by way of a Union or by overloads. This was sufficient to annotate most of the Python ecosystem with the notable exception of recursive types such as JSON. The beautiful language itself did not change except to allow the optional notations to be written inline with the code they described. There was some grumbling, but mostly the world was in harmony and users were happy.
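As an illustration of the overload approach mentioned above, here is a minimal sketch of how the two calling patterns of getattr() might be annotated. The names get_attr and _MISSING are hypothetical and not part of the standard library, and the sentinel is only one possible way to write the runtime side.

```python
from typing import Any, overload

# Hypothetical sentinel meaning "no default was passed"; None cannot play
# this role because None is itself a legitimate default value.
_MISSING = object()


@overload
def get_attr(obj: object, name: str, /) -> Any: ...
@overload
def get_attr(obj: object, name: str, default: Any, /) -> Any: ...


def get_attr(obj, name, default=_MISSING, /):
    """Hypothetical pure-Python stand-in for getattr(object, name[, default])."""
    if default is _MISSING:
        return getattr(obj, name)        # raises AttributeError when missing
    return getattr(obj, name, default)   # falls back to the given default


print(get_attr("abc", "upper")())        # ABC
print(get_attr("abc", "missing", None))  # None
```

A type checker sees two separate call shapes here, while the runtime function still needs a private sentinel to tell them apart.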

Yet another parallel language arose. Signatures were designed to model the language in a way that supported runtime introspection, allowing tooling to become more powerful. Here the happy part of the story ends.

Signatures were only designed to describe the common and simple cases in Python. Work to complete the signature language to include a union of signatures was sadly left incomplete.

Some lucky parts of the API were marked as waiting for the signature design to be completed. Hence, str.index() has no signature.
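To see which callables currently report a signature, a small probe along these lines can help. The helper name describe is hypothetical, and the output depends on the Python version being run, so no particular result is claimed here.

```python
import inspect


def describe(func):
    """Return the signature as text, or a marker when introspection fails."""
    try:
        return str(inspect.signature(func))
    except (ValueError, TypeError):
        # Raised for callables whose signature cannot be determined.
        return "<no signature available>"


for func in (len, str.split, list.index, str.index, min, getattr):
    print(f"{func.__qualname__:<12}{describe(func)}")
```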

Other parts of the API were not so lucky and the tail began to wag the dog. The list.index() method had to change its API. Its help now reads:

index(self, value, start=0, stop=9223372036854775807, /)

Yuck, how did these implementation details leak into the language? Likewise, str.split() got altered to use a -1 magic constant. Its help now reads:

split(self, /, sep=None, maxsplit=-1)

That’s a bummer because the documentation modeling language formerly used in help() is clearer:

str.split([sep[, maxsplit]])
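For readers unfamiliar with the two magic constants, this short sketch shows what they mean at the call site. It assumes nothing beyond the standard library; the comparison with 2**63 - 1 only holds on a typical 64-bit build.

```python
import sys

text = "a b c d"

# Omitting maxsplit and passing the -1 sentinel behave identically.
assert text.split() == text.split(None, -1) == ["a", "b", "c", "d"]
assert text.split(None, 1) == ["a", "b c d"]

# The large number shown by help(list.index) is sys.maxsize on a typical
# 64-bit build (the value differs on 32-bit builds).
print(sys.maxsize == 2**63 - 1)

items = ["x", "y", "z"]
assert items.index("y") == items.index("y", 0, sys.maxsize) == 1
```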

Unlike the previous parallel languages, a curious pattern has emerged. Rather than build out the parallel signature language to accurately model Python, there are recurring efforts to change Python’s long-standing, sensible, battle-tested APIs just to accommodate the incomplete signature language.

This makes no sense to me. The signature language needs to be extended and completed. We should not make permanent ad-hoc API changes just to shoehorn our functions into an inadequately expressive modeling language.

If signatures can’t express something basic like *args, then tools like math.hypot(*coordinates)
should just have to wait.

In the case of math.log(x [, base]), I recommended that we leave the function alone. In MS Excel, the signature is the same as we have now. In other languages, the base argument is not nullable. In writing mathematics by hand or in LaTeX we don’t put a None or null in the base field. In the two-decade history of this function, no user has ever wanted to put None in for the base argument. So, we should be honest with ourselves. The purpose of the proposed change was not to benefit users of the function or to better model mathematics. The sole reason for the proposed edit was to make it fit into an incomplete modeling language. Had the signature language been completed, no one would have ever suggested this API change. And for a mathematical function in particular, it is especially nice to keep the inputs and outputs in the domain of numbers.

If someone would just focus on the task of completing the work on signature objects, we could return to the happy world of the modeling languages adapting to Python rather than vice versa. All that is needed is *args support and signature unions. Otherwise, the functions listed above will never get signatures or they will become like the proverbial square pegs forced into round holes.

EpicWink · February 6, 2023, 3:49am

I personally would be fine with breaking API and making typing.Optional actually mean that a parameter is optional (or that a variable can be unbound).

As that’s unlikely, how about typing.NotRequired or typing.PotentiallyUnspecified?

In response to the main suggestion: I agree, the API shouldn’t change to match incorrect documentation, but sometimes it’s tricky to determine which of the API and documentation is incorrect.

stoneleaf · February 6, 2023, 4:01am

Raymond Hettinger:

The list.index() method … help now reads:
index(self, value, start=0, stop=9223372036854775807, /)
Yuck, how did these implementation details leak into the language? Likewise, str.split() got altered to use a -1 magic constant. Its help now reads:
split(self, /, sep=None, maxsplit=-1)
That’s a bummer because documentation modeling language formerly used in help() is clearer:
str.split([sep[, maxsplit]])

I agree that those helps are much less helpful than they used to be, and I would love to see a completed signature language.

I also agree with Guido:

Should `None` defaults for optional arguments be discouraged?

Contrary to most reactions, I find it natural and desirable to accept None as an alternative to omitting an optional parameter in a wide variety of APIs. Not because I like to read or write log(2.3, None) but because I like to be able to define a simple wrapper, e.g.
def my_log(x, base=None):
    # <extra stuff here>
    return math.log(x, base)

I suspect, however, that being able to accept None (or some other sentinel) to specify “use the default value” is a separate issue from signatures.
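For comparison, here is a sketch of what such a wrapper looks like when the wrapped function does not accept None, which is assumed here to be the case for math.log() as discussed in this thread. Both wrapper names are hypothetical.

```python
import math


def my_log(x, base=None):
    """Hypothetical wrapper: translate None into "omit the argument"."""
    if base is None:
        return math.log(x)        # natural logarithm
    return math.log(x, base)


def my_log_varargs(x, *base):
    """Alternative hypothetical wrapper that forwards an optional base as-is."""
    return math.log(x, *base)


print(my_log(math.e), my_log(8, 2))
print(my_log_varargs(math.e), my_log_varargs(8, 2))
```

The first form needs a branch per optional argument; the second avoids the branch but gives up a named parameter in the wrapper's own signature.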

rhettinger · February 6, 2023, 4:41am

That is usually a safe bet.

But it is a distractor. Sure, if a function would benefit from a None default, then go ahead and do it. But don’t change the API simply because the signature objects aren’t sufficiently expressive. Instead, fix the actual problem (signature expressiveness) rather than being forced into a decision that we wouldn’t make otherwise. <Insert the “tail wagging the dog” idiom here.>

The PR for math.log was not made to fix an API defect or user need. It was done solely to force fit to the limitations of signature objects. Don’t lose sight of that essential fact.

gpshead · February 6, 2023, 4:47am

These signature changes don’t really have anything to do with static typing and type annotations. They have everything to do with the current implementation of Argument Clinic which we use to make the C code behind these APIs easier to maintain.

Argument Clinic can be improved to support more things to allow not exposing internal details when deemed inappropriate. As has been done a few times since it came about.

rhettinger · February 6, 2023, 6:00am

I listed many core functions and methods that can’t currently be accommodated by AC or Signature objects. The call to action is to fix AC and Signature objects. That has been an open todo for many, many years. AFAICT no one is working on it or has even thought about it.

In the meantime, people keep trying to force fit APIs into AC even when they don’t fit.

I think you missed the entire point of the post. The goal was to highlight the difference between the various DSLs that have been created to model Python. Except for argument clinic, the other DSLs have adapted to fit the language. With AC and Signatures, the opposite is occurring.

Possibly in the spirit of Monty Python’s Argument Clinic skit, everyone seems to be arguing here even when we likely all agree about the core facts:

Many essential functions and methods cannot currently be modeled by Signature objects.
That has been the case for a very long time.
No one is currently working to fix it, nor is there a plan to do so.
More and more tooling such as PyCharm depends on Signature objects.

As a Steering Council member, are you satisfied with this state of affairs? Do you disagree that it should be fixed? Is there already someone working on it?

skirpichev · February 6, 2023, 6:21am

Sorry for the stupid question, but why do you think that it doesn’t fit here? Is there any difference from an example like

In [2]: def foo(*coords):
   ...:     pass
   ...: 

In [3]: inspect.signature(foo)
Out[3]: <Signature (*coords)>

In [4]: _3.parameters['coords'].kind
Out[4]: <_ParameterKind.VAR_POSITIONAL: 2>

?

Edit:
Add support of multiple signatures · Issue #73536 · python/cpython · GitHub seems to be related.

This also makes sense outside the C world of the stdlib, e.g. for the multipledispatch package:

# with mrocklin/multipledispatch#114
from multipledispatch import dispatch

@dispatch()
def foo(x: int):
    return x + 1

@dispatch()
def foo(x: str, y: str):
    return "%s, %s" % (x, y)

I don’t entirely understand why we can’t return several Signature objects in this example with signature().
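To make the idea of several Signature objects per callable concrete, here is a minimal sketch built only from existing inspect primitives. The helper names, the RANGE_SIGNATURES list, and the first-match rule are all assumptions for illustration; nothing like this is returned by inspect.signature() today.

```python
from inspect import Parameter, Signature


def positional(*names):
    """Build a Signature whose parameters are all positional-only."""
    return Signature([Parameter(n, Parameter.POSITIONAL_ONLY) for n in names])


# Hypothetical "union" for range(stop) / range(start, stop[, step]).
RANGE_SIGNATURES = [
    positional("stop"),
    positional("start", "stop"),
    positional("start", "stop", "step"),
]


def bind_first_match(signatures, *args, **kwargs):
    """Return (signature, bound_arguments) for the first signature that fits."""
    for signature in signatures:
        try:
            return signature, signature.bind(*args, **kwargs)
        except TypeError:
            continue
    raise TypeError("arguments match none of the signatures")


for call_args in [(10,), (1, 10), (1, 10, 2)]:
    signature, bound = bind_first_match(RANGE_SIGNATURES, *call_args)
    print(signature, dict(bound.arguments))
```

A tool could walk such a list to display every calling pattern, much as the documentation lists range() twice.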

rhettinger · February 6, 2023, 6:41am

Can the ArgumentClinic generate that signature?

Our code is full of comments to the effect:

/* AC: cannot convert yet, waiting for *args support */
static PyObject *
builtin_min(PyObject *self, PyObject *args, PyObject *kwds)
...

It has been “waiting for *args support” for almost a decade with no progress.

skirpichev · February 6, 2023, 6:53am

Yes, see gh-101123: Add signature for the math.hypot by skirpichev · Pull Request #101124 · python/cpython · GitHub
The AC stuff (first commit) was reverted not due to missing capabilities of the AC or the inspect module, but rather to some performance penalty.

Yes. And your comment here was cryptic to me too. In the PR thread it was suggested that it’s about the performance issues mentioned above. But I’m not sure.

Edit: In fact, you can see a working signature in the last comment of the issue thread (help output). That’s doable with the AC or without it (and at no runtime cost). AC got *args support in PR#18609.

encukou · February 6, 2023, 9:18am

It could also be that there has been a minor wart in the API all these years. Minor enough that no one bothered to fix it, and that everyone could easily work around it. In that case (only!), it’s better to just fix the API, rather than teach the signature mechanism to express the inferior signature.

Of course, in some APIs the “force-fitting” argument definitely holds, and of course signatures should be improved. But in cases where accepting None is slightly better – for reasons Guido gave – why not add it?
And then we’ll get a more focused set of use cases for the signature improvements, which might lead to a better design.

pf_moore · February 6, 2023, 9:50am

Argument Clinic is a “convenience tool” for writing C functions, and should be used when the tool fits. If it doesn’t support a given signature, don’t use it (otherwise you’re committing the “when you have a hammer, everything looks like a nail” mistake). By all means extend Argument Clinic so that it handles more cases, but there’s no requirement to do so, just don’t use it if it doesn’t fit (and don’t argue that we should “make it fit” - see below).

Signature objects are different - they are an introspection API, and as such, should be available for as close to every callable as possible (raising ValueError when asked for a signature should be a last resort). However, it’s fine for a signature not to have annotations - typing is optional, after all. I’m not sure why the two are linked here - is Argument Clinic based on signatures? If so that’s an argument for why AC can’t be extended to support all APIs independently of changes to the Signature object, not an argument for refusing to allow certain APIs^[1].

The design of APIs is a third axis, though. The design of an API should be based on what’s easy to use, not what’s easy to implement^[2].

And that’s what ultimately triggered this debate - an API change justified by the limitations of an implementation choice, not on its own merits. I don’t have a strong view on whether the base argument of log should have an explicit default of None, but I do think that such a change needs to be argued based on use (which is what Guido did) and not on implementation details (which is what the original PR did).

[1] There’s a valid global argument that could be made to have a policy limiting the types of API we allow (which could be “if it’s not supported by the Signature object, we won’t use it”) - but we tend not to support such sweeping global changes, and to my knowledge no such policy exists currently.
[2] Although ease of implementation can be a tiebreaker, and “difficult to implement” can be a red flag.

steven.daprano · February 6, 2023, 11:04am

Raymond Hettinger:

The list.index() method … help now reads:
index(self, value, start=0, stop=9223372036854775807, /)
Yuck, how did these implementation details leak into the language? Likewise, str.split() got altered to use a -1 magic constant. Its help now reads:
split(self, /, sep=None, maxsplit=-1)
That’s a bummer because documentation modeling language formerly used in help() is clearer:
str.split([sep[, maxsplit]])

I think the maxsplit case is fine. -1 is a nice, obvious “magic value” that we can use if we need to explicitly force the default value.

But the index case is horrific. It offends my aesthetic sensibilities and makes me die a little bit inside every time I see it.

It would be a little bit better if it could be displayed as stop=2**63 - 1 but even that is too complicated to make a good “nice, obvious magic value” suitable as a default.

There are many cases where None makes an excellent “nice, obvious magic value” for defaults, but I’m not certain that logs are one.

I know this is subjective, but having base=None for the default just looks and feels weird to me. I could live with it, but if I were writing my own log function, I’d use a different magic value.

Oh look, I actually did


import math

def log_star(x, base=0):
    """log_star(x)

    Return the iterated logarithm log*(x) to some base.

    If the base is missing or 0, the natural log is used.
    """
    # FIXME: base must be > e**(1/e)
    if x <= 1:
        return 0
    elif base == 0:
        return 1 + log_star(math.log(x))
    else:
        return 1 + log_star(math.log(x, base), base)

I could live with a default of None for the base, but I think that using 0 as the magic value is nicer and less weird.

skirpichev · February 6, 2023, 12:00pm

IMHO, base=0 would shock a mathematician.

steve.dower · February 6, 2023, 12:14pm

I think a core point of Raymond’s motivation/frustration here is that as new people arrive to contribute, it’s not clear which description of a signature is canonical.

Those of us who were around before argument clinic was added know what the signatures were before then. And so we know that the only reason they “don’t fit” into AC is because it hasn’t been finished.

But anyone who only started looking after AC was widely implemented is going to have to figure out for themselves which signature is “correct”. And the tendency of developers is to move towards the most concrete unification system possible.

To put it into straw-man thoughts: “the square bracket notation isn’t concrete - it’s merely documentation - but default values in AC are definitive, therefore it must be the real representation that everything fits into and anything else is wrong and needs to be fixed”.

But ultimately, this doesn’t serve the user, or the reader of Python code. “You know what it means” is a perfectly fine approach, even if it ~~forces~~encourages the Python developer to use clearer variable/function names.^[1] I don’t think anyone would prefer having to remember to choose a specific function because the arguments couldn’t be sorted out behind the scenes.^[2]

But Raymond has a perfectly good call to action, or perhaps an announcement of opportunity. At the very least, a reminder to people newer to contributing that argument clinic and Signature objects are not complete, and rather than feeling like you have to force all the functions you find to fit within them, we actually want to loosen their constraints. And our guidelines for how “loose” they should be is our documented functions, rather than the implementation of those functions.

So if this is something you’re interested in working on, please go for it! There’s support for it, so don’t worry about getting pushback from the rest of the core team (maybe some, but others of us are on side and can do a lot of the arguing).

[Edit] And now I see this was triggered by a change from someone who has definitely been around a while. I think my argument still stands, and I’ve seen plenty of examples of it, though it’s clearly not the root cause of this case.

[1] min(ages_of_users) isn’t ever going to be mistaken for taking the minimum of a single value, for example.
[2] e.g. min_of_iterable(x) vs min(x, y, z). Or we could go Windows-style with minEx, or C style with min2 or imin. Or Python style, where you just know what it’s doing, unless someone is being deliberately obtuse, in which case you reject it in code review.

steven.daprano · February 6, 2023, 1:06pm

base=None would shock a mathematician even more, because None doesn’t even exist in any of the algebraic structures they work with.

There is already precedent in Python: int(string, base=0) exists, even though base 0 numerals are just as shocking as base 0 logs.
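For reference, this is how the base=0 sentinel behaves in int(): it tells the function to infer the base from the literal’s prefix.

```python
# With base=0, int() infers the base from the literal's prefix.
assert int("0x1f", base=0) == 31
assert int("0b101", base=0) == 5
assert int("0o17", base=0) == 15
assert int("42", base=0) == 42

# With an explicit base, the prefix is optional.
assert int("1f", base=16) == 31
```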

This is subjective and a matter of personal taste. I don’t hate using None as the default base for logs, but I think it is weird and 0 would fit better.

I can’t explain it in any objective terms, there are many defaults where I think None is perfectly fine. For example, in some of my own maths functions, I use defaults of None for:

def permutations_with_repetition(n, r=None): ...
def circular_permutations(n, r=None): ...
def chinese_remainder(*congruences, lo=None, hi=None): ...

GalaxySnail · February 6, 2023, 5:10pm

What about sys.maxsize? It seems to be better.

index(self, value, start=0, stop=sys.maxsize, /)

Another example is os.open, in documentation the signature is:

os.open(path, flags, mode=0o777, *, dir_fd=None)

but the help function displays:

>>> help(os.open)
Help on built-in function open in module nt:

open(path, flags, mode=511, *, dir_fd=None)
    ...

which is unintuitive.
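The same effect can be reproduced in pure Python, because a signature is rendered from the repr() of each default rather than from the source text. The function open_like below is a hypothetical stand-in that only mirrors the documented parameter list.

```python
import inspect


def open_like(path, flags, mode=0o777, *, dir_fd=None):
    """Hypothetical stand-in with the documented os.open() parameter list."""


print(inspect.signature(open_like))  # (path, flags, mode=511, *, dir_fd=None)
print(oct(511))                      # 0o777
```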

pablogsal · February 6, 2023, 6:31pm

If I am not mistaken, we added *args support some time ago:

We are still missing **kwargs support and support for signature unions.

Another, simpler possibility may be to add support just for overriding the signature text (so that it is not automatically generated). This may allow us to move forward and start improving the situation even if it doesn’t cover all we want.
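A rough Python-level analogue of overriding the generated signature already exists: inspect.signature() honors a __signature__ attribute set on a function. The sketch below uses a hypothetical function clamp; how Argument Clinic would accept a hand-written signature text for C functions is not specified in this thread.

```python
import inspect
from inspect import Parameter, Signature


def clamp(*args):
    """Hypothetical function that unpacks its arguments by hand."""
    value, low, high = args
    return max(low, min(value, high))


print(inspect.signature(clamp))          # auto-generated: (*args)

# Override the reported signature without touching the implementation.
clamp.__signature__ = Signature(
    [Parameter(n, Parameter.POSITIONAL_ONLY) for n in ("value", "low", "high")]
)
print(inspect.signature(clamp))          # now reports the three named parameters
```

The override is purely descriptive: it changes what introspection tools see, not how arguments are parsed.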

rhettinger · February 6, 2023, 9:45pm

Thanks for moving this forward. I hope the **kwargs gets completed soonish.

It was back in 2014 that Nick Coghlan added these comments throughout the code base:

/* AC: cannot convert yet, waiting for *args support */

The world has been waiting for this for a long time.

rhettinger · February 6, 2023, 10:01pm

Yes. Exactly this. We see this occur over and over.

Here is a concrete proposal to kick off the conversation about how to complete the Argument Clinic and stop having to live with an incomplete tool.

Given that our documentation DSL can already describe optional arguments getattr(object, name[, default]) and that the type annotation DSL can already describe the function using overloads, let’s modify arg clinic to be able to describe and generate code for a union of signatures:

/*[clinic input]
getattr as builtin_getattr2

    object: object
    name: str
    /

getattr as builtin_getattr3

    object: object
    name: str
    default: object
    /

Get a named attribute from an object; getattr(x, 'y') is equivalent to x.y.

When a default argument is given, it is returned when the attribute doesn't
exist; without it, an exception is raised in that case.
[clinic start generated code -- signature 1]*/

static PyObject *
builtin_getattr2(PyObject *module, PyObject *object, PyString_Object *str)
/*[clinic end generated code: output=b1b433b9e51356f5 input=bed4ca14e29c20d1]*/

[clinic start generated code -- signature 2]*/

static PyObject *
builtin_getattr3(PyObject *module, PyObject *object, PyString_Object *str, PyObject *default)
/*[clinic end generated code: output=b1b433b9e51356f5 input=bed4ca14e29c20d1]*/

Having a union of signatures would work super well for functions where None can’t be used, like getattr and dict.pop, and for cases where None is merely undesirable, such as type(object) vs type(name, bases, dict, **kwds) or range(stop) vs range(start, stop[, step]).
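A short sketch of why None cannot serve as the “argument omitted” marker for dict.pop: passing None and passing nothing are observably different calls. The dictionary contents are made up for illustration.

```python
settings = {"timeout": None}   # None is real data here

# With a default, pop() never raises; None is returned as an ordinary value.
assert settings.pop("retries", None) is None

# Without a default, a missing key is an error.
try:
    settings.pop("retries")
except KeyError:
    print("KeyError: no default was given")

# So pop(key) and pop(key, None) are different calls, and None cannot
# double as the "argument omitted" marker.
```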

skirpichev · February 7, 2023, 12:18am

Raymond Hettinger:

It was back in 2014 that Nick Coghlan added these comments through out the code base:
/* AC: cannot convert yet, waiting for *args support */
The world has been waiting for this for a long time.

Then does invalid signature for math.hypot · Issue #101123 · python/cpython · GitHub make sense to you? The inspect module can represent this Signature and AC can handle this function.

That’s the PR I was referring to above. @pablogsal, please take a look at the generated code for the *args-only function in PR#101124 (1st commit): I don’t understand why math_hypot_impl there has the PyCFunction type instead of _PyCFunctionFast. This introduces an extra slowdown for the converted function for no reason. I think it’s an AC bug.
Edit: OK, it seems there is Avoid temporary `varargs` tuple creation in argument passing · Issue #90370 · python/cpython · GitHub