I saw the other thread on math.log() and wanted to start a new one to shift the focus to what I
think is the underlying problem that needs to be solved.
If the playful storytelling style doesn’t fit your tastes, please try to look past the style and
focus on the substance of the post. I tried rewriting this a few times but found that the parallel
construction form of comparison and contrast best communicated where work needs to be done.
There once was a little scripting language called Python, and it became very popular because it met the user need to “fit in your head”. It had functions like range(), min(), max(), getattr(), type(),
math.log(), and methods like dict.pop(), str.split(), str.index(), and list.index(). People were
happy and there was much rejoicing.
A need arose to document this lovely language, but doing so involved creating a parallel language to describe it. Fortunately, there was a well known, well understood, and widely adopted notation involving square brackets for optional values. It accurately modeled the Python language:
range(stop)
range(start, stop[, step])
min(iterable[, key])
min(arg1, arg2, *args[, key])
getattr(object, name[, default])
math.log(x[, base])
dict.pop(key[, default])
str.index(sub[, start[, end]])
str.split([sep[, maxsplit]])
type(object)
type(name, bases, dict)
Occasionally, the documentation language had to list an entry twice to cover the union of two
calling patterns. Otherwise, there was peace and harmony throughout the land.
The Python language grammar was described by yet another language, EBNF. This was a standard, but it was tough for readers to follow and was an awkward fit. It also constrained Python in ways that got in the way of meeting user needs. Something had to give. Either Python had to change or EBNF had to be replaced by something more expressive. In the end, EBNF was replaced by PEG, allowing the language to grow more naturally and providing better readability for those wanting to understand the grammar. The world was in harmony once again.
A need arose for yet another parallel language, this time to describe type signatures. This ground had been previously explored in formal mathematics, in fully typed languages, and in gradually typed languages such as TypeScript. Following those leads, Python gained an annotation language. At times, the fit was uncomfortable, but each time it was the typed language that adapted rather than Python itself. It grew “|” to replace “Union” and “Self” to replace awkward type variable constructions. The challenging cases listed above were handled by way of a Union or by overloads. This was sufficient to annotate most of the Python ecosystem with the notable exception of recursive types such as JSON. The beautiful language itself did not change except to allow the optional notations to be written inline with the code they described. There was some grumbling, but mostly the world was in harmony and users were happy.
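As a minimal sketch of the overload approach mentioned above, the two calling patterns of range() can be annotated as a pair of overloads. The name my_range is a hypothetical stand-in invented for this illustration (it returns a list so the result is easy to check); it is not how the standard library or typeshed spells it.

```python
from typing import overload


# Two overloads describe the two documented calling patterns:
#   range(stop)
#   range(start, stop[, step])
@overload
def my_range(stop: int, /) -> list[int]: ...
@overload
def my_range(start: int, stop: int, step: int = 1, /) -> list[int]: ...


def my_range(*args: int) -> list[int]:
    """Hypothetical stand-in for range(); dispatches on argument count."""
    if len(args) == 1:
        start, stop, step = 0, args[0], 1
    elif len(args) in (2, 3):
        # Pad with the default step, then keep the first three values.
        start, stop, step = (*args, 1)[:3]
    else:
        raise TypeError(f"expected 1 to 3 arguments, got {len(args)}")
    return list(range(start, stop, step))
```

The annotation language adapts to the function here: the implementation keeps its original calling patterns, and the overloads describe them.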
Yet another parallel language arose. Signatures were designed to model the language in a way that supported runtime introspection, allowing tooling to become more powerful. Here the happy part of the story ends.
Signatures were only designed to describe the common and simple cases in Python. Work to complete the signature language to include a union of signatures was sadly left incomplete.
Some lucky parts of the API were marked as waiting for the signature design to be completed. Hence, str.index() has no signature.
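Whether a given callable has a signature can be checked at runtime. The sketch below uses inspect.signature(), which raises ValueError when no signature can be provided; describe_signature is a hypothetical helper name, and the result for any particular builtin such as str.index depends on the Python version in use.

```python
import inspect


def describe_signature(func):
    """Return the signature as a string, or None if introspection fails."""
    try:
        return str(inspect.signature(func))
    except (ValueError, TypeError):
        # ValueError: no signature available for this callable.
        # TypeError: the object type is not supported by inspect.signature().
        return None


def example(a, b=2, *rest):
    """Plain Python functions always have an introspectable signature."""


print(describe_signature(example))
print(describe_signature(str.index))  # may be None on some versions
```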
Other parts of the API were not so lucky and the tail began to wag the dog. The list.index() method had to change its API. Its help now reads:
index(self, value, start=0, stop=9223372036854775807, /)
Yuck, how did these implementation details leak into the language? Likewise, str.split() got altered to use a -1 magic constant. Its help now reads:
split(self, /, sep=None, maxsplit=-1)
That’s a bummer because the documentation modeling language formerly used in help() is clearer:
str.split([sep[, maxsplit]])
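For readers wondering where those two constants come from, here is a small sketch. The long integer is 2**63 - 1, which equals sys.maxsize on a 64-bit build, and passing -1 for maxsplit behaves the same as omitting it. The variable names are made up for this illustration.

```python
import sys

# The stop default shown in the list.index() help text.
LEAKED_STOP = 9223372036854775807

# It is the largest signed 64-bit value.
is_ssize_max = LEAKED_STOP == 2**63 - 1

# On a 64-bit build this also equals sys.maxsize; on a 32-bit build it would not.
matches_this_build = LEAKED_STOP == sys.maxsize

# The -1 magic constant for maxsplit means "no limit", same as omitting it.
words = "a b  c"
explicit = words.split(None, -1)
implicit = words.split()
```

Neither value means anything to a user of the method; both are placeholders standing in for "argument not given".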
Unlike the previous parallel languages, a curious pattern has emerged. Rather than build out the parallel signature language to accurately model Python, there are recurring efforts to change Python’s long-standing, sensible, battle-tested APIs just to accommodate the incomplete signature language.
This makes no sense to me. The signature language needs to be extended and completed. We should not make permanent ad-hoc API changes just to shoehorn our functions into an inadequately expressive modeling language.
If signatures can’t express something basic like *args, then tools like math.hypot(*coordinates)
should just have to wait.
In the case of math.log(x [, base]), I recommended that we leave the function alone. In MS Excel, the signature is the same as we have now. In other languages, the base argument is not nullable. In writing mathematics by hand or in LaTeX we don’t put a None or null in the base field. In the
two-decade history of this function, no user has ever wanted to put None in for the base argument. So, we should be honest with ourselves. The purpose of the proposed change was not to benefit users of the function or to better model mathematics. The sole reason for the proposed edit was to make it fit into an incomplete modeling language. Had the signature language been completed, no one would have ever suggested this API change. And for a mathematical function in particular, it is especially
nice to keep the inputs and outputs in the domain of numbers.
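To make the gap concrete, here is a rough sketch of how a pure-Python function could keep the log(x[, base]) calling pattern without a None default. The name log_like is hypothetical and this is not how math.log() is implemented. Note what introspection reports for it: the optional base argument is invisible.

```python
import inspect
import math


def log_like(x, *args):
    """Hypothetical pure-Python emulation of the log(x[, base]) pattern."""
    if len(args) > 1:
        raise TypeError(f"expected at most 2 arguments, got {1 + len(args)}")
    if args:
        (base,) = args
        return math.log(x, base)
    # No base supplied: natural logarithm, with no None involved anywhere.
    return math.log(x)


# The signature says nothing about an optional base.
reported = str(inspect.signature(log_like))
print(reported)
```

The inputs stay in the domain of numbers, but the reported signature is less informative than the bracket notation, which is the expressiveness gap described above.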
If someone would just focus on the task of completing the work on signature objects, we could return to the happy world of the modeling languages adapting to Python rather than vice-versa. All that is needed is *args support and signature unions. Otherwise, the functions listed above will never get signatures or they will become like the proverbial square pegs forced into round holes.