AC: `NULL` defaults prevent correct signatures. Let's add `inspect.unrepresentable` to fix this

Problem

While working on gh-103131 (Convert `sys.getsizeof` and `sys.set_asyncgen_hooks` to AC, pull request #103132 by sobolevn in python/cpython), I’ve noticed that NULL is a big problem among the defaults AC currently supports.

Right now, inspect.signature will fail for any function with NULL as the default. Let’s take builtins.iter (on 3.12) as an example:

>>> iter.__text_signature__
'($module, object, sentinel=<unrepresentable>, /)'

>>> import inspect
>>> inspect.signature(iter)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/sobolev/Desktop/cpython/Lib/inspect.py", line 3362, in signature
    return Signature.from_callable(obj, follow_wrapped=follow_wrapped,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sobolev/Desktop/cpython/Lib/inspect.py", line 3106, in from_callable
    return _signature_from_callable(obj, sigcls=cls,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sobolev/Desktop/cpython/Lib/inspect.py", line 2599, in _signature_from_callable
    return _signature_from_builtin(sigcls, obj,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sobolev/Desktop/cpython/Lib/inspect.py", line 2400, in _signature_from_builtin
    return _signature_fromstr(cls, func, s, skip_bound_arg)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sobolev/Desktop/cpython/Lib/inspect.py", line 2261, in _signature_fromstr
    raise ValueError("{!r} builtin has invalid signature".format(obj))
ValueError: <built-in function iter> builtin has invalid signature

This is how AC converts a NULL default into <unrepresentable> in __text_signature__:

/*[clinic input]
test_str_converter

    a: str = NULL
    /

[clinic start generated code]*/

PyDoc_STRVAR(test_str_converter__doc__,
"test_str_converter($module, a=<unrepresentable>)\n"
"--\n"
"\n");

Right now we have ~52 files with <unrepresentable> signatures.
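The count above refers to source files. As a rough runtime check, a sketch like the following lists the objects in a module whose __text_signature__ still contains <unrepresentable>. It only looks at module-level names (not methods of builtin types), find_unrepresentable is a made-up helper name, and the output depends on the Python version you run it on.

```python
import builtins
import types


def find_unrepresentable(module: types.ModuleType) -> list[tuple[str, str]]:
    """Hypothetical helper: (name, __text_signature__) pairs containing <unrepresentable>."""
    found = []
    for name in dir(module):
        obj = getattr(module, name)
        # Builtin functions and types expose the AC-generated text via __text_signature__;
        # most other objects do not have the attribute at all.
        text = getattr(obj, "__text_signature__", None)
        if isinstance(text, str) and "<unrepresentable>" in text:
            found.append((name, text))
    return found


for name, text in find_unrepresentable(builtins):
    print(f"{name}{text}")
```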

There’s also a user-reported issue about bytes.hex: inspect.signature(bytes.hex) raises ValueError "builtin has invalid signature" (python/cpython issue #87233).
And one about missing signatures in the builtins module: Help text of builtin functions – missing signatures (python/cpython issue #107526).

In typeshed we use ... to specify default values. For example, here’s how dir is defined: https://github.com/python/typeshed/blob/1c0500a57050815102c702efd053e09770a5ee88/stdlib/builtins.pyi#L1320

Proposed solution

I propose adding a special singleton value, inspect.unrepresentable, to be used instead. We can customize its __repr__ to be <unrepresentable>, ..., or whatever we prefer. Here’s what a hypothetical patch would look like:

diff --git Lib/inspect.py Lib/inspect.py
index c8211833dd0..64e1b8f0839 100644
--- Lib/inspect.py
+++ Lib/inspect.py
@@ -2238,10 +2238,29 @@ def _signature_strip_non_python_syntax(signature):
         add(string)
         if (string == ','):
             add(' ')
-    clean_signature = ''.join(text).strip().replace("\n", "")
+    clean_signature = ''.join(text).strip().replace("\n", "").replace(
+        # Handle `NULL` defaults:
+        "<unrepresentable>",
+        "__unrepresentable__",
+    )
     return clean_signature, self_parameter
 
 
+class _Unrepresentable:
+    _instance = None
+
+    def __new__(cls):
+        if cls._instance is not None:
+            return cls._instance
+        cls._instance = super().__new__(cls)
+        return cls._instance
+
+    def __repr__(self):
+        return "<unrepresentable>"
+
+unrepresentable = _Unrepresentable()
+
+
 def _signature_fromstr(cls, obj, s, skip_bound_arg=True):
     """Private helper to parse content of '__text_signature__'
     and return a Signature based on it.
@@ -2309,6 +2328,8 @@ def visit_Attribute(self, node):
         def visit_Name(self, node):
             if not isinstance(node.ctx, ast.Load):
                 raise ValueError()
+            if node.id == "__unrepresentable__":
+                return unrepresentable
             return wrap_value(node.id)
 
         def visit_BinOp(self, node):
@@ -2331,7 +2352,10 @@ def p(name_node, default_node, default=empty):
         if default_node and default_node is not _empty:
             try:
                 default_node = RewriteSymbolics().visit(default_node)
-                default = ast.literal_eval(default_node)
+                if default_node is unrepresentable:
+                    default = unrepresentable
+                else:
+                    default = ast.literal_eval(default_node)
             except ValueError:
                 raise ValueError("{!r} builtin has invalid signature".format(obj)) from None
         parameters.append(Parameter(name, kind, default=default, annotation=empty))

This will allow us to parse and inspect this signature:

>>> import inspect
>>> sig = inspect.signature(iter)
>>> sig.parameters
mappingproxy(OrderedDict({'object': <Parameter "object">, 'sentinel': <Parameter "sentinel=<unrepresentable>">}))
>>> sig.parameters['sentinel']
<Parameter "sentinel=<unrepresentable>">
>>> sig.parameters['sentinel'].default
<unrepresentable>
>>> sig.parameters['sentinel'].default is inspect.unrepresentable
True

Related:

If others agree, I can submit my patch + tests + docs.

CC @erlendaasland @storchaka


Thanks for taking this on; +1 from me. I would prefer ... instead of <unrepresentable>.


It would be nice if the issue were that simple, but it is not. iter(object) and iter(object, inspect.unrepresentable) are different calls. For now, the only special value for Parameter.default is Parameter.empty, so any code that builds args and kwargs by inspecting a signature will fail on unrepresentable.
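To make the objection concrete, here is a small pure-Python sketch. my_iter is a hypothetical stand-in for iter() whose "argument omitted" state is a private sentinel (playing the role of a C-level NULL), and the signature is built by hand as if parsed from "(obj, sentinel=<unrepresentable>, /)". The generic sig.bind() plus BoundArguments.apply_defaults() step, which tools commonly use, then passes the marker through and silently switches to the other form of the call.

```python
import inspect


class _Unrepresentable:
    """Hypothetical marker mirroring the proposed inspect.unrepresentable."""

    def __repr__(self):
        return "<unrepresentable>"


unrepresentable = _Unrepresentable()
_MISSING = object()  # private sentinel: plays the role of a NULL default in C


def my_iter(obj, sentinel=_MISSING, /):
    """Stand-in for iter(): behaves differently when sentinel is omitted."""
    return "one-argument form" if sentinel is _MISSING else "two-argument form"


P = inspect.Parameter
# The signature a parser would produce for "(obj, sentinel=<unrepresentable>, /)".
sig = inspect.Signature([
    P("obj", P.POSITIONAL_ONLY),
    P("sentinel", P.POSITIONAL_ONLY, default=unrepresentable),
])

bound = sig.bind([1, 2])
bound.apply_defaults()  # fills every omitted parameter whose default is not Parameter.empty
print(bound.args)  # ([1, 2], <unrepresentable>)
print(my_iter([1, 2]))  # one-argument form
print(my_iter(*bound.args, **bound.kwargs))  # two-argument form: the marker was passed through
```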


... is out:

>>> ... is Ellipsis
True

We can document that this is just our way of representing the unrepresentable.
Passing the _Unrepresentable instance would not work for calling these functions;
it would only work for inspecting their signatures.

I think that is better than the current state, where a NULL default means no signature at all.
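A quick check of why ... cannot double as the marker: ... is the ordinary Ellipsis object, so a pure-Python function can already use it as a real default that callers may also pass explicitly. takes_ellipsis is just an illustrative example.

```python
import inspect


def takes_ellipsis(x=...):
    """A pure-Python function whose real default value is Ellipsis."""
    return x


param = inspect.signature(takes_ellipsis).parameters["x"]
print(param.default is Ellipsis)  # True
print(takes_ellipsis(...) is takes_ellipsis())  # True: passing ... explicitly is the same call
```

If a NULL default were also rendered as ..., a tool could not tell "this default is Ellipsis" apart from "this argument must be omitted".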

I’ve done some research on generating signatures based on @storchaka’s feedback, and I now agree that we cannot make this the default. But adding an include_unrepresentable=False flag to the inspect.signature function seems like a simple solution that works.

It will:

  • Allow us to use inspect.signature(..., include_unrepresentable=True) everywhere internally (pydoc, help(), etc.), so that functions with NULL defaults work correctly for our use case
  • Not break anyone else’s code
  • Not complicate a potential future inspect.signatures implementation

Downsides:

  • We will leak what I consider internal tooling as a public API. Maybe name it _include_unrepresentable, to signal that this is a very unusual argument?
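A rough sketch of what the opt-in flag could look like, written as a standalone helper rather than a change to inspect.signature itself. signature_from_text and its include_unrepresentable keyword are hypothetical. Like the patch above, it swaps <unrepresentable> for a placeholder name before parsing, but it only handles literal defaults (not AC symbols such as sys.maxsize) and a single leading $module/$self parameter.

```python
import ast
import inspect
import re


class _Unrepresentable:
    """Hypothetical marker for a C-level NULL default."""

    def __repr__(self):
        return "<unrepresentable>"


unrepresentable = _Unrepresentable()
_PLACEHOLDER = "__unrepresentable__"
P = inspect.Parameter


def _make_param(name, kind, node, include_unrepresentable):
    if node is None:  # no default in the text signature
        return P(name, kind)
    if isinstance(node, ast.Name) and node.id == _PLACEHOLDER:
        if not include_unrepresentable:
            raise ValueError(f"parameter {name!r} has an unrepresentable default")
        return P(name, kind, default=unrepresentable)
    # Only literal defaults are handled; AC symbols like sys.maxsize are not.
    return P(name, kind, default=ast.literal_eval(node))


def signature_from_text(text, *, include_unrepresentable=False):
    """Hypothetical helper: build a Signature from an AC-style __text_signature__."""
    inner = text.strip()
    if not (inner.startswith("(") and inner.endswith(")")):
        raise ValueError(f"not a text signature: {text!r}")
    inner = inner[1:-1]
    inner = re.sub(r"^\s*\$\w+\s*(,\s*|$)", "", inner)  # drop "$module" / "$self"
    inner = re.sub(r"^/\s*(,\s*|$)", "", inner)  # a "/" left with nothing before it
    inner = inner.replace("<unrepresentable>", _PLACEHOLDER)
    args = ast.parse(f"def _f({inner}): pass").body[0].args

    positional = [(a, P.POSITIONAL_ONLY) for a in args.posonlyargs]
    positional += [(a, P.POSITIONAL_OR_KEYWORD) for a in args.args]
    # Defaults belong to the *last* len(args.defaults) positional parameters.
    pos_defaults = [None] * (len(positional) - len(args.defaults)) + args.defaults

    params = [
        _make_param(a.arg, kind, node, include_unrepresentable)
        for (a, kind), node in zip(positional, pos_defaults)
    ]
    if args.vararg:
        params.append(P(args.vararg.arg, P.VAR_POSITIONAL))
    # kw_defaults holds None for keyword-only parameters without a default.
    for a, node in zip(args.kwonlyargs, args.kw_defaults):
        params.append(_make_param(a.arg, P.KEYWORD_ONLY, node, include_unrepresentable))
    if args.kwarg:
        params.append(P(args.kwarg.arg, P.VAR_KEYWORD))
    return inspect.Signature(params)


print(signature_from_text("($module, object, sentinel=<unrepresentable>, /)",
                          include_unrepresentable=True))
# (object, sentinel=<unrepresentable>, /)
```

With the flag left at False, the helper keeps today's behaviour and raises ValueError, so existing callers are unaffected.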

See python/cpython issue #107782, Pydoc: fall back to __text_signature__ if inspect.signature() fails, which adds a workaround for this problem in pydoc.
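The fallback idea fits in a few lines. This is not the actual pydoc change from that issue; signature_text is a hypothetical helper that prefers inspect.signature() and falls back to the raw __text_signature__ string when inspect rejects it.

```python
import inspect


def signature_text(obj) -> str:
    """Best-effort signature string: inspect.signature() first, raw text signature second."""
    try:
        return str(inspect.signature(obj))
    except ValueError:
        # inspect.signature() rejects text signatures with <unrepresentable> defaults,
        # but the raw AC string is still readable for a human.
        text = getattr(obj, "__text_signature__", None)
        return text if text else "(...)"


print(signature_text(lambda a, b=2: None))  # (a, b=2)
print(signature_text(iter))  # depends on the Python version
```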

I believe that the correct general solution to this problem is to support multi-signatures. Instead of (object, sentinel=<something>, /) you will get a union of (object, /) and (object, sentinel, /). I looked at the corresponding inspect code, and it looks feasible. I think I can do it in the next few weeks or months. Maybe I’ll start today.
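A multi-signature can be approximated today with a plain list of inspect.Signature objects. This only illustrates the idea and is not the API being proposed; ITER_SIGNATURES and matching_signature are hypothetical names, and the helper simply returns the first signature a call binds to.

```python
import inspect

P = inspect.Parameter

# iter() described as two alternative signatures instead of one with a NULL default.
ITER_SIGNATURES = [
    inspect.Signature([P("object", P.POSITIONAL_ONLY)]),
    inspect.Signature([P("object", P.POSITIONAL_ONLY), P("sentinel", P.POSITIONAL_ONLY)]),
]


def matching_signature(signatures, *args, **kwargs):
    """Return the first signature that the given arguments can bind to."""
    for sig in signatures:
        try:
            sig.bind(*args, **kwargs)
        except TypeError:
            continue
        return sig
    raise TypeError("no signature matches the given arguments")


for sig in ITER_SIGNATURES:
    print(f"iter{sig}")
# iter(object, /)
# iter(object, sentinel, /)
print(matching_signature(ITER_SIGNATURES, [1, 2]))  # (object, /)
print(matching_signature(ITER_SIGNATURES, input, ""))  # (object, sentinel, /)
```

No parameter needs a fake default here: omitting sentinel simply selects the first alternative.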


See also the discussion:


Serhiy, thank you for the reference to Raymond’s Signatures thread. I always thought that marking optional parameters that have no default with square brackets in the docs is just fine, and perfectly understandable to humans. I think that replacing the brackets in the Built-in Functions chapter of the Library Reference with two signatures is a regression for human readers. For me, there is one signature: pass something, and maybe something else. At least that thread explains why the change was made, even though I think it was wrong. I hope that after you fix AC and inspect for tool use of signatures, pydoc can be changed to display the better form for humans.


I ran into this problem too because the tool we use to verify function signatures in typeshed relies on inspect.signature, which fails on these “<unrepresentable>” defaults. I proposed a hacky solution, but it would be better if the inspect module supported these signatures directly.

I would favor Nikita’s suggested solution of adding a marker like inspect.unrepresentable (I’d vote for exposing it as inspect.Parameter.unrepresentable, similar to .empty). @storchaka objects that this might break tools that assume something like iter(object, inspect.unrepresentable) would work. But it’s not unexpected for such tools that inspect signatures to have to adapt to new features in new Python versions (e.g., positional-only parameters). I help maintain several tools that rely on inspect.signature, and I’d much prefer if they could support unrepresentable defaults, even if that means I have to make some changes to support Python 3.13.
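For a tool, adapting could be a small change. The sketch below assumes a marker exposed the way suggested here, modelled as a local UNREPRESENTABLE object because inspect.Parameter.unrepresentable does not exist. bind_with_defaults is a hypothetical replacement for bind() plus apply_defaults() that leaves such parameters omitted instead of passing the marker.

```python
import inspect


class _Unrepresentable:
    """Stand-in for the suggested inspect.Parameter.unrepresentable (hypothetical)."""

    def __repr__(self):
        return "<unrepresentable>"


UNREPRESENTABLE = _Unrepresentable()
P = inspect.Parameter


def bind_with_defaults(sig, *args, **kwargs):
    """Like sig.bind() + apply_defaults(), but never passes an unrepresentable default."""
    bound = sig.bind(*args, **kwargs)
    for name, param in sig.parameters.items():
        if name in bound.arguments:
            continue
        if param.default is P.empty or param.default is UNREPRESENTABLE:
            continue  # leave the argument omitted, exactly as a caller would
        bound.arguments[name] = param.default
    return bound


# Hand-written signatures, as a parser with the proposed marker might produce them.
iter_sig = inspect.Signature([
    P("object", P.POSITIONAL_ONLY),
    P("sentinel", P.POSITIONAL_ONLY, default=UNREPRESENTABLE),
])
hex_sig = inspect.Signature([  # bytes.hex without its bound self
    P("sep", P.POSITIONAL_OR_KEYWORD, default=UNREPRESENTABLE),
    P("bytes_per_sep", P.POSITIONAL_OR_KEYWORD, default=1),
])

bound = bind_with_defaults(iter_sig, [1, 2, 3])
print(list(iter(*bound.args, **bound.kwargs)))  # [1, 2, 3]
print(bind_with_defaults(hex_sig).kwargs)  # {'bytes_per_sep': 1}
```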

There is an alternative suggestion to support “multi-signatures”. That would be a good solution for some functions like iter, but for other cases like bytes.hex, I don’t think that solution is more elegant. It also adds significantly more complexity.
