Type Signatures for extension modules (PEP draft)

This PR slightly relaxes the format requirements for text signatures in PyDoc_STRVAR and adds support for return annotations in __text_signature__. I am asking for review here, as the devguide suggests.

Perhaps the related issue, __text_signature__ in custom code · Issue #93865 · python/cpython · GitHub, could be discussed too. The __signature__ attribute was documented recently (see this thread). Apparently, some projects (see the PyO3 issues mentioned in the PR) are using the other attribute as well. Should we support that officially? If not, what prevents us from doing so? Clearly, more introspection capabilities would be useful for external CPython extensions, even in its current shape.
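For readers unfamiliar with the attribute, here is a minimal illustration of what it looks like on CPython today. It assumes a CPython build where `len` is an Argument Clinic builtin; `__text_signature__` is private, so the snippet reads it defensively with `getattr`.

```python
import inspect

# Private attribute: present on CPython builtins whose docstring starts with
# a signature line. It may be missing on other implementations.
text = getattr(len, "__text_signature__", None)
print(text)

# inspect.signature() is the public way to obtain the parsed result.
sig = inspect.signature(len)
print(sig)  # (obj, /)
```

The public `inspect.signature()` call is what tools such as help() rely on; the text form is only the transport behind it.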

Edit: It turns out that the discussion in this thread was less about the referenced PR and more about a new public interface that could describe function signatures in extension modules. Here is a concrete proposal (PEP draft) that uses the “signature line” from PEP 7: Draft PEP: Signatures for extension modules by skirpichev · Pull Request #2 · skirpichev/peps · GitHub. The discussion thread was retitled accordingly.

I’m not convinced we should add this:

  • __text_signature__ is undocumented. I see it as an internal implementation detail used to get information from a C structure to inspect.
  • If it is an implementation detail, we only support it in the standard library. There might be code using it, and we should try to not break that code unnecessarily, but without docs and tests it’s hard to support a feature.
  • Since third-party code shouldn’t really use __text_signature__, and we generally don’t have type annotations in the stdlib, I don’t think we should extend __text_signature__ to support annotations.

IMO, the proper way to go about this is to document and test __text_signature__ as public API – and do that either first, or at the same time as extending it.

Below are my arguments for why we might want to decide first on the fate of Support annotations in signature strings. · Issue #81677 · python/cpython · GitHub (which my PR addresses), and only then on __text_signature__ in custom code · Issue #93865 · python/cpython · GitHub.

Well, perhaps the reason for this is exactly the one the referenced PR is trying to solve :slight_smile: IIUC, there is no mechanism other than __text_signature__ that could add such introspection capability for C code. And, in turn, that mechanism lacks this feature… We have support for type annotations of arguments, but not for the returned value.

  1. I think the capability to add such annotations to C-coded functions would already be helpful in the stdlib. For instance, the math module probably started out as a set of math.h wrappers, but now some of its functions return integers, not floats. I doubt it’s easy to infer the return type from the arguments in most cases.
  2. Given the above, should we expose the current format of __text_signature__? It clearly lacks one basic feature.

If __text_signature__ was a new feature, IMO it would definitely need a PEP.
If we’re just exposing what already works for CPython, maybe not – but if we’re also changing it in the same release, I’d say writing a PEP is the best way to go.

The feature needs docs anyway, and the PEP can be written so that parts of it can be copied directly to the docs.

Ok. Rough idea: document the current state of art with a minor extension for return type annotations. (This will not address the more complex issue: Signatures, a call to action.)

One hard part of writing a PEP, as I’ve learned, is that it requires a sponsor. Will you sponsor this?

You’ll still need to think about the more complex issue, to make sure it can be solved in the future. Once __text_signature__ is documented and public, we shouldn’t make backwards-incompatible changes.

Yes. Please send me the draft before posting it.

I feel like I’m starting from the same premises as Petr but coming to the opposite conclusion.

__text_signature__ is a way we smuggle a signature object from argument clinic into inspect via auto-generated C source code, but the official way to do that is through __signature__.

If we need to smuggle more information through to __signature__ objects, we can just do it, because it’s an internal implementation detail.

If we want to make it public, we need to explain why the existing public supported method of doing exactly the same thing is insufficient, and argue for/against deprecating it. We don’t need multiple obvious ways to do the same thing if there’s no reason for it.

What you’re proposing a PEP for would be essentially an initializer for __signature__, so please frame it as a new feature. If we can safely reuse an internal name for a new public feature, great, we will. But it’s not “just” making a private thing public - the private thing was never thoroughly designed, which is why it was always private, so the PEP has to be the thorough design.

This particular one is also probably going to have to account for most type checkers not loading extension modules at all, so why do we need to provide better information for them in a way they don’t/can’t(?)/won’t(?) use? What’s the real problem that needs to be solved here? (Hint: CPython having private APIs isn’t the real problem :wink: )

Finding a sponsor for a decent idea shouldn’t be the hard part. It should only be hard for unpopular/ill-defined ideas (which is the point, otherwise the Steering Council would be bombarded with ill-thought out ideas regularly and “officially”).

The hard part is convincing people (like me) that we actually need to support a new feature here :wink:

Note that there are two private API bits in the current setup: the __text_signature__ attribute itself and the way the doc string should be formatted to get the right information into __text_signature__.

I use the latter in my own native extensions to get better introspection for native functions and methods, but technically this is a CPython private implementation detail.
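A rough pure-Python approximation of that docstring convention, as CPython currently applies it in C: the docstring must start with the function name followed by an opening parenthesis, and the signature must end with `)\n--\n\n` before any blank line. The name `split_docstring` is made up for illustration; this is a sketch of private behaviour, not a supported API.

```python
END_MARKER = ")\n--\n\n"


def split_docstring(name, doc):
    """Return (text_signature, remaining_doc).

    text_signature is None when the docstring has no signature line.
    """
    # For dotted names (e.g. classes), only the last component is used.
    short_name = name.rpartition(".")[2]
    if not doc.startswith(short_name + "("):
        return None, doc
    start = len(short_name)
    end = doc.find(END_MARKER, start)
    blank = doc.find("\n\n", start)
    # A blank line before the end marker means there is no signature line.
    if end == -1 or (blank != -1 and blank < end):
        return None, doc
    return doc[start:end + 1], doc[end + len(END_MARKER):]


doc = "sin($module, x, /)\n--\n\nReturn the sine of x."
print(split_docstring("sin", doc))
```

The first element corresponds to what ends up in `__text_signature__`, and the second to what is shown as `__doc__`.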

IIUC, “the official way” doesn’t work for C extensions. I.e. we can’t simply add this at runtime:

>>> import gmpy2
>>> gmpy2.sin.__signature__ = 123
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'builtin_function_or_method' object has no attribute '__signature__'

This attribute could be created using the C API, just like __text_signature__ is created from the docstring. But I would prefer that such parsing, the creation of Signature objects, etc. happen in the pure-Python code of the inspect module…

This sounds like an instance of the chicken-and-egg problem. There is no API (only a private one) to publish signatures for extension modules, hence the situation above.

I would like to use the current API (with minor changes, see the PR) in my extensions, for the same reasons as Ronald.

Sure, I assume that was the case:)

I think it’s worth considering new approaches.

For example, maybe there’s a way to add a module-level function (through a slot?) that is called to request __signature__ for a particular name? And we likely could do with some new APIs to construct a Signature object from native code (or maybe the constructor is simple enough that PyObject_Call is good enough). That way a module can store the info however it likes and constitute it into an object when requested. It’s much safer to standardise and later version an API like this than a text-based serialization format.

We can update builtin_function_or_method to allow setting a value for __signature__. Just because it can’t be done today from Python code doesn’t mean we can’t change it if that’s what is required. Or more likely we could make accessing __signature__ actually go off and call the module’s (hypothetical) “get_signature” function the first time and then cache it (though potentially it makes more sense to not cache it - I’m not sure what the access patterns look like for these objects).
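To make the idea concrete, here is a pure-Python sketch of such a lazy lookup. Everything in it is hypothetical: `get_signature`, `LazySignature` and `NativeFunction` are illustrative names, and `NativeFunction` merely stands in for builtin_function_or_method. No such hook exists in CPython today.

```python
import inspect
import math


def get_signature(name):
    """Hypothetical module-level hook: return a Signature for `name`."""
    if name == "sin":
        x = inspect.Parameter("x", inspect.Parameter.POSITIONAL_ONLY)
        return inspect.Signature([x], return_annotation=float)
    raise AttributeError(name)


class LazySignature:
    """Descriptor that asks the hook on first access, then caches the result."""

    def __init__(self, hook):
        self._hook = hook
        self._cache = {}

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        name = obj.__name__
        if name not in self._cache:
            self._cache[name] = self._hook(name)
        return self._cache[name]


class NativeFunction:
    """Pure-Python stand-in for builtin_function_or_method."""

    __signature__ = LazySignature(get_signature)

    def __init__(self, name, impl):
        self.__name__ = name
        self._impl = impl

    def __call__(self, *args):
        return self._impl(*args)


sin = NativeFunction("sin", math.sin)
sig = inspect.signature(sin)
print(sig)
```

Because `inspect.signature()` already honours `__signature__`, the module is free to store the underlying data however it likes; dropping the cache would only mean removing the dictionary lookup.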

As someone who maintained code to scrape extension modules and generate signatures that was shipped in a major product, there’s definitely a way for type checkers to do it today if they want :wink: What they’ve opted for is to look for a .pyi file, and to be honest, I agree with them. Writing a .pyi file is much easier than trying to expose the information directly from the method. And so perhaps another alternative is teaching inspect to look for a .pyi file for native modules and read from that instead?
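As a sketch of that last alternative, the function below reads a signature from stub source text with `ast`, without importing or executing anything. The helper name `signature_from_stub` is hypothetical, annotations are kept as plain strings, and defaults, `*args` and `**kwargs` are deliberately ignored to keep it short. It needs Python 3.9+ for `ast.unparse`.

```python
import ast
import inspect


def signature_from_stub(stub_source, name):
    """Build a Signature for function `name` from .pyi-style source text."""
    for node in ast.parse(stub_source).body:
        if isinstance(node, ast.FunctionDef) and node.name == name:
            break
    else:
        raise LookupError(name)

    P = inspect.Parameter

    def make(arg, kind):
        # Keep the annotation as source text; nothing is evaluated.
        ann = ast.unparse(arg.annotation) if arg.annotation else P.empty
        return P(arg.arg, kind, annotation=ann)

    params = [make(a, P.POSITIONAL_ONLY) for a in node.args.posonlyargs]
    params += [make(a, P.POSITIONAL_OR_KEYWORD) for a in node.args.args]
    params += [make(a, P.KEYWORD_ONLY) for a in node.args.kwonlyargs]
    if node.returns:
        ret = ast.unparse(node.returns)
    else:
        ret = inspect.Signature.empty
    return inspect.Signature(params, return_annotation=ret)


STUB = "def isqrt(n: int, /) -> int: ...\n"
sig = signature_from_stub(STUB, "isqrt")
print(sig.parameters["n"].annotation, sig.return_annotation)
```

Locating the right .pyi file for a given native module is the harder part and is not shown here.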

Either way, the real downsides of loading an extension module will still exist (executes arbitrary code being the main one; slow, and needing runtime dependencies configured are the others).

The fundamental question is: what scenario are we solving for? I’d argue that “__text_signature__ doesn’t work” and even “__signature__ doesn’t work” aren’t high level enough - “unable to see code completion information for native modules in Jupyter” or “unable to type check native modules in Mypy” are the kind of thing I’m thinking of.

Extending builtin_function_or_method – and making an extensible version of PyMethodDef – is in the cards. There’s a lot of fiddly details and backwards compatibility issues, but I think it can be done.

I’ll note that Max Bernstein has been working on exposing “the C signature”, which would allow passing unboxed values through the Python wrapper directly to the underlying C function. The work is for PyPy, but any optimizing interpreter will want this eventually. Currently he’s using a trick to smuggle extra data in PyMethodDef.
We might not want to add that right now, but we should think of it as a possible future improvement.

It’s probably worth mentioning here that PyO3 also formats doc-strings in a __text_signature__ compatible way. We’re well aware we’re relying on a private feature that just happens to meet our needs. We’d gladly take any improvements here.

Currently we do recommend that users wrapping Rust code using PyO3 write their own .pyi files. In combination with __text_signature__ and mypy.stubtest there is a crude way to “test” these are up-to-date.

I think the ideal in PyO3 would be to have an easy way to generate .pyi files from the extension module, probably as part of the development or packaging process. At the moment we’re looking at ways we can do that in our Rust layer, but if support for __signature__ becomes sufficient to do this we can drop whatever custom logic we come up with.

Ok, here is a draft in my fork of the peps repo: PEP 743: Signatures for extension modules by skirpichev · Pull Request #2 · skirpichev/peps · GitHub

This proposal instead suggests using the __signature__ attribute; __text_signature__ would be deprecated (or could we just remove it?). That is merely a cosmetic change, however. A more important difference is the use of a different end marker (\n\n vs --\n\n) to separate the “signature line” from the rest of the docstring. In short, this proposal formalizes the “signature line” notion from PEP 7.

@encukou, let me know if you would like to sponsor this PEP or be a coauthor. If you agree, should I finish the implementation first, before posting a PR against the peps repo?

In the proposed version, __signature__ will be implemented as a managed attribute that parses the text signature from the docstring and uses inspect._signature_fromstr() to construct the Signature object.
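A minimal pure-Python sketch of such a managed attribute follows. It is an illustration only: the `Wrapped` class and `signature_from_text` helper are made-up names, the docstring uses the `\n\n` end marker as described above, and the helper is a simplified stand-in for the private inspect._signature_fromstr() that handles plain Python parameter lists only (no `$module` or `$self` markers).

```python
import inspect
import math


def signature_from_text(text):
    """Stand-in parser: compile a throwaway function with this parameter list."""
    namespace = {}
    exec(f"def _stub{text}: ...", namespace)
    return inspect.signature(namespace["_stub"])


class Wrapped:
    """Callable whose __signature__ is computed from its docstring on access."""

    def __init__(self, func, doc):
        self._func = func
        self.__doc__ = doc

    def __call__(self, *args, **kwargs):
        return self._func(*args, **kwargs)

    @property
    def __signature__(self):
        # The signature line is the first line of the docstring: name(params)
        first_line = self.__doc__.split("\n", 1)[0]
        text = first_line[first_line.index("("):]
        return signature_from_text(text)


w = Wrapped(math.sin, "sin(x, /)\n\nReturn the sine of x.")
sig = inspect.signature(w)
print(sig)  # (x, /)
```

Since the attribute is computed on access, nothing is parsed unless a tool actually asks for the signature.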

That option is mentioned in the PEP as an alternative. The downside I see is that it mixes support for optional typing hints (generally, we don’t have type annotations in the stdlib) with the more basic support for introspecting signatures in extension modules, which is already used in the CPython stdlib, e.g. by the help() builtin.

This sounds to me more like an alternative to Argument Clinic, or a public variant of it.

I would appreciate your opinion on the proposed PEP.

Not in this current form, sorry. IMO, it does not “document the current state of art with a minor extension for return type annotations”:

  • It proposes a lot of changes (see deprecations in Backwards Compatibility)
  • It doesn’t document the behavior – it says that signatures are parsed by the (private, undocumented) inspect._signature_fromstr. So, the exact process of getting from parameter_list to a signature would still remain an implementation detail.

I can revert the PEP draft to the old version, which does exactly that, but first let’s see if the current one can be improved:

All changes are internal to CPython. Public interfaces aren’t affected. But it seems that some external projects already use the __text_signature__ attribute, so I think we may mention such changes and keep that interface for a while.

I can expand that part of the specification. Would the PEP make sense to you in that case?

Benefits of this version: no new special attributes, only __signature__, which is already documented. The real content is the format of the “signature line” of the docstring.

Edit: the draft was expanded to describe the proposed algorithm for _signature_fromstr() and to mention that the PEP shouldn’t break existing public interfaces.

It looks more and more like a bad idea: the PEP makes it clear that inspect is guessing as it tries its best to build __signature__ using limited information. By turning this into a standard, we’d lose a chance to improve that information.
We can make this PEP the best it can be, but I think it’ll get rejected.
Thanks for writing it down – it makes the situation clearer, which is one of the reasons for PEPs.

That sounds good, but might need to be tied to the function: builtin methods don’t always have a direct reference to their module. (The current mechanism retrieves the module from sys.modules by name, which works fine in practice, but I don’t think it’s something to standardize.)

It’s one more case for extending PyMethodDef, so the text signature doesn’t need to be smuggled in the docstring.

Asking people to re-implement the _signature_fromstr algorithm themselves might be a good thing, especially if we look for ways to make it easier for them. For example, inspect should take care of deriving bound method signatures from unbound ones.

Then we’d also want to teach Argument Clinic to write .pyi files. Or use them as input. Neither is a weekend project, but might be good in the long run.

Not sure what exactly you mean by “guessing” and “limited information”. The proposed source format for the signature in the docstring includes all relevant information, just like a pure-Python function/method definition. What’s missing?

Ok, perhaps it doesn’t make sense. I would still appreciate feedback from others, but meanwhile I’ll close my PR. Probably, Support annotations in signature strings. · Issue #81677 · python/cpython · GitHub could be closed as well.

Sorry to have been a bit late to reply to this. I do really want something like this for PyO3, I think it would go a long way towards improving introspection of native modules. At the moment we also use __text_signature__, mirroring Argument Clinic, for better or for worse.

I think @encukou is correct that continuing to ask inspect to analyse a text-based format is an insufficient halfway solution. I had hoped that it might be accepted as an incremental improvement (perfect is the enemy of the good, and all that), but there are good reasons why this existing mechanism is troublesome. One of the hardest jobs which I think inspect already has when analysing these __text_signature__ docstrings is figuring out how to turn the text representation of the docstring into real Python objects.

I think someone (maybe @iritkatriel?) was at one point suggesting a real Signature class for native functions, and probably this is the better solution for both Argument Clinic and PyO3. If the idea here has fallen short, I’ll hope we see that in the future.

FWIW, this may also be useful for Cython-generated extension classes (cc @scoder on this).

In PyArrow, we ended up splitting some C extension classes in two: a base C extension class with the required C-level (rather, C++-level) functionality; and a derived Python class that can hold proper docstrings and signatures. It makes the code less easy to maintain (here is an example: see how the class docstring and constructor signature are defined in the Python subclass) but it produces much better REPL and introspection for users.

Needless to say, we would welcome a solution that provides said introspection benefits without the annoyance of splitting classes in two.

I would appreciate it if you could point to a related discussion.

Do you think that the proposed PEP doesn’t make sense and will be rejected?

BTW, I’m trying to do a complete implementation, and it seems that the current “state of art” is even weirder: the existing, documented __signature__ attribute actually also supports a __text_signature__-like string format (see the issue Incorrect description of the __signature__ attribute in docs · Issue #115937 · python/cpython · GitHub).