Stubs and docstrings: official recommendations?

Let’s keep it short. In docs-oriented static analysis tools, should stub docstrings override source docstrings?

I would love to see some official recommendation about this in Writing and Maintaining Stub Files — typing documentation.

A user of Griffe uses a framework to build compiled modules. This framework adds __doc__ attributes to functions. These docstrings are made of a single line, the original signature of the function (in whatever language it was written in). The user then writes (or generates) stubs to change these docstrings into something more meaningful. For their use-case, it would be better to override the source docstring with the stub docstring.

But I can imagine another case, where the opposite happens. The compiled object gets a proper docstring, and the generated stubs get useless ones. In this case, users might want to preserve the source docstring and discard the stub docstrings.
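Both cases assume the tool can read the stub docstring in the first place. As a rough illustration (not how Griffe or any particular tool does it), a stub can be parsed with the standard `ast` module. The stub text and the function names `add` and `sub` below are invented for the example.

```python
import ast
from typing import Dict, Optional

# Hypothetical stub content; the functions here are invented for illustration.
STUB_SOURCE = '''
def add(a: int, b: int) -> int:
    """Return the sum of a and b."""

def sub(a: int, b: int) -> int: ...
'''


def stub_docstrings(stub_text: str) -> Dict[str, Optional[str]]:
    """Map each top-level function or class name to its docstring, or None."""
    tree = ast.parse(stub_text)
    docs: Dict[str, Optional[str]] = {}
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # ast.get_docstring returns None when the body has no docstring,
            # for example when the body is just `...`.
            docs[node.name] = ast.get_docstring(node)
    return docs


print(stub_docstrings(STUB_SOURCE))
```

A stub entry with no docstring shows up as `None`, so a tool can tell "the stub says nothing" apart from "the stub says something different".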

Obviously, each tool (the framework to compile objects, and the stubs generator) should be configurable, so that users can choose whether docstrings should be added or not in the source or in the stubs, making the question above moot. But in the meantime… what do you think :smile:? Maybe I should simply make Griffe configurable too? But what would be the default behavior?
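To make the configurability question concrete, here is a minimal sketch of what such an option could look like. The names `DocstringPolicy` and `pick_docstring` are hypothetical and are not an existing Griffe setting or API; the only assumption is that the tool already holds the source docstring and the stub docstring as optional strings.

```python
from enum import Enum
from typing import Optional


class DocstringPolicy(Enum):
    """Hypothetical option values, invented for this sketch."""

    PREFER_SOURCE = "prefer-source"
    PREFER_STUB = "prefer-stub"


def pick_docstring(
    source: Optional[str],
    stub: Optional[str],
    policy: DocstringPolicy,
) -> Optional[str]:
    """Choose between a source docstring and a stub docstring.

    The side preferred by the policy wins when both exist. If the preferred
    side is missing or empty, the other side is used instead of nothing.
    """
    if policy is DocstringPolicy.PREFER_STUB:
        return stub or source
    return source or stub


# The compiled-module case: the source only carries a one-line signature.
signature_only = "add(int a, int b) -> int"
better = "Return the sum of a and b."
print(pick_docstring(signature_only, better, DocstringPolicy.PREFER_STUB))
print(pick_docstring(signature_only, None, DocstringPolicy.PREFER_STUB))
```

Whichever default is chosen, falling back to the other side when the preferred docstring is absent avoids dropping documentation that exists somewhere.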

Just my 2p, but tl;dr: do not override - append.

Neither frameworks nor third-party stub publishers should be deleting or throwing away any original documentation. That’s the contract between the code authors and the users.
If a third party adds code, compiles code, or repackages it, whether by a tool or by hand, they should feel free to add documentation for that code (or what their tool has done with it).

But if they throw docs away, they should make their own fork, or a project with the original as a dependency, and deal with their own users’ expectations themselves instead of creating hot air and nuisance issues for the original project.

If it’s a choice between one tool’s auto-generated boilerplate vs another’s, then yes please include lots of configuration options, including the option to turn it off altogether.

I would argue that a stub file installed in a .venv or configured in a typings folder is an explicit indication that the user wants to make use of that stub.

Most if not all type checkers and LSP servers have an ordered list of search paths they use to locate modules (15 paths is quite normal in my experience).

To me it only makes sense to use the first hit in that list, rather than try to combine multiple, not just for performance but also to avoid confusion.
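A simplified sketch of that "first hit wins" rule follows. It is an illustration only: real type checkers and language servers also handle packages, stub-only distributions, and bundled stubs, none of which is modelled here. The function name `first_hit` is hypothetical, and trying `.pyi` before `.py` within one directory is an assumption of this sketch.

```python
from pathlib import Path
from typing import Iterable, Optional


def first_hit(module: str, search_paths: Iterable[Path]) -> Optional[Path]:
    """Return the first file found for `module`, scanning directories in order.

    Within one directory a stub (.pyi) is tried before the source (.py).
    Once a match is found, later directories are never consulted.
    """
    for directory in search_paths:
        for suffix in (".pyi", ".py"):
            candidate = directory / f"{module}{suffix}"
            if candidate.is_file():
                return candidate
    return None
```

With this rule, a docstring comes from exactly one file, so the result depends only on the order of the search paths and not on how many copies of the module exist.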

If appending docstrings, where should it stop? Two instances? All instances found?

For Griffe, you could add this as an explicit opt-in, but it would not make sense as a default for me, and it may expose me to unexpected changes based on changes in another package.

Thanks!

Yeah the best option here is to make Griffe configurable and retain the previous behavior as default, to maintain backward compatibility :slight_smile:

I would argue that, because stubs are for type information, most users would find it surprising for docstrings to change just because stubs are installed. Stubs are also often 3rd party, and only intended for type information. A docstring in a stub has no prescribed semantic meaning and might be used by those maintaining the stubs to document why something was typed a specific way, especially with protocols over concrete types in some cases.

Overriding library documentation or appending to it would be incorrect here.

Users would only be surprised if the docstrings got worse after they installed the stubs, and most will recognise that they did something that led to it. A good set of 3rd party stubs will have the same or better docstrings in an ecosystem where the tools are going to use them (and a tool that thinks “oh, this stub has no docstring, I’ll ignore the one in the actual sources” is a tool that has a bug).

Stubs are also very useful for extension modules, and since editors would much prefer to read a docstring from a stub than from an extension module, it seems entirely reasonable for them to use the stubs in preference to the actual module. That’s a single consistent rule, rather than having to make multiple rules for subtly different contexts.

I’ll quote the linked guide:

Consider the intended purpose of your stubs when deciding whether to include docstrings in your project’s stubs.

This sentence tells me stubs are not only for type information. This could be said more explicitly, maybe.

The linked guide has several things that make stubs worse than they ought to be, such as suggesting people intentionally include Any when the concrete types are known and expressible, just so that people don’t have to check for None (opening them up to more error-prone code).

I wouldn’t call the existing guide a good reference for stubs, and some of the guidance certainly doesn’t reflect anything actually agreed upon specification-wise.

There has been no process PEP for defining what a docstring in a stub is intended to mean, or how tools should resolve it. I would argue that including docstrings at all is a mistake that only opens projects up to undefined behavior, but it’s not forbidden. If you find a use for it, you should probably open up a proposal to have your use encoded as valid, so that it is not broken in the future.

I’d love to agree with this, and would be fully in favor of actually defining the rules here, but when someone is asking about official recommendations, I’m going to stick to the fact that there isn’t a defined meaning for docstrings in stubs currently. Everything here right now is tool-defined behavior.

I’d go a different route on what to propose:

By default, docstrings in stubs should be considered secondary, and only used when no docstring exists in the module the stub is for.

For extension modules like you’ve described, stubs work to provide docs; for 3rd party stubs, there won’t ever be stubs overriding things the original authors have provided intentionally.

This is against many many Python traditions, which I accept is the standard position for most typing fans, and it’ll probably pass, but I still think it’s wrong.

The user gets to override whatever they like, and installing stubs is how they override “things the original authors have provided”. Original authors don’t get to override what users choose to do with the code. (They can, of course, choose not to help those users, but they can’t prevent it.)

Okay, I actually agree with the user choice argument, but the problem I see with it is that docstrings that aren’t generated by extension module use tend to be out of sync in stubs, constantly, if present at all, and end up missing key information.

If docstrings were intentionally supported for stubs from the start, then it’s easy to make that argument and say any stub with a docstring needs to be as good as the original or better or the stub is doing a disservice to the user. But we’re sort of in a situation where docstrings are being pulled from stubs strictly because tools have chosen to currently, not because of any actual documented purpose.

This is a problem I have with a lot of tool-defined behavior scope creep here. If others don’t see that as a problem, or are okay with the impacts of saying “let’s define the behavior and prefer docs in stubs” (namely, that a stub that breaks docstrings after this can be blamed even though docstrings weren’t defined for use), then that’s also an okay outcome with me. It’s not my preference, because I know that there exist stubs out there that aren’t doing great things, and that the current stub guidance isn’t great to start with, but it would have been my preference to prefer docs from stubs had that been documented as intended or forbidden for use up until now.

Thanks @mikeshardmind. I think it’s a helpful reminder :slightly_smiling_face:

I will consider doing that, thanks for the suggestion!

That would only be incorrect if there were a single omnipotent implementation of the Python language; however, that is not actually the case.

Also see the section on docstrings in stub-only packages.
Writing and Maintaining Stub Files — typing documentation

In practice, there should never be docstrings simultaneously in source and stubs. And therefore, priority does not really matter. We could still define a priority, but arguments for one or the other can only be based on rather nuanced edge cases.

Stub docstrings should only be used if source docstrings are not available, e.g. for extension modules. I would consider “improving docstrings through stubs” an anti-pattern. Documentation should live in one place and ideally close to the code, to make keeping it up to date easier. So if somebody wants to improve documentation, they should contribute it directly to the source, not as a third-party stub.

Note: Third party stubs are justified because we don’t want to force every author to use typing. Not typing a library is acceptable, but I’d claim that not caring to document is a bad practice, for which we don’t have to define a third party workaround.

I think you mean “in theory”, because in practice, everything that can ever happen will happen. We have many millions of users, which means even 0.01% problems have hundreds of instances. Helping people decide whether their tool should prefer the source or the stub is a valid question in practice, even if in theory it doesn’t matter.

While this is true, it ignores additional challenges such as localisation. Trust me, you do not want seventeen different languages being stored in a single docstring :wink:

I agree that not documenting is bad practice, however, there’s a totally legitimate distinction between documentation shown on a web page vs. documentation shown in an IDE/editor vs. documentation shown when you do help(...).

The standard library optimises docstrings for help(...), and separates documentation into ReST files. Some tools have processed those latter files to be able to display more detailed documentation directly to users (and not just for Python, and not just for the stdlib), which is totally okay for them to do.

We also, by definition, can’t “define a third party workaround”. Those exist, those will continue to exist, and we can’t (and don’t want to) stop them. It’s called user choice. What we can do is encourage a bit of consistency between those workarounds, and when enough consistency exists, it gets even easier to formalise into something that tools and developers can both rely on - not that it’s hard to say “when docstrings exist both in stubs and source, you can use the stub’s if you think your users would prefer it”, which is all I’m suggesting here.

Maybe I’m misreading this, but I read the first part as saying we should give direction, whereas the second part suggests we should leave it up to the individual developers.

Each is a valid position, but we have to decide. That decision should be based on anticipated use cases. If we find strong cases for either preferring stub or code, that should make it to a recommendation to steer tooling and docstring authors into a consistent direction. If we don’t find these cases, we should leave priority undefined, so that the community has the freedom to do what is helpful if they find use cases.

IMHO localization and different kinds of documentation are not suitable use cases for stub docstrings. Instead, we’d have to develop dedicated concepts to improve in these areas. Therefore, they should not affect a decision on stub docstring priority.

I personally haven’t seen strong arguments where a priority would be helpful, so I’m inclined to not define it.

We can do both at once. For example: “we believe there may be valid reasons for custom type stubs to provide different/better docstrings than what are in the source of a library, and so tools are welcome to choose to display those instead of the source code”.

That provides enough direction for tools to choose to use stubs, rather than inventing their own convention, and for developers/authors to write into stubs, rather than having to learn and follow a tool’s convention, but it does not force either of them into doing what we say.

We don’t need a specification to let people just do things, or to hint at the kind of things we think will be okay.