Why treat docstrings as string literals?

jagerber · December 16, 2024, 4:19am

I’m following the thread “Please don't break invalid escape sequences”. The biggest downside I see to invalid escape sequences raising SyntaxErrors is that docstrings can now be broken and cause exceptions to be raised at runtime. This is because docstrings are simply multi-line string literals that appear as the first statement in a function or class declaration. I can imagine many users writing docstrings including backslashes and hitting this exception. They can easily get around it by making the docstring a raw string literal. In fact, it may become common practice to always make docstrings raw string literals.
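A minimal sketch of the situation described above, assuming a current Python where an invalid escape such as \d still only produces a warning (DeprecationWarning on older versions, SyntaxWarning on newer ones). The helper name compile_warning_names is made up for this illustration.

```python
import warnings

# Source text for two otherwise identical functions; only the docstring prefix differs.
# The doubled backslash below becomes a single backslash in the compiled source.
PLAIN_SRC = 'def f():\n    """Match a digit with \\d."""\n'
RAW_SRC = 'def f():\n    r"""Match a digit with \\d."""\n'


def compile_warning_names(src):
    """Compile src and return the names of the warning categories it triggers."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(src, "<example>", "exec")
    return [w.category.__name__ for w in caught]


print(compile_warning_names(PLAIN_SRC))  # at least one warning about "\d"
print(compile_warning_names(RAW_SRC))    # no warnings for the raw docstring
```

If the warning ever becomes a SyntaxError, the first compile() call would raise instead of warning, while the raw version would keep working.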

My questions:

What are the reasons to NOT make docstrings raw string literals?
Why is it useful to have docstrings be python string literals at all? Why not just raw strings?

The only argument I see for either of these is that the docstring is what is returned by the python help() function. But this is not very compelling to me since I never actively call help(). I do sometimes peek at docstrings using jupyterlab or pycharm functionality. I’m not sure if that uses the python runtime docstring or not.

I happen to work on a codebase that has done some crazy stuff modifying the __doc__ attribute of various python objects at runtime, including using string formatting etc. I understand that because the docstring is a python string literal it is possible to do this, however, I’ve always considered it a really bad practice and I don’t know how this usage affects different tools.

Not trying to be antagonistic or even propose any changes. Just curious to know about designed and/or actual usages. Thanks.

jamestwebber · December 16, 2024, 4:31am

I tend to agree–a lot of tools don’t execute the module just to provide documentation, and so runtime modifications of the docstring won’t show up. In the past I’ve tried to be clever with re-use of docstring templates but it never works out–there’s always some situation where it’s too much of a pain.

That said, I think this is going to be a historical discussion rather than something that can be changed. Any kind of string can be used as a docstring, and it can’t be changed now. We can’t retroactively redefine docstrings as raw, that would be very confusing syntax (“the quote rules are this, except for this one context when none of them matter”) and break the type of code you describe (even if it can already be prone to issues, it works now).

I don’t know how this is different from the first question. Are you asking why don’t people always use r""" for their docstrings? Likely they just didn’t think about it and it usually isn’t a problem.

jagerber · December 16, 2024, 4:34am

Yeah it’s not totally clear how this is different. I think I’m asking something like:

Why are docstrings python objects at all? Why can’t they just be bits of text in the source code? I guess this is more like what comments are?

jamestwebber · December 16, 2024, 4:38am

Yeah that’s what I was getting at by “historical discussion”. Sure, they could be comments instead, that’d be different. For examples, see Comments - The Rust Reference in Rust, and Docments – fastcore for a similar Python implementation, and a recent discussion about this.

But all of that stuff happened fairly recently, compared to 24 years ago when the convention of docstrings was formalized^[1].

[1] Maybe it was formalized even earlier than that, but that’s the PEP.

Rosuav · December 16, 2024, 4:40am

Not sure what you mean by “runtime docstring” but it’s all the same docstring, so, yes, you are using it.

jagerber · December 16, 2024, 4:46am

Would you agree or disagree with the statement: “(If we had a time machine) It would be better if docstrings were multiline comments rather than Python string literal objects?” From what I can tell I would agree with that statement, but I also obviously realize that ship sailed more than 20 years ago and there’s nothing we can do about it now. Either way, it’s still helpful now for me to make sure I have a solid understanding of the current state of things. For example it gives me more confidence/motivation/ammo to refactor the weird __doc__ manipulation going on in that package I mentioned.

Consider this “lovely” code:

def foo():
    """
    This is the original docstring
    """
    pass

help(foo)

foo.__doc__ = "This is a modified docstring"

help(foo)

which outputs something like

Help on function foo in module __main__:

foo()
    This is the original docstring

Help on function foo in module __main__:

foo()
    This is a modified docstring

Rosuav · December 16, 2024, 4:53am

Consider this equally “lovely” code:

def inner(*a, **kw):
    # no docstring here
    return func(*a, **kw)

def func(spam, ham):
    """Spaminate some ham"""

inner.__doc__ = func.__doc__

This (and a bunch else) is done by wrapping decorators all over the place. So if any tool DOESN’T respect the assigned docstring, it’s going to have problems. In this simple example, you could probably ignore the decorator and use the docstring from the undecorated and unwrapped function, but you can have a decorator that adds an argument and also appends to the docstring to explain that argument, and then you’d need to use the run-time information for reliability.

Sure, there are ways around this, but I fully expect that the “runtime docstring” (or as most people call it, “the docstring”) is recognized by basically all tools that have access to it.
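A small sketch of the decorator case described above: a wrapper that adds an argument and appends to the docstring to explain it. The names with_verbose and spaminate are hypothetical; functools.wraps is the standard library helper that copies __doc__ and friends onto the wrapper.

```python
import functools


def with_verbose(func):
    """Hypothetical decorator: adds a `verbose` keyword and documents it."""

    @functools.wraps(func)  # copies __name__, __doc__, etc. onto wrapper
    def wrapper(*args, verbose=False, **kwargs):
        if verbose:
            print(f"calling {func.__name__}")
        return func(*args, **kwargs)

    # Extend the copied docstring so it also describes the extra argument.
    wrapper.__doc__ = (func.__doc__ or "") + "\n\nverbose: print a message before calling."
    return wrapper


@with_verbose
def spaminate(spam, ham):
    """Spaminate some ham."""
    return spam + ham


print(spaminate.__doc__)
```

The text about verbose exists only at runtime, so a tool that reads just the source of spaminate would miss it.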

jagerber · December 16, 2024, 4:59am

I see. Wrapping functions is a good example.

For what it’s worth, on the example I gave above, Pycharm code inspection shows me the original docstring that it presumably found from static analysis of the source code. It does not give me the new docstring that results from running the script/module.

But that said, I’ve also had issues with Pycharm code inspection on wrapped functions. But it is hard to know what to expect here. For example, you could imagine injecting user input into docstrings at runtime. There’s of course no way any tool could predict what the docstring would be in that case.
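To make the two views concrete, here is a sketch that reads the same source once statically and once by running it. It assumes a static tool behaves roughly like ast.get_docstring; how PyCharm actually does it is not something this thread establishes.

```python
import ast

SOURCE = '''
def foo():
    """This is the original docstring"""

foo.__doc__ = "This is a modified docstring"
'''

# Static view: parse the source without running it.
func_node = ast.parse(SOURCE).body[0]
static_doc = ast.get_docstring(func_node)

# Runtime view: execute the source and read the attribute afterwards.
namespace = {}
exec(SOURCE, namespace)
runtime_doc = namespace["foo"].__doc__

print(static_doc)   # This is the original docstring
print(runtime_doc)  # This is a modified docstring
```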

jamestwebber · December 16, 2024, 5:07am

Maybe? If I had a time machine, fixing Python syntax would be very low on my list of priorities.

That said, I’ve come to really like Rust’s doc comment style, and I think that if Python were being designed today^[1] it might adopt something similar. The tools that work with docstrings aren’t necessarily written in Python and shouldn’t be tied to it that closely.

After playing around a bit, I think I remember what I was trying to do^[2] that didn’t work: with a class you can set the __doc__ with an f-string, which works as you expect at runtime, but an IDE (e.g. VS Code and, I believe, PyCharm) doesn’t execute the code so it doesn’t know this has happened. In general I don’t think IDEs would support that kind of docstring shenanigans because they don’t want to run your code just to pull out that info.

Again, not something I’m in the habit of doing, and it makes sense why it doesn’t work, but it’s the kind of thing that people might expect to work.
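A sketch of the class case, with made-up names (UNITS, Distance). Assigning __doc__ inside the class body is ordinary code, so an f-string works there at runtime:

```python
UNITS = "metres"


class Distance:
    # No string literal sits in the docstring position; this is a plain
    # assignment, so any expression (including an f-string) is allowed.
    __doc__ = f"A distance measured in {UNITS}."


print(Distance.__doc__)  # A distance measured in metres.
```

Because there is no literal in the docstring position, a tool that only reads the source has nothing to display unless it evaluates the assignment.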

[1] I think this is a more useful framing than “we have a time machine” because it acknowledges all the time we’ve spent getting better at this stuff.
[2] Years ago.

Rosuav · December 16, 2024, 5:17am

Ah okay. Using an f-string as a docstring is definitely on the weirder side of things, so I’m not surprised that some tools can’t handle them. I don’t think it’s actually even supported by the language, although I would have to delve deeper to be sure.

jagerber · December 16, 2024, 5:22am

def foo():
    """
    This is the original docstring
    """
    pass

help(foo)

foo.__doc__ = f"This is a modified f-docstring {42**2=}"

help(foo)

gives

Help on function foo in module __main__:

foo()
    This is the original docstring

Help on function foo in module __main__:

foo()
    This is a modified f-docstring 42**2=1764

Rosuav · December 16, 2024, 5:49am

As attribute modification, sure. But that’s not a docstring syntactically, that’s just setting the attribute. Obviously you can do anything you like that way.

But, I would generally expect that all tools that have access to the __doc__ attribute will use it. Others won’t, but you don’t have to worry about edge cases all the time - some tools operate under special restrictions (like “don’t run the code, just statically analyse it”) and accept the consequences (like “dynamic docstrings won’t be detected”).

petercordia · December 16, 2024, 7:26am

I’ve tried (unsuccessfully) to make stuff like this work too.
One use case was where I had a function factory, and it would have been both nice and intuitive if I could have put the function creation parameters in the docstring using f-strings. f-strings work everywhere else, so I found it confusing they didn’t work in this context.
Eventually I concluded it wasn’t viable.

jamestwebber · December 16, 2024, 4:04pm

For what it’s worth it isn’t supported in the normal context–if the string literal in the “docstring spot” is an f-string, it won’t be picked up and __doc__ will be None.

Of course you can manually set the attribute however you like, and if it winds up as a str then help() will work (other stuff is ignored, it seems).

I’d view this as a very minor inconsistency or ambiguity in the syntax, in that the glossary says a docstring is a string literal as the first expression, and f-strings are string literals. But f-strings are “special” literals in that they actually involve some code execution.

edit: this is in fact documented in the f-string section as an exception, and it’s reasonable that it works like this, but I still think the whole thing is a bit unintuitive.
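A sketch of the function factory case mentioned earlier in the thread, with hypothetical names (make_scalers and the two inner functions). The f-string in the docstring spot is just an expression statement, so __doc__ stays None; assigning the attribute explicitly is one way to get the intended text.

```python
def make_scalers(factor):
    def scale_fstring(x):
        # Looks like a docstring, but it is an expression that is evaluated
        # and thrown away on every call. It never reaches __doc__.
        f"""Multiply x by {factor}."""
        return x * factor

    def scale_assigned(x):
        return x * factor

    # Explicit assignment happens when the factory runs, so factor is known.
    scale_assigned.__doc__ = f"Multiply x by {factor}."
    return scale_fstring, scale_assigned


from_fstring, from_assignment = make_scalers(3)
print(from_fstring.__doc__)     # None
print(from_assignment.__doc__)  # Multiply x by 3.
```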

jagerber · December 18, 2024, 11:54pm

In the invalid escape sequences thread Umar brought up an important example in this post. There it is pointed out that IDEs respect escape characters appearing in docstrings. So this is a concrete example of a tool relying on the fact that docstrings are python objects, string literals.
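A small sketch of why the distinction matters to anything that displays __doc__. The function names are made up; the point is only that escapes are processed in a normal docstring and left as typed in a raw one.

```python
def plain():
    """First line.\n\nSecond paragraph."""


def raw():
    r"""First line.\n\nSecond paragraph."""


# The plain docstring contains real newline characters...
print(repr(plain.__doc__))
# ...while the raw one contains a backslash followed by the letter n.
print(repr(raw.__doc__))
```

So switching every docstring to a raw string would change what help() and similar tools show for any docstring that relies on escapes.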

tstefan · December 19, 2024, 11:53am

In a prior discussion, it was pointed out (and rightfully so) that f-strings are expressions that are disguised as literals. The following code is currently completely valid:

def abs(x):
    f'''Returns {(y:=x if x>0 else -x)}, the absolute value of {x}'''
    return y

The docstring, i.e., __doc__, is set when the function is defined. At that point in time, x is not known, so the f-string cannot be evaluated.
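A runnable version of the example above, renamed to abs_value here only to avoid shadowing the builtin (the rename is mine, and the walrus operator needs Python 3.8 or later). It shows that the f-string is evaluated on each call rather than stored as documentation:

```python
def abs_value(x):
    # Evaluated at call time as an ordinary expression; it binds y as a side effect.
    f'''Returns {(y := x if x > 0 else -x)}, the absolute value of {x}'''
    return y


print(abs_value.__doc__)  # None
print(abs_value(-3))      # 3
```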