WarningMessage is undocumented

csm10495 · March 22, 2024, 5:07pm

I respectfully disagree to both points. WarningMessage objects have a .source as well that isn’t included in showwarning()'s args. So the current documentation is missing that point (and is therefore wrong and misleading in my opinion).

We should be explicit where possible in documentation so that it’s simple for folks to read and understand. Adding links to get to the object itself (or a protocol of it) makes it easier. This particular example, as it is before this PR, is not simple/easy for at least me to understand as it originally was.

This change request, is attempting to document in a way to make the information more accessible for others.

steve.dower · March 22, 2024, 5:11pm

If the docs said “Each object in the list has exactly the attributes with the same names as the arguments to showwarning(), no more, no less.” then I’d agree, but that’s not what it says.^[1]

The implication is that you can’t rely on .source being available. It is effectively internal implementation, and so information about it should not be more accessible. If anything, we might need to make efforts to hide it better, if users are inclined to assume that it’s a guaranteed attribute simply because it’s there.

If there’s a reason it should be guaranteed API, then that’s your request, and what the discussion should be about.

And obviously we never write that sort of spec, because it’s very unidiomatic for Python. ↩︎

csm10495 · March 22, 2024, 5:23pm

It seems odd to me. So if it shouldn’t be documented, should it not be mentioned here in the docs for the C-API for exception handling?

From Exception Handling — Python 3.12.2 documentation

int PyErr_ResourceWarning(PyObject *source, Py_ssize_t stack_level, const char *format, ...)
Part of the Stable ABI since version 3.6.
Function similar to PyErr_WarnFormat(), but category is ResourceWarning and it passes source to warnings.WarningMessage.

To me its either document it in all documentation or don’t. I don’t think it makes sense to be considered a concrete thing in that doc, but not in warnings’ documentation. If it’s added we should link it there too. If not… shouldn’t it be removed from there?

Edit: I know it shouldn’t matter in this context, but the class is also called out and used by at least one big 3rd party package: pytest: How to capture warnings — pytest documentation

steve.dower · March 22, 2024, 5:34pm

Yeah, probably not. That documentation doesn’t make much sense - it doesn’t describe what the function does or when you should use it. It deserves updating.

And the fact that it takes source as a parameter does mean it ought to describe what it’s for. That doesn’t necessarily mean it has to be an explicit member of every warning, but it certainly ought to be explained better than the documentation you quoted shows.

(Surprising/unexplained things in documentation often means that documentation needs fixing, not that every other piece of code and documentation needs to be updated to match. One mistake in one place doesn’t always force us to double-down on the mistake.)

pf_moore · March 22, 2024, 6:01pm

I’m sorry, but I agree with @steve.dower here. It’s entirely acceptable for the return value of a function to contain discoverable, but undocumented and unsupported, aspects. That is fundamental to Python’s original design as a dynamic and duck-typed language.

Your request seems to be coming from a static typing perspective, which tends to have different conventions, being much more inclined to pin down interface details exactly. In this case, a protocol is exactly the right way to bridge the two worlds. Steve’s suggestion that the definition and protocol lives in the typeshed is how these things are normally handled, and I really don’t see why this case is special enough to need different treatment.

Having said that, I’ve never to my knowledge used warnings.catch_warnings, and I’m only an occasional user of static typing, so I don’t really care much except in a theoretical sense.

csm10495 · March 22, 2024, 11:22pm

It’s a bit more than a static typing perspective. The wording in the docs confused me (as a user). I guess I read it differently. It seems weird to me that we’re so protective of this type so not to document it.

People do help() / dir() on objects they get, search for the name, etc., then expect some form of easy documentation on it. That’s missing here for my case unfortunately.

Though really the conversation seems a bit like bikeshedding to me: Is there an actual worry here about documenting this specific type or is it more of an academic/theoretical discussion? Are we genuinely thinking that we could change/remove it without giving extra thought? What is the compelling reason besides a purist view of duck typing to not document this?

The class is already in use by pytest and probably others. Changing/removing pieces would break people, regardless of original intent at this point.

I believe we should document it. If we want it to be more flexible we can call it a protocol instead of an object in the docs, but to me it’s more/less the same idea at this point.

steve.dower · March 25, 2024, 3:22pm

I already (basically) said that my opposition is theoretical. I don’t care that much about this particular type, but I don’t want to set a precedent that all result types need to be strongly named, as I believe that is unpythonic (in many cases, such as this one).

If there’s a way to say “in general, result types follow a protocol, but this case was special enough to specify a named type”, and others are okay with that, go for it.

CAM-Gerlach · March 25, 2024, 11:36pm

I’d been carefully following along with this discussion, but didn’t have the time or energy to reply until now due to my cat’s health situation, sorry.

It seems the core implication here is that deliberately not explicitly documenting return types is a common pattern in the modern Python stdlib docs. However, I had trouble finding other examples of this, both from memory and spending a fair bit of time browsing for such—all of the many examples I found had such classes explicitly documented in the module using the standard Sphinx syntax. Could you point out some examples of common functions with similarly undocumented return types that follow this pattern? ^[1]

Overall, I can understand the theoretical motivation, but to me it seems purity on this point comes at the expense of the practical need for explicit, consistent, structured, and precise API reference documentation. And pragmatically, it is unclear to me what practical harm explicitly documenting the return type and its existing publicly-mentioned attributes (leaving source undocumented, at least for now ^[2]) would have when it comes to duck-typing compatibility—the only thing additional being “exposed” is the name of the class (which is already relied on regardless by pytest, typeshed/type checkers and other users). In turn, this would only make a difference if:

The old-named name was completely removed even as an alias, or the new class did not inherit from it, and
User code was, for whatever reason, relying on isinstance checks for that name or inheriting from it, and
Users would not have done so without seeing it explicitly documented in the documentation, as opposed to in typeshed/type checkers, via dir(warnings), type(warnings.catchwarnings()), etc.

ISTM that this seems a relatively narrow hypothetical scenario to be concerned about, particularly relative to the existing real-world usage (and breakage that would result anyway), the complexity of defining/documenting (and having existing downstream code and type annotations switch to using) a new Protocol/ABC, and the benefits of explicitly documenting the public attributes of the class using the standard, consistent structured mechanisms.

The only generally similar cases that immediately came to mind for me was file-like object and path-like object, but those do indeed have explicit class-based (ABC) definitions (the former under io, the latter as os.PathLike, in addition to linked glossary entries, as well as a strong, widely-used and practical motivation for defining them as protocols (as they are implemented by many different classes across the stdlib and third party packages, not to mention accepted by countless other APIs). ↩︎
properly documenting the public attributes but not source makes it more explicit, if anything, that source is an undocumented detail ↩︎

CAM-Gerlach · March 25, 2024, 11:45pm

Actually, after looking more detail at the actual implementation of the changes being proposed in the PR, I think I can better understand some of the concerns here, as it is very different from what I thought was being proposed. What I was suggesting here is explicitly documenting the class’ attributes, which are publicly referred to, but never explicitly described ^[1]). I was not necessarily suggesting documenting WarningMessage’s constructor or adding it to __all__, at least not as a first step (which is in fact what the linked PR does instead).

It is not uncommon in the stdlib for the canonical way to create a specific class be with one or more higher-level functions other than its constructor, and the actual underlying constructor be deliberately left undocumented as a lower-level implementation detail (which the newely documented class could explicitly mention, if desired). Furthermore, in the few other similar cases I looked at like that, the class was accordingly not added to __all__. Additionally, that would avoid explicitly requiring that we document source (which is not currently referenced in the docs, and thus not “public” in that sense) in order to document the class; that could be either explicitly left undocumented as an implementation detail, or that decision deferred to a separate discussion/PR if necessary.

the docs reference the parameters to showwarning, which is already quite “indirect” and requires multiple layers of navigation and translation, increasing cognitive load—but they are left mostly implicit there too ↩︎

csm10495 · March 25, 2024, 11:56pm

I’m definitely flexible on both source and __all__. Honestly I just wanted a search for that class WarningMessage to find something in the docs.

If how to do that was a bit more clear, and it has consensus, I’m happy to update the PR.

steve.dower · March 25, 2024, 11:57pm

I skimmed the results for searching “returns a” in the Doc/library directory and found that Event Loop — Python 3.12.2 documentation has a pretty good spread of examples. Certainly some inconsistencies as well!

But one that caught my eye where I appreciated the language was:

Returns a pair of (transport, protocol) , where transport conforms to the asyncio.SubprocessTransport base class and protocol is an object instantiated by the protocol_factory .

I like “conforms to” as opposed to the much more strict “inherits from” or “implements”, because conforming to an actual type is how you implement a protocol. The “object instantiated by the protocol_factory” makes more sense in context - protocol_factory is one of the other arguments - but given there’s no requirement that your protocol object inherit from a particular named type, how would you express this as a name?

(Apologies for the slightly awkward wording - I’m trying to use “named type” carefully, because that’s the particular concern I have. The result will always have a type, and a type(result), but that type does not have to have a particular name. It can simply be a value that is able to be used for whatever the caller intends. Again, this is a very powerful aspect of Python that gets lost when you force your program to fit into a nominal (name-based) typing system, as opposed to a structural (duck-typed) one.)

ericvsmith · March 26, 2024, 7:16am

Here’s an example: dataclasses.Field objects are documented as existing, but not how to construct one. A minimum list of attributes is given, with a note that others may exist but are not to be relied on. See https://docs.python.org/3.8/library/dataclasses.html#dataclasses.Field

CAM-Gerlach · March 26, 2024, 7:36am

Thanks—that’s actually a pretty good example of what I’m proposing here. As mentioned, I’ve definitely seen plenty of examples of the class being documented but not its constructor, or certain attributes not considered public, and that’s actually very close to what I’m suggesting as a starting point, just with attributes documented as such rather than as with generic unstructured bullets (unlike in that case where its at least arguably justifiable, it wouldn’t actually duplicate other documentation elsewhere at least not in much meaningful detail).

csm10495 · March 26, 2024, 3:32pm

Ah this makes sense. Basically I can remove it from __all__, remove the source reference, change the description from initializer to documented attributes, then add a note that others may exist but should be considered private, etc, and a note that they shouldn’t be created directly.

With those changes in mind, would this be ok to move back to the PR phase? (I’d like to be mostly squared away and approved at least in theory before going back to the PR)

CAM-Gerlach · March 26, 2024, 8:16pm

That’s what I’m proposing, yes, which is similar to how analogous return types are handled elsewhere in the docs as far as I’ve seen, outside of the cases where a number of (often rapidly evolving) types conform to the same base protocol (e.g. file-like, path-like and asyncio) or may be arbitrary types (e.g. user-supplied callbacks).

I realized I didn’t directly address this earlier, but in fact, a here a protocol is the only plausible choice for either tuple members, because the fundamental design and purpose of this method necessarily means that both methods return arbitrary classes conforming to the respective protocols, dynamically determined by the event loop class and the user-supplied protocol factory, respectively.

However, I do note that this situation involves:

A set of methods called on an object that itself conforms to a protocol rather than having one specific name
With dynamically determined, arbitrary classes
That could be built-in, third party or user-created
That implement well-established protocols already used many other places in asyncio and third party code
And that involve a large amount of change and evolution from version to version and in third party libraries

It seems to me that while an ideal example of a situation where using a protocol has great value (like file-like objects and path-like objects I mentioned above), it doesn’t at all resemble the present case, where we have:

A single module-level function
That returns a specific defined, concrete type
That conforms to no existing, established protocol
And has been unchanged for the past ≈15 years
And a number of existing users of the concrete type

And in any case, both of the former protocols have their stdlib concrete implementations publicly documented.

Its worth pointing out that of the 8 hits for your returns a in that file, besides the 4 for (transport, protocol), the other 4 are all (4 unique) concrete named classes, and all four are—just like this case—only really intended to be returned by other functions and not constructed directly, lack a documented constructor while documenting their attributes and methods, and contain notes to the effect that they should not normally be instantiated directly by user code—exactly as I’m proposing here. I do note that they were also added to __all__, but that can always wait for a possible followup and a bit broader survey of existing practices if others have concerns (since so far that seems a bit of a mixed bag as far as existing convention goes).

steve.dower · March 26, 2024, 8:50pm

I don’t think this is a fair list.

Multiple functions have already been referenced in regards to the result type
“Specific defined” (to me, at least) implies publicly defined, which is not the case or we wouldn’t be having this discussion. And our policy is that if it’s not publicly defined, it’s an implementation detail, which is deliberate because it allows us to change things in future releases. The fact of it being a single possible type now does not mean it will always be.
The protocol is defined and established in the documentation (“attributes match the arguments to …”), and by the other functions that (indirectly) interact with it.
Granted, but we don’t have an expiry date on API design. If you could show that there have been no critical logging-related issues in that time, I might agree that it’s very unlikely anything may arise, but log4j begs to differ
Existing users of an undocumented type influence our decision to make changes to it, not our decision to document it.

It’s easy to cherry-pick reasons why something fits the shape you want it to fit. We both just did it. My advantage is that I’m appealing to our established and largely unwritten policies, which I know about because I have the advantage of having been hanging around here longer. You can want to change them, which is fine, but not by stealth or by quietly setting new precedents - to change them, post a proposal in Core Development (presumably along the lines of “we currently default to not documenting return types unless the types are independently useful, but I think we should make all results named and documented types/protocols”) and expect it to get to the Steering Council for a decision.