PEP 727 and 746 alternative: a better way to add metadata

Hello Python world!

I recently got interested in adding documentation and metadata to things like function arguments so that a UI can digest them automatically. Right now I rely on a mixture of docstring parsing and type annotations, but this often feels clunky and limited, partly because docstring formats are ill-defined (e.g. there is no easy, consistent way to attach metadata about how the UI should handle a certain parameter unless I cook up something myself). Looking for alternatives, I discovered the discussions on PEP 727 and PEP 746, neither of which I particularly liked.

I believe I can suggest a better, more powerful, AND more Pythonic alternative, and I wanted to hear what the community thinks.

class A:
    my_int: int
        """This is my favorite int"""

    my_other_int: int
        __min__=100
        __doc__="""An int with additional metadata"""
        # comments can be placed as normal
        ui_categories=["all-important-ints", "not-so-important"]

def my_func(
    a: A
        """An instance of A to wrangle"""
    x: int = 1
        """Effects are guaranteed to be surprising!"""
        tags=["sometimes-deterministic"]
) -> None:
    """This function will do important work - someday."""
    pass

I present to you: indentation :slight_smile:
From my point of view, this solves several problems with the other approaches:

  1. Not overly wordy. Putting Annotated and brackets everywhere just to add docstrings where they arguably already belong seems clumsy and is not easy on the eyes.

  2. Clearly associates the metadata with the member/argument. As opposed to PEP 224 (attribute docstrings), there is no risk of confusing what the metadata belongs to. My understanding is that that was the reason for rejecting said PEP.

  3. Pythonic. Indentation is one of the core pillars of Python for associating one structural element with another. It is already used for function and class docstrings AND all their contents, and it seems logical to extend it to attributes.

  4. Metadata is associated with the member/argument themselves, and not their type hints. I only saw a few comments on that in the linked PEP discussions, but I find it important to make a distinction here. I want to document the member/argument, not its type hint.

  5. Easy to expand. For example, special attributes like __min__ could be defined for numeric types and then digested in a standardized way. In addition, type hints and default values could be handled in the same way, although that might be API-breaking.

  6. Tooling friendly (?)
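
For comparison, here is a rough sketch of the Annotated-based style that point 1 refers to, written in today's Python. `Doc` and `Min` are hypothetical marker classes defined only for this example (they are not taken from any PEP text), and `param_metadata` is an illustrative helper.

```python
from dataclasses import dataclass
from typing import Annotated, get_type_hints


@dataclass(frozen=True)
class Doc:
    """Hypothetical marker that carries a documentation string."""
    text: str


@dataclass(frozen=True)
class Min:
    """Hypothetical marker that carries a lower bound."""
    value: int


def my_func(
    x: Annotated[int, Doc("Effects are guaranteed to be surprising!"), Min(100)] = 1,
) -> None:
    """This function will do important work - someday."""


def param_metadata(func):
    """Collect the Annotated extras of every parameter of func."""
    hints = get_type_hints(func, include_extras=True)
    return {
        name: getattr(hint, "__metadata__", ())
        for name, hint in hints.items()
        if name != "return"
    }


metadata = param_metadata(my_func)
```

The metadata is reachable, but only through the type hint, which is the coupling that point 4 objects to.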

Regarding the implementation, I imagine that members/arguments would have dedicated fields for standardized metadata (e.g. __doc__ or __min__), possibly encapsulated in a __metadata__ dict. The first triple-quoted string is assigned to the __doc__ entry; all other entries have to be defined as keyword arguments. Assigning explicitly to __metadata__ would also be possible, but I see no benefit in that, only visual clutter.
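
To make the idea a bit more concrete, here is a minimal sketch of what such a __metadata__ dict could look like to a consumer, emulated with a decorator because the proposed syntax does not exist. `with_metadata` and `violates_min` are hypothetical helpers invented for this illustration, and storing everything in one dict on the owning class is just one possible layout.

```python
def with_metadata(**per_member):
    """Hypothetical decorator: attach a per-member metadata dict to an object."""
    def decorate(obj):
        obj.__metadata__ = per_member
        return obj
    return decorate


@with_metadata(
    my_int={"__doc__": "This is my favorite int"},
    my_other_int={
        "__doc__": "An int with additional metadata",
        "__min__": 100,
        "ui_categories": ["all-important-ints", "not-so-important"],
    },
)
class A:
    my_int: int
    my_other_int: int


def violates_min(owner, name, value):
    """Return True if value is below the member's __min__ entry, if it has one."""
    minimum = owner.__metadata__.get(name, {}).get("__min__")
    return minimum is not None and value < minimum


too_small = violates_min(A, "my_other_int", 50)
unconstrained = violates_min(A, "my_int", 50)
```

A UI or validator would only need to know the standardized keys (__doc__, __min__) and could ignore or pass through everything else.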

What do you think?
:snake:

A deal-breaker is that this would require new syntax constructs.

Currently most (all?) blocks start with a colon. Not to mention, whitespace inside a parameter list is supposed to be insignificant.
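
Both points can be checked against the current grammar with the standard library. The sketch below is only an illustration: the two source strings are made-up test inputs, and `indent_count` and `parses` are hypothetical helper names.

```python
import ast
import io
import tokenize

# Valid today: the odd indentation inside the parentheses is ignored.
current = "def f(\n    a: int,\n        b: int = 1,\n):\n    pass\n"

# The proposed style: an indented string after the parameter, no comma.
proposed = 'def f(\n    a: int\n        """An int"""\n):\n    pass\n'


def indent_count(source):
    """Count the INDENT tokens the tokenizer emits for source."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return sum(1 for tok in tokens if tok.type == tokenize.INDENT)


def parses(source):
    """Return True if source is accepted by the current parser."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


body_indents = indent_count(current)   # only the function body is indented
current_ok = parses(current)
proposed_ok = parses(proposed)
```

The only INDENT token in `current` comes from the function body, so any indentation-based syntax inside a parameter list would need the tokenizer to start tracking indentation inside brackets.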

I like it, but I don’t like what happened with the commas.
Is this valid:

def my_func(
    a: A
        """An instance of A to wrangle"""
    x: int = 1):

but this not

def my_func(
    a: A
    x: int = 1):

?
This is ugly because there’s a comma between two things that are strongly associated:

def my_func(
    a: A,
        """An instance of A to wrangle"""
    x: int = 1):

This is also ugly, but better imo:

def my_func(
    a: A
        """An instance of A to wrangle"""
    , x: int = 1):

I’m not sure how you would solve these issues.

Maybe

def my_func(
    a: A,
        """An instance of A to wrangle""";
    x: int = 1):

to more strongly separate arguments from preceding docstrings? That’s also a bit ugly.

Or we start using colons, as mentioned by @InSync:

class A:
    my_int: int:
        """This is my favorite int"""

    my_other_int: int:
        """An int with additional metadata"""
        __min__ = 100
        ui_categories = ["all-important-ints", "not-so-important"]

def my_func(
    a: A:
        """An instance of A to wrangle"""
    x: int = 1:
        """Effects are guaranteed to be surprising!"""
        tags=["sometimes-deterministic"]
) -> None:
    """This function will do important work - someday."""
    pass

If there’s a colon, the block body must not be empty. When a parameter has a block, the comma is forbidden: it’s either a colon or a comma, not both.

But yeah, that requires changes to the syntax, which is much, much harder to achieve than something like PEP 727.

I like the proposal anyway! It feels Pythonic indeed :slightly_smiling_face:

Hmm, that wouldn’t work for returned values though?

def hello() -> str:
    """Function docstring. How do I provide one for the returned value??"""

Yeah, the commas are a bit of an issue. I realized it when I finished the post, but thought I’d wait to see what solutions would come up :zipper_mouth_face:

Keeping the commas would be nice for backwards consistency, but it’s easy to argue that a comma isn’t necessary when you have line breaks.

Adding colons could also work nicely and be consistent with other block definitions, but in this case I’d suggest a slightly different approach to avoid double colons as in a: int: ...:

def my_func(
    a:
        """Argument a"""
        __type__=int
        __default__=0
        tags=[...]
) -> None:
    pass

This would be an alternative way of defining type hints and defaults. Whether or not this would require a comma for additional arguments is up for discussion.

Yeah, return values would be the outlier. I think this was also discussed in the PEP 746 proposal, and I believe the solution was to keep the return value documentation as part of the function docstring. Since the return value will generally be assigned to a variable, which can in turn be given its own metadata, I think this isn’t too alien.
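
For reference, the Annotated-based style can already attach metadata to a return value in today's Python, since the return annotation is just another type hint. This is only a sketch: `Doc` is a hypothetical marker class defined for the example, and the documentation text is made up.

```python
from dataclasses import dataclass
from typing import Annotated, get_type_hints


@dataclass(frozen=True)
class Doc:
    """Hypothetical marker that carries a documentation string."""
    text: str


def hello() -> Annotated[str, Doc("The greeting that was produced")]:
    """Function docstring."""
    return "hello"


# The return annotation is stored under the "return" key of the type hints.
return_hint = get_type_hints(hello, include_extras=True)["return"]
return_docs = [m.text for m in return_hint.__metadata__ if isinstance(m, Doc)]
```

An indentation-based syntax has no equally obvious place for this, which is why return values end up as the outlier.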

I would like to add more support for this, because it could help annotate dataclasses in a natural manner, which is currently a weakness of Python type annotations, and which PEP 727 and PEP 746 don’t address either.

Something like this would allow annotations like

@dataclass
class MyClass:
    x = field(default_factory=list)
        __init_type__ = list[str] | str
        __attr_type__ = list[str]
        __default__ = []

where currently the best annotation I can manage (for the purposes of Pylance) is x: list[str] | str = ... (literally ..., because Pylance apparently can’t resolve the field default).
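
One workaround that exists today, sketched here under the assumption that a differently named constructor argument is acceptable, is to split the two types with `dataclasses.InitVar` and normalize in `__post_init__`. The name `x_init` is made up for this example, and this is not equivalent to the syntax proposed above: callers have to pass `x_init` instead of `x`.

```python
from dataclasses import InitVar, dataclass, field
from typing import Optional, Union


@dataclass
class MyClass:
    # Accepted by __init__ only; never stored as an attribute.
    x_init: InitVar[Optional[Union[list[str], str]]] = None
    # The stored attribute always has the narrower type.
    x: list[str] = field(init=False, default_factory=list)

    def __post_init__(self, x_init):
        # Normalize a single string or a list into a list of strings.
        if isinstance(x_init, str):
            self.x = [x_init]
        elif x_init is not None:
            self.x = list(x_init)


single = MyClass("a")
several = MyClass(["a", "b"])
empty = MyClass()
```

This keeps the init type and the attribute type separate, at the cost of two names and a hand-written conversion, which is the boilerplate that per-member metadata like __init_type__ and __attr_type__ would aim to remove.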