PEP 813 - The Pretty Print Protocol

Honestly, this seems like an unnecessary footgun. I think it would be better to prohibit non-tuple values and instead require None as the name when providing a positional argument. I think this falls under “explicit is better than implicit” and “refuse the temptation to guess”. It’s a slight downgrade in ergonomics, with the benefit of far more reliable semantics.

To be more precise about what I mean by footgun: code that passes a Generic value as an argument might not be tested with every possible argument type. If a user then happens to pass a 2-tuple whose first element is a string, the output suddenly changes dramatically for no obvious reason.

Also, what happens if the yielded tuple isn’t a 2/3-tuple? Does it raise? Does it assume it’s a positional argument?

Also, regarding the default argument: the PEP currently says “if it’s equal to”. Does this mean a is b or a == b? Or the combination of both, as used for sequence membership tests?
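
To make the question concrete: identity and equality can disagree (NaN is not equal to itself, and two equal lists are distinct objects), while list membership combines both checks. The helper matches_default below is hypothetical, not something the PEP defines; it just shows what the combined test would accept:

```python
def matches_default(value, default):
    """Hypothetical helper: the identity-then-equality test that `in` uses."""
    return value is default or value == default


nan = float("nan")
print(nan == nan)                 # False: NaN never equals itself
print(matches_default(nan, nan))  # True: the identity check short-circuits
print(nan in [nan])               # True: list membership uses the same combined test
print([] is [])                   # False: equal lists are still distinct objects
print(matches_default([], []))    # True: the equality check catches them
```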

I have mixed feelings about this.

On one hand, convenient pretty printing is something that I would really like.

On the other hand, what pprint does is rarely satisfactory for nested structures. I use pprint when I need concise output for inspection, but never for production output. For that, I find JSON with indentation a much cleaner option.

If this binds pretty printing to pprint, I would have little use for it.

Something that has both JSON-like indentation and pprint-like wrapping would be closer to what I would like to see take the honourable place in builtins and special methods.

A powerful and flexible approach would be to adopt Jupyter’s _repr_mimebundle_ standard, which returns a dictionary of representations keyed by MIME type. For example:

{
    "text/plain": "Some string repr",
    "text/html": "<div>Some HTML</div>",
    "image/png": ...
    # Additional custom MIME types as needed
}

This approach would have several benefits:

  • This approach is built on widely adopted Web standards (MIME types), ensuring compatibility and interoperability.
  • The frontend can dynamically select the most appropriate representation for the context - whether it’s a text console, a web notebook, or another environment.
  • Custom MIME types can be added to support specialized formats or future needs, making the system adaptable and future-proof.
  • Many Python libraries already use this pattern, and it is natively supported by Jupyter and other IDEs. Some libraries use older methods (e.g., _repr_html_, _repr_latex_), which IPython uses to compose a MIME bundle like the one returned by _repr_mimebundle_.
  • Beyond Python, this standard has been embraced by other language kernels.
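
A small sketch of how that could look. The _repr_mimebundle_(self, include=None, exclude=None) signature follows IPython’s rich display convention; the Temperature class and the pick_representation function (standing in for a frontend’s selection logic) are hypothetical:

```python
class Temperature:
    """Example object offering several representations keyed by MIME type."""

    def __init__(self, celsius):
        self.celsius = celsius

    def _repr_mimebundle_(self, include=None, exclude=None):
        # include/exclude are optional filters a frontend may pass; ignored here.
        return {
            "text/plain": f"Temperature({self.celsius!r})",
            "text/html": f"<b>{self.celsius} &deg;C</b>",
        }


def pick_representation(obj, supported):
    """Hypothetical frontend logic: use the first MIME type it can display."""
    bundle = obj._repr_mimebundle_()
    for mime in supported:
        if mime in bundle:
            return bundle[mime]
    return repr(obj)  # nothing usable: fall back to the plain repr


t = Temperature(21.5)
print(pick_representation(t, ["text/html", "text/plain"]))  # a notebook
print(pick_representation(t, ["text/plain"]))               # a text console
```

The object describes itself once; each environment decides which representation fits.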

The obvious solution is to treat the 2-tuple case, where the argument name is the empty string or None as positional. Thus, in Rich:

This wouldn’t solve the case where you want to show ("", (1, 2)) itself: it’s still a pair whose elements may themselves be tuples. The same goes for (None, 8). The same issue happens with the triplet with defaults: ("a", 1, 2) can’t be distinguished between a=1 (with default 2) and a positional argument that is the tuple ("a", 1, 2).
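
A minimal sketch of the ambiguity, assuming a renderer that follows the draft rules discussed in this thread (single value = positional, 2-tuple = keyword, 3-tuple = keyword with default). render_args is a hypothetical helper, not pprint’s actual implementation:

```python
def render_args(name, items):
    """Render items following the draft rules (a sketch, not pprint)."""
    parts = []
    for item in items:
        if isinstance(item, tuple) and len(item) == 3:
            key, value, default = item
            if value == default:
                continue  # a keyword equal to its default is skipped
            parts.append(f"{key}={value!r}")
        elif isinstance(item, tuple) and len(item) == 2:
            key, value = item
            parts.append(f"{key}={value!r}")
        else:
            parts.append(repr(item))  # anything else is a positional value
    return f"{name}({', '.join(parts)})"


# A positional argument that happens to be a 3-tuple is reinterpreted...
print(render_args("Box", [("a", 1, 2)]))  # Box(a=1)
# ...and one whose last two elements are equal vanishes entirely.
print(render_args("Box", [("a", 1, 1)]))  # Box()
```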

I would rather suggest that we always return a tuple (which is annoying but unambiguous) of the form (name, value) or (name, value, default), with name possibly being None.
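
A sketch of that rule, assuming every item must be a 2- or 3-tuple and a None name marks a positional argument. render_args is hypothetical, and raising TypeError for anything else is one possible answer to the “does it raise?” question, not something the PEP specifies:

```python
def render_args(name, items):
    """Sketch of the suggested rule: always a tuple, name None = positional."""
    parts = []
    for item in items:
        if not isinstance(item, tuple) or len(item) not in (2, 3):
            raise TypeError(f"expected a 2- or 3-tuple, got {item!r}")
        key, value, *rest = item
        if rest and value == rest[0]:
            continue  # a keyword equal to its default is skipped
        parts.append(repr(value) if key is None else f"{key}={value!r}")
    return f"{name}({', '.join(parts)})"


# The tuple ("a", 1, 2) passed as a positional value is now unambiguous:
print(render_args("Box", [(None, ("a", 1, 2)), ("size", 3, 0)]))
# Box(('a', 1, 2), size=3)
```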

To me, “pretty printing” really is essentially a debugging device, even more so than print().

And for me, pretty printing is more than just a debugging device. It can really be used to render whatever you want in a pretty format, as a richer alternative to __repr__, which is more for eval/exec purposes. While the stated purpose of pprint is “[to render] arbitrary Python data structures in a form which can be used as input to the interpreter” (from the pprint docs), I think we should also consider that pretty printing may be more than that. This is where I would rather be able to use something other than __repr__: __repr__ can be used for debugging, while __pprint__ can be used for nice rendering. If you believe it’s the other way around, however, I won’t object to __pprint__ not returning a string.

That’s pretty much just any callable taking a single argument. I don’t think we need both, so I believe that’s the difference between something like:

Yes. But I don’t think we should restrict ourselves to always requiring an instance of a pretty-printer. Otherwise the user needs to create a class with a pformat method just to hold a callable. If it’s only for debugging, I would really want to be able to write print(x, pretty=prettyfn), where prettyfn is a regular function.
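
A sketch of how a pretty= argument could accept both forms. pretty= on print() is the PEP’s proposal; resolve_pretty and prettyfn are hypothetical names, while pprint.pformat and pprint.PrettyPrinter.pformat exist today:

```python
import pprint


def resolve_pretty(pretty):
    """Hypothetical: turn a pretty= argument into a callable returning a str."""
    if pretty is True:
        return pprint.pformat            # default pretty printer
    if hasattr(pretty, "pformat"):       # a printer object, e.g. pprint.PrettyPrinter
        return pretty.pformat
    if callable(pretty):                 # a plain function is enough
        return pretty
    raise TypeError(f"unsupported pretty= value: {pretty!r}")


def prettyfn(obj):
    return f"<<{obj!r}>>"


print(resolve_pretty(prettyfn)([1, 2]))                          # <<[1, 2]>>
print(resolve_pretty(pprint.PrettyPrinter(width=20))({"a": 1}))  # {'a': 1}
```

Checking for a pformat attribute before callable() keeps printer objects working while letting a bare function through.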


Maybe it wasn’t seen, but what about recursive pprint? How can we say “please also pprint the inner value itself” (or not)?

I suppose it doesn’t have to be (name, value) for a positional argument. It could be (value,).
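
A sketch of that variant, assuming every item is a tuple and its length alone gives its meaning: (value,) is positional, (name, value) is a keyword, (name, value, default) is a keyword with a default. render_args is a hypothetical helper:

```python
def render_args(name, items):
    """Sketch: every item is a tuple, and its length alone gives its meaning."""
    parts = []
    for item in items:
        if not isinstance(item, tuple):
            raise TypeError(f"expected a tuple, got {item!r}")
        if len(item) == 1:                      # (value,): positional
            parts.append(repr(item[0]))
        elif len(item) == 2:                    # (name, value): keyword
            parts.append(f"{item[0]}={item[1]!r}")
        elif len(item) == 3:                    # (name, value, default)
            if item[1] != item[2]:
                parts.append(f"{item[0]}={item[1]!r}")
        else:
            raise ValueError(f"unexpected {len(item)}-tuple: {item!r}")
    return f"{name}({', '.join(parts)})"


print(render_args("Pair", [(("x", 1),), ("y", 2), ("z", 0, 0)]))
# Pair(('x', 1), y=2)
```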

The !p format specifier for f-strings is one of the strongest parts of this PEP for me. I would be very disappointed if it was moved to the deferred ideas section.

I am, however, happy for the t-string format specifier to be deferred.

Was the protocol __getnewargs_ex__ considered? It already exists and returns an (args, kwargs) tuple.

This would make the language easier to remember. It also conveniently allows you to define:

__pformat__ = __getnewargs_ex__.

It can also ignore default parameters (just don’t return them).
Or, if Rich’s protocol is considered superior, then maybe pickle/copy could also use it.
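
A sketch of how the existing (args, kwargs) pair could be mapped onto the PEP’s yielded values. __getnewargs_ex__ is the real pickle protocol method; the Point class and the items_from_newargs adapter are hypothetical:

```python
class Point:
    def __init__(self, x, y, *, label=None):
        self.x, self.y, self.label = x, y, label

    def __getnewargs_ex__(self):
        # Existing pickle protocol: (args, kwargs) for reconstructing the object.
        return (self.x, self.y), {"label": self.label}


def items_from_newargs(obj):
    """Hypothetical adapter: (args, kwargs) -> the PEP's yielded values."""
    args, kwargs = obj.__getnewargs_ex__()
    yield from args            # bare values are positional arguments
    yield from kwargs.items()  # (name, value) 2-tuples are keyword arguments


print(list(items_from_newargs(Point(1, 2, label="origin"))))
# [1, 2, ('label', 'origin')]
```

Note that the adapter inherits the tuple ambiguity discussed above: a positional argument that is itself a 2-tuple would be read as a keyword.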

!p is a conversion specifier, and for f-strings I’m not considering deferring it (although I am considering deferring it for t-strings). Format specifiers are things like “<10s”. It’s format specifiers for f-strings that I will probably defer.
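
To illustrate the distinction with conversions that exist today (!p itself is only proposed, so it is left out): a conversion such as !r is applied first, and any format specifier after the colon is then applied to the converted string.

```python
value = "hi"

print(f"{value!r}")     # conversion only:  'hi'
print(f"{value:>6}")    # format spec only: right-aligned hi in 6 columns
print(f"{value!r:>6}")  # conversion, then spec: 'hi' right-aligned in 6 columns
```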

Classes can implement a new dunder method, __pprint__() which if present, […]

What are the downsides of putting a __pprint__ on object that does the default thing?

The possible pro I’m thinking of is that nobody will ever have to worry about checking for AttributeError when accessing __pprint__
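
A sketch of the two lookup styles. Since object itself can’t be modified here, the hypothetical PrettyDefault base class stands in for a default on object; Point and both helper functions are also hypothetical:

```python
class PrettyDefault:
    """Hypothetical base class standing in for a default __pprint__ on object."""

    def __pprint__(self):
        return iter(())  # default: no structured arguments


class Point(PrettyDefault):
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __pprint__(self):
        yield self.x
        yield self.y


def items_without_default(obj):
    # Without a default, every caller needs a fallback for a missing method.
    method = getattr(type(obj), "__pprint__", None)
    return [] if method is None else list(method(obj))


def items_with_default(obj):
    # With a default in the hierarchy, the lookup can be unconditional.
    return list(type(obj).__pprint__(obj))
```

One trade-off the sketch makes visible: once a default exists, a printer can no longer use the method’s presence to tell whether a class opted in.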

I like this idea, especially that passing pretty=True to print or using !p could pretty-format/print without any need for an import.

However, this part might make this something that I wouldn’t really use very often:

This also means that there’s no way to control the pretty printed format of built-in types like strings, dicts, lists, etc.

I really don’t like the format that pprint uses with [/] and {/} and the initial/final values all on the same line and only a one space indentation for data structures. I find it visually challenging to parse compared to the formatting that Black, ruff, and many other tools tend to use (with brackets/braces on their own line and 4 spaces of indentation and a trailing comma after the final value).
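
For comparison, a sketch of the layout described above: one item per line, closing bracket on its own line, 4-space indentation, and trailing commas. black_style is a hypothetical helper that only handles dicts and lists and, unlike pprint, always expands non-empty containers instead of wrapping only when a line gets too long:

```python
def black_style(obj, indent=4, level=0):
    """Hypothetical formatter: one item per line, trailing commas, 4-space indent."""
    pad = " " * (indent * (level + 1))  # indentation for the items
    close_pad = " " * (indent * level)  # indentation for the closing bracket
    if isinstance(obj, dict) and obj:
        lines = [f"{pad}{key!r}: {black_style(value, indent, level + 1)},"
                 for key, value in obj.items()]
        return "{\n" + "\n".join(lines) + "\n" + close_pad + "}"
    if isinstance(obj, list) and obj:
        lines = [f"{pad}{black_style(item, indent, level + 1)}," for item in obj]
        return "[\n" + "\n".join(lines) + "\n" + close_pad + "]"
    return repr(obj)  # scalars and empty containers stay on one line


print(black_style({"a": [1, 2], "b": {}}))
```

json.dumps(obj, indent=4) produces a similar shape, minus the trailing commas and limited to JSON types.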

I appreciate that scope creep is undesirable, but I would find this a much more desirable feature if either the pretty-printing format changed to the most common Python code format seen in the wild now, or the format were easily customizable for the built-in types. I’d personally prefer the former, but the latter might be more appropriate if maintaining the historical pretty-print format is important.

The biggest problem with this idea of __pprint__ is that prettiness is subjective. If we think of this as “structured print” instead, we can get around most of the problems associated with that subjectivity. The printer gathers the required structural bits and then applies indentation, newlines, etc. as necessary to meet the prettiness standards of the specific use case. Treating the proposal as structured print also allows us to use the same mechanism for custom string serializations for DSLs, beyond generating human-readable data.
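
A sketch of the “structured print” idea: the object only describes its structure, and two different printers apply their own layout. It assumes the (name, value) convention with None for positional arguments suggested earlier in the thread, not the PEP’s current format; Config, parts, compact_printer, and indented_printer are hypothetical, and nested values are not recursed into:

```python
class Config:
    def __init__(self, host, port, debug=False):
        self.host, self.port, self.debug = host, port, debug

    def __pprint__(self):
        # Structure only: (name, value) pairs, with name None = positional.
        yield None, self.host
        yield "port", self.port
        yield "debug", self.debug


def parts(obj):
    return [repr(v) if k is None else f"{k}={v!r}" for k, v in obj.__pprint__()]


def compact_printer(obj):
    return f"{type(obj).__name__}({', '.join(parts(obj))})"


def indented_printer(obj, indent=4):
    pad = " " * indent
    body = "".join(f"{pad}{p},\n" for p in parts(obj))
    return f"{type(obj).__name__}(\n{body})"


cfg = Config("localhost", 8080)
print(compact_printer(cfg))   # Config('localhost', port=8080, debug=False)
print(indented_printer(cfg))  # same data, one argument per line
```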

This circles back to the discussion of __getnewargs_ex__, which is intended for pickling instead. But I don’t think the two use cases overlap: not all classes of objects are designed to be pickled, but all objects are expected to be repr()-able and, by extension, pprint-able, even if it sometimes results in syntactically invalid forms. Being able to satisfy eval(repr(foo)) == foo is a nice-to-have feature but has never been an absolute requirement (the gibberish that repr(object()) gives is obviously not evaluable), and neither should it be for pprint.

The values can be any of the following formats:

  • A single value, representing a positional argument. The value itself is used.
  • A 2-tuple of (name, value) representing a keyword argument. A representation of name=value is used.
  • A 3-tuple of (name, value, default_value) representing a keyword argument with a default value. If value equals default_value, then this tuple is skipped, otherwise name=value is used.

Should it be an exact tuple? A subclass? Or, if the protocol changes to only tuples as suggested in the thread, a sequence?

What if we want to add a fourth item in the future? default_value can’t really be made optional…
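
The exact-tuple question matters in practice. A positional argument that happens to be a NamedTuple passes an isinstance(item, tuple) check, so under an isinstance-based rule it would be read as (name, value). Which check the protocol should use is the open question; this only shows that the two checks disagree:

```python
from typing import NamedTuple


class Point(NamedTuple):
    x: int
    y: int


value = Point(1, 2)
print(isinstance(value, tuple))  # True: a NamedTuple is a tuple subclass
print(type(value) is tuple)      # False: but it is not an exact tuple
print(len(value))                # 2: the same length as a (name, value) pair
```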

Alas, I think we do need to plan for extensions. The REPL does colours now; this does not; it’ll feel out of date as soon as it’s added.

Should we add a dedicated debugging device, and print to stderr?
Or perhaps, following breakpoint(), to something that prints to stderr by default, but lets GUI debuggers set things up to receive inspectable objects?
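
A rough sketch of the breakpoint()-style idea: a hypothetical debug() function that always dispatches through a replaceable module-level hook, which by default pretty-prints to stderr. The names debug, debughook, and default_debughook are illustrative only; the real precedent is breakpoint() calling sys.breakpointhook():

```python
import pprint
import sys


def default_debughook(*objects):
    """Default behaviour: pretty-print each object to stderr."""
    for obj in objects:
        print(pprint.pformat(obj), file=sys.stderr)


debughook = default_debughook  # a GUI debugger could rebind this


def debug(*objects):
    """Hypothetical builtin that always dispatches through the current hook."""
    debughook(*objects)


debug({"b": 1, "a": 2})  # writes {'a': 2, 'b': 1} to stderr

# A GUI debugger could receive the live objects instead of text:
inspected = []
debughook = lambda *objects: inspected.extend(objects)
debug([1, 2, 3])
```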

Yeah. You’ll always get best results with a custom pretty-printer.
But there’s value in a protocol that can destructure arbitrary objects in some default way.

That’s why I think it’s best to think of the proposal as “structured print”, or, if we want to keep the pprint name, “pattern print”. With this structured information, fancy printers can add all sorts of color and formatting without having to tokenize and parse repr() output again, while implementers are not burdened with the connotation of __getnewargs_ex__ that their objects must be picklable.

To ease implementation we can add convenience functions that collect __dict__ and __slots__ names and values and return the appropriate objects that the dunder wants.
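
A sketch of such a convenience function. collect_fields is a hypothetical name; it yields (name, value) pairs from the instance __dict__ and from __slots__ declared anywhere in the MRO, skips unset slots, and does not handle name-mangled private slots:

```python
def collect_fields(obj):
    """Hypothetical helper: yield (name, value) pairs from __dict__ and __slots__."""
    seen = set()
    for name, value in getattr(obj, "__dict__", {}).items():
        seen.add(name)
        yield name, value
    for cls in type(obj).__mro__:
        slots = cls.__dict__.get("__slots__", ())
        if isinstance(slots, str):  # __slots__ = "x" is allowed
            slots = (slots,)
        for name in slots:
            if name in seen or name in ("__dict__", "__weakref__"):
                continue
            if hasattr(obj, name):  # unset slots are skipped
                seen.add(name)
                yield name, getattr(obj, name)


class Slotted:
    __slots__ = ("x", "y")

    def __init__(self, x):
        self.x = x  # y is left unset


class Plain:
    def __init__(self):
        self.a, self.b = 1, 2


print(list(collect_fields(Slotted(5))))  # [('x', 5)]
print(list(collect_fields(Plain())))     # [('a', 1), ('b', 2)]
```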

In addition to the existing !s, !r, and !a conversion specifiers, an additional !p conversion will be added. The effect of this specifier with an expression value will be to call pprint.pformat(), passing value as the only argument. In this initial specification, it will be an error to provide any format specifier if !p is used.

pprint.pformat()’s defaults give you the 80-column, dict-sorting representation, which is optimized to be printed to a terminal on its own rather than embedded in a string.
Is that worth adding a conversion?
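
A quick check of those defaults with the existing pprint module (sort_dicts requires Python 3.8+):

```python
import pprint

data = {"b": 1, "a": 2}
print(pprint.pformat(data))                    # {'a': 2, 'b': 1}: keys sorted by default
print(pprint.pformat(data, sort_dicts=False))  # {'b': 1, 'a': 2}: insertion order kept

numbers = list(range(40))
line = f"Values: {pprint.pformat(numbers)}"
print(line)  # past 80 columns, pformat puts each item on its own line
```

The continuation lines are indented by one space relative to column 0, so once the text is embedded after “Values: ” they no longer line up under the opening bracket.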

There seem to be a few independent aspects to this proposal:

  1. Making pprint more flexible. My feeling is that the PEP doesn’t go far enough here, but clearly the experience of rich is that this is enough for many use cases, so maybe my instinct is wrong.
  2. Integration into print(). This feels a little odd, as the description suggests that the expected use case is printing one object, and doesn’t really address multi-arg print, such as print("Item 1:", one, ", Item two:", two, pretty=True). I doubt that pretty-printing the items individually would give a particularly “pretty” result in general (in the example I gave, line breaks could be a bit of a mess).
  3. The !p conversion specifier. This again doesn’t feel like it would compose well in a string with multiple pretty-printed parts.

I’m in favour of making pprint more flexible, and I’d like to see that go further than the current PEP. More flexible options for handling containers, custom formatting of builtin values, and colour support seem like obvious extensions. I could live with those being deferred for future consideration, though.

As far as the pprint option for print, that seems like the wrong approach, for the composability reasons I gave above. Maybe a pformat builtin would be better? print("The data:", pformat(obj)) isn’t that verbose, and f"The data: {pformat(obj)}" works just as well as f"The data: {obj!p}".
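
The comparison in that paragraph, using the existing pprint.pformat (a pformat builtin is only a suggestion, so here it is imported). The last line shows the composability concern: each pformat() call wraps independently, knowing nothing about the text around it:

```python
from pprint import pformat

obj = {"name": "widget", "tags": ["a", "b"]}

# Both spellings work today with an explicit import:
print("The data:", pformat(obj))
print(f"The data: {pformat(obj)}")

# Several objects: each one wraps on its own, so line breaks can land awkwardly.
one, two = list(range(30)), list(range(30, 60))
print("Item 1:", pformat(one), ", Item two:", pformat(two))
```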

One other option that the PEP doesn’t mention, but which I think could be very useful, is support for pretty printing in the REPL. I find Rich’s pretty.install(), which adds pretty printing to the REPL, extremely useful, and I’d love it to be the default (or at least available as an option) in core Python.

True, sys.displayhook = pprint.pprint doesn’t quite do the right thing.

Edit: I realised I should explain what’s missed when doing that, rather than leaving it to the reader to figure out. In addition to printing the expression result, sys.displayhook is responsible for setting builtins._ and deciding how to handle encoding errors when printing. So the binding shown mostly works, until you hit an encoding error or try to reference the previous result via _.
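
For reference, a sketch of a display hook that keeps those responsibilities while using pprint.pformat() instead of repr(). It is adapted from the pseudo-code in the sys.displayhook documentation; pretty_displayhook is a hypothetical name:

```python
import builtins
import pprint
import sys


def pretty_displayhook(value):
    """Like sys.displayhook, but formats with pprint.pformat() instead of repr()."""
    if value is None:
        return  # the REPL prints nothing for None
    builtins._ = None  # clear '_' first, as the default hook does
    text = pprint.pformat(value)
    try:
        sys.stdout.write(text)
    except UnicodeEncodeError:
        # The stream can't encode the text: fall back to backslash escapes.
        data = text.encode(sys.stdout.encoding, "backslashreplace")
        if hasattr(sys.stdout, "buffer"):
            sys.stdout.buffer.write(data)
        else:
            sys.stdout.write(data.decode(sys.stdout.encoding, "strict"))
    sys.stdout.write("\n")
    builtins._ = value  # make the result available as _


# Install in an interactive session with: sys.displayhook = pretty_displayhook
```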

About:

  • A single value, representing a positional argument. The value itself is used.

  • A 2-tuple of (name, value) representing a keyword argument. A representation of name=value is used.

  • A 3-tuple of (name, value, default_value) representing a keyword argument with a default value. If value equals default_value, then this tuple is skipped, otherwise name=value is used.

Any chance of returning explicit types instead of applying semantic meaning to tuple sizes?

For example:

  • A single value, representing a positional argument. The value itself is used.

  • pprint.KeywordArgument object representing a keyword argument. A representation of name=value is used. If default_value is given and equals value, then the argument is skipped; otherwise name=value is used.

With KeywordArgument being defined as:

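from dataclasses import dataclass
from typing import Any

class NotGiven:
    """Sentinel type, so users are able to pass default_value=None."""

@dataclass(frozen=True)
class KeywordArgument:
    name: str
    value: Any
    default_value: Any = NotGiven()  # sentinel instance meaning "no default given"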

This is more explicit and open to future extensions: new types could be added in the future to support other use cases.
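
A sketch of a renderer consuming that type. KeywordArgument and NotGiven are defined locally here as in the post (there is no pprint.KeywordArgument today); render_args is hypothetical. With explicit types, any plain tuple is simply a positional value:

```python
from dataclasses import dataclass
from typing import Any


class NotGiven:
    """Sentinel type, so users are able to pass default_value=None."""


@dataclass(frozen=True)
class KeywordArgument:
    name: str
    value: Any
    default_value: Any = NotGiven()


def render_args(name, items):
    """Sketch: explicit types decide meaning; everything else is positional."""
    parts = []
    for item in items:
        if isinstance(item, KeywordArgument):
            has_default = not isinstance(item.default_value, NotGiven)
            if has_default and item.value == item.default_value:
                continue  # equal to its default: skipped
            parts.append(f"{item.name}={item.value!r}")
        else:
            parts.append(repr(item))  # tuples are now just positional values
    return f"{name}({', '.join(parts)})"


print(render_args("Box", [("a", 1, 2), KeywordArgument("size", 3, 0)]))
# Box(('a', 1, 2), size=3)
```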

I understand this might not be possible given compatibility with Rich seems to be one of the objectives of the proposal.

As much as I get the reasoning, I don’t think __pprint__ makes sense, since it isn’t printing but rather converting to a string.

What about __pformat__? Ultimately, that name refers to the function (pprint.pformat) that will be affected by the change.

Will the interpreter error if someone returns a string instead of the expected sequence of things?
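
The PEP text quoted in this thread doesn’t say; one plausible answer is sketched below. A str is itself iterable, so without an explicit check every character would silently become a positional argument. collect_args, Good, and Bad are hypothetical:

```python
def collect_args(obj):
    """Hypothetical printer step: validate what __pprint__ returned."""
    result = obj.__pprint__()
    if isinstance(result, (str, bytes)):
        # Strings are iterable, so without this check each character
        # would silently become its own positional argument.
        raise TypeError(
            f"{type(obj).__name__}.__pprint__() must return an iterable "
            f"of arguments, not {type(result).__name__}"
        )
    return list(result)


class Good:
    def __pprint__(self):
        yield 1
        yield ("flag", True)


class Bad:
    def __pprint__(self):
        return "Bad(1)"


print(collect_args(Good()))  # [1, ('flag', True)]
```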

I noticed a typo in the PEP:

One consequence of print(..., pretty=True) is that it can be more less obvious if you wanted to print multiple objects with, say a newline between the object representations.

I like the idea of this PEP. I’m very much in favor of adding better repr support, as well as better pprint support. But as that sentence implies, I think those are two separate things.

I’ve often found myself writing __repr__ functions that construct a chrome-plus-argument rendering. It would be lovely to have a protocol that let me just specify the arguments and let it build the correct repr. And even better, something to do what rich’s ‘auto’ does. But in my mind that’s separate from pretty printing, though pretty printing might make use of that information.

As other posters have indicated, what I think you really want is a set of structured data that a formatter such as pprint can pretty print. I’m not sure exactly what that protocol would look like, but the currently specified protocol is not enough by itself; it is too limited and inflexible. I understand you want to limit the scope of the PEP, but let’s do it in such a way as to not prevent a more general solution in the future ;)

I would suggest, as others have, that __pprint__ not be used as the name for the attribute. __rich_repr__ also doesn’t sound right to me, for reasons others have mentioned. Maybe __enrich_repr__. Or how about __getcurrentargs__? We should keep this protocol mentally separate from what it might get used for: it provides structured information, not behavior.

My own use case is the ‘pprint’ method of my TokenList class in email._header_value_parser. This PEP would not help with that implementation. My method is a debugging aid. Its output is not a simple tree of objects; rather, it contains additional meta-information. It really doesn’t have anything to do with or make any use of repr. Or pprint, for that matter.

To make that method integrate with !p, which I’d love to be able to use on parse tree objects, I think it would be sensible if __pprint__ were specified to return an iterator of strings, with each leading space on a string indicating one level of indent. Then pprint would indent all of them according to the current position of the returned object in the pretty print output. If pprint were a builtin, or at least lazily imported, __pprint__ methods could then call pprint on sub-objects to produce the tree they wanted. (Ideally pprint would call whatever pprint was specified by pprint=, but I’m not sure how that would work.)

With that as a protocol, my _pp method wouldn’t need an indent parameter and wouldn’t need to special-case non-TokenLists; it would just do for token in self: yield ' ' + pprint(token) in its inner loop.
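
A sketch of that iterator-of-strings protocol, assuming each leading space means one indent level and the printer expands it. pformat_lines, render, and the toy TokenList below are hypothetical (the toy is not the real email._header_value_parser class). One caveat the sketch shares with the proposal: content lines that genuinely start with spaces would be misread as extra indentation:

```python
def pformat_lines(obj):
    """Hypothetical: yield lines for obj, using __pprint__ when defined."""
    method = getattr(type(obj), "__pprint__", None)
    if method is None:
        yield repr(obj)
    else:
        yield from method(obj)


def render(obj, indent=4):
    """Expand each leading space into one indent level."""
    out = []
    for line in pformat_lines(obj):
        stripped = line.lstrip(" ")
        level = len(line) - len(stripped)
        out.append(" " * (indent * level) + stripped)
    return "\n".join(out)


class TokenList(list):
    """Toy stand-in for a parse-tree node."""

    def __pprint__(self):
        yield f"{type(self).__name__}:"
        for token in self:
            # Child lines get one extra leading space: one more indent level.
            for line in pformat_lines(token):
                yield " " + line


tree = TokenList(["a", TokenList(["b", "c"])])
print(render(tree))
```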

My __pprint__ proposal isn’t about satisfying the desire for enhanced structured data. But it would be useful for some of the use cases mentioned by others (I think?). And it aligns better with my naive expectation of what the semantics of a __pprint__ special method would be. In fact, I would be (will be) very surprised if __pprint__ has any other semantics than “return something for pprint to indent into the pretty print it is creating”.

This does not exclude adding additional protocol methods in the future for more structured data return. Nor does it exclude pprint from using __getcurrentargs__ (or whatever it ends up being called) and/or the equivalent of the rich ‘auto’ decorator and/or any structured data methods defined in the future, if a __pprint__ method hasn’t been defined on the object on which it is called.

Going back to __getcurrentargs__ (or whatever), we could also have a way to tell __repr__ to use this information to construct the repr (without pretty printing), as well as a way to tell it to do the auto version. Maybe a from __future__ import to change the default behavior of repr, or at least something like def __repr__(self): return autorepr() (with a better name than autorepr, of course). Not that I’m suggesting that for this PEP!
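
A sketch of that decoupling: the same structured data feeds a plain repr, with no pretty printing involved. Both __getcurrentargs__ and autorepr are hypothetical names from this post, and the (args, kwargs) return shape is an assumption (the post doesn’t specify one); here autorepr takes the object explicitly:

```python
def autorepr(obj):
    """Hypothetical: build a plain (non-pretty) repr from structured args."""
    args, kwargs = obj.__getcurrentargs__()  # assumed to return (args, kwargs)
    parts = [repr(a) for a in args]
    parts += [f"{k}={v!r}" for k, v in kwargs.items()]
    return f"{type(obj).__name__}({', '.join(parts)})"


class Interval:
    def __init__(self, start, stop, *, closed=False):
        self.start, self.stop, self.closed = start, stop, closed

    def __getcurrentargs__(self):
        kwargs = {"closed": True} if self.closed else {}  # omit the default
        return (self.start, self.stop), kwargs

    def __repr__(self):
        return autorepr(self)


print(repr(Interval(1, 5)))               # Interval(1, 5)
print(repr(Interval(1, 5, closed=True)))  # Interval(1, 5, closed=True)
```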

I don’t object to anything this PEP does other than its questionable use of the name __pprint__ :) I’d also really like the __pprint__ I described above, but its absence wouldn’t be a reason to reject the PEP. What I want to emphasize, though, is that the things that return structured data should be decoupled from the pretty print concepts. Pretty print should use that data, but that data is not pretty print data.
