PEP 813 - The Pretty Print Protocol

A couple of the elder statesmen are happy to announce PEP 813, a proposal to build in optional pretty printing for print(), str.format(), and f-strings, and to define a protocol classes can implement to participate in and customize how their instances are pretty printed.

Enjoy, and let the games begin!

Pretty-printing is really a matter of taste, so I would really like a way to specify various kinds of pretty printing. Because of that, and to ease possible future improvements without breaking compatibility, I’d like to suggest adding the possibility to return an already formatted object that differs from what __repr__ returns. If the argument against this is “just use __repr__”, then I’m afraid I wouldn’t be happy. Other than passing a pretty formatter every time I call print, I can’t think of a solution for that (but maybe it’s a non-issue, in which case I’m also fine with not supporting that aspect).

Another comment I have:

  • A single value, representing a positional argument. The value itself is used.
  • A 2-tuple of (name, value) representing a keyword argument. A representation of name=value is used.

How would it distinguish between a positional argument that happens to be a (name, value) tuple and an actual keyword argument? For example:

class Bass:
    def __init__(self, pair: tuple[str, int], /):
        self.pair = pair

    def __pprint__(self):
        yield self.pair

Should Bass(("a", 3)) be rendered as Bass(a=3) or as Bass(('a', 3))? (The renderer can’t look at the signature, because I could also use *args in the signature and still yield my pair as is.) Are there also restrictions on the types of the elements of the pair/triplet?
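To make the ambiguity concrete, here is a minimal sketch of a renderer that reads the yield forms from the PEP (single value, (name, value), (name, value, default)) literally. render_naive is a hypothetical helper, not the PEP’s reference implementation, and it skips all line wrapping; because it only checks the tuple length, the positional pair comes out as a keyword argument.

```python
def render_naive(obj) -> str:
    """Render obj from its __pprint__() items, reading the yield forms literally."""
    parts = []
    for item in obj.__pprint__():
        if isinstance(item, tuple) and len(item) == 3:
            # (name, value, default): omit the argument if it equals its default
            name, value, default = item
            if value != default:
                parts.append(f"{name}={value!r}")
        elif isinstance(item, tuple) and len(item) == 2:
            # (name, value): always rendered as a keyword argument
            name, value = item
            parts.append(f"{name}={value!r}")
        else:
            # anything else: rendered as a positional argument
            parts.append(repr(item))
    return f"{type(obj).__name__}({', '.join(parts)})"


class Bass:
    def __init__(self, pair: tuple[str, int], /):
        self.pair = pair

    def __pprint__(self):
        yield self.pair


print(render_naive(Bass(("a", 3))))  # Bass(a=3), not Bass(('a', 3))
```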

While at it, let me also suggest that we could directly pass a callable with the same signature as pformat to pretty, instead of passing an instance.


Maybe another feature (or non-feature) is whether nested pprinting would be supported and whether we could make it happen without having to call the __pprint__() method. For instance, we could indicate whether or not the value of the argument should also be pretty-printed (e.g., yield (name, value, is_default, should_pprint) or whatever other convention you want).

In the section “__pprint__() methods”, it says:

  • A single value, …
  • A 2-tuple of (name, value) …
  • A 3-tuple of (name, value, default_value) …

Will/can the single value ever be a tuple?

In the section “A new argument to built-in print”, it says:

  • None - the default. …
  • True - …
  • An instance …

Why that choice of None vs True? Why not False vs True?

A spec pedantry point in the f-strings and str.format() section:

an additional !p conversion will be added. The effect of this specifier with an expression value will be to call pprint.pformat(), passing value as the only argument.

I suggest clarifying this. Be specific: does this mean the pprint name is looked up in the local namespace, or that pprint is imported internally at rendering time and its pformat is called? Explicitly specify when the pprint import happens, and whether the result of that import may be cached or PyImport_Import() is called on every !p rendering.

Also, shouldn’t t-strings gain a "p" conversion too, i.e. an Interpolation whose .conversion == "p"?

Built-in print() takes a new optional argument, appended to the end of the argument list, called pretty

This isn’t clear as written, though I know what you must mean. I suggest simpler wording:

“Gains a new optional keyword only argument called pretty”

Its position is irrelevant. It’s a keyword argument like any other.

+cc: @willmcgugan, who I believe is the maintainer of Rich, for thoughts.

I should have worded it as “the equivalent of pprint.pformat”. I was thinking we’d not actually import pprint, but let me discuss it with Barry.

As for t-strings: would they have a use for this? I thought there wasn’t a strong correlation between t-string and f-string conversions, but let me do some research. I’m happy to hear opinions.

t-string support for this could be a possible follow-on, we should just mention one way or another in the PEP (rejected or deferred idea maybe).

We should probably add a “deferred ideas” section. Format specifiers for !p should go in there too: the PEP prohibits them for now, with the idea of deciding at a later date what might be useful.

My main issue with this is that developers might not want pprint output that is equivalent to the repr, and this does not give them the ability to specify anything else. I might have a class where the repr would not necessarily be a reasonable thing to use (such as Decimal). One example: I often work with a Version class, and I would like to be able to use pprint to output a dictionary that contains these objects without having to call str(version) when building the dictionary.

In other words, I would like to be able to tell pprint to use the __str__ method (or even a third option) and not necessarily something that is equivalent to __repr__.

The protocol for the pprint module sounds fine to me (and I especially like that it is intentionally modelled on the way rich works).

I’m less sure about the proposed built-in integration, as it’s been years since pprint was my first choice debugging printer: that honour instead goes to json.dumps(x, indent=2, sort_keys=True). That’s partly an artifact of the specific domains I’ve been working in (the things I want to print have readily available JSON serialisations), but it does mean I think the case for doing more than just defining the protocol is vastly weaker than the case for the protocol itself.
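For comparison, this is the json.dumps call mentioned above. It only handles JSON-serialisable data (anything else raises TypeError unless a default= fallback is supplied), which is why it suits data that already has a JSON form; the config dict is just sample data.

```python
import json

config = {"name": "spam", "tags": ["a", "b"], "enabled": True}
text = json.dumps(config, indent=2, sort_keys=True)
print(text)
```

Each nesting level is indented by two spaces, keys are sorted, and every list element gets its own line.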

Quoting a few things out of order.

Yes, this is exactly the case. @willmcgugan was very helpful during the first drafts, explaining how Rich’s pretty print protocol worked and why it had made the design choices it had. I found him convincing, and once we[1] felt confident that Rich’s __rich_repr__() could effectively be equivalent to __pprint__(), I was happy to align PEP 813 with the established protocol.

Can you explain why you wouldn’t be happy?

To me, “pretty printing” really is essentially a debugging device, even more so than print(). One day I just got so tired of explicitly importing pprint and going through that whole rigamarole that it occurred to me we could have a small convenience, just like PEP 553 and built-in breakpoint().

That’s one of the reasons why I was convinced to adopt Rich’s choice to wrap the yielded values in the “class chrome”.

I want to be very careful about scope creep with PEP 813. It isn’t intended to solve every use case for debug printing objects, but it should handle the majority of common cases, and be much more convenient than using pprint.pprint().

This is something I thought about, and Rich actually handles this case pretty well. The obvious solution is to treat a 2-tuple whose argument name is the empty string or None as positional. Thus, in Rich:

>>> from rich.pretty import pprint
>>> class Bass:
...     def __rich_repr__(self):
...         yield '', (1, 2)
...         yield None, 8
...
>>> pprint(Bass())
Bass((1, 2), 8)

It’s just that PEP 813 doesn’t specify this and my reference implementation doesn’t handle it. I’ll update both to have the identical behavior.
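A minimal sketch of the rule described above, assuming that a 2-tuple or 3-tuple whose name is '' or None is rendered positionally. render is a hypothetical helper producing a single-line string; a real implementation would also handle wrapping and nested objects.

```python
def render(obj) -> str:
    """Render obj from __pprint__() items; a '' or None name means positional."""
    parts = []
    for item in obj.__pprint__():
        if not isinstance(item, tuple):
            parts.append(repr(item))  # bare value: positional
            continue
        if len(item) == 3:
            name, value, default = item
            if value == default:
                continue  # argument left at its default: omit it
        elif len(item) == 2:
            name, value = item
        else:
            raise ValueError(f"expected a 2- or 3-tuple, got {item!r}")
        if name:
            parts.append(f"{name}={value!r}")  # keyword argument
        else:
            parts.append(repr(value))  # positional, even if value is a tuple
    return f"{type(obj).__name__}({', '.join(parts)})"


class Bass:
    def __init__(self, pair, strings=4):
        self.pair = pair
        self.strings = strings

    def __pprint__(self):
        yield "", self.pair               # positional tuple argument
        yield "strings", self.strings, 4  # keyword argument with default 4


print(render(Bass(("a", 3))))             # Bass(('a', 3))
print(render(Bass(("a", 3), strings=5)))  # Bass(('a', 3), strings=5)
```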

That’s pretty much just any callable taking a single argument. I don’t think we need both, so I believe that’s the difference between something like:

print(myobj, pretty=MyPrinter())

vs

print(myobj, pretty=MyPrinter().pformat)

Because None is traditionally used when an argument is missing. You can think of today’s print() function as equivalent to print(..., pretty=None).

As Eric says, we can discuss this, but in my mind it works just like built-in breakpoint() today, and how print(..., pretty=True) works in my reference implementation, i.e. pprint is imported, not that the name is looked up in the local namespace. Either way we’ll make that explicit.
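A rough sketch of the three-way pretty argument as described in this thread: None keeps today’s behaviour, True imports pprint on demand (after the first call the import is just a sys.modules lookup, similar to how breakpoint() loads pdb lazily), and anything else is assumed to be an object with a pformat() method, such as a pprint.PrettyPrinter. print_pretty is a hypothetical stand-in, not the reference implementation.

```python
import io


def print_pretty(*objects, sep=" ", end="\n", file=None, flush=False, pretty=None):
    """Hypothetical stand-in for print(..., pretty=...) as discussed for PEP 813."""
    if pretty is None:
        # Today's print(): it applies str() to each object itself.
        print(*objects, sep=sep, end=end, file=file, flush=flush)
        return
    if pretty is True:
        # Import only when pretty printing is requested.
        import pprint
        pretty = pprint.PrettyPrinter()
    # Otherwise pretty is assumed to be any object with a pformat(obj) method.
    texts = [pretty.pformat(obj) for obj in objects]
    print(*texts, sep=sep, end=end, file=file, flush=flush)


buf = io.StringIO()
print_pretty("spam", file=buf)               # writes: spam
print_pretty("spam", file=buf, pretty=True)  # writes: 'spam'
print(buf.getvalue(), end="")
```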

And that’s fine! But as mentioned above, I’m trying to keep scope creep to a minimum, so my dry answer is: don’t use pprint.pprint in that case!

Fair enough, but I’ll just point out that the impetus for me going down this path in the first place is getting tired of having to explicitly import pprint either at the module level, or at the call site of the object I wanted to debug. It’s exactly the UX that led me to propose built-in breakpoint().


  1. I think I can speak for Will here.

I use pprint.pformat to get a nicely formatted string of a dictionary for use as production output but I need to add steps to convert objects to str as I build the dictionary. This is not a debugging tool for me - this is a major use case I have. Having it do a modified repr is not something that is useful to me - for production or debugging. If that’s all this is I would recommend that the committee not adopt this PEP. What would be useful would be something that lets me have my objects define how they are formatted with pprint and pformat so I can control it.

Elevating pprint to a built-in (with a lazy import behind the scenes) feels like it would be a more straightforward way to achieve that level of convenience than adding a new parameter to print.

I’d be happy with this PEP in its current form. I use the Rich Repr protocol reflexively now, but having that facility built in would be amazing.

I do wonder about the name. I could easily adapt Rich to work with __pprint__, and the name aligns with the pprint module. Which all makes sense, but “print” reads like a verb here, and __pprint__ wouldn’t actually be printing anything itself.

I hope this doesn’t sound too self-serving, but __rich_repr__ makes more sense to me. It complements __repr__ which returns a representation of the object, while __rich_repr__ returns a richer representation of that object. The term “rich” here would be its dictionary meaning, and not a reference to the Rich library.

I’m not particularly hung up about that. __pprint__ works. Just throwing it out there.

I’m also tentatively +1 on a builtin pprint, which feels super convenient. pretty=True feels a bit verbose compared to typing one additional p character, although they aren’t quite equivalent, since pprint has a different signature. Maybe we can have both a builtin pprint and a pretty=True argument?

We tossed around adding a pprint builtin, but didn’t come to a conclusion. Would you want it to actually print (à la pprint.pp) or format (à la pprint.pformat)? I think the latter would be more useful, but then the name doesn’t make as much sense. And you could achieve the same thing (assuming it takes no arguments) with f"{obj!p}".

__repr__ returns a string representation of an object, whereas __rich_repr__ returns the components required to create a richer representation of that object.

Personally, I’m not attached to __pprint__, but I would expect __rich_repr__ to return a string, like its counterpart __repr__. But that is not the proposal. I’d be a -1 on __rich_repr__.

I would think it would print. Most of the time, that’s what I would use it for. Formatting without printing would be a far less frequent thing for me. And I’d be happy with an import there.

__repr__ returns a string representation of an object, whereas __rich_repr__ returns the components required to create a richer representation of that object.

True, but both return a representation of the object. Even the __repr__ may require further processing to build the formatted string (in the case of collections).

My thinking is that they both have essentially the same purpose, but __rich_repr__ returns a more flexible (richer) representation than __repr__.

Is the intention here always that the printed output be syntactically valid code?

This is what sympy’s pprint function does:

In [15]: from sympy import *

In [16]: x = Symbol('x')

In [17]: pprint(Integral(sin(x), (x, 0, pi)))
π
⌠
⎮ sin(x) dx
⌡
0

I would often like to be able to use that pprint function when debugging but it is awkward to access from e.g. pdb. Sometimes I want the opposite though and I actually want syntactically valid code which is also not necessarily what print gives.

SymPy has many printers for different kinds of output:

$ ls sympy/printing/
aesaracode.py   glsl.py         maple.py        printer.py   smtlib.py
codeprinter.py  gtk.py          mathematica.py  __pycache__  str.py
conventions.py  __init__.py     mathml.py       pycode.py    tableform.py
c.py            jscode.py       numpy.py        python.py    tensorflow.py
cxx.py          julia.py        octave.py       pytorch.py   tests
defaults.py     lambdarepr.py   precedence.py   rcode.py     theanocode.py
dot.py          latex.py        pretty          repr.py      tree.py
fortran.py      llvmjitcode.py  preview.py      rust.py

I’m not sure if those are compatible with the argument to print or if it would be possible to make them compatible.
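If the pretty argument accepts any object with a pformat() method (an assumption based on the print(myobj, pretty=MyPrinter()) form discussed in this thread), then a printer that is just a function from object to string could be adapted with a small wrapper. FunctionPrinter and bracket_printer are hypothetical; sympy.pretty, which returns the 2D text shown above as a string, would presumably slot in the same way.

```python
from typing import Any, Callable


class FunctionPrinter:
    """Hypothetical adapter: expose any obj -> str function as a pformat() method."""

    def __init__(self, func: Callable[[Any], str]):
        self._func = func

    def pformat(self, obj: Any) -> str:
        return self._func(obj)


def bracket_printer(obj: Any) -> str:
    """Toy printer standing in for something like sympy.pretty or sympy.latex."""
    return f"<<{obj}>>"


printer = FunctionPrinter(bracket_printer)
print(printer.pformat(42))  # <<42>>
# Under the proposal this might become: print(expr, pretty=FunctionPrinter(sympy.pretty))
```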

While we are at it can this be fixed?

In [2]: pprint.pprint(10**10000)
...
ValueError: Exceeds the limit (4300 digits) for integer string conversion; use sys.set_int_max_str_digits() to increase the limit

It is hard to fix that for repr/str but if pprint deconstructs the object and implements custom printing for each part then it should be able to handle this. Here is an example of what it could do but using python-flint’s fmpz type:

In [7]: import flint

In [8]: x = flint.fmpz(10)**10000

In [9]: print(x.str(condense=10))
1000000000{...9981 digits...}0000000000
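A sketch of how a pretty printer could condense huge integers without ever calling str() on the whole value, mimicking the python-flint output above. condensed_int_repr is a hypothetical helper: it counts decimal digits with integer arithmetic and only converts the kept leading and trailing digits to text, so the limit set by sys.set_int_max_str_digits() never comes into play.

```python
import math


def condensed_int_repr(n: int, keep: int = 10) -> str:
    """Show the first and last `keep` digits of a long int, eliding the middle."""
    sign = "-" if n < 0 else ""
    n = abs(n)
    # Estimate the decimal digit count from the bit length, then correct it
    # so that 10**(digits - 1) <= n < 10**digits (digits = 1 for n = 0).
    digits = max(1, int(n.bit_length() * math.log10(2)))
    while 10 ** digits <= n:
        digits += 1
    while digits > 1 and 10 ** (digits - 1) > n:
        digits -= 1
    if digits <= 2 * keep:
        return sign + str(n)  # short enough to print in full
    head = n // 10 ** (digits - keep)  # leading `keep` digits
    tail = n % 10 ** keep  # trailing `keep` digits (may have leading zeros)
    hidden = digits - 2 * keep
    return f"{sign}{head}{{...{hidden} digits...}}{tail:0{keep}d}"


print(condensed_int_repr(10**10000))  # 1000000000{...9981 digits...}0000000000
```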

I think that you end up wanting different things in different situations. It seems like the idea here is mainly just a convenience for making a multiline repr kind of like how the repr would look if it was formatted by ruff/black. I’m not sure if it would be more useful to have something more configurable than that.

It seems like __pprint__ is basically __getnewargs_ex__ but for some reason it is a generator. Why is that better than just doing what __getnewargs_ex__ does?
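To make the comparison concrete: __getnewargs_ex__ (used by pickle and copy) returns a single (args, kwargs) pair, while the proposed __pprint__ yields one item per argument. render_from_newargs below is a hypothetical renderer driven by the former. The (args, kwargs) shape has no tuple ambiguity, since positional and keyword arguments live in separate containers, but it has no slot for default values, so arguments left at their defaults cannot be omitted.

```python
class Point:
    def __init__(self, x, y, *, label=None):
        self.x, self.y, self.label = x, y, label

    def __getnewargs_ex__(self):
        # (positional args, keyword args), as consumed by pickle and copy
        return (self.x, self.y), {"label": self.label}

    def __pprint__(self):
        # PEP 813 style: one item per argument, with the default marked
        yield self.x
        yield self.y
        yield "label", self.label, None


def render_from_newargs(obj) -> str:
    """Hypothetical renderer driven by __getnewargs_ex__()."""
    args, kwargs = obj.__getnewargs_ex__()
    parts = [repr(a) for a in args]
    parts += [f"{k}={v!r}" for k, v in kwargs.items()]
    return f"{type(obj).__name__}({', '.join(parts)})"


print(render_from_newargs(Point(1, 2)))         # Point(1, 2, label=None)
print(render_from_newargs(Point(("a", 3), 2)))  # Point(('a', 3), 2, label=None)
```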
