Rework PPrint module structure

MonadChains · January 22, 2023, 10:30am

Recently, an issue asked for an option to indent nested objects “properly” in pprint. The request for this feature is recurrent; however, it was pointed out that the module has reached its limits in terms of extensibility and needs to be reworked internally to be more maintainable. From my own experience, I agree with this observation: I tried to implement the feature and couldn’t find an elegant way to do so.
I have prepared a diagram that tries to synthesize the execution of PrettyPrinter.pprint(obj):

The algorithm is recursive: if an object’s repr does not fit on one line, call an advanced formatting function according to the object’s type, and repeat for nested items.
The recursive structure makes sense to me; however, I’ve thought of these possible improvements:

Make the flow of information bidirectional (i.e. make the functions in the loop return values) to help implement features, especially those with width requirements or constraints.
Isolate reusable code into functions; for example, a single function for nesting items in brackets: (/[/{ + _format(sub_item) + )/]/}.
Make the code itself more consistent; I’ve seen that some functions have minor differences in code structure that could be made uniform.
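
To make the first two suggestions more concrete, here is a minimal sketch, not the actual pprint implementation. The names format_obj, wrap_items and BRACKETS are hypothetical and exist only for this illustration. Each call returns its formatted text instead of writing to a stream, and one shared helper does the bracket nesting for every container type:

```python
# Hypothetical sketch; none of these names exist in the pprint module.
BRACKETS = {list: ("[", "]"), tuple: ("(", ")"), dict: ("{", "}")}


def wrap_items(open_br, close_br, items, indent):
    """Place already-formatted items between brackets, one item per line."""
    pad = " " * (indent + len(open_br))
    return open_br + (",\n" + pad).join(items) + close_br


def format_obj(obj, width=80, indent=0):
    """Return the formatted text (rather than writing it to a stream)."""
    text = repr(obj)
    # Base case: the repr fits, or the object is not a container we handle.
    if indent + len(text) <= width or type(obj) not in BRACKETS or not obj:
        return text
    open_br, close_br = BRACKETS[type(obj)]
    if isinstance(obj, tuple) and len(obj) == 1:
        close_br = ",)"  # keep the trailing comma of a one-element tuple
    child_indent = indent + len(open_br)
    if isinstance(obj, dict):
        items = [
            f"{key!r}: "
            + format_obj(value, width, child_indent + len(repr(key)) + 2)
            for key, value in obj.items()
        ]
    else:
        items = [format_obj(item, width, child_indent) for item in obj]
    return wrap_items(open_br, close_br, items, indent)


print(format_obj({"a": [1, 2], "b": ("x", "y")}, width=10))
```

Because every level returns a string, a caller could inspect the result (for example, count its lines or measure its widest line) before deciding how to lay out the parent, which is where width constraints become easier to handle.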

BorjaEst · January 24, 2023, 8:10am

To contribute an idea: I would replace “list” with one of the classes in collections.abc. That way, when users create their own sequence type (my case, for example), they can benefit from pprint printing it on a similar basis to list.
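
As a rough sketch of what I mean (Playlist, is_listlike and format_listlike are made-up names for illustration, not anything in pprint), the check could be based on collections.abc.Sequence instead of the concrete list type, with strings and bytes excluded since they are sequences too:

```python
from collections.abc import Sequence


class Playlist(Sequence):
    """Hypothetical user-defined sequence, used only for this illustration."""

    def __init__(self, *tracks):
        self._tracks = list(tracks)

    def __getitem__(self, index):
        return self._tracks[index]

    def __len__(self):
        return len(self._tracks)


def is_listlike(obj):
    """True for any Sequence except text and byte strings."""
    return isinstance(obj, Sequence) and not isinstance(
        obj, (str, bytes, bytearray)
    )


def format_listlike(obj, width=80):
    """Format a list-like object, one item per line when it does not fit."""
    prefix = type(obj).__name__ + "(["
    items = [repr(item) for item in obj]
    one_line = prefix + ", ".join(items) + "])"
    if len(one_line) <= width:
        return one_line
    pad = " " * len(prefix)
    return prefix + (",\n" + pad).join(items) + "])"


tracks = Playlist("intro", "verse", "chorus")
if is_listlike(tracks):
    print(format_listlike(tracks, width=20))
```

The output format here is just one possible choice; the point is only that the dispatch would work for any class that implements the Sequence interface.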

AlexWaygood · January 24, 2023, 4:04pm

We should consider whether it is worth investing considerable amounts of time and energy into improving pprint, given how ubiquitous the third-party library rich is in the Python ecosystem these days. rich is very well maintained, has far more capabilities than pprint ever will, and is even used by pip now.

MonadChains · January 24, 2023, 7:14pm

While I agree that there are very good libraries (including rich), I believe that pprint is still unique for its almost non-existent entry barrier, given its simplicity and its name, so improving its maintainability and its extensibility still makes sense. Given its nature, I’m against bloating it with too many features; however, the issue about formatting objects that I brought up above is an example of a possible improvement that can be, and in my opinion should be, implemented. The maintainability of the module is also important for changes that are “mandatory”, resulting from broader changes to Python, such as the introduction of data classes.

steven.daprano · January 25, 2023, 12:23am

I had never heard about rich until now.

It certainly looks impressive, if you want to Do All The Things™ but it is far beyond a pretty-printer. It is a moderately large framework (~50K SLOC) that requires some hefty dependencies like pygments (~170K SLOC) and ipywidgets (~60K SLOC).

This is fine for what it is, and I don’t mean to be negative about rich. I am actually impressed by its feature set and apparently easy-to-use API.

But it is a nuclear reactor, not a battery, and the stdlib pprint battery is a bit old and could do with some love and attention. It would be nice to see some improvements for those who just need a pretty printer that is available everywhere without needing to bring in a stack of dependencies.

To introduce a second metaphor beyond the “Batteries Included” one, this proposal is to modernise pprint and add a couple of bells and whistles, not to compete with the full symphony orchestra that is rich.

ferdnyc · January 31, 2023, 1:23am

And all of that power comes with some cost. Testing against a fairly simple data structure (a dict with 16 string:string key-value pairs, plus two more string keys each holding an empty list, loaded from a .json file plucked randomly from my /etc dir), timeit.repeat says that pprint.pp can dump it 100 times in 0.0172s minimum on my system. Running rich.print() 100 times on that same dict (in a terminal REPL, with colored output auto-enabled) takes no less than 0.275s.
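
For anyone who wants to try a similar measurement, here is a rough sketch of the pprint side. The dict below is a made-up stand-in with the same shape as the one I described, not my actual file, and the output goes to an in-memory buffer rather than a terminal, so the numbers will not match mine:

```python
import io
import pprint
import timeit

# Made-up stand-in data: 16 string:string pairs plus two empty lists.
data = {f"key_{i}": f"value_{i}" for i in range(16)}
data["empty_a"] = []
data["empty_b"] = []


def dump():
    """Pretty-print the dict once into a throwaway in-memory buffer."""
    pprint.pp(data, stream=io.StringIO())


# Five runs of 100 dumps each; the minimum is the least noisy figure.
timings = timeit.repeat(dump, number=100, repeat=5)
best = min(timings)
print(f"pprint.pp x100: best of 5 = {best:.4f}s")
```

Timing rich the same way would mean installing it and swapping the body of dump() for a rich.print call; terminal rendering and color detection will affect that result.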

If pprint.pp() can be made sane(r) without losing that speed advantage, I think it still has a place in the ecosystem. Being part of the stdlib is just an added bonus.