Adding level control to pretty-print in json

Currently, pretty-printing is controlled by the indent parameter of json.dump:

  • indent=None: everything is rendered on a single line

  • indent set: the entire structure is recursively expanded

That means that if we set indent, pretty-print will “unfold” all the structures in the json.
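To make the two modes concrete, here is a small check using only the standard json module. The data dict is just sample input chosen for illustration.

```python
import json

data = {"Command": "comment", "Args": ["text1", "", "", "", ""]}

# indent=None (the default): the whole document is rendered on one line.
compact = json.dumps(data)

# indent set: every non-empty container is expanded, one element per line.
expanded = json.dumps(data, indent=4)

print(compact)
print(expanded)
```

There is no setting in between: once indent is given, the inner list is expanded as well.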

However, for deeply nested or repetitive small structures (e.g. short lists or argument arrays), fully expanding every level can reduce readability.

For example, here is a JSON document pretty-printed with indent=4:

{
    "Command": "comment",
    "Args": [
        "text1",
        "",
        "",
        "",
        ""
    ]
}

But in cases like this, keeping "Args" on a single line would often be more compact and readable:

{
    "Command": "comment",
    "Args": ["text1", "", "", "", ""]
}

This can be partially achieved by writing a custom JSONEncoder, but that requires reimplementing parts of the encoding logic and is not very convenient.

What’s more, from its name, “indent” appears to describe only the indentation width, but in practice it also implicitly enables pretty-printing. This makes it difficult to control pretty-print behavior more precisely, as there is no dedicated parameter for that.

Proposal

We could introduce an optional parameter, max_pretty, that limits the depth to which pretty-printing expands.

(The name max_pretty is only a placeholder; better suggestions are welcome.)

For example:

# json module signature after the change (other parameters elided)
def dump(...,
         max_pretty=0,  # proposed parameter; default is 0
         indent=None,
         ...)

# example
json.dump(data, fp, max_pretty=1, indent=4)

Then any structure whose depth is smaller than max_pretty (1 in this example) will be “unfolded” by pretty-print, while structures at a depth equal to or greater than max_pretty will be kept on a single line:

{
    "Command": "comment",
    "Args": ["text1", "", "", "", ""]
}

The keys “Command” and “Args” are at the root (frame 0), so they are pretty-printed; the content of “Args” is in frame 1, so it stays on a single line.

This small change would let us control how far pretty-printing goes. indent would no longer decide whether pretty-printing is enabled; it would only set the indentation width, as its name suggests, which is more elegant.
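The depth rule can be sketched as a standalone function on top of the existing json module. This is only an illustration under assumptions: dumps_limited and its _depth argument are hypothetical names, not part of json, and the sketch assumes string keys and plain dict/list/tuple containers.

```python
import json

def dumps_limited(obj, max_pretty=1, indent=4, _depth=0):
    """Expand containers at depth < max_pretty; keep deeper ones on one line."""
    is_container = isinstance(obj, (dict, list, tuple))
    if not is_container or not obj or _depth >= max_pretty:
        # Scalars, empty containers, and folded levels use the compact form.
        return json.dumps(obj)
    pad = " " * (indent * (_depth + 1))
    if isinstance(obj, dict):
        items = [
            f"{pad}{json.dumps(key)}: {dumps_limited(value, max_pretty, indent, _depth + 1)}"
            for key, value in obj.items()  # assumes string keys
        ]
        open_, close = "{", "}"
    else:
        items = [pad + dumps_limited(v, max_pretty, indent, _depth + 1) for v in obj]
        open_, close = "[", "]"
    return open_ + "\n" + ",\n".join(items) + "\n" + " " * (indent * _depth) + close

data = {"Command": "comment", "Args": ["text1", "", "", "", ""]}
print(dumps_limited(data, max_pretty=1, indent=4))
```

With max_pretty=0 nothing is expanded, and with a large max_pretty the result for this sample matches json.dumps(data, indent=4).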

That’s my opinion. What do you think? Would such an option be considered within scope for the standard library?

I’d use this feature.

I’ve seen a different option in other encoding libraries: a max-width parameter. I think that’s more specific to the goal, especially with inconsistently-nested data.
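A rough sketch of how a width-based rule could work, assuming a hypothetical helper named dumps_width (not an API of json or of any particular library): a value stays on one line when its compact form fits within max_width, otherwise it is expanded and the same test is applied to its children.

```python
import json

def dumps_width(obj, max_width=80, indent=4, _depth=0):
    """Keep a value on one line if its compact form fits within max_width."""
    compact = json.dumps(obj)
    # Simplification: only the current indentation is counted, not the key
    # prefix or a trailing comma that would share the line.
    fits = indent * _depth + len(compact) <= max_width
    if not isinstance(obj, (dict, list)) or not obj or fits:
        return compact
    pad = " " * (indent * (_depth + 1))
    if isinstance(obj, dict):
        items = [
            f"{pad}{json.dumps(key)}: {dumps_width(value, max_width, indent, _depth + 1)}"
            for key, value in obj.items()  # assumes string keys
        ]
        open_, close = "{", "}"
    else:
        items = [pad + dumps_width(v, max_width, indent, _depth + 1) for v in obj]
        open_, close = "[", "]"
    return open_ + "\n" + ",\n".join(items) + "\n" + " " * (indent * _depth) + close

data = {"Command": "comment", "Args": ["text1", "", "", "", ""]}
print(dumps_width(data, max_width=40))
```

Note that this naive version serialises every container compactly before deciding how to lay it out, so nested data is encoded more than once.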

I’ve wanted this before too but also think that disabling pretty-print at a set depth isn’t really representative of what’s intended.

Usually, I think the rule is, as Laurie says, to keep things that are small enough to fit on one line on one line. But sometimes I actually want massive things on one line too, to get them visually out of the way[1]. Whenever I generate JSON to be passed to PlotlyJS, most of the payload is boring x y float coordinates with tiny bits of interesting metadata hidden in it. If I could tell json to pretty-print everything except the x y coordinates, I’d have a decent chance of being able to read the metadata.

Stripped-down example of what I'd want one of my plotly payloads to look like:
[
  {
    "name":"hirola x1.5",
    "type":"scatter",
    "mode":"markers+lines",
    "marker":{"size":4},
    "line":{"shape":"spline"},
    "x":[1,1,1,1,1,2,2,2,2,2,3,3,3,5,5,7,10,13,19,26,37,51,71,100,138,193,268,372,517,719,1000,1389,1930,2682,3727,5179,7196,10000,13894,19306,26826,37275,51794,71968,100000,138949,193069,268269,372759,517947,719685,1000000,1389495,1930697,2682695,3727593,5179474,7196856,10000000],
    "y":[3.073e-05,3.014e-05,3.054e-05,3.073e-05,3.025e-05,1.268e-05,1.11e-05,1.057e-05,1.027e-05,1.049e-05,9.345e-06,8.441e-06,7.087e-06,5.258e-06,4.861e-06,4.319e-06,2.999e-06,2.32e-06,1.557e-06,1.179e-06,8.72e-07,5.884e-07,4.284e-07,3.098e-07,2.342e-07,1.889e-07,1.328e-07,1.079e-07,8.553e-08,6.95e-08,5.553e-08,5.022e-08,4.008e-08,3.97e-08,3.221e-08,3.18e-08,3.264e-08,2.77e-08,3.336e-08,3.155e-08,2.73e-08,2.756e-08,2.792e-08,2.815e-08,2.962e-08,2.991e-08,4.158e-08,3.64e-08,5.481e-08,5.491e-08,6.648e-08,7.739e-08,8.631e-08,9.17e-08,9.65e-08,1.017e-07,1.037e-07,1.056e-07,1.067e-07]
  },
  {
    "name":"numpy_indexed.unique()",
    "type":"scatter",
    "mode":"markers+lines",
    "marker":{"size":4},
    "line":{"shape":"spline"},
    "x":[1,1,1,1,1,2,2,2,2,2,3,3,3,5,5,7,10,13,19,26,37,51,71,100,138,193,268,372,517,719,1000,1389,1930,2682,3727,5179,7196,10000,13894,19306,26826,37275,51794,71968,100000,138949,193069,268269,372759,517947,719685,1000000,1389495,1930697,2682695,3727593],
    "y":[2.905e-05,2.796e-05,2.765e-05,2.747e-05,2.713e-05,2.71e-05,1.848e-05,1.689e-05,1.531e-05,1.45e-05,1.506e-05,1.17e-05,1.04e-05,8.6e-06,6.641e-06,5.944e-06,4.571e-06,3.285e-06,2.309e-06,1.669e-06,1.127e-06,1.098e-06,7.441e-07,5.409e-07,4.211e-07,3.477e-07,2.713e-07,2.376e-07,1.651e-07,1.427e-07,1.336e-07,1.234e-07,1.177e-07,1.133e-07,1.124e-07,1.166e-07,1.3e-07,1.203e-07,1.266e-07,1.261e-07,1.283e-07,1.332e-07,1.39e-07,1.464e-07,1.525e-07,1.614e-07,1.798e-07,1.957e-07,2.852e-07,1.753e-07,2.051e-07,2.068e-07,2.221e-07,2.533e-07,2.682e-07,2.887e-07]
  },
  {
    "name":"pandas.Categorical()",
    "type":"scatter",
    "mode":"markers+lines",
    "marker":{"size":4},
    "line":{"shape":"spline"},
    "x":[1,1,1,1,1,2,2,2,2,2,3,3,3,5,5,7,10,13,19,26,37,51,71,100,138,193,268,372,517,719,1000,1389,1930,2682,3727,5179,7196,10000,13894,19306,26826,37275,51794,71968,100000,138949,193069,268269,372759,517947,719685,1000000,1389495,1930697],
    "y":[0.0001181,0.0001237,0.0001169,0.0001098,0.0001108,8.636e-05,7.589e-05,5.893e-05,5.681e-05,5.594e-05,4.716e-05,3.923e-05,3.769e-05,2.892e-05,2.442e-05,1.918e-05,1.495e-05,1.11e-05,7.639e-06,5.328e-06,3.905e-06,3.566e-06,2.494e-06,2.12e-06,1.484e-06,1.425e-06,9.617e-07,7.628e-07,5.663e-07,5.01e-07,4.45e-07,4.124e-07,3.801e-07,3.754e-07,3.835e-07,3.645e-07,3.667e-07,3.466e-07,3.593e-07,3.532e-07,3.473e-07,3.757e-07,3.777e-07,3.866e-07,4.175e-07,4.413e-07,6.796e-07,5.136e-07,6.906e-07,5.392e-07,6.443e-07,5.91e-07,7.051e-07,9.834e-07]
  }
]
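A layout like the one above could be approximated by folding on key names rather than on depth or width. The sketch below is an assumption-laden illustration: dumps_fold_keys and fold_keys are made-up names, string keys are assumed, and compact separators are used to match the example.

```python
import json

def dumps_fold_keys(obj, fold_keys=frozenset(), indent=2, _depth=0):
    """Pretty-print obj, keeping values under any key in fold_keys on one line."""
    if not isinstance(obj, (dict, list)) or not obj:
        return json.dumps(obj, separators=(",", ":"))
    pad = " " * (indent * (_depth + 1))
    if isinstance(obj, dict):
        items = []
        for key, value in obj.items():  # assumes string keys
            if key in fold_keys:
                # Folded: render the whole value compactly, however long it is.
                text = json.dumps(value, separators=(",", ":"))
            else:
                text = dumps_fold_keys(value, fold_keys, indent, _depth + 1)
            items.append(f"{pad}{json.dumps(key)}:{text}")
        open_, close = "{", "}"
    else:
        items = [pad + dumps_fold_keys(v, fold_keys, indent, _depth + 1) for v in obj]
        open_, close = "[", "]"
    return open_ + "\n" + ",\n".join(items) + "\n" + " " * (indent * _depth) + close

payload = [{"name": "a", "marker": {"size": 4}, "x": [1, 2, 3], "y": [0.5, 0.25]}]
print(dumps_fold_keys(payload, fold_keys={"marker", "x", "y"}))
```

Here "marker" has to be listed explicitly to stay on one line; combining key-based folding with a width rule would avoid that.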

One issue with max-width is that it requires the serialiser to look ahead to figure out if an object it’ll serialise later is long enough to warrant taking a newline now. That may be a performance killer?


[1] especially in text viewers with no soft wrap

@Nineteendo

Yes, I think so. Controlling by length isn’t always what we want; sometimes we want to keep massive things folded because they are unimportant and we don’t want to waste time scrolling past them.

So we could offer several ways to select the parts we want to keep folded. For example, besides the max_pretty I proposed and the max_width mentioned by Laurie, we could also introduce something like DOM or CSS selectors, but that would be quite complex: we’d need to figure out how to represent and implement it.

For now, we should probably stick with the two simple approaches, max_pretty and max_width, as the main solutions. These would already cover many cases. As I mentioned in my post, the key point is that adding max_pretty lets us stop using indent as the switch that enables pretty-printing, so that indent serves only its original purpose.

For finer-grained control and more complex implementation approaches, perhaps we should discuss each approach in a new post.

IMO, this is something for a third-party package on PyPI, which can iterate without worrying so much about backwards compatibility, and without requiring users to install development versions of Python. Plus, if the feature is merged to stdlib, the PyPI package would provide a backport for older versions of Python.
Many features/modules got into stdlib this way – see contextlib2, compileall2, tomli (which became tomllib), even asyncio.

Got it. I’ll start developing this library in a few days. Thanks for introducing these to me!

This is what the current docs say about the indent argument:

“JSON array elements and object members will be pretty-printed with that indent level. A positive integer indents that many spaces per level …”

Am I understanding it correctly, that it does not promise that each element/member will be printed on a separate line? For example, would this output (indent = 4 spaces per level) comply with the docs?

{
    "Command": "comment",
    "Args": [
        "text1", "", "", "", ""
    ]
}

No, Xitop, if you use json.dump(obj, fp, indent=4), the result is:

{
    "Command": "comment",
    "Args": [
        "text1",
        "",
        "",
        "",
        ""
    ]
}

It may technically comply with the docs. But it doesn’t comply with the behavior that was implemented in the last 18 years, meaning changing this is a no-go without a lot of discussion and good justification. (and that justification would have to be technical in nature, not just it looking prettier)

I was asking to see if this is an option we have. I have no preference in this regard and I do understand your concerns. Anyway, I guess that for an option promising “pretty-printed” output, the “prettiness” will remain an important point that is hard to measure.

jsonyx (a fork of the stdlib) provides max_indent_level and indent_leaves. The latter keeps arrays and objects of simple objects on one line. For performance reasons I didn’t want to implement max_width.

Thanks for sharing that.

paste | jsonyx format -l --indent=4 | bat --language json

This is going to make squinting at REST API payloads so much easier.
