Doctests failing with Python-3.15.0b1

I have a failing doctest trying to test 3.15.0b1; the tests work for versions <= 3.15.0a8. I think this is an issue with pprint, but need advice on how to get over the problem test.

>>> from pprint import pprint as pp
>>> pp(sorted(config.items('DEFAULT')))
[('dataSource.name', "'mydata_uat'"),
 ('dataSource.password', "'drongo'"),
 ('dataSource.user', "'gordon'")]
>>> pp(sorted(config.items('mychart')))
[('chart.valueAxis.labels.fontName', "'Helvetica'"),
 ('chart.valueAxis.labels.fontSize', '12'),
 ('dataSource.name', "'mydata_uat'"),
 ('dataSource.password', "'drongo'"),
 ('dataSource.user', "'gordon'"),
 ('height', '250'),
 ('someColor', 'red')]

When the tests are run, the second one fails and I see this:

   >>> pp(sorted(config.items('mychart')))
AssertionError: Failed example:
    pp(sorted(config.items('mychart')))
Expected:
    [('chart.valueAxis.labels.fontName', "'Helvetica'"),
     ('chart.valueAxis.labels.fontSize', '12'),
     ('dataSource.name', "'mydata_uat'"),
     ('dataSource.password', "'drongo'"),
     ('dataSource.user', "'gordon'"),
     ('height', '250'),
     ('someColor', 'red')]
Got:
    [
        ('chart.valueAxis.labels.fontName', "'Helvetica'"),
        ('chart.valueAxis.labels.fontSize', '12'),
        ('dataSource.name', "'mydata_uat'"),
        ('dataSource.password', "'drongo'"),
        ('dataSource.user', "'gordon'"),
        ('height', '250'),
        ('someColor', 'red'),
    ]

Testing shows that the pprint output did change between 3.15.0a8 and 3.15.0b1.

Is there some way to get a readable test that works across the a8 - b1 change or do I have to switch to using pp(calculated) == pp(literal) as the test and ask for a True value?

If this setting doesn’t do the trick, it’s straightforward to parse the expected outputs of the gathered doctest tests, before calling the doctest test runner.

Thanks for the try, but because the pprint output has different sequences of whitespace I can either get it working pre 3.15.0b1 or after.

so 3.14 pprint produces

[('chart.valueAxis.labels.fontName', "'Helvetica'"),
 …
 ('someColor', 'red')]

and 3.15.0b1 does

[
    ('chart.valueAxis.labels.fontName', "'Helvetica'"),
   ......
    ('someColor', 'red'),
]

and there is even a trailing comma difference.
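One possible workaround for the whitespace and trailing-comma differences is sketched below. It is not something proposed in this thread: it assumes a custom `doctest.OutputChecker` subclass (the name `LiteralChecker` is hypothetical) that falls back to comparing the expected and actual output as Python literals when the textual comparison fails. The `OLD` and `NEW` strings are shortened versions of the two layouts quoted above.

```python
import ast
import doctest


class LiteralChecker(doctest.OutputChecker):
    """Hypothetical checker: also accept output that is the same Python literal."""

    def check_output(self, want, got, optionflags):
        # First try doctest's normal textual comparison.
        if super().check_output(want, got, optionflags):
            return True
        # Fall back to comparing values, so layout and trailing commas stop mattering.
        try:
            return ast.literal_eval(want.strip()) == ast.literal_eval(got.strip())
        except (ValueError, SyntaxError):
            return False


# Old-style and new-style layouts as quoted in this thread (shortened).
OLD = "[('height', '250'),\n ('someColor', 'red')]\n"
NEW = "[\n    ('height', '250'),\n    ('someColor', 'red'),\n]\n"

plain = doctest.OutputChecker()
# Whitespace normalization alone cannot reconcile "[(" with "[ (" or the extra comma.
print(plain.check_output(OLD, NEW, doctest.NORMALIZE_WHITESPACE))  # False
print(LiteralChecker().check_output(OLD, NEW, 0))                  # True
```

Such a checker could be passed as `checker=` to `doctest.DocTestRunner`. The trade-off is that the doctest is no longer strictly WYSIWYG: any output that evaluates to the same literal passes.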

It’s easier to just compare the lists and use \ at the line ends
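A minimal sketch of the list-comparison approach, assuming a hypothetical `rows` variable that stands in for `sorted(config.items('mychart'))`. Note that `pp()` returns `None`, so `pp(a) == pp(b)` would be `True` for any inputs; comparing the data itself is the meaningful test. Inside the brackets, doctest continuation lines (`...`) need no backslash.

```python
import doctest

# Hypothetical stand-in for sorted(config.items('mychart')).
rows = [
    ("height", "250"),
    ("someColor", "red"),
]

EXAMPLE = """
>>> rows == [
...     ('height', '250'),
...     ('someColor', 'red'),
... ]
True
"""

# Parse and run the example with a plain doctest runner.
test = doctest.DocTestParser().get_doctest(EXAMPLE, {"rows": rows}, "compare", None, 0)
results = doctest.DocTestRunner(verbose=False).run(test)
print(results.failed, results.attempted)  # 0 1
```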

One of the changes in 3.15b1 is to improve the formatting for pprint and pformat, as described in “What’s new in Python 3.15” in the Python 3.15.0b1 documentation:

  • pprint now uses modern defaults: indent=4, width=88, and the default compact=False output is now formatted similar to pretty-printed json.dumps(). (Contributed by Stefan Todoran, Semyon Moroz and Hugo van Kemenade in gh-112632 and gh-149189.)

Changing output formats is clearly a huge advance, pity there isn’t a compatibility option which would support those that have pre-computed test results for comparison.

Calling something modern doesn’t make it better, but probably implies it might change.

I wrote doctest, and it’s dead serious about WYSIWYG. It was surprisingly divisive :wink: Some people love it, others hate it. And a bunch don’t think about it at all.

Nevertheless, many people rely on it, and changes in output are breaking changes for them. This isn’t just for tests, a number of tech docs (including entire books) use doctest to guarantee that all the examples they publish work exactly as shown.

For that reason, backward compatibility dictates that a new output format “should have” been introduced under a new module function instead, say pprint.pprint2(). That is, nothing about pprint.pprint() would change.

So if I were you I’d open a tracker issue and complain. If that doesn’t work out, while I don’t know anything about your config object, this might work instead:

>>> for obj in sorted(config.items(whatever)):
...   print(obj)
[and expected output appears here]

That is, don’t use pprint at all. The enclosing square brackets in your current output aren’t part of the data at all, but an artifact due to sorted() creating a list object. The trailing comma in the b1 output is also a consequence of that.
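The suggestion above can be sketched as a self-contained script. The `config` object here is a hypothetical stand-in built with `configparser`, with `optionxform` set to `str` on the assumption that the original preserves key case; only one default and two section options are included.

```python
import configparser
import doctest

# Hypothetical stand-in for the poster's config object.
config = configparser.ConfigParser()
config.optionxform = str  # keep option names case-sensitive
config.read_string(
    "[DEFAULT]\n"
    "dataSource.user = 'gordon'\n"
    "[mychart]\n"
    "height = 250\n"
    "someColor = red\n"
)

# One item per line via print(), so pprint's layout is never involved.
EXAMPLE = """
>>> for obj in sorted(config.items('mychart')):
...     print(obj)
('dataSource.user', "'gordon'")
('height', '250')
('someColor', 'red')
"""

test = doctest.DocTestParser().get_doctest(EXAMPLE, {"config": config}, "loop", None, 0)
results = doctest.DocTestRunner(verbose=False).run(test)
print(results.failed, results.attempted)  # 0 1
```

Because each tuple is printed with plain `repr()` formatting, the expected output does not depend on which pprint defaults the interpreter ships.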

I’m too old and lack the determination required for pushing back the future. I have just worked around the problem in the small number of tests that used pprint.

So I opened an issue on your behalf. You won’t be the end of this - you’re a canary in a coal mine here :slight_smile:

squawk, squawk :grinning:

As soon as I saw PR 149190 merged, I grepped my own projects to see if it would break my doctests.
I found a couple of occurrences and fixed them. I thought to myself “I should not have used pprint in this context, this was on me”. I checked the pprint docs, and nowhere is it mentioned that pprint’s output is supposed to be stable, though I agree that this type of “churn” is not great.

For the record I actually like the new pprint defaults, which better align with code formatters like black and ruff.

Yes, this is a “quality of implementation” issue, not a matter of guaranteed behaviors. But those count too: your “churn” is a good word for it. I like the new behaviors too, but not enough so to insist that countless thousands of other programmers may need to change their tests when moving to a new release. For docs like books including doctest examples using pprint, that can be a major expense (if they bother at all - lots of such docs will just become out of date).

The practical thing is to instead introduce a new function (say, pprint2) with the new defaults. Under the principle that, in such a widely used language, visible changes without need are always best avoided. It’s not like current pprint behavior is, in any sense, a bug magnet.

So, new defaults seems reasonable for you.

If they rely on defaults.

This will lead to an ugly API we will never be able to remove, no?

And people will still have to adapt their doctests to use the new function.

We could announce the change with a warning. But wouldn’t this break doctests as well?

On their own, sure.

Of course they do. Look at the PR. It changed over a dozen files because our own tests broke - which used all-default pprint() as the convenience function it is.

I don’t see where “ugly” comes from. We’d have two functions, and yes, they’d both stick around forever. I don’t care about the name. If you think “2” is ugly, try, e.g., pprint_new, pprint_modern, pprint_more_like_json, pprint_ex, …

Only if they want different defaults; they won’t have to change anything if they don’t. Up to them! It’s unreasonable on the face of it to insist that people rewrite published works (like examples in books).

I don’t follow. The dead obvious way not to break anything is not to change anything of what pprint() already did. You want different behavior, fine, ask for it explicitly. I would happily use pprint2() (whatever) in new tests I write, but am highly averse to burning time to rewrite tests that worked fine for years. Many of which were originally written in Python 2, and never changed again (in many such cases the implementations needed to change, but the visible input-output behavior did not).

As the years go by, backward compatibility becomes ever more important to the mushrooming user base. In the absence of repairing an actual bug, a breaking change is almost never justified anymore: the “cost” part of cost/benefit keeps on growing, multiplied by the growing number of users. “Why do they have two versions of pprint?” “Because they produce different output, each serving a large number of users” is not hard to understand.

No, I mean the highlighted sentence, not the naming. More precisely: “We’d have two functions for the same thing, and yes, they’d both stick around forever.”

Though, as we already have pp(), maybe it’s the way to go in the given case.

Sure, the dead obvious way to not break anything is to not make backward-incompatible changes :slight_smile: Or for a language to die.

How common will such breakages be? I’m not sure that the CPython test suite is a good representative.

Perhaps changing pp()'s defaults will introduce fewer breakages.

You misquoted me: “for the same thing” was your invention :wink:. They’re not the same because they produce different output. Which is the entire point of changing anything.

That Python might die because we add a new function isn’t credible enough to entertain. Annoying people with backward-incompatible changes is a more credible existential threat, but still mostly in fantasy-land.

It appears to be nonsense that changing defaults could break less than not changing defaults. The latter can’t break anything.

Nobody can say how much breakage would occur. We know for sure that one beta-release tester cared enough to post about it, and that our own tests broke. Not good signs.

Are we “typical”? Of course not. Some orgs will have less breakage, and others more. Our core devs are mostly unittest fans. But I worked for a company (Zope Corp) that only used doctests. People writing papers and books have scant choice if they want readable, verifiable Python examples. WYSIWYG is as self-evident as things get, and pprint perfectly fits doctest’s goal of using human-readable output with minimal bother.

Until it breaks for no reason other than that some people prefer differently formatted output.

How many? Nobody can answer that either. But for a backward-incompatible change, the burden of proof is properly on those who say “it won’t break enough to matter”.

Note that none of our tests or alpha/beta/rc releases found the memory-pressure problems inc gc introduced either. But they proved in real life to be so severe that we reverted inc gc in a point release.

In any case, it looks like the SC (via @barry) has decided to revert this change as currently implemented.

Just to be clear, the SC hasn’t voted on it, but I do think it should be reverted and would strongly advocate that position if it did come up for a vote because the breakage isn’t worth it right now for all the reasons Tim stated above.

I’m a big fan of doctests (another of @tim.one 's great ideas) and pretty print in PEP 813. I have some thoughts on a more comprehensive and flexible improvement, but it obviously can’t be done in time for 3.15. Let’s revert this now and work toward a better approach.

If “it might break doctests” was an unequivocal reason for disallowing changes, we’d never be able to change any object’s representation ever again.

The argument for books being affected is especially weak, as adding a new function means the books are still outdated (due to using the legacy API), but their automated tests are no longer alerting the authors to that fact.

The missing piece to me seems to be the lack of a convenient way to globally set the defaults in pprint, so folks affected by this could then choose between:

  • setting the defaults in their doctest setup code to match 3.14 and earlier
  • adapting their tests to match the new defaults
  • migrating away from relying on pprint output details in their test cases

(Edit: as of beta 1, they only have the latter two options, and I agree that’s an unacceptable place to be, especially for projects that need their tests to run on multiple Python versions)
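For the third option, one way to stop depending on pprint's layout is a small project-local formatter. The sketch below is an assumption for illustration, not a pprint API: `format_rows` is a hypothetical helper that always emits one `repr()` per line in the bracket layout shown earlier in this thread. It does not handle nesting or width-based wrapping.

```python
def format_rows(rows):
    """Hypothetical helper: one repr() per line, in the pre-3.15 bracket layout."""
    return "[" + ",\n ".join(repr(row) for row in rows) + "]"


rows = [("height", "250"), ("someColor", "red")]
print(format_rows(rows))
# [('height', '250'),
#  ('someColor', 'red')]
```

Unlike pprint, this never collapses a short list onto a single line, so existing expected outputs for short lists would still need a one-time update.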

The changes in question are about formatting of container object displays. There is no objectively “right” or “wrong” way to do that, just matters of fashion and taste. I happen, e.g., to vastly prefer that str(int) always produce underscores for ints that don’t fit in 3 decimal digits, but changing that would also be a breaking change (and for more reasons than just doctest).

There’s nothing in the least wrong about anyone relying on that str(int) produces a string with nothing but decimal digits, possibly preceded by a minus sign. The language has always done so, and probably always will.

The docs already warn doctest writers against relying on the output of floats, recommending that they stick to small exactly representable (in binary) floats, like 0.25. Not only for output stability, but also for human readability.
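A short illustration of the "exactly representable" point, using `fractions.Fraction` to expose the value a float actually stores. This is only a sketch to make the distinction concrete.

```python
from fractions import Fraction

# 0.25 is a power of two, so the float stores it exactly.
print(Fraction(0.25) == Fraction(1, 4))  # True
# 0.1 has no finite binary expansion, so the stored value is only close to 1/10.
print(Fraction(0.1) == Fraction(1, 10))  # False

print(0.25 + 0.5)  # 0.75
print(0.1 + 0.2)   # 0.30000000000000004
```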

For an example of a dubious change, for multiline list output, the new pprint adds a trailing comma before the closing square bracket. Why? It’s close to pointless best I can tell, and at first sight sure looks like it forgot to display the final list element.

I do the same for long list literals in my code, and often too for print() calls with many arguments across lines. But that’s for ease of changing the code, not at all because it’s “more readable”.

I don’t believe there’s any objective sense in which the old or new behaviors are better. De gustibus non est disputandum.

So I strongly object to calling the current behavior “the legacy API”. There is no sense in which the current behavior is “wrong”, and it’s not about the API anyway. It’s about the output produced.

Since there is (according to me) no reason to insist the current pprint output is wrong, there’s no reason to fault docs for not changing to some scheme you happen to like better instead. “Works forever” is an aspirational goal, but if it needs to break, please don’t do so over things so objectively trivial.

And then we need thread-local contexts, and then context managers, and …

“Global defaults” suck here for all the same reasons they’ve sucked everywhere else :wink:. The decimal module carried this to what may be an extreme, with its context objects also directly supplying methods for just about every function the module offers as functions.

WRT the first, someone who likes the new behaviors better could just as well set the defaults to what they prefer in their setup code. Although I don’t believe there is an option to control the “mystery trailing comma” change.

WRT the second, “ain’t broke, don’t fix” is usually best advice.

WRT the third, I’ll guess that you don’t use doctest :wink:. It’s visual by nature, and

>>> result == [1, 2, 3]
True

is a poorer doctest on several counts than

>>> result
[1, 2, 3]

People who use doctest “for real” think of resorting to the former as a kind of failure. Extra typing, extra noise, higher cognitive load.

I worry that your arguments would actually ban any incompatible change in formatting (repr, string formatting, pprint, whatever). In the end, all of this is about fashion and taste.

See e.g. the black documentation: they also add trailing commas for multiline data structure literals (lists, etc.). Rationale: smaller diffs. E.g. if you remove the last element in the former format, you have to change the previous line as well.

The issue mentioned a new keyword to select a “style” (to override the function defaults for other kwargs); perhaps that’s the way. But I think it’s not much better than just a set of itertools-like examples in the docs.

It’s fine if people use some shortcuts from such examples to get pprint alternatives. But I find it odd if people have to do this just to get better defaults than the legacy ones. Several functions are not much better. Now we have two, and a third is coming. Which one just does the job? :slight_smile:

Indeed, any backward-incompatible change will create otherwise needless work for many people, so should never be done without strong compensating advantages for even more people. The advantages of the new formatting of container objects strike me as minor. Lists, dicts, and tuples are extremely common in Python programs, and best I know pprint() has shown them the same way for 30 years. It’s not like we’re solving “a problem” here. An earlier post in this topic called the changes “churn”. I’m fonder of the new formats than they are, though :wink:

Not all cases of backward-incompatibility are equal. Nuance matters.

As I said, I also add a trailing comma in many cases of code I write, and for much the same reason: it makes adding and removing elements faster, easier, and less error-prone (smaller “by hand” diff).

But, again, it does not improve readability. And pprint’s job is readability. “The reason” doesn’t transfer across goals.

Ironic: the issue report noted that the new format is similar to pretty-printed JSON. But JSON never produces trailing commas - like current pprint(), commas only appear (in object and array displays) as item separators (never terminators).
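The JSON point can be checked with the standard `json` module: pretty-printed output has no trailing comma, and the parser rejects input that contains one. A minimal sketch:

```python
import json

text = json.dumps(["a", "b"], indent=4)
print(text)
# [
#     "a",
#     "b"
# ]

# A trailing comma before the closing bracket is not valid JSON.
try:
    json.loads('["a", "b",]')
    trailing_comma_accepted = True
except json.JSONDecodeError:
    trailing_comma_accepted = False

print(trailing_comma_accepted)  # False
```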

I’m all in favor of people hammering out ways to get the formatting they want. I’m all opposed to forcing incompatible defaults on everyone.

“More to their tastes”, sure, but preferences should not be elevated to universal imperatives in the core distribution. If, e.g., a company wants to mandate that a list can never be shown without a trailing comma, that’s up to them. It’s not the core’s job to impose their preferences on the world.

I don’t see signs of a “slippery slope” here. One function that honors 30 years of precedent, and however many others people can force adding :wink:

I keep forgetting about pp(), which on its own was precedent for adding a new function rather than change what pprint() produced. The only difference between them is the default used for the sort_dicts argument.
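A small sketch of that difference, writing both outputs to in-memory streams so they can be compared. The dict is deliberately tiny so that key order is the only thing that varies.

```python
import io
import pprint

data = {"b": 1, "a": 2}

# pprint() sorts dict keys by default (sort_dicts=True).
sorted_out = io.StringIO()
pprint.pprint(data, stream=sorted_out)

# pp() keeps insertion order by default (sort_dicts=False).
insertion_out = io.StringIO()
pprint.pp(data, stream=insertion_out)

print(sorted_out.getvalue(), end="")
print(insertion_out.getvalue(), end="")
```

Passing `sort_dicts=False` to `pprint.pprint()` gives the same key order as `pp()`.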