Reverting the incremental GC in Python 3.14 and 3.15

tim.one · April 17, 2026, 8:12pm

A seemingly eternal problem. Not optimistic. It’s why I said, in a different topic, that the best we can hope for is to stumble into new heuristics to worm around the real-world pathologies that pop up from time to time. Historically, almost all such cases I dealt with in sorting and pymalloc showed up first on Stack Overflow, with “random” users asking for help with “inexplicable slowdowns”.

When I was writing Python’s current sort, the only reports I got of timing results on various platforms came from people running the brief, synthetic, sortperf.py, which was part of the standard distribution at the time. Only one set of results from a real app running real data. They shared the result (“2x faster!”), but could not share their company’s data.

Quite recently I all but begged users to report just timing results (no data required) for a proposed change to the sorting algorithm. Your reply was the only one I got - and thank you for that .

Similar story for judging the string of collision resolution strategies the dict implementation has tried. Overwhelmingly driven by synthetic inputs, and all but the current strategy (which hasn’t changed in years) were eventually discarded for catastrophic behavior on rare reports from real-life apps. But in that case, it’s provable that “catastrophic” collections of keys always exist - the question is the much subtler one of “but how likely is real-life data to stumble into one?”.

Not unique to Python. Sebastian Wild. an academic who co-created the terrific “powersort” merge-ordering heuristic, has reported the extremely poor response to his persistent pleas for “real world data”. Mostly he just attracts contrived bad cases.

Whereas I early on opened an issue about seemingly quadratic-time behavior on the main branch. I had no idea gc changes were to blame at the time, and the whittled test case created essentially no cycles. It just provoked the heuristics at the time to run parts of gc far more often than reasonable.

Much of sorting has quite predictable worst-case O() behavior, but the gc context is much messier than that.

It is! No doubt about it. It’s of scant use in predicting “average” behavior (and there’s no such thing as “a typical app” to begin with), but can be of real help in fleshing out the limits of what might be seen in real life.

It does make a case for making it possible to easily try inc gc in a production release. Else the chance of getting any real-world feedback, ever, shrinks from “slim, and mostly only for catastrophic cases” to “essentially none”

Have to play the hands we’re dealt .