Reverting the incremental GC in Python 3.14 and 3.15

Python 3.14 shipped with a new incremental garbage collector. However, we’ve had a number of reports of significant memory pressure in production environments.

We’ve decided to revert it in both 3.14 and 3.15, and go back to the generational GC from 3.13.

3.15 is still in alpha, so such changes are fine. For 3.14, a change like this is unusual in a patch release, but the old GC is a known quantity, the new incremental GC didn’t go through the PEP process, and it was rolled back just before the final release of 3.13. We’ve discussed this in the core team and with the Steering Council.

If we want to reintroduce the incremental GC for 3.16, it can go through the regular PEP process and be more thoroughly evaluated.

Schedules:

  • 3.15: The first beta is scheduled for 2026-05-05, just under three weeks from now. If the revert is ready to release within the next week or so, we can put out an extra alpha 9.

  • 3.14: the next patch release 3.14.5 was planned for 2026-06-09, but we’ll release that early when the revert is ready.

I’ll update this topic and the release PEPs when those dates are known.

33 Likes

Would it be possible to include both GCs and let users choose one at startup, or would that be too costly maintenance-wise?

2 Likes

It’d be too costly. Having two GCs in 3.14 but just one in 3.13 and 3.15 would make maintenance harder, and would also be much riskier. This is the sort of thing that would need evaluating by any future PEP.

7 Likes

While it’s been a long time since I actively worked on CPython’s gc, that sounds right to me. It’s delicate code, and parts got much harder to follow when “clever tricks” were used to allow cutting the gc object pre-header struct from 3 members to 2. The simpler we can keep that code, the better for all.

Especially in a free-threading world, where corrupted memory due to races is likely to show up billions of cycles later, when gc finally gets around to touching every container object. That’s where memory corruption due to flawed extension modules often showed up even with a GIL.

2 Likes

But we could have two GCs in both 3.14 and 3.15. The old GC should be the default, but the new one could be available as an experimental option.

Unless this makes the code much more complicated.

2 Likes

That’s what I meant as well :slight_smile:

@nas and I have both created branches with -X flags to toggle between GC versions. While it is doable, maintaining both versions would increase long-term maintenance overhead.

Despite this, I’m +1 on having two versions in 3.16+.
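For the curious, a startup toggle like those prototype branches use could be read from `-X` options at runtime. This is only a sketch: the flag name `gc_kind` and its values are invented for illustration, not what either branch actually uses.

```python
import sys

# Hypothetical sketch of reading a "-X gc_kind=..." startup option via the
# real sys._xoptions mapping (CPython-specific). The flag name "gc_kind"
# and its values are invented for illustration.
kind = sys._xoptions.get("gc_kind", "generational")
if kind not in ("generational", "incremental"):
    raise SystemExit(f"unknown -X gc_kind value: {kind!r}")
print(f"GC selected at startup: {kind}")
```

Running `python -X gc_kind=incremental app.py` would then put `{'gc_kind': 'incremental'}` into `sys._xoptions`; with no flag, the default applies.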

4 Likes

In my prototype that allows choosing at startup, the incremental GC adds about 1600 lines of code. It’s not too bad in maintenance terms, since the code is quite well separated: keeping gc_inc.c and gc_gen.c as separate compilation units didn’t cause any measurable performance regression. Still, it is more code and more complication, so just going back to the old one is the safe and conservative thing to do.

Given our experience with trying to introduce a new GC, we should ideally make a new one opt-in and keep the old one as the default (e.g. for the 3.16 release). That’s what Java has done. OTOH, they have a team of researchers working on new GCs, while we have a couple of people tinkering in their spare time. BTW, we do have two GC implementations already: the free-threaded one is basically separate and different.

I ran some extra timing benchmarks last night. Based on those, the incremental GC does have smaller maximum GC pause times, so if you care about that, you might prefer it. The downsides, at least if a lot of cyclic garbage is being created, are that process memory use can be dramatically higher (5x was the worst case I saw) and runtime is slower (more time spent in GC, longer total execution time). In one run, I had a 1.3 ms max pause with the incremental GC versus a 26 ms max pause with the generational GC, but peak RSS was 2.7x higher with the incremental GC.
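As a rough illustration of how pause numbers like these can be gathered, the real `gc.callbacks` hook can time each collection. This is a sketch, not the harness used for the numbers above:

```python
import gc
import time

# Track the maximum GC pause using gc.callbacks (real CPython API):
# each callback is invoked with phase "start" before a collection and
# "stop" after it, so wall-clock timing the gap measures the pause.
_start = None
max_pause = 0.0

def _track(phase, info):
    global _start, max_pause
    if phase == "start":
        _start = time.perf_counter()
    elif phase == "stop" and _start is not None:
        max_pause = max(max_pause, time.perf_counter() - _start)

gc.callbacks.append(_track)

# Create some cyclic garbage so collections actually have work to do.
for _ in range(50_000):
    a, b = [], []
    a.append(b)
    b.append(a)

gc.collect()
gc.callbacks.remove(_track)
print(f"max GC pause: {max_pause * 1000:.2f} ms")
```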

BTW, Sergey has offered to create a PR to revert to the old GC. I’ll be reviewing and I would guess it won’t take too long since we both made prototypes already.

9 Likes

FTR, is that on a real-world workload or on a specifically-designed micro-benchmark that generates tons of cycles?

1 Like

It’s not real world, it’s synthetic, and it just creates a lot of cycles. Getting more realistic benchmarks, or more reports from people testing it on real apps, would be nice.

The pyperformance suite contains basically no interesting benchmarks in terms of exercising the cyclic GC in a realistic way (the benchmark programs don’t use much memory, or don’t run for very long, or don’t create any reference cycles). It has “gc_collect” and “gc_traverse”, but those are micro-benchmarks and not at all realistic. I recently added a new one, but it doesn’t create cycles either: it’s intended to test the overhead of GC while a large object graph is created. If you were tuning solely based on that, your conclusion would be to make the GC never run; with no cycles, it’s all pure overhead.

The most interesting case recently was a Sphinx slowdown. That had the two key features: creating a significant number of container objects and having at least some of those contain reference cycles. That slowdown was resolved.

In the absence of realistic benchmarks and real-world reports, I think an extensive set of synthetic benchmarks would be helpful. We can at least confirm that cyclic GC performance doesn’t degrade too much under the range of situations that those benchmarks cover.
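A minimal synthetic workload in the spirit described above might look like the following. The knobs (cycle count, extra payload, live-set size) mirror the kind of parameters such benchmarks sweep, but this is an illustrative sketch, not the actual benchmark used for the tables later in this thread.

```python
import gc
import time

# Sketch of a synthetic cyclic-GC workload: allocate container objects,
# make each participate in a reference cycle, and keep a bounded "live"
# set so dropped objects become cyclic garbage the collector must find.

def make_cycle(extra):
    node = {"next": None, "payload": [0] * extra}
    node["next"] = node          # self-referential cycle
    return node

def run(n_objects=100_000, extra=10, live=1_000):
    live_set = []
    t0 = time.perf_counter()
    for _ in range(n_objects):
        live_set.append(make_cycle(extra))
        if len(live_set) > live:
            live_set.pop(0)      # oldest node becomes cyclic garbage
    return time.perf_counter() - t0

elapsed = run()
print(f"elapsed: {elapsed:.3f}s, per-generation stats: {gc.get_stats()}")
```

Sweeping the three parameters, and recording elapsed time, peak RSS, and max pause alongside, gives a grid of results like the tables posted below in this thread.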

5 Likes

This is the one that triggered the revert in 3.13: the revert happened a couple of days after that issue was opened.

A seemingly eternal problem. Not optimistic. It’s why I said, in a different topic, that the best we can hope for is to stumble into new heuristics to worm around the real-world pathologies that pop up from time to time. Historically, almost all such cases I dealt with in sorting and pymalloc showed up first on Stack Overflow, with “random” users asking for help with “inexplicable slowdowns”.

When I was writing Python’s current sort, the only reports I got of timing results on various platforms came from people running the brief, synthetic sortperf.py, which was part of the standard distribution at the time. Only one set of results came from a real app running real data. They shared the result (“2x faster!”), but could not share their company’s data.

Quite recently I all but begged users to report just timing results (no data required) for a proposed change to the sorting algorithm. Your reply was the only one I got - and thank you for that :smiley:.

Similar story for judging the string of collision resolution strategies the dict implementation has tried. Overwhelmingly driven by synthetic inputs, and all but the current strategy (which hasn’t changed in years) were eventually discarded for catastrophic behavior on rare reports from real-life apps. But in that case, it’s provable that “catastrophic” collections of keys always exist - the question is the much subtler one of “but how likely is real-life data to stumble into one?”.

Not unique to Python. Sebastian Wild, an academic who co-created the terrific “powersort” merge-ordering heuristic, has reported an extremely poor response to his persistent pleas for “real world data”. Mostly he just attracts contrived bad cases.

Whereas I early on opened an issue about seemingly quadratic-time behavior on the main branch. I had no idea gc changes were to blame at the time, and the whittled test case created essentially no cycles. It just provoked the heuristics at the time to run parts of gc far more often than reasonable.

Much of sorting has quite predictable worst-case O() behavior, but the gc context is much messier than that.

It is! No doubt about it. It’s of scant use in predicting “average” behavior (and there’s no such thing as “a typical app” to begin with), but can be of real help in fleshing out the limits of what might be seen in real life.

It does make a case for making it possible to easily try inc gc in a production release. Else the chance of getting any real-world feedback, ever, shrinks from “slim, and mostly only for catastrophic cases” to “essentially none” :frowning:

Have to play the hands we’re dealt :smiling_face:.

3 Likes

Process wise since we know this is a large change for a patch release, do we still have the ability to do release candidates for patch releases to enable some broader testing before declaring it stable? 3.14.5rc1 for example?

2 Likes

My take: unconditional inc gc is too risky for a patch release. The cardinal rule for those is “first do no harm”. That the change is large is less important than that:

  • it’s in a fundamental area of the implementation, which affects all programs (even those that never create a cycle)
  • parts of the code are subtle and delicate
  • it’s an area with a long history of producing highly app-dependent performance “surprises”

The pre-release history of this change is no exception so far, although I think all surprises to date have been discovered by core devs (the “Sphinx report”, for example, was the result of heroic debugging efforts by @AlexWaygood).

And Neil’s synthetic tests establish without doubt that much worse surprises are still possible.

So, “too risky” for my tastes.

@nas seems to believe that a startup option to enable inc gc is doable with reasonable effort. If so, I like that:

  • with luck, no visible changes by default
  • supplies a way for motivated users to get real-world “both ways” results with minimal hassle, potentially greatly increasing the amount of real-world data we can hear back about. Will that actually happen? No, probably not :frowning:.

When I was writing Python’s current list.sort(), for development testing I supplied a patch that added it as a new method of list. Comparing “before” and “after” just required those who cared to change one letter in their Python driving code.

That was effective, and made my testing life a lot easier too.

A startup option could make life similarly easier for comparative gc investigations.

4 Likes

I’m put off that the blurb only mentions an upside (reduced max gc pause times) but no downsides:

I look to NEWS for information, not a happy-talk sales pitch :wink:

I personally don’t care about pause times in most of my apps. Some of them can run for days to complete, and even 5% longer would matter to me: 3 or 4 extra hours of waiting. These are typically doing research, and I need the results to inform directions to try next. Although, ya, I typically disable gc for hours on end, in phases I know won’t be creating enough (if any) cycles to care about.

Not saying everyone “should be” like me in this respect, and I fully understand that reducing gc pause times is very important to some others’ apps. Am saying it’s important to be up-front about tradeoffs - and even better to discuss them broadly before they’re made. Which the PEP process would address - but somehow that seems like a very heavy process for an internal implementation change to carry.

Then again, most implementation changes don’t come with significant potential downsides.

So no easy answers here from me, just a hope that we can do better at “full disclosure” in the future.
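The “disable gc during cycle-free phases” trick mentioned above can be packaged as a small context manager using the real `gc.disable()`/`gc.enable()` API, so the collector comes back even if the phase raises:

```python
import gc
from contextlib import contextmanager

# Sketch: temporarily pause the cyclic GC for a phase of work that
# creates few or no reference cycles, restoring the prior state on exit.
@contextmanager
def gc_paused():
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()

with gc_paused():
    # long-running phase that creates no (or few) reference cycles
    total = sum(i * i for i in range(1_000_000))
```

Note that reference counting still frees acyclic garbage while the collector is disabled; only cycle detection is paused, so memory for true cycles accumulates until `gc.enable()` (or an explicit `gc.collect()`).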

3 Likes

Hmm, we’ve not done that since 3.9.2rc1 in 2021. A quick demo build seemed to work, though I stopped short of uploading anything to the servers, so that whole other half of the process might have grown assumptions about there being no RC after a final release. I also didn’t try macOS or Windows builds. It would probably be a case of just doing it and fixing things up as needed.

1 Like

Nit pick: the Sphinx report had been discovered earlier by a contributor, but it was only during the Bellevue sprint we found the root cause. Having lots of us in the same room definitely helped! (Thanks, Meta!)

2 Likes

And great work! Thanks for clarifying.

Another issue I haven’t seen addressed: people are already living with 3.14. Based on my conviction that “first do no harm” is the cardinal rule for patch releases, reverting inc gc can also harm them. In particular, those who’ve put in possibly substantial work to pick values for gc.set_threshold() that work well for their apps with inc gc (and doing so appears to be an effective mitigation for those whose apps were hurt by the switch to inc gc) may see that effort backfire when going back to the 3-generation collector.

While there’s no way for me to know, my impression is that most people who went down that path didn’t muck with threshold1, but reduced threshold0 to make gen1 collections more frequent (and so collect longer-lived cycles sooner). That’s the wrong thing to do for the older 3-gen collector: it collects all the cycles there are every time a gen2 collection is done, while inc gc only collects a fraction of them per gen1 try, so to “get the same effect” gen1 collections have to be done more often under inc gc.
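For reference, the thresholds being discussed are controlled through the real `gc.get_threshold()`/`gc.set_threshold()` API. The values below are illustrative, not recommendations, and as noted above the practical effect of each slot differs between the generational and incremental collectors:

```python
import gc

# Read and adjust the collector thresholds (real CPython API). Under the
# classic generational collector, threshold0 counts allocations minus
# deallocations before a gen0 pass, while threshold1/threshold2 count how
# many collections of the next-younger generation trigger the next one up.
t0, t1, t2 = gc.get_threshold()
print(f"current thresholds: {t0}, {t1}, {t2}")

gc.set_threshold(2_000, t1, t2)   # illustrative value, not a recommendation
assert gc.get_threshold()[0] == 2_000

gc.set_threshold(t0, t1, t2)      # restore the original settings
```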

From that view, perhaps it’s least harmful overall to keep inc gc the default (it’s already what’s out there), but add a startup option to switch to the older 3-gen collector (acknowledging that inc gc is still delivering unpleasant surprises for some apps).

No pure win to be had here, alas.

1 Like

I made this. Better than a sharp stick in the eye, I suppose. I already found a pretty serious issue with the 3.14t GC, working on a fix for that.

6 Likes

Some results from that tool, 3.13 vs 3.14:

base=/usr/bin/python3  vs  new=./py-3.14/bin/python
   cycle    extra     live     t(s)      t%      rss    rss%     trash    trash%    pause   pause%  peaked
----------------------------------------------------------------------------------------------------------
      10        0      100     1.52   +23.8      17M     +29       24k      +398     1.35      +68     yes
      10        0    10.0k     1.72   +18.6      24M     +38      181k      +124     1.07      -80     yes
      10        0    30.0k     1.71   +12.0      27M     +14      231k        +8     2.31      -77     yes
      10    10.0k      100     1.88   +23.7      41M    +135       24k      +398     1.52      +84     yes
      10    10.0k    10.0k     2.51   +22.1     198M    +108      181k      +124     1.70      -75     yes
      10    10.0k    30.0k     2.60    +9.1     250M     +13      231k        +8     2.56      -82     yes
      10   100.0k      100     6.17   +67.5     251M    +360       24k      +398     1.80       +7     yes
      10   100.0k    10.0k     7.81    -2.6     1.8G    +135      190k      +134     1.70      -89     yes
      10   100.0k    30.0k     7.96   -12.0     2.2G     +13      231k        +8     3.11      -91     yes
      10   300.0k      100    17.09   +15.3     717M    +469       24k      +398     2.47      -27     yes
      10   300.0k    10.0k    19.02    -4.0     5.1G    +125      181k      +124     1.96      -93     yes
      10   300.0k    30.0k    19.42    -7.5     6.5G     +14      231k        +8     2.86      -92     yes
     100        0      100     1.13   +22.9      17M     +30       28k      +431     1.27      +47     yes
     100        0    10.0k     1.27   +21.1      25M     +41      191k      +135     1.54      -63     yes
     100        0    30.0k     1.29   +16.4      27M     +24      231k        +8     1.79      -80     yes
     100    10.0k      100     1.18   +22.0      20M     +44       28k      +431     1.29      +45     yes
     100    10.0k    10.0k     1.35   +20.2      43M     +77      191k      +135     1.40      -68     yes
     100    10.0k    30.0k     1.44   +20.9      49M     +15      231k        +8     2.38      -77     yes
     100   100.0k      100     2.19   +29.3      38M    +138       28k      +431     1.72      +81     yes
     100   100.0k    10.0k     1.89   +24.3     206M    +121      191k      +135     1.57      -70     yes
     100   100.0k    30.0k     1.97    +2.4     247M     +14      231k        +8     2.06      -85     yes
     100   300.0k      100     3.49   +31.2      53M    +142       28k      +431     2.06      +96     yes
     100   300.0k    10.0k     3.06   +10.3     567M    +130      191k      +135     1.16      -88     yes
     100   300.0k    30.0k     3.19    +1.4     683M     +15      231k        +8     2.47      -80     yes
    1.0k        0      100     1.10   +16.2      21M     +48      111k      +383     0.84      -41     yes
    1.0k        0    10.0k     1.24   +20.2      26M     +45      207k      +123     1.47      -71     yes
    1.0k        0    30.0k     1.27   +17.7      29M     +31      268k       +20     2.29      -74     yes
    1.0k    10.0k      100     1.20   +28.4      22M     +52      111k      +383     1.27       +3     yes
    1.0k    10.0k    10.0k     1.28   +23.6      28M     +48      207k      +123     1.44      -67     yes
    1.0k    10.0k    30.0k     1.26   +11.6      31M     +30      268k       +20     1.53      -86     yes
    1.0k   100.0k      100     1.18   +13.8      31M    +102      111k      +383     2.80      +75     yes
    1.0k   100.0k    10.0k     1.28   +22.7      45M     +71      207k      +123     1.09      -68     yes
    1.0k   100.0k    30.0k     1.30   +11.1      53M     +26      268k       +20     1.50      -82     yes
    1.0k   300.0k      100     1.26    -0.3      50M    +196      111k      +383     1.35       -0     yes
    1.0k   300.0k    10.0k     1.45   +26.4      84M     +93      207k      +123     1.51      -68     yes
    1.0k   300.0k    30.0k     1.48   +12.1     102M     +27      268k       +20     2.17      -83     yes

Legend (base vs new, matched by cycle/extra/live):
  t(s)       total time for new build
  t%         percent change in time vs base, (new-base)/base*100
  rss        peak RSS for new build
  rss%       percent change in peak RSS vs base
  trash      max uncollected cyclic-garbage for new build
  trash%     percent change in max trash vs base
  pause      max GC pause (ms) for new build
  pause%     percent change in max GC pause vs base
  peaked     yes if new build RSS and trash peaked before final 25% of run

And then 3.13 vs 3.14t (there is a bug with RSS-based deferral of full collections):

base=/usr/bin/python3  vs  new=./py-3.14t/bin/python
   cycle    extra     live     t(s)      t%      rss    rss%     trash    trash%    pause   pause%  peaked
----------------------------------------------------------------------------------------------------------
      10        0      100     1.33    +8.3      34M    +150       93k     +1786     4.66     +479     yes
      10        0    10.0k     1.34    -7.3      34M     +89      100k       +24     4.51      -15     yes
      10        0    30.0k     1.36   -10.9      36M     +51      130k       -40     6.09      -40     yes
      10    10.0k      100     2.22   +46.3     126M    +628       91k     +1746     6.81     +726     yes
      10    10.0k    10.0k     2.25    +9.2     138M     +44      100k       +24     7.78      +14     yes
      10    10.0k    30.0k     2.23    -6.8     170M     -24      130k       -40     7.86      -45     yes
      10   100.0k      100     8.09  +119.7     1.2G   +2084       91k     +1746     7.36     +339     yes
      10   100.0k    10.0k     8.13    +1.4     1.3G     +70      100k       +24     7.44      -50     yes
      10   100.0k    30.0k     8.13   -10.2     1.6G     -17      125k       -42     8.75      -76     yes
      10   300.0k      100    19.83   +33.8     2.8G   +2194       91k     +1746     8.10     +138     yes
      10   300.0k    10.0k    19.76    -0.2     3.1G     +38      100k       +24     8.39      -71     yes
      10   300.0k    30.0k    20.24    -3.6     3.9G     -32      125k       -42    10.68      -69     yes
     100        0      100     1.01    +9.0      34M    +150       93k     +1683     4.04     +368     yes
     100        0    10.0k     1.01    -3.5      34M     +90      100k       +23     4.71      +13     yes
     100        0    30.0k     1.02    -8.1      36M     +65      131k       -39     5.40      -40     yes
     100    10.0k      100     1.10   +13.7      42M    +202       91k     +1642     6.69     +656     yes
     100    10.0k    10.0k     1.07    -4.7      42M     +71      100k       +23     5.46      +25     yes
     100    10.0k    30.0k     1.08    -9.8      48M     +12      130k       -40     5.54      -46     yes
     100   100.0k      100     1.71    +1.1     146M    +814       91k     +1642     5.21     +448     yes
     100   100.0k    10.0k     1.72   +13.3     172M     +84      100k       +23     5.57       +8     yes
     100   100.0k    30.0k     1.71   -11.3     189M     -13      125k       -42     6.51      -51     yes
     100   300.0k      100     2.96   +11.3     332M   +1399       91k     +1642     5.35     +410     yes
     100   300.0k    10.0k     2.95    +6.1     364M     +47      100k       +23     8.84       -7     yes
     100   300.0k    30.0k     2.94    -6.5     460M     -23      130k       -39     7.42      -40     yes
    1.0k        0      100     1.02    +7.4      34M    +133       93k      +304     4.15     +189     yes
    1.0k        0    10.0k     1.03    -0.1      34M     +89      100k        +8     4.41      -12     yes
    1.0k        0    30.0k     1.05    -2.9      36M     +64      130k       -42     5.09      -42     yes
    1.0k    10.0k      100     1.03   +10.4      34M    +130       93k      +304     4.51     +267     yes
    1.0k    10.0k    10.0k     1.03    -0.8      34M     +80      100k        +8     4.50       +3     yes
    1.0k    10.0k    30.0k     1.07    -5.7      36M     +50      129k       -42     5.76      -48     yes
    1.0k   100.0k      100     1.06    +2.4      44M    +185       93k      +304     4.52     +182     yes
    1.0k   100.0k    10.0k     1.07    +2.4      44M     +65      100k        +8     4.47      +33     yes
    1.0k   100.0k    30.0k     1.09    -7.1      50M     +18      124k       -44     5.19      -36     yes
    1.0k   300.0k      100     1.22    -2.9      76M    +351       93k      +304     4.64     +241     yes
    1.0k   300.0k    10.0k     1.25    +9.1      76M     +73      100k        +8     5.18      +11     yes
    1.0k   300.0k    30.0k     1.25    -5.3      76M      -6      125k       -44     5.78      -55     yes

After my fix is applied to 3.14t, which fixes the RSS deferral bug:

base=/usr/bin/python3  vs  new=/home/nas/src/cpython/python
   cycle    extra     live     t(s)      t%      rss    rss%     trash    trash%    pause   pause%  peaked
----------------------------------------------------------------------------------------------------------
      10        0      100     2.00   +62.7      27M    +102       23k      +364     1.83     +127     yes
      10        0    10.0k     2.04   +41.2      27M     +53       34k       -58     1.77      -67     yes
      10        0    30.0k     2.08   +36.5      29M     +24       47k       -78     2.59      -75     yes
      10    10.0k      100     2.87   +88.8      31M     +80        6k       +20     1.22      +48     yes
      10    10.0k    10.0k     3.06   +48.9      43M     -55       18k       -78     1.24      -82     yes
      10    10.0k    30.0k     3.31   +38.8      73M     -67       42k       -81     2.67      -81     yes
      10   100.0k      100     8.59  +133.4     105M     +93        6k       +20     1.27      -24     yes
      10   100.0k    10.0k     9.03   +12.6     265M     -66       18k       -78     1.72      -89     yes
      10   100.0k    30.0k     9.42    +4.1     583M     -70       42k       -81     3.57      -90     yes
      10   300.0k      100    20.48   +38.2     211M     +67        6k       +20     2.04      -40     yes
      10   300.0k    10.0k    20.90    +5.5     616M     -73       18k       -78     2.13      -93     yes
      10   300.0k    30.0k    21.20    +1.0     1.3G     -77       42k       -81     3.82      -89     yes
     100        0      100     1.40   +51.5      27M    +102       23k      +338     1.52      +76     yes
     100        0    10.0k     1.45   +38.1      27M     +54       34k       -58     2.27      -46     yes
     100        0    30.0k     1.45   +30.4      29M     +35       47k       -78     2.54      -72     yes
     100    10.0k      100     1.86   +93.3      27M     +97       12k      +121     1.18      +33     yes
     100    10.0k    10.0k     1.96   +74.4      27M     +11       17k       -79     1.08      -75     yes
     100    10.0k    30.0k     1.94   +62.4      31M     -27       42k       -81     2.09      -79     yes
     100   100.0k      100     2.22   +31.1      33M    +108        6k       +13     1.00       +5     yes
     100   100.0k    10.0k     2.24   +47.6      47M     -49       17k       -79     1.05      -80     yes
     100   100.0k    30.0k     2.39   +24.1      73M     -66       42k       -81     2.37      -82     yes
     100   300.0k      100     2.85    +7.0      45M    +104        6k       +13     1.42      +35     yes
     100   300.0k    10.0k     3.45   +24.4      83M     -66       18k       -78     1.16      -88     yes
     100   300.0k    30.0k     3.63   +15.5     169M     -72       42k       -81     2.43      -80     yes
    1.0k        0      100     1.45   +53.8      27M     +88       25k        +9     1.54       +7     yes
    1.0k        0    10.0k     1.47   +43.0      27M     +53       34k       -64     1.69      -66     yes
    1.0k        0    30.0k     1.48   +37.1      29M     +34       47k       -79     1.85      -79     yes
    1.0k    10.0k      100     1.51   +61.8      27M     +85       19k       -17     1.23       -0     yes
    1.0k    10.0k    10.0k     1.51   +45.3      27M     +45       30k       -68     1.83      -58     yes
    1.0k    10.0k    30.0k     1.54   +36.6      27M     +14       56k       -75     2.03      -82     yes
    1.0k   100.0k      100     1.97   +90.5      27M     +77       11k       -52     1.26      -21     yes
    1.0k   100.0k    10.0k     1.88   +81.1      29M     +10       17k       -82     1.01      -70     yes
    1.0k   100.0k    30.0k     1.93   +64.9      31M     -26       41k       -82     1.96      -76     yes
    1.0k   300.0k      100     2.05   +62.3      29M     +73        7k       -70     1.13      -17     yes
    1.0k   300.0k    10.0k     2.05   +78.1      33M     -24       18k       -81     1.43      -69     yes
    1.0k   300.0k    30.0k     1.98   +50.3      41M     -49       41k       -82     2.01      -84     yes
2 Likes