Generation 2 GC events occur frequently and regularly collecting zero objects

rohitsanj · September 19, 2023, 5:47am

We recently tried tweaking the GC threshold values in our FastAPI server running Python 3.11.5 from the default of (700, 10, 10) to incredibly high values of (100_000, 500_000, 1_000_000). From the screenshot below showing generation 2 GC event logs in reverse chronological order, you can see that the only time we logged a non-zero collected value was when I manually ran gc.collect() (generation kwarg defaults to 2) in the main event loop (using a live aiomonitor console session).

All the other logs before and after this show zero collected objects. What’s more peculiar is that they occur almost periodically at 3-4 minute intervals.

Why are the generation 2 GC events occurring so frequently and reporting zero collected objects consistently?

Here’s the code for the log callback function we appended to gc.callbacks:

    def log_gc_lifecycle_event(phase: str, info: dict):
        nonlocal gc_start_time

        now = time.monotonic()

        if info['generation'] != 2:
            # We only care about gen 2 collections.
            return

        info_to_log = {
            'gc_phase': phase,
            'generation': info['generation'],
        }

        if phase == 'start':
            # Retain pre-GC figures.
            gc_start_time = now

        if phase == 'stop':
            duration = now - gc_start_time
            info_to_log.update(
                {
                    'duration': duration,
                    'collected': info['collected'],
                    'uncollectable': info['uncollectable'],
                    'generation_2_object_count': len(gc.get_objects(2)),
                }
            )

            gc_start_time = None

        gc_logger.info(f'Garbage collection event: {phase}', **info_to_log)

nas · September 19, 2023, 5:20pm

Can you determine if something is calling gc.collect() explicitly? That would explain why it happens more often than expected. Given your adjusted thresholds, it should not be happening that often. One way to do it might be to “monkey-patch” the gc.collect() function with your own function (patch is early in startup, before other modules are loaded) and then log a traceback to see what code is calling it. The only other way a generation 2 collection should happen that frequently is at Python process exit. I don’t know if your FastAPI server forks/spawns processes but I would guess not.

rohitsanj · September 19, 2023, 11:02pm

Thanks for your reply! Monkey patching gc.collect led us down the right path to figuring out the root cause for this oddity of seeing regular GC events given such high thresholds.

Here’s the situation: We had GC threshold adjustment code in the on_startup FastAPI lifecycle hook and GC logging code in a separate utility function. One of the places where we called this function was in the module where we initialized the FastAPI app. This caused us to mistakenly associate all GC logs from the container to the hypercorn worker process, not realizing that there’s a separate “main” hypercorn process as well, whose GC logs correspond to the default GC threshold.