I’m trying to understand the expected memory behavior of a Python service (FastAPI in my case, but the question is more general).
Scenario:
Baseline memory: service starts around ~600 MB.
During load: memory usage rises significantly (e.g., 1 GB+ under heavy requests).
After load ends: memory doesn’t return to baseline. Instead, it stabilizes at a higher level (say ~900 MB) and stays there.
My questions are:
1. Should memory return all the way back to the baseline once the load is gone, or is it normal for it to stabilize at a higher “working set” because of Python’s allocator, pools, or library caches?
2. If I run a second load test at the same traffic level, should memory rise further, or will it reuse what was already allocated?
3. How do I distinguish normal plateau behavior from an actual memory leak? For example:
Case 1: Memory rises under load, then flattens at a higher level than baseline and remains stable → is this normal?
Case 2: Memory rises under load and keeps growing even after load ends, or drifts upward across test cycles → is this a leak?
Looking for guidance on what “healthy” memory usage looks like in Python services, and how to tell when to worry.
Please note it’s a FastAPI microservice, dockerized and running in a Kubernetes pod; the Python version is 3.10.1.
If there’s a leak, the memory usage will continue increasing. It might fluctuate up and down, but the overall trend will be upwards. However, if it’s a small leak, it might take a long time before such a trend becomes noticeable.
I’d expect that the memory usage would stabilise at some level or, at least, remain below some size.
If it increased a little on a second load, I wouldn’t be too concerned, but if it increased again on a third load, and again on a fourth load, etc., then I’d start looking for a possible leak. The bigger the increase each time, the more likely it is that there’s a leak somewhere.
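One practical way to watch for that trend is to record the process RSS after each identical load cycle and see which way it moves. A minimal sketch, assuming psutil is available in the container; `run_load_cycle` is just a stand-in for your own load driver:

```python
import gc
import psutil  # assumption: psutil is installed in the container

def rss_mb() -> float:
    """Resident set size of the current process, in MB."""
    return psutil.Process().memory_info().rss / 1024 ** 2

def run_load_cycle() -> None:
    """Placeholder: replace with whatever drives your real load test."""
    _ = [bytearray(1024) for _ in range(100_000)]  # simulated allocation churn

samples = []
for cycle in range(5):
    run_load_cycle()
    gc.collect()  # collect before sampling so cycles are comparable
    samples.append(rss_mb())
    print(f"cycle {cycle + 1}: {samples[-1]:.0f} MB")

# A jump on the first cycle followed by a flat line is the normal plateau;
# memory that climbs on every cycle is the pattern worth investigating.
```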
It stabilizing at a higher baseline after heavy load is not unexpected. This could, for example, be caused by heap fragmentation: long-lived objects created during the period of high load can keep pages from being released even if they are mostly unused. An extra 300 MB is quite a bit, but if your program does a lot of allocations and deallocations it’s possible.
Again, the question is what happens under repeated high loads in the same process? A single high load doesn’t tell you anything except maybe max memory usage.
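If you want a rough look at how much allocator space stays reserved after a load cycle, CPython can dump its small-object allocator (pymalloc) statistics. This is CPython-specific and the output is fairly low-level, but comparing the arena counts taken before and after a cycle can hint at fragmentation:

```python
import sys

# CPython-only: dumps pymalloc statistics (arenas, pools, blocks) to stderr.
# Capturing this before and after a load cycle shows roughly how much arena
# space the allocator is still holding once the load has finished.
sys._debugmallocstats()
```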
Since the interpreter manages the memory, I don’t know how you could get a leak per se. Garbage collection is based on reference counting, so if you are keeping references to unused objects inside a collection you might cause one. But Python not releasing memory back to the system is a known thing.
If the algorithm adds an object to a list on each load and does not remove it from the list at the end of processing, you will see the Python process’s memory use go up and up.
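As a made-up illustration of that pattern, a module-level list that only ever gets appended to is enough to produce the upward drift:

```python
# Deliberately leaky sketch: every request appends to a module-level list
# and nothing ever removes entries, so the list (and everything it
# references) grows with each load cycle.
_results: list[dict] = []

def handle_request(payload: dict) -> dict:
    result = {"echo": payload, "blob": bytearray(10_000)}
    _results.append(result)  # reference kept forever -> memory only goes up
    return result
```

The fix is simply to drop or bound those references, e.g. clear the list at the end of processing or use a size-limited cache.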
We’re simulating a load of 10 users, 500 requests in total. Worth mentioning: the microservice makes asynchronous API calls (LLM calls to OpenAI models) and creates asynchronous objects as well. What we’re making sure of is closing the asynchronous API call connections and any asynchronous objects, doing an explicit gc.collect(), and using a global session rather than a new session per request, and closing that session.
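If you want to confirm that those cleanups really keep Python-level allocations flat, one option is to diff tracemalloc snapshots taken between two identical load cycles. A minimal sketch, with the load cycles themselves left as placeholders:

```python
import tracemalloc

tracemalloc.start()

# ... run the first load cycle here, then snapshot ...
before = tracemalloc.take_snapshot()

# ... run a second, identical load cycle here, then snapshot again ...
after = tracemalloc.take_snapshot()

# Allocation sites that keep growing between identical cycles are leak
# candidates; a healthy plateau shows up as small or shrinking diffs.
for stat in after.compare_to(before, "lineno")[:10]:
    print(stat)
```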
In general the Python interpreter will hold on to the memory (or at least a large part of it) for future use once it has been allocated. It will then reuse this memory as needed rather than asking the system for additional memory while it has that available.
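You can see this directly: free a large structure and the process RSS often does not drop back to the original baseline, because freed blocks go back to Python’s allocator (and to malloc) rather than necessarily to the operating system. A small sketch, again assuming psutil:

```python
import gc
import psutil  # assumption: psutil is installed

def rss_mb() -> float:
    return psutil.Process().memory_info().rss / 1024 ** 2

print(f"baseline:            {rss_mb():.0f} MB")

data = [str(i) for i in range(2_000_000)]  # lots of small, allocator-managed objects
print(f"after allocation:    {rss_mb():.0f} MB")

del data
gc.collect()
# RSS often does not fall all the way back to baseline: the interpreter keeps
# part of the freed memory for reuse, which is the higher "working set"
# described in the question.
print(f"after del + collect: {rss_mb():.0f} MB")
```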