After a load test, should memory return to baseline or stabilize higher? How to tell normal behavior from a leak?

Hi all,

I’m trying to understand the expected memory behavior of a Python service (FastAPI in my case, but the question is more general).

Scenario:

Baseline memory: service starts around ~600 MB.

During load: memory usage rises significantly (e.g., 1 GB+ under heavy requests).

After load ends: memory doesn’t return to baseline. Instead, it stabilizes at a higher level (say ~900 MB) and stays there.

My questions are:

1. Should memory return all the way back to the baseline once the load is gone, or is it normal for it to stabilize at a higher “working set” because of Python’s allocator, pools, or library caches?

2. If I run a second load test at the same traffic level, should memory rise further, or will it reuse what was already allocated?

3. How do I distinguish normal plateau behavior from an actual memory leak? For example:

Case 1: Memory rises under load, then flattens at a higher level than baseline and remains stable → is this normal?

Case 2: Memory rises under load and keeps growing even after load ends, or drifts upward across test cycles → is this a leak?

Looking for guidance on what “healthy” memory usage looks like in Python services, and how to tell when to worry.

Please note it’s a FastAPI microservice, dockerized and running in a Kubernetes pod, and the Python version is 3.10.1.

Thanks!

The Python interpreter will hold the memory and not return it to the system. It will then use that memory prior to requesting more from the system.
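
You can watch this happen from inside the process. Here is a minimal sketch, assuming Linux (it reads /proc/self/status, which fits the Docker/Kubernetes setup described above):

```python
import gc

def rss_mb() -> float:
    """Current resident set size in MB, read from /proc (Linux only)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024  # the value is reported in kB
    raise RuntimeError("VmRSS not found")

print(f"baseline:   {rss_mb():.1f} MB")

# Simulate a burst of work: lots of small, short-lived objects.
data = [{"i": i, "payload": "x" * 50} for i in range(500_000)]
print(f"under load: {rss_mb():.1f} MB")

del data
gc.collect()
# RSS often does not drop back to the baseline: pymalloc only returns an
# arena to the OS once it is completely empty, so partially used pages stay
# resident and are reused for later allocations instead.
print(f"after load: {rss_mb():.1f} MB")
```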

@brass75 Are you saying that what I am seeing is expected? And what exactly would define a memory leak in the two scenarios I mentioned?


If there’s a leak, the memory usage will continue increasing. It might fluctuate up and down, but the overall trend will be upwards. However, if it’s a small leak, it might take a long time before such a trend becomes noticeable.

@MRAB Are you saying that what I am seeing is expected? Which of the two scenarios I mentioned would count as a memory leak?


I’d expect that the memory usage would stabilise at some level or, at least, remain below some size.

If it increased a little on a second load, I wouldn’t be too concerned, but if it increased again on a third load, and again on a fourth load, etc., then I’d start looking for a possible leak. The bigger the increase each time, the more likely it is that there’s a leak somewhere.
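
In practice that means running the same load several times against one long-lived process and comparing memory after each cycle. A rough sketch of that check, where run_load_cycle and rss_mb are placeholders for whatever drives your load tool and reads resident memory (for example the /proc-based helper shown earlier):

```python
import gc
from typing import Callable

def check_for_drift(run_load_cycle: Callable[[], None],
                    rss_mb: Callable[[], float],
                    cycles: int = 5) -> None:
    """Fire identical load cycles at one long-lived process and report memory after each."""
    readings = []
    for n in range(cycles):
        run_load_cycle()   # placeholder: send the same traffic every cycle
        gc.collect()       # rule out garbage that simply hasn't been collected yet
        readings.append(rss_mb())
        print(f"cycle {n + 1}: {readings[-1]:.1f} MB")

    deltas = [b - a for a, b in zip(readings, readings[1:])]
    # Plateau: the first delta is positive, later deltas hover near zero.
    # Leak: the deltas stay positive cycle after cycle.
    print("cycle-over-cycle growth:", [f"{d:+.1f}" for d in deltas])
```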


@MRAB Understood… but it would never return to the base memory? Is that understanding correct? Unless I restart the Kubernetes pod, of course.


Stabilizing at a higher baseline after heavy load is not unexpected. It could, for example, be caused by heap fragmentation: long-lived objects created during the period of high load can keep pages from being released even if they are mostly unused. An extra 300 MB is quite a bit, but if your program does a lot of allocations and deallocations it’s possible.

Again, the question is what happens under repeated high loads in the same process. A single high load doesn’t tell you much beyond the maximum memory usage.


Since the interpreter manages the memory, I don’t know how you could get a leak, per se. Garbage collection is based on reference counting, so if you keep references to unused objects inside a collection you might cause one. But Python not releasing memory back to the system is a known thing.

If the algorithm adds an object to a list on each load and does not later remove it from the list at the end of processing, you will see the Python process’s memory use go up and up.
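
A minimal sketch of that failure mode, with the standard library’s tracemalloc pointing at the offending line (the handler and the list are made up for illustration):

```python
import tracemalloc

tracemalloc.start()

_seen_requests = []  # module-level collection: nothing ever removes entries

def handle_request(payload: dict) -> dict:
    result = {"echo": payload}
    _seen_requests.append(result)  # the "leak": every request keeps a reference alive
    return result

for i in range(100_000):
    handle_request({"i": i})

# The biggest allocation sites; the append line above should dominate.
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)
```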


@barry-scott @MRAB @brass75 @MegaIng

We’re simulating a load of 10 users, 500 requests in total. One point to mention: the microservice makes asynchronous API calls (LLM calls to OpenAI models) and uses asynchronous objects as well. What we are making sure of is closing the asynchronous API call connections and any asynchronous objects, doing an explicit gc.collect(), and using a global session rather than a new session per request, then closing that session.
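
Roughly, the shared-session part looks like this (a simplified sketch only; httpx stands in for the actual client library and the route is illustrative):

```python
from contextlib import asynccontextmanager

import httpx
from fastapi import FastAPI, Request

@asynccontextmanager
async def lifespan(app: FastAPI):
    # One client (one connection pool) for the whole life of the process...
    app.state.client = httpx.AsyncClient(timeout=60)
    yield
    # ...closed exactly once, when the pod shuts down.
    await app.state.client.aclose()

app = FastAPI(lifespan=lifespan)

@app.post("/generate")  # illustrative route, not the real one
async def generate(request: Request) -> dict:
    client: httpx.AsyncClient = request.app.state.client
    resp = await client.post(
        "https://api.example.com/v1/llm",  # placeholder for the real LLM endpoint
        json=await request.json(),
    )
    return resp.json()
```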

What kind of memory won’t it release back to the system in this case?

Typically no memory is released unless it’s big, say >100 KB.

The details of what “big” means will depend on the C runtime and the malloc library that is being used.
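
For example, one large block is typically handed straight back to the kernel when freed, while the same volume spread across many small objects may not be. A rough demonstration, assuming Linux with glibc (whose default mmap threshold is about 128 KiB):

```python
import gc

def rss_mb() -> float:
    """Resident set size in MB, read from /proc (Linux only)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024
    raise RuntimeError("VmRSS not found")

base = rss_mb()

# One big allocation: glibc serves anything above its mmap threshold with mmap,
# so freeing it hands the pages straight back to the kernel.
big = bytearray(200 * 1024 * 1024)
del big
gc.collect()
print(f"after big block:     {rss_mb() - base:+.1f} MB vs baseline")

# A similar volume as a million small objects goes through pymalloc instead;
# how much of it comes back depends on fragmentation, because pymalloc only
# returns an arena once it is completely empty.
small = [{"i": i} for i in range(1_000_000)]
del small
gc.collect()
print(f"after small objects: {rss_mb() - base:+.1f} MB vs baseline")
```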

Below are the load test observations

Sr. No.  vUsers  Requests  Memory (MB)  Growth (MB)
1        0       0         113.1        0
2        10      500       155          41.9
3        20      1000      217.2        62.2
4        30      1500      260.5        43.3
5        40      2000      300.6        40.1
6        10      500       169          -131.6

In general the Python interpreter will hold on to the memory (or at least a large part of it) for future use once it has been allocated. It will then reuse this memory as needed rather than asking the system for additional memory while it has that available.
