Memory leak in sockets

Hello,

When you run this small program on Linux, its memory usage increases forever. Any idea why?
The code simply creates 2 servers and exchanges data between a client and the 2 servers like this:
client <=tcp=> forwarder server <=tcp=> echo server

The client sends random data to the forwarder server, which forwards it to the echo server, and the echo server sends the received data back. When the client gets its original data back, it sends new random data, and so on forever.

If you swap which BUFFER_SIZE line is commented out at the top of the script (i.e. use 1048576 instead of 32768), there is no more memory leak. Why?
It also seems to behave the same on any Python version, including Python 2.

#!/usr/bin/env python3

import os
import time
import random
import struct
import socket
import threading

BUFFER_SIZE =   32768
#BUFFER_SIZE =   1048576

# Connect to a host
def connect(address, port):
    handle = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    handle.settimeout(None)
    handle.connect((address, port))
    return handle

# Download data from socket
def download(host, size):
    data = bytearray()
    while size > 0:
        buffer = host.recv(1048576)
        if len(buffer) > 0:
            data += buffer
            size -= len(buffer)
        else:
            return None
    return bytes(data)

# Run a new thread
def thread(target, args):
    handle = threading.Thread(target=target, args=args)
    handle.start()
    return handle

# Exchange data
def exchange(local, remote):
    while True:
        remote.sendall(local.recv(BUFFER_SIZE))

# Data forwarder
def forwarder(local):
    print("New forwarder client <=> forwarder <=> echo")
    remote = connect("127.0.0.1", 22222)
    thread(exchange, [local, remote])
    thread(exchange, [remote, local])

# Echo
def echo(client):
    print("New echo")
    while True:
        data = download(client, 4)
        if not data:
            break
        size = struct.unpack("!I", data)[0]
        data = download(client, size)
        if data:
            client.sendall(data)
        else:
            break

# Client
def client(hostname, port, funder):
    print("New client")
    remote = connect("127.0.0.1", 11111)
    while True:
        size = random.randint(1, 4096 * 64)
        data = os.urandom(size)
        remote.sendall(struct.pack("!I", size) + data)
        if data != download(remote, size):
            print("Invalid echo")

# Server
def server(function, port):
    print("Server binding on port {}".format(port))
    handle = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    handle.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    handle.settimeout(None)
    handle.bind(("127.0.0.1", port))
    handle.listen(5)
    while True:
        thread(function, [handle.accept()[0]])

# Main
def main():
    thread(server, [forwarder, 11111])
    thread(server, [echo, 22222])
    time.sleep(1)
    for i in range(100):
        thread(client, ["127.0.0.1", 22222, True])
    while True:
        time.sleep(1)

if __name__ == "__main__":
    main()

Looking quickly, I do not see you closing any sockets.
If you run lsof on the process, do you see open TCP connections that grow?

Not sure if it's significant, but your two buffer size options are 32 KiB and 1 MiB, while your messages are up to 256 KiB (4096 * 64 bytes). That means with the smaller buffer the forwarder has to break each message into up to 8 pieces, whereas with the larger one it can pass a whole message through in a single recv/sendall.

Are you sure the memory increases forever? How long did you let it run?
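One way to check, rather than eyeballing top: log the resident set size over time. A minimal sketch, assuming Linux (it reads the VmRSS field from /proc/self/status) and a hypothetical log_rss helper you would start from main():

import time
import threading

# Print the process's resident set size every `interval` seconds (Linux only).
def log_rss(interval=10):
    while True:
        with open("/proc/self/status") as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    print("RSS:", line.split()[1], "kB")
                    break
        time.sleep(interval)

# Start it as a daemon thread at the top of main():
threading.Thread(target=log_rss, daemon=True).start()

If the numbers keep climbing after an hour, that points at a leak; if they plateau, it is more likely allocator behaviour.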

@barry-scott there is no close() because I tried to make this code as small as possible to reproduce a bug I have in a bigger application. Also, after all the TCP connections are established, no new TCP connections are created: only data is exchanged over the existing connections. So lsof doesn't show a growing number of TCP connections.

@Rosuav yes, the memory increases forever; you can try it yourself. I ran it for 1 hour.

What OS and version are you running this on?
What Python version, and where is it from?

I run it on Linux (Ubuntu) and on any Python 3 version (even Python 2): it seems to leak memory whatever version is used.

I asked for the version; please tell us which version of Python 3.

Sure, I use Ubuntu 20.04.4 LTS with Python 3.8.10.
But as I said, it seems to leak memory on all versions of Python, including Python 2.

I also wrote a new version with a static buffer size (the same maximum size for send and receive!) that leaks memory only if the size of the original buffer sent is randomized. Here it is:

#!/usr/bin/env python3

import os
import time
import random
import struct
import socket
import threading

RANDOM =        1
BUFFER_SIZE =   32768

# Connect to a host
def connect(address, port):
    handle = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    handle.settimeout(None)
    handle.connect((address, port))
    return handle

# Download data from socket
def download(host, size):
    data = bytearray()
    maximum = BUFFER_SIZE if size > BUFFER_SIZE else size
    while size > 0:
        buffer = host.recv(maximum)
        if len(buffer) > 0:
            data += buffer
            size -= len(buffer)
        else:
            return None
    return bytes(data)

# Run a new thread
def thread(target, args):
    handle = threading.Thread(target=target, args=args)
    handle.start()
    return handle

# Exchange data
def exchange(local, remote):
    while True:
        remote.sendall(local.recv(BUFFER_SIZE))

# Data forwarder
def forwarder(local):
    print("New forwarder client <=> forwarder <=> echo")
    remote = connect("127.0.0.1", 22222)
    thread(exchange, [local, remote])
    thread(exchange, [remote, local])

# Echo
def echo(client):
    print("New echo")
    while True:
        data = download(client, 4)
        if not data:
            break
        size = struct.unpack("!I", data)[0]
        data = download(client, size)
        if data:
            client.sendall(data)
        else:
            break

# Client
def client():
    print("New client")
    remote = connect("127.0.0.1", 11111)
    while True:
        size = random.randint(1, BUFFER_SIZE - 4) if RANDOM else BUFFER_SIZE - 4
        data = os.urandom(size)
        remote.sendall(struct.pack("!I", size) + data)
        if data != download(remote, size):
            print("Invalid echo")

# Server
def server(function, port):
    print("Server binding on port {}".format(port))
    handle = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    handle.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    handle.settimeout(None)
    handle.bind(("127.0.0.1", port))
    handle.listen(5)
    while True:
        thread(function, [handle.accept()[0]])

# Main
def main():
    thread(server, [forwarder, 11111])
    thread(server, [echo, 22222])
    time.sleep(1)
    for i in range(100):
        thread(client, [])
    while True:
        time.sleep(1)

if __name__ == "__main__":
    main()

If you change RANDOM = 1 to RANDOM = 0, there is no more memory leak. Really strange behaviour from Python.

I think I know what you are seeing.

You have over 100 threads running.
Each thread gets its own malloc arena to keep lock contention low in glibc.
You are allocating memory of a random size.
malloc does not return memory to the OS unless the allocation is big
(I forget the threshold).
Python also has look-aside free lists whose memory will not be freed either.

You will cause approximately 100 arenas to grow as the program runs. (There are ways to control the maximum number of arenas in glibc.)
You are also likely to be causing fragmentation that will grow memory usage.

If you leave the program to run for long enough, its memory usage will top out, I expect.

Have a look at man mallopt, which lists environment variables you can set to change how malloc works.
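For example, a minimal sketch of capping the arena count from inside the script via ctypes, assuming glibc (where mallopt's M_ARENA_MAX parameter is -8 in malloc.h); it has to run before the worker threads start, and the MALLOC_ARENA_MAX environment variable does the same thing from outside the process:

import ctypes

# Cap the number of glibc malloc arenas (assumes glibc; M_ARENA_MAX == -8).
# Must run early, before the threads that would create extra arenas start.
libc = ctypes.CDLL("libc.so.6")
M_ARENA_MAX = -8
if libc.mallopt(M_ARENA_MAX, 2) != 1:
    print("mallopt(M_ARENA_MAX) failed")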

Barry

I think you are right; the memory usage seems to stop increasing after a long time.

But still, 100 threads that only exchange data between 2 sockets make this script use 1.5 GB of memory on my computer, which is quite huge.

Do you know how to prevent this huge memory usage? I mean by changing the Python script's algorithm, not by changing malloc settings.

Thank you.

It's the way that malloc works, and changing it will involve malloc options.
You can change the options using environment variables.
Set them before you start Python.

The other answer is: do not use threads, use async I/O.
You may need a multi-process design to scale to huge numbers of connections, but 100 is likely to be OK in 1 process.
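For what it's worth, here is a minimal asyncio sketch of just the echo and forwarder servers (an illustration of the async I/O approach, not a drop-in replacement for your whole script): every connection becomes a task instead of a thread, so everything runs in one thread with one malloc arena.

#!/usr/bin/env python3

import asyncio
import struct

# Echo server: read a 4-byte length prefix, then that many bytes, and send them back.
async def echo(reader, writer):
    while True:
        try:
            header = await reader.readexactly(4)
            size = struct.unpack("!I", header)[0]
            data = await reader.readexactly(size)
        except asyncio.IncompleteReadError:
            break
        writer.write(data)
        await writer.drain()
    writer.close()

# Pump bytes one way between two endpoints until EOF.
async def pump(reader, writer):
    while True:
        data = await reader.read(32768)
        if not data:
            break
        writer.write(data)
        await writer.drain()
    writer.close()

# Forwarder: connect to the echo server and pump data in both directions.
async def forwarder(local_reader, local_writer):
    remote_reader, remote_writer = await asyncio.open_connection("127.0.0.1", 22222)
    await asyncio.gather(pump(local_reader, remote_writer),
                         pump(remote_reader, local_writer))

async def main():
    echo_server = await asyncio.start_server(echo, "127.0.0.1", 22222)
    forward_server = await asyncio.start_server(forwarder, "127.0.0.1", 11111)
    async with echo_server, forward_server:
        await asyncio.gather(echo_server.serve_forever(),
                             forward_server.serve_forever())

asyncio.run(main())

Your existing threaded client can drive it unchanged, since the wire protocol (4-byte length prefix plus payload over TCP) is the same.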

OK thank you. :slight_smile:

FYI, this is from the systemd service that we use to start our product's main service.
It also had this memory issue; here is what we added:

# In the <service>, because of the GIL, lock contention is not
# usually a problem. The cost of low lock contention is less efficient
# use of memory. In the case of the <xxx> process the worst
# case is each thread allocates and later frees memory for a <class>
# object. That memory cannot be used in other threads, which leads
# to the size of the <xxx> process being at least 2 times bigger than
# when MALLOC_ARENA_MAX is 1. But it could in the worst case be 11
# times bigger (because there is a thread pool with 10 threads).
#
Environment=MALLOC_ARENA_MAX=1

BTW, since this was added we have removed the use of thread pools, because the use of threads benchmarked as slowing down the service. The service uses the Twisted async framework.

Barry

Yes, I will use asyncio too; it seems to work a lot better. Thanks a lot man! :slight_smile: