Threading memory bug in Windows, not macOS or Linux

I believe I’ve found a bug in Python’s threading, but I’m still a bit of a novice to programming in general, so I don’t want to post it to the GitHub bug tracker without confirming here first. The bug does not reproduce on macOS or Linux using Python 3.11.2, but it does reproduce reliably on Windows 11 and Windows Server 2016.

I have two files coded as below:
threading_mem_leak.py

import threading
import socket
import gc

def main():
    ServerSideSocket = socket.socket()
    host = ''
    port = 8083
    try:
        ServerSideSocket.bind((host, port))
    except socket.error as e:
        print(str(e))
    
    #print('Socket is listening')
    ServerSideSocket.listen(5)

    while True:
        Client, address = ServerSideSocket.accept()
        thread_unique_name = address[0] + ":" + str(address[1])
        thread = threading.Thread(group=None,target=multi_threaded_client, 
                                  name=thread_unique_name,
                                  args=(Client,), kwargs={}, daemon=None)
        #print("Connected to: " + address[0] + " : " + str(address[1]))
        thread.start()
        print("The thread for connection " + thread.getName() + " has started")
        #print(threading.enumerate())
        thread.join()
        gc.collect()


def multi_threaded_client(connection):
    data = connection.recv(2048)
    data = data.decode('utf-8')
    if not data:
        connection.close()
        del data
        del connection
        gc.collect()
        return
    connection.close()

    msglist = data.split('|')

    var1 = msglist[0][2:]
    var2 = var1[:-3]
    var3 = msglist[1]

    # code that does stuff here #

    del connection
    del msglist
    del data
    del var1
    del var2
    del var3

    gc.collect()
    return

if __name__ == "__main__":
    main()

client.py

import socket

while True:
    ClientSideSocket = socket.socket()
    try:
        s = socket.socket()
    except socket.error as err:
        print("Socket Error")
    bytes2send = str.encode("var1|var2|var3")
    try:
        s.connect(('127.0.0.1', 8083))
    except socket.error as err:
        print("couldn't connect")
    try:
        s.sendall(bytes2send)
    except:
        print("couldn't send data")
    s.close()

If you run it, you will watch the memory use grow uncontrollably on Windows. Overnight, with multiple clients, it grew to over 1 GB. If you run both scripts in CMD on your local workstation, you can watch the memory usage grow in as little as an hour. This is a long-running server-side script that accepts data from clients whenever events on their end trigger it.

I tried replicating the memory growth by running the same scripts on macOS Ventura 13.2 and Fedora Linux 37. On those systems, the threads seem to be garbage collected appropriately. On Windows, they do not.

I’ve refactored the code numerous times, adding the dels and the manual garbage collection. I also tried adjusting the generational garbage collection thresholds and using the _thread module instead of threading, and the memory leak seems to be present in every variant.
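For reference, the threshold tuning I tried was along these lines (the numbers here are just examples, not the exact values I used):

import gc

# Report the current collection thresholds; CPython's defaults are (700, 10, 10).
print(gc.get_threshold())

# Raise the generation-0 threshold so collections run less often (example values only).
gc.set_threshold(10000, 50, 50)

# Force an immediate full collection and print how many unreachable objects were found.
print(gc.collect())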

Is this a bug or am I terrible at programming and threads? If I’m terrible at programming threads, what can I do better?

Maybe your code is stuck in the recv?
Try adding print statements to show where the code is going.
The decode() result is not stored anywhere.

Cannot see why you do this. What are you expecting this to do?

My guess is that it’s to convert the bytestring into a text string, which normally should be done by decoding, but data.decode() isn’t being assigned back to data.

This is probably irrelevant to the memory leak, but here’s how I would do it:

def multi_threaded_client(connection):
    data = connection.recv(2048)
    if not data:
        connection.close()
        del data
        del connection
        gc.collect()
        return
    connection.close()
    data = data.decode("utf-8")

    msglist = data.split('|')

    var1 = msglist[0][2:]
    var2 = var1[:-3]
    var3 = msglist[1]

Though in a production system, I would probably end up with a buffered-read system, guaranteeing that we receive an entire message (even if it doesn’t fit in one packet). That would look something like this:

    buf = b""
    while b"\n" not in buf:
        data = connection.recv(2048)
        if not data:
            connection.close() # might not even be necessary, it's already closed
            return
        buf += data
    data, _, buf = buf.partition(b"\n")
    data = data.decode("utf-8")

Or variants on that idea.

Check with Task Manager whether your server is leaking threads.
If it is leaking threads, debug why your threads do not terminate.
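For example, printing something like this inside the accept loop will show whether Python-level threads are piling up (just a sketch):

import threading

# Number of Thread objects currently alive, plus their names.
print(threading.active_count(), [t.name for t in threading.enumerate()])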

I understand that when I refactored my code to sanitize it of unnecessary information I forgot to change one of my variables.

Focusing on that and not addressing the main concern isn’t helpful at all.

Unfortunately, when you ask for help, all we can look at is the code you posted. We can’t magically figure out anything about your intentions. That’s why it’s VERY important to get this sort of sanitization correct - which means, first you refactor and sanitize, then you test again to make sure the problem still happens, and only once that’s confirmed, post the actual code that you tested to ask for help.

Except that it’s necessary. It might not be the help you want, but it’s the help you get. We can’t tackle the “main concern” if the code you post doesn’t run.

It won’t show a problem as posted because threading_mem_leak.py simply defines 2 functions and then quits. It never calls either of the functions.

Also, in client.py, you’re sending to the socket even if it fails to connect.

I’ve tried it on Windows 10 and I didn’t see any memory growth.

Thank you for pointing that out! The missing part has been added and is:

if __name__ == "__main__":
    main()

I did change that in a later version I made today. It now has a try/except with exit() in place for connection failures.
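Roughly along these lines (a simplified sketch, not the exact code):

import socket
import sys

s = socket.socket()
try:
    s.connect(('127.0.0.1', 8083))
except socket.error:
    print("couldn't connect")
    s.close()
    sys.exit(1)  # bail out instead of sending on an unconnected socket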

Do you know which Python version you’re using? I’m using CPython 3.11.2 from the Windows 64-bit installer on Windows 11 Education 10.0.22000 Build 22000. When I start the file it sits at around 7 MB of memory usage per the Details tab of Windows Task Manager. If I let it run for, say, 10 minutes, it will be up over 8 MB, and it will keep growing from there. The growth isn’t exponential, but it is steady. When left overnight with multiple clients sending data to it, it reaches hundreds of megabytes; with 140 clients sending data to it, it was over a gigabyte of memory usage after roughly 14 hours.

Interestingly enough if I start with

from threading import Thread

and change

thread = threading.Thread(stuff here)

to just

thread = Thread(same stuff here)

then change

thread.start()

to

thread.run()

and then eliminate

thread.join()

the memory leak seems to disappear completely. It’s somehow related to threading.Thread.start, which is supposed to invoke Thread.run in a separate thread of control, but it doesn’t appear to return or sys.exit in a manner that allows the thread’s resources to be garbage collected the same way, at least on my Windows 11 device and a Windows Server instance.
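Putting those changes together, the accept loop in threading_mem_leak.py ends up looking roughly like this (a sketch of my modified version, not verbatim):

from threading import Thread
import socket

def main():
    ServerSideSocket = socket.socket()
    ServerSideSocket.bind(('', 8083))
    ServerSideSocket.listen(5)

    while True:
        Client, address = ServerSideSocket.accept()
        thread_unique_name = address[0] + ":" + str(address[1])
        thread = Thread(target=multi_threaded_client, name=thread_unique_name,
                        args=(Client,))
        # run() just calls multi_threaded_client in the current thread, so there
        # is no separate thread to start, join, or clean up afterwards.
        thread.run()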

One other point of note: I was able to put the socket creation in a loop and everything after socket.listen() in a nested loop, and if I exited the nested loop and closed the socket to reopen it later, the memory usage immediately went down. So it has something to do with the relationship between using threading.Thread, passing it the Client result from socket.accept, and then using thread.start instead of thread.run.

Oh well, I guess it’s just me and my Windows Servers and Windows workstations that experience this :slight_smile:

I’ve since improved the entire script by subclassing Thread, and there’s no memory leak and it likely won’t ever go above 7 MB of usage with just one client sending data to it. It also has no manual garbage collection.

from threading import Thread
import socket

class con_thread(Thread):
    def __init__(self, Client, group=None, target=None, name=None,
                 args=(), kwargs=None, verbose=None):
        super(con_thread,self).__init__()
        self.connection = Client

    def run(self):
        data = self.connection.recv(2048)
        data = data.decode('utf-8')
        if not data:
            del data
            return
        self.connection.close()
        #print(data)
        msglist = data.split('|')
        var1 = msglist[0][2:]
        var2 = var1[:-3]
        var3 = int(msglist[1][:1])

        # code that does stuff here #

        del self.connection
        del msglist
        del data
        del var1
        del var2
        del var3
        return

if __name__ == "__main__":

    ServerSideSocket = socket.socket()
    host = ''
    port = 8083
    try:
        ServerSideSocket.bind((host, port))
    except socket.error as e:
        print(str(e))

    while True:
        try:
            ServerSideSocket.listen(40)
        except socket.error as err:
            print(err)
        try:
            Client, address = ServerSideSocket.accept()
        except socket.error as err:
            print(err)
        thread_unique_name = address[0] + ":" + str(address[1])
        thread = con_thread(Client)
        thread.run()
        #print(thread.getName() + " has started")

You’re right. But in that new script I can change thread.run to thread.start, add a thread.join at the end, and watch the memory use climb.

If I change it to be like this:

if __name__ == "__main__":

    while True:
        ServerSideSocket = socket.socket()
        host = ''
        port = 8083
        try:
            ServerSideSocket.bind((host, port))
        except socket.error as e:
            print(str(e))

        while True:
            try:
                ServerSideSocket.listen(40)
            except socket.error as err:
                print(err)
            try:
                Client, address = ServerSideSocket.accept()
            except socket.error as err:
                print(err)
            thread_unique_name = address[0] + ":" + str(address[1])
            thread = con_thread(Client)
            thread.start()
            thread.join()
            #print(thread.getName() + " has started")

I just watch the memory use climb, even though my threads should be returning and joining. Then if I exit the nested while-True loop via an if statement that breaks out of it, close the socket, and re-open it, the memory usage goes back down. I’m not sure why; I’ve tried to trace back through the Python module code, and I’m not savvy enough yet to understand all of its inner workings.

It’s as if having the socket listen and accept connections inside the nested loop, and using that connection info to spawn threads, means that things don’t get garbage collected, presumably including objects inside the threads created to process the data received from the connections.

It’s got to be the thread data, right? Because if it were just socket objects not being garbage collected, then I’d also see the memory issue when the loop uses thread.run instead of thread.start, right?
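If it is the thread data, something like this, dropped into the accept loop every so often, should show Thread objects accumulating (just a diagnostic sketch):

import gc
import threading

# Thread objects that are still reachable; if these pile up over time,
# the per-thread data is what isn't being freed.
leaked = [t for t in gc.get_objects() if isinstance(t, threading.Thread)]
print(len(leaked), [t.name for t in leaked[:5]])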

        thread = con_thread(Client)
        thread.run()

This executes the con_thread.run code in the caller’s thread:

In [8]: import threading

In [9]: t = threading.Thread(target=lambda *a,**kw: print(threading.current_thread()))

In [10]: print(threading.current_thread())
<_MainThread(MainThread, started 140044052092736)>

In [11]: t.run()
<_MainThread(MainThread, started 140044052092736)>

I ran your code in 64-bit Python 3.11.2 on a Windows 11 system for a few hours. The memory usage and handle count were stable – no leaks. You can use Sysinternals Process Explorer to check the running threads and open handles. You can use Sysinternals VMMap to examine the process memory.

I tested it with Python 3.11.1. I wouldn’t expect that to make a difference, and I see that Eryk Sun didn’t see a problem even with Python 3.11.2 on Windows 11.

Did you run both scripts so one was sending data to the other? And did you run the thread.start/thread.join version or the thread.run version?

I ran the client, and I ran the version of the server that actually starts new threads via the start() method. Each new thread exited naturally with the return of the target function, multi_threaded_client(). Because of the join(), the threads ran sequentially, such that the server process never had more than two threads, the main thread and the worker thread.

There’s no point to using run(). That’s just calling a method on the current thread.

Thank you for providing this verification.

Thank you for providing this verification as well. I’ve now also replicated the issue using thread.start and thread.join on a Windows 10 Education device using Python 3.11.2. Later this evening I will try on a Windows 11 Home device. I think I’ll also try to screen record it on one of these devices and post it on YouTube and share it here so you can see that I’m not insane.

You ran both scripts so that they communicated with each other and the server script that was accepting the connections was using thread.start? Did you test on Windows 11 as well?

I ran the original threading_mem_leak.py with the fix to run main and 100 of client.py on Windows 10. I don’t have Windows 11.

Well, it looks like you guys are right. It must be specific to something installed in the environment where I first discovered the issue. For some reason, a Windows 11 Education device, a Windows 10 Education device, and a Windows 11 Pro device in that environment all exhibit the same behavior.

When I bring the code outside of that environment and run it on macOS 13.2, Windows 11 Home 22H2, or Fedora 37 the issue goes away completely. Rock solid memory usage.

The only thing I can think of that is the same across all of the devices where the issue is found is the AV/Firewall software, Trend Micro Apex One, or possibly some Windows settings. I have further testing to do. Thank you for your assistance and your patience.

The interpreter doesn’t keep an open handle for kernel thread objects, and when a thread terminates, the thread environment block (TEB), stack, and thread-local storage (TLS) should all be deallocated automatically. Some monitoring program or debugger could be keeping terminated thread objects, TEBs, and stacks around for inspection. In Process Explorer, if you see that the “python.exe” server process has thousands of thread handles, that’s not due to Python. In VMMap, if you see that the process has thousands of thread stacks, but Process Explorer shows only a couple of threads alive, that’s not due to Python.
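For a quick in-process check, the handle count can also be read with ctypes; a minimal sketch (Windows-only, roughly the number Process Explorer shows in its “Handles” column):

import ctypes
from ctypes import wintypes

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
kernel32.GetCurrentProcess.restype = wintypes.HANDLE
kernel32.GetProcessHandleCount.argtypes = (wintypes.HANDLE, ctypes.POINTER(wintypes.DWORD))
kernel32.GetProcessHandleCount.restype = wintypes.BOOL

def handle_count():
    # Ask the kernel how many handles the current process has open.
    count = wintypes.DWORD()
    if not kernel32.GetProcessHandleCount(kernel32.GetCurrentProcess(), ctypes.byref(count)):
        raise ctypes.WinError(ctypes.get_last_error())
    return count.value

print(handle_count())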

I haven’t gone that far with it yet, but I will probably take another look next Tuesday. Thank you for the heads-up and for confirming that you didn’t have the issue on Windows. I don’t always have easy access to a Windows device outside of work, since my home preference is macOS, so all of the Windows devices I had access to were showing the exact same issue.

I will have to get comfortable with those Sysinternals tools. They can definitely provide useful extra information on Windows. If I find more information about the root cause, are you interested in hearing more?