I believe I’ve found a bug in Python’s threading module, but I’m still fairly new to programming, so I want to confirm it here before filing it on the GitHub issue tracker. The problem does not reproduce on macOS or Linux with Python 3.11.2, but it reproduces reliably on Windows 11 and Windows Server 2016.
Here are the two files:
threading_mem_leak.py

```python
import threading
import socket
import gc


def main():
    ServerSideSocket = socket.socket()
    host = ''
    port = 8083
    try:
        ServerSideSocket.bind((host, port))
    except socket.error as e:
        print(str(e))
    #print('Socket is listening')
    ServerSideSocket.listen(5)
    while True:
        Client, address = ServerSideSocket.accept()
        thread_unique_name = address[0] + ":" + str(address[1])
        thread = threading.Thread(group=None, target=multi_threaded_client,
                                  name=thread_unique_name,
                                  args=(Client,), kwargs={}, daemon=None)
        #print("Connected to: " + address[0] + " : " + str(address[1]))
        thread.start()
        # thread.name replaces getName(), which is deprecated since Python 3.10
        print("The thread for connection " + thread.name + " has started")
        #print(threading.enumerate())
        # Joining right away means only one client is handled at a time.
        thread.join()
        gc.collect()


def multi_threaded_client(connection):
    data = connection.recv(2048)
    data = data.decode('utf-8')
    if not data:
        # Empty read: the client closed without sending anything.
        connection.close()
        del data
        del connection
        gc.collect()
        return
    connection.close()
    msglist = data.split('|')
    var1 = msglist[0][2:]
    var2 = var1[:-3]
    var3 = msglist[1]
    # code that does stuff here #
    del connection
    del msglist
    del data
    del var1
    del var2
    del var3
    gc.collect()
    return


if __name__ == "__main__":
    main()
```
client.py

```python
import socket

# No delay: this loop connects and sends as fast as it can.
while True:
    # Note: ClientSideSocket is never used or explicitly closed.
    ClientSideSocket = socket.socket()
    try:
        s = socket.socket()
    except socket.error as err:
        print("Socket Error")
    bytes2send = str.encode("var1|var2|var3")
    try:
        s.connect(('127.0.0.1', 8083))
    except socket.error as err:
        print("couldn't connect")
    try:
        s.sendall(bytes2send)
    except:
        print("couldn't send data")
    s.close()
```
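For comparison, a tidier client might look like the sketch below. It closes the socket on every path (including when `sendall` fails) and never sends on a socket that failed to connect. `send_once`, `run_client`, the 5-second timeout and the pacing `delay` are illustrative choices, not part of the original repro.

```python
import socket
import time
from typing import Optional


def send_once(host: str, port: int, payload: bytes) -> bool:
    """Open a connection, send payload, close. Returns False on any socket error."""
    try:
        # create_connection + with: the socket is closed even if sendall raises.
        with socket.create_connection((host, port), timeout=5) as s:
            s.sendall(payload)
        return True
    except OSError as err:  # covers refused connections and timeouts
        print(f"send failed: {err}")
        return False


def run_client(count: Optional[int] = None, delay: float = 0.01) -> None:
    """Send forever (count=None), like the original loop, or `count` times.
    `delay` is a hypothetical pacing knob; the original loop has none."""
    sent = 0
    while count is None or sent < count:
        send_once("127.0.0.1", 8083, b"var1|var2|var3")
        sent += 1
        time.sleep(delay)
```

Calling `run_client()` reproduces the original endless loop against port 8083.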
If you start both scripts, you can watch the server’s memory use grow without bound on Windows. Overnight, with multiple clients, it grew to over 1 GB. If you run both scripts in CMD on a local workstation, the growth is visible within an hour. The real script is a long-running server that accepts data from clients whenever an event on the client side triggers a send.
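One way to narrow this down is to check whether the growth is in Python objects at all. The standard-library `tracemalloc` module only traces memory allocated through Python’s allocators, not native allocations such as thread stacks or OS socket handles, so if its snapshots stay flat while Task Manager shows the process growing, the growth is probably below the Python level. Below is a rough sketch, assuming a loopback test in one process is representative of the real two-process setup; `handle` and `run_batch` are illustrative names.

```python
import socket
import threading
import tracemalloc


def handle(conn: socket.socket) -> None:
    # Same shape as multi_threaded_client: read one message, then close.
    with conn:
        conn.recv(2048)


def run_batch(n: int) -> int:
    """Push n connections through a loopback server, starting and joining
    one thread per connection as threading_mem_leak.py does.
    Returns threading.active_count() afterwards."""
    # Port 0: let the OS pick a free port.
    with socket.create_server(("127.0.0.1", 0)) as srv:
        port = srv.getsockname()[1]
        for _ in range(n):
            # Client side, mirroring client.py: connect, send, close.
            with socket.create_connection(("127.0.0.1", port)) as cli:
                cli.sendall(b"var1|var2|var3")
            conn, _addr = srv.accept()
            t = threading.Thread(target=handle, args=(conn,))
            t.start()
            t.join()
    return threading.active_count()


def main() -> None:
    tracemalloc.start()
    run_batch(50)  # warm-up, so one-time allocations don't look like growth
    before = tracemalloc.take_snapshot()
    live = run_batch(500)
    after = tracemalloc.take_snapshot()
    print("live threads after batch:", live)
    # The five source lines whose traced allocations changed the most.
    for stat in after.compare_to(before, "lineno")[:5]:
        print(stat)


if __name__ == "__main__":
    main()
```

Running it on both Windows and Linux while watching the process in Task Manager or `top` would show whether the platforms differ in traced (Python) or untraced (native) memory.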
I tried replicating the memory growth running the same scripts on macOS Ventura 13.2 and Fedora Linux 37. On these systems, the threads seem to be garbage collected appropriately. On Windows, they do not.
I’ve refactored the code numerous times, adding the `del` statements and the manual garbage collection. I also tried tuning the generational garbage collection thresholds and using the low-level `_thread` module instead of `threading`; the memory growth appears in every version.
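In CPython a function’s local references are dropped as soon as it returns, and `gc.collect()` only matters for reference cycles, which this handler does not obviously create. So the `del` statements and manual collections shouldn’t change anything. A tidier handler might look like the following; it is a readability cleanup and is not expected to fix the Windows growth on its own. `parse_message` is a hypothetical helper that keeps the original slicing unchanged.

```python
import socket


def parse_message(data: str) -> tuple[str, str, str]:
    # Same slicing as the original multi_threaded_client.
    msglist = data.split('|')
    var1 = msglist[0][2:]
    var2 = var1[:-3]
    var3 = msglist[1]
    return var1, var2, var3


def multi_threaded_client(connection: socket.socket) -> None:
    # The with block closes the socket on every path, including exceptions.
    with connection:
        data = connection.recv(2048).decode('utf-8')
    if not data:
        return
    var1, var2, var3 = parse_message(data)
    # code that does stuff here
    # No del / gc.collect(): locals are released when the function returns.
```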
Is this a bug, or am I just doing threads wrong? If it’s my code, what should I do differently?
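If the growth is tied to creating a new OS thread per connection, reusing a fixed set of worker threads would sidestep it. It would also let clients be served concurrently, since the original joins each thread immediately and so handles one client at a time. Below is a sketch using `concurrent.futures.ThreadPoolExecutor`; `serve`, `handle`, `max_workers=8` and the `limit` parameter (present only so the loop can be tested) are illustrative choices, not part of the original.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


def handle(conn: socket.socket) -> None:
    # Same shape as multi_threaded_client: read one message, then close.
    with conn:
        data = conn.recv(2048).decode("utf-8")
    if not data:
        return
    # parse and process data here


def serve(srv: socket.socket, max_workers: int = 8,
          limit: Optional[int] = None) -> None:
    """Accept connections on an already-listening socket and hand each one
    to a fixed pool of worker threads that are reused across connections.
    `limit` (hypothetical, for testing) stops after that many accepts."""
    accepted = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while limit is None or accepted < limit:
            conn, _addr = srv.accept()
            pool.submit(handle, conn)
            accepted += 1
    # Leaving the with block waits for queued work and joins the workers.
```

To mirror the original server on port 8083: `serve(socket.create_server(("", 8083)))`.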