Need Help: Pickle returns 'unsupported protocol: 128'

Bug report

Bug description:

I am trying to build a video chat using sockets (basically send a video from the client side and receive a video on the server side).
I ran into a problem that I could not find reported anywhere online: pickle.loads on the server fails with the error in the title.

I have already tried specifying each protocol in the dumps call, but it did not change anything. All the answers online concern errors caused by a version mismatch (for example, Python 2.7 using protocol=3).

Here are my files:

server.py:

import cv2
import socket
import pickle
import numpy as np
import struct

host = ''
port = 8000

def main():
    
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, port))
    print("Socket bind complete")
    server.listen(10)
    print("Listening...")

    conn, addr = server.accept()

    data = b''
    payload_size = struct.calcsize("s")
    while True:
        while len(data) < payload_size:
            data += conn.recv(4096)
        msg = data
        msg_size = len(msg)
        if msg_size == 0:
            break
        while data < msg:
            data += conn.recv(4096)
        frame = data
        data = b''

        frame = pickle.loads(frame)
        cv2.imshow('VideoS', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    conn.close()

if __name__ == "__main__":
    main()

client.py:

import cv2
import socket
import pickle
import struct

server = 'localhost'
port   = 8000


def main():
    # camera
    cap = cv2.VideoCapture(-1)
    if cap.isOpened() == False:
        print("Couldn't open the Camera. Exiting..")
        exit()
    clientsocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    clientsocket.connect((server, port))
    print("Connected to " + server + " at " + str(port))
    while cap.isOpened():
        ret, frame = cap.read()
        if ret == False:
            print("Couldn't read from camera")
            break
        # display
        cv2.imshow("VideoC", frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
        # frame serialised
        data = pickle.dumps(frame,protocol=pickle.HIGHEST_PROTOCOL)
        clientsocket.sendall(struct.pack('s', data)+data)
    clientsocket.sendall(struct.pack('s', b''))
    print("Exiting..")
    cap.release()
    cv2.destroyAllWindows()
    clientsocket.close()

if __name__ == "__main__":
    main()

CPython versions tested on:

3.9

Operating systems tested on:

Linux

Stepping through the code:

    payload_size = struct.calcsize("s")

payload_size will be 1.
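
A quick way to check this yourself (just a sketch; the variable names are mine): without a count, 's' describes a single byte, so it cannot carry a message length. A format such as '!I' gives a 4-byte unsigned integer in network byte order.

```python
import struct

# 's' with no count means "a bytes field of length 1"
size_of_s = struct.calcsize('s')       # 1
size_of_4s = struct.calcsize('4s')     # 4
size_of_uint = struct.calcsize('!I')   # 4

# Packing with 's' truncates the input to its first byte
packed = struct.pack('s', b'hello')    # b'h'

# Packing a length with '!I' yields a fixed 4-byte header
header = struct.pack('!I', 5)          # b'\x00\x00\x00\x05'
```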

    while True:
        while len(data) < payload_size:
            data += conn.recv(4096)

recv will return 1…4096 bytes. At this point, 1 <= len(data) <= 4096.

        msg = data

msg is now the same as data.

        msg_size = len(msg)
        if msg_size == 0:
            break

len(msg) >= 1, so the condition is always false.

        while data < msg:

msg is the same as data, so data == msg and the condition is always false. Why are you even asking whether data < msg?

            data += conn.recv(4096)
        frame = data

frame is the same as data.

        data = b''

        frame = pickle.loads(frame)

You’re giving all of the received data to loads, but what if there were more bytes after the actual pickle data?
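
To illustrate the concern, here is a small sketch with made-up payloads. TCP is a byte stream, so one recv call may well return the end of one message together with the start of the next. As far as I know, pickle.loads stops at the end of the first pickle and ignores whatever follows, so the extra bytes are silently lost:

```python
import pickle

first = pickle.dumps("frame 1")
second = pickle.dumps("frame 2")

# Pretend a single recv() returned both messages back to back
buffer = first + second

# loads() decodes the first pickle and ignores the trailing bytes
obj = pickle.loads(buffer)

# Everything after the first pickle was never looked at
ignored = len(buffer) - len(first)
```

Nothing here tells you that "frame 2" was dropped, which is why the receiver needs to know where each message ends.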

When sending bytes over a socket, how does the receiver know how much to expect?

Some options are:

  1. Known, fixed size.

  2. Size of data followed by data.

  3. Data then end marker that won’t occur within data.


Hi Matthew,

This condition

while data < msg:

Truly doesn’t make sense. I have already removed it. But it’s my first time messing around with sockets and I can’t really understand how to read those bytes from the client on the server side.

Thanks.

In client.py, send the size of the data followed by the data:

clientsocket.sendall(struct.pack('!I', len(data)))
clientsocket.sendall(data)

You could mark the end by sending a size of 0:

clientsocket.sendall(struct.pack('!I', 0))

In server.py, receive the size of the data and then the data:

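while True:
    # struct.unpack always returns a tuple, so unpack the single value
    (size,) = struct.unpack('!I', recvall(conn, struct.calcsize('!I')))

    # a size of 0 is the end marker sent by the client
    if size == 0:
        break

    data = recvall(conn, size)
    ...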

Here, for convenience, I’ve defined a function recvall:

def recvall(conn, size):
    data = b''

    while len(data) < size:
        chunk = conn.recv(size - len(data))

        if not chunk:
            break

        data += chunk

    return data
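
Putting the pieces together, here is a self-contained sketch you can run in a single process. It uses socket.socketpair() to stand in for the real client and server connection and sends small Python objects instead of camera frames. The helper names send_msg and recv_msg are my own, and this recvall variant raises instead of returning short data when the peer closes early. Treat it as an illustration of size-prefixed framing, not a finished implementation.

```python
import pickle
import socket
import struct

HEADER = struct.Struct('!I')  # 4-byte unsigned length, network byte order

def recvall(conn, size):
    """Read exactly `size` bytes, or raise if the peer closes early."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = conn.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed before the full message arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b''.join(chunks)

def send_msg(conn, payload):
    """Send the payload length followed by the payload itself."""
    conn.sendall(HEADER.pack(len(payload)) + payload)

def recv_msg(conn):
    """Return the next payload, or None when the size-0 end marker arrives."""
    (size,) = HEADER.unpack(recvall(conn, HEADER.size))
    if size == 0:
        return None
    return recvall(conn, size)

# Two connected sockets in one process, standing in for client and server
client, server = socket.socketpair()

frames = [list(range(10)), {"frame": 2}, "third"]
for obj in frames:
    send_msg(client, pickle.dumps(obj))
send_msg(client, b'')  # size 0 marks the end of the stream

received = []
while True:
    payload = recv_msg(server)
    if payload is None:
        break
    received.append(pickle.loads(payload))

client.close()
server.close()
```

Each message comes out whole on the receiving side, no matter how the bytes were split or merged in transit.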

The line clientsocket.sendall(struct.pack('s', data)+data) makes no sense. struct.pack('s', data) gets the first byte from data, because the format 's' means a single byte (it’s supposed to be a “byte string” of a specified length, but that length defaults to 1 since it wasn’t specified). Then the entire data is appended to that, so basically the first byte just gets duplicated for no reason.

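That corrupts the data. A pickle written with protocol 2 or later starts with the byte 0x80 (decimal 128, the PROTO opcode), followed by a byte giving the protocol number. Because the first byte is duplicated, pickle reads the second 0x80 as the protocol number, which is where the 128 in your error message comes from.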
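
You can reproduce the error without any sockets. This is a minimal sketch using a small list in place of a video frame; the variable names are only for illustration.

```python
import pickle
import struct

payload = pickle.dumps([1, 2, 3], protocol=pickle.HIGHEST_PROTOCOL)

# Same expression as in client.py: the first byte of payload, then all of payload
corrupted = struct.pack('s', payload) + payload

# The stream now starts with 0x80 twice
first_two = corrupted[:2]

try:
    pickle.loads(corrupted)
    message = None
except ValueError as exc:
    message = str(exc)
```

Here message contains "unsupported pickle protocol: 128", because the duplicated 0x80 byte is read as the protocol number.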

This line should just say clientsocket.sendall(data). data is already a bytes object, so there is no necessary or meaningful “conversion” to do.


As an aside: pickle is not really meant for this kind of serialization. First off, it is not secure. The server has no way to verify that the client is even based off of your client.py code, never mind that the data is valid and not malicious.
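
To make the security point concrete, here is a deliberately harmless sketch. The class name Payload is made up for the demonstration. An object can tell pickle, through __reduce__, which callable to invoke when it is loaded, and the loading side will call it. Here it is only os.getcwd, but a malicious sender could name something destructive instead.

```python
import os
import pickle

class Payload:
    """Hypothetical object whose pickle asks the loader to call os.getcwd()."""
    def __reduce__(self):
        # (callable, args): pickle.loads will call callable(*args)
        return (os.getcwd, ())

data = pickle.dumps(Payload())

# The "server" only wanted to load data, yet a function from os was executed
result = pickle.loads(data)
```

Note that result is not a Payload instance at all: it is whatever the chosen callable returned.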

But even in a controlled environment where you know it is only ever your own client and your own server (but why do you need a client-server architecture??): at a minimum, it would require the client and server to have compatible versions of the library that defines the pickled object (here that is NumPy, since cv2 returns frames as NumPy arrays), and it might depend on platform details as well. Simply reconstructing the object on the server, based on the data the client has, doesn’t mean that the result is compatible with the server’s definition for that object’s class.

As a simpler example, suppose we simulate being on the “client”, writing a pickle file that the “server” will later read:

>>> class x:
...     def method(self): print("I don't need a value")
... 
>>> import pickle
>>> with open('my_object.bin', 'wb') as f:
...     pickle.dump(x(), f)
... 
>>>

Now, let’s quit and restart the interpreter, and suppose we are on a “server” that doesn’t have a definition for class x, but has downloaded the my_object.bin file:

>>> import pickle
>>> with open('my_object.bin', 'rb') as f:
...     my_object = pickle.load(f)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
AttributeError: Can't get attribute 'x' on <module '__main__' (built-in)>

Even if we define the class, it has to be exactly compatible, or other problems can occur:

>>> class x:
...     def __init__(self, value): self._value = value
...     def method(self): print("I have", self._value)
... 
>>> with open('my_object.bin', 'rb') as f:
...     my_object = pickle.load(f)
... 
>>> my_object.method()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in method
AttributeError: 'x' object has no attribute '_value'