I am trying to serialize/deserialize an object using pickle protocol 5. Having looked at the data size of the object before serialization and after deserialization, I wonder why the original data size is greater than the deserialized one. Is that an issue or expected behavior? Note that this is a simplified example: in the original code, serialization takes place in one process and deserialization in another, to exchange data between processes.
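For reference, here is a minimal sketch of the simplified experiment (variable names and the sizes implied in the comments are illustrative):

import pickle
import sys
import numpy as np

arr = np.zeros(10_000)

# out-of-band buffers require protocol 5 (Python 3.8+)
buffers = []
payload = pickle.dumps(arr, protocol=5, buffer_callback=buffers.append)
restored = pickle.loads(payload, buffers=buffers)

print(sys.getsizeof(arr))       # larger: includes the 80000-byte data buffer
print(sys.getsizeof(restored))  # noticeably smaller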
It is strange that you get the error. What is your Python version? Note that, per the pickle docs, "Changed in version 3.8: The buffer_callback argument was added." I just noticed that even without pickle protocol 5 the original data size is larger than the deserialized one. Also, if it had something to do with numpy array views, then making a change in the original array would be reflected in its view. However, this is not the case.
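For example, a quick check with the default protocol (output sizes are illustrative):

import pickle
import sys
import numpy as np

arr = np.zeros(10_000)
restored = pickle.loads(pickle.dumps(arr))  # default protocol, no buffer_callback

print(sys.getsizeof(arr))       # larger: counts the data buffer
print(sys.getsizeof(restored))  # smaller: the buffer is not counted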
Oh, that is interesting. If we make a change in a view of the original array in your example, it will be reflected in the original array. However, if we make a change in the unpickled array, it will not be reflected in the original array.
import numpy as np
import pickle as pkl
import sys
# case 1
arr1 = np.zeros(10)
arr2 = arr1[:]
arr2[0] = 555
# the change is reflected in both arrays
print(arr1)
# [555. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
print(arr2)
# [555. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
# case 2
arr3 = np.zeros(10)
packed_data = pkl.dumps(arr3)
unpacked_data = pkl.loads(packed_data)
unpacked_data[0] = 555
# the change is not reflected in the original array
print(arr3)
# [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
print(unpacked_data)
# [555. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
I am wondering, then: what object owns the data of the unpickled array? Doesn’t this seem to be an issue?
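Checking the array flags hints at the answer (a quick check continuing the example above; the exact base type may depend on the numpy version):

print(arr3.flags['OWNDATA'])           # True: arr3 owns its buffer
print(unpacked_data.flags['OWNDATA'])  # False: the buffer is borrowed
print(type(unpacked_data.base))        # typically the bytes object from the pickle stream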
It is a puzzlement, yes, but have you had any actual problems?
Hmm, if the array doesn’t think it owns the data block, then what does? And if the answer is nothing, there may be a memory leak here, since nothing would free the data when it is no longer needed.
Yes, as noted at the beginning of my post, in the original code serialization takes place in one process and deserialization in another, to exchange data between processes. And I want to know the data size on both the sender side and the receiver side. However, it looks like I can’t do that with the current state of things.
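For context, a minimal sketch of the cross-process measurement I am after (names are illustrative; the two processes are simulated here in one script):

import pickle
import sys
import numpy as np

# sender side
arr = np.zeros(10_000)
payload = pickle.dumps(arr, protocol=5)
print(sys.getsizeof(arr), len(payload))       # original size and stream size

# receiver side
restored = pickle.loads(payload)
print(sys.getsizeof(restored), len(payload))  # deserialized size comes out smaller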
Where do you think would be the right place to report the issue? Would it be numpy?
Then sys.getsizeof is not the right tool – if you want to know how big a numpy array’s data is, use arr.nbytes (arr.size gives the number of elements, not bytes).
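A quick illustration of the difference (exact header sizes vary):

import sys
import numpy as np

arr = np.zeros(10)
view = arr[:]

print(arr.nbytes, view.nbytes)                  # 80 80: same data size either way
print(sys.getsizeof(arr), sys.getsizeof(view))  # the buffer is counted only for the owner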
But it seems you are trying to debug numpy when there’s no indication of a bug :-). A numpy array will be the same size after being unpickled unless numpy pickling is broken, and I don’t think you’ve had any other indication of an issue, have you?
Yes, the numpy list is the place to ask about this. I don’t think anything is broken, but it is interesting.