How to unpickle a large object?

BrentBaccala · July 27, 2020, 10:00pm

I’m working on some code (Sage’s multivariate polynomials) where I need to pickle and unpickle large objects (polynomials with millions of terms), and I’m trying to figure out how to do it well.

The current code converts the polynomial to a dictionary (exponent -> coefficient maps) and pickles the dictionary. Unpickling is done by supporting initialization of the polynomial object from a dictionary.

Obviously, this is problematic for large polynomials, since we need to duplicate all of the data, both when pickling and when unpickling.

Pickling doesn’t seem like too much of a problem: just create an iterator that produces key/value pairs and return it as the dict-items element (the fifth item) of the tuple returned by __reduce__.
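To make the pickling side concrete, here is a minimal sketch. `SparsePoly`, `iter_terms`, and the sorted list standing in for the C linked list are all hypothetical names and not Sage's actual classes. It only shows how the optional dict-items slot of the `__reduce__` tuple lets the terms be streamed without building an intermediate dictionary.

```python
import bisect
import pickle


class SparsePoly:
    """Hypothetical stand-in for a polynomial: exponent tuple -> coefficient."""

    def __init__(self, terms=()):
        # A sorted list plays the role of the C library's sorted linked list.
        self._terms = sorted(dict(terms).items())

    def iter_terms(self):
        # Yield (exponent, coefficient) pairs lazily; no dict is built here.
        yield from self._terms

    def __setitem__(self, exponent, coefficient):
        # pickle replays the dict-items stream as obj[key] = value.
        # This per-item sorted insert is the slow path described in the post.
        bisect.insort(self._terms, (exponent, coefficient))

    def __reduce__(self):
        # Tuple layout: (callable, args, state, list items, dict items).
        return (SparsePoly, (), None, None, self.iter_terms())


p = SparsePoly({(2, 0): 3, (0, 1): -1})
q = pickle.loads(pickle.dumps(p))
```

The pickler consumes the iterator in batches, so the pickling side never holds a second copy of the terms. The cost moves to loading, where every pair arrives through `__setitem__`.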

Unpickling is the problem. First, the object is supposed to be immutable and doesn’t currently have a __setitem__ method. A more serious problem is that the data is maintained as a sorted linked list in the underlying C library. Inserting individual items is slow. The current code, when initializing from a dictionary, puts everything into a bucket, then once everything is present, sorts the bucket once and forms the linked list.

So, how to implement __setitem__ efficiently? I’m thinking that it would be best to get some kind of notification when the unpickling is complete, so that the sort can be delayed until then.

Looking at the unpickle code, it seems like __setstate__ is the very last thing that gets called. So, I could implement a __setitem__ method that just sticks everything into a bucket, and a __setstate__ method that finalizes the initialization (sorting and forming the linked list).
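A rough sketch of that idea, with hypothetical names (`SparsePoly`, `_empty_poly`, `_finalize`) and a sorted tuple standing in for the C linked list. It relies on the behaviour observed in CPython's pickle implementation, where the dict items are replayed before the state is applied. Treat that ordering as an assumption, not a documented guarantee.

```python
import pickle


def _empty_poly():
    # Hypothetical helper: build an unfinalized object that accepts items.
    obj = SparsePoly.__new__(SparsePoly)
    obj._bucket = []
    obj._terms = None
    return obj


class SparsePoly:
    """Hypothetical stand-in for a polynomial: exponent tuple -> coefficient."""

    def __init__(self, terms=()):
        self._bucket = list(dict(terms).items())
        self._terms = None
        self._finalize()

    def _finalize(self):
        # Sort once, then freeze; the tuple stands in for the linked list.
        self._bucket.sort()
        self._terms = tuple(self._bucket)
        self._bucket = None

    def __setitem__(self, exponent, coefficient):
        # Only usable during unpickling; the finished object stays immutable.
        if self._bucket is None:
            raise TypeError("polynomial is immutable once finalized")
        self._bucket.append((exponent, coefficient))

    def __setstate__(self, state):
        # The state value itself is ignored; this call is the "done" signal.
        self._finalize()

    def __reduce__(self):
        # The state must not be None, otherwise __setstate__ is never called.
        return (_empty_poly, (), True, None, iter(self._terms))


p = SparsePoly({(2, 0): 3, (0, 1): -1, (1, 1): 5})
q = pickle.loads(pickle.dumps(p))
```

In this sketch every item is appended to the bucket in constant time and the sort happens once, in `__setstate__`. If the state were ever applied before the items, `__setitem__` would raise instead of silently producing a wrong object, which at least makes a violated assumption visible.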

I don’t know if I dare publish such code. Would it be reliable? I don’t think there are any guarantees about the order in which __setstate__ gets called.

Any other ideas?
