File path support for pickle.dump and pickle.load

I’ve been using the pickle library for some time (mainly the dump and load functions, to write Python objects to the filesystem). I have an idea that may be useful for developers like me and many others.

To load a Python object from the filesystem, or dump one to it, we currently write:

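import pickle

# Load: pickle data is binary, so the file must be opened with 'rb'
with open('file/path/filename.pkl', 'rb') as f:
    pkl_obj = pickle.load(f)

# OR to dump: open the file for binary writing with 'wb'
with open('filepath/filename.pkl', 'wb') as f:
    pickle.dump(pkl_obj, f)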

Some people use the f = open(filepath) syntax instead.
Would it be more convenient if we could load or dump files by passing the file path to pickle, instead of a file handle?
I got the inspiration from TensorFlow’s model.save('path') method, where you need not open files and pass file handles explicitly.

Here’s an example of what the updated usage could look like:

# Dumping an object
pickle.dump(obj, 'data.pkl')

# Loading an object
loaded_obj = pickle.load('data.pkl')

P.S. As suggested by AlexWaygood here, we could also have support for os.PathLike objects.

A wrapper class could provide the same functionality, and as an individual user I could just go into site-packages and make the required changes myself. But would it help beginner developers if pickle natively supported file paths in load and dump?
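For comparison, here is a minimal sketch of what such a wrapper might look like today, written as a pair of functions rather than a class. The names dump_to_path and load_from_path are hypothetical and are not part of the pickle module; the sketch only assumes that open() accepts str and os.PathLike paths, which it does.

```python
import pickle
import tempfile
from pathlib import Path


def dump_to_path(obj, path, **kwargs):
    """Hypothetical helper: pickle `obj` into the file at `path`."""
    # Pickle data is binary, so the file is opened with "wb".
    with open(path, "wb") as f:
        pickle.dump(obj, f, **kwargs)


def load_from_path(path, **kwargs):
    """Hypothetical helper: unpickle and return the object stored at `path`."""
    with open(path, "rb") as f:
        return pickle.load(f, **kwargs)


# Round trip through a temporary directory, using a pathlib.Path
# to show that os.PathLike objects work as well as plain strings.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data.pkl"
    dump_to_path({"answer": 42}, target)
    restored = load_from_path(target)
```

Extra keyword arguments such as protocol are passed straight through to pickle.dump and pickle.load, so the helpers do not hide any of the existing options.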

Here’s a reference to this issue on GitHub.


The pickle module shares its API with quite a few others (including json and marshal). Do they also need to grow this functionality?

Programming generally involves composing primitives to achieve our goals. We don’t need a single tool that does everything; we need tools that do one thing and do it well. (Nobody needs a function for “read one line from a gzipped file and strip HTML tags from it”.) This one is in a bit of a grey area (since reading pickles from a file is so very common), but I’m inclined to put it into the “better to do it in two steps” category. Bear in mind that there are a LOT of ways you could open a file (notably, you may use os.open() with its dir_fd parameter to read a file relative to a specific directory), and it would unnecessarily complicate the pickle API to have all of those features.
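To make the dir_fd case concrete, here is a rough sketch of the two-step approach: the file is opened relative to a directory descriptor, and the resulting file object is handed to pickle.load unchanged. The temporary directory and the file name data.pkl are placeholders for illustration, and dir_fd is not supported on every platform, so the sketch checks os.supports_dir_fd and falls back to a plain open() otherwise.

```python
import os
import pickle
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Write a pickle the ordinary way so there is something to read back.
    with open(os.path.join(tmp, "data.pkl"), "wb") as f:
        pickle.dump([1, 2, 3], f)

    if os.open in os.supports_dir_fd:
        # Open the directory, then open the file relative to it.
        dir_fd = os.open(tmp, os.O_RDONLY)
        try:
            fd = os.open("data.pkl", os.O_RDONLY, dir_fd=dir_fd)
            # open() wraps the raw descriptor and closes it on exit.
            with open(fd, "rb") as f:
                loaded = pickle.load(f)
        finally:
            os.close(dir_fd)
    else:
        # Platforms without dir_fd support: plain path-based open.
        with open(os.path.join(tmp, "data.pkl"), "rb") as f:
            loaded = pickle.load(f)
```

pickle.load never needs to know how the file was opened, which is the point of keeping the two steps separate.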


As I noted in the original issue, I agree that we don’t want to add this functionality. The logical end result would be finding every place in the stdlib that takes a file-like object and extending it to take a file name as the same parameter, too. From a typing perspective alone it seems like a mess.


It is a recurring theme. Every few months we get a proposal to make json.load(), pickle.load(), or some other function also accept a file path instead of an open file object.

It is true that it could be convenient in some cases, especially in toy projects. But in my experience I have rarely loaded JSON or pickle data from a plain file. More often it is a compressed file, an entry in a ZIP archive, or the result of an HTTP request (or part of a hand-made protocol). So this feature would not be as useful as you think.
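As an illustration of those cases, the sketch below feeds the same pickle.load call from three different sources: a gzip-compressed file, an entry in a ZIP archive, and an in-memory byte string. The byte string merely stands in for a response body; no real HTTP request is made, and the file names and payload are invented for the example.

```python
import gzip
import io
import pickle
import tempfile
import zipfile
from pathlib import Path

payload = {"name": "example", "values": [1, 2, 3]}

with tempfile.TemporaryDirectory() as tmp:
    # 1. A gzip-compressed pickle: gzip.open returns a binary file object.
    gz_path = Path(tmp) / "data.pkl.gz"
    with gzip.open(gz_path, "wb") as f:
        pickle.dump(payload, f)
    with gzip.open(gz_path, "rb") as f:
        from_gzip = pickle.load(f)

    # 2. A pickle stored as an entry inside a ZIP archive.
    zip_path = Path(tmp) / "bundle.zip"
    with zipfile.ZipFile(zip_path, "w") as archive:
        with archive.open("data.pkl", "w") as f:
            pickle.dump(payload, f)
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open("data.pkl") as f:
            from_zip = pickle.load(f)

# 3. Bytes already in memory (a stand-in for a response body),
#    wrapped in BytesIO so they look like a file.
body = pickle.dumps(payload)
from_bytes = pickle.load(io.BytesIO(body))
```

None of these sources has a meaningful file path to pass, yet all of them work with the existing file-object interface. As the pickle documentation warns, data should only be unpickled when it comes from a trusted source.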

This is in addition to what others have said. Accepting this idea for one particular case would open a large can of worms, and we would be forced to respond to requests to add the same feature everywhere else a file object is accepted. In some cases that is not possible, because the interface is already overloaded and accepts either raw binary/text data or a file object.
