Introducing my library for fast sharing and modification of objects across processes

(I’ve made a new post because the original comment, which a moderator moved into an independent post, contained a broken link to my GitHub project.)

In the ongoing discussions about using Python’s shared memory to pass objects between processes without serialization, several challenges keep coming up, most notably that arbitrary objects cannot be shared directly and must be pickled. The native multiprocessing.shared_memory module exposes only raw byte buffers (plus ShareableList for primitive values), so handling complex data structures efficiently remains an open problem.
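For context, a minimal sketch of what the stock multiprocessing.shared_memory module handles well: flat buffers such as numpy arrays can be shared without pickling the data itself, while an arbitrary Python object would still need to be serialized into the buffer.

```python
import numpy as np
from multiprocessing import shared_memory

# Create a shared block and view it as a numpy array: the array data itself
# is never pickled, every view aliases the same bytes.
shm = shared_memory.SharedMemory(create=True, size=4 * 8)
a = np.ndarray((4,), dtype=np.int64, buffer=shm.buf)
a[:] = [1, 2, 3, 4]

# A second process would attach by name; here we attach in-process for brevity.
shm2 = shared_memory.SharedMemory(name=shm.name)
b = np.ndarray((4,), dtype=np.int64, buffer=shm2.buf)
b[0] = 42                      # a write through one view...
observed = int(a[0])           # ...is immediately visible through the other

# A dict or other arbitrary object, by contrast, cannot live in the raw
# buffer without being serialized into bytes first.

# Release the numpy views before closing, or close() raises BufferError.
del a, b
shm2.close()
shm.close()
shm.unlink()
print(observed)  # 42
```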

I’ve explored a few alternatives that try to address this problem, though they still rely on some form of serialization.

In response to these limitations, I’ve developed a package that attempts to facilitate the sharing and modification of more complex Python objects across processes without needing periodic serialization. It’s designed to integrate with the existing Python multiprocessing framework and supports various data types, including numpy and torch arrays, through shared memory. The idea is to reduce the overhead typically associated with pickling large data structures when sharing between processes.

This tool is part of an open-source project, and I welcome any feedback or contributions from the community to improve its functionality or discuss potential integration issues you might encounter. For those interested in looking into the technical details or contributing, here’s the link: GitHub - InterProcessPyObjects.

Looking forward to your thoughts and any feedback you might have!


Impressive. Great job.

Does this:

only one process has access to the shared memory at the same time
working cycle:

ii) acquire access to shared memory

iv) release access to shared memory

and

Lock-Free Synchronization

just mean a Python level lock is used, not an OS level lock?

Thank you, James!

It uses its own locking mechanism based on memory barriers (or an alternative mechanism, depending on the target CPU architecture), the same kind of primitives used to implement atomics and OS-level synchronization on the x86-64 platform. Essentially, it employs whatever intrinsics or equivalent calls the target C compiler provides.

As a result, the implementation is faster than using OS-based synchronization primitives (since there is no need for a syscall in my implementation) and is even slightly faster than using atomics.

At the same time, the whole idea is similar to Python’s GIL: we temporarily lock the entire shared memory block, perform a large number of operations on the objects inside it, and then release it, instead of locking and unlocking repeatedly during the working session. This approach works well with asynchronous processing, for example with asyncio: while another process is working on my request, I’m free to handle other tasks, such as my socket-based code, like a web server.
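That coarse-grained cycle can be sketched with stock primitives. The sketch below uses an ordinary OS-backed multiprocessing.Lock purely to illustrate the acquire/work/release pattern; the library itself replaces that lock with its own barrier-based mechanism:

```python
import numpy as np
from multiprocessing import Process, Lock, shared_memory

N = 1000  # array length

def worker(shm_name, lock):
    shm = shared_memory.SharedMemory(name=shm_name)
    arr = np.ndarray((N,), dtype=np.int64, buffer=shm.buf)
    with lock:               # ONE acquire for the whole working session...
        for _ in range(N):   # ...covering many operations on shared data,
            arr += 1         # instead of a lock/unlock per operation
    del arr
    shm.close()

def run(num_workers=4):
    lock = Lock()
    shm = shared_memory.SharedMemory(create=True, size=N * 8)
    arr = np.ndarray((N,), dtype=np.int64, buffer=shm.buf)
    arr[:] = 0
    procs = [Process(target=worker, args=(shm.name, lock))
             for _ in range(num_workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    total = int(arr[0])      # num_workers * N increments, race-free
    del arr
    shm.close()
    shm.unlink()
    return total

if __name__ == "__main__":
    print(run())  # 4000
```

Holding the lock across the whole batch is what keeps the per-operation overhead low, at the cost of exclusive access to the block for the duration of the session.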


PS: in an upcoming update supporting more than two connected processes, the locking algorithm will change (and additional tools will be added) in order to support parallel processing of different pieces of data by several workers.
