PEP 703: Making the Global Interpreter Lock Optional (3.12 updates)

ofek · June 10, 2023, 1:24am

I am also mostly a bystander here, but I would like to express how much this would help Datadog (the company I work for) and its customers. I am speaking in a personal capacity, so nothing I say is necessarily endorsed by my employer.

Most of the data we collect comes from our Agent, which users install on their infrastructure. It collects metrics, events, logs, traces, process information, network information (eBPF), and more that I don’t feel like enumerating (and definitely other things I don’t even know about). This software runs on an extremely large number of hosts across the globe for many customers, a small subset of whom have names that would be familiar to the average person anywhere.

I work on a team that creates and maintains the integrations that ship out of the box (OOTB) with the Agent. These could be for databases like Postgres, web servers like NGINX, Kubernetes running anywhere, hypervisors like vSphere, IaaS cloud offerings like Azure IoT Edge, SaaS cloud offerings like Amazon MSK (Apache Kafka), hardware devices like Cisco routers, system internals like disk partitions or Windows services, broadly used things like TLS/certificates, etc.

Basically, for anything our customers care about, it is my job to find a way to extract meaningful data.

Most of these Agent integrations are written in Python, with each being its own (namespaced) package. While the Datadog Agent itself is written in Go, the integrations usually aren’t: Python allows more rapid maintenance and makes things easier for customers, who can not only contribute but also run custom integrations for their own use cases.

A single Agent will often have at least a dozen integrations enabled, usually with many instances of each. Running that many instances concurrently, with the resources they require, is directly hindered by the GIL, and the way we work around that constraint is hacky.

We wrote a little about connecting Go to Python here. This component is called the rtloader, and here is an example of exposing a new binding to integrations (fun fact: CI passing on the first commit and the change getting merged was sheer luck, because my dev environment broke that morning and I could not build!).

We run each instance of each enabled integration in its own goroutine, and these goroutines are assigned to runners for work. By default there are 4 runners, and each instance is scheduled to run every 15 seconds. The GIL is managed by Go, and while there is “concurrency” and performance is better than without it, there is no parallelism: although execution can ping-pong rapidly between instances based on syscalls and other heuristics, only one instance can ever be running at any given moment.
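To make the runner arrangement above concrete, here is a minimal Python model of it. It is not the Agent’s code (the real scheduler is written in Go); `ModelGIL`, `run_check`, and `run_interval` are hypothetical names, a `threading.Semaphore` stands in for the GIL, and `time.sleep` calls stand in for Python work and for syscalls. It illustrates the point that 4 runners sharing one GIL-like gate give concurrency, but at most one check executing Python code at a time.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Default runner count described above; one call to run_interval() stands in
# for a single 15-second scheduling interval.
NUM_RUNNERS = 4


class ModelGIL:
    """Hypothetical stand-in for the GIL that records peak concurrency.

    gil=True admits one check at a time; gil=False admits up to NUM_RUNNERS,
    roughly approximating free-threaded execution.
    """

    def __init__(self, gil=True):
        self._gate = threading.Semaphore(1 if gil else NUM_RUNNERS)
        self._stats = threading.Lock()  # protects the counters below
        self.active = 0
        self.max_active = 0

    def __enter__(self):
        self._gate.acquire()
        with self._stats:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
        return self

    def __exit__(self, *exc_info):
        with self._stats:
            self.active -= 1
        self._gate.release()
        return False


def run_check(gate, name, slices=3, slice_seconds=0.002):
    """Hypothetical check instance alternating Python work and I/O waits."""
    for _ in range(slices):
        with gate:
            # Sleeping while holding the model gate stands in for running bytecode.
            time.sleep(slice_seconds)
        # Sleeping outside the gate stands in for a syscall that releases the GIL.
        time.sleep(slice_seconds)
    return name


def run_interval(instances, gate, runners=NUM_RUNNERS):
    """Run every instance once on `runners` worker threads; results keep input order."""
    with ThreadPoolExecutor(max_workers=runners) as pool:
        return list(pool.map(lambda name: run_check(gate, name), instances))


if __name__ == "__main__":
    instances = [f"instance-{i}" for i in range(12)]
    for gil in (True, False):
        gate = ModelGIL(gil=gil)
        finished = run_interval(instances, gate)
        print(f"gil={gil}: ran {len(finished)} checks, peak concurrency {gate.max_active}")
```

With `gil=True` the recorded peak is 1 by construction, no matter how many runners there are. With `gil=False` the gate admits up to `NUM_RUNNERS` checks at once, so the peak can rise as high as 4 depending on thread timing.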

Being limited by the GIL has had a negative, material impact on us. Occasionally, a customer’s environment is so large that we simply have to rewrite the integration in Go. Other times, a customer has to work out how best to spread the load across different configured instances of an integration, or even run multiple hosts to distribute the load across different Agents. Every time I hear of a performance issue for a large customer that cannot be resolved, I think “darn, a rewrite is coming,” or I just feel sad that so much compute is wasted working around the lack of parallelism.

Having the option to remove the GIL (hopefully, eventually, as the default and only mode) would make our use case actually work without hardship for customers and for us engineers. Additionally, although I’m not allowed to say how many Agents are deployed at this moment, nor can I give my savings estimate for fear that it would reveal the former, optimizing our resource usage would have a very positive impact on energy consumption and the environment, for those who are concerned about that. We are still growing rapidly, so the environmental benefit would grow commensurately.

If it remains optional for some time rather than becoming the default and only mode, dependencies that provide extension modules would have minimal impact on us, since we already have a build system that can build everything from scratch and does not necessarily rely on provided wheels. To be clear, my preference would be for this to be the default (Python 4?).

Some notes:

I don’t have much insight into our backend, but much of it is still written in Python (less so over time), and we would benefit there just as much as Instagram and other organizations.
I write a lot of CLIs, and single-threaded performance would help me the most, so please understand that my advocacy for proper parallelism runs directly against my personal preference; I advocate for it because I view it as the best path for the long-term future of Python.