Overcome *PoolExecutor class dangers

In their present form, the *PoolExecutor classes can be dangerous in high-pressure production environments, if used purely as documented.

(For example, library calls in an asyncio environment which directly or indirectly cause DNS lookups will trigger the use of the event loop’s default thread pool executor. If that executor is saturated, the DNS lookup can be delayed massively, which in turn can cause intermittent timeouts much higher up the call chain and an avalanche of unpredictable failures that are difficult and time-consuming to debug.)
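To make the DNS scenario concrete, here is a minimal sketch that reproduces the delay on purpose. It assumes CPython's stock asyncio event loop, where loop.getaddrinfo() is dispatched to the default executor; the function name and the one-worker pool are made up for the demonstration.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


async def lookup_delay_when_saturated(block_seconds: float = 0.3) -> float:
    """Return how long loop.getaddrinfo() takes while the only worker is busy."""
    loop = asyncio.get_running_loop()
    # Deliberately tiny pool so that a single blocking call saturates it.
    loop.set_default_executor(ThreadPoolExecutor(max_workers=1))

    # Occupy the single worker with unrelated blocking work.
    blocker = loop.run_in_executor(None, time.sleep, block_seconds)

    start = time.perf_counter()
    # Numeric address, so no network traffic is needed, but in CPython's
    # asyncio the call still has to wait for a free executor worker.
    await loop.getaddrinfo("127.0.0.1", 80)
    elapsed = time.perf_counter() - start

    await blocker
    return elapsed


if __name__ == "__main__":
    delay = asyncio.run(lookup_delay_when_saturated())
    print(f"lookup took {delay:.3f}s")
```

The lookup itself is trivial, yet it cannot finish until the unrelated blocking job releases the worker, which is the kind of delay that is hard to trace from higher up.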

The only current remedy for these dangers is for a developer to subclass the pool executors and implement off-label features by studying the stdlib class internals. But this is unsafe in a different way, because internal implementations of library classes can change radically between Python versions.

Currently, the various pool executors, such as ThreadPoolExecutor, derive directly from Executor. Also, they are highly opaque black boxes, with very few documented methods.

For example, there appears to be no documented way to manage the pool for production-critical operations like:

  • determining the current worker limit

  • determining how many workers are active, and how long they’ve been active for

  • increasing or decreasing the worker limit

  • taking inventory of active workers

  • selectively killing arbitrary workers

So, this post is a feature request to implement an abstract PoolExecutor class which:

  • is derived from Executor

  • becomes the parent class of all other worker executors such as ThreadPoolExecutor, ProcessPoolExecutor and InterpreterPoolExecutor

  • exposes abstract methods/attributes that:

    • implement the above management operations

    • require stdlib subclasses to provide their own implementations, with the same method signatures but, of course, hugely varying internals

  • provides hooks for user-written subclasses to intercept adding/termination of workers, and other events

  • provides a timeout mechanism that propagates an exception up the call chain if a full pool takes too long to add a new worker

  • is ergonomic for subclassing, both directly from PoolExecutor, and also from subclasses like ThreadPoolExecutor
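A rough sketch of what such a base class could look like follows. Everything here is hypothetical: PoolExecutor, worker_limit, active_worker_count and ManagedThreadPoolExecutor are illustrative names, not stdlib APIs, and the hooks and timeout mechanism are left out to keep it short. The thread-based subclass tracks busy workers using only the documented submit() method.

```python
import threading
from abc import ABC, abstractmethod
from concurrent.futures import Executor, ThreadPoolExecutor


class PoolExecutor(Executor, ABC):
    """Hypothetical abstract base class; NOT part of the stdlib."""

    @property
    @abstractmethod
    def worker_limit(self) -> int:
        """Maximum number of workers the pool may run."""

    @abstractmethod
    def active_worker_count(self) -> int:
        """Number of workers currently executing a work item."""


class ManagedThreadPoolExecutor(ThreadPoolExecutor, PoolExecutor):
    """Illustrative subclass built only on the documented submit() API."""

    def __init__(self, max_workers: int):
        super().__init__(max_workers=max_workers)
        self._limit = max_workers
        self._busy = 0
        self._busy_lock = threading.Lock()

    @property
    def worker_limit(self) -> int:
        return self._limit

    def active_worker_count(self) -> int:
        with self._busy_lock:
            return self._busy

    def submit(self, fn, /, *args, **kwargs):
        def tracked():
            # Count the worker as busy only while the callable runs.
            with self._busy_lock:
                self._busy += 1
            try:
                return fn(*args, **kwargs)
            finally:
                with self._busy_lock:
                    self._busy -= 1

        return super().submit(tracked)


def demo() -> tuple[int, int, int]:
    """Return (limit, busy count while the pool is full, busy count at the end)."""
    started = threading.Semaphore(0)
    release = threading.Event()

    def task() -> None:
        started.release()
        release.wait(5)

    pool = ManagedThreadPoolExecutor(max_workers=2)
    futures = [pool.submit(task) for _ in range(3)]
    # Wait until both workers have picked up a task.
    started.acquire(timeout=5)
    started.acquire(timeout=5)
    busy_while_full = pool.active_worker_count()
    release.set()
    for future in futures:
        future.result()
    pool.shutdown()
    return pool.worker_limit, busy_while_full, pool.active_worker_count()
```

A process- or interpreter-based subclass would presumably need a very different implementation behind the same two signatures, which is the point of making them abstract.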

If users of PoolExecutor-based classes are able to monitor current/maximum worker counts and worker startup delays, and to modify resource limits in real time, it will go a long way towards increasing the safety of these classes in real-world production environments, and further erode the already diminishing case against using CPython as a production software platform.

I’m not saying doing any of this is a bad idea, but this is a tall order. If you’re interested in making this happen, I think the best course of action would be for you to create subclasses and publish them as a third-party package to PyPI. This will allow you to get signal on whether the design is good and if there is a widespread need that would enable these changes to go into the standard library.

Asyncio already allows users to change the default thread pool executor with loop.set_default_executor() (see "Event Loop" in the Python 3.14.3 documentation), so a more robust subclass can be slotted in quite easily.
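As a sketch of how that slotting-in might look, the subclass below counts submitted-but-unfinished work items and is installed with loop.set_default_executor(). CountingThreadPoolExecutor and its in_flight attribute are hypothetical names for illustration; only submit(), Future.add_done_callback() and set_default_executor() are existing documented APIs.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class CountingThreadPoolExecutor(ThreadPoolExecutor):
    """Hypothetical subclass: counts work items that are queued or running."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._in_flight_lock = threading.Lock()
        self.in_flight = 0

    def submit(self, fn, /, *args, **kwargs):
        with self._in_flight_lock:
            self.in_flight += 1
        try:
            future = super().submit(fn, *args, **kwargs)
        except BaseException:
            # submit() can fail, e.g. after shutdown; undo the increment.
            self._decrement()
            raise
        future.add_done_callback(self._decrement)
        return future

    def _decrement(self, _future=None) -> None:
        with self._in_flight_lock:
            self.in_flight -= 1


async def demo() -> tuple[int, int]:
    """Return (in-flight count right after submitting, count after completion)."""
    loop = asyncio.get_running_loop()
    executor = CountingThreadPoolExecutor(max_workers=2)
    loop.set_default_executor(executor)

    # Passing None as the executor routes the work to the default executor.
    jobs = [loop.run_in_executor(None, time.sleep, 0.2) for _ in range(5)]
    right_after_submit = executor.in_flight
    await asyncio.gather(*jobs)
    return right_after_submit, executor.in_flight


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Comparing in_flight with the configured max_workers gives a crude saturation signal without touching any private attributes.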


I’m not clear why changing the class hierarchy and introducing an abstract class would be necessary. Isn’t it sufficient to add specific capabilities to the thread pool executor?

Different executors will have different management tools, appropriate to their domains.

I would start on these one at a time, and focus first on the information which is available today in private attributes, but not readable via public APIs.
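For reference, this is roughly what reading that information looks like today on CPython. The attributes _max_workers, _threads and _work_queue are private implementation details of ThreadPoolExecutor that may change or disappear in any release, which is exactly why a public read-only API would help. The helper names are made up for this sketch.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def peek_thread_pool(executor: ThreadPoolExecutor) -> dict:
    """Snapshot pool state via PRIVATE attributes (undocumented, may break)."""
    return {
        "worker_limit": executor._max_workers,          # private
        "workers_started": len(executor._threads),      # private
        "queued_items": executor._work_queue.qsize(),   # private
    }


def demo_snapshot() -> dict:
    """Fill a two-worker pool with three blocking tasks and snapshot it."""
    started = threading.Semaphore(0)
    release = threading.Event()

    def task() -> None:
        started.release()
        release.wait(5)

    pool = ThreadPoolExecutor(max_workers=2)
    for _ in range(3):
        pool.submit(task)
    # Wait until both workers have dequeued a task; the third stays queued.
    started.acquire(timeout=5)
    started.acquire(timeout=5)
    snapshot = peek_thread_pool(pool)
    release.set()
    pool.shutdown(wait=True)
    return snapshot


if __name__ == "__main__":
    print(demo_snapshot())
```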

There’s no point discussing whether or not you should be allowed to change the worker limit (I don’t see the use case for it) if you can’t at least read it. And killing threads is going to take you into territory you probably don’t want to discuss – that one would potentially stymie the whole proposal.

But checking an executor’s level of saturation is, by contrast, much clearer in the utility it offers and very easy to implement.

Thanks for the feedback.

The ability to check an executor’s level of saturation – current worker count, worker limit, and if possible, a count of waiting requests – is by far the most important of the features I’ve proposed.

That alone would make a world of difference to help narrow down otherwise-unexplainable delays that can happen higher up the chains.

David
