I was reviewing some of these libraries and noticed they’re becoming free-threading friendly. My first thought was: why are they using threads, and is there a way to control or limit how many they use? Do I have to fork each one to manage that? Without limits, they could potentially make the system unresponsive.
Technically any library you use could make your system unresponsive. I don’t think free threading support is related to that.
If you’re worried about what a library might do, you need to look at the source yourself.
To some extent this is a problem packages already have to deal with. That’s why threadpoolctl was a thing long before free-threaded Python.
Right now there isn’t anything specific to free threading, and threadpoolctl is more about controlling OpenBLAS thread pools than Python-level threading.
IMO as an ecosystem we should probably come up with tools (or figure out how to integrate with threadpoolctl) that allow controlling multilevel parallelism. I don’t think we’re quite at the point where people are running into issues caused by several libraries each using their own internal thread pools and oversubscribing system resources, but it’s probably coming soon.
Yes, but while I can easily contain libraries with poor design choices using containers, I can’t do the same for libraries that rely on free-threading. Doing so would defeat the purpose of free-threading.
That said, users usually don’t need to deal with low-level thread management. For instance, as @ngoldbaum pointed out, libraries like NumPy can be controlled using tools like threadpoolctl to limit the number of threads they use internally.
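A minimal sketch of what that looks like, assuming the third-party threadpoolctl package is installed. `threadpool_limits` is threadpoolctl’s context manager; the wrapper name `limited_native_threads` is made up for illustration, and the fallback to a no-op when the package is missing is my own assumption, not something threadpoolctl does.

```python
import contextlib

try:
    # Third-party package: pip install threadpoolctl
    from threadpoolctl import threadpool_limits
except ImportError:
    threadpool_limits = None


def limited_native_threads(max_threads):
    """Hypothetical helper: cap native (BLAS/OpenMP) thread pools.

    Falls back to a no-op context manager if threadpoolctl is unavailable.
    """
    if threadpool_limits is None:
        return contextlib.nullcontext()
    return threadpool_limits(limits=max_threads)


with limited_native_threads(2):
    # With threadpoolctl installed, NumPy/SciPy calls made here are limited
    # to 2 threads per native pool that threadpoolctl detects.
    pass
```

Note that this only affects native pools that threadpoolctl knows how to detect. It does nothing about threads a library starts from Python code.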
The problem isn’t with cpu_count or free-threading on their own, but with the flexibility libraries have to combine them. Even a moderately complex call tree can unintentionally spin up hundreds of threads this way.
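To make the multiplication concrete, here is a small sketch using only the standard library. The function name `count_inner_threads` and the `width` parameter are made up for illustration; `width` stands in for whatever a library would get from cpu_count. An outer pool of `width` workers calls a pretend library function that naively creates its own pool of `width` workers, and a barrier forces all inner threads to be alive at the same time so they can be counted.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def count_inner_threads(width):
    """Count distinct inner worker threads alive at once in a 2-level call tree."""
    # Every inner task waits here, so all width * width threads must coexist.
    barrier = threading.Barrier(width * width, timeout=30)
    seen = set()
    lock = threading.Lock()

    def inner_task():
        with lock:
            seen.add(threading.get_ident())
        barrier.wait()

    def outer_task():
        # Stand-in for a library that sizes its own pool from the core count.
        with ThreadPoolExecutor(max_workers=width) as inner:
            for future in [inner.submit(inner_task) for _ in range(width)]:
                future.result()

    with ThreadPoolExecutor(max_workers=width) as outer:
        for future in [outer.submit(outer_task) for _ in range(width)]:
            future.result()
    return len(seen)


print(count_inner_threads(4))
```

With a width of 4 that is 16 inner threads on top of the 4 outer ones. On a 16-core machine the same pattern would mean 256 inner threads plus 16 outer ones, and a third level would multiply by 16 again.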
Does this need a universal solution? It seems like it could be solved by feature requests for specific projects, if/when it becomes an issue.
I think almost all of them are not launching threads. Being compatible just means they are making changes so that the library keeps working if you choose to call it from multiple threads. Any library that does launch its own threads should only do so while providing you with some control over how it does that.
The point @ngoldbaum makes is relevant if you happen to use many libraries that launch their own threads, but few libraries do that. Mostly the libraries just become thread-safe and you can choose whether or not to use them with threads.
This is not a new problem. How about a C++ program that uses a library that uses threads?
Best practice for a library that uses threads is to have APIs that allow limits on thread use, for example by setting a limit on thread pool sizes.
A library’s documentation should also make clear what use it makes of threads.
But I cannot see how Python can police thread use.
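As a rough illustration of the best practice described above, here is a sketch of a library-style function that exposes its thread limit to the caller. Everything in it (`process_all`, `DEFAULT_MAX_WORKERS`, the default of 4) is hypothetical and not taken from any real library.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical, documented default: small and fixed rather than derived
# from the machine's core count.
DEFAULT_MAX_WORKERS = 4


def process_all(items, func, max_workers=None):
    """Apply func to every item and return the results in order.

    Threading: uses a pool of at most max_workers threads
    (default DEFAULT_MAX_WORKERS). Pass max_workers=1 to run
    everything in the calling thread.
    """
    if max_workers is None:
        max_workers = DEFAULT_MAX_WORKERS
    if max_workers < 1:
        raise ValueError("max_workers must be at least 1")
    if max_workers == 1:
        # No pool at all: the caller stays in full control of threading.
        return [func(item) for item in items]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, items))


squares = process_all(range(5), lambda x: x * x, max_workers=2)
```

A caller that already runs its own threads can pass max_workers=1 and avoid nested pools entirely, and the docstring tells them what the library does by default.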
I’ve been evaluating a platform-agnostic solution, but it quickly becomes overly complex. It requires fine-grained thread filtering, such as distinguishing between I/O-bound and CPU-bound threads, to implement a meaningful and usable thread-limiting mechanism.
While this can work on specific platforms or in targeted use cases, it’s neither suitable nor maintainable within CPython, particularly because it relies on unorthodox programming techniques and platform-specific assumptions.
I am not going to pursue this in Ideas.