Python users are encountering the cryptic and non-deterministic ImportError: ... cannot allocate memory in static TLS block. This issue affects popular performance-critical libraries like TensorFlow, PyTorch (requirement: libgomp), and the binary _mysql_connector package (requirement: libstdc++) that require dynamically loaded binary libraries that have static Thread-Local Storage (TLS) requirements (compiled with -ftls-model initial-exec).
As I understand it, the issue manifests itself when the initially allocated TLS space is used up (e.g. by libraries compiled with the less performant -ftls-model global-dynamic) before a library with a static TLS requirement that is too large for the remaining space is loaded. When this happens, the dlopen() call fails and the import machinery passes on the error cannot allocate memory in static TLS block, which is provided by the linker (returned by dlerror()).
While the root cause is a low-level linker limitation, Python’s import system exposes the dlopen() error directly to the user with no context. This creates a “leaky abstraction” that is extremely difficult to debug, especially since the failure can depend on the order of imports and is highly specific to the execution environment. The standard workaround, LD_PRELOAD, is effective but obscure to most users and may not be available in all environments.
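As a debugging aid, here is a minimal sketch (Linux only, assuming `/proc/self/maps` is readable; the helper name `loaded_shared_objects` is hypothetical) that lists the shared objects already mapped into the process. Calling it just before a failing import shows what was loaded ahead of it, though not how much static TLS each library consumed:

```python
import os


def loaded_shared_objects(maps_path="/proc/self/maps"):
    """Return sorted paths of shared objects mapped into this process.

    Returns an empty list when the maps file does not exist
    (for example on non-Linux platforms).
    """
    if not os.path.exists(maps_path):
        return []
    paths = set()
    with open(maps_path) as maps:
        for line in maps:
            # Fields: address, perms, offset, dev, inode, [pathname]
            fields = line.split(None, 5)
            if len(fields) == 6:
                path = fields[5].strip()
                # Keep only entries whose file name looks like a shared object.
                if ".so" in os.path.basename(path):
                    paths.add(path)
    return sorted(paths)
```

Comparing the output between a working and a failing import order may help narrow down which earlier import used up the space.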
Could we improve this situation in CPython? I propose two potential mitigations:
Documentation: Add a section to the docs (e.g., in the FAQ or importlib section) explaining this specific error and the LD_PRELOAD solution.
Improved Error Message: Where an ImportError is created, check if the underlying linker error message contains “cannot allocate memory in static TLS block”. If so, replace the error message with a brief explanation and a reference to the new documentation.
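To illustrate the second idea, here is a rough sketch in pure Python rather than in the C import machinery where the change would actually live. The names `add_tls_hint` and `import_with_hint` and the hint text are hypothetical, and the substring match assumes the unlocalized glibc message:

```python
import importlib

# The linker message quoted above; matching on it is an assumption of this sketch.
TLS_ERROR_FRAGMENT = "cannot allocate memory in static TLS block"

# Hypothetical hint text; the documentation it would point to does not exist yet.
TLS_HINT = (
    "A shared library with a static TLS requirement could not be loaded "
    "because the remaining static TLS space was too small. "
    "The failure can depend on the order of imports."
)


def add_tls_hint(exc):
    """Return a new ImportError with the hint appended, or exc unchanged."""
    message = str(exc)
    if TLS_ERROR_FRAGMENT not in message:
        return exc
    return ImportError(f"{message}\n{TLS_HINT}", name=exc.name, path=exc.path)


def import_with_hint(module_name):
    """Import a module, enriching the error if it looks like the TLS failure."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        improved = add_tls_hint(exc)
        if improved is exc:
            raise
        raise improved from exc
```

The original linker message is kept and the explanation is only appended, so nothing is lost for users who search for the exact error text.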
This seems like a hidden trap for Python users who, often unknowingly, use binary packages compiled with static TLS for performance. It is made worse by the failure triggering (seemingly) non-deterministically. Providing better guidance directly from CPython would be a significant user experience improvement, even though the underlying problem may be challenging to work around.
Can you provide links of where you are thinking of putting it? This is the first I have ever heard of this issue being reported, so I don’t know how widely it’s seen.
How brittle is that going to be? Does every linker on every OS provide the exact same error message?
The issue is isolated to environments using ld.so (Linux and most BSDs). The string is defined in elf/dl-reloc.c in the glibc source (v2.42), but it is localized, so the exact text can vary with the locale. The dlopen() and dlerror() calls are also available on macOS, though it uses another linker to fulfill the requests. AFAIK the Mach-O architecture does not experience the TLS issue of the ELF architecture, so the message would be unique to Linux/BSD.
I’m not sure where it would belong in the docs. A longer explanation would be warranted, but I don’t see an obvious place to put it. One place I can think of to put a short note, however, is under ImportError on the library/exceptions page. The note would say that ImportErrors are also raised when dynamic linking fails on platforms that support it, with a link to the more detailed info.
A better solution may be to raise a more specific instance of an ImportError that can then be properly documented as such. I suspect this is not an ideal solution.
This is probably not a common issue. I have never triggered it before as far as I can remember. And when googling I find three shared libraries that trigger the issue in Python, libstdc++ v6 on x64 (context: mysql_connector and xgboost) and libgomp (context: opencv, pytorch, tensorflow). These are, on the other hand, ubiquitous and expose a lot of users to potential issues, depending on what other libraries they use and in what order. It took me a deep dive into linking and help from an expert to solve the issue (on the second attempt, after I gave up the first time), so I’m worried for the average Python user.
And potentially glibc based on your explanation (e.g. not musl).
I think this is way too niche of an issue to go there.
Correct as the only subclass right now is ModuleNotFoundError and this isn’t a broad enough issue to introduce a new built-in exception.
I do appreciate the concern, but searching the issue tracker for the error message turns up no reported issues, and we have kept our issue history going for decades. Luckily this site does get indexed by search engines, so this post alone might be enough documentation for anyone else who runs into it.
I don’t think we should recommend LD_PRELOAD as a solution here, as it has much more impact than just a “quick fix” and could cause all sorts of other issues. In particular, any exported symbol will be placed in the global dynamic table, as Python is compiled with --export-dynamic, and it will also shadow any other symbol from the DT_NEEDED set.
While this is a real issue that trips up users, it’s worth emphasizing that using initial-exec TLS in shared objects is technically not supported by the ELF specification. The initial-exec TLS model assumes the TLS block is allocated statically at program startup and is intended only for use in the main executable or in shared objects that are known to be loaded at startup time (e.g., via DT_NEEDED). The fact that this works at all in some dynamic library setups is thanks to GNU-specific extensions and implementation quirks, particularly in glibc, which reserves some extra static TLS space at startup to support this case as a workaround, not as a guaranteed behavior.
The error message "cannot allocate memory in static TLS block" is, therefore, a result of relying on undefined behavior that just happens to work in common environments. It also explains why this problem appears to be non-deterministic depending on import order or what’s already loaded. This is also not even possible to do in other libc implementations.
So putting the weight on Python to surface, document, and handle this in a more user-friendly way feels a bit misplaced. The underlying behavior is outside Python’s control and is a low-level linker/loader issue tied to binary compatibility choices made by other projects (compilers, linkers, and library maintainers). If a library needs TLS and might be loaded via dlopen, it should not use initial-exec TLS, and if it does, it needs to be prepared to deal with the incompatibility. This is a packaging and build system responsibility, not a Python import system issue.
Not only that, but as the behavior is outside our control, it is unclear whether we should make any recommendations, as it is very likely that they would start to diverge with different libc versions or configurations. For example, you could export GLIBC_TUNABLES=glibc.rtld.optional_static_tls=XXX, and now the recommendation would change depending on your system.
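To make that example concrete, here is a minimal sketch of setting the tunable for a child Python process. The helper names are hypothetical, and the size is deliberately left to the caller, since a suitable value depends on the system and glibc version. Tunables are read when a process starts, so they have to be set in the environment of a new process rather than changed in the running one:

```python
import os
import subprocess
import sys

# Tunable name taken from the example above (glibc only).
TUNABLE = "glibc.rtld.optional_static_tls"


def env_with_static_tls(size_bytes, base_env=None):
    """Return a copy of the environment with the tunable set to size_bytes."""
    env = dict(os.environ if base_env is None else base_env)
    setting = f"{TUNABLE}={int(size_bytes)}"
    existing = env.get("GLIBC_TUNABLES")
    # GLIBC_TUNABLES holds a colon-separated list of name=value pairs.
    env["GLIBC_TUNABLES"] = f"{existing}:{setting}" if existing else setting
    return env


def run_python_with_static_tls(code, size_bytes):
    """Run a Python snippet in a child process started with the tunable set."""
    return subprocess.run(
        [sys.executable, "-c", code],
        env=env_with_static_tls(size_bytes),
        check=False,
    )
```

On a libc that does not implement tunables the variable is simply ignored, which is part of why any such recommendation would vary from system to system.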