Python 3.14.0 is incompatible with stack-switching systems. What do we do?

Hello,

Python 3.14 has an issue: it is incompatible with systems that switch tasks by switching the machine stack.
This was reported as #139653 and confirmed by several users:

  • C++ Boost’s make_fcontext() (KiCad)
  • C Argobots
  • Julia coroutines
  • Realm HPC runtime

The regression comes from new stack protection, which has a long history – roughly:

  • 2021 (3.10 alphas): PEP 651 “Robust Stack Overflow Handling”, which was rejected (but is now mostly implemented)
  • 2022 (3.11 alphas): Steering council request #102, which was granted with some exceptions
  • 2022 (3.12): Implemented in #96510
  • Feb 2025 (3.14): Implemented a different way in #130398 on Linux & Windows, then expanded to other platforms
  • Aug 2025 (3.14 rc): Documentation on how it works

There are several issues with the approach:

  • Use of platform-specific API for getting the stack size (a sketch of the kind of query involved follows this list):
    • On platforms we don’t explicitly support, the code needs to guess (luckily WASI is one of those, so the guessing gets tested).
    • As we learned after 3.14.0: When a task-switching library sets another location for the stack, the platform-specific API is wrong (in a way we can’t reliably distinguish from stack overflow).
  • There’s a need to guess how big a frame is (see e.g. #140222)
  • As we also learned after 3.14.0: The system needs to be notified when a task-switching library has switched the stack.
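
For the first point, here is a rough sketch of the kind of platform-specific query involved. It is glibc-only and purely illustrative, not CPython’s actual implementation:

```c
/* Illustrative only: the kind of platform-specific stack query involved,
 * not CPython's actual code. Requires glibc (pthread_getattr_np). */
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>

static void report_stack_bounds(void)
{
    pthread_attr_t attr;
    void *stack_addr = NULL;
    size_t stack_size = 0;

    /* Ask the OS where this thread's stack was placed at creation time.
     * If a task-switching library later runs code on a different,
     * manually allocated stack, these bounds are simply wrong. */
    if (pthread_getattr_np(pthread_self(), &attr) != 0) {
        return;
    }
    if (pthread_attr_getstack(&attr, &stack_addr, &stack_size) == 0) {
        printf("stack base %p, size %zu\n", stack_addr, stack_size);
    }
    pthread_attr_destroy(&attr);
}

int main(void)
{
    report_stack_bounds();
    return 0;
}
```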

Now, the guessing makes this comparable to CPython’s pre-3.12 solution (C recursion was counted in sys.getrecursionlimit()) and the 3.12 solution (C recursion is tracked separately from Python recursion, as per PEP 651). One issue with new parameters to guess is that what worked before needs to be re-tweaked, and there have been many issues and fixes around that. But, that’s mostly done now (except issues like #113655).

But the incompatibility with stack switching is a bigger problem.
We’ve now added API that allows stack-switching libraries to inform the interpreter of the new stack location, but feedback from affected users is that this is not enough. That’s because in users’ systems, the stack-switching mechanism can be decoupled from the Python interop mechanism – the code that calls Python API doesn’t “know” when the stack is switched, and/or where the new stack is.
(Another issue is that if we add functions in 3.14.1, extensions that use them will fail to load in 3.14.0 – which packaging tools probably aren’t prepared for. But that’s a relatively minor docs/tooling problem here.)

How do we solve this? I don’t know; @markshannon (the author of the mechanism) is silent.
I see some alternatives:

  • Declare Python 3.14 incompatible with stack switching systems
  • Backport PyUnstable_ThreadState_SetStackProtection and declare Python 3.14 incompatible with stack switching systems that can’t call the new function
  • Add API to disable C stack protection for a given process? (AFAIK at least some of the systems have their own stack protection mechanisms.)
  • Re-add the old counting mechanism and add API to switch to it
  • Revert the change and go back to counting
8 Likes

I’m not really sure what the right answer is here. Using CPython on systems that do their own stack switching was always technically undefined behavior: it just happened to work before because our stack checks were simpler and didn’t rely on platform information. At the C level, the language model assumes that each thread has a single, continuous call stack managed by normal function calls and returns. There’s no standard-defined way to swap out or restore the stack pointer, and doing so breaks assumptions about local variable lifetimes, return addresses, and the overall call structure. Once you start switching stacks manually, you’re already outside what the standard guarantees. Given that it’s difficult to support something that is technically not supported by the underlying abstract machine, and that just works via a carefully choreographed dance with the compilers and the platform ABIs, I’m not sure it is our place to add compatibility APIs, because we would basically be fighting an unbounded set of problems. What’s worse: supporting all Tier 1/2 platforms may be impossible since this is heavily platform dependent.

So I think this is really a matter of balancing the value of the new, stricter stack protection against the real compatibility problems it introduces for packages that depend on stack switching.

5 Likes

Where is this documented please?

I’m the maintainer of the main Julia-Python interoperability packages PythonCall/JuliaCall. Based on what you’ve said, the threading and tasking models of the two runtimes sound fundamentally incompatible, and we’ll need some major reworking to support Python 3.14+.

3 Likes

At the C level, the language model assumes that each thread has a single, continuous call stack managed by normal function calls and returns.

That’s just not true. The C programming language standard does not prescribe a direct relationship between stacks and threads at the language level. Instead, it delegates threading and stack management to the implementation, such as operating systems or runtime libraries, by omitting any mention of stack implementation details in the standard. If you believe the opposite please provide a pointer showing where the C11 language standard says that stacks have to be implemented as a contiguous allocation of memory. The C11 threading library may implement threads that way (likely because that is how POSIX threads are implemented), but the language specification is silent on how stacks need to be implemented (for good reason). I think you’ll find that the majority of the threading libraries impacted by this change are compliant with the C language standard and work with all major compilers without needing any ABI changes. You can say that Python doesn’t work with stacks that are implemented in non-traditional ways, but the C11 language standard definitely allows it.

How do we solve this?

I’m going to copy over my suggestion from the github issue here:

Given that the relationship between stacks and (kernel) threads is something that isn’t controlled by Python (except when threads are made through the Python interpreter), I think it should avoid enforcing these checks by default on “external” threads. A better solution would be to have the checks be on by default for threads made through the Python interpreter and off by default for threads not created through the interpreter (which can be easily detected) and then have a way for clients to dynamically toggle whether the checks are on/off for their specific thread at their prerogative.

4 Likes

I think just providing a way for people to outright disable the C stack size guessing (as in, via an environment variable / -X option) is the best we can do. No Python code should ever be written to rely on catching a recursion depth error (yet I’m sure someone does). People with that knob set are more likely to experience crashes; that’d be a documented caveat. And packages that transitively depend on native code doing C stack abuse would be kind to acknowledge this when recommending their users use it.

Should that environmental switch flip over to the old imperfect counting mechanism? I’m not sure if doing that actually matters in practice. Does anyone have a feel for who sees the stack errors and when? I thought it was mostly a debugging aid, good both for new folks and for making debugging easier than a dead process. At least with faulthandler people can still get a stack trace when crashing. Mostly.

The underlying platform fundamentally does not allow us to do better.


C in a sense is always undefined behavior in this area.

At a past job, we used tiny thread stacks to avoid virtual address space exhaustion in the face of thousands of (C++ spawned) threads. As in 64k small. Because “good” programs don’t need much stack. This caused problems when those called into CPython which consumes more. So we worked around it by detecting being within a CPython process and increasing the thread stack size to a “whopping” 256k. With an environmental or command line knob for people with a rare need, discovered via stack overflow crashes, for more to use more.
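
For anyone curious, a minimal sketch of that kind of workaround using standard POSIX calls (my reconstruction of the general shape, not the actual code from that job):

```c
/* Minimal sketch: give threads that will call into CPython a larger stack.
 * The 256 KiB figure mirrors the anecdote above; pick what your workload needs. */
#include <pthread.h>
#include <stddef.h>

int spawn_python_worker(void *(*fn)(void *), void *arg, pthread_t *out)
{
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0) {
        return rc;
    }
    /* The process-wide default here would otherwise be deliberately tiny (e.g. 64 KiB). */
    rc = pthread_attr_setstacksize(&attr, 256 * 1024);
    if (rc == 0) {
        rc = pthread_create(out, &attr, fn, arg);
    }
    pthread_attr_destroy(&attr);
    return rc;
}
```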

Point being: This unique unusual environment stack abuse was our (past job’s) own internal problem, not CPython’s to deal with.

2 Likes

This actually sounds feasible off the top of my head. I believe we know whether a thread was created by us or not when we discover we don’t have a Python thread state for the current thread and go to create one…

Thanks for the clarification, and apologies for the confusion: I indeed phrased that poorly. What I was trying to express is that, in CPython, the guarantees we rely on come either from what the subset of the C11 standard we support defines or from what’s provided by the specific platforms we officially support in the specific ways we use them. In that context, “platform” refers to the target triples we list in our support tiers, which encompass not just the OS and architecture but also the associated libc and ABI conventions.

Outside of those defined environments, things can get blurry very quickly, especially when behavior depends on mechanisms like user-space stack switching, which are outside both the C abstract machine model and the platform ABIs we test against. That’s the distinction I was trying to make, and I should have said “platform-defined behavior” explicitly.

I still think this point is important because there must be some cooperation or agreement of some sort between runtimes here. Even if a library doesn’t technically violate the ABI by itself, the combination of that library with CPython’s internals (and potentially with other components such as JITs, signal handlers, debuggers, or sanitizers) can lead to situations that effectively do. For example, unwinding and exception metadata assume contiguous, ABI-conforming frames; stack introspection and tracing tools expect stable frame pointers; and signal handling relies on predictable stack alignment and saved contexts. Without coordination, those assumptions can all quietly break, even if each piece individually follows the rules of the platform.

That’s a fair point. I do wonder, though, how this would interact with threads that are created outside the interpreter but later call into Python by attaching a thread state. In those cases, users and libraries often don’t have a clear model of what’s going on as they may be called from arbitrary native threads, possibly managed by another runtime, and can’t reliably tell whether they’re running in a “Python-created” or “external” thread. As a result, they usually have to assume the worst-case scenario to stay on the safe side. That uncertainty makes it quite hard to define consistent behavior around things like stack checks or protection toggling. Even if we provided a way to switch protections on or off, it’s not clear that users would always know when it’s actually safe to do so.


And just to clarify, I’d very much like to see this fixed. I’m only pointing out the current perceived support (or the confusion around it) and integration gaps, not suggesting that we shouldn’t try to improve the situation.

5 Likes

Unfortunately that’s not actually true. The mechanism doesn’t work on Wasm platforms, so for both WASI and Emscripten we simply ignore it and have turned off tests that trigger recursion depth as they don’t work appropriately. At EuroPython 2025 we agreed that the host runtime will catch any nasty stack issues and catching RecursionError is typically futile, so we just live without the protection.

Basically the only way this would work for Wasm is if the counting mechanism came back.

5 Likes

I also vaguely remember the mechanism had problems with Windows debug builds due to the size of the stack frames, and we had to work around that in the parser.

1 Like

That’s a fair point. I do wonder, though, how this would interact with threads that are created outside the interpreter but later call into Python by attaching a thread state.

Presumably the PyThreadState object, if it already exists, would already have a flag stored internally saying whether checks are on/off for this particular thread. I believe (correct me if I’m wrong) that a PyThreadState object can only be associated with a single kernel thread and cannot be transferred between kernel threads, so the setting can persist and does not need to be modified every time a PyThreadState is attached/detached from a thread. If a new PyThreadState has to be made because this is the first time a thread is calling into Python, then the interpreter can check whether it knows about the thread or not and set the flag accordingly (checks on if it is known, off if not).

In those cases, users and libraries often don’t have a clear model of what’s going on as they may be called from arbitrary native threads, possibly managed by another runtime, and can’t reliably tell whether they’re running in a “Python-created” or “external” thread.

Correct (which is also why the other solutions proposed don’t work), but the Python interpreter does know about all the threads it creates and can set the default for the checks being on/off using that information in the call to create the PyThreadState. Tracking this is easy. You just make a TLS variable that defaults to false, and whenever the Python interpreter creates a thread it toggles the bit to true. It can then easily check if the thread is one that it made or not and set the default for checks on/off when creating the PyThreadState object.
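
A minimal sketch of that bookkeeping, assuming a C11 thread-local flag; the names are invented for illustration and this is not CPython code:

```c
/* Sketch only: a thread-local flag marks threads the interpreter created;
 * threads spawned by other runtimes never set it. Names are hypothetical. */
#include <stdbool.h>

static _Thread_local bool created_by_interpreter = false;

/* Called at the start of every thread the interpreter itself spawns. */
static void mark_interpreter_thread(void)
{
    created_by_interpreter = true;
}

/* Consulted when a new PyThreadState is created for the current thread:
 * default stack checks to on only for threads we know we created. */
static bool stack_checks_default(void)
{
    return created_by_interpreter;
}
```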

Even if we provided a way to switch protections on or off, it’s not clear that users would always know when it’s actually safe to do so.

That was exactly my complaint with the other solutions. This approach effectively absolves users of the responsibility to get it right in all cases by making it a property of the PyThreadState created by the interpreter. When Python is sure that it’s safe to turn the checks on because it made the thread, it does so. When it can’t be sure, the checks are off by default, but smart clients can still opt-in if they know how their thread was created (without any expectation that they have to do so).

Even if a library doesn’t technically violate the ABI by itself, the combination of that library with CPython’s internals (and potentially with other components such as JITs, signal handlers, debuggers, or sanitizers) can lead to situations that effectively do. For example, unwinding and exception metadata assume contiguous, ABI-conforming frames; stack introspection and tracing tools expect stable frame pointers; and signal handling relies on predictable stack alignment and saved contexts. Without coordination, those assumptions can all quietly break, even if each piece individually follows the rules of the platform.

That feels like a separate discussion as the only thing that’s broken right now are these stack checks which aren’t necessary for basic functionality (code was running just fine in Python 3.13). At a minimum, it feels like there is some documentation missing on the Python side about what its expectations are regarding what clients are and are not allowed to do when calling into Python from external code (perhaps I’m just not aware of it). As an aside, I believe that the issue stems from the fact that Python behaves like a runtime in some cases and as a library in others. When you’re the runtime you can do whatever you want because you created the world, but when you’re just a library being invoked by external code then you can’t assume you control the world anymore. Python does publicly document and support the library-like functionality and in my opinion needs to work like any other C-compliant library in such cases or explicitly document its restrictions.

2 Likes

I understand the technical approach, but I’m concerned this creates a confusing user experience. From a developer’s perspective, they’re just calling Python code: they shouldn’t have to understand that the interpreter behaves fundamentally differently depending on whether they happen to be on a thread that Python created versus one their runtime created. Having the answer to “does Python protect against stack overflow?” be “sometimes, depending on thread provenance” means users don’t have a clear model of what to expect, and refactoring to use a different threading model would suddenly change Python’s behavior underneath them in non-obvious ways.

I’d argue that these questions are actually deeply intertwined rather than separate. The reason we’re having this discussion about the stack checks breaking this use case is precisely because we haven’t clearly defined our level of support for non-contiguous stacks and stack-switching systems. The documentation you’re asking for about what clients are allowed to do depends on us answering whether we support non-contiguous stacks as something that can interact with CPython, what rules or restrictions apply (or even on what platforms this is supported!), and that’s something where core developers may not agree. Furthermore, we cannot document something that we don’t know we can keep as we evolve the JIT, our debugging and profiling story and guarantees, or other parts of the interpreter. At the very least, whatever we support needs to be tested, as required by several PEPs including PEP 11, and this particular aspect was never tested nor even considered so far.

I’m completely sympathetic to your position that “it worked in 3.13” is a strong argument for trying to do as much as possible to support it. But without consensus on whether this is a use case we want to support going forward, it’s hard to decide on the right technical approach. If we decide Python should support this use case, then your proposal about toggleable checks makes perfect sense and we’d document the guarantees accordingly. If we decide it’s not officially supported, then we’d document it as a known limitation. Either way, we need to agree on that support level first before we can design the right solution or write the right documentation. Does that make sense?

2 Likes

I’m coming into this as a non-core Python developer of mostly HPC software. My impression from the discussion above is that the new stack checking features in 3.14 are much more deeply entangled with the platform/OS than previous versions.

I realize why that might be an advantage in some ways, but I want to reiterate the point that people use Python in many different ways, far exceeding the set enumerated at the top of this post. User-level threads may feel like a weird, quirky feature, but they’re far more common in the HPC world (qthreads, nOS-V, Charm++, etc.). To me, an expectation that Python is going to require deep integration with each of these systems, or else rule them all out of scope, seems unfortunate. Especially given the upshot of the integration is (as far as I understand) a slightly less conservative check for stack space.

Again, I understand there are tradeoffs, but just wanted to underscore the point from my perspective.

7 Likes

People might be interested in this C thread library by the current editor of the C standard[1]. Its docs state:

The corresponding standards paper will probably make it into C2y; but since it’s a library-only feature, it doesn’t need more than C11 and is available today.

Obviously an experimental library is never the first choice in engineering. But since there’s somewhat of an impasse now, it might be slightly less bad than the other choices[2].


  1. and prolific proposal author for C/C++ ↩︎

  2. and, if it turns out to be useful, real-world feedback to the C committee can definitely improve the odds of something like this making it into a future standard version ↩︎

How often is that important though?

Most Python usage only executes Python code from Python-created threads, and most Python usage also never triggers any recursion errors. Python code executed from non-Python threads that happens to potentially trigger recursion issues must be vanishingly rare.

A more refined solution, though, would be for a runtime to inform CPython that a given thread might be stack-switched at some point. Non-CPython threads would still be recursion-checked until that API is called.

Something like:

.. c:function:: void PyUnstable_ThreadState_DisableStackProtection(PyThreadState *tstate)

   Disable stack protection permanently for a Python thread state.
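
And, for illustration, roughly how an attaching runtime might use it; the function is the hypothetical one proposed above (not an existing CPython API), the rest is ordinary C API:

```c
/* Sketch: an externally created thread whose stack may later be switched
 * attaches to the interpreter and opts out of C stack protection.
 * PyUnstable_ThreadState_DisableStackProtection is the hypothetical
 * function proposed above. */
#include <Python.h>

static void on_runtime_thread_start(void)
{
    PyGILState_STATE gstate = PyGILState_Ensure();  /* creates/attaches a thread state */

    PyUnstable_ThreadState_DisableStackProtection(PyThreadState_Get());

    PyGILState_Release(gstate);
}
```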
6 Likes

In my personal opinion? Almost never

Without invalidating my previous answer, I want to answer this separately, as it really depends on the framework. For example, I have used for years a framework that wraps a C++-created thread pool where the user implements Python callback handlers to handle incoming requests. In this scenario, ALL my code is called by native threads. This pattern is actually quite common in high-performance server applications, embedded Python scenarios, and integration layers where Python is used as a scripting language within a larger C/C++ application.
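
For readers unfamiliar with the pattern, a stripped-down sketch of such a handler call from a native worker thread (assuming the embedding code holds the callback as a Python callable):

```c
/* Sketch: a native thread-pool worker invoking a user-supplied Python
 * callback. Every call like this runs on a thread CPython did not create. */
#include <Python.h>

static void handle_request(PyObject *callback, PyObject *request)
{
    PyGILState_STATE gstate = PyGILState_Ensure();  /* attach this native thread */

    PyObject *result = PyObject_CallOneArg(callback, request);
    if (result == NULL) {
        PyErr_Print();
    }
    Py_XDECREF(result);

    PyGILState_Release(gstate);
}
```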


In any case, just to be absolutely clear where I stand: I think the cost that the stack checking system is imposing is too high given how many problems it’s causing across different use cases. It has needed several adjustments, it doesn’t work well with WASI, it doesn’t work well on Windows with debug builds, it works badly in the parser, and many libraries are having problems. I don’t deny there are advantages, but given how situational those benefits are and how few users actually benefit from them (versus how dire it is to break existing libraries and workflows), I think we need to seriously reconsider the tradeoffs here and either remove it or make it opt-in.

That said, I think it’s still important to agree on the level of support for this. I suspect this aspect may come up again in the future with other features or changes, and we need to have some sort of agreed “official” position for this use case so we’re not rehashing the same fundamental discussion every time something new breaks this pattern.

7 Likes

Just thinking about how to work around these issues in my own codebase by doing more fine-grained control of thread state. Does PyThreadState_New recompute the stack bounds? That is, if I create a new thread state for each stack (task/co-routine/etc) can I expect things to behave correctly?

And a follow-up question: can I use PyThreadState_Clear instead to reuse a thread state, in particular recomputing the stack bounds? It’s unclear what it does from the docs, but from the name it sounds like it’s roughly equivalent to deleting and creating a new thread state.

My opinion:
CPython should require C, and that’s it. Platform-specific functionality and assumptions should power optional optimizations or exposing optional functionality (from sockets to profiler support).

Any hard requirement (e.g. threads/atomics, IEEE floats, stack growth assumptions) is best seen as an exception, and if we add a new one we should be extremely explicit about it. An unexpected new requirement should be treated as a serious bug.

In other words, the PEP 11 tiered platforms should define what we can test and debug, more than what we consider a bug. We should not over-fit CPython to the quirks of the tiered platforms.
(Of course, and as we learned here, something like “Linux” is not a single platform…)


FWIW, the SC approval for stack protection explicitly said it needs to be “portable”. It turns out that the feature is not portable. The issue is that we found this out too late. IMO, part of the issue is that the design was only revealed in the 3.14 RC phase, and hasn’t been discussed publicly as far as I know. Questions like “why not use mprotect?” now seem too late.


Anyway, the real question is what to do now.

5 Likes

I just want to correct a few points here.

The stack protection actually works OK with WASI, but it is awkward to test.
WASI has two stacks, one of which has built-in stack protection and one that does not. Our stack protection protects that second stack. The reason it is awkward to test is that the built-in stack protection doesn’t raise an exception; it just halts the runtime.

It works perfectly well with the parser, AFAICT. If the parser would blow the C stack, you will get an exception before that happens. Do you have an example where it doesn’t work?

It also works just fine on Windows with debug builds. Again, do you have an example where it doesn’t?

Having said all that, I do agree that libraries that switch stacks aren’t supported, we have broken those, and we need to fix them.

Can we focus the discussion on the actual problem?

1 Like

CPython should require C, and that’s it.

Unfortunately that rules out any operating system features outside of the C standard library, like threads, or the JIT.

3 Likes

Thanks, Mark. I completely agree that it’s better to focus on the actual issue rather than going case by case about which platforms work or not. My intent was to provide context based on my recollection, not to reopen old threads or assign blame. I completely understand that whether something is considered “flaky,” “complex,” or just “delicate” is subjective, but given the number of adjustments and exceptions we had to make, I think describing it as at least “delicate” or “nuanced” is a fair characterization, and that was my main thesis here when weighing it against the benefits.

Since you asked for links, here’s what my recollection of the situation was:

  • Unless I’m missing something, we deactivated stack protection for WASI and several other systems — for example GH-134469 and GH-131338.
  • WASI also had particular issues with the parser: GH-131770.
  • On Windows, my recollection is that the stack limits were adjusted multiple times to avoid overflows (GH-113655). If that’s now fully resolved, then my knowledge may be out of date — perhaps it was addressed by GH-91079? (That issue is still open, so I’m not sure if it should be closed.)

That said, I really don’t want to go through these one by one. As you say: it’s not productive at this point. The important part is to acknowledge that we’ve had to make quite a few platform-specific adjustments, and to make sure we’re aligned on what level of fragility or portability we’re comfortable accepting.