Python 3.14.0 is incompatible with stack-switching systems. What do we do?

Just a small correction. The SC said (emphasis mine):

as long as the mechanisms are either portable or conditional on platform support.

So in that sense @markshannon has followed correctly the SC directive.

2 Likes

One other option is to go back to frame counting for recursion detection, but set the default limit much lower. (And bear in mind that these days, this only counts native frames, as Python frames get condensed into a single frame per native call.)

The problem with counting frames (which doesn’t take the size of each frame into account) is that we were always trying to set the limit as high as possible, and then we needed to test that the machinery works by deliberately hitting that limit. When the limit is too high, that crashes the test, and it can become too high simply because the frame size changes.

But if we found an actually reasonable recursion limit and set the default lower, that wouldn’t be anywhere near as much of an issue. I think we had it set at a couple of thousand previously? I bet there’s a much lower number that the vast majority of code never exceeds, and the mechanism is already there to raise or disable the limit if you like your scripts to crash when you blow past it.
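For illustration, a frame-counting check of the kind described above can be sketched as follows. This is not CPython’s actual code: the names and the default limit are invented for this example (real CPython kept the counter in the per-thread state and raised RecursionError at the limit).

```c
#include <stdbool.h>

/* Illustrative sketch of frame counting: every call into the interpreter
 * bumps a depth counter and checks it against a limit.  Names and the
 * default value are made up for this example. */
static int frame_depth = 0;
static int frame_limit = 1000;  /* "a number the vast majority of code never exceeds" */

/* Returns false when the call would exceed the limit, i.e. where the
 * interpreter would raise RecursionError. */
static bool enter_frame(void) {
    if (frame_depth >= frame_limit)
        return false;
    frame_depth++;
    return true;
}

static void leave_frame(void) {
    frame_depth--;
}
```

The trade-off the post describes is visible here: the check is cheap and portable, but `frame_limit` says nothing about how many bytes of native stack each frame actually consumes.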

The ability to detect the actual end of the stack is a beautiful engineering solution, and I’m glad we went there. But we didn’t foresee that there would be tools changing the stack around us at the time.

2 Likes

Or the opposite: let non-Python threads opt-in to stack protection using an unstable API, rather than opt-out. That way, third-party extensions don’t need any code changes to be compatible with 3.14, but users can still get the fancy stack protection system if they want to.

Basically, I propose that thread states created via PyThreadState_New will have stack protection disabled by default. CPython can add a private API (or just reuse _PyThreadState_New) for keeping its own threads opted-in.

4 Likes

It turns out the fix is almost trivial.

We have the base of the stack (either known or estimated; it doesn’t matter) as well as a limit pointer, c_stack_soft_limit (see InternalDocs/stack_protection.md in the CPython repository).

Currently we raise if the stack pointer is less than the limit pointer, even if we are way outside our stack bounds.

If we instead raise when the stack pointer is less than the limit pointer and greater than the base pointer, then everything works.
Stack protection works as designed for normal threads, but no exception is raised for user-space threads.
It isn’t even any slower, as the additional comparison is on the slow path.
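The proposed check can be sketched in C roughly as follows. This is a minimal illustration, not CPython’s actual implementation: the struct and the base field are invented here (only c_stack_soft_limit is a real name from the internal docs), and it assumes the “base” is the low end of the thread’s own stack region, below the soft limit, matching the comparison described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bounds for one thread's stack; names are made up. */
typedef struct {
    uintptr_t c_stack_base;       /* low end of this thread's stack region */
    uintptr_t c_stack_soft_limit; /* raise RecursionError below this */
} StackBounds;

/* Current behavior: trips whenever sp is below the soft limit, even if
 * sp is on a completely different (user-space) stack. */
static bool should_raise_old(const StackBounds *b, uintptr_t sp) {
    return sp < b->c_stack_soft_limit;
}

/* Proposed behavior: only trip when sp is actually inside this thread's
 * stack, i.e. between the base and the soft limit.  A foreign stack lies
 * outside that range, so no spurious exception is raised. */
static bool should_raise_new(const StackBounds *b, uintptr_t sp) {
    return sp < b->c_stack_soft_limit && sp > b->c_stack_base;
}
```

The extra comparison only runs on the path where the soft limit has already been crossed, which is why it costs nothing in the common case.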

I’m working on a PR.

6 Likes

How would this work on platforms without stack limits API, where c_stack_soft_limit is just a guess?

2 Likes

Something I don’t have clarity on is what we expect the stack protection to provide. Is it meant to offer some hope of recovering from a potential stack overflow by raising RecursionError and catching it? If so, I would argue it isn’t working for WASI, as that outcome should still be testable but isn’t.

But if it’s just to prevent a security issue, then WASI is covered indirectly thanks to it having to run in a host runtime which already prevents stack overflows from turning into something worse as the host runtimes kill the guest. So I would still argue the mechanism isn’t working as originally intended in that case, but as Mark, Hood, and I agreed to at EuroPython, it doesn’t matter if all you’re trying to do is prevent a stack overflow becoming a buffer overflow.

3 Likes

I would say “get a nice understandable exception rather than an obscure crash”. Whether the exception can be cleanly recovered [1] from may be less important.


  1. Obviously it should, but perhaps it’s not that easy if the except clause is itself very nested and needs its own stack overhead. ↩︎

2 Likes

More concretely: if a task-switching library uses an array of stacks, slightly smaller than Py_C_STACK_SIZE, on a *nix system without pthread_get_stackaddr_np/pthread_getattr_np, then starting a Python thread in task n will poison the beginning of the stack for task n+1, and calling Python API there will abort the process.
Is there any way to guard against this?

(An orderly array might be improbable, but it’s not necessary – in the same way Python can mark some unrelated piece of memory as off-limits, so you get a heisenbug instead.)
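To make the address arithmetic concrete, here is a sketch with invented numbers and names: CPYTHON_GUESS stands in for whatever size CPython assumes when no bounds API (pthread_getattr_np and friends) is available, and TASK_STACK is the library’s per-task stack size. Whether the over-estimate lands in task n+1 or n−1 depends on the arena layout and growth direction; the point is that it spills into a neighboring task’s slot.

```c
#include <stdbool.h>
#include <stdint.h>

/* Invented sizes: the library's per-task stack is smaller than the
 * stack size CPython assumes in the absence of a platform API. */
enum { TASK_STACK = 512 * 1024, CPYTHON_GUESS = 1024 * 1024 };

/* Task i's stack slot in a contiguous arena: [bottom, top), stacks
 * growing downward from the top of the slot. */
static uintptr_t slot_bottom(uintptr_t arena, int i) {
    return arena + (uintptr_t)i * TASK_STACK;
}
static uintptr_t slot_top(uintptr_t arena, int i) {
    return arena + (uintptr_t)(i + 1) * TASK_STACK;
}

/* CPython, entered near the top of slot i, estimates the stack bottom
 * by subtracting its assumed stack size. */
static uintptr_t estimated_bottom(uintptr_t arena, int i) {
    return slot_top(arena, i) - CPYTHON_GUESS;
}

/* Does CPython's estimated stack range for slot i overlap slot j?
 * If so, slot j's addresses are treated as "our" (nearly exhausted)
 * stack, and calling Python there can misfire. */
static bool estimate_overlaps(uintptr_t arena, int i, int j) {
    return estimated_bottom(arena, i) < slot_top(arena, j)
        && slot_bottom(arena, j) < slot_top(arena, i);
}
```

With these numbers, the estimate for one task’s stack covers the whole of the adjacent slot, which is the poisoning the post describes.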

Yes. As I said in the post you’re replying to, we should be extremely explicit about adding the extra requirements needed by those features.
We should weigh them against the benefits, and document them prominently – both to other devs when they’re proposed, and to users once they’re no longer optional.

Well, you can port them with some effort.
If that’s the portability we’re going for, should the tweakable parameters be in pyport.h, documented, and announced widely in What’s New?


Is it OK that the mechanism is ineffective if you compile without optimizations? That seems to be by design.

2 Likes

I tried to describe my mental model around this, and came up with a possible larger framework to make the decision in.

There are several kinds of requirements we have on the underlying platform (mostly on its C implementation).
Do they look reasonable? Which of them should include “A contiguous stack”?

“Baseline” requirements

Porting CPython to a platform that meets these requirements should only involve configuration, rather than rewriting application code.
(Of course the line between configuration and code is blurry.)

These are not strict requirements!
We can and do have alternate implementations or workarounds for platforms that implement some of the baseline features differently, incorrectly, or not at all.
In particular, Windows is fully supported, and it has its own implementations for threads, atomics, paths, and others. Other “known” platforms have a bunch of #ifdefs and configure checks.

IMO, adding a new one should be an explicit project-wide decision (i.e. a SC decision).

  • A hosted implementation of C11 (i.e., the C standard library)
  • POSIX threads.
  • C11 atomics.
  • IEEE 754 double-precision floating-point representation, including NaN.
  • POSIX-style filesystem paths.
  • The intptr_t, int8_t, int16_t, int32_t, int64_t types, and their unsigned counterparts.
  • POSIX types like ssize_t
  • A UTF-8 C locale – see note in PEP 11.
  • Identifiers starting with _Py or _PY must be available as ordinary identifiers for CPython to define and use. (The C standard reserves them.)

Accidental/Practical requirements

These requirements sort of sneaked in. Removing them is currently impractical – it would need sustained testing effort on a rare platform, and in some cases, breaking public API. But they aren’t really necessary.
IMO, new code that requires these should only be added when there is a strong reason (for example, compatibility with an existing CPython feature); serious, testable efforts to remove the requirements would mostly be welcome.

See #111178 (“UBSan: Calling a function through pointer to incorrect function type”), which removed what would have been a bullet point here after a sanitizer started flagging it.

  • Specific pointer sizes (any size you want as long as it’s 32 or 64).
  • Little- or big-endian encoding of integers; two’s complement when signed.
  • Lossless casts between function pointers and object pointers/void*
  • ssize_t must be able to represent the full range of negative values (from -SSIZE_MAX to 0)
  • Various assumptions about alignment and padding.
  • ASCII-compatible C character sets.
  • No limits on significant characters in a C identifier, nesting levels, number of identifiers, etc.
  • Identifiers starting with an underscore must be available as ordinary identifiers for CPython to use. (The C standard reserves them for some uses.)
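One of these accidental requirements – lossless casts between function pointers and void* – can be demonstrated directly. ISO C leaves this conversion undefined, but POSIX dlsym() effectively requires it to work, and dynamic module loading depends on it:

```c
/* A plain function whose pointer we will round-trip through void*. */
static int answer(void) {
    return 42;
}

/* Round-trip a function pointer through void*, as dlsym()-style dynamic
 * loading requires.  ISO C does not guarantee this conversion; POSIX
 * does, and CPython's extension loading relies on it. */
static int call_via_void_pointer(void) {
    void *p = (void *)answer;           /* function pointer -> void*   */
    int (*f)(void) = (int (*)(void))p;  /* void* -> function pointer   */
    return f();
}
```

On every platform CPython currently supports, the round trip preserves the pointer; a platform with, say, differently sized code and data pointers would break this assumption.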

Build requirements

Many tools are needed to build the interpreter: make, a POSIX shell, sed, and so on. Or Windows and the Microsoft tools.

These are not needed after (cross-)compiling.

Requirements for optional features

There are many “optional” features – ones that can be disabled and the (rest of the) interpreter will still work.
Some prominent examples are:

  • The object allocator (pymalloc), which is needed for acceptable performance but can be disabled, requires a page-based memory model.
  • Loading extension modules requires a platform-specific mechanism for loading shared libraries. (There are 3 in-tree implementations and a relatively nice configuration option!)
  • Debugging and profiling tools have additional requirements, AFAIK.
  • Various modules, for example zlib, socket, ssl, sqlite3, ctypes or asyncio, expose or require platform-specific functionality[1]. They’re technically optional, even if general users will expect a build of Python to include them.

  1. (I consider third-party libraries part of the platform.) ↩︎

4 Likes

What does this mean? Does it imply that you can only call into Python from a POSIX thread or just that when Python makes its own threads that it will make POSIX threads? The latter seems fine, while the former will be extremely problematic as you’ll be discarding many different users at the peripheries of computing. I know of software both in HPC as well as embedded settings that call into Python from the C side using non-POSIX threads. Ruling out all those possibilities for users that have been depending on Python for decades seems like it should be a non-starter.

CPython itself is only ever going to use the pthreads library and has no way to automatically understand anything else.

Native code with its own concept of threading that is not pthread API based will likely need to make API calls to inform Python about the context change.

For anyone who has platforms doing this kind of non-standard native “threading”: it’d be helpful if you could create a regression test PR for the CPython project itself that represents said non-pthreads-based threading system design. Without such a thing, we have no way to even attempt to support such platforms, or to know that a change we’re making might lead to issues on a specific uncommon system design.

5 Likes

That’s fine for the threads that Python makes itself.

Native code with its own concept of threading that is not pthread API based will likely need to make API calls to inform Python about the context change.

Is that what you do for Windows too? Windows threads are not pthreads.

For anyone who has platforms doing this kind of non-standard native “threading”

As I illustrated earlier in this discussion, this is not “non-standard” threading. All the threading we do is completely sanctioned by the C11 standard. They are not pthreads, but they abide by all the requirements of the C11 machine model and ABI. The C11 standard library might implement threads using pthreads, but that does not mean that all threads that abide by the C11 standard have to use pthreads to be compliant. C11 threads != pthreads.
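For reference, C11’s own threads API lives in &lt;threads.h&gt; and is distinct from pthreads, even where an implementation happens to build one on top of the other. A minimal sketch, assuming the platform provides the optional API (i.e. does not define __STDC_NO_THREADS__):

```c
#include <threads.h>

/* A worker started with C11 thrd_create, not pthread_create.  From the
 * C standard's point of view this is a perfectly ordinary thread. */
static int worker(void *arg) {
    *(int *)arg = 42;
    return 0;
}

/* Spawn the worker, wait for it, and return what it wrote. */
static int run_worker(void) {
    int result = 0;
    thrd_t t;
    if (thrd_create(&t, worker, &result) != thrd_success)
        return -1;
    thrd_join(t, NULL);
    return result;
}
```

Note that &lt;threads.h&gt; is itself optional in C11, which is part of why “C11 threads” and “pthreads” cannot be treated as interchangeable requirements.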

It’d be helpful if you could create a regression test PR for use within the CPython project itself that represents said non-pthreads based threading system design

We can certainly do that, although I find it hard to believe that is actually what you want, because then you’re requiring every single HPC and embedded-systems developer that uses non-pthreads threading and depends on Python to add to the Python test suite if they want to know their code isn’t going to break. That creates a major hassle for them, and a large maintenance burden for the Python developers, to say nothing of the CI costs you’d likely incur supporting probably hundreds if not thousands of additional tests for many different users with unusual configurations.

[Python] has no way to automatically understand anything else.

I think the crux of the issue is: why does Python suddenly now (as of 3.14) think that it needs to “understand” this? I’ve been using Python since 2.5 (almost two decades) and it has worked on every single release, because Python behaved like a normal C library and worked with C threads regardless of how they were made. Do you really believe that after decades of working with all C threads, it is now acceptable for Python to impose a new restriction on users, mandating that it can only work with very specific threads that are more restrictive than the C11 standard requires? I’d like to think that the steering committee will come to the obvious conclusion that it is too late in the game to impose such a stringent constraint on the many users who have come to depend on Python behaving like a normal C library and working with generic C threads.

3 Likes

Turn that thinking around and see it from our perspective: we’re not guaranteeing support of these non-standard setups. People who want that need to get involved during the alpha and beta phases. The intent of adding tests is a way to say “if we cannot see it, we cannot even try to consider supporting it”.

We’re not volunteering to maintain functionality for those setups if it becomes a burden to do so, but knowing when they’ll break, by way of some automated visibility, lets us engage the people who care about them at that time and consider whether we need workarounds in our new designs, rather than hearing complaints only after release. One example: people contribute buildbots with unique configurations for this reason. It offers visibility into potential problems for everyone, even if the configuration is unstable and isn’t one we consider a PEP 11 tiered platform.

We have a lengthy beta period in which we encourage everyone who cares about Python runtimes to test them for compatibility with the unique areas of their own software stack, so that feedback can be incorporated before the next release.

3 Likes

This is a good time for a reminder that this topic contains personal opinions, not official statements of the Python project.


In the bullet-point summary, I meant that CPython will run fine on a platform with POSIX threads. It’s a one-way implication: other platforms can be (and are) supported too.

In my opinion, ideally there would be a pluggable mechanism, like DYNLOADFILE for dynamic loading. (Or PyMem_SetAllocator for a more modern example, though making it switchable/selectable at runtime is probably too much. Or atomics for another example, but I’d prefer a user-visible knob here.)
An API would help specify exactly what we’re requiring from the underlying platform, and document any stability guarantees (e.g. “this knob is marked Unstable, if you use it, you need to test the result yourself and you might need to adjust your code for each release”).


Generally, I don’t think we should “allow” Python releases to just break anything that’s not tested. Yes, if we can’t test something it will break in practice, but we can still consider it a bug that needs fixing.

3 Likes