What’s this?
It’s a proposal that we make it possible to write context managers that prevent `yield`ing out of their `with` block. This is something @yselivanov and I have been discussing because we need it for “structured concurrency” features in Trio and asyncio. It will become a PEP but I wanted to post something early to start discussion.
What’s wrong with `yield`?
Trio pioneered two new constructs, which are both implemented as context managers: cancel scopes, and nurseries. A cancel scope lets you delimit a block of code, and apply timeouts or otherwise cancel it. A nursery is a structured way to run a collection of concurrent tasks. We believe these are the right primitives to model concurrency in Python, and Yury is working on adding them to asyncio.
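For readers who haven’t used Trio, the two constructs look roughly like this (this uses Trio’s real API; `trio.move_on_after` is a convenience wrapper around a cancel scope):

```python
import trio

async def main():
    # A cancel scope: everything inside the block is cancelled if it takes
    # longer than 5 seconds.
    with trio.move_on_after(5):
        await trio.sleep(10)  # cancelled partway through

    # A nursery: child tasks started here are guaranteed to have finished
    # (or been cancelled) by the time the 'async with' block exits.
    async with trio.open_nursery() as nursery:
        nursery.start_soon(trio.sleep, 1)
        nursery.start_soon(trio.sleep, 2)

trio.run(main)
```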
But, both of these constructs assume that “the code that runs inside the `with` block” is a well-defined thing. Generators and async generators break that assumption. If users write code like this:
```python
def bad_agen1():
    with trio.CancelScope():
        yield  # yield inside 'with cancel scope' doesn't work

async def bad_agen2():
    async with trio.open_nursery():
        yield  # yield inside 'async with nursery' doesn't work
```
…then the generator’s caller’s code ends up running in between the calls to `__(a)enter__` and `__(a)exit__`, and this corrupts the concurrency system’s runtime state.
If you’re lucky this produces a confusing crash at some later time, and if you’re unlucky it could produce incorrect results. We’d like to make it so that attempts to `yield` in these contexts instead raise a prompt and informative error.
Is this a problem in practice?
Yes. Our experience with Trio is that new users hit this frequently. Trying to use a cancel scope or nursery inside an async generator is a natural thing that people often try. The resulting errors are hard to interpret or debug. The details of the problems are highly technical, so it’s hard to teach users which cases are OK and which ones aren’t.
For example, here’s a tricky case:
```python
# https://github.com/HyperionGray/trio-websocket
from trio_websocket import open_websocket

async def my_agen():
    async with open_websocket(...) as websocket:
        while True:
            yield await websocket.get_message()
```
This looks innocent, but is actually broken. The websocket protocol requires a background task for each connection, so that PING frames can be handled promptly. Therefore, the `open_websocket` context manager encapsulates a nursery context manager inside it, which means that this code actually has a `yield` inside a nursery! As you can imagine, it’s hard for new users to anticipate this, and even if you understand the theory about where `yield` is and isn’t allowed, it’s still easy to accidentally write buggy code if you don’t realize that the context manager you’re using has a cancel scope or nursery hidden inside it.
When we add structured concurrency to asyncio, we expect asyncio users will run into the same issues.
There are several years’ worth of discussion of these problems in the relevant issue trackers.
Raising an error seems unfriendly. Can’t we fix it so `yield` just works?
We’ve tried. For example, the proposals in PEP 533 and PEP 568 were originally motivated by problems we encountered with `yield` inside nurseries and cancel scopes, respectively. But now that we understand the issues better, we think the only general solution is to raise an error if someone attempts to `yield` inside these blocks. There are some special cases where PEP 533 and PEP 568 could help, but it’s not worth adding them just for that. (At least as far as Trio/asyncio go – they might be useful for other cases; I’m not taking a position on that here.)
Here’s the fundamental issue: `yield` suspends a call frame. It only makes sense to `yield` in a leaf frame – i.e., if your call stack goes like A -> B -> C, then you can suspend C, but you can’t suspend B while leaving C running.

But, nurseries are a kind of “concurrent call” primitive, where a single frame can have multiple child frames that run concurrently. This means that if we allow people to mix `yield` and nurseries, then we can end up in exactly this situation, where B gets suspended but C is actively running. This is nonsensical, and causes serious practical problems (e.g., if C raises an exception, we have no way to propagate it). And this is a fundamental incompatibility between generator control flow and structured concurrency control flow, not something we can fix by tweaking our APIs. The only solution seems to be to forbid `yield` inside a nursery block.
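To make the A -> B -> C picture concrete, here’s a minimal sketch of the broken pattern, using Trio’s real API (the function names are just illustrative). Once the async generator yields, its frame (“B”) is suspended, but the child task it spawned (“C”) keeps running:

```python
import trio

async def background_worker():        # this is "C"
    await trio.sleep(10)
    raise RuntimeError("nobody is positioned to catch this")

async def bad_stream():               # this frame is "B"
    async with trio.open_nursery() as nursery:
        nursery.start_soon(background_worker)
        yield  # suspends B mid-nursery, while C is still running

async def consumer():                 # this is "A"
    agen = bad_stream()
    await agen.__anext__()
    # At this point the nursery's __aexit__ isn't on any task's stack, so if
    # background_worker crashes there is no frame waiting to receive the
    # exception, and the nursery's cleanup guarantees are broken.
```

The proposed error would fire at the `yield`, before any of this damage is done.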
If you want more details on all the specific problems that arise, and how they relate to this proposal and to PEP 533 and PEP 568, then see this comment.
So how could this actually work?
The basic idea is: when we enter one of these context managers, we set a flag in the runtime state that says “`yield` is not allowed”, and then when we exit the context manager we restore the flag to its original state.

So let’s say we add a new entry to `PyThreadState`:
```c
typedef struct _ts {
    /* ... existing fields ... */
    PyObject *yield_forbidden;
} PyThreadState;
```
When `yield_forbidden` is `NULL`, yields are allowed. When it’s non-`NULL`, yields are forbidden, and the object holds an error message explaining why `yield` was forbidden. The `sys` module gains a way to set and restore `tstate->yield_forbidden`:
```python
with sys.forbid_yield(error_message):
    ...
```
The `yield` and `yield from` statements are modified to check this attribute, and if it’s non-`NULL`, do `raise RuntimeError(yield_forbidden)`.
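For example, a cancel-scope-like context manager could guard its body along these lines. This is only a sketch against the proposed API: `sys.forbid_yield` doesn’t exist today, and `CancelScopeSketch` is an illustrative name, not Trio’s actual implementation:

```python
import sys

class CancelScopeSketch:
    def __enter__(self):
        # Forbid yielding for as long as this scope is open.
        self._guard = sys.forbid_yield(
            "yield inside a cancel scope breaks timeout/cancellation handling"
        )
        self._guard.__enter__()
        # ... install timeout / cancellation bookkeeping here ...
        return self

    def __exit__(self, *exc_info):
        # ... tear down the cancellation bookkeeping here ...
        return self._guard.__exit__(*exc_info)
```

The idea is that Trio’s and asyncio’s cancel scopes and nurseries would do something equivalent in their `__(a)enter__`/`__(a)exit__` methods.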
But, there are a bunch of important subtleties.
First, this should only apply to `yield` and `yield from` statements; we never want to forbid `await`:
```python
async def myfunc():
    async with open_nursery():
        await sleep(1)  # This is fine
```
Currently `await` and `yield from` use the same opcode. The simplest solution would be to make the check: `if (yield is forbidden && the current frame is a generator, not a coroutine) { raise RuntimeError }`.
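To see the distinction the check relies on, note that coroutines and async generators are already different kinds of objects at runtime, so the interpreter can tell which one a suspending frame belongs to:

```python
import inspect

async def uses_await():       # a coroutine function: await would stay legal
    ...

async def uses_yield():       # an async generator function: yield is what
    yield                     # the proposed check would catch

print(inspect.iscoroutinefunction(uses_await))   # True
print(inspect.isasyncgenfunction(uses_await))    # False
print(inspect.isasyncgenfunction(uses_yield))    # True
```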
Another complication: we need to handle nested generators. We don’t actually want to forbid every `yield` that happens inside our `with` block; we only want to forbid `yield`s that temporarily exit the `with` block. It’s fine if the code inside the `with` block iterates over a generator that has some internal `yield`. For example:
```python
def inner_frame():
    yield "hi"

def outer_frame():
    with forbid_yield(...):
        # There's a 'yield' in inner_frame, but that's OK
        for obj in inner_frame():  # no error
            print(obj)
        # This 'yield' temporarily exits the with block, so it's illegal
        yield  # error
```
Therefore, we add an additional rule: when entering a generator via `__next__`, `send`, `throw`, or `close`, and when entering an async generator via `__anext__`, `asend`, `athrow`, or `aclose`, we set `tstate->yield_forbidden = NULL`. On exit, we restore the previous value. That makes the example above work as expected, because the `forbid_yield` in `outer_frame` doesn’t affect `inner_frame`.
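As a rough pure-Python model of that rule (the names here are only illustrative; the real bookkeeping would live in the interpreter’s generator entry/exit code):

```python
_yield_forbidden = None  # stands in for tstate->yield_forbidden

def resume_generator(gen, value=None):
    """Model of what __next__/send (and the async variants) would do
    around each resumption of a generator frame."""
    global _yield_forbidden
    saved = _yield_forbidden
    _yield_forbidden = None        # a freshly entered frame may yield again
    try:
        return gen.send(value)
    finally:
        _yield_forbidden = saved   # restore whatever the caller had set
```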
And finally, we want to allow people to define their own context managers using `@contextmanager` and `@asynccontextmanager` that wrap a no-yields-allowed context manager. This is especially subtle, as can be seen from an example:
```python
@asynccontextmanager
async def open_websocket(...):
    async with open_nursery():
        yield Websocket(...)  # This yield is OK

async def caller():
    async with open_websocket(...):
        yield  # This should be an ERROR
```
Syntactically, there’s a `yield` inside the `@contextmanager` function. But semantically, this is totally different from a generator, which can be suspended/resumed/abandoned at any time. In a `@contextmanager` function, the `yield` is essentially replaced by the contents of the block where our new context manager is used (and `@contextmanager` goes to great lengths to preserve this illusion). So here we want the `forbid_yield` inside `open_nursery` to take effect in `caller`, not in `open_websocket`.
While subtle, this turns out to be fairly straightforward. We add a boolean attribute that can be set on generator objects and async generator objects – something like `gen_obj.__passthrough_yield_forbidden__`. By default, it’s `False`. `@contextmanager` and `@asynccontextmanager` should set it to `True`. (And so would a few other closely-related use cases, like `@pytest_trio.trio_fixture`.)
When this attribute is `True`, then it does two things:

1. It disables the check on `yield`/`yield from`. A generator/async generator with this attribute set is always allowed to `yield`, regardless of what `tstate->yield_forbidden` says.
2. It disables the save/restore when entering/exiting an (async) generator. This means that changes to the state are allowed to “leak out”.
So in our example, (1) means that `yield Websocket(...)` is allowed, even though `open_nursery` has set `tstate->yield_forbidden` to a non-`NULL` value. And (2) means that this setting will remain in effect when we return to the body of `caller`, so the `yield` in `caller` will fail, as we wanted.
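Concretely, `@asynccontextmanager`’s opt-in could look roughly like this. This is only a sketch: the attribute comes from this proposal, and `_ToyAsyncCM` is a toy stand-in for contextlib’s real wrapper class (it ignores exception propagation, among other things):

```python
from functools import wraps

class _ToyAsyncCM:
    def __init__(self, agen):
        self._agen = agen
    async def __aenter__(self):
        return await self._agen.__anext__()
    async def __aexit__(self, *exc_info):
        try:
            await self._agen.__anext__()
        except StopAsyncIteration:
            return False
        raise RuntimeError("generator didn't stop")

def asynccontextmanager_sketch(func):
    @wraps(func)
    def helper(*args, **kwargs):
        agen = func(*args, **kwargs)
        # Opt out of the yield check, and let any forbid_yield state set
        # inside the wrapped function take effect in the caller's frame.
        agen.__passthrough_yield_forbidden__ = True
        return _ToyAsyncCM(agen)
    return helper
```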
Summing up
So the final version of all the pieces would be roughly:

- Generator `__next__`/`__anext__` and friends:

  ```c
  if (!self->__passthrough_yield_forbidden__) {
      saved_state = tstate->yield_forbidden;
      tstate->yield_forbidden = NULL;
  }
  // ... actual functionality here ...
  if (!self->__passthrough_yield_forbidden__) {
      tstate->yield_forbidden = saved_state;
  }
  ```

- The `YIELD` opcode and friends:

  ```c
  if (tstate->yield_forbidden
          && !gen_obj->__passthrough_yield_forbidden__
          && !gen_obj->is_coroutine) {
      raise RuntimeError  /* pseudocode: raise with the stored message */
  }
  ```
Other use cases?
Since PEP 568 is currently not accepted, context managers that set some context-local state – for example, `decimal.localcontext` – have a surprising interaction with generators, where yielding inside the context manager can let the local context accidentally “leak out” to the calling code. As long as this is the case, it might make sense to use the mechanism here to forbid `yield` inside `localcontext` and similar cases? But, that’s not the major motivation, and we’re not currently proposing to change the `decimal` module.
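For concreteness, here’s the kind of leak meant above; it’s observable today with nothing but the stdlib:

```python
import decimal

def gen():
    with decimal.localcontext() as ctx:
        ctx.prec = 2
        yield  # the caller now runs with prec=2 until the generator resumes

g = gen()
next(g)
print(decimal.getcontext().prec)  # 2, not the caller's original 28
```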