Let generator.close() return StopIteration.value

I often write functions that go through a lot of data and apply arbitrary transformations. The best interface for this is often a generator, which might look like this:

def transform():
    while True:
        try:
            data = yield
        except GeneratorExit:
            break
        # gather data
        ...
    # compute
    ...
    return result

In short, the generator first gathers a lot of data before performing a computation and returning the final result.
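
Driving such a generator looks like this, where data_stream stands in for whatever actually produces the data:

g = transform()
next(g)                    # prime the generator; it pauses at the first yield
for item in data_stream:
    g.send(item)           # each send() resumes the generator with one datum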

The only way to obtain the return value of such a generator is currently to throw GeneratorExit into it and catch the resulting StopIteration manually:

def close_and_return(g):
    # note: if the generator does not catch GeneratorExit at all, the
    # exception propagates out of throw() and out of this helper
    try:
        g.throw(GeneratorExit)
    except StopIteration as e:
        return e.value      # the generator caught GeneratorExit and returned
    else:
        return None         # the generator caught GeneratorExit but yielded again

It would be much more convenient if the .close() method of the generator, which already catches StopIteration, also returned its value. Since .close() currently never returns anything, this change would not break existing code. The improved convenience might even give a new lease of life to generator return values, and to this type of coroutine more generally. Or, at the very least, we’d know why we are spending precious keypresses on the third type parameter in a Generator[...] annotation.
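
With the close_and_return() helper above, the difference in ergonomics is plain (the second line shows the proposed behaviour, not current Python):

result = close_and_return(g)   # today: throw GeneratorExit by hand
result = g.close()             # proposed: close() hands back StopIteration.value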

That being said, I do imagine the existing behaviour was chosen on purpose, but I didn’t find anything specific in e.g. PEP 479.

+1

A possible extension to this idea would be to always keep the generator return value in the generator object and return it on .close(). This would enable simple access to the return value whether or not close() was what stopped the generator.
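
For illustration, here is a minimal pure-Python sketch of that extension as a wrapper (ValueKeepingGenerator is an invented name, not an existing API):

class ValueKeepingGenerator:
    """Sketch: wrap a generator and keep its return value around."""

    def __init__(self, gen):
        self._gen = gen
        self.value = None          # last known return value

    def __iter__(self):
        return self

    def __next__(self):
        return self.send(None)

    def send(self, arg):
        try:
            return self._gen.send(arg)
        except StopIteration as e:
            self.value = e.value   # remember the return value
            raise

    def close(self):
        # a real close() would also raise RuntimeError if the generator
        # ignored GeneratorExit and yielded again; omitted for brevity
        try:
            self._gen.throw(GeneratorExit)
        except StopIteration as e:
            self.value = e.value   # the generator exited gracefully with a value
        except GeneratorExit:
            pass                   # the generator did not catch GeneratorExit
        return self.value

A wrapper like this also makes the cost concrete: it is now the wrapper that keeps the return value alive.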

It is either too early or too late.

  1. If generator.close() is called before the iteration starts, the code of the generator function has not even started to execute yet, and there is no return value.
  2. If it is called on a non-exhausted generator, the generator function is paused at a yield expression, which will be interrupted by GeneratorExit before the return statement is ever reached.
  3. If the generator object has been exhausted, the return value was only available as an attribute of the StopIteration raised by the last __next__(), and was likely discarded when that StopIteration was handled implicitly (by a for loop) or explicitly. When generator.close() is called later, the return value is already gone.
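
For example, case 3 is easy to demonstrate in current Python:

def g():
    yield 1
    return "done"

gen = g()
for item in gen:
    pass               # the for loop swallows StopIteration, discarding "done"
print(gen.close())     # prints None: the return value is already gone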

In case 3 you would need to save the return value in the generator object. That can prolong the life of the return value, and can even create unwanted reference cycles which keep the return value, the generator object, and all linked objects alive even longer.
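
A contrived sketch of such a cycle, assuming a hypothetical gen.value attribute kept alive by the generator object:

def g():
    state = [gen]      # the eventual return value references the generator itself
    yield
    return state

gen = g()
next(gen)
next(gen, None)        # exhaust the generator; its return value is the list
# If close() stored that list on the generator, we would have
# gen -> stored value -> gen: a reference cycle keeping both alive.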

In case 2 you need to explicitly catch GeneratorExit in the generator function and silence it. This is considered an antipattern: GeneratorExit was intentionally not made a subclass of Exception, precisely so that it cannot be silenced by accident.
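
That design decision is easy to observe: a blanket except Exception does not swallow GeneratorExit:

def g():
    try:
        yield
    except Exception:            # GeneratorExit derives from BaseException only
        print("never reached on close()")

gen = g()
next(gen)
gen.close()                      # GeneratorExit sails past the except clause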

1 Like

I only want to improve the ergonomics of the specific case where a generator exits gracefully because of the call to close(), i.e. the StopIteration case here.

That’s not quite the same; the call to close() raises GeneratorExit, not StopIteration. (The other check in the same code.) So there won’t be any useful return value. StopIteration happens when the generator hits a return statement.

Indeed, but the generator can only exit gracefully by explicitly catching GeneratorExit and returning.

Now I’m not sure if @storchaka considers that “silencing” and hence part of the anti-pattern he mentions, but I don’t, because the generator still acts on the GeneratorExit by exiting. The fact that close() ignores StopIteration seems to confirm that interpretation.
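
Concretely, this is the behaviour in question:

def g():
    try:
        yield
    except GeneratorExit:
        return "graceful"   # becomes StopIteration("graceful") internally

gen = g()
next(gen)
print(gen.close())          # prints None today; under the proposal: "graceful"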

For this particular example, what about this workaround?

def transform(set_result: Callable[[T], None]) -> Generator[None, Any, None]:
    while True:
        try:
            data = yield
        except GeneratorExit:
            # compute the final result from the data gathered so far, hand it
            # out through the callback, then let GeneratorExit propagate so
            # that close() completes normally
            ...
            set_result(result)
            raise
        # gather data
        ...


def main():

    def set_result(res):
        nonlocal result
        result = res

    result = None
    g = transform(set_result)
    next(g)
    for data in data_set:
        g.send(data)
    g.close()
    print(f"{result = }")

or something like:

STOP_TRANSFORM = object()  # a sentinel object

def transform():
    while True:
        data = yield
        if data is STOP_TRANSFORM:
            break
        # gather data
        ...
    # compute
    ...
    return result

def close_and_return(g):
    # the generator responds to the sentinel by breaking and returning, so
    # send() raises StopIteration carrying the return value
    try:
        g.send(STOP_TRANSFORM)
    except StopIteration as e:
        return e.value

Thanks, but the example is already a workaround that works well. The idea is to make the workaround unnecessary.

However, there is an upside to using a sentinel value, in that it makes the entire boilerplate “nicer” to some people:

while (data := (yield)) is not SENTINEL:
    …

But then you are just replicating the GeneratorExit mechanism by other means.

Hi all, I’m just chiming in to let you know that if this feature made it into Python, it would find use in industry right away. I’m a data scientist developing simulations for a large logistics company, and we would gratefully use .close() to obtain the return value of a generator as soon as the feature became available.

We use generators a lot in our code, which uses the simpy library for simulations. In this framework generators are used to represent processes and the yield and send values are exclusively used to communicate about dependencies on other processes and the passage of simulated time. Any useful work done by the generators is always passed back using a return value.

The nature of our simulation requires that some of those “process” generators run indefinitely. However, once the simulation is complete, we need to terminate them and get their values out. Currently we .throw an exception into the generator to change its state and advance it one more time to trigger the StopIteration exception and retrieve its value.
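
Concretely, the dance we do today looks something like this (EndSimulation and finish are illustrative names, not simpy API):

class EndSimulation(Exception):
    pass

def finish(process):
    # Throw our own exception into the process generator; it catches it,
    # breaks out of its loop, and returns, which surfaces here as
    # StopIteration carrying the return value.
    try:
        process.throw(EndSimulation())
    except StopIteration as e:
        return e.value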

Being able to do something like:

def endless_process(...):
    while True:
        # do work (with update_message if there is one)
        # accumulate final_result and determine resume_dependency
        try:
            update_message = yield resume_dependency
        except GeneratorExit:
            break
    return final_result

and catching the result value with:

result = my_endless_process.close()

would be fantastic.

There are multiple workarounds to achieve a similar result, and we have settled on one, but the above way of working seems natural and would be the cleanest by far. When I started using generators this way, I was surprised to find that calling .close() on a generator did not return its return value. At least to me it seemed, and still seems, like an obvious and very useful feature.

Anyway, hope this feature makes it at some point in the future, although I realize that such things must be weighed against a lot of considerations.

Kindest regards,

Maarten Oosten

1 Like

Hmm. It sounds like you’re really looking for async/await rather than generators here, so maybe there are tools in the asyncio library to do what you want?

In any case, I don’t think it would be possible for a synchronous close method to achieve what you want, so the closest might end up being something like this:

class HaltProcess(Exception): pass

def endless_process(...):
    while True:
        # do work (with update_message if there is one)
        # accumulate final_result and determine resume_dependency
        try:
            update_message = yield resume_dependency
        except HaltProcess:
            break
    return final_result

try:
    my_endless_process.throw(HaltProcess())
except StopIteration as e:
    # the generator catches HaltProcess, breaks, and returns, so its return
    # value arrives here on the StopIteration
    result = e.value

with this dance potentially being wrapped up in a function for convenience. I might have some details wrong here as I’m more thinking in terms of async and await these days, but the broad concepts should be similar.

I work in simulations as well, so I expect Maarten and I are talking about exactly the same use case for generators, which is not async, but externally controlled and running until ended:

my_endless_process = endless_process(...)
my_endless_process.send(None)

simulation = ...
for dt in simulation.timesteps:
    stuff = my_endless_process.send(...)
    ...

result = my_endless_process.close()

(So just coroutines as they were understood before they got their asynchronous connotation.)

Again, it is clear that this can be done right now using a number of workarounds. But it cannot be done in this natural and easy manner, which would only require changing one line of code in CPython.

1 Like

As a challenge, let someone try to implement this. We will learn much from that experience.

5 Likes

@ntessore explained it well. None of our code is async. We use generators as the “pre-async” coroutine concept.

Ok, I’ll bite. What am I not getting?

Patch
diff --git a/Objects/genobject.c b/Objects/genobject.c
index 9252c65..9e1c662 100644
--- a/Objects/genobject.c
+++ b/Objects/genobject.c
@@ -408,9 +408,33 @@ gen_close(PyGenObject *gen, PyObject *args)
         PyErr_SetString(PyExc_RuntimeError, msg);
         return NULL;
     }
-    if (PyErr_ExceptionMatches(PyExc_StopIteration)
-        || PyErr_ExceptionMatches(PyExc_GeneratorExit)) {
-        PyErr_Clear();          /* ignore these errors */
+    if (PyErr_ExceptionMatches(PyExc_StopIteration)) {
+        /* retrieve the StopIteration exception instance being handled,
+         * and extract its value */
+        PyObject *exc, *args, *value;
+        PyThreadState *tstate = _PyThreadState_GET();
+        if (tstate == NULL) {
+            PyErr_Clear();
+            Py_RETURN_NONE;
+        }
+        exc = tstate->current_exception;
+        if (exc == NULL || !PyExceptionInstance_Check(exc)) {
+            PyErr_Clear();
+            Py_RETURN_NONE;
+        }
+        args = ((PyBaseExceptionObject*)exc)->args;
+        if (args == NULL || !PyTuple_Check(args)
+                || PyTuple_GET_SIZE(args) == 0) {
+            PyErr_Clear();
+            Py_RETURN_NONE;
+        }
+        value = PyTuple_GET_ITEM(args, 0);
+        Py_INCREF(value);
+        PyErr_Clear();
+        return value;
+    }
+    if (PyErr_ExceptionMatches(PyExc_GeneratorExit)) {
+        PyErr_Clear();          /* ignore this error */
         Py_RETURN_NONE;
     }
     return NULL;

I could open a PR and see whether CI is happy with the change. If all unit tests pass, that’d be a nice sign. Is there any test or code path that relies on close() returning None?

Tests. And I suspect you shouldn’t be clearing errors so much.

Sure, I can see the obvious tests. All existing tests pass (or so I think; I ran make test).

Tests
class GeneratorCloseTest(unittest.TestCase):

    def test_close_no_return_value(self):
        def f():
            yield

        gen = f()
        gen.send(None)
        self.assertIsNone(gen.close())

    def test_close_return_value(self):
        def f():
            try:
                yield
                # close() raises GeneratorExit here, which is caught
            except GeneratorExit:
                return 0

        gen = f()
        gen.send(None)
        self.assertEqual(gen.close(), 0)

    def test_close_not_catching_exit(self):
        def f():
            yield
            # close() raises GeneratorExit here, which isn't caught and
            # therefore propagates -- no return value
            return 0

        gen = f()
        gen.send(None)
        self.assertIsNone(gen.close())

But you sounded very ominous about implementing this. Are you seeing cases where a naive implementation fails, and which should be tested?

Consider the following cases:

  1. Closing a generator that has not been started.
  2. Closing an exhausted generator.
  3. Closing an already closed generator.

Note that close() should be an idempotent operation.
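
In current Python that idempotence looks like this:

def g():
    yield

gen = g()
next(gen)
gen.close()
gen.close()   # the second close() is a harmless no-op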

1 Like

All the scenarios from Serhiy’s post.

Thanks for these. Since none of these cases raises a StopIteration, close() behaves exactly as it does now.

More tests
    def test_close_not_started(self):
        def f():
            try:
                yield
            except GeneratorExit:
                return 0

        gen = f()
        self.assertIsNone(gen.close())

    def test_close_exhausted(self):
        def f():
            try:
                yield
            except GeneratorExit:
                return 0

        gen = f()
        next(gen)
        with self.assertRaises(StopIteration):
            next(gen)
        self.assertIsNone(gen.close())

    def test_close_closed(self):
        def f():
            try:
                yield
            except GeneratorExit:
                return 0

        gen = f()
        gen.send(None)
        self.assertEqual(gen.close(), 0)
        self.assertIsNone(gen.close())

In the last case, close() is idempotent in its effect on the generator, but of course only the first close() will handle a StopIteration and return a value.