I often write functions that go through a lot of data and apply arbitrary transformations. A generator is often the best interface for this, and might look like this:
def transform():
    while True:
        try:
            data = yield
        except GeneratorExit:
            break
        # gather data
        ...
    # compute
    ...
    return result
In short, the generator first gathers a lot of data before performing a computation and returning the final result.
The only way to obtain the return value of such a generator is currently to throw GeneratorExit into it and catch the resulting StopIteration manually.
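That manual dance looks something like this (a toy version of transform that just sums the values it receives; the concrete names are illustrative):

```python
def transform():
    total = 0
    while True:
        try:
            data = yield
        except GeneratorExit:
            break
        total += data      # "gather data"
    return total           # "compute" elided: just return the running sum

g = transform()
next(g)                    # advance to the first yield
g.send(1)
g.send(2)
try:
    g.throw(GeneratorExit)           # ask the generator to wind down
except StopIteration as stop:
    result = stop.value              # the return value rides on StopIteration
# result == 3
```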
It would be much more convenient if the .close() method of the generator, which already catches StopIteration, also returned its value. Since .close() currently never returns anything, this change would not break existing code. The improved convenience might even give a new lease of life to generator return values, and to this type of coroutine more generally. Or, at the very least, we’d know why we are spending precious keypresses on the third type parameter in Generator[...] annotations.
That being said, I do imagine the existing behaviour was chosen on purpose, but I didn’t find anything specific in e.g. PEP 479.
A possible extension to this idea would be to always keep the generator return value in the generator object and return it once on .close(). This enables simple access to the return value whether or not the close() was what stopped the generator.
1. If generator.close() is called before iteration starts, the generator function’s code has not begun to execute yet, so there is no return value.
2. If it is called on a non-exhausted generator, the generator function is paused at a yield expression, which will be interrupted by GeneratorExit before any return statement is reached.
3. If the generator has been exhausted, the return value was only ever available as an attribute of the StopIteration raised by the last __next__(), and was likely discarded when that StopIteration was handled implicitly (in for) or explicitly. When generator.close() is called later, the return value is already gone.
In case 3 you would need to save the return value in the generator object. That can prolong the generator object’s life, and can even create unwanted reference cycles which keep the return value, the generator object, and all linked objects alive even longer.
In case 2 you need to explicitly catch GeneratorExit in a generator function and silence it. It is considered an antipattern. GeneratorExit was intentionally not made a subclass of Exception to prevent it from accidentally being silenced.
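To make the antipattern concrete: a generator that swallows GeneratorExit and keeps yielding makes close() refuse to cooperate (toy generator, names mine):

```python
def stubborn():
    while True:
        try:
            yield
        except GeneratorExit:
            pass  # swallow the request to exit and keep yielding: the antipattern

g = stubborn()
next(g)
try:
    g.close()
except RuntimeError:
    # close() raises RuntimeError("generator ignored GeneratorExit")
    ignored = True
```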
I only want to improve the ergonomics of the specific case that a generator exits gracefully because of the call to close(), i.e. the StopIteration case here.
That’s not quite the same; the call to close() raises GeneratorExit, not StopIteration. (The other check in the same code.) So there won’t be any useful return value. StopIteration happens when the generator hits a return statement.
Indeed, but the generator can only exit gracefully by explicitly catching GeneratorExit and returning.
Now I’m not sure if @storchaka considers that “silencing” and hence part of the anti-pattern he mentions, but I don’t, because the generator still acts on the GeneratorExit by exiting. The fact that close() ignores StopIteration seems to confirm that interpretation.
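A quick check of that interpretation: a generator that responds to GeneratorExit by returning is closed without complaint, and the StopIteration carrying its return value is swallowed by close() (toy example, names mine):

```python
def graceful():
    total = 0
    while True:
        try:
            total += yield
        except GeneratorExit:
            return total  # exiting *because of* GeneratorExit: cooperation, not silencing

g = graceful()
next(g)
g.send(5)
g.close()                      # no RuntimeError: the generator did act on GeneratorExit
finished = g.gi_frame is None  # True: the generator ran to completion
```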
For this particular example, what about this workaround?
from typing import Callable, TypeVar

T = TypeVar("T")

def transform(set_result: Callable[[T], None]) -> T:
    while True:
        try:
            data = yield
        except GeneratorExit:
            set_result(result)
            raise
        # gather data
        ...
    # compute
    ...
    return result
def main():
    def set_result(res):
        nonlocal result
        result = res
    result = None
    g = transform(set_result)
    next(g)
    for data in data_set:
        g.send(data)
    g.close()
    print(f"{result = }")
or something like:
STOP_TRANSFORM = object()  # a sentinel object

def transform():
    while True:
        data = yield
        if data is STOP_TRANSFORM:
            break
        # gather data
        ...
    # compute
    ...
    return result
def close_and_return(g):
    try:
        g.send(STOP_TRANSFORM)
    except StopIteration as stop:
        # the generator's return raises StopIteration inside send();
        # catch it to extract the value
        return stop.value
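Filling in the elided parts with a toy accumulator shows the sentinel approach end to end (note that send() raises the StopIteration rather than returning the value, so the helper has to catch it):

```python
STOP_TRANSFORM = object()  # sentinel

def transform():
    values = []                    # "gather data"
    while True:
        data = yield
        if data is STOP_TRANSFORM:
            break
        values.append(data)
    return sum(values)             # "compute"

def close_and_return(g):
    try:
        g.send(STOP_TRANSFORM)
    except StopIteration as stop:
        return stop.value

g = transform()
next(g)
for x in (1, 2, 3):
    g.send(x)
result = close_and_return(g)  # 6
```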
Hi all, I’m just chiming in to let you know that if this feature made it into Python, it would find use in industry right away. I’m a data scientist developing simulations for a large logistical company and we would gratefully use .close() to obtain the return value of a generator as soon as the feature would be available.
We use generators a lot in our code, which uses the simpy library for simulations. In this framework generators are used to represent processes and the yield and send values are exclusively used to communicate about dependencies on other processes and the passage of simulated time. Any useful work done by the generators is always passed back using a return value.
The nature of our simulation requires that some of those “process” generators run indefinitely. However, once the simulation is complete, we need to terminate them and get their values out. Currently we .throw an exception into the generator to change its state and advance it one more time to trigger the StopIteration exception and read its value.
Being able to do something like:
def endless_process(...):
    while True:
        # do work (with update_message if there is one)
        # accumulate final_result and determine resume_dependency
        try:
            update_message = yield resume_dependency
        except GeneratorExit:
            break
    return final_result
and catching the result value with:
result = my_endless_process.close()
would be fantastic.
There are multiple workarounds to achieve a similar result and we have settled on one, but the above way of working seems natural and would be the cleanest by far. When starting out on using generators this way, I was surprised to find that calling .close on a generator did not return its return value. At least to me it seemed and still seems like an obvious and very useful feature.
Anyway, hope this feature makes it at some point in the future, although I realize that such things must be weighed against a lot of considerations.
Hmm. It sounds like you’re really looking for async/await rather than generators here, so maybe there are tools in the asyncio library to do what you want?
In any case, I don’t think it would be possible for a synchronous close method to achieve what you want, so the closest might end up being something like this:
class HaltProcess(Exception):
    pass

def endless_process(...):
    while True:
        # do work (with update_message if there is one)
        # accumulate final_result and determine resume_dependency
        try:
            update_message = yield resume_dependency
        except HaltProcess:
            break
    return final_result
try:
    my_endless_process.throw(HaltProcess())
except StopIteration as stop:
    result = stop.value
with the throw-and-catch potentially being wrapped up in a function for convenience. I might have some details wrong here as I’m more thinking in terms of async and await these days, but the broad concepts should be similar.
I work in simulations as well, so I expect Maarten and I are talking about exactly the same use case for generators, which is not async, but externally controlled and running until ended:
my_endless_process = endless_process(...)
my_endless_process.send(None)
simulation = ...
for dt in simulation.timesteps:
    stuff = my_endless_process.send(...)
    ...
result = my_endless_process.close()
(So just coroutines as they were understood before they got their asynchronous connotation.)
Again, it is clear that this can be done right now using a number of workarounds. But it cannot be done in this natural and easy manner, which would only require changing one line of code in CPython.
In the last case, close() is idempotent in its effect on the generator, but of course only the first close() will handle a StopIteration and return a value.
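For illustration, a second close() on an already-finished generator is indeed a harmless no-op (toy generator, names mine):

```python
def finishing():
    try:
        yield
    except GeneratorExit:
        return "done"

g = finishing()
next(g)
g.close()   # first close: this is the call that handles StopIteration("done")
g.close()   # second close: the generator is already closed, nothing happens
already_closed = g.gi_frame is None  # True once the generator has finished
```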