Let generator.close() return StopIteration.value

So, it does not work.

To make it work in cases 2 and 3, you need to change __next__(), send() and throw(), not only close(). They would have to set the returned value as an attribute of the generator object, and close() would return that attribute.
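For concreteness, here is a rough sketch of what that would mean, written as a hypothetical wrapper class rather than a change to the generator type itself, and assuming close() already returns the value on a graceful exit as proposed:

class ValueKeeper:
    '''Hypothetical wrapper: exhausting calls store StopIteration.value
    as an attribute, and close() falls back to that attribute.'''
    def __init__(self, gen):
        self._gen = gen
        self._value = None
    def __iter__(self):
        return self
    def _advance(self, method, *args):
        try:
            return method(*args)
        except StopIteration as e:
            self._value = e.value        # set the value as an attribute
            raise
    def __next__(self):
        return self._advance(self._gen.__next__)
    def send(self, value):
        return self._advance(self._gen.send, value)
    def throw(self, *args):
        return self._advance(self._gen.throw, *args)
    def close(self):
        value = self._gen.close()        # proposed: value on graceful exit
        return self._value if value is None else value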

The problem with case 1 is more serious and cannot be solved without changing how generator objects work (I mean a generator object created by a generator function, not the generator protocol). It would require introducing new syntax: a statement that denotes a “starting point” in a generator function.

It works as intended here; there’s no intention of making close() remember its return value.[1] The description of it would read:

If the generator function then exits gracefully (by catching GeneratorExit and returning), close() returns the generator function’s return value. If the generator is already closed, or raises GeneratorExit (by not catching the exception), close() returns None to its caller.

Since the generator is in a different state at each call to close(), the behaviour of close() itself can be expected to differ. It still fulfils its basic contract: it moves the generator to the closed state from whatever state it is currently in.


  1. And I would even say that it would be unexpected, and even weird, if close() did remember its return value. ↩︎

Maybe I should clarify the mental model of generators in the context where close() returning a value is useful. There, generators are essentially pipelines, implementing an action → reaction interface:

initial_result = generator.send(None)
partial_result = generator.send(work_item)
...
partial_result = generator.send(work_item)
final_result   = generator.close()
past_the_end   = generator.close()

So returning None for anything but a graceful generator exit seems correct to me. As a matter of fact, in some applications I rely on this behaviour to determine whether the generator has just finished its work and I need to process its end product, or whether it was previously closed, never started, or interrupted in a state where it cannot produce its intended end product.
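In rough outline, with gen and process standing in for application code:

final = gen.close()
if final is not None:
    process(final)    # the generator just finished gracefully
# otherwise: never started, previously closed, or interrupted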

It would be awesome if there were a more realistic example. All the sample code so far is super generic: partial_result = generator.send(work_item), or data = yield (the latter from the initial post, where apparently there aren’t even work items).

The use case would come to light much more clearly if you could narrate a simple, specific example: are we talking DNA samples here, or web requests, or train movements, or what? What’s the relationship between the partial results and the final result? What does the caller do with the partial results? What state does the generator preserve between suspensions? Would it be inconvenient to do this with a sentinel value instead of calling close()? Why not store the state on a class instance? Maybe it would also help to contrast the generator version with the class version: how is the generator version easier to read, write or maintain?

Here’s an actual, if pared-down, example of the kind of generator where a returning close() would be useful. It’s from astronomy, or rather cosmology. The generator receives rows from a catalogue of observed galaxies[1] and creates a map of number counts (i.e. how many galaxies in each pixel). It also computes the current mean number count in each iteration, and reports that number. (In reality, it might do more checking and reporting.)

import numpy as np

def map_galaxies(nx, ny, dx, dy, lon, lat):
    '''Map galaxy density contrast in a rectangular region.'''

    counts = np.zeros((nx, ny))
    mean = 0.

    # create number count map
    while True:
        try:
            rows = yield {'mean': mean}
        except GeneratorExit:
            break

        # pixel indices from positions (lon, lat)
        # done differently (checks!) in real code
        i = ((rows['lon'] - lon)/dx).astype(int)
        j = ((rows['lat'] - lat)/dy).astype(int)

        # increase count in those pixels
        np.add.at(counts, (i, j), 1)

        # compute new mean
        mean = counts.mean()

    # transform number counts into density contrast in place
    counts -= mean
    counts /= mean

    # not actually counts any more, but density contrast
    return counts

This generator is hence a machine for iteratively processing catalogues of galaxy positions into position maps. Now the relevant point is that the map of number counts itself is not very interesting, cosmologically. What’s interesting is the map of relative over- or underdensity with respect to the mean. So before we return the map of counts, we transform it into a map of relative density.[2]

There might be many such generators for processing catalogues into all kinds of other maps (e.g. galaxy shapes). Each type of map might have a different “post-processing” step. There might be a dynamic, user-configurable list of generators to be fed from the catalogue, and their common interface is that they receive rows, yield status information, and return their final maps.
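To illustrate, a driver for such a pipeline might look roughly like this; read_batches, catalogue and map_shapes are hypothetical stand-ins:

mappers = [map_galaxies(nx, ny, dx, dy, lon, lat),
           map_shapes(nx, ny, dx, dy, lon, lat)]     # user-configurable list
for g in mappers:
    g.send(None)                           # prime each generator
for rows in read_batches(catalogue):       # rows arrive in batches
    for g in mappers:
        status = g.send(rows)              # feed rows, receive status
maps = [g.close() for g in mappers]        # collect the final maps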

I hope this example explains the use of generators a bit better. I’m a little reluctant to discuss alternatives, because generators have emerged as the best implementation by trial and error over a long period of time. As you can see from Maarten’s answer, there seems to be convergent evolution in industry as well. So maybe you can let me get away with just the broadest strokes of what isn’t great about a class-based alternative:

import numpy as np

class PositionMap:
    def __init__(self, nx, ny, dx, dy, lon, lat):
        self.counts = np.zeros((nx, ny))
        self.dx = dx
        self.dy = dy
        self.lon = lon
        self.lat = lat
        self.mean = 0.
        self.final = False
    def add(self, rows):
        if self.final:
            raise RuntimeError('map is finalized')
        i = ((rows['lon'] - self.lon)/self.dx).astype(int)
        j = ((rows['lat'] - self.lat)/self.dy).astype(int)
        np.add.at(self.counts, (i, j), 1)
        self.mean = self.counts.mean()
        return {'mean': self.mean}
    def finalize(self):
        if not self.final:
            self.counts -= self.mean
            self.counts /= self.mean
            self.final = True
        return self.counts

You see that

  • we have recreated the generator interface, where __init__() is send(None), add(...) is send(...), and finalize() is close(),
  • we have recreated the generator state (running, final),
  • we now have to create and maintain the internal state explicitly, which we will never manage as efficiently as the generator’s context switching does (one can time this),
  • since we are manually maintaining the internal state, we have to remember to clean up; for example, finalize() could remove self.counts before returning it, as sketched below.
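That clean-up might look roughly like this (redefining the method in a subclass of the same name, purely for the sketch’s sake); note that a second call would then return None, much as close() does:

class PositionMap(PositionMap):     # extend the class above
    def finalize(self):
        if not self.final:
            self.counts -= self.mean
            self.counts /= self.mean
            self.final = True
        counts, self.counts = self.counts, None    # give up our reference
        return counts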

The class version is not fundamentally different from the generator version, because it imitates the generator perfectly. It may be slightly harder to write, because one has to be careful about getting the state right in cases such as the above. But why redo all of what is already a native Python construct?

So, assuming generators are a good design choice here, the question is really only “how do I obtain the final result after I tell the generator to stop?” And indeed, a sentinel value is one way to go, as I wrote somewhere above. But that also just mimics the existing GeneratorExit mechanism of close(), and necessitates reimplementing exactly the same checks (no further yields, errors raised during closing, etc.).
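To sketch what that looks like, here is a sentinel variant of the example above; STOP is a made-up sentinel, and the caller has to redo the StopIteration handling that close() already implements:

import numpy as np

STOP = object()                       # made-up sentinel work item

def map_galaxies(nx, ny, dx, dy, lon, lat):
    counts = np.zeros((nx, ny))
    mean = 0.
    while True:
        rows = yield {'mean': mean}
        if rows is STOP:              # sentinel check replaces GeneratorExit
            break
        i = ((rows['lon'] - lon)/dx).astype(int)
        j = ((rows['lat'] - lat)/dy).astype(int)
        np.add.at(counts, (i, j), 1)
        mean = counts.mean()
    counts -= mean
    counts /= mean
    return counts

gen = map_galaxies(2, 2, 1.0, 1.0, 0.0, 0.0)   # toy setup
gen.send(None)
gen.send({'lon': np.array([0.2, 1.5]), 'lat': np.array([0.3, 0.7])})
try:
    gen.send(STOP)                    # caller redoes what close() implements
except StopIteration as e:
    final = e.value                   # graceful exit with the end product
else:
    raise RuntimeError('generator ignored the sentinel')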


  1. Which is so large it can only be read in batches, maybe even from a remote database. ↩︎

  2. That is done in place; real maps might be so huge that we cannot fit a second copy into memory. ↩︎


Thanks, that was interesting. While there are other ways to write this (e.g. the solution you gave in your original post), I am now convinced that this is useful for some users, and pretty harmless for users who don’t need it (typically close() would just keep returning None).

I now feel that Serhiy’s objections don’t really matter too much: if the generator hasn’t been started yet, sure, let it return None, and a redundant close() returning None should also be fine.

Let’s proceed to the Issue + PR stage.


Thanks! I have opened:

A potential advantage of your PositionMap class is that you could replace finalize with a report method that copied self.counts and adjusted the copy, so that report could be called multiple times. There are situations (some clinical trials, for instance) where people want ‘interim’ reports, possibly to decide whether to collect and enter more data or to stop immediately. But if you know you only want one report, a report on generator close is sensible. Fine with me if Guido or whoever decides to merge something.
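A minimal sketch of that report method, assuming the PositionMap class above and that a second copy of the map fits into memory:

class PositionMap(PositionMap):      # extend the class above
    def report(self):
        out = self.counts.copy()     # adjust a copy; counting can continue
        out -= self.mean
        out /= self.mean
        return out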

This is a bit funny, since classes came first. In particular, iterator classes were a thing from Python’s beginning or soon after. Generator-based iterators came, I believe, in 2.2, for the reasons you experienced: faster to write and execute. Non-iterator methods (send(), etc.) were added a few versions later. It is a mark of success that you see this as a ‘native Python construct’.


Generator version for that, using the issue’s averager example:

avg = Averager()
for number in 3, 1, 4:
    avg.send(number)
    print(avg.report())
print(avg.report())

Output:

3.0
2.0
2.6666666666666665
2.6666666666666665

Implementation:

class Averager:
    def __init__(self):
        def gen():
            # report() closes over the generator's locals, so it always
            # sees the current running totals
            self.report = lambda: sum / n
            sum = 0
            n = 0
            while True:
                sum += yield
                n += 1
        g = gen()
        next(g)             # prime the generator up to the first yield
        self.send = g.send


That actually seems even better to me than the generator close solution. Advantages:

  • Supports intermediate results, not just a final one.
  • Clean generator loop, no try/except for GeneratorExit.
  • The initial next(g) to get the generator ready is done in Averager, not by the user.
  • You can have multiple different “report” functions. And they can have parameters (a sketch follows below).
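For instance, a made-up variant with a second, parameterized report function:

class StatsAverager:
    def __init__(self):
        def gen():
            # both lambdas close over the same generator locals
            self.report = lambda: sum / n
            self.stats = lambda ndigits=2: (n, round(sum / n, ndigits))
            sum = 0
            n = 0
            while True:
                sum += yield
                n += 1
        g = gen()
        next(g)
        self.send = g.send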

Thanks everyone for the insights. The proposed change has been merged into main for 3.13, all thanks to Guido and Irit, who turned the PR into something acceptable.

I have been looking at possible ways to backport the changes to Python 3.7 to 3.12. The sanest way that I can think of is a decorator that wraps generator functions in a simple collections.abc.Generator subclass with a suitable close() wrapper. The disadvantage is that the objects returned by the wrapped generator functions would no longer be instances of types.GeneratorType, and checks against that type (rather than the ABC) seem fairly common.
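In outline, the decorator I have in mind would look something like this; the names are placeholders, and the close() here only approximates the 3.13 semantics:

import functools
from collections.abc import Generator

class _ReturningClose(Generator):
    # note: instances are not types.GeneratorType, hence the caveat above
    def __init__(self, gen):
        self._gen = gen
    def send(self, value):
        return self._gen.send(value)
    def throw(self, *args):
        return self._gen.throw(*args)
    def close(self):
        try:
            self._gen.throw(GeneratorExit)
        except StopIteration as e:
            return e.value             # graceful exit: return the value
        except GeneratorExit:
            return None                # unstarted, already closed, or uncaught
        raise RuntimeError('generator ignored GeneratorExit')

def returning_close(genfunc):
    '''Hypothetical decorator wrapping generator functions.'''
    @functools.wraps(genfunc)
    def wrapper(*args, **kwargs):
        return _ReturningClose(genfunc(*args, **kwargs))
    return wrapper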

There are a couple of non-sane (meaning insane) ways to backport the behaviour more transparently, which I wouldn’t consider fit for production use.[1] If anyone has any further, clever ideas, I would be glad to hear them.


  1. E.g. hacking PyGen_Type’s tp_flags to allow subclassing of GeneratorType, or overwriting close() in PyGen_Type’s tp_methods with the backport. ↩︎