Here’s an actual, if pared-down, example of the kind of generator where a returning close()
would be useful. It’s from astronomy, or rather cosmology. The generator receives rows from a catalogue of observed galaxies and creates a map of number counts (i.e. how many galaxies in each pixel). It also computes the current mean number count in each iteration, and reports that number. (In reality, it might do more checking and reporting.)
def map_galaxies(nx, ny, dx, dy, lon, lat):
'''Map galaxy density contrast in a rectangular region.'''
counts = np.zeros((nx, ny))
mean = 0.
# create number count map
while True:
try:
rows = yield {'mean': mean}
except GeneratorExit:
break
# pixel indices from positions (lon, lat)
# done differently (checks!) in real code
i = ((rows['lon'] - lon)/dx).astype(int)
j = ((rows['lat'] - lat)/dy).astype(int)
# increase count in those pixels
np.add.at(counts, (i, j), 1)
# compute new mean
mean = counts.mean()
# transform number counts into density contrast in place
counts -= mean
counts /= mean
# not actually counts any more, but density contrast
return counts
This generator is hence a machine for iteratively processing catalogues of galaxy positions into position maps. Now the relevant point is that the map of number counts themselves is not very interesting, cosmologically. What’s interesting is the map of relative over- or underdensity with respect to the mean. So before we return the map of counts, we transform it into a map of relative density.
There might be many such generators for processing catalogues into all kinds of other maps (e.g. galaxy shapes). Each type of map might have a different “post-processing” step. There might be a dynamic, user-configurable list of generators to be fed from the catalogue, and their common interface is that they receive rows, yield status information, and return their final maps.
I hope this example explains the use of generators a bit better. I’m a little reluctant to discuss alternatives, because generators have emerged as the best implementation by trial and error over a long period of time. As you can see from Maarten’s answer, there seems to be convergent evolution in industry as well. So maybe you can let me get away with just the broadest strokes of what isn’t great about a class-based alternative:
class PositionMap:
def __init__(self, nx, ny, dx, dy, lon, lat):
self.counts = np.zeros((nx, ny))
self.dx = dx
self.dy = dy
self.lon = lon
self.lat = lat
self.mean = 0.
self.final = False
def add(self, rows):
if self.final:
raise RuntimeError('map is finalized')
i = ((rows['lon'] - self.lon)/self.dx).astype(int)
j = ((rows['lat'] - self.lat)/self.dy).astype(int)
np.add.at(self.counts, (i, j), 1)
self.mean = self.counts.mean()
return {'mean': self.mean}
def finalize(self):
if not self.final:
self.counts -= self.mean
self.counts /= self.mean
self.final = True
return self.counts
You see that
- we have recreated the generator interface, where
__init__()
is send(None)
, add(...)
is send(...)
, and finalize()
is close()
,
- we have recreated the generator state (running, final),
- we now have to create and maintain the internal state explicitly, which we will never do as efficiently as the generator context switching (one can time this),
- since we are manually maintaining the internal state, we have to remember to clean up; for example,
finalize()
could remove self.counts
before returning it.
The class version is not fundamentally different from the generator version, because it imitates the generators perfectly. It may be slightly harder to write, because one has to be careful about getting the state right in certain cases such as the above. But why redo all of what is a native Python construct?
So, assuming generators are a good design choice here, the question is really only “how do I obtain the final result after I tell the generator to stop?” And indeed, a sentinel value is one way to go, as I wrote somewhere above. But that also just mimics the existing GeneratorExit
mechanism of close()
, and necessitates reimplementing exactly the same checks (no further yields, errors raised during closing, etc.).