Overload generators / generator operators

Hi,

I would like to overload the generator class, or maybe it would be even better to add operators to generators.

For instance, imagine I want the sum of two generators where one has a delay.

def test():
    i = 0
    while True:
        i += 1
        yield i
        
t = test()
delay_t = test()
next(t)  # advance t one step, so delay_t lags behind it

What I would like to do:

type(t).__add__ = lambda a, b: (a_ + b_ for a_, b_ in zip(a, b))  # fails with TypeError: attributes of the built-in generator type can't be set

s0 = t + delay_t  # I would like something like (a_ + b_ for a_, b_ in zip(a, b))
next(s0)

What I can do:

def s(a, b):
    while True:
        yield next(a) + next(b)

s1 = s(t, delay_t)  # OK, but you need to import/define the sum generator first
s2 = (a + b for a, b in zip(t, delay_t))  # good, but not reusable
s3 = lambda a, b: next(a) + next(b)  # shortest, but no longer a generator

print(next(s1))
print(next(s2))
print(s3(t,delay_t))

As you can see, the current options are more verbose, especially when things get more complicated: more than one sum (a + b + c), generators with send (where I would send the same value to both), etc.

In general, operators for generators would be a very cool feature to have; we could sum, subtract, chain, pair them, etc., but the current solutions are very verbose.

How would sending values into the sum of two generators work, and similarly, how would .throw and .close work?

I’m a big fan of the yield keyword too as it leads to very pleasant code such as in your examples. But it’s just syntactic sugar. When stepping beyond a generator (a function that returns an iterator) is required, the mechanism that springs to my mind is the iterator protocol, which allows iterators to support any behaviour at all that it is possible to define using methods. E.g.:

import itertools
from collections.abc import Iterable

class AddableIterator:
  def __init__(self, iterator):
    self.iterator = iterator

  def __iter__(self):
    return self

  def __next__(self):
    return next(self.iterator)
  
  def __add__(self, other: Iterable):
    return self.__class__((x+y for x, y in zip(self, other)))
    
t = itertools.count()
delay_t = itertools.count()
next(t)

addable = AddableIterator(t)

t_plus_delayed = addable + delay_t

for x in t_plus_delayed:
  input(f'Got: {x=}.  Press Enter to continue or Ctrl+C to quit')

A Generator Protocol that builds on the Iterator Protocol (basically the __next__ and __iter__ dunders) would be cool on the surface. But instance methods with those names can just as well be defined on an iterator’s class, and .send, .throw and .close are supported by coroutines too, and a Coroutine Protocol would require all sorts of async considerations to be ironed out first.
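
To make that concrete: a minimal sketch, assuming the wrapped iterator is actually a generator, that forwards the coroutine-style methods by plain delegation, with no new protocol:

class AddableGenerator(AddableIterator):
    def send(self, value):
        # delegate to the wrapped generator's own send
        return self.iterator.send(value)

    def throw(self, exc):
        # delegate to the wrapped generator's own throw
        return self.iterator.throw(exc)

    def close(self):
        # delegate to the wrapped generator's own close
        return self.iterator.close()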


Your solution is cool, but I share the love for the yield keyword and the built-in syntax of generators, so I find it a bit annoying to have to create a new class instead of patching the existing one or adding the functionality to it.

For generators with send, what I propose is the following:

import operator
s = gen1 + gen2   # -> g_op(gen1, gen2, operator.add)
s.send(val)

Where g_op is:

def g_op(gen1, gen2, op):
    v = yield  # receive the first value sent to the combined generator
    while True:
        v = yield op(gen1.send(v), gen2.send(v))  # forward v to both, yield the combination

Here op is add, etc. from the built-in operator module. Notice that next(gen) is the same as gen.send(None), so this solution is also compatible with next.

Another note: generators with send need to be created and then primed, which I find terribly annoying, but in practice I use the @consumer decorator as described in PEP 342 – Coroutines via Enhanced Generators | peps.python.org


def consumer(func):
    """Prime the generator so the caller doesn't have to."""

    def wrapper(*args, **kw):
        gen = func(*args, **kw)
        next(gen)
        return gen

    wrapper.__name__ = func.__name__
    wrapper.__dict__ = func.__dict__
    wrapper.__doc__ = func.__doc__
    return wrapper
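
For illustration, a runnable sketch combining @consumer with the g_op above; scaled is a hypothetical send-driven generator invented just for this example:

import operator

@consumer
def scaled(factor):
    """Yield each sent value multiplied by factor."""
    v = 0
    while True:
        v = yield (v or 0) * factor  # v is None when next() is used instead of send()

@consumer
def g_op(gen1, gen2, op):
    v = yield
    while True:
        v = yield op(gen1.send(v), gen2.send(v))

s = g_op(scaled(2), scaled(3), operator.add)  # what gen1 + gen2 would build
print(s.send(1))   # 1*2 + 1*3 = 5
print(s.send(10))  # 10*2 + 10*3 = 50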

In my opinion, close and throw should not have any special treatment, as I would not propagate them to the underlying generators. I may still want to use gen1 even when s is already closed and gone from memory.


This seems pretty specific to your use case. As a drop-in solution, you could use the fishhook library to patch the generator class with an __add__ at runtime.


Fishhook looks like an interesting solution 🙂 thanks!

Generators are already for somewhat specific use cases, and generators with send are almost a rarity, but I think it is conceptually natural to operate on generators the way you operate on any other class in Python, and I can say it simplifies working with data streams a lot, both async and normal ones.

For instance, subtracting background from a streamed signal, calculating online statistics, or triggering under certain conditions of the data streams, etc.

I think this type of generator is not used more often precisely because of the lack of support for this kind of operation and many others, which makes everything a bit verbose for practical use. Among other reasons.

I think this specific, limited functionality would be much better as a free function somewhere than on every generator ever created.

If anything, one might expect + between generators to act like itertools.chain, analogous with list.__add__() and tuple.__add__().
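
For reference, a quick sketch of that analogy:

import itertools

a = (x for x in [1, 2])
b = (x for x in [3, 4])
print(list(itertools.chain(a, b)))  # [1, 2, 3, 4], just like [1, 2] + [3, 4]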


Well, in the end it is about choosing the right conventions; if everyone thinks + is better used for chaining, that's OK with me, as long as it can be overridden.

Personally, I think that using "+" for chaining a potentially infinite generator is not a very useful use of the addition operator, as the first part may never finish (like in my example). Generators that can't be exhausted are difficult to compare with a list or a tuple.

Also, itertools.chain already exists in the standard library; what I describe doesn't. And to be honest it would be a very cool feature 🙂

Almost every class in Python implements the dunder methods in one way or another to enhance its functionality, so why don't generators?

A very cool example, thanks to Peter's solution:

from fishhook import hook
from types import GeneratorType
import operator
import itertools

def g_op(gen1, gen2, op):
    v = yield
    while True:
        v = yield op(gen1.send(v), gen2.send(v))

@hook(GeneratorType)
def __add__(self, other):
    g = g_op(self, other, operator.add)
    g.send(None)
    return g

@hook(GeneratorType)
def __mul__(self, other):
    g = g_op(self, other, operator.mul)
    g.send(None)
    return g

def count():
    yield from itertools.count()

c1 = count()
c2 = count()
c3 = count()

s = (c1 * c3) + c2  # :D

print(next(s))  # 0
print(next(s))  # 2
print(next(s))  # 6

Is the issue that it’s not a generator, or that it’s not an iterator? Since it doesn’t even do what the other options do (it only adds one pair), I’m surprised you included it at all. I’m also surprised that you didn’t mention map(add, t, delay_t) (with the import from operator); what’s the issue with that one? (And for your specific example I might btw use map(sum, pairwise(test()))).


Yes, you are right, that option is plainly bad.

And yes, I forgot to mention map as a suitable option for that simple case. But in general map is not useful for generators with send. You can write your own modified map to send things, and then you end up with something like g_op. (You can also pass .send to map, but it looks even worse.)
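
One way to read “pass .send to map”, as a sketch (scaled is a made-up send-driven generator, primed by hand here):

from itertools import tee
from operator import add

def scaled(factor):
    v = 0
    while True:
        v = yield (v or 0) * factor

a, b = scaled(2), scaled(3)
next(a)  # prime
next(b)  # prime

vals1, vals2 = tee(iter([1, 10, 100]))  # the same value must reach both generators
s = map(add, map(a.send, vals1), map(b.send, vals2))
print(next(s))  # 1*2 + 1*3 = 5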

In any case, even if send didn't exist, back to my last example:

s3 = (c1 * c3) + c2

# vs:

s3 = map(add, map(mul, c1, c3), c2)  # hard to read, isn't it?

And this composition is still simple, but it gets worse very quickly, and we can't ignore the send method. (My zip and def s(a, b): solutions were also not covering it, but g_op does.)

pairwise in general is not what I'm looking for, because c1, c2, c3 may not come from the same generator; that was just a simplified example.

You can do

s3 = ((x1 * x3) + x2 for (x1, x2, x3) in zip(c1, c2, c3))

It’s more verbose than your proposal, but it’s totally general, available in all supported Python versions, and it doesn’t involve adding extra methods to the iterator protocol - don’t forget that there is no common base class that all generators inherit from, so you can’t guarantee that your proposed operators would work on all generators.

I see the attraction of the idea, but the practical difficulties mean that it’s unlikely to be realistic when the functionality is already available via generator expressions.


I assume you mean iterators here? Generators DO have a single type, but in this case I think the proposal (despite being described as for generators) really belongs on iterators.

Probably… As you say, the OP is not very clear and the proposal seems more reasonable to add to iterators - after all, things like range, map, and most of the itertools aren’t of type generator.


I think ideally it would be iterators, as you said, because why not use this syntax to sum two map objects? But I'm mostly concerned with generators in my use case. I'm using generators, with or without the send method, a lot. Which produces code looking like:

(a.send(last) + b.send(last))/c.send(last)

When I would like to write:

((a+b)/c).send(last)

Now I see that my proposal is hard to implement, and we would need to deal with some edge cases that could be annoying:

next(a/(b+a))

This risks calling next or send on a twice per step if not tracked somehow, which may not be what I want.
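
A self-contained sketch of that hazard, wiring a / (b + a) by hand with the g_op from before (operator.truediv standing in for a hypothetical __truediv__ hook):

import itertools
import operator

def g_op(gen1, gen2, op):
    v = yield
    while True:
        v = yield op(gen1.send(v), gen2.send(v))

def count():
    yield from itertools.count()

a, b = count(), count()

b_plus_a = g_op(b, a, operator.add)
b_plus_a.send(None)  # prime

s = g_op(a, b_plus_a, operator.truediv)  # a / (b + a), with a shared
s.send(None)  # prime

print(next(s))  # 0.0: a yielded 0 here and 1 inside (b + a)
print(next(s))  # 0.5: a yielded 2 here and 3 inside (b + a)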

In case you want to see something slightly closer to the type of code I meant, I made this little repo; in general the data comes from async iterators, so it can't be fully stored: