Difference between `return generator()` vs `yield from generator()`

apalala · January 11, 2020, 10:58pm

Not actually an idea. Is `yield from generator()` somehow optimized in the compiler or C support? (See the non-authoritative answers on SO.)

[ins] In [1]: def generator():
         ...:     yield from range(10)
         ...:

[ins] In [2]: type(generator)
Out[2]: function

[ins] In [3]: type(generator())
Out[3]: generator

[ins] In [4]: def use_generator():
         ...:     return generator()
         ...:

[ins] In [5]: type(use_generator())
Out[5]: generator

[ins] In [6]: def yield_generator():
         ...:     yield from generator()
         ...:

[ins] In [7]: type(yield_generator())
Out[7]: generator

[ins] In [8]: list(use_generator()) == list(yield_generator())
Out[8]: True

apalala · January 11, 2020, 11:03pm

This experiment answers the main question. What changes between return and yield from?

context!


[ins] In [1]: def generator():
         ...:     yield 1
         ...:     raise Exception
         ...:

[ins] In [2]: def use_generator():
         ...:     return generator()
         ...:

[ins] In [3]: def yield_generator():
         ...:     yield from generator()
         ...:

[ins] In [4]: g = use_generator()

[ins] In [5]: next(g); next(g)
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-5-3d9500a8db9f> in <module>
----> 1 next(g); next(g)

<ipython-input-1-b4cc4538f589> in generator()
      1 def generator():
      2     yield 1
----> 3     raise Exception
      4

Exception:

[ins] In [6]: g = yield_generator()

[ins] In [7]: next(g); next(g)
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-7-3d9500a8db9f> in <module>
----> 1 next(g); next(g)

<ipython-input-3-3ab40ecc32f5> in yield_generator()
      1 def yield_generator():
----> 2     yield from generator()
      3

<ipython-input-1-b4cc4538f589> in generator()
      1 def generator():
      2     yield 1
----> 3     raise Exception
      4

Exception:
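The same comparison can be made without reading tracebacks by eye. This is only a minimal sketch: `frames_in_traceback` is a hypothetical helper (not part of the session above) that exhausts the iterator and lists the function names recorded in the traceback, using the standard `traceback.extract_tb`.

```python
import traceback


def generator():
    yield 1
    raise Exception("boom")


def use_generator():
    return generator()


def yield_generator():
    yield from generator()


def frames_in_traceback(make_iterator):
    """Hypothetical helper: exhaust the iterator, return traceback frame names."""
    it = make_iterator()
    try:
        for _ in it:
            pass
    except Exception as exc:
        return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
    return []


# use_generator() has already returned by the time the exception is raised,
# so it leaves no frame behind; yield_generator() is still suspended at its
# `yield from` and therefore shows up between the caller and generator().
print(frames_in_traceback(use_generator))
print(frames_in_traceback(yield_generator))
```

The first list should contain no `use_generator` entry, while the second should contain `yield_generator`.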

steven.daprano · January 11, 2020, 11:39pm

I don’t fully understand your question. What precisely are you asking?
Which is faster? You could try using the timeit module to find out.

There are (at least) three ways to “pass through” an iterator:

def gen1(it):
    for obj in it: yield obj

def gen2(it):
    yield from it

def gen3(it):
    return it

Only gen3 passes the iterator through unchanged. gen1 and gen2 wrap it
in a generator object, so technically I guess they add one extra level
of indirection when subsequently iterating over it.

Is there a reason why `yield from it` can’t return `it` without adding
an additional wrapper? It seems to me that the two ought to be
equivalent.
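A quick way to check the "extra level of indirection" claim, and to try the timeit suggestion, might look like the sketch below. The `time_passthrough` helper and its `size` and `number` values are arbitrary choices for illustration, and the timings will vary by machine and Python version, so no numbers are shown.

```python
import timeit
import types


def gen1(it):
    for obj in it:
        yield obj


def gen2(it):
    yield from it


def gen3(it):
    return it


source = iter(range(5))

# gen3 hands back the very same object; gen1 and gen2 build a new generator
# object around it (nothing is consumed until the wrapper is iterated).
print(gen3(source) is source)
print(isinstance(gen1(source), types.GeneratorType))
print(isinstance(gen2(source), types.GeneratorType))


def time_passthrough(func, size=1_000, number=100):
    """Hypothetical helper: time draining a fresh iterator through func."""
    return timeit.timeit(lambda: list(func(iter(range(size)))), number=number)


timings = {func.__name__: time_passthrough(func) for func in (gen1, gen2, gen3)}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f}s")
```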

steven.daprano · January 12, 2020, 12:15am

`return it` simply returns the iterator untouched.

`yield from it` creates a new generator object which roughly performs
`for obj in it: yield obj` internally.

I say “roughly” because `yield from` also handles sending input into a
generator, as explained in PEP 380 (“Syntax for Delegating to a Subgenerator”).

Note that this PEP uses a different meaning of the word “coroutine” to
that used by async. Async coroutines and generator coroutines are
different things. If anyone hasn’t come across generator coroutines
before, they’re really cool:

http://www.dabeaz.com/coroutines/
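To make the "roughly" concrete, here is a small sketch of where the two forms diverge once `send()` is involved. All of the names (`inner`, `delegate_with_yield_from`, `delegate_with_for_loop`, `drive`) are made up for this illustration.

```python
def inner():
    received = yield "ready"
    yield f"inner got {received!r}"


def delegate_with_yield_from():
    # Values sent to this generator are forwarded to inner().
    yield from inner()


def delegate_with_for_loop():
    # The sent value becomes the result of this `yield` expression and is
    # discarded; inner() is only ever advanced with a plain next().
    for item in inner():
        yield item


def drive(make_generator):
    gen = make_generator()
    first = next(gen)            # run to the first yield
    second = gen.send("hello")   # send a value into the paused generator
    return first, second


print(drive(delegate_with_yield_from))  # ('ready', "inner got 'hello'")
print(drive(delegate_with_for_loop))    # ('ready', 'inner got None')
```

With the explicit loop, the subgenerator never sees the sent value, which is one of the cases `yield from` was designed to handle.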

apalala · January 12, 2020, 1:06am

I answered my own original inquiry with the example that raises an exception.

The difference is precisely what you say: `yield from` preserves the invocation context (it creates a new generator), and `return` does not (as it should not).

At first glance I thought that `return generator()` could be wrong, but no: everything that can be assigned is a value, and thus can be returned, so the parser can’t be bothered with the deeper semantics:

g = generator()
return g

It’s perhaps something to be mentioned somewhere in the docs that

def __iter__(self):
    yield from self.somecriteria_iter()

is the way to do it for traceback context to be preserved.

I see many hard-to-debug bugs arising from unintentionally writing:

return iterator_or_generator_func()

instead of:

yield from iterator_or_generator_func()

Consider that it is now also allowed to write:

def g():
    yield True
    return 'OK'
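For anyone unsure what that `return 'OK'` does inside a generator, a short sketch: the value travels on the `StopIteration` exception and becomes the result of a `yield from` expression. The `outer` and `returned_value` names are hypothetical and exist only for this example.

```python
def g():
    yield True
    return 'OK'


def outer():
    # `yield from` evaluates to the value returned by the subgenerator.
    result = yield from g()
    yield result


def returned_value(gen):
    """Hypothetical helper: exhaust gen and return StopIteration.value."""
    while True:
        try:
            next(gen)
        except StopIteration as stop:
            return stop.value


print(list(g()))            # [True]: plain iteration drops the return value
print(returned_value(g()))  # OK
print(list(outer()))        # [True, 'OK']
```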

apalala · January 12, 2020, 1:13am

What a nightmare!

My intuition tells me there’s something broken in the ambiguity (“There should be one-- and preferably only one --obvious way to do it.”), but I have no idea about what it is, much less how to fix it.

Yet the simple knowledge that

def g():
    yield from returns_generator()

will have g() in the traceback context (while using return won’t) is good.

tim.one · January 12, 2020, 2:21am

Here’s a difference that can surprise, although not in a context this simple:

def f(n):
    if n <= 0:
        return
    def g(n):  # doesn't matter - any generator at all will do here
        yield from range(n)
    # now pick one of the next two
    # return g(n)
    # yield from g(n)

Those act very differently when the argument passed to f is less than 1. Because if you pick return g(n), f itself is not a generator, and f(0) returns None. If you pick yield from g(n) instead, then f is a generator, and f(0) returns a generator-iterator (which raises StopIteration the first time it’s poked).
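Spelled out as two separate functions, the difference can be checked directly. This is a sketch only: the names `f_return` and `f_yield_from` are invented here to stand for the two choices in the snippet above, and `g` is moved to module level for brevity.

```python
import inspect


def g(n):  # any generator at all will do here
    yield from range(n)


def f_return(n):
    if n <= 0:
        return
    return g(n)


def f_yield_from(n):
    if n <= 0:
        return
    yield from g(n)


print(inspect.isgeneratorfunction(f_return))      # False
print(inspect.isgeneratorfunction(f_yield_from))  # True

print(f_return(0))            # None, so iterating over it raises TypeError
print(list(f_yield_from(0)))  # []: an empty but perfectly valid generator

# For positive n the two are indistinguishable to a caller that just iterates.
print(list(f_return(3)) == list(f_yield_from(3)))  # True
```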

I got burned by this once in a much more complex context, after I went on a binge of replacing yield from X() with return X() as the last statement of many generators. Oops! Some of those contexts then turned out not to be generators anymore, and things broke all over the place. It was fun.