By Rnav1234 via Discussions on Python.org at 24Apr2022 23:32:
I realise I am now trying to predict when a python language feature
will want to fully instantiate a generator vs lazily return the first
time it can.
I note itertools:
[ i for i in itertools.takewhile(lambda x: x < 100, fib()) ]
works as I expected.
That is because it is doing what your for-loop did not: break from the
loop when a condition no longer holds. You could embed that test in your
loop and get the same result.
Do you have a simple to remember mnemonic or rule or command to help
you predict whether one is working with a full instantiator (eg
filter()) vs lazy evaluator? (or must this simply be memorised item by
item over time?)
Neither. Filter is lazy. Your fib() is lazy. Generators are lazy: they
run only when you ask for the next value.
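(For reference, the fib() referred to throughout isn't shown in this thread; it is assumed to be an infinite Fibonacci generator along these lines:)

```python
def fib():
    """Yield Fibonacci numbers indefinitely, one per next() call."""
    a, b = 0, 1
    while True:
        yield a          # execution pauses here until the next value is asked for
        a, b = b, a + b
```

Nothing runs when fib() is called; the body only advances when a consumer asks for the next value.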
Your original loop looked like this:
for i in filter(lambda x: x < 100, fib()):
    print(i)
filter() does not know that fib() returns an ordered sequence, and
therefore it does not know that once it sees a single value which is not
“x < 100” it should stop. The takewhile() suggestion does know to
stop.
This:
for i in takewhile(lambda x: x < 100, fib()):
    print(i)
is the same as this:
for i in fib():
    if not (i < 100):
        break
    print(i)
The takewhile() is effectively a convenience for building a chain of
generators.
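The equivalence above is easy to check directly (using the assumed infinite fib() generator from earlier):

```python
from itertools import takewhile

def fib():
    """Assumed infinite Fibonacci generator, as used in the thread."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# The takewhile() version stops as soon as the condition fails...
via_takewhile = list(takewhile(lambda x: x < 100, fib()))

# ...and produces exactly the same values as the explicit break.
via_break = []
for i in fib():
    if not (i < 100):
        break
    via_break.append(i)

assert via_takewhile == via_break
```

Either form consumes fib() one value at a time and abandons it after the first value of 100 or more.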
Consider it this way:
- fib() runs indefinitely (in little steps, once each time you ask for
its next value)
- filter() runs indefinitely (in little steps, once each time you ask
for its next value)
- takewhile() runs indefinitely (in little steps, once each time you ask
for its next value), but it stops once the condition fails
So the only consideration is what does the generator do? What do the
docs say it does?
In general, generators are lazy functions. This is in contrast with,
say, goroutines in Golang, which run flat out immediately, but usually
return a value via a channel of some kind, so they stall at the point
where they produce a new value (until their consumer picks it up).
The difference here is that a generator runs on demand (“next value”)
whereas a goroutine runs immediately, in the hope of having a value
ready to go when the consumer wants one. Greedy.
In Python, things like list comprehensions are “greedy”:
nums = [ i*i for i in range(100) ]
This runs to completion before landing in “nums”.
By contrast, this:
nums_g = ( i*i for i in range(100) )
is a generator expression, and lands in “nums_g” instantly. It, like a
generator function, runs only on demand. At this point, it has computed
nothing.
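You can see the on-demand behaviour by pulling values out of the generator expression manually with next():

```python
nums_g = (i * i for i in range(100))
# Nothing has been computed yet; each value appears only when asked for.
print(next(nums_g))  # 0
print(next(nums_g))  # 1
print(next(nums_g))  # 4
```

Each next() call resumes the expression just long enough to produce one value, then it pauses again.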
So this:
for num in [ i*i for i in range(100) ]:
    print(num)
computes all the numbers before the loop starts, and has to keep them
all in memory. Versus this:
for num in ( i*i for i in range(100) ):
    print(num)
which starts the loop immediately, and each number is computed only when
the for-loop mechanism asks for the next value, once per loop iteration.
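One rough way to see the memory difference (an illustration only; exact byte counts vary by Python version) is to compare the size of the two objects themselves:

```python
import sys

squares_list = [i * i for i in range(100)]   # all 100 values held at once
squares_gen = (i * i for i in range(100))    # just the generator machinery

# The generator object stays a small, fixed size no matter how many
# values it will eventually produce; the list grows with its contents.
print(sys.getsizeof(squares_list))
print(sys.getsizeof(squares_gen))
```

With range(1000000) instead of range(100), the list grows a million-fold while the generator object stays the same size.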
Cheers,
Cameron Simpson cs@cskk.id.au