Fibonacci generator hangs in filter()

Hello Colleagues,

Can anyone explain why for Python 3.8.13 on Linux the following script hangs:
$ cat /tmp/bug1.py
from time import sleep

from: PEP 255 – Simple Generators | peps.python.org

def fib():
a,b = 0,1
while True:
yield b
a,b = b, a+b

for i in filter(lambda x: x < 100, fib()):
print(i)
sleep(1)
$ python /tmp/bug1.py
1
1
2
3
5
8
13
21
34
55
89
^CTraceback (most recent call last):
File “/tmp/bug1.py”, line 10, in
for i in filter(lambda x: x < 100, fib()):
File “/tmp/bug1.py”, line 10, in
for i in filter(lambda x: x < 100, fib()):
KeyboardInterrupt

In your for loop, x becomes greater than 100, once the loop passes 89.

Your filter only shows the fib numbers less than 100 and that’s what’s displayed. But the filter doesn’t exit. It just keeps running in case there’s another number that fits. You need an exit condition

The fib function, as given, is designed to yield values until you stop asking for them. The filter function keeps asking for and testing values until the supply runs out, which never happens in this case. It is not equipped to detect that after 89 it will never see another value that is less than 100, so your program just keeps on going without producing additional output.

Your program needs to decide when to stop asking fib for values. Here’s one simple way to do it:

for n in fib():
    if n < 100:
        print(n)
    else:
        break
2 Likes

Firstly thank you for your time and energy!
I realise I am now trying to predict when a python language feature will want to fully instantiate a generator vs lazily return the first time it can.
I note itertools:
[ i for i in itertools.takewhile(lambda x: x < 100,fib()) ]
works as I expected.
Do you have a simple to remember mnemonic or rule or command to help you predict whether one is working with a full instantiator (eg filter()) vs lazy evaluator? (or must this simply be memorised item by item over time?)

1 Like

The one that works for you uses takewhile. From takewhile docs:

Make an iterator that returns elements from the iterable as long as the predicate is true.

When you use takewhile, it ends the iterator as soon as the condition is false for the first time. In your other attempt you used filter. That removes any false conditions, but doesn’t stop running.

>>> import itertools
>>> l = [1, 2, 3, False, 4, 5, 6, False]
>>> list(filter(lambda x:x, l))
[1, 2, 3, 4, 5, 6]
>>> list(itertools.takewhile(lambda x:x, l))
[1, 2, 3]
1 Like

For me, it is mostly the latter, that is, hopefully remembering how a particular tool works after having used it repetitively.

You are mistaken about filter. It is a lazy iterator too.

In your case, it is the for loop that tries to evaluate all the values eagerly, not filter.

There is no foolproof way of telling what a function does except to read the docs or the source code, or try it and see for yourself. Even then, docs can be inaccurate, and source code could be obfuscated or even deliberately misleading:

http://www.underhanded-c.org/

There are some rules of thumb that might apply, but they aren’t foolproof. People can name their functions anything they like.

  • Functions in the itertools module are usually lazy iterators;

  • but see the tee function, which is an iterator but only partially lazy;

  • functions that return something with “iterator” in their name probably are lazy iterators;

  • functions that contain yield are probably lazy;

  • the inspect module has functions that can, er, inspect functions and tell you if they are generator functions;

  • but they can’t tell you how lazy they are.

By Rnav1234 via Discussions on Python.org at 24Apr2022 23:32:

I realise I am now trying to predict when a python language feature
will want to fully instantiate a generator vs lazily return the first
time it can.
I note itertools:
[ i for i in itertools.takewhile(lambda x: x < 100,fib()) ]
works as I expected.

That is because it is doing what your for-loop did not: break from the
loop when a condition no longer holds. You could embed that test in your
loop and get the same result.

Do you have a simple to remember mnemonic or rule or command to help
you predict whether one is working with a full instantiator (eg
filter()) vs lazy evaluator? (or must this simply be memorised item by
item over time?)

Neither. Filter is lazy. Your fib() is lazy. Generators are lazy: they
run only when you are asking for the next value.

Your original loop looked like this:

for i in filter(lambda x: x < 100, fib()):
    print(i)

filter() does not know that fib() returns an ordered sequence, and
therefore it does not know that once it sees a single value which is not
“x < 100” it should stop. The takewhile() suggestion does know to
stop.

This:

for i in takewhile(lambda x: x < 100, fib()):
    print(i)

if the same as this:

for i in fib():
    if not (i < 100):
        break
    print(i)

The takewhile() is effectively a convenience for building a chain of
generators.

Consider it this way:

  • fib() runs indefinitely (in little steps, once each time you ask for
    its next value)
  • filter runs indefinitely (in little steps, once each time you ask for_
    its next value)
  • takewhile() runs indefinitely (in little steps, once each time you ask
    for_ its next value), but it stops once the condition fails

So the only consideration is what does the generator do? What do the
docs say it does?

In general, generators are lazy functions. This is in contrast with,
say, goroutines in Golang, which run flat out immediately, but usually
return a value via a channel or some kind, so they stall at the point
where they produce a new value (until their consumer picks it up).

The difference here is that a generator runs on demand (“next value”)
whereas a goroutine runs immediately, in the hope of having a value
ready to go when the consumer wants one. Greedy.

In Python, things like list comprehensions are “greedy”:

nums = [ i*i for i in range(100) ]

This runs to completion before landing in “nums”.

By contrast, this:

nums_g = ( i*i for i in range(100) )

is a generator expression, and lands in “nums_g” instantly. It, like a
generator function, runs only on demand. AT this point, it has computed
nothing.

So this:

for num in [ i*i for i in range(100) ]:
    print(num)

computes all the numbers before the loop starts, and has to keep them
in memory. versus this:

for num in ( i*i for i in range(100) ):
    print(num)

what starts the loop immediately, and each number is computed only when
the for-loop mechanism asks for the enxt value, once per loop iteration.

Cheers,
Cameron Simpson cs@cskk.id.au

1 Like

Thank you kindly folks ! Very generous with your time indeed.
My take-home is that I simply need to know that filter() will keep going once my ‘for loop’ tickles its iterable for the next item after 89. So this is a filter() characteristic. No built-in break and hence filter() is unable to stop before it hits the end of the iterable’s output i.e. never.