I need to understand what iterators and generators are. I looked on YouTube and went through the documentation, but I didn't get it. What is their purpose and impact, and where and when would one use them?
It's quite confusing. Please help.
You use them where you might otherwise have created a list or dictionary to hold an intermediate result. You might naturally have generated items into a list and then gone through that list with another loop, processing and grouping things, say, and making another list as output.
With a generator you make the items only as you need them, so if you have a lot of items you don't have to store them all at once, and if you might finish early, you avoid generating all the later ones, saving CPU time. You're probably already using some without noticing, e.g. when you do for line in file: ....
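For instance, here is a rough sketch of the difference (the function names are just made up for illustration):

# Building a full list up front: every item exists in memory before we look at any of them.
def squares_list(n):
    result = []
    for i in range(n):
        result.append(i * i)
    return result

# Generator version: items are produced one at a time, only when asked for.
def squares_gen(n):
    for i in range(n):
        yield i * i

# If we stop early, the generator never computes the remaining items.
for sq in squares_gen(1_000_000):
    if sq > 100:
        break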
I think I learned about them from this how-to: Functional Programming HOWTO — Python 3.14.2 documentation
I would say I’m yet to master itertools and functools, but I’ve found uses for my own generators. In practice, they can be a little difficult to debug, because when they fail, they fail in the part of your program where you read from them, not where you first create them, as would happen with a list.
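A tiny made-up example of that deferred failure:

def divide_all(numbers, divisor):
    for n in numbers:
        yield n / divisor   # the error happens here, lazily, not at creation time

gen = divide_all([1, 2, 3], 0)   # no error yet: nothing has run
print(next(gen))                 # ZeroDivisionError is raised here, where we read from it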
Hello,
welcome to the Python forum.
An iterable is any object that you can step through and perform an operation on or extract information from. Examples include lists, tuples, dictionaries, sets, etc. Formally, an iterable is an object that can be iterated upon, meaning that you can traverse through all its values. Technically, in Python, an iterator is an object which implements the iterator protocol, which consists of the two methods __iter__() and __next__().
Examples of iterable objects:
-----------------
a. Lists
b. Tuples
c. Dictionaries
d. Sets
e. Files
f. Generators
g. Strings
Here is a list of numbers and a tuple of strings:
`a_list = [1, 2, 3, 4, 5]`
`a_tuple = ('one', 'two', 'three')`
A generator function is a function that includes the yield keyword. Calling it produces a generator, a special type of iterator that yields a sequence of values one at a time, on demand, without storing the entire sequence in memory. This is helpful when you don't want to process the entire contents of an iterable (see the list above) at once. For example, say you want to process the contents of a database for some operation. Instead of loading the entire contents, which can bog down your system, you can choose to process it in chunks, on demand, as needed. This is where a generator function can come in handy.
Examples might be helpful. The first example processes a list of values and provides the square of each value in the list at once:
# Example 1 - without generator, prints values in list all at once
list_of_values = [1, 2, 3, 4, 5]

def process_squares(values):
    for num in values:
        print(f'{num}^2 = {num**2}')
    return

process_squares(list_of_values)
In the second example, we first define a generator function, then create a generator object from it, and then make explicit calls to that object to process one value at a time from the list, on demand.
# Example 2 - with generator, processes values on an explicit, call-by-call basis (not all at once)
def process_squares_with_generator(values):
    for num in values:
        yield f'{num}^2 = {num**2}'

# Create a generator object
gen_object = process_squares_with_generator(list_of_values)

# Call the generator object explicitly to process the values one at a time
print(next(gen_object))
print(next(gen_object))
print(next(gen_object))
print(next(gen_object))
print(next(gen_object))
These examples illustrate the difference between processing the values in an iterable (i.e., a list) all at once and on an on-demand basis. The memory requirements for loading the list in these examples are not critical, since it is rather small. However, if you are processing values from a database or from files in the high megabyte (or gigabyte) range, then processing in chunks can be beneficial.
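As a rough sketch of that chunked, on-demand style (the file name and chunk size below are made up for illustration):

def read_in_chunks(path, chunk_size=64 * 1024):
    # Yield successive pieces of a (possibly huge) file instead of reading it all at once
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk

for chunk in read_in_chunks('big_data.bin'):
    ...  # process each chunk here, one at a time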
Hope this helps.
Keep in mind that the same object does not need to implement both __iter__ and __next__; an iterable is any object that can produce an iterator, but does not itself need to be an iterator. An example is list itself:
>>> x = [1,2,3]
>>> next(x)
Traceback (most recent call last):
File "<python-input-8>", line 1, in <module>
next(x)
~~~~^^^
TypeError: 'list' object is not an iterator
>>> iter(x)
<list_iterator object at 0x10579a140>
>>> i1 = iter(x)
>>> i2 = iter(x)
>>> next(i1), next(i1)
(1, 2)
>>> next(i2)
1
list.__iter__ is used to produce an object of type list_iterator, which is what implements __next__. The same list can be iterated over multiple times using distinct iterators. By convention, all iterators should be iterables as well by implementing a trivial __iter__ method that returns the object itself.
Please reference the documentation. It specifically states:
A class that wants to be an iterator should implement two methods: a __next__() method that behaves as described above, and an __iter__() method that returns self.
Also reference the documentation, which states the following:
The iterator objects themselves are required to support the following two methods, which together form the iterator protocol:
iterator.__iter__()
Return the iterator object itself. This is required to allow both containers and iterators to be used with the for and in statements. This method corresponds to the tp_iter slot of the type structure for Python objects in the Python/C API.
iterator.__next__()
Return the next item from the iterator. If there are no further items, raise the StopIteration exception. This method corresponds to the tp_iternext slot of the type structure for Python objects in the Python/C API.
@onePythonUser, I think @chepner was talking about how __iter__ need not be a meaningful implementation, and in fact as per the spec Iterator.__iter__ must always be essentially a no-op that returns self. That is, all iterators are also iterables, but with a strict idempotency rule on the behavior of iter().
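To make that concrete, here is a minimal hand-written iterator (the class is made up for illustration); note the trivial __iter__ that just returns self:

class CountDown:
    # Iterator that counts down from start to 1, tracking its own iteration state
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # By convention, an iterator returns itself here
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

for n in CountDown(3):
    print(n)   # prints 3, 2, 1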
To address @sarveshpandey23's question, there are actually 3 types of things to consider:
Iterators IMO are the easiest to understand. You call next() ( Built-in Functions — Python 3.14.2 documentation ) on an iterator to get the “next value”. If you are writing your own class, you implement this using the __next__() method ( Built-in Types — Python 3.14.2 documentation ).
If you are writing your own iterator, “next value” can be defined however you like. The standard library defines what it means for basic data types such as list, where it simply means the next value in list order. Note that this means an iterator usually needs to track the current state of iteration somehow, otherwise it wouldn’t know what the “next value” was.
This is actually how for works internally. For example:
for x in data_iter:
    ...
This is the same as calling x = next(data_iter) over and over until a special exception is raised, in which case Python stops looping and moves on to subsequent lines of code. Conceptually equivalent to:
more = True
while more:
    try:
        x = next(data_iter)
    except StopIteration:
        more = False
    else:
        ...
An iterable is a little more abstract, but it’s very important. An iterable is anything that can be turned into an iterator. You call iter() ( Built-in Functions — Python 3.14.2 documentation ) on an iterable to obtain an iterator. If you are writing your own class, you implement this using the __iter__() method ( 3. Data model — Python 3.14.2 documentation , Built-in Types — Python 3.14.2 documentation ).
I left out a detail above about for. This loop:
for x in data:
    ...
Really looks more like this:
more = True
data_iter = iter(data)
while more:
    try:
        x = next(data_iter)
    except StopIteration:
        more = False
    else:
        ...
Note that Python first converts the iterable to an iterator, and then starts looping as described in the previous section about iterators.
Why do we need the distinction? Recall that iterators usually need to track the iteration state internally. Now consider a list: a list can be iterated over many different times, maybe even concurrently, and it would be ridiculous to expect a list to somehow keep track of every distinct loop where it might be used. In fact I don’t even know if that would be possible in the current design of the Python language. So instead we loop over a list by first constructing a single-use iterator that tracks the iteration state for each loop separately, and the list object itself never has to worry about doing this.
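A quick illustration: nested loops over the same list each get their own iterator from iter(), so they don’t interfere with each other:

data = ['a', 'b', 'c']

# Each loop calls iter(data) and gets its own independent list_iterator,
# so the inner loop does not disturb the outer loop's position
for x in data:
    for y in data:
        print(x, y)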
This detail about loops is the reason why iterators are also expected to return themselves when you call iter(), as people are discussing above.
Python glossary: generator (generator iterator)
Generator iterators are special iterators which can receive a value for processing at the same time as a value is emitted for iteration. This is kind of an advanced feature and it’s very rarely used nowadays.
Many Python devs, however, interact with generator functions, which are functions that let you write all kinds of iterators, including generator iterators, using the special keyword yield. Technically all iterators defined this way are generator iterators, but usually the “generator” aspect is ignored.
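For completeness, here is a small sketch of that two-way behaviour using the generator’s send() method (the running-average example is made up for illustration):

def running_average():
    # Each value arrives via .send(); the current average is yielded back in return
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)            # prime the generator: run it up to the first yield
print(avg.send(10))  # 10.0
print(avg.send(20))  # 15.0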
Here is an example of a simple custom iteration function that produces an (infinite!) stream of Fibonacci numbers:
def iter_fibonacci():
    prev_1, prev_2 = 1, 1
    while True:
        curr = prev_1 + prev_2
        yield curr
        prev_1, prev_2 = curr, prev_1
iter_fibonacci is a generator function. When you call iter_fibonacci(), you get a generator iterator. But in this case the “generator” functionality is not used, so it acts just like a normal iterator:
max_n = 20
for n, x in enumerate(iter_fibonacci()):
    print(f"{n}: {x}")
    if n >= max_n:
        break
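As a side note, a common alternative to breaking out by hand is itertools.islice, which takes a bounded slice of an infinite iterator:

import itertools

# Take just the first 10 Fibonacci numbers from the infinite generator
for x in itertools.islice(iter_fibonacci(), 10):
    print(x)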
For a fun exploration of advanced uses of generators, see A Curious Course on Coroutines and Concurrency and https://www.youtube.com/watch?v=Z_OAlIhXziw.