The itertools.product function computes the Cartesian product of a collection of iterables. However the implementation works by completely consuming the input iterables and keeping pools of values in memory to generate the products. The documentation notes that means that “it only useful with finite inputs” but in practice these iterables must also be small enough to convert into tuples. This can result in a variety of issues including:
>>> from itertools import product
>>> next(product(range(1 << 30), repeat=2))
MemoryError
>>> next(product(range(1 << 31), repeat=2))
OverflowError: Python int too large to convert to C ssize_t
>>> from itertools import count
>>> next(product(count(), count())
# Hangs
I would like to suggest adding a variant of the product function to the itertools library which can handle such cases, however I suspect that this would most likely need to be a separate function. As a concrete example to consider I would propose the following implementation which takes an item from each iterable in turn and yields all the tuples that can be formed from it and all the previous items that have been seen before adding it to the store of items that have been seen.
from itertools import cycle, product, tee
def iproduct(*iterables, repeat=1):
iterables = [item for row in zip(*(tee(iterable, repeat) for iterable in iterables)) for item in row]
N = len(iterables)
saved = [[] for _ in range(N)] # All the items that we have seen of each iterable.
exhausted = set() # The set of indices of iterables that have been exhausted.
for i in cycle(range(N)):
if i in exhausted: # Just to avoid repeatedly hitting that exception.
continue
try:
item = next(iterables[i])
yield from product(*saved[:i], [item], *saved[i+1:]) # Finite product.
saved[i].append(item)
except StopIteration:
exhausted.add(i)
if not saved[i] or len(exhausted) == N: # Product is empty or all iterables exhausted.
return
yield () # There are no iterables.
This handles a wide array of different circumstances but importantly it produces the same pairs as itertools.product
(up to ordering)
>>> from itertools import islice, product
>>> X = [i+1 for i in range(20)]
>>> Y = [2*i for i in range(5)]
>>> print(*islice(iproduct(X, Y), 10))
(1, 0) (2, 0) (1, 2) (2, 2) (3, 0) (3, 2) (1, 4) (2, 4) (3, 4) (4, 0)
>>> print(sorted(iproduct(X, Y)) == sorted(product(X, Y)))
True
But since it doesn’t start by building tuples of the inputs it can handle very large iterables.
>>> X = (i + 7 for i in range(2 << 100) if str(i).startswith('2'))
>>> Y = (i for i in range(10**10))
>>> print(*islice(iproduct(X, Y), 10))
(9, 0) (27, 0) (9, 1) (27, 1) (28, 0) (28, 1) (9, 2) (27, 2) (28, 2) (29, 0)
This also means that it also handles infinite iterables perfectly well.
>>> from itertools import count
>>> X = (i for i in range(20))
>>> Y = count()
>>> print(*islice(iproduct(X, Y), 10))
(0, 0) (1, 0) (0, 1) (1, 1) (2, 0) (2, 1) (0, 2) (1, 2) (2, 2) (3, 0)
>>> X = count()
>>> Y = count()
>>> print(*islice(iproduct(X, Y), 10))
(0, 0) (1, 0) (0, 1) (1, 1) (2, 0) (2, 1) (0, 2) (1, 2) (2, 2) (3, 0)
Empty iterables result in an empty product and no iterables result in the empty tuple.
>>> X = (i for i in range(0))
>>> Y = count()
>>> next(iproduct(X, Y))
StopIteration
>>> print(*iproduct())
()
>>> X = count()
>>> print(*iproduct(X, repeat=0))
()
And has repeat which works like itertools.product(..., repeat=...)
>>> X = count()
>>> print(*islice(iproduct(X, repeat=3), 10))
(0, 0, 0) (1, 0, 0) (0, 1, 0) (1, 1, 0) (0, 0, 1) (0, 1, 1) (1, 0, 1) (1, 1, 1) (2, 0, 0) (2, 0, 1)
>>> X = count()
>>> Y = 'ABC'
>>> print(*islice(iproduct(X, Y, repeat=2), 10))
(0, 'A', 0, 'A') (1, 'A', 0, 'A') (0, 'B', 0, 'A') (1, 'B', 0, 'A') (0, 'A', 1, 'A') (0, 'B', 1, 'A') (1, 'A', 1, 'A') (1, 'B', 1, 'A') (0, 'A', 0, 'B') (0, 'A', 1, 'B')
The only notable difference between this and itertools.product
is that the tuples produced are in a different order. So while itertools.product
creates its pairs so that if the input’s iterables are sorted then the product tuples are emitted in sorted order (lexicographic ordering), if the input iterables to iproduct
are sorted then it produces its tuples sorted by the following:
>>> def grade(x):
>>> m = max(x)
>>> return (m, [i for i, y in enumerate(x) if y == m][-1], x)
>>> X = range(5)
>>> Y = range(10)
>>> print(list(iproduct(X, Y)) == sorted(product(X, Y), key=grade))
True
Due to this difference I don’t think that it would be appropriate for iproduct
to replace itertools.product
but this could be included as a separate function.