Simple generators should have `__len__` or `__length_hint__` when possible

Simple generators such as (x+10 for x in foo) or (0 for _ in range(100)) should offer __len__ and __length_hint__. To achieve that, these methods can in turn call foo.__len__() and foo.__length_hint__().

2 Likes

What’s the plan when the comprehension has a conditional (aka if part)?

3 Likes

By Laurie O via Discussions on Python.org at 04Sep2022 23:06:

What’s the plan when the comprehension has a conditional (aka if
part)?

Let’s ignore __len__. But __length_hint__ could legitimately return
len(foo) anyway, or alternatively raise NotImplemented which is
also defined by the spec.

Cheers,
Cameron Simpson cs@cskk.id.au

There have been many suggestions to add length to iterators (including generators). But there is a serious problem with that: the length of an iterator is not well-defined.

Here is a simple example. Let’s start with a list, L = [10, 20, 30, 40]. The length of L is well-defined, and obviously 4. So that’s fine.

Now let’s make an iterator. Either of these will do:

it = iter(L)
it = (x for x in L)

What’s the length of it? We want the invariants:

  1. The length of the iterator should be the same as the length of the list L.
  2. That means that the length of the iterator should be constant, unless we modify (add or remove items) the list L.
  3. If the length of the iterator is N, then for a in it should loop N times.

Another way of stating 3 is that if you call list(it), the length of the new list should be N, which of course is the length of the original list L.

Those 3 invariants are very simple. Unfortunately they are impossible. Consider:

L = [10, 20, 30, 40]
it = iter(L)
next(it); next(it); next(it)
assert len(L) == 4  # the list L has not changed
print(len(it))
assert list(it) == [40]

What would you expect to be printed?

If it prints “1”, that breaks invariant number 1 and 2, but keeps invariant number 3. It also requires the iterator object to keep track of how many values it has yielded.

If it prints “4”, that keeps invariant number 1 and 2, but breaks invariant 3.

Either way, whichever answer you give, some of the invariants will be broken, and len(it) will behave weirdly and confusingly, and upset people.

L = [10, 20, 30, 40]
it = iter(L)
next(it); next(it); next(it)

# Option 1:
assert len(it) == len(L)  # Passes.
assert len(it) == len(list(it))  # Fails.

# Option 2:
assert len(it) == len(L)  # Fails.
assert len(it) == len(list(it))  # Passes.

Iterators with a length are fundamentally broken.

I feel like the biggest problem here is deciding whether whether __len__ or __len_hint__ should measure the total length of a generator at creation time regardless of how many times next() has been called or whether it should measure steps needed to exhaust it, effectively being equivalent to sum(1 for _ in itertools.tee(generator)). From a commonplace intuition standpoint both appear equally valid to me, and that can raise some confusion.

I do not understand why you are considering the points 1. and 2. They do not make sense for len(). They only make sense with a freshly created iterator before we start to consume it.

I think that if len() has to be implemented for iterators then only the point 3. should be considered.