# Simple generators should have `__len__` or `__length_hint__` when possible

Simple generators such as `(x+10 for x in foo)` or `(0 for _ in range(100))` should offer `__len__` and `__length_hint__`. To achieve that, these methods can in turn call `foo.__len__()` and `foo.__length_hint__()`.

2 Likes

What’s the plan when the comprehension has a conditional (aka `if` part)?

3 Likes

By Laurie O via Discussions on Python.org at 04Sep2022 23:06:

What’s the plan when the comprehension has a conditional (aka `if`
part)?

Let’s ignore `__len__`. But `__length_hint__` could legitimately return
`len(foo)` anyway, or alternatively raise ` NotImplemented` which is
also defined by the spec.

Cheers,
Cameron Simpson cs@cskk.id.au

There have been many suggestions to add length to iterators (including generators). But there is a serious problem with that: the length of an iterator is not well-defined.

Here is a simple example. Let’s start with a list, `L = [10, 20, 30, 40]`. The length of L is well-defined, and obviously 4. So that’s fine.

Now let’s make an iterator. Either of these will do:

``````it = iter(L)
it = (x for x in L)
``````

What’s the length of `it`? We want the invariants:

1. The length of the iterator should be the same as the length of the list L.
2. That means that the length of the iterator should be constant, unless we modify (add or remove items) the list L.
3. If the length of the iterator is N, then `for a in it` should loop N times.

Another way of stating 3 is that if you call `list(it)`, the length of the new list should be N, which of course is the length of the original list L.

Those 3 invariants are very simple. Unfortunately they are impossible. Consider:

``````L = [10, 20, 30, 40]
it = iter(L)
next(it); next(it); next(it)
assert len(L) == 4  # the list L has not changed
print(len(it))
assert list(it) == 
``````

What would you expect to be printed?

If it prints “1”, that breaks invariant number 1 and 2, but keeps invariant number 3. It also requires the iterator object to keep track of how many values it has yielded.

If it prints “4”, that keeps invariant number 1 and 2, but breaks invariant 3.

Either way, whichever answer you give, some of the invariants will be broken, and `len(it)` will behave weirdly and confusingly, and upset people.

``````L = [10, 20, 30, 40]
it = iter(L)
next(it); next(it); next(it)

# Option 1:
assert len(it) == len(L)  # Passes.
assert len(it) == len(list(it))  # Fails.

# Option 2:
assert len(it) == len(L)  # Fails.
assert len(it) == len(list(it))  # Passes.
``````

Iterators with a length are fundamentally broken.

I feel like the biggest problem here is deciding whether whether `__len__` or `__len_hint__` should measure the total length of a generator at creation time regardless of how many times `next()` has been called or whether it should measure steps needed to exhaust it, effectively being equivalent to `sum(1 for _ in itertools.tee(generator))`. From a commonplace intuition standpoint both appear equally valid to me, and that can raise some confusion.

I do not understand why you are considering the points 1. and 2. They do not make sense for `len()`. They only make sense with a freshly created iterator before we start to consume it.

I think that if `len()` has to be implemented for iterators then only the point 3. should be considered.