Built-in `map` function does not forward __length_hint__

laclouis5 · October 4, 2022, 10:18pm

I would have expected the iterator returned by map() to forward the __length_hint__ of its input iterator as an optimization, but that does not seem to be the case:

from operator import length_hint

it = map(lambda v: v, [1, 2, 3])
l = length_hint(it)  # expected: 3, actual: 0

The same behavior exists for generators. Is there a practical reason for this behavior?
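For comparison, here is a slightly fuller check of what operator.length_hint() reports for the objects involved. This is only a small sketch with illustrative variable names, and the values reflect CPython's behavior:

```python
from operator import length_hint

data = [1, 2, 3]

# A list has __len__, so length_hint() simply returns len(data).
hint_list = length_hint(data)                       # 3

# A list iterator implements __length_hint__ (items remaining).
hint_iter = length_hint(iter(data))                 # 3

# A map object defines neither __len__ nor __length_hint__, so
# length_hint() falls back to its default (0 unless one is given).
hint_map = length_hint(map(str, data))              # 0
hint_map_default = length_hint(map(str, data), 10)  # 10

# Generators behave the same way as map objects here.
hint_gen = length_hint(x for x in data)             # 0

print(hint_list, hint_iter, hint_map, hint_map_default, hint_gen)
```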

cameron · October 4, 2022, 11:34pm

I expect map() well predates __length_hint__. Maybe when
__length_hint__ was implemented nobody thought to do this propagation?
It seems like a reasonable thing to want and since map() is 1-to-1 the
propagation seems like a correct thing to do.

Cheers,
Cameron Simpson cs@cskk.id.au

rhettinger · October 4, 2022, 11:44pm

Yes. I explored this path and it was a dead end. See the docstring for test_iterlen. Several problems arose but the biggest is summarized in the last paragraph:

The iterators not listed above, such as enumerate and the other itertools,
are not length transparent because they have no way to distinguish between
iterables that report static length and iterators whose length changes with
each call (i.e. the difference between enumerate('abc') and
enumerate(iter('abc'))).
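The distinction drawn in that docstring can be observed directly. The following is a small sketch (variable names are illustrative, values as reported by CPython) showing a static length next to an iterator whose hint changes as it is consumed, and that enumerate forwards neither:

```python
from operator import length_hint

s = 'abc'
it = iter(s)

# A string reports the same length however often it is iterated.
static_before = length_hint(s)    # 3
for _ in s:
    pass
static_after = length_hint(s)     # 3

# The string iterator's hint shrinks as items are consumed.
dynamic_before = length_hint(it)  # 3
next(it)
dynamic_after = length_hint(it)   # 2

# enumerate() is not length transparent in either case:
# length_hint() returns its default of 0 for both.
enum_static = length_hint(enumerate('abc'))         # 0
enum_dynamic = length_hint(enumerate(iter('abc')))  # 0
```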

steven.daprano · October 4, 2022, 11:59pm

For generators, absolutely. The length of a generator is, in general, completely unpredictable, and may be infinite.

This applies to both generator expressions and full def ... yield generator functions. It will often require extremely sophisticated analysis, and even human intelligence, to determine even a hint of what the length might be, and it may be impossible to predict.

def collatz(n: int):
    if n < 1:
        return
    while n > 1:
        yield n
        if n % 2 == 0:
            n = n//2
        else:
            n = 3*n + 1
    assert n == 1
    yield n

It is not even known whether the Collatz sequence terminates for all values of n, let alone how long the sequence is.

Asking the interpreter to compute a __length_hint__ for generators is a waste of time. It would be accurate only in a tiny fraction of cases, and even then only under the most trivial circumstances. It hardly seems like it is worth the effort.

In the case of map, I guess that the length of the map iterator cannot be different from the length of the input. So I suppose we might request an enhancement: that map iterators copy or otherwise expose whatever length hint their input provides.

But… why bother? It’s not that I oppose the suggestion that map iterators forward the length hint, but I do wonder why you want it to?

cameron · October 5, 2022, 6:26am

To my mind: for the same reason you’d want a length hint in the first
place, on anything. If I go:

 list(something_hinted)

it can use the length hint meaningfully. And:

 list(map(transform, something_hinted))

has exactly the same use for a length hint; it’s just making a list of
the transformed values instead.

In short, if the thing you’re mapping had a length hint, the hint’s just
as valid for the output of the map. Why wouldn’t you propagate it?

I frequently (reflexively sometimes) reach for list(map( before
muttering irritably and writing a list comprehension.

Cheers,
Cameron Simpson cs@cskk.id.au

laclouis5 · October 5, 2022, 8:16am

Thanks all for your insights!

My main use case was tqdm and other progress bar utilities. To display the estimated progress during iteration, the size of the iterable is required. I discovered that many functions returning map-like objects, such as map and enumerate but also ThreadPoolExecutor.map() and many others, do not forward __length_hint__. The workaround is simple but quite cumbersome.
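To illustrate the kind of workaround meant here, a minimal sketch follows. The with_progress helper is hypothetical and only stands in for a real progress bar; with tqdm the value would presumably be passed through its total argument instead:

```python
from operator import length_hint

def with_progress(iterable, total):
    """Hypothetical stand-in for a progress bar: yields (done, total, item)."""
    for done, item in enumerate(iterable, start=1):
        yield done, total, item

items = [1, 2, 3, 4]

# The hint has to be read from the source *before* wrapping it in map(),
# because the map object itself reports no hint.
total = length_hint(items)
mapped = map(lambda v: v * 10, items)

for done, out_of, value in with_progress(mapped, total):
    print(f"{done}/{out_of}: {value}")
```

The size has to be threaded through by hand at every call site, which is what makes this cumbersome.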

Another advantage of length hints being propagated is simply optimization of allocations.

My expectation was that no distinction would be made between iterators with a static size and ones with a dynamic size: operator.length_hint() would simply return the length hint reported by the input at the time of the call. Since __length_hint__ is a best-effort implementation, if the length changes it could simply return the length of the original iterable.
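The semantics described above could be sketched in pure Python as follows. The hinted_map class is hypothetical and not part of the standard library. It assumes that, for several inputs, the smallest hint is the right one to forward, since map() stops at the shortest input:

```python
from operator import length_hint

class hinted_map:
    """Hypothetical map() look-alike that forwards length hints."""

    def __init__(self, func, iterable, *iterables):
        self._func = func
        self._iters = [iter(it) for it in (iterable, *iterables)]

    def __iter__(self):
        return self

    def __next__(self):
        # Like map(), stop as soon as any input is exhausted.
        args = [next(it) for it in self._iters]
        return self._func(*args)

    def __length_hint__(self):
        # Ask the inputs at call time. An input without a hint reports 0,
        # which makes the forwarded hint 0 ("unknown") as well.
        return min(length_hint(it) for it in self._iters)

m = hinted_map(str, [1, 2, 3])
hint_start = length_hint(m)        # 3
next(m)
hint_after_one = length_hint(m)    # 2
rest = list(m)                     # ['2', '3']

# With several inputs, the shortest one bounds the output.
hint_pair = length_hint(hinted_map(pow, [1, 2, 3], [2, 2]))  # 2
```

Because the hint is computed on demand, it tracks the inputs as they are consumed rather than freezing the size seen at construction time.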