It would be nice if the `next()` function had a `count` argument

hypernova · July 5, 2024, 6:38pm

It would be nice if the next() function had a keyword argument (possibly called count) which could be used instead of calling next() count times.

Example:

# Current code
next(thing)
next(thing)

# Suggested alternative
next(thing, count=2)

This comes from a real-world use case which I encountered today.

It comes from a loop which looks like this:

index_range = range(len(dataframe))
for index in index_range:
    row = dataframe.iloc[index]
    # do some processing on row
    # sometimes, need to skip some rows, in pairs
    if some_condition:
        next(index_range)
        next(index_range)

I’m hoping this code does what I expect. I haven’t actually tested the behavior of the object index_range in production yet. Still, regardless of how index_range behaves, the double call to next() could be replaced with a single call next(_, count=2).

I’m sure there must be other use cases. One other use case which comes to mind is to depeat some iterable.

This could be done if there was a way to specify that next() should repeatedly skip elements of the iterable until a StopIteration is raised.

In other words, a way to say count=infinity.

I’m not sure what a good API for that might look like. Inspiration could perhaps be taken from the key keyword argument of min and max.

Possibly something like next(iterable, until=None), which would call next() repeatedly until the iterable is empty.
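Depleting an iterator, the "count=infinity" case, can already be expressed with the standard library: a collections.deque with maxlen=0 pulls every remaining item and keeps none. The sketch below wraps that in two hypothetical helpers, `deplete` and `deplete_keep_last` (names made up for illustration, not part of any library); the second uses maxlen=1 to keep only the final item, in case the last value matters.

```python
from collections import deque


def deplete(iterator):
    """Hypothetical helper: consume every remaining item, discarding them all."""
    # A deque with maxlen=0 iterates to exhaustion but stores nothing.
    deque(iterator, maxlen=0)


def deplete_keep_last(iterator, default=None):
    """Hypothetical helper: consume every remaining item and return the last one."""
    # maxlen=1 keeps only the most recently appended item.
    tail = deque(iterator, maxlen=1)
    return tail[0] if tail else default


it = iter(range(5))
deplete(it)
print(next(it, "exhausted"))  # exhausted

print(deplete_keep_last(iter("abc")))  # c
```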

ericvsmith · July 5, 2024, 7:10pm

What should the return value be?

Stefan2 · July 5, 2024, 7:42pm

It doesn’t, as a range object is not an iterator.
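This is easy to check interactively. A range is a lazy sequence that can be iterated many times, so next() refuses it, while iter() hands back a separate iterator that next() accepts:

```python
r = range(3)

# range is not an iterator, so next() raises TypeError.
try:
    next(r)
except TypeError as exc:
    print("TypeError:", exc)  # exact message text may vary between Python versions

# iter() returns a fresh iterator over the range, which next() can advance.
it = iter(r)
print(next(it))  # 0
print(next(it))  # 1
```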

What’s that?

Ugh, those functions are frustratingly slow, I think because of their parameters. Wouldn’t want next to become slower than it already is.

Stefan2 · July 5, 2024, 7:47pm

You could use consume(index_range, 2) with consume from the itertools recipes or more-itertools.
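For reference, the consume recipe from the itertools documentation looks roughly like the sketch below (reproduced from memory; check the current docs for the exact text). Like next(), it only has an effect on a real iterator: passing the range object itself would create a new iterator inside islice and leave the for loop untouched, so the example assumes an explicit `index_iter = iter(...)`.

```python
import collections
from itertools import islice


def consume(iterator, n=None):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # Feed the entire iterator into a zero-length deque.
        collections.deque(iterator, maxlen=0)
    else:
        # Advance to the empty slice starting at position n.
        next(islice(iterator, n, n), None)


index_iter = iter(range(10))
consume(index_iter, 2)   # skips 0 and 1
print(next(index_iter))  # 2
```

Running past the end is harmless here, because the inner next() is given a default of None.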

dg-pb · July 5, 2024, 7:50pm

Or if you need to get the value, the equivalent would be next(itertools.islice(it, n-1, None))
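A small illustration of that idiom, assuming `n` is the number of items to advance. It returns the value that the last of n individual next() calls would have returned, and it raises StopIteration if the iterator runs out first:

```python
from itertools import islice

it = iter(range(10))
n = 3

# Skip n - 1 items, then return the next one: the same result as calling
# next(it) n times and keeping only the last value.
value = next(islice(it, n - 1, None))
print(value)     # 2
print(next(it))  # 3 (the underlying iterator advanced exactly n items)
```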

hypernova · July 5, 2024, 9:29pm

Sorry, I meant “deplete”.

hypernova · July 5, 2024, 9:30pm

Possibly the last value which would have been returned by the final call to next() if a series of individual next() function calls had been made?
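Those semantics can be emulated today with a small helper. The sketch below is hypothetical (the name `next_n` and its signature are made up for illustration; no such builtin or keyword exists): it returns the last value, honours an optional default the way next() does, and otherwise lets StopIteration propagate.

```python
from itertools import islice

_MISSING = object()  # sentinel so that None can be passed as a real default


def next_n(iterator, count, default=_MISSING):
    """Hypothetical helper, not a builtin: call next() `count` times, return the last value.

    If the iterator runs out first, return `default` when one was given,
    otherwise let StopIteration propagate, just as next() does.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    try:
        # Skip count - 1 items and fetch the one after them.
        return next(islice(iterator, count - 1, None))
    except StopIteration:
        if default is _MISSING:
            raise
        return default


it = iter(range(5))
print(next_n(it, 2))         # 1
print(next(it))              # 2
print(next_n(it, 10, None))  # None (only 3 and 4 were left)
```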

effigies · July 6, 2024, 12:57am

If you just want to consume the iterable:

from more_itertools import consume

x = iter(range(20))
consume(x, 5)
print(next(x))

If you also want to store the list of consumed values:

from more_itertools import take

x = iter(range(20))
vals = take(5, x)
print(next(x))
print(vals)

5
[0, 1, 2, 3, 4]

If you just want to store the last consumed value:

from more_itertools import nth_or_last

x = iter(range(20))
val = nth_or_last(x, 4)
print(next(x))
print(val)

5
4

This does not seem to need a change to the builtins.

Eneg · July 6, 2024, 5:59am

The loop over range(...) just to index something is a code smell.
The use case doesn’t sound compelling when the same could be achieved with existing first-party tools (the aforementioned itertools), or just rewritten as

skip_count = 0

for _, row in dataframe.iterrows():  # iterating a DataFrame directly yields its column labels, not rows
    if skip_count:
        skip_count -= 1
        continue

    ...
    if condition:
        skip_count = 2

Rosuav · July 6, 2024, 7:56am

You aren’t indexing. You are stepping through an iterator up to a certain point.

If you need a list, use a list.

Eneg · July 6, 2024, 12:12pm

Their example shows a range based for loop, which is immediately used to index something. I was referring to that.

hypernova · July 6, 2024, 12:30pm

It may be a code smell, but there is no way around it. What it is doing is inspecting a sequence of rows in a dataframe based on some trigger condition. Once those rows have been inspected, they must not be read on the next iteration. They must be skipped.

Introducing a new variable skip_count is also a code smell. It’s a worse solution than using next(), although I agree having to factor out the range() statement really isn’t ideal. But I don’t see any alternative to that.

I was trying to avoid creating a list of index values from the range() statement. It might be there is no way to avoid this, however.

Stefan2 · July 6, 2024, 12:44pm

Use an index variable and a while-loop?

index = 0
while index < len(dataframe):
    row = dataframe.iloc[index]
    index += 1
    # do some processing on row
    # sometimes, need to skip some rows, in pairs
    if some_condition:
        index += 2

Rosuav · July 6, 2024, 1:02pm

Sounds like a job for enumerate to me.

Stefan2 · July 6, 2024, 1:03pm

How would you use that?

GotoRoto · July 6, 2024, 10:04pm

The code you provided won’t work: next(index_range) raises a TypeError, because range is a Sequence (and therefore an Iterable) but not an Iterator, so it has no __next__ method for next() to call.

The code you probably intended to write is here below:

index_range = range(len(dataframe))

index_iter = iter(index_range)

for index in index_iter:
    row = dataframe.iloc[index]
    # do some processing on row
    # sometimes, need to skip some rows, in pairs
    if some_condition:
        next(index_iter, None)  # the default avoids StopIteration escaping the loop on the last rows
        next(index_iter, None)
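A self-contained version of the same pattern, with a plain list standing in for the dataframe and a made-up condition (`row % 4 == 0`) standing in for some_condition; both are assumptions for illustration. Passing a default to next() keeps the skip from raising StopIteration when the condition fires on one of the last rows.

```python
def process_rows(rows, should_skip_pair):
    """Process rows in order, skipping the two rows after any row that triggers."""
    processed = []
    row_iter = iter(rows)
    for row in row_iter:
        processed.append(row)  # stand-in for "do some processing on row"
        if should_skip_pair(row):
            # Advance the shared iterator twice; the default avoids a
            # StopIteration escaping the loop near the end of the data.
            next(row_iter, None)
            next(row_iter, None)
    return processed


print(process_rows(list(range(10)), lambda row: row % 4 == 0))
# [0, 3, 4, 7, 8]
```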

I also must ask whether your processing mutates the row objects themselves, or whether you’re just calling a function on each row and yielding the results, or reducing the rows to some value. I also must ask what some_condition is supposed to represent. Is it a function call, a loop variable, etc.?

jamestwebber · July 7, 2024, 1:59am

The way around it is to have tidier data, such that rows are not dependent on each other like this; that’s a smell[1]. That might not be possible in this scenario (perhaps the data is from an external source), but it’s the best solution.

[1] A data smell?

will_f · July 7, 2024, 6:44pm

Another possibility, on the assumption that the two (or more or fewer) rows to skip share a common feature with the examined row (i.e. that the dataframe object is sorted in some way), is itertools.groupby:

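from itertools import groupby


# iterating a DataFrame directly yields its column labels, so iterate over rows explicitly
row_iter = (row for _, row in dataframe.iterrows())

for feature, rows in groupby(row_iter, key=lambda row: row["my_column"]):

    # fetch the first row of the group
    row = next(rows)

    # do some processing on row
    print(feature, row["my_column"])

    # skip the rest of the rows with this feature; groupby discards
    # any rows left unread in `rows` when it moves to the next group
    if not row["my_column"]:
        continue

    # otherwise, process the remaining rows of this group
    for row in rows:
        ...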

This is nice and idiomatic: the code doesn’t need to know how many rows to consume; it can just discard the iterator of remaining rows if the first row indicates so. Some dataframe libraries, pandas included, also provide their own groupby method.
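The "discard the rest of the group" behaviour can be seen in a small, dependency-free example, using a list of (key, value) tuples instead of a dataframe (an assumption made purely for illustration). Note that groupby only groups consecutive runs of equal keys, which is why the sortedness assumption above matters.

```python
from itertools import groupby
from operator import itemgetter

rows = [("a", 1), ("a", 2), ("b", 3), ("b", 4), ("b", 5), ("c", 6)]

firsts = []
for key, group in groupby(rows, key=itemgetter(0)):
    # Take only the first row of each run of equal keys;
    # groupby skips the unread remainder when it advances.
    firsts.append(next(group))

print(firsts)  # [('a', 1), ('b', 3), ('c', 6)]
```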

On the topic of adding parameters to next, it seems reasonable at first glance. If count is implemented, my personal opinion is that times is a more intuitive parameter name. On the other hand, it’s reasonable to worry that this enhancement might collide with next’s existing default parameter. Thanks in advance for sharing specifics on how adding parameters could hamper the performance of next.

hypernova · July 8, 2024, 10:40am

In this case, it is read-only.

GotoRoto · July 9, 2024, 8:18pm

Could you share more details regarding what the data is exactly and what the processing is doing with that data? It would also be very helpful if you explained why you had to use this confusing, unusual algorithm to get the results you want. What are the results you want, anyway? Why is the data structured so poorly that a row with certain qualities requires the algorithm to skip the next two rows?

If you can’t share relevant details of the problem and don’t answer the clarifying questions other users have posed, they won’t understand the use cases your proposal would solve and won’t be convinced to support it.