The purpose of next(iter(x)) is to get the first item from x, if x is ordered, or any item otherwise. The x itself is not modified.
I think a specialized function called, for example, getone(x) would bring some benefits over next(iter(x)). I’m just not sure if those benefits are sufficient to justify a change. That’s why I don’t have a well-prepared proposal.
My opinion is:
Readability-wise, next(iter(x)) is not great. It is difficult to understand until you remember it as an idiom.
A lot more interesting question is:
could getone() have better performance?
I don’t know for sure. Of course we can immediately return x[0] for all lists, tuples and strings, but calls of the type next(iter(plain_sequence)) must be quite rare. But I suspect that with a little bit of C code we can avoid the creation of a temporary iterator for dicts and sets. If dicts and sets are used in the majority of use cases (again, I don’t have data), not having to create a temporary iterator object could be an improvement worth further discussion.
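To make the idea concrete, here is a rough pure-Python sketch of what such a function might do. The name getone, the fast path and the choice of ValueError for empty input are all assumptions for illustration; nothing like this exists in Python today, and the speedup for dicts and sets would need C code that this sketch cannot show.

```python
def getone(x):
    """Return the first item of x, or an arbitrary item if x is unordered.

    Hypothetical helper, not part of Python or any library.
    """
    # Fast path: plain sequences can be indexed directly.
    if isinstance(x, (list, tuple, str)):
        if not x:
            raise ValueError("getone() arg is an empty sequence")
        return x[0]
    # General path: this still builds a temporary iterator, which is
    # exactly the cost a C implementation might avoid for dicts and sets.
    for item in x:
        return item
    raise ValueError("getone() arg is an empty iterable")


print(getone([10, 20, 30]))      # 10
print(getone({"a": 1, "b": 2}))  # a
print(getone({42}))              # 42
```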
I don’t see how next(iter(x)) is “difficult to understand”. I find it completely obvious. Straightforward combination of two of the most basic built-ins.
This is one of those ideas that pops up every year. You can find past discussions here and on the old mailing lists. The main problems:
It is too niche for builtins. So it can only be added to itertools, and still, all other itertools generators are either much more used or much more complex, so it may not reach the bar. Many users would rather continue to use the obvious next(iter(x)) idiom (which works in all versions) than add an import.
It cannot achieve one of your goals – getting the first item without modifying the input. If x is a file, a generator, etc., it will be modified. And this makes the code more error-prone. With next(iter(x)) this is at least more explicit.
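A small illustration of that point, using a made-up generator named squares: a list survives next(iter(...)) untouched, while an iterator loses the item that was taken.

```python
def squares():
    # Illustrative generator; yields 1, 4, 9.
    for n in range(1, 4):
        yield n * n


data = [1, 4, 9]
gen = squares()

first_of_list = next(iter(data))  # iter() makes a fresh iterator; data is untouched
first_of_gen = next(iter(gen))    # iter(gen) is gen itself, so gen is advanced

remaining = list(gen)

print(first_of_list, data)       # 1 [1, 4, 9]
print(first_of_gen, remaining)   # 1 [4, 9]
```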
While it is more or less obvious what next(iter(x)) does, the “other way around” is not. Imagine a beginner wanting to get one element from a set. I find it more probable that they will end up with pop+add than with next(iter(...)).
The most important difference between first and next(iter) is that when the iterable is empty first raises ValueError rather than StopIteration. Leaking StopIteration is bad because of the way that it interacts with other loop constructs. There was a PEP precisely to prevent this problem with generators but it is still unsafe to leak StopIteration when using other things like map.
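A minimal demonstration of the leak, assuming a helper named head that is just next(iter(...)) wrapped in a function. With map the result is silently truncated; inside a generator the same leak becomes a RuntimeError, which is the behaviour introduced by PEP 479.

```python
def head(xs):
    # Raises StopIteration if xs is empty.
    return next(iter(xs))


rows = [[1, 2], [], [3, 4]]

# map() treats the StopIteration leaking out of head([]) as "no more items",
# so the result is cut short instead of an error being reported.
truncated = list(map(head, rows))
print(truncated)  # [1]


def heads(rows):
    for row in rows:
        yield head(row)


# Inside a generator the leaked StopIteration is converted to RuntimeError.
try:
    list(heads(rows))
except RuntimeError as exc:
    message = str(exc)
    print("RuntimeError:", message)  # RuntimeError: generator raised StopIteration
```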
Personally I think that there should be a first function in builtins precisely to stop people from being tempted to use next badly.
As evidence of this bad use there is even a ruff rule RUF015 that gives out bad advice for using next and some people try to apply its unsafe fixer to codebases e.g. this SymPy PR.
I’m not too sure about builtins but I’d definitely love to see it in itertools. I think it’s definitely one of those things that’s not worth adding a dependency (more-itertools) for, but the naive approach of next(iter(x)) can be a footgun if you aren’t aware of StopIteration’s semantics.
Do the existing set.pop and dict.popitem methods do what you want here?
>>> {1,2}.pop()
1
>>> {1:2}.popitem()
(1, 2)
Those aren’t equivalent because they mutate the set/dict but looking through the examples in the PR that I linked in most cases the intention is just to get the single element from a set of size 1 and then discard the set. We don’t want to mutate the set but we also don’t care about mutating it.
In fact most of those cases could just be written as
[a] = b
but there is no equivalent of this unpacking that can be used inline in an expression. This is actually more_itertools’ one function rather than first with the difference being that it also raises an exception if the iterable is not of length 1.
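For illustration, the unpacking can be wrapped in a function to get something usable inside an expression. This is only a sketch in the spirit of more_itertools’ one, not its actual implementation:

```python
def one(iterable):
    """Return the only item of iterable; raise ValueError otherwise."""
    [item] = iterable  # unpacking enforces "exactly one element"
    return item


print(one({7}))  # 7

# Both too few and too many items are rejected with ValueError.
for bad in ([], [1, 2]):
    try:
        one(bad)
    except ValueError as exc:
        print("ValueError:", exc)
```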
More generally if you want to get the first item from an iterable then it makes sense to have a function like first that works for any iterable and uses the general iterator protocol. This is what people want when they use next(iter(obj)). In my experience 99% of the time if someone suggests using next then it would be strictly better to use first with the important difference being just which exception is raised on an empty iterable.
The only situation in which it is potentially correct to raise StopIteration is in the __next__ method of an iterator:
class map:
    def __init__(self, func, iterable):
        self.func = func
        self.iterator = iter(iterable)

    def __iter__(self):
        # An iterator returns itself from __iter__
        return self

    def __next__(self):
        # If the underlying iterator is exhausted
        # then we want to propagate StopIteration
        return self.func(next(self.iterator))
If you find yourself using next in anything that is not the body of a __next__ method then it would be better to use first instead because raising StopIteration is always wrong. Correct use of next in these other contexts needs to either catch the exception or pass a default value so that next will catch the exception.
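Both options can be sketched in a few lines. The names first and first_or_none are hypothetical here, as is the exact error message; the point is only that StopIteration never escapes either function.

```python
_MISSING = object()  # sentinel, so that None remains a valid first item


def first(iterable):
    # Pass a default so that next() swallows StopIteration itself.
    item = next(iter(iterable), _MISSING)
    if item is _MISSING:
        raise ValueError("first() called on an empty iterable")
    return item


def first_or_none(iterable):
    # Alternative: catch the exception explicitly.
    try:
        return next(iter(iterable))
    except StopIteration:
        return None


print(first("abc"))       # a
print(first_or_none([]))  # None

# Unlike next(iter(...)), an empty input inside map() now fails loudly.
try:
    list(map(first, [[1, 2], [], [3]]))
except ValueError as exc:
    print("ValueError:", exc)  # ValueError: first() called on an empty iterable
```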
Sometimes I don’t want to modify existing data. And if I don’t care about the rest of the set/dict, modifying it is useless work. I had the impression that Python developers have been trying really hard to improve execution speed in recent years.
By far most of the use cases I saw involve dicts. They are ordered, and if this feature is actively used in the code, chances are the very first item has an important meaning. Some programs need the first key, some the first value. Maybe dict.getfirstitem() would be a more descriptive name for a method doing what they need.
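For reference, this is how the first key, value and item of a dict are spelled today. The config dict is made up for the example, and dict.getfirstitem() is only a suggested name, not an existing method.

```python
config = {"primary": "db1", "replica": "db2"}

first_key = next(iter(config))             # iterating a dict yields its keys
first_value = next(iter(config.values()))
first_item = next(iter(config.items()))

print(first_key)    # primary
print(first_value)  # db1
print(first_item)   # ('primary', 'db1')
```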
Other use cases are infrequent:
I think I saw next(iter(xset)) in some graph-related algorithms, can’t find it now.
logging a set of error messages as “some_error_message and N others”
code that I don’t understand. For example asyncio/locks.py contains:
# note: all comments added
# self._waiters is either None or a deque
if not self._waiters:
    return
# after the test above it cannot be None, it cannot be empty,
# so why not just "fut = self._waiters[0]" ?
try:
    fut = next(iter(self._waiters))
except StopIteration:
    return
For this asyncio/locks.py example: isn’t it there for thread safety? Couldn’t the _waiters attribute be modified (emptied or flushed) by some other thread after the if check on its truthiness but before the next(iter()) call?
There is also that issue with StopIteration you wrote about yesterday. It got many likes, more than my posts. Are those two points together still weak?
For completeness, a benchmark of something comparable:
Not actually very comparable. On a microbenchmark like this, the fact that one of your examples is looking up three globals and the other is looking up one global is highly relevant.
Might be. I write applications. I cannot read bytecode and I did not study Python internals. Could somebody more experienced please be so kind and post a corrected benchmark?
I was comparing a direct access to the first element vs. next(iter()). I cannot do that for sets and dicts, so I decided to use a sequence as a closest match.
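One way such a comparison could be set up with timeit so that name lookups are on an equal footing. This is only a sketch, not the benchmark posted earlier; the names SETUP, best_of, nxt and itr are made up for the example, and the numbers will vary by machine and Python version.

```python
import timeit

# timeit runs the setup inside the timed function, so x, nxt and itr are
# all local variables there. Neither statement pays for global or builtin
# lookups that the other one avoids.
SETUP = "x = list(range(10)); nxt = next; itr = iter"


def best_of(stmt, number=100_000, repeat=5):
    """Return the best per-call time in nanoseconds."""
    times = timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat)
    return min(times) / number * 1e9


index_ns = best_of("x[0]")
next_iter_ns = best_of("nxt(itr(x))")

print(f"x[0]:          {index_ns:.1f} ns per call")
print(f"next(iter(x)): {next_iter_ns:.1f} ns per call")
```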
If looking up the globals is the main cost of the operation then it is reasonable to measure that cost in the microbenchmark. The important performance question is: when would any real task be dominated by the cost of getting the first item from an iterable?
Almost by definition you won’t have a tight loop in which you need to get the first item from an iterable many times because an iterable only has one first item. I can come up with contrived situations where you have many sets and need to get the “first” item of each in a tight loop but I find it hard to imagine a real situation where the cost of this operation is a bottleneck. In a real situation you would at minimum also have the cost of creating the many iterables or of doing something with the many first items from the iterables.
There are much better things to focus on if you want to make real Python programs faster.
Personally I am convinced by the StopIteration issue alone. Above it is suggested that wanting an alternative to next(iter(...)) is “too niche for builtins”. I would rather say that the situations where it is reasonable to use next rather than first are niche and the vast majority of uses of next in the wild (grep.app) would be better served by something like first. The situation is that we already have the wrong function as a builtin and people use it even though it’s wrong because it is a builtin.