What is the correct collections.abc type to check isinstance of for a container that can be iterated over multiple times?
I need to iterate over a container multiple times and would like to ensure that’s possible, or accumulate it into a list if it isn’t. Would it be more reliable to check isinstance(container, abc.Iterator) instead?
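To illustrate the behaviour I want to detect, here is a minimal sketch of the one-shot problem:

```python
from collections.abc import Iterator

def gen():
    yield from (1, 2, 3)

g = gen()
print(isinstance(g, Iterator))  # True: generators are one-shot iterators
print(list(g))                  # [1, 2, 3]
print(list(g))                  # []: already exhausted, second pass sees nothing

xs = [1, 2, 3]
print(isinstance(xs, Iterator))  # False: a list is a re-iterable container
print(list(xs), list(xs))        # both passes see all the items
```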
You can’t truly guarantee this, since Python is a dynamic language and any implementation is free to do whatever it likes.
I usually use Collection for this if the order doesn’t matter, and Sequence otherwise. This signals the intent well enough while keeping Generator/Iterator from being passed in.
I settled on accumulating the elements of the container in an auxiliary list as the most reliable option.
There is a difference between Iterator and Container. You probably want collections.abc.Collection; I would say any implementation of it that can’t be iterated more than once is misbehaving. (But this isn’t something an isinstance check can truly guarantee.)
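For what it’s worth, Collection does separate the usual containers from one-shot iterators at runtime (a quick sketch; no hard guarantee, as noted above):

```python
from collections.abc import Collection

# Built-in containers are Collections (they have __len__, __iter__, __contains__)...
print(isinstance([1, 2], Collection))    # True
print(isinstance((1, 2), Collection))    # True
print(isinstance({1, 2}, Collection))    # True
print(isinstance({'a': 1}, Collection))  # True

# ...while one-shot iterators are not.
print(isinstance((n for n in range(3)), Collection))  # False: generator
print(isinstance(iter([1, 2]), Collection))           # False: list iterator
```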
In my case I may receive a generator and wanted to reliably detect such case. So I suppose calling it a container is not exactly right.
You could check whether the object has a __next__ method, since only an Iterator[1] should have one, although even that would not be 100% reliable. It seems safer to let a small set of commonly used containers, e.g. list/tuple/set, pass through unchanged, and copy everything else into an auxiliary list.
this includes generators ↩︎
That forces the caller to call iter on their input args; for loops create iterators from non-iterator iterables automagically.
I don’t think there’s a single answer that can be described as “correct”. Most of the time, if I’ve written a function that will iterate over an arg twice, I just want to accept sets as well as lists and tuples (and why not dicts too?). So I simply note that Set inherits from Collection.
I wasn’t suggesting changing the parameter annotation to Iterator; it was intended as a purely runtime check.
From how I understood OP, they want a runtime check to detect potential iterators amongst the passed-in values, since they accept arbitrary iterables.
So my suggestion was along the lines of:

```python
from collections.abc import Iterable
from typing import TypeVar

T = TypeVar('T')

def foo(x: Iterable[T]):
    if hasattr(x, '__next__'):
        # create a copy of the values so x can be iterated
        # over multiple times
        x = list(x)
    ...
```
But as I mentioned that’s not as robust or safe as something like:
```python
def foo(x: Iterable[T]):
    if not isinstance(x, (list, tuple, set, dict)):
        # create a copy of the values so x can be iterated
        # over multiple times
        x = list(x)
    ...
```
I understand now - thanks for the clarification.
Both examples are nice as long as x is small enough that a list of it fits in the available memory, but that isn’t great for huge files.
I think the main concern is someone passing in an Iterator. I’d use the first one, but raise an exception that tells the caller Iterators are not supported, or not to call iter on their repeatable Iterable.
After that rely on EAFP. If the caller’s managed to construct a one-shot Iterator without a __next__, personally I think their code has created the problem, and they’re best placed to fix it.
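Roughly this shape (foo is a hypothetical stand-in for any two-pass function):

```python
from collections.abc import Iterable, Iterator

def foo(x: Iterable) -> int:
    # Reject one-shot inputs up front with a message telling the caller
    # what to do, instead of silently buffering them into a list.
    if isinstance(x, Iterator):
        raise TypeError(
            "foo() iterates over its argument more than once; "
            "pass a repeatable iterable (e.g. a list), not an iterator"
        )
    # After that, EAFP: just iterate twice and trust the input.
    return sum(1 for _ in x) + sum(1 for _ in x)

print(foo([1, 2, 3]))  # 6: two full passes over the list
```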
Yeah, personally I wouldn’t do either check; I’d just change the annotation from Iterable to Collection or, if possible, change the algorithm so Iterable always works. But that wasn’t really the question.
Just a few suggestions. Most of the time in this situation, the repeatable Iterable should yield the same items, right (possibly reordered)? In that case, it’s still natural to define a length or size of the container or file, so Sized might be useful. I once used a SizedIterator wrapper for Shapefiles: even when it is not desirable to hold an entire file in memory as a Container, the file still has a natural length, recorded in the metadata in the file header.
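Roughly like this (a from-memory sketch, not the actual shapefile code; in practice the length would come from the file header):

```python
from collections.abc import Iterator, Sized

class SizedIterator:
    """A one-shot iterator that also knows its total length up front
    (e.g. from a file header), without loading everything into memory."""

    def __init__(self, iterator, length):
        self._it = iter(iterator)
        self._length = length

    def __len__(self):
        return self._length

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

records = SizedIterator((n * n for n in range(5)), length=5)
print(len(records))                   # 5, known before consuming anything
print(isinstance(records, Sized))     # True
print(isinstance(records, Iterator))  # True: still one-shot
```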
Otherwise, it’s a bit JavaScripty, but perhaps the most versatile arg for the function is a callback, Callable[[], Iterator], that returns a new Iterator whenever the function requires one. This would also support multiple iterations yielding different numbers of items.
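For instance (count_twice is a made-up example of a function that needs two passes):

```python
from collections.abc import Callable, Iterator

def count_twice(make_it: Callable[[], Iterator[int]]) -> tuple[int, int]:
    # Ask the factory for a fresh iterator for each pass over the data.
    first = sum(1 for _ in make_it())
    second = sum(1 for _ in make_it())
    return first, second

data = [1, 2, 3]
print(count_twice(lambda: iter(data)))         # (3, 3)
print(count_twice(lambda: (n for n in data)))  # (3, 3)
```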
In my case it could be a generator, and quite an expensive one to rewind, and it’s not guaranteed to produce the same result if restarted, so Sized wouldn’t be appropriate.
I think just accumulating the elements is the best option here.
Absolutely. If you can cache it, then just cache it. Simples.
An iterator without a __next__ … is not an iterator
Oops. Well spotted. That should’ve been a “one-shot Iterable without a __next__”.
I spotted this in the docs this morning: “Once an iterator’s __next__() method raises StopIteration, it must continue to do so on subsequent calls. Implementations that do not obey this property are deemed broken.”
It’s not difficult to write an incrementer whose __next__ resets and raises StopIteration after some condition is met. But an argument can be made that no one is obliged to support repeatable Iterators. That reduces the scope to Containers.
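For illustration, here is such an incrementer (ResettingCounter is made up for this example); it is repeatable in practice, yet exactly what the docs deem broken:

```python
class ResettingCounter:
    """Counts up to `limit`, raises StopIteration once, then resets.
    Resetting after StopIteration violates the iterator protocol,
    so per the docs this iterator is 'deemed broken'."""

    def __init__(self, limit):
        self.limit = limit
        self.n = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.n >= self.limit:
            self.n = 0  # the reset that makes this formally broken
            raise StopIteration
        self.n += 1
        return self.n

c = ResettingCounter(3)
print(list(c))  # [1, 2, 3]
print(list(c))  # [1, 2, 3] again: repeatable, but formally broken
```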