Checking the length and then accessing items is a fairly common pattern for collections in Python. Currently, typeshed’s custom SupportsLenAndGetItem protocol is used around 30 times there, although more potential uses are likely hidden behind other type annotations, especially Sequence.
Currently, the best way to annotate such objects is to use either _typeshed.SupportsLenAndGetItem – which doesn’t exist at runtime, with all the problems that brings – or collections.abc.Sequence. The latter has a much broader interface than SupportsLenAndGetItem and is not considered a protocol for typing purposes (unlike most of the other ABCs in collections.abc).
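For reference, the typeshed protocol looks roughly like this (paraphrased from memory; the exact definition in _typeshed may differ in details):

from typing import Protocol, TypeVar

_T_co = TypeVar("_T_co", covariant=True)

class SupportsLenAndGetItem(Protocol[_T_co]):
    # Structurally, only these two methods are required; nothing is
    # promised about what __getitem__ returns for a given index.
    def __len__(self) -> int: ...
    def __getitem__(self, k: int, /) -> _T_co: ...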
I suggest we add another ABC to collections.abc, which would get treated as a protocol in typeshed.
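Something roughly along these lines, where the name and exact spelling are only my sketch of the idea, not a finished proposal:

from abc import ABCMeta, abstractmethod

class SupportsLenAndGetItem(metaclass=ABCMeta):
    __slots__ = ()

    @abstractmethod
    def __len__(self):
        return 0

    @abstractmethod
    def __getitem__(self, index):
        raise IndexError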
One of the key differences between ABCs and protocols is that, while both make promises about method signatures, ABCs also make promises about the meanings of those methods.
For example, I noticed that bisect_left uses SupportsLenAndGetItem. But it’s not just counting on having access to the methods __len__ and __getitem__. It’s also counting on the interface invariant whereby __getitem__ can be called with an integer (or integer-like) index and returns the item at that index.
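As a contrived illustration of my own (not something taken from bisect itself), a dict with integer keys has both methods, so it satisfies SupportsLenAndGetItem structurally, yet it breaks that invariant:

import bisect

d = {0: "a", 5: "b", 6: "c"}  # has both __len__ and __getitem__

# This type-checks against SupportsLenAndGetItem, but dict.__getitem__
# looks up keys rather than positions, so it raises KeyError (no key 1).
bisect.bisect_left(d, "b")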
Typically, you would annotate something like that as Sequence. Is there a reason why Sequence is not acceptable here?
Similarly, numpy.concatenate is annotated to accept SupportsLenAndGetItem despite having the comment: # NOTE: Allow any sequence of array-like objects.
I don’t think that’s true. On the type-checking side there is no difference between the two, except that ABCs are less flexible because they require a subclass relationship. On the implementation side, an ABC could enforce some invariants, but that’s not guaranteed, and I don’t think the collections.abc ABCs enforce any. On the documentation side, both protocols and ABCs can be documented with invariants.
You’re absolutely right that interface behavior promises are not enforced (they can’t be).
There are, however, implicit promises made by ABCs. In my opinion, collections.abc.Sequence is implicitly promising that __getitem__(self, i) returns the i-th element. This is exactly what it provides when you use its generated mixin method __iter__: the elements, in order.
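Concretely, the mixin behaves roughly like this (a simplified sketch of what Sequence.__iter__ does, not the exact source):

# Simplified sketch of the Sequence mixin's __iter__: it calls
# __getitem__ with 0, 1, 2, ... until IndexError, which only makes
# sense if __getitem__(i) means "the i-th element".
def __iter__(self):
    i = 0
    try:
        while True:
            yield self[i]
            i += 1
    except IndexError:
        return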
It seems that the two users of SupportsLenAndGetItem are in fact depending on the promise that __getitem__(self, i) returns the i-th element. Therefore, why not use Sequence?
Sequence requires a bunch of other methods besides __len__ and __getitem__.
What if someone wants to pass something to numpy.concatenate that doesn’t implement index, count, and __reversed__?
Inheriting from Sequence only requires __len__ and __getitem__. The Sequence mixins provide the other methods that you mention.
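For example (a minimal illustration of my own):

from collections.abc import Sequence

class Letters(Sequence):
    # Only the two abstract methods are written out...
    def __len__(self) -> int:
        return 3

    def __getitem__(self, i: int) -> str:
        return "abc"[i]

s = Letters()
# ...and the mixins supply the rest:
print(list(s))            # ['a', 'b', 'c']
print(s.index("b"))       # 1
print(s.count("c"))       # 1
print(list(reversed(s)))  # ['c', 'b', 'a']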
If they inherit from Sequence, they get all of that stuff for free. Therefore, you are hypothesizing an object that doesn’t want Sequence to mix in the additional methods (methods built on its knowledge of how “sequences” work), right? So the object is not a sequence.
Now, consider some function accepting SupportsLenAndGetItem. Presumably it is going to use the object without assuming that it is a sequence. Are there any examples of such functions?
numpy.concatenate is not a good example because it assumes it is receiving an actual sequence (in the conceptual sense: successive indexes in the range [0, n-1] give the entire sequence).
I don’t think adding more ABCs or runtime-checkable protocols is a good idea for this. I’d be fine with a public, non-runtime-checkable protocol supported in typing, or with waiting for intersections so people can just compose something like Indexable[int, slice, int | slice] & Sized.
The fact that someone doesn’t want to inherit from Sequence doesn’t mean that the object is not a sequence.
The reason for not wanting to inherit it could be concerns about import timing or something like that.
What do you mean by import timing? Do you mean the time it takes to import collections?
Also, I want to raise another issue with using SupportsLenAndGetItem for numpy.concatenate. It means that something like this passes type checking:
import numpy as np

x = {3: np.ones(10), 5: np.zeros(10)}
np.concatenate(x)  # Passes!!
This reduces some of the benefits of type checking.
I would guess that 99% of uses of np.concatenate already pass a list or tuple. And the future of numpy (the Array API) only accepts list and tuple (and nothing else). The benefit of tighter annotations is that more errors get caught.