The typeshed builtins.pyi stub indicates that the str class derives from Sequence[str]. This is somewhat consistent with runtime behavior in that Sequence is a registered virtual base class for str (and hence isinstance(str, Sequence) evaluates to True), but it’s also somewhat inconsistent because Sequence is not a real base class of str and does not appear in its __mro__.
Many mypy and pyright users have requested that strnot be assignable to Sequence[str] or Iterable[str]. This is a frequent source of bugs, and it’s something a static type checker could catch.
Here are some historical threads that discuss this pain point:
As a contributor to pyright, I’m always reluctant to override the type information provided in typeshed with custom logic in the type checker. Doing so has almost always resulted in unintended side effects, surprising behaviors, and maintenance nightmares. I’m even more reluctant to override the subtyping rules documented in the typing spec. I’d much prefer that we collectively decide on a desired behavior and reflect this in the type stubs.
I just did an experiment where I changed the typeshed definition of str to not derive from Sequence[str]. The fallout of this change (as measured by mypy_primer output) was much less than I anticipated. It affected only six repos in mypy_primer (out of many dozens). Across these six repos, there were only about ten code sites that were affected. Approximately half of them appear to be actual bugs. Most of the remaining cases affect only test code. All of them have a straightforward and obvious fix: specify str | Sequence[str] if the intent is to support both types.
Would it make sense to make this change in typeshed? Has this been suggested and tried in the past?
I admit that such a change will have some backward compatibility impact, but I think it will be relatively minor. If we can agree that this is the right thing to do, it might make sense to “rip the bandaid off”.
I agree that there are good reasons for experimenting with removing Sequence[str] as a base class of str, but I don’t think this is one of them, unless you’re also proposing to remove (Mutable)Sequence as a base class of bytes, list, tuple, memoryview and bytearray. The vast majority of collections in the Python standard library do not inherit from the abstract base classes in collections.abc or typing at runtime.
my 2c on this is that it’s the right thing to do, but that there’s not a good and low-breakage path to doing it; doing it right now without other changes would cause other worse design warts.
str doesn’t conform to Sequence, as has been explored before (see __contains__ contravariance violation), but this will introduce a new wart in its place with isinstance(str, Sequence) I believe this to be a worse wart than the current issues with Sequence[str] if not addressed as part of this.
Generally speaking, I don’t think it’s safe for type checkers to narrow on isinstance involving ABCs or types that have defined __instancecheck__ or __subclasscheck__, but changing that behavior would violate longstanding user expectations. I also don’t think we can remove the registration that happens at runtime, even if with the benefit of hindsight we would say this isn’t something we would register. These are methods of bypassing subclassing which are neither typesafe nor possible to be checked for type-safety and are explicitly escape hatches.
This was on a list of things I thought would likely never be entirely fixed due to the history and many decisions that predated the type system, so I’m pleasantly surprised to see some appetite to address this one, but this was also one of the harder ones from that list to have a suitable long term fix for everyone in my estimation.
After we have difference types, I think the best solution is to have a strict rule in the type checkers (maybe not enabled by default) to disallow Sequence[str] (and other applicable collection types) unless they have a union or a difference with str
Disallowing Sequence[str], Iterable[str], etc on it’s own does not avoid this issue, it would apply to user defined structural types too, and to more than just str
This applies to all negated types, even those not the subject of discussion right now. You can’t use a negated type without proof the negation holds. Special casing Sequence[str] to be the one case you don’t need to check this anymore would require everyone writing more type info that isn’t even accurate, which is harmful to user intuition and ease of use.
The important parallel is below:
class A:
...
class B(A):
...
def function_returning_seqa() -> Sequence[A]:
...
def runtime_penalty_req(s: Sequence[A] & ~Sequence[B]):
...
How do you ever call this function safely? If you know it is a sequence[A], this is a covariant context, and B is always a possibility unless removed. you need positive proof that it isn’t B
def seq_narrower(s: Sequence, inner_type: T) -> TypeIs[Sequence[T]]:
...
def safe():
s = function_returning_seqa()
if not seq_narrower(s, B):
runtime_penalty_req(s)
What you’re proposing here is to special case the exact negation : Sequence[str] - str from this, and require everyone to write Sequence[str] | str to mean Sequence[str] as it exists today and Sequence[str] - str to mean the “correct” way.
The special casing alone should sink this, this won’t compose well if this type arises inside a more complex type
Honest question, do people actually rely on isinstance(x, Sequence) where x is a str? I’ve only seen discussion about how this is always an api design issue. How hard would it actually be to just fix all of this, runtime and typing?
The only way I see to totally fix it would be a fundamental change to the Python language.
The problem is that when you iterate through a str, the type yielded is str
for x in "hello":
print(type(x))
I think this ^ is the api design issue. But we can’t change that without at least a change to Python 4.
If we didn’t have this as a fundamental part of Python, sure, people could make their own data structures with this problem, but at least we could more easily avoid it and recommend against designing data structures this way. But with this being a fundamental part of Python, we don’t really have a way out of it.
That’s not a problem here, you may want to see the prior discussions on the matter. it isn’t an issue that str is Iterable[str], the problem is a combination of Sequence having str be registered as a subclass at runtime, and a lie in the typeshed to match that, allowing str to claim compatibility with Sequence (it isn’t compatible, as mentioned above, it’s currently in the typeshed as being a subclass of Sequence, which required type ignores to make happen due to the incompatibility)
There might be a way to do a slightly more powerful (still ugly) fix here
Adding a rule where for all G[T] where T is a type that would normally be compatible with G[T], T is not assignable to G[T] and requires a specification of G[T] | T, ensuring that the use of G[T] to include T is intentional.
This covers all basic cases with str and does not require modifying Sequence, str, isinstance, or waiting for type negation. I’m unsure of the impact or if this causes collateral damage for recursively defined types.
I won’t speak as to the consequences or history but I will say this is also a common request from Pydantic users (who want Pydantic to essentially prevent the bugs).
I would generally prefer if we’d try to move to stubs that more closely resemble the runtime implementation. In the past we’ve used quite a few hacks to make things work one time or another, but often while these hacks help in one particular case, they make life harder in other cases. As the typing spec and type checkers have matured, often these hacks are more of a liability.
That said, moving off these hacks can be a challenge, but a challenge that will be worth it, in my opinion. One idea for these “virtual abc base classes” could be to have a special decorator like @collections.abc.register that can be used to decorate classes:
@register(Sequence)
class str:
...
This of course would need special casing by type checkers.
Also it might be worthwhile to consider how we could replace the ABCs with protocols or turn them into protocols somehow.
>>> from collections.abc import Sequence
>>> @Sequence.register
... class B: pass
...
>>> B
<class '__main__.B'>
>>> issubclass(B, Sequence)
True
So we could use this in stubs and reflect the runtime accurately. However, we’d need to amend the spec to require type checkers to support this decorator.