I think it’s well established, at least for type intersections. Aside from what is discussed here, a common use case is intersecting Protocols. Many languages support intersection types (see the Wikipedia article “Intersection type”).
Totally agree. There are many example problems in the type intersection issue on MyPy.
I don’t think this is true. Have you looked through code to estimate how “likely” this is?
This would require every sequence consumer to rewrite their code to this. And why do they want to block all objects that aren’t also sequences of themselves? What about a tree node type, which exposes a sequence of its children?
I haven’t, but I presume many of the other string functions, such as str.startswith, make use of str.__iter__ internally. I guess you could treat that as an implementation detail you wouldn’t bother with when using AtomicString, but I still really don’t see why you’d want this class as opposed to simply sticking to str. Or are you intending to implement custom string classes?
I really don’t get what you are trying to say here. If you want to type hint a node type you should use a node type. I am not advocating for replacing Sequence[T] with Sequence[T] & ~T, if that’s what you thought I said. I am saying that if we had intersection and negation (“not”) types, and you wanted to type hint a function that consumes a collection of strings, but not a string itself, you could concisely type hint it with Iterable[str] & ~str / Collection[str] & ~str / Sequence[str] & ~str.
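Since & and ~ on types are not something Python's typing accepts today, the closest thing available right now is a runtime check. Here is a minimal sketch of what "a sequence of strings, but not a string itself" looks like when enforced by hand; join_names is a hypothetical helper made up for illustration, not something from the thread:

```python
from collections.abc import Sequence


def join_names(names: Sequence[str]) -> str:
    """Join a sequence of names (hypothetical helper for illustration)."""
    # A str is itself a Sequence[str], so the annotation alone cannot
    # exclude it; the exclusion has to be an explicit runtime check.
    if isinstance(names, str):
        raise TypeError("expected a sequence of strings, not a single str")
    return ", ".join(names)


print(join_names(["alice", "bob"]))
```

A type checker sees nothing wrong with join_names("alice") here; only the isinstance check rejects it, and only at runtime.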
But how would someone write their own implementation of startswith? Your argument seems to be based on the idea that typeshed can supply type annotations that don’t match the implementation. I’d be more OK with that idea if developers could do that as well. But they can’t (short of writing code in a very unnatural manner, by having a typed wrapper round an untyped implementation).
“Special cases aren’t special enough to break the rules” seems relevant to me here…
By the way - why not just propose this as a linter check? I’m sure linters could check for names declared as str and warn that they are being used as iterables. Such a check doesn’t need the full type inference of something like mypy to be useful, and people are used to choosing which linter warnings to enable, so it feels far more “opt in” than the current proposal.
Linters lack the inference/knowledge to implement a check like this well. Pylint has some type-related logic, but something that’s complex to describe for a type checker is beyond it. Many other linters lack any type knowledge, and it would be a large amount of work to add enough to handle this.
I do like the idea of a separate error code. Mypy could report this new issue with its own error code and initially have it off by default (users can opt in), and based on experience/mypy primer explore the impact of making it on by default.
My general feeling is that this rule is very complex to describe and to opt out of, so I’m -0 on it. I do think it would catch some bugs; it just feels too confusing, especially when we can have many paragraphs debating what the rule even means.
You would have to add a single cast to re-expose it.
There’s no way to simultaneously block dangerous use of str.__iter__ without also blocking reasonable use. That’s the central question: do the bugs it catches outweigh the casting pain it inflicts? And according to the experiments done by the Pytype people, they do.
I’ve considered that, but it has issues:
You may want to change overload resolution behaviour, and
Linters like Pylint frequently don’t implement full type-checking.
Well, I proposed this as an option initially and people were really opposed to it.
Given that Pytype has already tried this, I think that the concerns about this change being problematic are overblown. We should actually be looking at the MyPy-Primer to see just how bad this would be. Is there any reason to think that such an experiment will differ from Pytype’s results?
If I needed to make a function type-check after str is no longer a Sequence[str], I suppose the natural “explicit” thing would be to do the following:
def f(x: str | Sequence[str]):
    if isinstance(x, str):
        x = tuple(x)
    ...
I assume tuple() is using str.__iter__ internally, so would that be off the table, and I would need to cast?
def f(x: str | Sequence[str]):
    x = cast(Sequence[str], x)
    ...
Possibly I’m misunderstanding this discussion – I’m pretty new to types in Python – but I would hope that the cast would not be the recommendation. I consider cast a code smell in Python, and usually end up writing a comment explaining it so that hopefully it can be removed as typing matures.
If you're looking for a use case where it's common to pass strings as iterables, the one that I use almost daily is in medical imaging: ...
Here we describe image orientations with axis codes. For example, “RAS” means “axis 0 proceeds left-to-right, axis 1 proceeds posterior-to-anterior, axis 2 proceeds inferior-to-superior”. I have functions like the following:
For higher level functions, I will generally be passing around tuples between calls to functions like these, but if I’m ever calling one directly I am absolutely using 'RAS' over ('R', 'A', 'S') or tuple('RAS').
I’ve refrained from mentioning this before, because I don’t want to get into a big debate over performance, but x = tuple(x) is an unnecessary cost. That’s also why I dislike casts - the cast call is a runtime cost. Yes, it’s only a call to a function that does nothing (the tuple call is worse, it does a bunch of allocation) but it’s still a cost. And a lot of the code I’m currently writing does some tight loops where every performance hit counts.
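To make the difference between the two costs concrete, here is a small sketch. The variable names are made up for illustration; the only library calls are typing.cast and the built-in tuple:

```python
from collections.abc import Sequence
from typing import cast

axes = "RAS"

# cast() only informs the type checker: at runtime it is a plain function
# call that hands back the very same object it was given.
as_cast = cast(Sequence[str], axes)

# tuple() iterates the string and allocates a new container holding
# one single-character str per element.
as_tuple = tuple(axes)

print(as_cast is axes)
print(as_tuple)
```

So the cast costs one function call and no allocation, while the tuple call builds a new object every time it runs.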
So I don’t like adding unnecessary runtime costs just to placate the static type checkers.
Right, and it’s just as easy for someone to pass a single string for such a function expecting it to be treated as an atom, e.g., axcodes2affine('xyz') when they mean axcodes2affine(('xyz',)). That’s a perfect example of something that would be improved by blocking iteration and forcing explicitness.
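A sketch of that failure mode, using a hypothetical stand-in (axis_labels is invented here and is not the real axcodes2affine, whose implementation isn't shown in this thread):

```python
from collections.abc import Iterable


def axis_labels(codes: Iterable[str]) -> list[str]:
    """Hypothetical stand-in: collect one label per element of codes."""
    return [code for code in codes]


# A bare string is silently iterated character by character...
print(axis_labels("xyz"))
# ...while a one-element tuple is treated as a single label.
print(axis_labels(("xyz",)))
```

Both calls satisfy Iterable[str] as far as a type checker is concerned, so nothing flags whichever one the caller didn't mean.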
That’s why I proposed a chars method, but a few casts is probably a smaller impact solution. You need to do something to indicate that you mean to iterate over strings.
I’d prefer to find a way to make type annotations helpful, not make them sufficiently annoying that switching them off seems like the only reasonable solution.
Well, FWIW those aren’t equivalent in general in the sense that you can annotate a variable with a new specific static type, not just turn off checking.
As already mentioned the way to do this is # type: ignore. For runtime purposes it’s just a comment. For type-checking purposes it’s no different from a cast. In Python’s typing a cast is like a pointer cast in C. It’s just a way to tell the typechecker to shut up and mind its own business because you think you know what you are doing. That is also precisely what # type: ignore is for although ignore can also be used in other cases where a cast would not work.
Of course you can express “positive integer” as a type – I think it’s called unsigned int in C, it’s called uint16 (etc) in numpy. And you can (and I have) made unsigned ints in Python [*].
But people don’t want to write a function like:
def fun(x: Uint):
    ...
And then require the caller to wrap a regular int with Uint()
The problem is that if you want to catch negative values of a regular integer, then you are catching a value error, not a type error. Why do we expect the static type system to catch value errors?
Of course, all of this is a consequence of the fundamental impedance mismatch between a dynamic language and static type checking – for my part, I think it’s a bit much to expect the type checker to catch everything it might with a statically typed language.
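For reference, a minimal sketch of what such a Uint might look like. This is an assumed implementation written for illustration, not the actual class referred to above:

```python
class Uint(int):
    """Hypothetical non-negative integer type."""

    # int is immutable, so validation has to happen in __new__, not __init__.
    def __new__(cls, value=0):
        obj = super().__new__(cls, value)
        if obj < 0:
            raise ValueError(f"Uint cannot be negative: {int(obj)}")
        return obj


def fun(x: Uint) -> int:
    # Arithmetic on an int subclass returns a plain int.
    return x + 1


print(fun(Uint(3)))
```

Note that the negative case surfaces as a ValueError when Uint(-1) is constructed at runtime; the static checker only verifies that the caller wrapped the value.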
-CHB
[*] to be fair only as an exercise to demonstrate how to subclass from an immutable …
That’s one of the goals of PEP 647 (type guards). One example: it will be really useful when we can start catching array dimension mismatches statically.
Wait, what? Isn’t this EXACTLY the primary problem that spurred this whole discussion? Sure, there are other issues with str being a “container” of other strings, but this is the biggie.
But in any case, I for one think it could be useful.