An interesting pytype experiment, and a possible extension to strings

ambv · February 14, 2023, 3:35pm

I split off the discussion of type intersection and negation to its own topic.

In terms of special-casing str to not include its sequence nature from the type checker’s perspective, I am very interested in this idea. I found the “string is an iterable of strings” wart when working on PEP 484 and it seemed to me at the time that type intersections and negations might be a solution there. The idea to include those concepts was shot down then as it would broaden the type algebra beyond what the proponents of PEP 484 were ready to implement at that time. I accepted strings being iterable as an inevitable part of Python as a change in the default str behavior would bring “Python 4”-style backward compatibility breakage. Making this a type checker-only feature makes perfect sense to me.

In my time at Facebook, I observed this being one of the cases where Python’s type system as currently defined cannot help catch obvious programming errors. Those errors aren’t as common as missing None checks, and aren’t as tricky to debug as some other classes of bugs. When this happens, a well-placed unit test will discover the problem very quickly. Even just running the code rarely succeeds with this kind of bug, and the data mismatch is curious and unique enough that with some experience it becomes easy to spot what went wrong.

This wart in particular pushed type-annotated code to lean on concrete collection types. You don’t say Iterable[str] even if you only iterate. You say list[str] because it’s simpler to type, doesn’t require an import, and works around the “strings are iterables of strings” issue altogether. This is sometimes wasteful in terms of both efficiency and flexibility, but it turned out to be a good enough workaround for me to stop pursuing this.
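To make the wart and the workaround concrete, here is a minimal sketch. The function names `shout_all` and `shout_list` are made up for illustration. The comments about what a type checker accepts or rejects describe how checkers treat str today, as far as I know, not output I captured from a particular tool.

```python
from collections.abc import Iterable


def shout_all(names: Iterable[str]) -> list[str]:
    # Accepts any iterable of strings. Because str is itself an
    # Iterable[str], a single string is accepted by the type checker too.
    return [name.upper() for name in names]


def shout_list(names: list[str]) -> list[str]:
    # The concrete-type workaround: a bare str is not a list[str],
    # so a type checker rejects shout_list("ann").
    return [name.upper() for name in names]


intended = shout_all(["ann", "bob"])  # what the caller meant
accident = shout_all("ann")           # type checks, iterates characters
```

The price of the workaround is visible in the second signature: `shout_list` no longer accepts a tuple, a set, or a generator, even though it only iterates.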

Now, having a type checker option to exclude the Sequence / Iterable / Collection nature from strings, that sounds like a workable solution! Especially since it’s all static analysis: the code still behaves the same at runtime. Then all it takes to recover the excluded functionality is a cast() to inform the type checker that iteration is explicitly intended.
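As a rough sketch of what that opt-in could look like, assuming a hypothetical checker mode in which str is no longer treated as Iterable[str] (I am not claiming any checker ships this under a particular flag). `count_vowels` is a made-up name. `typing.cast` is real and does nothing at runtime beyond returning its argument.

```python
from collections.abc import Iterable
from typing import cast

# Spelled out as a set literal so that building it does not itself
# rely on iterating over a string.
VOWELS = {"a", "e", "i", "o", "u"}


def count_vowels(text: str) -> int:
    # Under the hypothetical strict mode, `for ch in text` would be
    # flagged. The cast states that per-character iteration is intended.
    chars = cast(Iterable[str], text)
    return sum(1 for ch in chars if ch in VOWELS)


vowel_count = count_vowels("iteration")
```

Since cast() only affects static analysis, the function runs identically with or without such a mode enabled.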

I’d say it’s worth trying it out in mypy too, as passing a single string where a collection of them was expected does occasionally happen and is a time waster for everybody involved. It is disappointing that the type checker is unable to spot the error in this case. I would use this mode of the type checker if it were available, and I would encourage everybody to use it.

I am less excited about str.chars and ideas to transition to str excluding iteration, indexing, etc. Using data from the experiment, 30 bugs caught in 400 cases is barely above noise level so it suggests a change like that would be mostly churn.

Finally, the config switch being global per invocation works in a monorepo environment where all code is fair game for modification if needed. In the open-source world, where a good chunk of your code is third-party libraries, this will be somewhat more tricky because some code will always emit the wrong kind of string or accept the wrong kind of string. Casting every time would certainly be possible, but some casts would be pretty ugly when what you’re passing isn’t a string but (ironically) a collection of them, like a dictionary or list.