Alternative to SequenceNotStr: Char Special Form

I just had this idea and wanted to hear your feedback.

The three current options (that I know of) for “solving” the problem of not being able to express “a sequence of strings that is not itself just a string” are the following:

  1. Do nothing.
  2. Add SequenceNotStr, which is a Special Form that specifically does not count str as a valid sequence in its place.
  3. Wait for negated types.

My opinion on the 3 solutions (skip this if you want)

In short:

  1. I and many others have run into this issue multiple times and I think it needs a solution, so doing nothing is not a valid option in my opinion.
  2. In my opinion this is the best solution. I think this case is common enough to be worth adding a Special Form, and if another solution is ever introduced, it could still be (soft) deprecated in the future.
  3. Negated types seem like they will either never come or come in the far future. Waiting for those to fix a current issue is a bad decision and harmful in my opinion.

A workaround of sorts is SequenceNotStr from the third-party useful_types library, but this also has some problems.
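
For reference, the idea behind that workaround can be sketched as a Protocol. This is a simplified sketch, not the library's actual definition, and the names SequenceNotStrSketch and join_names are made up for illustration. As far as I know, it relies on str.__contains__ being annotated to accept only str, so str does not structurally match a protocol whose __contains__ accepts any object:

```python
from collections.abc import Iterator
from typing import Protocol, TypeVar

T_co = TypeVar("T_co", covariant=True)


class SequenceNotStrSketch(Protocol[T_co]):
    """Simplified, hypothetical protocol: sequence-like, but not matched by str."""

    def __getitem__(self, index: int, /) -> T_co: ...

    # list/tuple accept any object here; str only accepts str, so a type
    # checker should not consider str compatible with this protocol.
    def __contains__(self, value: object, /) -> bool: ...

    def __len__(self) -> int: ...

    def __iter__(self) -> Iterator[T_co]: ...


def join_names(names: SequenceNotStrSketch[str]) -> str:
    return ", ".join(names)


print(join_names(["Ada", "Grace"]))  # accepted: list[str] matches the protocol
# join_names("Ada")  # runs fine at runtime, but a type checker should flag it
```

Note that this is purely a static check; nothing stops a plain str from being passed at runtime.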

My new idea

So I have another idea to propose. This is just an idea I had and I want to hear your feedback!
We could introduce a new Special Form: Char

Char would simply represent a single character and just exists for type hinting, not as an actual class that replaces any functionality of strings.

Benefits

  1. It would solve the aforementioned problems with SequenceNotStr:
    str would be valid for the expected type Sequence[Char], but for Sequence[Sequence[Char]] str would not be valid. Obviously for places where just a string is expected str would be simpler and preferred over using Sequence[Char].
  2. It would bring type safety in other places:
    Functions that take or return a single character could be type hinted more specifically. For example the builtin function chr would have the return type Char and the builtin function ord would take in a Char | bytes | bytearray codepoint (instead of str | bytes | bytearray).
  3. Bonus: Combined with tuples, specific length strings could be verified:
    Technically a function could take any specifically wanted string length if converted to tuples like this:
    Length 2: tuple[Char, Char]
    Length 1 or more: tuple[Char, *tuple[Char, ...]]
    Length 2 or 6: tuple[Char, Char] | tuple[Char, Char, Char, Char, Char, Char]
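
To make the benefits above a bit more concrete, here is a rough sketch of how annotations could read. Char does not exist, so a NewType stands in for it here just to have a name to write; the function names are made up. Today's type checkers will not treat str as Sequence[Char] with this stand-in, so this only shows the intended spelling, not the proposed checking behaviour:

```python
from collections.abc import Sequence
from typing import NewType

# Hypothetical stand-in: there is no Char special form today.
Char = NewType("Char", str)


def first_letters(words: Sequence[Sequence[Char]]) -> list[Char]:
    """Under the proposal, a bare str would be rejected for `words`."""
    return [word[0] for word in words if word]


def country_code(code: tuple[Char, Char]) -> str:
    """Under the proposal, only exactly two characters would be accepted."""
    return "".join(code).upper()


# At runtime nothing is enforced; these calls only show the intended usage.
print(first_letters(["ab", "cd"]))  # ['a', 'c']
print(country_code(("d", "e")))     # DE
```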

(Honestly I hope in the future not just tuples but Sequences can be typed heterogeneously, then the 3rd point would be possible more natively with Sequence, but that’s another topic)

I also think there are more places this could be used for, but these are 3 things I could come up with. Let me know if you find some other places! :smiley:

So what do you think? I’m curious for your feedback!

Edit: Fix phrasing and remove emoji.

I forgot to mention that a simple alias could then be made:

type StrSequence = Sequence[Sequence[Char]]

As mentioned above, a str would not be valid here, because Sequence[Char] would behave similarly to C#, where a string is not defined recursively but as a sequence of chars.

Do I understand it correctly?

assert not isinstance("ABC", Char)
assert isinstance("ABC", Sequence[Char])
assert isinstance("A", Char)

If yes, there is a problem that was shown in previous discussions. A single-character type fails to distinguish these two cases:

assert isinstance("ABC", Sequence[Char])
assert isinstance(["A", "B", "C"], Sequence[Char])

IMO SequenceNotStr is the best option of those three mentioned.

Yes, the way you describe it is correct, if we use isinstance here to stand in for what the type checker does.

Why does it fail to distinguish those?

I also like and would be happy with SequenceNotStr, this is just another possible solution I proposed with more additional benefits (assuming it works).

We want to treat "ABC" as one element (not three) and ["A", "B", "C"] as three elements.

So we have <string of three chars> vs <sequence of three strings of length 1>

But in Python:

  • string == sequence of chars
  • char == string of length 1

If you substitute that, you’ll get two equal expressions.
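
The substitution can be checked at runtime with a small sketch. The helper name looks_like_chars is made up for illustration; it simply tests whether every element is a str of length 1:

```python
from collections.abc import Iterable


def looks_like_chars(value: Iterable[object]) -> bool:
    """True if every element of value is a str of length 1."""
    return all(isinstance(item, str) and len(item) == 1 for item in value)


print(looks_like_chars("ABC"))            # True
print(looks_like_chars(["A", "B", "C"]))  # True

# Indexing a str yields another str, which is where the recursion comes from.
print(type("ABC"[0]) is str)              # True
print("ABC"[0] == ["A", "B", "C"][0])     # True
```

Both values pass the same element-wise check, so a runtime notion of “sequence of chars” cannot tell them apart on its own.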

I feel like any runtime usage of Char should be a runtime error. It should remain a typing-only construct, given that it’s not a real runtime type.
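
One way such a runtime error could look, purely as a sketch: the metaclass and the Char placeholder below are hypothetical and are not how typing special forms are actually implemented. The name stays usable in annotations, while isinstance checks and instantiation raise TypeError:

```python
class _TypingOnlyMeta(type):
    """Hypothetical metaclass that rejects runtime use of the class."""

    def __instancecheck__(cls, instance: object) -> bool:
        raise TypeError(f"{cls.__name__} cannot be used with isinstance()")

    def __call__(cls, *args: object, **kwargs: object):
        raise TypeError(f"{cls.__name__} cannot be instantiated")


class Char(metaclass=_TypingOnlyMeta):
    """Placeholder for the proposed special form (assumption, not a real API)."""


def shout(letter: Char) -> str:  # using it as an annotation is still fine
    return str(letter).upper()


try:
    isinstance("A", Char)
except TypeError as exc:
    print(exc)  # Char cannot be used with isinstance()
```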

That’s quite an important difference, of course. But then, what problem would it solve?

In my original post I quite literally said that it is typing only and I also explained 3 possible use cases.

I just wanted to point out the runtime aspect but overall, I’m not sold on the idea of Char even as a typing only construct.

I’m sorry if there was any misunderstanding on my side. I remember how SequenceNotStr - which you mentioned in the title - was discussed here, and that one was usable at runtime, which is a huge advantage IMO. Anyway, I will not take part in a typing-only discussion.