collections.abc.NonStringSequence

JamesParrott · May 15, 2024, 11:35am

Slight side point, but the typing aliases were deprecated to avoid duplication.

dg-pb · May 15, 2024, 11:51am

I am wondering if there is some sort of “ultimate vision” for the whole typing thing?

I mean, ideally, given the direction where this is all going, the culmination will theoretically be a flexible infrastructure which can describe any python object by nitpicking features.

Type is only one part of it. Although it is one of the more complex ones (especially logical DNF simplifications of Python class trees), it is roughly only 20% of the complete infrastructure for such a thing.

Eventually, things like this should ideally be possible:

a : ~str & HasMethod[__getitem__] & ValueOfItem[0, OneOf[None, 0]]
b : int & HasMethod[to_bytes] | str & HasMethod[method] | OneOf[None, Ellipsis, True]
hypot = lambda x: (x[0]**2 + x[1]**2)**0.5
c : CustomFuncLess[hypot, 2]

# Simplifications
SerialProtocol & HasMethod[dumps] == HasMethod['dumps'] 
SerialProtocol | HasMethod[dumps] == SerialProtocol
Number | (int & list) == Number

Maybe there is something written on this? It could provide some ground for such discussions as this.

Nineteendo · May 15, 2024, 2:35pm

Without deferred evaluation, you would probably need to write this:

HasMethod['__getitem__']
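To illustrate the quoting issue: with eager evaluation, a bare `__getitem__` would be looked up as a variable name, so the method name has to be passed as a string. The sketch below uses a hypothetical `HasMethod` class (nothing like it exists in `typing` or `collections.abc`); it only checks that the object's type has a callable attribute with the given name.

```python
class HasMethod:
    """Hypothetical predicate: the object's type has a callable attribute `name`."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __class_getitem__(cls, name: str) -> "HasMethod":
        # HasMethod['__getitem__'] returns a predicate instance.
        return cls(name)

    def __instancecheck__(self, obj: object) -> bool:
        # isinstance(obj, predicate) looks this method up on type(predicate).
        return callable(getattr(type(obj), self.name, None))


has_getitem = HasMethod["__getitem__"]
print(isinstance([1, 2], has_getitem))  # True
print(isinstance(42, has_getitem))      # False
```

This covers only the runtime side; a static checker would know nothing about such a class.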

kknechtel · May 15, 2024, 8:59pm

The goal, as I understood it, is not simply to use this in typing, but to be able to implement e.g. an isinstance runtime check for recursive algorithms.

It warrants being a type, so that it can work with isinstance. But you do have a valid question: where does this fit best? For the applications where I would use something like this, I wouldn’t necessarily be using type annotations for any static checking purpose.
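For reference, one way such a runtime check could be made to work with isinstance is an ABC with a `__subclasshook__`. This is only a sketch under assumptions: the name `NonStringSequence` comes from the thread title, and excluding bytes and bytearray as well as str is a choice made here, not something the thread has settled.

```python
from abc import ABCMeta
from collections.abc import Sequence


class NonStringSequence(metaclass=ABCMeta):
    """Sketch: any Sequence except str, bytes and bytearray (assumed exclusions)."""

    @classmethod
    def __subclasshook__(cls, subclass):
        if cls is NonStringSequence:
            # Accept real or registered Sequence types, minus the string-likes.
            return issubclass(subclass, Sequence) and not issubclass(
                subclass, (str, bytes, bytearray)
            )
        return NotImplemented


print(isinstance([1, 2], NonStringSequence))    # True
print(isinstance(range(3), NonStringSequence))  # True
print(isinstance("ab", NonStringSequence))      # False
```

Unlike the existing collection ABCs, this hook decides by type identity rather than by which methods are present.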

MegaIng · May 15, 2024, 9:04pm

Yes, which is why I suggested the second part of that statement. In fact, that function could already be added (either in collections or in string, I would think); it just probably couldn’t be typed right now. I think this is a common enough use case that such a convenience function would be beneficial (although it will probably lead to discussions about whether bytes should also be excluded… I vote yes).

If this function is too small of a feature to be added, what extra usability does it gain from being a class? As far as I can tell, the only possible use for this “class” at runtime is as an argument for isinstance.

pf_moore · May 15, 2024, 9:15pm

Whenever this has come up before, it’s been rejected because there aren’t enough realistic use cases apart from a flatten() function.
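For context, the flatten() use case usually looks something like the sketch below. The function is illustrative only, and treating str, bytes and bytearray as atoms is an assumption about what callers want.

```python
from collections.abc import Iterable, Iterator


def flatten(items: Iterable) -> Iterator:
    """Recursively yield the leaves of nested iterables, keeping strings whole."""
    for item in items:
        # Without the string exclusion, a one-character str would recurse forever,
        # because iterating "a" yields "a" again.
        if isinstance(item, Iterable) and not isinstance(item, (str, bytes, bytearray)):
            yield from flatten(item)
        else:
            yield item


print(list(flatten([1, [2, ["ab", (3,)]], "cd"])))  # [1, 2, 'ab', 3, 'cd']
```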

alicederyn · May 15, 2024, 9:31pm

I would like to be able to use it for type hinting. Passing a string to a Sequence[str] or Iterable[str] is usually a bug, ducks notwithstanding.
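A small example of the kind of bug meant here: the call type-checks because str is itself an Iterable[str], but the function silently iterates over characters. The function name is made up for illustration.

```python
from collections.abc import Iterable


def shout_all(names: Iterable[str]) -> list[str]:
    """Upper-case every name in the iterable."""
    return [name.upper() for name in names]


print(shout_all(["ann", "bob"]))  # ['ANN', 'BOB']
print(shout_all("ann"))           # ['A', 'N', 'N']  (accepted, but almost surely unintended)
```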

alicederyn · May 16, 2024, 12:51am

It occurs to me this could just be a pattern type checkers start rejecting despite being technically correct, given it IS usually a bug. No need for a new type and counter-intuitive annotations then.

Daverball · May 16, 2024, 6:19am

Writing a mypy plugin for this should be fairly straightforward. It wouldn’t be able to cover assignments, but the more common case, i.e. function/method calls, would be covered through the function/method hooks. It may slow things down a bit, though, if you don’t first build a cache of which callables have a parameter that needs to be checked, using the corresponding signature hooks.

xitop · May 16, 2024, 6:32am

Sometimes an API accepts one item or several items. Would you count the test “single or sequence?” as a use case?

Real-world examples:

if type(problems) not in (list, tuple):
    problems = [problems]

if not isinstance(weights, (list, tuple)):
    weights = list(weights)
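A more general version of the “single or several” normalisation might look like the sketch below. The helper name `as_list` is hypothetical, and treating str, bytes and bytearray as single items is an assumption.

```python
from collections.abc import Iterable


def as_list(value):
    """Wrap a single item in a list; copy any other iterable into a list."""
    if isinstance(value, (str, bytes, bytearray)) or not isinstance(value, Iterable):
        return [value]
    return list(value)


print(as_list("abc"))       # ['abc']
print(as_list(("a", "b")))  # ['a', 'b']
print(as_list(5))           # [5]
print(as_list(range(3)))    # [0, 1, 2]
```

Compared with an exact check such as `type(x) not in (list, tuple)`, this also accepts other iterables like range, at the cost of having to name the string-like exceptions explicitly.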

Dutcho · May 17, 2024, 9:56pm

useful-types provides SequenceNotStr for typing, but isn’t runtime_checkable, so not up to OP’s objectives.
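To show why a protocol of this kind does not help at runtime, here is a cut-down stand-in (not the actual useful_types definition; the name `SequenceNotStrSketch` and the reduced method list are assumptions). Statically, the `object` parameter on `__contains__` is the detail that can exclude str; at runtime, isinstance on a runtime_checkable protocol only checks that the members exist.

```python
from typing import Iterator, Protocol, TypeVar, runtime_checkable

T_co = TypeVar("T_co", covariant=True)


@runtime_checkable
class SequenceNotStrSketch(Protocol[T_co]):
    """Simplified stand-in; the real protocol declares more methods."""

    def __contains__(self, value: object, /) -> bool: ...
    def __len__(self) -> int: ...
    def __iter__(self) -> Iterator[T_co]: ...


# The runtime check ignores signatures, so str passes just like list does.
print(isinstance(["a", "b"], SequenceNotStrSketch))  # True
print(isinstance("ab", SequenceNotStrSketch))        # True
print(isinstance(5, SequenceNotStrSketch))           # False
```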

Gouvernathor · May 21, 2024, 12:31pm

The issue with this idea is that, due in part to Liskov’s principle, an ABC you check for virtual inheritance is supposed to have an interface which you compare to the interface of the object you’re checking. But there is no method (or attribute or property) missing from str as compared to all the other sequences. That would make it unique across existing collection ABCs.

It took me a while to understand what you mean by that, but I got it. I would say “cannot be flattened to non-collection pieces”.
So, a type where trying to intuitively and recursively flatten it yields infinite recursion. Or, said otherwise, a type such that an instance i of that type can iterate to a singleton of itself, (i,).

But that is not specific to str by nature. I could make a number type where if you iterate it, you get its decimal digits from left to right, as instances of that same number type. But I can’t see an implementation of your SequenceNotStr - or SequenceNotInfinitelyRecursive - that would detect that at runtime.
I think that would be a pretty big inconsistency in the proposed ABC (since it’s not really a base class anyway).
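The number type described above could be sketched roughly as follows. The class name `Digits` is made up, and the sketch assumes non-negative values. A single-digit instance iterates to a singleton of an equal instance, just as a one-character str does, so a recursive flatten would never bottom out on it, yet nothing about its type marks it as string-like.

```python
class Digits(int):
    """Hypothetical non-negative integer that iterates over its decimal digits."""

    def __iter__(self):
        # Each digit is again a Digits instance, left to right.
        return (Digits(ch) for ch in str(int(self)))


print(list(Digits(407)))  # [4, 0, 7]
print(tuple(Digits(7)))   # (7,)
print(tuple("a"))         # ('a',)
```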

alicederyn · May 21, 2024, 1:48pm

“Practicality beats purity.” Type systems can check that something is “a sequence but not a string.” So can runtime. Liskov substitutability isn’t a hard rule that trumps useful tools.

mikeshardmind · May 21, 2024, 2:36pm

Technically speaking, str shouldn’t be considered a Sequence as-is: it has an incompatible definition for __contains__, both statically and at runtime. It’s not currently something the type system catches as an LSP violation (see SequenceNotStr in useful_types), and it would be reasonable to fix this by fixing that lack of detection. This would also end up matching the behavior some other type checkers picked, where if you actually intend both, you should use str | Sequence[str].

With that said, it would be disruptive to fix all the places where the type system isn’t enforcing LSP, or isn’t doing a “good enough” check. For instance, isinstance("haha", collections.abc.Sequence) will continue to return True, because this doesn’t check that __contains__ has a compatible type, only that it is provided. “Fixing” this (and I use the term fixing very loosely here) would require what is currently a relatively cheap check to handle annotations, and for those annotations to actually exist at runtime for comparison.

Gouvernathor · May 21, 2024, 2:46pm

The only problem I can see regarding __contains__, and which I didn’t think of before, is that it requires a creative interpretation of the following statement taken from the __contains__ definition in the datamodel documentation (emphasis mine):

object.__contains__(self, item)
Called to implement membership test operators. Should return true if item is in self, false otherwise. For mapping objects…

Because ss in s can be true even in cases where ss in tuple(s) is false (when ss contains several characters which are a substring of s).
But that’s not much of a problem, since nowhere is there a strong link between containment tests and iteration: you can have a __contains__ while not being iterable at all; that’s the collections.abc.Container ABC.
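Concretely, the mismatch between containment and iteration for str looks like this (the values are arbitrary):

```python
s = "hello"
ss = "ell"

# Substring test: str.__contains__ looks for ss anywhere inside s.
print(ss in s)         # True

# Element test: tuple(s) is ('h', 'e', 'l', 'l', 'o'), and 'ell' is not one of those items.
print(ss in tuple(s))  # False
```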

But otherwise I don’t see a problem in str.__contains__. str is a Sequence[str], so str.__contains__ is exactly what it should be and consistent with any definition of Sequence. If there’s an LSP violation, I’m missing it.

mikeshardmind · May 21, 2024, 2:49pm

str.__contains__ only accepts strs. This is known by the type system, and useful_types.SequenceNotStr even uses this detail to work.

collections.abc.Sequence.__contains__ accepts object, so a method that only accepts str is not a safe replacement, i.e. str.__contains__ is not compatible with collections.abc.Sequence.__contains__, and str should not be compatible with Sequence.

Gouvernathor · May 21, 2024, 2:51pm

Ok, so I did miss it, you’re right, it is a contravariance violation.

>>> 1 in []
False
>>> 1 in ""
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'in <string>' requires string as left operand, not int

We’re going slightly off-topic, but has this ever been flagged as a bug in CPython? It seems to me that it should…

alicederyn · May 21, 2024, 3:40pm

It’s maybe not the most user friendly error message but it seems legitimate to raise a TypeError there.

Gouvernathor · May 21, 2024, 3:55pm

You’re right. For the record what convinced me is the fact that [] < 1 raises a TypeError despite each of them supporting the < operator with other types - so it should not be a problem for the in operator to do the same.
Let’s close that side question.

matthewyu0311 · May 26, 2024, 9:06pm

Python already has the distinction between strings and “non-str/bytes/bytearray sequences” in the form of pattern-matching sequence patterns. https://docs.python.org/3.13/reference/compound_stmts.html#id21

The pattern matching spec in the language reference defines a sequence as collections.abc.Sequence and its explicit and registered subtypes, plus some C types, minus str, bytes, bytearray (though memoryview does match) and mappings (though mappings are Iterables, not Sequences):

match foo:
    case [*seq]:
        ...
        # will not match a str / bytes / bytearray / mapping

Maybe it might solve the problem if we have a runtime-checkable ABC that corresponds to what’s accepted by the pattern matching machinery?