collections.abc.NonStringSequence

I’d like to propose this new simple abstract base class.


The reason #1 is that strings are technically containers/sequences, but practically are simple types.

This nested sequence: [1, [2, 3]] contains 1, 2 and 3. That is what would we get after flattening it.

But does this nested sequence: ["one", [ "two", "three"]] contains “one”, “two” and “three”?

Yes, but only if you agree with the reason #1. Otherwise the aswer is: a nested sequence of: “o”, “n”, “e”, “t”, “w”, “o”, “t”, “h”, “r”, “e”, “e”.


The reason #2 it that they must be special-cased each time a nested data structure is processed.

The code below works for sequences, except when there are strings inside.

def flatten(data):
    if isinstance(data, Sequence):  # <-- a place for the NonStringSequence
        for item in data:
            flatten(item)
    else:
        print(f"got: {data!r}")

Strings are special, because they are not strings of characters. There is no character type in Python. This leads to an inifinite recursion:

>>> msg="string"
>>> type(msg) is type(msg[0]) is type(msg[0][0]) # ... etc ...
True

>>> s="X"
>>> s is s[0] is s[0][0] # ... etc ...
True

IMO almost every programmer was bitten by this. After learning it we often write:

if isinstance(data, (list, tuple))

I found that 100+ times in the standard library - but I DID NOT check if it was for the reasons discussed here.

2 Likes

Wouldn’t this work? Yes, it’s more work to write, but it’s clear what you’re trying to do.

if isinstance(data, Sequence) and not isinstance(data, str):
    ...  # Do stuff

I already do the same for booleans in some cases:

if isinstance(data, int) and not isinstance(data, bool):
    ...  # Do stuff
2 Likes

There has been quite a bit of discussion about something like this at Possible to distinguish between Sequence[str]/Iterable[str] and str? · Issue #256 · python/typing · GitHub. This would be quite useful for type checking, but it’s most likely a PEP-level problem.

1 Like

Doesn’t bytes suffer the same issue?

No, that returns ints (which aren’t iterable). So, you don’t get infinite recursion.

1 Like

I often support iterables that’re like a list or a tuple, but not a str.

So I’d definitely like a short way to exclude bytes, memoryview etc. too, as well as excluding the usual culprit, str from tests for Iterable (I suppose bytearray is a grey area. FWIW I’d include that as list like). I always end up using (list, tuple, set), which excludes arrays (and casting dict to their keys).

I completely understand why bytes and str etc. are Iterable subclasses for historical reasons. But strings are also Sized, Container, Reversible and Collection.

Surely it’s not too much to ask for an ABC for Iterable containers, that excludes str and bytes etc. ?

1 Like

How would it be used? I don’t think I’ve ever used an ABC for subclassing, and very rarely for isinstance style checks.

I use them for type annotations, but that’s a very different matter, as type annotations have all sorts of other implications (duck typing needs to be considered carefully, and what you really want is an Iterable protocol rather than an ABC - I don’t know how or if type checkers resolve this). Also, the typing community is actively working on “difference types” which are the general mechanism for expressing “this type but not these cases”. Adding a special case for “iterable but not str (and maybe some others that nobody really agrees on)” at this point seems like a temporary workaround until difference types get added.

2 Likes

Great! Glad to hear as ever the Python devs are way ahead of me. I had in mind type annotations (I’m not sure why the aliases for Iterable etc. in typing were deprecated in favour of collections.abc, but I now call them ABCs). I look forward to these difference types, in that case. Hopefully mypy can add support soon after.

Only in 2.x.

This conversation makes me wonder if the typing world should have a way to say not something.

Sort of like (pseudocode):

NotAString: Sequence not str = ['app', 'le']

Then maybe we could do:

isinstance(thing, Sequence not str)

Probably not great if there are multiple nots, etc. though was an interesting thought exercise.

1 Like

Prior discussion of the idea of a Not type was shot down because a Not type doesn’t make sense to be a type of its own, but I like your idea of making it a binary operator so that it must be expressed as a subtraction of a type from another type.

I guess we can overload the - operator to avoid having to introduce a new grammar for not to be a binary operator. And to make the idea backportable, we can introduce a new type called Difference so that Difference[Sequence, str] will do the same job as Sequence - str, in a manner similar to Union and |.

The resulting value of the subtraction is another type so I think it makes sense to allow another substraction from it.

This is a bit misleading. The discussion about type negation is still on-going, and relevant to several work in progress typing features. Since the linked comment “shooting it down”, It was also formally accepted as part of PEP 742

Formally, type NP should be narrowed to A∧R, the intersection of A and R, and type NN should be narrowed to A∧¬R, the intersection of A and the complement of R. In practice, the theoretic types for strict type guards cannot be expressed precisely in the Python type system. Type checkers should fall back on practical approximations of these types. As a rule of thumb, a type checker should use the same type narrowing logic – and get results that are consistent with – its handling of isinstance(). This guidance allows for changes and improvements if the type system is extended in the future.

type checkers already have a need to handle type negation in the negative case of TypeIs, it’s just not something which is user denotable at this point in time, and can only result as part of control flow at the moment

1 Like

Typing does not affect runtime and this proposal is about runtime. Strings are collections that cannot be decomposed to smaller pieces (non-collections). (If there is a better term than “non-decomposable”, please comment). The example program I gave can enter “endless” recursion (RuntimeError) if strings are not ruled out.

def flatten(data):
    if isinstance(data, Sequence):  # <-- a place for the NonStringSequence
        for item in data:
            flatten(item)
    else:
        print(f"got: {data!r}")

Yes, that is fine provided the str is and will be the only special sequence that is “not-decomposable”. We know bytes are fine.

1 Like

To answer this explicitly: It is a lot to ask for, as you cannot express this type with normal ABC semantics, it requires a custom metaclass because ABCs can’t exclude a type that otherwise fits.

Also, what would it mean to subclass this type? The same as subclassing Sequence?

I don’t think adding a class for this is the right approach. Instead, the typing community should make this type expressable and then the stdlib should provide a strict type guard for this common usecase somewhere.

2 Likes

Typing can affect runtime if the type implements __subclasshook__, which is the case for all collections.abc types as well as typing.Union.

from collections.abc import Sequence, Mapping

print(isinstance([], Sequence | Mapping)) # outputs True

The same can be done for the proposed typing.Difference.

Ah I wasn’t aware of this new proposal. A nice improvement over TypeGuard for sure, but as a type narrower I don’t think it can satisfy the OP’s runtime needs.

Probably not. Parameterized forms aren’t allowed as the second argument to isinstance. Unions were special cased for this, but arguably shouldn’t have been (it reduced consistency) since the tuple form for the second argument already existed.

Agreed, it won’t help with the runtime needs on it’s own, however, I do think that simply adding type information does prevent practical misuse.

The ability to do something like:

type NonStringSequence = Sequence[Any & ~str]

would allow the intent to be well documented, and for type checkers, and IDEs to both warn if you tried to create this with string elements.

As for runtime enforcement, I imagine that libraries like beartype would be able to help there if the type system gave a way to denote this, and if it were a popular enough special case, this particular specialization might get direct support in collections if it were composable with the rest of the ecosystem.


edit: For correctness’s sake, that example should likely be constrained to not string, and only have a default including Any non-string with something more like:

type NonstringSequence[T: ~str = Any & ~str] = Sequence[T]

and there’s also the other thing many people have wanted:

type NonStringSequenceOfStrings = Sequence[str] & ~str

but this is a small detail given that this isn’t directly about negation, just that negation could help in this case with documenting intent, having tooling support, and possibly leading to a special collection in the standard lib to support this specific case of unintended type overlap in collections.

Yes, but the fact is that Union is special-cased for issubclass/isinstance checks because the usage actually makes sense, so it only makes further sense for any similar set operations such as Intersection and Difference to be special-cased in the same way for consistency.

4 Likes

That’s completely fair. It’s just my opinion, and user feedback.

Good question regarding subclassing. Frankly, I don’t know! Perhaps class X(NotAStr, str): should raise an error.

I don’t think it will help with the type annotations, but for ABCs at least, if a naively simple incomplete implementation demonstrates anything, then it doesn’t seem to me like too much work to extend the existing ad-hoc override - .register, either by a new register_imposter method, or an imposter kwarg to `.register:

Lib/_py_abc.py

class ABCMeta(type):

    def __new__(mcls, name, bases, namespace, /, **kwargs):
        ...
        
        cls._abc_registry = WeakSet()

        # Proposed addtion.  Create registry to record known imposters.
        cls._abc_imposter_registry = WeakSet()


    # Proposed addition.  Register known imposters.
    def register_imposter(cls, subclass):
        # TODO:  Does registering an imposter necessitate invalidation of any other caches?
        ABCMeta._abc_invalidation_counter += 1  # Invalidate negative cache

        
        cls._abc_imposter_registry.add(subclass)


    def __instancecheck__(cls, instance):
        """Override for isinstance(instance, cls)."""
        # Inline the cache checking
        subclass = instance.__class__

        # Proposed addition.  Reject known imposters.
        # TODO: Have fruitful discussion about which registry takes priority.
        if subclass in cls._abc_imposter_cache:
            return False

        if subclass in cls._abc_cache:
            return True

        ....

    def __subclasscheck__(cls, subclass):
        """Override for issubclass(subclass, cls)."""
        if not isinstance(subclass, type):
            raise TypeError('issubclass() arg 1 must be a class')

        # Proposed addition.  Reject known imposters.
        if subclass in cls._abc_imposter_cache:
            return False

        # Check cache
        if subclass in cls._abc_cache:
            return True

        ...

I can’t see how this could cause problems elsewhere, but there’s a lot going on in that meta class, so could it?

Perhaps this just shouldn’t live in collections.abc? I don’t see why it wants to be an abstract base class.

1 Like