Speedup ABC issubclass

I have a working python implementation of this and wanted to solicit feedback on the idea to see if anyone sees some issue I am not accounting for prior to building the C implementation of this. Sorry this post is super long and apologize for any unclear language I am super dyslexic so if anything sounds confusing it is me not you and I will happily clarify.

Issues

ABC right now is a very slow metaclass. It has been a known issue awhile and has some very entertaining properties.

For instance the following fairly innocuous looking code generates a whopping ~20 GBs of wek refs, pointers, and function partials and takes over 1m and 30 seconds to run.

from abc import ABC, ABCMeta

abc_cls = set()
normal_cls = set()

for i in range(10000):
    abc_cls.add(ABCMeta("abc_"+str(i), (ABC, ), {}))
    normal_cls.add(type("normal_"+str(i), (object,), {}))

for c in normal_cls:
    issubclass(c, ABC)

For comparison two normal classes only takes .003 seconds to run.

While the above code is purposely designed to produce a worst case behavior any deeply nested ABC hierarchy will cause problems. The issue is algorithmically ABC recursively searches its subclasses and on any miss all the way through. This causes every single subclass to cache the result and generates a ton of objects needed to do so.

The solution is a two part change to the way registration and issubclass behave.

Registration

The proposed change to registration is to implement registration chasing. When register is called the abc class and all it parent classes will register the virtual subclass. This allows you to bypass the main reason for searching the subclasses recursively as registrations will bubble upward and it won’t be necessary.

Sudo implementation

def walk_registration(to_register, start_cls, end_cls):
    if start_cls is not end_cls:
        start_cls._register(to_register)
        for cls in (set(start_cls.__mro__) - set(end_cls.__mro__)):
            cls._register(to_register)
        end_cls._register(to_register)
    else:
        end_cls._register(to_register)

There are a few other complications around bases that are not of type ABC but none of them are terribly difficult to deal with. The biggest “challenge” is that you need to generate a new registration walker for each class with parents that have different registration walkers. It is completely doable from new without any issues though.

The second change to registration would be to split the registry into two groups one for classes with a metaclass that has a custom __subclasscheck__ and those who do not. This allows you to reduce the total number of tracked classes for those without custom issubclass by removing all subclasses from the registry on each registration call. This part isn’t 100% needed but it’s easy enough to do that its worth the performance improvement. This also lets you search the faster classes before searching the slower ones.

Issubclass

One small change to the current implied execution of issubclass would fully remove the need to check any subclasses recursively during execution. That would be to depreciate checking of subclasshook on subclasses of the called class.

class one(ABC):
    @classmethod
    def __subclasshook__(cls, C):
        print("called one")

class two(one):
    @classmethod
    def __subclasshook__(cls, C):
        print("called two")


issubclass(int, one)

For the above code the current behavior would print

called one
called two

Whereas with the changes it would print

called one

The current recursive behavior is easy enough to move into its own method that can raise a deprecation warning if there would be concerns about just removing it outright. I looked through the standard library and multiple other major external repos and was unable to find a single usage that relied upon this behavior. The only place I was even able to find that would invoke it was protocols and that appears to be due to other libraries and can be addressed and removed.

The code concerning it.

def _allow_reckless_class_checks(depth=3):
    """Allow instance and class checks for special stdlib modules.
    The abc and functools modules indiscriminately call isinstance() and
    issubclass() on the whole MRO of a user class, which may contain protocols.
    """
    return _caller(depth) in {'abc', 'functools', None}

It is easy enough to just stopgap with a depreciation period where the speed won’t be as fast it as possible but still dramatically faster then it currently is.

I haven’t found too many edge cases where the strategy of chasing registrations and restricting subclass recursion has produced different results than currently. There are a few bugs in my code still related to some very convoluted multiple inheritance with multiple metaclasses that cause some wonkiness but as far as I can tell those classes are already broken so its not actually introducing any new bugs.

In fact it would actually reduce some really subtle bugs that currently exist in the spec related to object lifetimes. Currently if a subclass gets virtual subclass registered to it that will prevent the parent from also holding that reference which causes the effect of that registration to be bound to the lifetime of the subclass. That means technically when the object goes out of scoop the parent class will have a broken definition. This would resolve that issue. I have never seen this occur in actual code but it is technically a bug. I haven’t the slightest idea what anyone would be writing that would run into it though.

Once I am done ironing these out I will update the post with a link to the code. So far it passed all the tests in the standard library but it isn’t a perfect match for all edge cases.

4 Likes

I’m looking forward to seeing your implementation, though I worry that deprecating edge case behavior is going to be a tough sell. I will definitely watch this topic!

1 Like