Looking for a way to track all subclasses of a given class

Hi. First post, be gentle!

I’d like to be able to find all current subclasses of a given class. The obvious answer is __subclasses__, but that feature is not kept up to date in the face of class redefinition and scoping. For example:

class Foo:
    pass

class Bar(Foo):
    pass

class Bar:
    pass

At this point, Foo.__subclasses__() is still a list of length 1, containing Bar. Similarly:

def fn1():
    class Baz(Foo):
        pass

fn1()
fn1()
fn1()

Now Foo.__subclasses__() has length 3, with three copies of a local class Baz.

Presumably it’s too difficult to keep __subclasses__ up to date, otherwise I assume it would be done. But as things stand it changes only at garbage collection; if I append gc.collect() to either of the examples above, Foo.__subclasses__() then returns an empty list.

My question is, what might I do? Is there another mechanism I can use? Is it worth asking the Python Gods whether __subclasses__() can be made accurate? Should I redefine the function myself to call gc each time, just to be certain? Any ideas appreciated.

1 Like

From the reference documentation for __subclasses__:

Each class keeps a list of weak references to its immediate subclasses. This method returns a list of all those references still alive. The list is in definition order.

The give-away is “weak”. It is obviously meant to not be up-to-date, so I expect your wish to make it so will not be granted.

When you look at the list when it is up-to-date, there may be surprises. I altered your first example a little:

>>> class Foo():   
...     pass
... 
>>> class Bar(Foo):
...     a = "Old foo"
... 
>>> class Baz(Bar):
...     pass
... 
>>> class Bar():
...     pass
... 
>>> bazzie = Baz()
>>> bazzie.a
'Old foo'
>>> barrie = Bar()
>>> barrie.a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Bar' object has no attribute 'a'

The “old” Bar is not reachable through the name Bar any more, but it does still exist. It is referenced from Baz, so not GC-ed. Two Bar entries in __subclasses__, and rightfully so.

I would try to keep out of keeping track of subclasses, it is hardly ever worth while.

As said before, don’t bother.

No, when you use the entries in __subclasses__, be aware they may be dead and catch the TypeError.

Yes, I agree with much of what you say, but I don’t see why in your example you say “Two Bar entries in __subclasses__ , and rightfully so.” Why would there be two? Did you mean for the second definition of Bar also to inherit from Foo? (In that case I agree that there should be two!)

Also, you say: " when you use the entries in __subclasses__ , be aware they may be dead and catch the TypeError". This I don’t get—what TypeError? Certainly it would suffice to have a way to tell that a class is dead! How do I do that?

That is stupid of me.

Sorry about that. There are two objects, one with the reference from the name Bar and the other with a reference from the class Baz.

You want those references to the classes to do something to them. So you do something classy - I used instantiating as an example - and you will probably get a TypeError (check before use) as I got from trying to call it. The easiest and fastest way is trying the operation and if it fails with the expected error, it is probably dead. What is wrong with that?

There must be a way, otherwise GC would not work. This Stackoverflow question shows you how and indeed the accepted answer uses the gc module. Is try…except not much easier in this context?

Hm, are you sure? Here’s what I get in my tests:

$ python3.8
Python 3.8.2+ (heads/3.8:3e72de9e08, Apr 16 2020, 12:25:15) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> class Foo:
...     pass
... 
>>> def f():
...     class Baz(Foo):
...             pass
... 
>>> f()
>>> f()
>>> f()
>>> Foo.__subclasses__()
[<class '__main__.f.<locals>.Baz'>, <class '__main__.f.<locals>.Baz'>, <class '__main__.f.<locals>.Baz'>]
>>> a = Foo.__subclasses__()[0]()
>>> a
<__main__.f.<locals>.Baz object at 0x7fd654f7acd0>
>>> import gc
>>> gc.collect()
6
>>> Foo.__subclasses__()
[<class '__main__.f.<locals>.Baz'>]

It seems that as long as the class objects are not collected, nothing prevents you from referencing them and instantiating them – or how would GC work? The reason why the objects are not collected immediately is that they are part of a cycle. As far as I understand, objects involved in a reference cycle are not seen as dead by the interpreter until the garbage collector is run. Efficiency is why it does not run all the time.

If you want to kill dead objects, call gc.collect() :wink: I don’t see a way around it.

Interesting,

I tried Foo.__subclasses__()[0]() after I recreated the Bar reference so it referred to another class and it failed with a TypeError. That is why I mentioned it. Either I messed up or there is a difference between the two. Guess the way forward is indeed to run GC just before the call to __subclasses__. My argument:

doesn’t cut it :frowning_face:

Thanks for correcting.