Doing something evil: deleting a class (not instance) from memory

So I’ve started supporting a tool… its design is interesting, and it works, but it’s definitely not ideal.

At runtime, when a request arrives, it grabs an archive of Python source files from a repository, and uses exec to ‘run’ them in the current interpreter. These files define a number of classes, which are then registered via a parent class into a list, and the tool uses those classes to handle the request.

When the next request arrives, it empties the list (removing any references to those classes), and does the whole thing over again. Normally, this works fine, except when the content of those source files has changed in certain ways, like I did today. Namely, if one of the classes is renamed, or one of the classes is removed, it will still be in memory in the interpreter, and when the parent class cycles through all of the child classes it will be registered again. Thus, for certain types of content changes in the source files, the process must be ‘bounced’ in order to clear the old classes from the interpreter’s memory.

Now… classes are objects, and I can call del on any object I want in order to mark it for deletion, and I have references to these class objects. I’m fairly certain that if I did that the objects would not in fact be deleted, because there are other references to them in other places in the interpreter.

Is there a non-evil way to eliminate these class objects from the interpreter, without restarting it? If it’s of any help, it’s a Flask app, and if Flask has some way to influence the creation of new interpreter instances that might also be a solution.

1 Like

First a slight clarification: del does not mark an object for deletion, it merely unsets a variable/attribute, or deletes the entry from a collection object. If that is the only reference it’ll indeed be deleted afterwards. For class objects, they’ll be deleted like any other objects once no references to it exist, and no instances of that class exist. All the references in the interpreter itself are weakrefs, so they don’t count towards keeping it alive. (You may want to use that for your registry.)

What I reckon you should do is modify the registration code to say store an ID on the classes (incremented each request), and if it’s an old ID skip that class. Then whether or not the classes still exist won’t affect the correctness of the code. To debug what’s still holding references to the class, try using gc.get_referrers(), and look carefully through your code.

1 Like

I can’t immediately see how setting an ID would work, but I can see that maybe just setting a old boolean attribute in each class as the class list is being destroyed might work.

When the source files are run via exec() again, any classes in memory with the same names will be replaced by the newly-parsed versions, and the new ones won’t have the old boolean attribute. Any classes in memory that are no longer in the source files will have the old boolean attribute, and of course any classes from the source files that are entirely new won’t have it.

Any classes with old set to True would just be removed from the list after the source file processing is complete. They’d still be in the interpreter’s memory until the GC decides to get rid of them, but they would not be invoked while handling requests.

The official term for the first is unbind.

Assignment spam = 42 binds the object 42 to the name “spam”. Deletion del spam unbinds the object by deleting the name.

Agreed so far.

I don’t know what this means, but if my guess is correct, then it is wrong.

class C:
    pass

c = C()

C is a normal full reference to the class, not a weak reference. So is c.__class__. I don’t know what other “references in the interpreter” you are referring to.

That’s an … interesting … use of the word “interesting” :slight_smile:

What do you mean by “supporting”? Have you taken up maintenance and development of the tool, or do you just mean you are merely using it?

If you are just using it, you should ask the maintainer for their advice.

Also, you should tell us what this tool is called, not just describe it.

What an odd design. I wonder if there is a reason for it, or if it just evolved from code written by somebody who doesn’t understand importing modules.

Well no, merely removing the list doesn’t remove the references to the classes. This design sounds like there are at least four references to each class object:

  1. the class object created by execing the source, which I will call “C”;
  2. the reference to C in the list;
  3. the reference to C held by every instance of C.
  4. if there are any subclasses of C, each of them also hold references back to C.

There may be others.

I’m going to assume that the first was created in some sort of temporary or local namespace object, which is ready for garbage collection:

# Inside some function, ns is a local and registry is a global.
# When the function exits, ns is deleted.
ns = {}
exec(source, ns, ns)
registry.append(ns['C'])  # Save the class object.

But deleting the entries in the list, number 2 above, does nothing about the references held by instances. If any of those instances remain alive, you will get a build up of invisible classes that are almost inaccessible, wasting memory.

  • request 1 creates instances a, b, c referring to class C;
  • request 2 creates instances d, e, f referring to class C;
  • request 3 creates instances g, h, i referring to class C.

Even though the source code for C is the same, and the name is the same, each request creates a distinct class object. If even one single instance survives the request without being garbage collected, the classes will survive too.

Assuming a, d, g survived the request, then:

type(a) == type(d) == type(g)

will return False.

And that is why changes to the source code are not back-propagated to instances created in previous requests. If you make a change to the source code of C after request 2 but before request 3, only g will see that change, a and d will still refer to the pre-change version.

Live modification to Python source code is… not easy. Pushing those modifications back to instances that already exist is tricky. I’m not entirely sure that just changing instance.__class__ is sufficient, although it may be.

So in principle this should be easy, but in practice getting the book-keeping right can be prohibitive, leading to surprises. This is why the reload function was relegated from a builtin to the functools module in Python 3.0.

Non-evil? Not really.

There is no way to remove a class until all the references to it are gone (otherwise the interpreter could segfault). So you have to eliminate any and every reference to the class.

You could try using the gc module to search for objects which refer to that class object.

1 Like

What I was aluding to is type.__subclasses__() / tp_subclasses, which is a weak reference to the subclasses. There’d also be various internal data strctures in abc which also are weak. Perhaps others I forgot about. The important thing is that once your user code drops all references, the interpreter’s internals won’t keep the class alive.

[quote]
What I was aluding to is type.__subclasses__() / tp_subclasses

Ah that makes sense, thanks :slight_smile:

I am now the primary maintainer of it, although the original author is still involved as well.

It’s called "Watson’, but that won’t really be of any use to you since it’s an internal tool at $dayjob.

  1. yes
  2. yes
  3. there are no live instances around; they are created during handling of a request and then deleted, although of course that doesn’t immediately trigger the actual destruction of the instances.
  4. there are no subclasses

That would be a good thing, but it is not currently the case. I wasn’t aware that this was possible until I read your reply, so I’ll experiment with it as the isolation it provides would be beneficial. A concern would be that these classes are subclasses of a class that exists in the existing module namespace, so that will have to be available during exec.

The remainder of your post makes sense to me, but is not the goal here: the changes present in the newly-loaded versions of the classes do not need to be back-propagated into any existing instances. New instances will be created from the newly-loaded classes, and any previous instances (and their classes) still present in memory are just wasted memory.

As I said, the design is ‘interesting’, and probably not what I would have chosen had I been the one to create this tool, but at this point in the tool’s lifetime there is no value in redesigning it as it’s unlikely to be in use for more than 12-18 months from today. I’m focused today on solving short-term bugs and small improvements to support business process changes of small scope.

I’m going to experiment with marking all of the existing classes as ‘obsolete’ before a new cycle of class-loading is started, and then when the class list is iterated to create instances I’ll skip the ones that are marked ‘obsolete’.

Ah, I misinterpreted your comment about parent classes and subclasses to understand that there may have been subclasses of the classes loaded with exec.

But you also said:

“I’m fairly certain that if I did that the objects would not in fact be deleted, because there are other references to them in other places in the interpreter.”

If there are no subclasses, and no instances, what other places are you referring to? (Pun intended.)

That would be a good thing, but it is not currently the case. I wasn’t

aware that this was possible until I read your reply, so I’ll

experiment with it as the isolation it provides would be beneficial.

Presumably the source files have something like from mymodule import ParentClass before subclassing ParentClass, yes? That should continue to work.

Otherwise you can do this:


ns = {'ParentClass': ParentClass}

exec(source, ns, ns)

Then I must admit I don’t understand the problem. Earlier you said:

“Namely, if one of the classes is renamed, or one of the classes is removed, it will still be in memory in the interpreter, and when the parent class cycles through all of the child classes it will be registered again.”

Perhaps you can make it clear what you mean by cycling through the child classes? Do you mean ParentClass.__subclasses__()?

Sorry for the delay in responding, it was clear to me that posting a minimal example of how the existing code works would short-circuit much of this conversation, and I wasn’t able to do that until today. Please understand while reading this that I fully understand that the code is not particularly well-written and even this small example could be dramatically improved; for the moment I’m focused on the lowest-risk way to ensure that classes which have been previously loaded but are no longer needed can be ‘ignored’ during the registration process.

I’ve also left out nearly all of the code related to handling requests.

First, the base class and its ancillary functions:

import io
import requests
import zipfile


class RuleSet(object):
    rules = list()

    @classmethod
    def register(cls):
        if cls == RuleSet:
            return
        RuleSet.rules.append(cls)

    @classmethod
    def resetRules(cls):
        cls.rules = list()


def registerAll():
    for name in globals().keys():
        item = globals()[name]
        try:
            item.register()
        except AttributeError:
            pass


def loadFromURL(url):
    RuleSet.resetRules()

    result = requests.get(url)

    with zipfile.ZipFile(IO.BytesIO(result.content)) as archive:
        for item in archive.namelist():
            if not item.endswith(".py"):
                continue
            content = archive.open(item, "r").read().decode("utf-8")
            exec(content, globals())

    registerAll()


def applyRules(request):
    for rule in RuleSet.rules:
        obj = rule()
        if obj.match(request):
            obj.apply(request)
            break

Second, an example of a ‘rule’ plugin source file (note that there are no imports at all, it expects to be exec-ed with the interpreter’s existing set of globals available):

class ExampleRule(RuleSet):
    def match(self):
        return False

    def apply(self): 
        pass

Given this, you can see that if ExampleRule is present in the archive the first time that loadFromURL is invoked, but not present the second time it is invoked, ExampleRule will still be listed in the globals() dictionary and will be ‘registered’ again as an available rule. At that time there are no instances of ExampleRule (the only ones that ever existed were created inside the for-loop in applyRules and then left for the GC to cleanup when that loop exited), and there are no subclasses of ExampleRule either. The only remaining references to ExampleRule are the ones owned by the interpreter itself, and my original hypothesis (the ‘evil’ part) was that I’d need to somehow eliminate those references.

Instead, I’m now proposing that in resetRules I’ll iterate over the list of class objects and set an obsolete attribute on each one; later, when registerAll is invoked, if it sees that the object has an obsolete attribute it will skip that object. This still leaves the class object in memory, consuming memory, but that’s not a significant concern at the moment.

That does clear things up. First, there are no other references in the interpreter, the only ones are in the globals() dict itself, which isn’t easy to clean up. I’d recommend not using that at all. Instead for each request create a new empty dict, copy in references to the classes you want, then exec with that as the namespace. (Or potentially use collections.ChainMap if you want to just expose everything.) After you’re done you can simply delete that dict and all the new classes. Pass the dict into registerAll() as well. This also has the bonus that the executed scripts won’t be able to accidentally overwrite your functions or classes, breaking the rest of the code.

Thanks, a minimal example really helps!

There are some minor style things I would change, but the first change I would make is to change registerAll to check the object is a RulesSet subclass before calling register.

Why? Your registerAll spends a lot of time trying, and failing, to call a method which does not and can not exist on objects which don’t have it, and would do the completely wrong thing if it did exist!

You never know when some other change to your code will introduce a global variable that includes a register() method, and who knows what doing that will do?

import inspect

def is_registerable(obj):
    return (obj is not RulesSet 
            and inspect.isclass(obj) 
            and issubclass(obj, RulesSet)
            )

def registerAll():
    # We don't care what the *names* of the global variables are,
    # (the keys), only their value.
    for obj in globals().values():
        if is_registerable(obj):
            obj.register()

I expect that will be much safer, and possibly faster as well.

(Of course it assumes all your rules classes are subclasses of RulesSet.)

If that change doesn’t break your tests, time to move on to a bigger change:

class RuleSet(object):
    rules = list()

    @classmethod
    def resetRules(cls):
        # Remove existing rules.
        ns = globals()
        for obj in cls.rules:
            assert obj is not RulesSet
            name = obj.__name__
            if name in ns and ns[name] is obj:
                del ns[name]
        cls.rules = list()

I think that this will solve your problem.

Assuming that doesn’t break anything, the next step is avoid using the actual globals as the namespace for exec, but that (I think) has a higher likelihood of breaking, and if this tool is expected to be retired in a year it might not be worth the effort.

2 Likes

Thanks! Yes, all of the rules classes are subclasses of RulesSet.

As far as tests go, you won’t be surprised to learn that there isn’t much test coverage, so testing is a mostly manual process. Thankfully we have a staging environment I can use.

The help is very much appreciated, I think these changes will be the simplest way to solve the problem.

With some minor tweaks (due to the parts of the code you couldn’t see!) this is working perfectly.