Class variables: individual or single library

strantor · April 21, 2021, 5:50pm

I have a rather large multi-threaded Python script (using PyQt5’s threading provisions: QRunnable, QThreadPool, etc.) which performs at least 10 repetitive, concurrent functions and runs in perpetuity. For each of these threaded functions, I pass class variables from the PyQt GUI parent thread as arguments into the function and put the function into the queue/QThreadPool. When the function runs, it emits signals which I copy back into class variables. When the function ends, repeat.

The latest function that I added to the script is one where I can send an email to request the status of various things (the values of various class variables) and it will reply with whatever I asked for. There are hundreds of class variables, probably >1,000. Formatting this data for email is proving to be quite a chore. I have to deliberately add every single variable that I want to send. For example, let’s say I request the state of variables related to the “override” function by sending an email containing “OVERRIDE INFO”:

    if self.emailReceived:
        emailStatusData = {}
        if self.dataRequested == "OVERRIDE INFO":
            emailStatusData["systemsEnabled"] = self.systemsEnabled
            emailStatusData["waitingOnApproval"] = self.waitingOnApproval
            emailStatusData["overrideMode"] = self.overrideMode
            emailStatusData["overrideState"] = self.overrideState
            # (etc., many more variables)
        elif self.dataRequested == "CLEANOUT INFO":
            emailStatusData["hopperEmpty"] = self.hopperEmpty
            emailStatusData["cleanoutRequired"] = self.cleanoutRequired
            emailStatusData["cleanoutPerformed"] = self.cleanoutPerformed
            emailStatusData["cleanoutRunning"] = self.cleanoutRunning
            # (etc., many more variables)
        # elif ...: (etc., many more options)
        self.sendReply(emailStatusData)

I would get the values of [systemsEnabled, waitingOnApproval, overrideMode, overrideState, et al.].

As I expand the options for these email requests, I have to scour my code, collect all the variables I want to be part of each group, and add them to the appropriate cluster of emailStatusData["thing"] = self.thing lines. It’s a pain. And I’m adding new class variables all the time (this script is under perpetual development), so every time I create a new one I have to remember to go add it to one of these clusters.

So to make things easier, what I’m thinking of doing, is replacing all of my class variables with a single (nested) dictionary, so:

self.systemsEnabled     becomes     self.systemVariables["overrideData"]["systemsEnabled"]
self.waitingOnApproval  becomes     self.systemVariables["overrideData"]["waitingOnApproval"]
self.overrideMode       becomes     self.systemVariables["overrideData"]["overrideMode"]
self.overrideState      becomes     self.systemVariables["overrideData"]["overrideState"]
(etc. many more variables)

self.hopperEmpty        becomes     self.systemVariables["cleanoutData"]["hopperEmpty"]
self.cleanoutRequired   becomes     self.systemVariables["cleanoutData"]["cleanoutRequired"]
self.cleanoutPerformed  becomes     self.systemVariables["cleanoutData"]["cleanoutPerformed"]
self.cleanoutRunning    becomes     self.systemVariables["cleanoutData"]["cleanoutRunning"]
(etc. many more variables)

I could write a quick script to automate the changeover, and then from there, my variables are all grouped logically, and sending the right data in the email reply becomes as easy as:

    if self.emailReceived:
        if self.dataRequested == "OVERRIDE INFO":
            self.sendReply(self.systemVariables["overrideData"])
        elif self.dataRequested == "CLEANOUT INFO":
            self.sendReply(self.systemVariables["cleanoutData"])
        # elif ...: (etc., many more options, INCLUDING...)
        elif self.dataRequested == "ALL SYSTEM INFO":
            self.sendReply(self.systemVariables)
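
Once the data is grouped like this, the if/elif chain itself could be replaced by a lookup table. The following is only a minimal sketch under stated assumptions: `REQUEST_GROUPS` and `build_reply` are hypothetical names that do not appear in the original script, and the sample values are made up for illustration.

```python
# Hypothetical mapping from the request text to a group key in the nested dict.
REQUEST_GROUPS = {
    "OVERRIDE INFO": "overrideData",
    "CLEANOUT INFO": "cleanoutData",
}


def build_reply(system_variables, data_requested):
    """Return the data to e-mail back for a given request string."""
    if data_requested == "ALL SYSTEM INFO":
        return system_variables
    group = REQUEST_GROUPS.get(data_requested)
    if group is None:
        return {}  # unknown request: reply with nothing
    return system_variables[group]


# Illustrative data only; the real script would hold its own values.
system_variables = {
    "overrideData": {"systemsEnabled": True, "overrideMode": "manual"},
    "cleanoutData": {"hopperEmpty": False, "cleanoutRequired": True},
}

print(build_reply(system_variables, "CLEANOUT INFO"))
```

With this shape, adding a new request type means adding one entry to the table rather than another elif branch.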

It seems like a fine idea to me, but what I don’t know (among most other things) is the performance characteristics of Python’s dictionary objects vs class variables. These class variables are being read by and written by multiple threads at once, some of them change once per day, some of them change thousands of times per second, etc.

By putting them all into a single dictionary would I be creating a bottleneck where only one variable can be written at a time?

(or is that how it is already?)

By putting them all into a single dictionary would I be creating a bottleneck where only one variable can be read at a time?

(or is that how it is already?)

Would I be adding some delay each time a variable is accessed, as now it has to be looked up in the dictionary?

(or is that how it is already?)

I don’t really know what’s going on beneath all the layers of the Python onion, so I don’t know what to expect.

I don’t even know if these are the right questions to be asking.

What else is there to consider?

strantor · April 21, 2021, 11:26pm

I have been searching for information on how Python handles variables/attributes in the background, as a baseline to compare against how it handles dictionaries, and found it said somewhere on StackExchange that Python actually stores all of this information in a dictionary. With that simple explanation, it seems it would not affect the speed or memory consumption of my application at all if I replaced my class attributes with a dictionary. It’s already a dictionary. But I was not totally convinced, so I dug deeper and found this, which has the following to say:

In Python, all instance variables are stored as a regular dictionary. When working with attributes, you just changing a dictionary

As I said earlier, class attributes are owned by a class itself (i.e., by its definition). As it turns out, classes are using a dictionary too.

Dictionaries of classes are protected by mappingproxy . The proxy checks that all attribute names are strings, which helps to speed-up attribute lookups. As a downside, it makes dictionary read-only.

So there’s the “gotcha” I guess, the thing that made the StackExchange answer smell too good to be true. But “speed up” is not very descriptive. How much slower would it be without the mappingproxy? I still can’t find data about the time these things take (accessing dictionary kv pairs vs accessing class attributes). Is the mappingproxy-protected dictionary 10% faster to access? 1000% faster?
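
The distinction the blogger draws can be checked directly in an interpreter. This is a small sketch, not a benchmark: the class `C` and the attribute names are made up for illustration. It shows that an instance’s `__dict__` is a plain dict, while a class’s `__dict__` is a read-only `mappingproxy`.

```python
import types


class C:
    a = 1          # class attribute, stored in the class namespace


c = C()
c.b = 2            # instance attribute, stored in c.__dict__

print(type(C.__dict__))   # the read-only proxy around the class namespace
print(type(c.__dict__))   # a plain dict

proxy_error = None
try:
    C.__dict__["x"] = 3   # item assignment through the proxy is refused
except TypeError as exc:
    proxy_error = exc

setattr(C, "x", 3)        # the supported way to add a class attribute
print(C.x, proxy_error)
```

Note that the proxy only concerns the class’s own namespace; attributes assigned on `self` live in the instance’s ordinary dict.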

strantor · April 22, 2021, 12:19am

OK I wrote a little script to test the speed of class attributes vs dictionary entries:

    import time
    import random
    
    class classA:
    
        def __init__(self):
            self.doThing1()
            self.doThing2()
    
        def doThing1(self):
            # create 10,000 new class attributes and assign a value !=1000
            for i in range (0,10000):
                name = "attribute" + str(i)
                value = i + (i+1)
                if value == 1000:
                    value = 999
                setattr(self,name,value)
            #print(self.__dict__)
    
        def doThing2(self):
            for i in range (0,10000):
                name = "attribute" + str(random.randint(0,10000))
                self.__dict__[name] = 1000
            #print(self.__dict__)
    
    class classB:
    
        def __init__(self):
            self.attribute = {}
            self.doThing1()
            self.doThing2()
    
        def doThing1(self):
            # create 10,000 new class attributes and assign a value !=1000
            for i in range (0,10000):
                name = "attribute" + str(i)
                value = i + (i+1)
                if value == 1000:
                    value = 999
                self.attribute["name"] = value
            #print(self.__dict__)
    
        def doThing2(self):
            for i in range (0,10000):
                name = "attribute" + str(random.randint(0,10000))
                self.attribute[name] = 1000
            #print(self.__dict__)
    
    classAruntimes = []
    classBruntimes = []
    
    for i in range(0,10):
        start = time.time()
        for i in range (0,100):
            myStuff = classA()
        runtime = round(time.time()-start,4)
        classAruntimes.append(runtime)
        print("classA :",runtime)
    
        start = time.time()
        for i in range (0,100):
            myStuff = classB()
        runtime = round(time.time()-start,4)
        classBruntimes.append(runtime)
        print("classB :",runtime)
    
    print("class A runtimes:", classAruntimes)
    print("class B runtimes:", classBruntimes)

Here’s the result:

class A runtimes: [2.8236, 2.73,   2.7144, 2.7144, 2.6676, 2.652,  2.6676, 2.886,  2.7612, 2.73  ]
class B runtimes: [2.574,  2.6208, 2.496,  2.496,  2.4804, 2.5272, 2.652,  2.6364, 2.6364, 2.5428]

So using a dictionary is consistently faster than using class attributes?
Or is my test invalid for some reason?

BowlOfRed · April 22, 2021, 12:39am

Is that supposed to be self.attribute[name] = value?
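
The difference matters because the string literal "name" is always the same key, so each pass through the loop overwrites one entry instead of adding a new one. A minimal illustration with a shortened loop (the variable names `literal_keys` and `variable_keys` are just for this sketch):

```python
# Using the literal "name": every iteration writes to the same key.
attribute = {}
for i in range(5):
    name = "attribute" + str(i)
    attribute["name"] = i
literal_keys = list(attribute)

# Using the variable name: every iteration creates a new key.
attribute = {}
for i in range(5):
    name = "attribute" + str(i)
    attribute[name] = i
variable_keys = list(attribute)

print(literal_keys)
print(variable_keys)
```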

steven.daprano · April 22, 2021, 11:03am

Hi Charles,

You quoted a blogger:

"""
In Python, all instance variables are stored as a regular dictionary.
"""

To be pedantic, that was true back in the Python 1.x days, but it hasn’t
been true since about version 2.2. Now it is only most of them which
are stored in a regular dictionary. The others are stored in slots,
which use less memory, but are less flexible. Most people don’t bother
to use slots, so I won’t talk further about them now.
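
For readers who have not met slots, here is a brief sketch of the difference. The class names `WithDict` and `WithSlots` are invented for illustration.

```python
class WithDict:
    def __init__(self):
        self.a = 1


class WithSlots:
    __slots__ = ("a",)   # only the attribute "a" is allowed; no per-instance dict

    def __init__(self):
        self.a = 1


d = WithDict()
s = WithSlots()

print(hasattr(d, "__dict__"))   # ordinary instances carry a dict
print(hasattr(s, "__dict__"))   # slotted instances do not

d.b = 2                          # fine: a new key in d.__dict__
slot_error = None
try:
    s.b = 2                      # refused: "b" is not a declared slot
except AttributeError as exc:
    slot_error = exc
print(slot_error)
```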

By the way, the preferred terminology in Python circles is attribute,
not “variable”.

You then go on to ask:

"""
Is the mappingproxy-protected dictionary 10% faster to access? 1000%
faster?
"""

Let’s find out!

[steve ~]$ python3.9 -m timeit -s "class C: a=1" "C.a"
10000000 loops, best of 5: 22.4 nsec per loop

[steve ~]$ python3.9 -m timeit -s "class C: a=1" -s "c=C(); c.a = 2" "c.a"
10000000 loops, best of 5: 29 nsec per loop

So roughly 20-25% faster. How does that compare to a dictionary lookup?

[steve ~]$ python3.9 -m timeit -s "d = {'a': 1}" "d['a']"
10000000 loops, best of 5: 21.4 nsec per loop

You ought to prefer the timeit module for timing code over anything you
write yourself. It is carefully written to minimise any overhead and
gives you lots of options for controlling what is timed.

https://docs.python.org/3/library/timeit.html
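
The same three measurements can also be run from inside a script with `timeit.repeat`. This is a sketch rather than a definitive benchmark: the helper `best_ns` and the loop counts are arbitrary choices for illustration, and the numbers will vary by machine and Python version.

```python
import timeit


class C:
    a = 1


c = C()
c.a = 2
d = {"a": 1}

# Names the timed statements are allowed to see.
namespace = {"C": C, "c": c, "d": d}


def best_ns(stmt, number=200_000, repeat=5):
    """Best-of-`repeat` time for `stmt`, in nanoseconds per execution."""
    runs = timeit.repeat(stmt, globals=namespace, number=number, repeat=repeat)
    return min(runs) / number * 1e9


results = {
    "class attribute C.a": best_ns("C.a"),
    "instance attribute c.a": best_ns("c.a"),
    "dict lookup d['a']": best_ns("d['a']"),
}

for label, nanoseconds in results.items():
    print(f"{label}: {nanoseconds:.1f} ns")
```

Taking the minimum of several runs follows the advice in the timeit documentation, since the higher values mostly reflect interference from other processes.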

strantor · April 22, 2021, 1:52pm

DOH! yes, thank you for catching that. I corrected it, and got similar results.

class A runtimes: [2.7144, 2.6832, 2.73,   2.6676, 2.6988, 2.6832, 2.7144, 2.7144, 2.7144, 2.6832]
class B runtimes: [2.5896, 2.574,  2.5896, 2.5896, 2.5896, 2.6052, 2.5896, 2.6208, 2.6364, 2.6052]

strantor · April 22, 2021, 7:14pm

So if I’m reading that correctly, it seems that the dictionary lookup was faster than the class attribute lookup. If that’s right, then it jibes with the results of my first script, and with the results of the script I’m about to post, and does not jibe with what was said by the blogger.

I fixed the flub in my first script and still got the same results. But I was not happy with that script, as it only tested class attributes against a single tier dictionary, and what I’m really interested in is a many-tier nested dictionary (so my attributes can be more organized). So I wrote this new one that has two new classes for nested dictionaries:

(sorry, I did not implement the timeit module yet, but I will going forward)

import time
import random

class classA:

    def __init__(self):
        self.doThing1()
        self.doThing2()

    def doThing1(self):
        # create 100,000 new class attributes and assign a value of something other than 1,000
        for i in range (0,100000):
            name = "attribute" + str(i)
            value = random.randint(0,1000)
            if value == 1000:
                value = 999
            setattr(self,name,value)

    def doThing2(self):
        # 100,000 times, look up a random attribute and change its value to 1,000
        for i in range (0,100000):
            name = "attribute" + str(random.randint(0,10000))
            setattr(self,name,1000)


class classB:

    def __init__(self):
        self.attribute = {}
        self.doThing1()
        self.doThing2()

    def doThing1(self):
        # create a single one-level nested dictionary attribute with 100,000 entries and assign
        # a value of something other than 1,000 to each
        for i in range (0,100000):
            name = str(i)
            value = random.randint(0,1000)
            if value == 1000:
                value = 999
            self.attribute[name] = value

    def doThing2(self):
        # 100,000 times, look up a random key in the dict attribute and change its value to 1,000
        for i in range (0,100000):
            name = str(random.randint(0,10000))
            self.attribute[name] = 1000

class classC:

    def __init__(self):
        self.doThing1()
        self.doThing2()

    def doThing1(self):
        # create 100,000 new class attributes and assign a value of something other than 1,000 to each
        # (There are simpler ways to do this (e.g. classA), but this keeps it consistent with classD.)
        for i1 in range (0,10):
            for i2 in range(0, 10):
                for i3 in range(0, 10):
                    for i4 in range(0, 10):
                        for i5 in range(0, 10):
                            name = "attribute"+str(i1)+str(i2)+str(i3)+str(i4)+str(i5)
                            value = random.randint(0,1000)
                            if value == 1000:
                                value = 999
                            setattr(self,name,value)


    def doThing2(self):
        # 100,000 times, look up a random class attribute and change its value to 1,000
        for i1 in range (0,10):
            tier1 = str(random.randint(0,9))
            for i2 in range(0, 10):
                tier2 = str(random.randint(0, 9))
                for i3 in range(0, 10):
                    tier3 = str(random.randint(0, 9))
                    for i4 in range(0, 10):
                        tier4 = str(random.randint(0, 9))
                        for i5 in range(0, 10):
                            tier5 = str(random.randint(0, 9))
                            name = "attribute" + str(tier1)+str(tier2)+str(tier3)+str(tier4)+str(tier5)
                            setattr(self,name,1000)

class classD:

    def __init__(self):
        self.attribute = {}
        self.doThing1()
        self.doThing2()

    def doThing1(self):
        # create a single 5-level nested dictionary attribute with 100,000 entries and assign
        # a value of something other than 1,000 to each
        for i1 in range (0,10):
            self.attribute[str(i1)] = {}
            for i2 in range (0,10):
                self.attribute[str(i1)][str(i2)] = {}
                for i3 in range(0, 10):
                    self.attribute[str(i1)][str(i2)][str(i3)] = {}
                    for i4 in range(0, 10):
                        self.attribute[str(i1)][str(i2)][str(i3)][str(i4)] = {}
                        for i5 in range(0, 10):
                            value = random.randint(0,1000)
                            if value == 1000:
                                value = 999
                            self.attribute[str(i1)][str(i2)][str(i3)][str(i4)][str(i5)] = value

    def doThing2(self):
        # 100,000 times, look up a random key in the nested dict attribute and change its value to 1,000
        for i1 in range (0,10):
            tier1 = str(random.randint(0,9))
            for i2 in range(0, 10):
                tier2 = str(random.randint(0, 9))
                for i3 in range(0, 10):
                    tier3 = str(random.randint(0, 9))
                    for i4 in range(0, 10):
                        tier4 = str(random.randint(0, 9))
                        for i5 in range(0, 10):
                            tier5 = str(random.randint(0, 9))
                            self.attribute[tier1][tier2][tier3][tier4][tier5] = 1000

classAruntimes = []
classBruntimes = []
classCruntimes = []
classDruntimes = []
iters = 10
for iteration in range(0,iters):
    start = time.time()
    for i in range (0,10):
        myStuff = classA()
    runtime = round(time.time()-start,5)
    classAruntimes.append(runtime)

    start = time.time()
    for i in range (0,10):
        myStuff = classB()
    runtime = round(time.time()-start,5)
    classBruntimes.append(runtime)

    start = time.time()
    for i in range(0, 10):
        myStuff = classC()
    runtime = round(time.time() - start, 5)
    classCruntimes.append(runtime)

    start = time.time()
    for i in range(0, 10):
        myStuff = classD()
    runtime = round(time.time() - start, 5)
    classDruntimes.append(runtime)
    print("Iteration",(iteration+1),"of",iters,"complete.")

def avg(timeList):
    avg = 0
    for value in timeList:
        avg += value
    avg = avg/len(timeList)
    return round(avg,5)
print("class A runtimes:", classAruntimes, "average:",avg(classAruntimes))
print("class B runtimes:", classBruntimes, "average:",avg(classBruntimes))
print("class C runtimes:", classCruntimes, "average:",avg(classCruntimes))
print("class D runtimes:", classDruntimes, "average:",avg(classDruntimes))

Here are the results:

class A runtimes: [4.56826, 5.18130, 5.09429, 4.61726, 5.35331, 4.69626, 4.44063, 4.44963, 4.40563, 4.89768] average: 4.77043
class B runtimes: [4.50026, 4.50126, 4.03561, 3.91660, 5.42831, 4.02703, 4.03323, 3.91722, 3.83122, 4.08623] average: 4.22770
class C runtimes: [6.81964, 6.81471, 6.66738, 6.66138, 6.83636, 6.48337, 6.56238, 6.46137, 6.43137, 6.61538] average: 6.63533
class D runtimes: [5.82065, 5.58732, 5.66032, 6.01334, 5.67632, 5.41231, 5.37531, 5.37131, 5.38068, 5.44031] average: 5.57379

Breaking down the data:

classA (100,000 class attributes) Vs. classB (single-level dictionary with 100,000 entries):
- It is 12.85% faster to use a single class attribute (single-level dictionary) to store 100,000 variables than it is to use 100,000 class attributes
classC (100,000 class attributes) Vs. classD (5-level dictionary with 100,000 entries):
- There is probably a simpler way to write classD but I didn’t find it.
- Because classD is a bit convoluted, I wrote a rev of classA in the same convoluted way as classD, and called it classC.
- classD was 19.05% faster than classC
classA (100,000 class attributes, simple) Vs. classC (100,000 class attributes, convoluted):
- The only point of classC is to isolate the portion of classD’s extra runtime over classB that is due to the convoluted way I wrote it
- classA and classC are doing the exact same thing, but classC took 1.8649 seconds longer than classA
classB (single-level dictionary with 100,000 entries) Vs. classD (5-level dictionary with 100,000 entries):
- Since 1.8649 seconds can be attributed to convoluted code, classD’s “corrected” time would be 3.70889 seconds.
- classE (theoretical, 5-level dictionary with 100,000 entries, NOT convoluted) would probably be:
  - the winner of all of these classes.
  - 14.00% faster than classB (single-level dictionary with 100,000 entries)
  - 28.62% faster than classA (100,000 class attributes)
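
The arithmetic behind the breakdown above can be reproduced from the four averages. In this sketch the helper name `pct_faster` is made up, and it assumes the percentages are taken relative to the faster (smaller) time, which is the convention that matches the posted figures to within rounding.

```python
# Average runtimes in seconds, copied from the results above.
avg = {"A": 4.77043, "B": 4.22770, "C": 6.63533, "D": 5.57379}


def pct_faster(slow, fast):
    """Percentage by which `slow` exceeds `fast`, relative to `fast`."""
    return (slow - fast) / fast * 100


# Extra time attributed to the convoluted loop structure (classC vs. classA).
overhead = avg["C"] - avg["A"]

# "Corrected" classD time, i.e. the theoretical classE.
corrected_d = avg["D"] - overhead

print("B vs A:", round(pct_faster(avg["A"], avg["B"]), 2))
print("D vs C:", round(pct_faster(avg["C"], avg["D"]), 2))
print("overhead:", round(overhead, 5))
print("corrected D:", round(corrected_d, 5))
print("E vs B:", round(pct_faster(avg["B"], corrected_d), 2))
print("E vs A:", round(pct_faster(avg["A"], corrected_d), 2))
```

Dividing by the slower time instead would give smaller percentages, so it is worth stating which convention is in use when comparing runs.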

So, my conclusion is that, contrary to my own suspicions and contrary to what at least one blogger more knowledgeable than myself has said, a single dictionary class attribute is appreciably faster than multiple class attributes. And even more counterintuitively, the deeper you nest your data, the faster it is to access it.

-OR, (equally or more likely) I’m such a bad programmer that I can’t even write an effective script to answer a simple question.

Does anyone expressly agree or disagree with my conclusion or the way I arrived at it?
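
On the remark that there is probably a simpler way to write classD: one possibility is to generate the key paths with `itertools.product` instead of five hand-written nested loops. This is only a sketch under stated assumptions. The helpers `build_nested`, `set_leaf` and `count_leaves` are hypothetical names, and nothing here has been timed against the original classes.

```python
import itertools
import random

DIGITS = [str(n) for n in range(10)]


def build_nested(depth=5):
    """Build a `depth`-level nested dict keyed by digit strings."""
    root = {}
    for path in itertools.product(DIGITS, repeat=depth):
        node = root
        for key in path[:-1]:
            node = node.setdefault(key, {})   # create inner dicts on demand
        node[path[-1]] = random.randint(0, 999)  # any value other than 1,000
    return root


def set_leaf(root, path, value):
    """Walk down `path` and overwrite the leaf value."""
    node = root
    for key in path[:-1]:
        node = node[key]
    node[path[-1]] = value


def count_leaves(node):
    """Count the non-dict values stored anywhere below `node`."""
    if not isinstance(node, dict):
        return 1
    return sum(count_leaves(child) for child in node.values())


nested = build_nested()
set_leaf(nested, ("1", "2", "3", "4", "5"), 1000)
print(count_leaves(nested), nested["1"]["2"]["3"]["4"]["5"])
```

The walk in `set_leaf` performs the same five dictionary lookups as the chained indexing in classD, so any speed difference would come from loop overhead rather than from the data structure.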