I’m moving into a project with an inheritance scheme
+----------------+
| Library Class |
+----------------+
^
|
+----------------+
| Our First Class|
+----------------+
^ ^ ^ ^ ^ ^
| | | | | |
+----+ +----+ +----+ +----+ +----+ +----+
| A | | B | | C | | D | | E | | F |
+----+ +----+ +----+ +----+ +----+ +----+
^ ^ ^ ^ ^ ^
| | | | | |
+-------+-------+-------+-------+-------+
|
+----------------+
| Final Class |
+----------------+
I both need to understand what is happening in the code, and I have an opportunity to refactor / clarify the structure of the code before it grows further in complexity.
Question 1:
Is there a way to gain insight into where attributes are defined, modified, and used?
With “where” I mean I want to know which classes defined the methods that accessed the attributes.
Question 2:
I would like to use composition instead of inheritance for some of what’s happening. ie replace some of the constructions of First < B < Final by the class Final instead having an attribute B.
But I can’t take that to its logical extreme. (Because I don’t control the library, among other things.)
I don’t have any specific question, but I’m open to advice on how to approach this.
I’d also like the code to provide more clarity in which attributes are mutating and which are (or can be treated as) frozen.
Question 3:
In order to provide better testability, I’d like a way to swap out the library with a dummy class with preset variables. Someone I respect said “if you need to monkeypatch to test your code, your architecture could be improved.”
I think I could inherit from a Protocol/ABC instead of the library class, and fuse the library class in at the last moment instead of the current construction. Is that a good idea?
I imagine that the number one thing I would want to change in this situation is to stop subclassing the library class. Subclassing creates strong coupling so I would avoid doing it across codebase boundaries. If you were using LibraryClass via composition then that would also solve your testing problem.
Is there some external reason that any of these things needs to subclass LibraryClass?
Are there more concrete classes inheriting from different combinations of these classes like FinalClass1, FinalClass2 etc or do all these classes just exist to define FinalClass as the everything class?
I would probably grep for attribute/method names and some other things but ultimately you just have to go read all the code to understand how these classes combine. One of the downsides of using a big inheritance graph like this is that you have to read through every one of those superclasses before you can understand any concrete class like FinalClass actually works. Worse is if there is also FinalClass2 etc because then you can’t understand/change any of the superclasses without first understanding all of the other subclasses as well.
A good IDE will be able to tell you this assuming the code has decent type annotations (not necessarily required). I know PyCharm can do this, I am pretty sure VSCode can as well. Note that depending on complexity of the code this might still be overwhelming and hard to understand.
That really depends on what the class is actually supposed to do and I can’t provide a good answer. Generally, don’t bend your code too much just to make it more testable - if an ABC actually makes sense, use it.
the library class loads various things in the init, then makes sure .setup() is called, which we’re supposed to use like a pseudo-init, and then it calls .algorithm() every 10 seconds.
That last part is the reason we (might) need to subclass. By it’s nature of running every 10 seconds .algorithm() owns the process, and so it needs access to all attributes and methods, and hence it needs to be at the top of the owner-property-relationship pyramid.
details might matter, so I include the code that does this below:
async def __autorun(self):
"""Autorun handler triggers all active contracts or timeslots every n seconds."""
self.logger.info("Starting autorun daemon.")
while True:
self.logger.info("Autorunning.")
try:
if self.pure_autorun:
await self.algorithm(trigger=None)
else:
...details...
except Exception as ex:
self.logger.error(f"Autorun failed: {ex}", exc_info=True)
await asyncio.sleep(self.config.get("autorun_seconds"))
I think it might be possible to copy this function over into my own class, and then to use the rest of the library class as an attribute. I can investigate.
Fortunately, there is only 1 FinalClass.
Edit: Would there be a way to specifically copy the .__autorun() method from the library class, so that if the library updates it in a useful way I get the benefits? Or would that be a dangerous code pattern?
OK, I found ‘Go to References’. Just knowing it existed helped.
But that only shows me a collection of all references, with initial assignment, value mutation and read-access all listed together.
I’d like to group those references by “here they assign to it” and “here they read its value”. And to go a step further and highlight the first assignment in a (debug) run.
For PyCharm, yes. You should be able to find a button with something like “Open in Find window” which then pops it into the bottom bar where it is then grouped by access type and/or file. Also check the docs for this, I am sure there are more features that can help you that I don’t know about.
I am less sure about a generic way to do this - you can go through and set a breakpoint at all such locations, but I am not sure if PyCharm has a generic option to do this. You can watch a variable, but only if the owning object already exists.
It sounds like the library has been designed in such a way that it requires subclassing to be able to be used. I would draw a line under it right there though and have one very small subclass that holds everything else using composition:
It is an underscore private method which indicates that the library doesn’t want you to touch it directly. Do with that what you will but if I was the library author I would expect to be able to make incompatible changes to that method all the way up to deleting it entirely.
I recently had this situation where one big class was divided into a graph of 20 abstract superclasses that were never used for anything else except making that one big class that. In the end I decided that the basic way to simplify the code was just to flatten the whole hierarchy into one class defined in one module. Now at least the code is all in one place and people don’t need to worry about which of the superclasses may or may not be subclassed by downstream code. If the different classes had been managing different state though then I would not have done this and would probably be thinking about trying to track the state management as you are.
Go to the top of your hierarchy and add __getattribute__ and __setattr__ methods. Stick some code in there to print out what is happening and add some checks like if attr == 'someattrname': then either log everything ot raise an exception there or use breakpoint() to enter a debugger etc. Now run the whole programme or run the test suite and look at the logs or tracebacks or drop into the debugger etc.