Writing a Binary Search Tree algorithm: proper __str__(self) method needed

I’m learning algorithm design.

Some time ago I completed a binary search algorithm.

I am now exploring Binary Search Trees.

I have a Python code snippet. After spending a considerable amount of time chasing many cryptic but trivial TypeErrors and NameErrors, passing in test data seems to work. The script runs. When I run it using my debugger, as I step through line by line with each pass, all the lower integer values are inserted to the left nodes and higher number integer nodes are inserted to the right. So it works.

The only detail as far as I can tell that remains outstanding is getting a valid string representation to work for more helpful output when the final test print statement is executed. My expected output is: ((18 21 ) 47 (52 76 (82) )) but my Python interpreter is returning the object’s hash location in memory. This is my main question for all of you. What would you people recommend I try instead to get the test print statement to show something more useful?

Any other comments / feedback / tips from the community here to help optimize and improve my script are welcome.

Thanks!

class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None
    
class BST:
    def __init__(self):
        # newNode = Node(value)
        # self.root = newNode
        self.root = None

    def insert(self, value):
        newNode = Node(value)
        if self.root == None:
            self.root = newNode
            return self
        temp = self.root
        while True:
            if newNode.value == temp.value:
                return None
            if newNode.value < temp.value:
                if temp.left == None:
                    temp.left = newNode
                    return self
                temp = temp.left
            else:
                if temp.right == None:
                    temp.right = newNode
                    return self
                temp = temp.right
        
        def __str__(self):
            return self._str_recursive(self.root)

        def _str_recursive(self, node):
            if node is None:
                return "Empty Tree"
            return f"({self._str_recursive(node.left)} {node.value} {self._str_recursive(node.right)})"

mytree = BST()
mytree.insert(47)
mytree.insert(21)
mytree.insert(76)
mytree.insert(18)
mytree.insert(52)
mytree.insert(82)
print(mytree)

This is simply a typo; you have the __str__ and _str_recursive methods indented inside of insert, instead of putting them directly in the class. After fixing that I see no issue:

>>> print(mytree)
(((Empty Tree 18 Empty Tree) 21 Empty Tree) 47 ((Empty Tree 52 Empty Tree) 76 (Empty Tree 82 Empty Tree)))

(If you don’t want to see Empty Tree for all the leaf nodes, then don’t return it in the recursion; use an empty string instead. If you want something different for the special case where the entire tree is indeed empty, then put that special case outside the recursion - i.e., directly in __str__.)

As an aside, I recommend not making the insert method return self, for the same reasons that e.g. the built-in list’s insert method does not.

Try with this data:

mytree = BST()
mytree.insert(18)
mytree.insert(21)
mytree.insert(47)
mytree.insert(52)
mytree.insert(76)
mytree.insert(82)

Notice that the tree is 6 level deep. A search is worst case O(n) not the optimum O(log2(n)) that would be desirable for a binary tree.

% py3 a.py
(Empty Tree 18 (Empty Tree 21 (Empty Tree 47 (Empty Tree 52 (Empty Tree 76 (Empty Tree 82 Empty Tree))))))

You mission should you accept it is to change the code to balance the tree.

My apologies for the multi month gap in my response. I got distracted. I am back to close the loop.

Here is my latest code snippet:

class Node:
   def __init__(self, value):
       self.value = value
       self.left = None
       self.right = None
  
class BST:
   def __init__(self):
       # newNode = Node(value)
       # self.root = newNode
       self.root = None


   def insert(self, value):
       newNode = Node(value)
       if self.root == None:
           self.root = newNode
           return None
       temp = self.root
       while True:
           if newNode.value == temp.value:
               return None
           if newNode.value < temp.value:
               if temp.left == None:
                   temp.left = newNode
                   return None
               temp = temp.left
           else:
               if temp.right == None:
                   temp.right = newNode
                   return None
               temp = temp.right


      
   def __str__(self):
       return self._str_recursive(self.root)


   def _str_recursive(self, node):
       if node is None:
           return ""
       return f"({self._str_recursive(node.left)} {node.value} {self._str_recursive(node.right)})"


mytree = BST()
mytree.insert(47)
mytree.insert(21)
mytree.insert(76)
mytree.insert(18)
mytree.insert(52)
mytree.insert(82)
print(mytree)

Here is the output:

python script1.py
((( 18 ) 21 ) 47 (( 52 ) 76 ( 82 )))

I corrected the indentation and returned an empty string at that location, as you can see above. It works!

I am not quite sure I understand fully. I took a look at your SO question and it is quite long. The relevant segments as far as I can tell say this:

There were three instances inside my insert method where I was previously returning self. I changed them to None. So it now aligns were more Pythonic best practices. My script still works. The output is the same. But I am not sure why, even after considering those key points from SO.

I understand the task and the use case for when a BST is biased as you describe - - when nodes are positioned sequentially all to one side. It ends up becoming very inefficient and turns into a regular binary search. I get that. As I read your comment I couldn’t figure out where to start. I was totally blank.

For the Udemy course I am taking which challenged me to write my BST in the first place, the instructor provides pseudo code. That is fair game. So I went ahead and asked ChatGPT to write me some pseudo code to balance a biased BST but instead of pseudo code it gave me a complete algorithm in Python (even though I really wanted pure English syntax). I immediately looked away for the sake of preserving my learning experience. This keeps it a mystery for me to discover still. The title ChatGPT gave the algorithm was: “AVL tree”. When I searched Google for that lead I got a very high level detailed theoretical analysis on Wikipedia for the “AVL tree” which includes the actual pseudo code I wanted to learn from and then some!

While I am curious to learn more, I don’t know how far I will be able to get.

It’s quite a challenge - - I am not sure how realistic it would be for any student learning basic algorithms to arrive at the same or equally effective solution as professors and mathematicians, “Adelson-Velsky and Landis” originally did 70 years ago. While it might not be a fair expectation for a newbie like me to change my code to balance the tree by “reinventing the wheel” on my own, now that I have the helpful Wikipedia entry linked to above, I think I have a much better chance at it.

I have a few other coding tasks I am working on. Maybe I will return here to this forum with a new thread if I choose to tackle the balancing of an unequal binary tree.

Thank you @kknechtel and @barry-scott for your support and insight!

It works because you don’t use the return value.

The point is to change the return value, so that you aren’t tempted to use it.

In other languages, people write code like that all the time. But in the Python culture, it’s seen as messy and easy to get wrong.

Yes, I learned this algorithm as a university undergrad. The theory is not too hard if it’s explained properly.

Another common way is to make a red-black tree. As I recall, GCC uses this to implement std::map (a dictionary where keys are kept sorted and you can efficiently ask for a range of keys) for the C++ standard library.

1 Like

It turns into a linear search O(n) not O(log2(n).

AVL tree alogorithm is amazing performance.
Its O(log2(n)) because the tree is re-balenced on each insert.

As @kknechtel observed red-black trees seem to have replaced AVL trees in modern software. AVL trees is from the 1962 and red-black is from 1978. The key difference seems to be that insert is faster with a red-black tree where as lookup is faster for an AVL tree.