Add binarytree to stdlib

akulakov · July 19, 2021, 6:27pm

I think it would be nice to add binarytree package or similar functionality to stdlib.

Binarytree is a Python library which lets you generate, visualize, inspect and manipulate binary trees.

from binarytree import Node

root = Node(1)
root.left = Node(2)
root.right = Node(3)
root.left.right = Node(4)

print(root)
#
#      __1
#     /   \
#    2     3
#     \
#      4

discoverability: I suspect many people try to print their own representation of binary tree when studying them / experimenting
one of the few basic building blocks of CS / programming
Python has heapq module and probably other modules that use binary trees.
Python is often used to study CS topics and this would be a convenient aid
low maintenance! (I can be the maintainer if needed).
small size
perhaps can be expanded to support display of simple graphs?

Thanks for reading! -andrei

sweeneyde · July 20, 2021, 5:34am

Are there any real-life problems where a generic binary tree like this would be best? We already have dicts and sets and the heapq module, and you can check out sortedcontainers on PyPI if you need fast data structures that maintain sort order.

I don’t think it’s worth including something in the standard library purely based on theoretical or CS-pedagogical reasons. We don’t need separate implementations of heapsort and quicksort and shellort and bubble sort, even though those are sometimes studied in CS classes. Instead, we have one method: list.sort(), and people can use it without caring how it’s implemented. The Zen of Python mentions that we should prefer having only one obvious way of doing things.

I see binary trees as a conceptual tool for implementing some data structures, but Python users shouldn’t have to care about the implementation details of the data structures they use.

I’m also worried that “binary tree” is a bit too general/vague to be useful: each application of binary trees (red-black, AVL, binary heaps, search trees, segment trees, etc.) has particular invariants and operations that it needs to maintain, and tying all of the implementations together makes it confusing what invariants actually hold. It also prevents problem-specific optimizations and design choices, like whether the tree can be stored flatly as an array or whether each node has a reference to its parent or whether a node knows its number/depth of subtree, etc. The only thing that all of these structures share is some notion of a node having left and right children, and that’s the easy part that can be re-implemented as needed.

encukou · July 20, 2021, 7:56am

Right; while it looks like a nice module, it’s not a good fit for the standard library.
There are also many reasons why you wouldn’t want that; for example:

Jupyter/Graphviz integration (adding these dependencies to Python’s test suite would be difficult, and we can’t support untested features)
On PyPI, you can release changes/improvements as often as you want