I have some thought about the standard library’s
The primary innovation that I’m proposing is a new format for representing graph-like structures.
- Nodes can be any hashable python object,
- Edges are specified by a function which takes the form:
edges(node) -> iterable-of-nodes.
When edges are directed: the argument is the source and the return values are the destinations.
This API can be used on arbitrary data-structures, without ever needing to modify your data-structures to accommodate these algorithms!
This new API solves the following usability issues which all other graph libraries suffer from:
Users don’t store their data in explicit graph structures.
- Instead, they store their data in custom OOP class structures which are opaque to the std-lib.
- Therefore the user must construct the libraries “Graph” structure.
- Necessitating that they duplicate a lot of data.
- And if the underlying data structure ever changes, then they must either update or rebuild the Graph structure.
Other graph algorithms (not topological sorting) accept potentially infinite graphs, e.g. IDDFS, Dykstra’s shortest path, A* pathfinding.
The standard library’s
graphlib.TopologicalSorter uses a dictionary to specify the nodes and edges.
These 3rd party libraries provide a “Graph” class which users are required to use:
These “Graph” classes typically are not compatible between different 3rd party library implementations.
- Software for Complex Networks — NetworkX 2.6.2 documentation
- graphviz · PyPI
- graph-tools · PyPI
- graph-dfs · PyPI
The following libraries use compressed sparse matrices. Sparse matrices are a very efficient format. Numpy/scipy make it easy to convert to/from them and so provides good interoperability with the rest of the software ecosystem.
I’ve ranked them by how useful I think they are:
- Preorder depth first traversal
- Postorder depth first traversal
- Topological sort
- A-Star pathfinding
- Strongly connected components
- Depth first search (returning the final stack / search path)
- Iterative Deepening depth first search (shortest path on infinite graphs)
- Dykstra’s shortest path (technically a subset of A* but should have its own public API entry point for ppl who don’t know how to do A*)
I have a prototype of this on pypi: graph-algorithms · PyPI
import graph_algorithms # Make a data structure to represent a person and their familial relations. class Person: def __init__(self, name, mom=None, dad=None): self.name = name self.mom = mom self.dad = dad self.children =  if self.mom is not None: self.mom.children.append(self) if self.dad is not None: self.dad.children.append(self) def __repr__(self): return self.name # Make a family of people. # Grandparents James = Person("James") Patricia = Person("Patricia") Joseph = Person("Joseph") Margaret = Person("Margaret") # Parents Michael = Person("Michael", mom=Patricia, dad=James) Susan = Person("Susan", mom=Patricia, dad=James) Richard = Person("Richard", mom=Margaret, dad=Joseph) David = Person("David", mom=Margaret, dad=Joseph) # Grandchildren Sandra = Person("Sandra", mom=Susan, dad=Richard) Ashley = Person("Ashley", mom=Susan, dad=Richard) # Now let's traverse Susan's family tree: print("") print("Ancestors of Susan:") known_parents = lambda person: filter(None, [person.mom, person.dad]) for person in graph_algorithms.depth_first_traversal(Susan, known_parents): print(person) print("") print("Descendants of Susan:") for person in graph_algorithms.depth_first_traversal(Susan, lambda p: p.children): print(person) print("")
Ancestors of Susan: Susan Patricia James Descendants of Susan: Susan Sandra Ashley
The important thing to notice about this example is that the
Person class and the graph library’s
depth_first_traversal are totally decoupled.
Personclass does not care about the graph library.
- The depth first traversal does not care about the
- They only interaction between the traversal-code and the Person-class is through the adjacency function (the lambda’s in the example).
As a result, the depth-first-traversal can be applied to different sets of edges by simply swapping out the adjacency function. In this example: one set of edges describes the parents, and the other set of edges describes the children.