Hello,
I have some thought about the standard library’s graphlib
module.
The primary innovation that I’m proposing is a new format for representing graph-like structures.
- Nodes can be any hashable python object,
- Edges are specified by a function which takes the form:
edges(node) -> iterable-of-nodes
.
When edges are directed: the argument is the source and the return values are the destinations.
This API can be used on arbitrary data-structures, without ever needing to modify your data-structures to accommodate these algorithms!
This new API solves the following usability issues which all other graph libraries suffer from:
-
Users don’t store their data in explicit graph structures.
- Instead, they store their data in custom OOP class structures which are opaque to the std-lib.
- Therefore the user must construct the libraries “Graph” structure.
- Necessitating that they duplicate a lot of data.
- And if the underlying data structure ever changes, then they must either update or rebuild the Graph structure.
-
Other graph algorithms (not topological sorting) accept potentially infinite graphs, e.g. IDDFS, Dykstra’s shortest path, A* pathfinding.
Review of the state of the art:
The standard library’s graphlib.TopologicalSorter
uses a dictionary to specify the nodes and edges.
These 3rd party libraries provide a “Graph” class which users are required to use:
These “Graph” classes typically are not compatible between different 3rd party library implementations.
- Software for Complex Networks — NetworkX 2.6.2 documentation
- graphviz · PyPI
- graph-tools · PyPI
- graph-dfs · PyPI
The following libraries use compressed sparse matrices. Sparse matrices are a very efficient format. Numpy/scipy make it easy to convert to/from them and so provides good interoperability with the rest of the software ecosystem.
Algorithms:
I’ve ranked them by how useful I think they are:
- Preorder depth first traversal
- Postorder depth first traversal
- Topological sort
- A-Star pathfinding
- Strongly connected components
- Depth first search (returning the final stack / search path)
- Iterative Deepening depth first search (shortest path on infinite graphs)
- min-cut/max-flow
- Dykstra’s shortest path (technically a subset of A* but should have its own public API entry point for ppl who don’t know how to do A*)
Example:
I have a prototype of this on pypi: graph-algorithms · PyPI
import graph_algorithms
# Make a data structure to represent a person and their familial relations.
class Person:
def __init__(self, name, mom=None, dad=None):
self.name = name
self.mom = mom
self.dad = dad
self.children = []
if self.mom is not None: self.mom.children.append(self)
if self.dad is not None: self.dad.children.append(self)
def __repr__(self):
return self.name
# Make a family of people.
# Grandparents
James = Person("James")
Patricia = Person("Patricia")
Joseph = Person("Joseph")
Margaret = Person("Margaret")
# Parents
Michael = Person("Michael", mom=Patricia, dad=James)
Susan = Person("Susan", mom=Patricia, dad=James)
Richard = Person("Richard", mom=Margaret, dad=Joseph)
David = Person("David", mom=Margaret, dad=Joseph)
# Grandchildren
Sandra = Person("Sandra", mom=Susan, dad=Richard)
Ashley = Person("Ashley", mom=Susan, dad=Richard)
# Now let's traverse Susan's family tree:
print("")
print("Ancestors of Susan:")
known_parents = lambda person: filter(None, [person.mom, person.dad])
for person in graph_algorithms.depth_first_traversal(Susan, known_parents):
print(person)
print("")
print("Descendants of Susan:")
for person in graph_algorithms.depth_first_traversal(Susan, lambda p: p.children):
print(person)
print("")
Program Output:
Ancestors of Susan:
Susan
Patricia
James
Descendants of Susan:
Susan
Sandra
Ashley
The important thing to notice about this example is that the Person
class and the graph library’s depth_first_traversal
are totally decoupled.
- The
Person
class does not care about the graph library. - The depth first traversal does not care about the
Person
class. - They only interaction between the traversal-code and the Person-class is through the adjacency function (the lambda’s in the example).
As a result, the depth-first-traversal can be applied to different sets of edges by simply swapping out the adjacency function. In this example: one set of edges describes the parents, and the other set of edges describes the children.