Let’s read through some of the highlights from the thread where the
TopologicalSorter was designed:
[my edits are in square brackets]
In chronological order:
I approve in general with the principle of including a topological sort algorithm in the standard library. However […]
“Topological sort” is a terrible name: the analogy with topological graph theory is (i) unlikely to be helpful to anyone; and (ii) not quite right. I know that the name is widely used in computing, but a name incorporating “linearize” or “linear order” or “total order” would be much clearer.
The proposed interface is not suitable for all cases! The function topsort takes a list of directed edges and returns a linear order on the vertices in those edges (if any linear order exists). But this means that if there are any isolated vertices (that is, vertices with no edges) in the dependency graph, then there is no way of passing those vertices to the function.
Eric V. Smith
This is why I prefer the API exposed by toposort · PyPI
[Note: Eric Smith is and always had been the sole author and maintainer of this package.]
[This package uses a dictionary to store the graph. Eric describes the edges as directed in the conventionally forward direction, and in the resulting list the edges point from right to left. In a literal way, this is the same as
graphlib describes both the edges and the result as reversed and the two reversals cancel each other out.]
Date: 2019-06-01 03:49
Unless Łukasz gives us a nod to work out this API for the second beta, we’ll need to defer this to 3.9. IMO, the API in the patch is not user friendly. It started on the right path but became contorted (for both inputs and outputs) to serve unusual cases.
Date: 2020-01-10 02:24
Let’s stir this up a bit I recently had occasion to exchange ideas with Larry Hastings about topsorts for use in a package manager he’s writing. I thought his API for adding edges was … perfect:
So, e.g., add(A, B, C) says A depends on B, and on C, but says nothing else about B and C. This is almost always the way topsorts show up in real life: you know what a thing depends on directly, but have scant idea how things may be in the opposite direction. For example, you know that baking a cake requires (among other things) flour, but have no real idea of the universe of other things that require flour. Likewise Larry knows which packages each package requires, but not the reverse. Etc.
Nodes with no edges are trivial to add then: add(A).
[Yay, we get a second API to compensate for the first one!]
The other big thing that came up is that most topsort programs were useless for his goal: downloading and installing packages takes significant wall clock time, and there’s huge opportunity for exploiting parallelism. But a flat sequence in topsort order gives no clue about what can be done in parallel. Instead [… At this point he proposes what went on to become the final API …]
[Topological Sort != Parallel Processing. Why can’t the standard library’s implementation of topological sort do just that?]
Date: 2020-04-02 18:48
At some point in the next two or three weeks, I’ll have a chance to work on this more and to offer a competing patch. IMO, the current checkin is over-engineered, both in its API and implementation. This could have been a simple, fast tool written as one or two short Python functions.
Also, I would like to try to out the API alternatives on some groups of engineers to get some user feedback.
For me, as the API currently stands, I would have to write a wrapper to make it usable for my applications.
Date: 2020-04-02 19:32
Raymond, what application do you have that wouldn’t be completely addressed by sticking to just .add() (to record dependencies) and .static_order() (to retrieve a linear order)?
Larry Hastings and I originally worked out the fancier bits of the interface to deal with problems he actually had, and for which no existing Python topsort implementation we could find was of any use: extract maximal parallelism. If you don’t want that, fine, stick to the two simple bits.
The bits to support parallelism are very easy to use to write correct parallelized code, but of course can seem baffling if you don’t give a rip about parallelism. But in that case you have no need to learn about them either.
If your alternative isn’t equally easy to use in a parallelized context, I’ll be at best +0.
Pablo Galindo Salgado
Date: 2020-04-02 20:20
I just want to echo what Tim mentioned with the extra data point that some of the maintainers of some popular and wide-used open-source libraries that indeed have to deal with this problem or the parallel version of the problem (like gaborbernat in this thread, the maintainer of “tox” and “virtualenv”) do indeed find the current API desirable.
Date: 2020-05-31 16:01
It’s great to have this feature in the standard library, but it really seems to clutter the functools documentation. Everything else in functools applies directly to functions and methods. Suddenly reading graph theory terminology was disorienting for me. (Due to context, I expected “node” to mean some new type of Python function.) It makes the documentation significantly longer, and IMHO abstruse. Would it be sensible to move this to a separate module?
[At this point the discussion moves onto what to name the new module, and they decide on
As you can clearly see: they did not set out to create a topological sort. They set out to solve their particular problem involving parallel processing. Along the way the just happened to need to create a topological sort and they decided that because their tool was useful (in the domain of parallel processing) that it should live somewhere in the standard library.
Did they come up with an unambiguously good API for graphs? No. Instead they started with a truly awful API (list of edges) and then tried to move to a better one (dictionaries) but got confused about which direction the edges should point and then tacked on the
add method to fix it for good.