Analyzing the 2025 SC election

Agreed, but isn’t that the whole point of any election? You rank candidates and pick the top few as winners.

While it may sound brutal, being able to stand in the limelight and deal with public scrutiny is one of the requirements anyone running for a leadership role like SC member or PSF board member will have to fulfill – not so much for the voting public’s sake as for the candidate’s own.

I think the main reason for not finding more SC candidates is that the perceived bar of entry is much higher than for the PSF board.

IMO, the only way to address this problem is by making SC work more transparent, so that potential candidates can see how they can make a difference and shape the future of the language.

9 Likes

Same. I can say first hand that he has done an excellent job and it was a pleasure to serve together with him.

12 Likes

This is much more interesting to me than detailed analysis of the election[1]. As I mentioned in another thread, getting to the heart of why we didn’t have more candidates, and fixing that for next year is crucially important, given that of the 6 candidates, 5 had already served 5 times and one 4 times (if I’m remembering my previous analysis correctly).

I think our goal should be something like 5 new and 10 total candidates for the 2027 term.


  1. I’d say that was “fascinating” but ultimately not particularly actionable ↩︎

7 Likes

Three had already served five times, one four times, one once, and one zero times: hugovk.github.io/python-steering-council/

6 Likes

Yeah, as far as I know, there are no controversies or hard feelings at play here either. My concern is more about the potential for something like this to prevent future candidates from running. Even if nobody’s involved in controversy, and nobody resents the winners for their victory, requiring anyone running to be comfortable with high-profile members of the community computing, say, the results of hypothetical 1:1 matchups between every pair of candidates after the fact seems unnecessary at best (and harmful at worst).


Yeah, but I see a difference between doing it once because we have seats to fill, and doing it many more times after the fact because it’s fun/interesting/whatever.

I agree that SC members should be able to stand up to public scrutiny for all of the decisions that they make, past and future, especially when those decisions have real impact on Python or its community. But it just feels to me like this goes further… like saying “PR authors need to be okay with public review and possible rejection” and using that to justify a ranking of PR authors by the average number of issues found during review.


But mostly, I’ve said all I wanted to say. While this analysis makes me a bit uncomfortable, I don’t think it’s inherently “wrong” to do (or even off-topic for this forum). But I’d just hope we’re considering the second-order effects of our cool new expressive voting system.

3 Likes

Sorry to say I think you go too far there. There’s always tension between privacy and transparency, but in a public election the latter serves the greater good.

You’re talking about the “preference matrix”, and that’s vital info in any ordinal election method. In fact, it’s the only input to Condorcet voting methods (which are used, e.g., in many Linux-related elections). Hiding it would be like hiding vote totals in a plurality election :wink:.

STAR is a mix of cardinal (score) and ordinal (runoff) methods, and must disclose preference counts among those involved in runoffs. BetterVoting.com doesn’t (yet?) display the full preference matrix in its STAR election results, but the people who run that recognize its great value too:
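For concreteness, here’s a minimal sketch of how a pairwise preference matrix can be computed from score ballots. The ballot data and the `preference_matrix` helper are hypothetical illustrations, not the real election’s:

```python
# Hypothetical sketch: build a pairwise preference matrix from score
# ballots. pref[i][j] counts the ballots that score candidate i
# strictly above candidate j (equal scores express no preference).

def preference_matrix(ballots, ncand):
    pref = [[0] * ncand for _ in range(ncand)]
    for scores in ballots:          # one list of 0-5 scores per ballot
        for i in range(ncand):
            for j in range(ncand):
                if scores[i] > scores[j]:
                    pref[i][j] += 1
    return pref

# Three toy ballots for three candidates.
ballots = [[5, 3, 0], [4, 4, 1], [0, 2, 5]]
print(preference_matrix(ballots, 3))  # [[0, 1, 2], [1, 0, 2], [1, 1, 0]]
```

That’s exactly the structure a Condorcet method consumes: candidate i is pairwise-preferred to j whenever `pref[i][j] > pref[j][i]`.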

I’m not insensitive to privacy concerns, I just judge that they don’t take precedence in this context. I don’t think anyone ran in more PSF-related elections than I did (13 times a candidate for the Board), so I’m acutely aware of what it “feels like” to be the subject of public speculation.

But it comes with the territory of running in a public election, and at least I try to keep my analyses free of value judgments about the candidates.

Perhaps ironically, the more voters are allowed to express about their beliefs, the more info is available for the public to pick through after the fact. Which people will do, like it or not. I do it transparently, in what I hope is a principled and highly informed way. With any luck, that will cut off uninformed speculation (increasingly the norm for public election post-mortem “analysis” outside the PSF context).

4 Likes

There are, in general, no technical solutions to sociological problems, so election analysis isn’t going to change anything.

But STAR gives more clues, for those with eyes to see :wink:. There is in particular one non-technical issue on which all relevant PSF groups have been the very opposite of “transparent”. While I may be wrong, I think the distribution of 0-star ratings in this election is strongly correlated with how adamantly & visibly members of the 2024 SC refused substantive public engagement on that issue.

At the time, a majority of the 2024 SC publicly expressed their wish to be out of the business that issue arose in, but with 0 visible progress toward that end ever since. So one thing I do know is that at least one potential SC candidate didn’t run because they didn’t want to become part of a system in which that’s accepted.

"When we avoid hard conversations, we’re not keeping the peace. We’re just keeping the tension.”

So, ya, to my eyes, the tension is still there, and STAR details reflect it.

That’s just my opinion, of course - not “analysis”. But I can’t think of any reason why the star distribution could be what it is on purely technical grounds. Something other than technical chops must be in play.

Which wouldn’t be the whole story either. Nothing is. But it’s part of the whole story.

3 Likes

Well, the candidates are all obviously technically competent, so the explanation is probably not technical incompetence indeed :wink: . It doesn’t mean that your intuition is right, though. I certainly didn’t vote on the basis of the particular affair you’re alluding to (I’m also not willing to divulge or explain my vote publicly…).

2 Likes

I didn’t vote primarily on that basis either, but I used all 6 ratings available, and that issue certainly informed my fine-grained preferences. So long as the SC remains in that business, how they handle it is part of their work product too.

1 Like

And a while back MAL reported that the EuroPython Society uses a Condorcet method (Schulze) for their board elections. The service they use always displays the full preference matrix in its results, because that is the voter input to that method. It’s only part of the input to STAR (which also asks for cardinal scores, not just preferences).

Several Condorcet methods also deliver top-notch results. The knock against them is the extraordinary complexity of their schemes for breaking ties. People are happier with schemes they can understand without a doctorate in graph theory :wink:

They may make a comeback, though. In 2021, Sara Wolk introduced a variant called “Ranked Robin”, with much simpler tie-breaking protocols. That’s pleasant enough for “just folks” that even the service we’re using now (whose DNA is averse to needless complexity) supports a version of that.

In my judgment it’s technically fine too, but on a human-factors basis I think people come to like STAR better: on Condorcet ballots, there’s no possibility to express to what degree you like one candidate more (or less) than another. Just unquantified preference order.

OTOH, if you have 100 candidates, a Condorcet ballot allows (or even requires, in some places) imposing a total preference order on all of them. STAR remains fixed at 6 levels regardless of how many candidates there are.

So if someone wants to push for Ranked Robin, that’s a fine choice to me too, but I won’t be the one to do it.

3 Likes

The key difference is that Ranked Robin usually avoids ties to begin with by picking whoever won the most head-to-head matches. That’s not necessarily “a Condorcet winner”. By definition, a Condorcet winner is one who wins all pairwise preference matches, and may not exist (rock-scissors-paper).

So while pure Condorcet gives up on our data after finding 3 Condorcet winners, Ranked Robin doesn’t care that there isn’t a Condorcet winner among the remaining 3. Those all won different numbers of head-to-head matches, and RR goes on to pick them from most to least total wins.

It’s nevertheless “a Condorcet method”. By definition again, “a Condorcet method” is one that guarantees to pick a Condorcet winner if one exists. And RR does.
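Ranked Robin’s core step is easy to sketch: count each candidate’s head-to-head wins from the preference matrix and order by that count. The matrix below is toy data, and real Ranked Robin layers its tie-breaking protocols on top of this:

```python
# Sketch of Ranked Robin's core: order candidates by number of
# head-to-head preference wins. Toy 4-candidate matrix: candidate 0
# beats everyone, while 1, 2, 3 form a Condorcet cycle (1 win each).

def rr_order(pref):
    n = len(pref)
    wins = [sum(pref[i][j] > pref[j][i] for j in range(n))
            for i in range(n)]
    # Most pairwise wins first. Python's stable sort keeps ties in
    # index order here - real Ranked Robin would apply tie-breakers.
    return sorted(range(n), key=lambda i: wins[i], reverse=True)

pref = [[ 0, 40, 35, 33],
        [19,  0, 31, 24],
        [30, 25,  0, 32],
        [22, 30, 20,  0]]
print(rr_order(pref))  # candidate 0 first; 1, 2, 3 are tied at 1 win
```

Note that, unlike pure Condorcet, counting wins never “gives up” when a cycle appears - everyone still gets a win total.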

And - no surprise by now - RR again delivers the same order as Bloc STAR did.

1 Like

Sorry for this, but I don’t want to let a false claim stand. Ranked Robin does deliver a slightly different order. There is no universal agreement on how multi-winner Ranked Robin should work (it’s still relatively new), and I picked a method that, on third thought, wasn’t well grounded in the spirit of the method.

The first 3 picks were Condorcet winners, and all Condorcet methods must agree on those. The remaining 3 are in a Condorcet cycle, and all pure Condorcet methods stall then. A better implementation of Ranked Robin stalls too, because two of the three remaining candidates tied for the most preference wins among those 3. That tie is resolved internally by seeing which of the two had the larger magnitude of preference differences.

And Thomas won that outright.

But then it’s truly stuck. Each of the two remaining was preferred by 32 ballots - there’s nothing in their 2x2 preference matrix to distinguish them.

[[ 0, 32],
 [32,  0]]

Which I won’t pursue. The universe of gimmicks Condorcet methods use to break ties is large and highly technical. A top reason for why I like STAR much better.

1 Like

My favorite Condorcet method is “Ranked Pairs”, invented by Professor Nicolaus Tideman. It’s close to understandable :wink:, although it requires some elementary graph theory.

The idea: if candidate i beats candidate j on preferences, i should appear before j in a linear order. So the “beats” relation can be built up a pair at a time, and added to a topological sorter (our graphlib.TopologicalSorter is ideal for this).

The twists that make it work:

  1. Pairs are added in decreasing order of winning margin. If, e.g., i is preferred to j on 40 ballots, and j preferred to i on 19 ballots, i’s winning margin is 40 - 19 = 21.

  2. But if adding (i, j) to the topsort would create a cycle, ignore it! Move on to the next winning pair.

So there are no cycles to deal with after the fact, at least one total topological order always exists, and the strongest preferences that can be accounted for without creating a cycle are favored.

In simulations it does very well, at least as well as Schulze and - yes - as STAR. It’s also designed to create a total order, not just a winner.

Leaving names out of it, here’s how it does on our ballots. Bear in mind that the election method in use affects how people vote (especially “strategic” voters), so there’s no way to know for sure how the election would have turned out had we used Ranked Pairs:

43 3 2 - added to topsort
42 3 5 - added to topsort
38 3 1 - added to topsort
33 3 0 - added to topsort
32 0 2 - added to topsort
24 4 1 - added to topsort
21 4 2 - added to topsort
20 0 5 - added to topsort
17 4 5 - added to topsort
17 3 4   XXX margin is the same as last - added to topsort
13 0 1 - added to topsort
8 5 2 - added to topsort
5 4 0 - added to topsort
3 1 5 - added to topsort
ready (3,)
ready (4,)
ready (0,)
ready (1,)
ready (5,)
ready (2,)

Suffice it to say that it found a total linear order (and a unique one - the sets of nodes ready to report were always singletons - no choices), and in the same order Bloc STAR found them.

It’s emphatically not true that the election method doesn’t matter. It matters a whole lot, at least as much as who can vote. The lesson you “should be” taking from all this is that the PSF people who have worked on this over the years (including me) have weeded out all but the best election methods known. And that’s a huge part of why we get such similar results from all of them. Although, ya, the power of the “heart poll” remains A Mystery :wink:.

[EDIT: repaired error in calling reachable()]

Ranked Pairs code
print("Ranked pairs")
trips = []
for i, row in enumerate(pref):
    for j, aij in enumerate(row):
        diff = aij - pref[j][i]
        if diff > 0:
            trips.append((diff, i, j))
trips.sort(reverse=True)

def reachable(g, seen, start, target):
    if start in seen:
        return False
    seen.add(start)
    for succ in g[start]:
        if succ == target or reachable(g, seen, succ, target):
            return True
    return False

from graphlib import TopologicalSorter
ts = TopologicalSorter()
for i in range(len(pref)):
    ts.add(i)

g = defaultdict(set)
last = None
for t in trips:
    print(*t, end=" ")
    margin, i, j = t
    if margin == last:
        print("  XXX margin is the same as last", end=" ")
        # TODO: it's conceivable that the order in which we
        # add pairs with equal margins can affect when a
        # cycle is detected, and so affect which relations
        # are fed to the topsort. Think harder ;-)
    last = margin
    g[i].add(j)
    if reachable(g, set(), i, i):
        print("- forms a cycle! ignored")
        g[i].remove(j)
    else:
        print("- added to topsort")
        ts.add(j, i)

ts.prepare()
while ts:
    r = ts.get_ready()
    print("ready", r)
    ts.done(*r)
1 Like

You’ll like this one - it’s for fun and praises you! :smile:.

The thing I enjoyed most when analyzing the Board election was making MDS (“multidimensional scaling”) scatterplots, trying to show the relative distance between candidates in 2D space.

I enlisted ChatGPT-5’s help in trying to think up a similar metric for STAR elections. It had lots of ideas, but in the absence of controversy we agreed in advance that it was unlikely to turn up anything dramatic.

And it didn’t. Not even a little bit. Instead, the scatterplot looks a lot like a regular hexagon! Quoting my collaborator:

Notes on the plot:

  • The axes don’t “mean” anything. They just come with the territory of visualizing in 2D.
  • The numbers on the axes don’t mean anything. They’re just artifacts of how the algorithm happened to scale things. If I knew more about matplotlib, there may well be a way to suppress them.
  • It’s not generally possible to preserve all relative distances between points when shrinking the number of dimensions. The method used here favors preserving relative distances between “far” pairs at the expense of larger distortions in “close” pairs.

The method:

  • For each candidate, their 74 ballot scores are put in a vector, and the mean of that vector is then subtracted from each entry (“centering”).

  • A 6x6 “dissimilarity matrix” is then constructed:

    D[i, j] = 1.0 - correlation(i’s centered vector, j’s centered vector)

  • So the matrix contains a float near 2.0 if the candidates’ ballot vectors are strongly negatively correlated, and near 0.0 if they’re perfectly correlated.

  • That matrix is then the input to an MDS algorithm.

Ask a chatbot to justify those steps in detail - at least mine is astonishingly knowledgeable about the theory and the pragmatics. And flexible. For example, it advised at first throwing out paired 0 scores in corresponding positions of centered vectors, because “don’t care” just adds noise. But I explained that’s not what 0 means in STAR, and it replied that, in effect, ya, that’s a cute idea, but it’s “don’t care” in real life. I uploaded the initial analysis of the span and star distributions from the first post here, and that convinced it our results really are exceptional. It later nagged me more than once not to throw out the zeroes :rofl:.

And now the plot will speak for itself:

1 Like

Despite what the MDS plot may appear to say, our candidates aren’t actually identical :wink:. The plot shows only what’s being measured: the correlation measure captures patterns of relative support, not absolute levels of support.

Another measure the chatbot suggested was mean absolute score difference. Between candidates i and j, what’s the average distance between their scores across all ballots? That presents a somewhat different picture. The bot suggested normalizing each measure to the range [0.0, 1.0], and then combining them via a weighted average (30% to the original correlation measure, 70% to the newer).
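A sketch of how that blended measure might be computed; the toy data and the divide-by-max normalization are my assumptions, while the 30/70 weights come from the suggestion above:

```python
# Sketch of the blended dissimilarity: mean absolute score difference
# per candidate pair, each measure scaled to [0, 1] by its maximum,
# then a 30/70 weighted average with the correlation measure.
# The normalization scheme is an assumption, not necessarily the bot's.

def mean_abs_diff(a, b):
    # Average per-ballot gap between two candidates' scores.
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def normalize(matrix):
    hi = max(max(row) for row in matrix) or 1.0  # avoid division by 0
    return [[x / hi for x in row] for row in matrix]

def blend(corr_D, mad_D, w_corr=0.3, w_mad=0.7):
    corr_D, mad_D = normalize(corr_D), normalize(mad_D)
    n = len(corr_D)
    return [[w_corr * corr_D[i][j] + w_mad * mad_D[i][j]
             for j in range(n)] for i in range(n)]

print(mean_abs_diff([5, 4, 1, 0], [0, 1, 4, 5]))  # (5+3+3+5)/4 = 4.0
```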

And it predicted some degree of “clustering” would show up then, but not dramatically. Right on all counts. There’s weak clustering among candidates 0, 2, and 5. Who, not at all coincidentally according to the bot, are our longest-serving, most often reelected candidates. Their “incumbent advantage” boosts their mean scores, making them more similar to each other - & viewed by voters much the same way - in that specific respect.

1 Like

Would you care to reveal who are the candidates behind each candidate number?

1 Like

Names don’t really matter to the kinds of analyses I’m doing (this isn’t “oppo research” advocacy), and some people are uncomfortable with seeing them. If people care enough to do a bit of work, they’re 0-based indices into the list of names in the first row of the CSV ballot download.

1 Like

And one more, using signed mean score differences rather than their absolute values. This folds in not just magnitude of score differences, but also their direction.

It’s worth sharing because it paints a radically different picture. Candidate 3 is in a universe of their own. It’s not a coincidence that they were easily the overall Condorcet winner, and had the fewest 0-star ratings.

1 Like

I realize this can be confusing :frowning:. The short course is that I intentionally didn’t want to show some measure of “popularity” in the first two MDS plots. That was already revealed by the final winners. They’re intended to probe for subtler qualities, like possible entrenched polarization in the electorate.

The correlation measure alone couldn’t care less about who won most often, or by how much, just about how consistently ballots scored various candidates above and below each ballot’s average ratings. As for what we’d see if the electorate were polarized, my bot explained it better than I would have:

Our nearly ideal hexagon showed that nothing like that was in play. Almost no “clumping”. And a dramatic visual representation of why “proportional representation” schemes wouldn’t matter to the outcomes.

Folding in the mean absolute score differences only intensifies that visual distinction, pushing factions’ favored candidates even farther apart - but still saying nothing about which candidates were more liked.

Keeping the sign of the score differences is a radical change (as demonstrated by eyeball in the plots). Now we’re showing too how much candidates were liked. If you squint just right, it’s a visual representation of “why” Bloc STAR picked the order it did. The more removed a candidate is from all the others, the more strongly they were preferred.

1 Like

To make this clearer, the MDS plots were never intended (except for the third) to reveal anything about the candidates. They’re analyzing the collective us: the electorate. What did our voting behaviors reveal about us?

As I quoted the bot at the start of this offshoot, it showed that as a voting community we’re exceptionally healthy: highly expressive and not splintered into warring factions. And that’s nothing about who “won” or “lost”. It’s about whether our ballots showed evidence of thought and diversity of preferences.

It certainly helped that no candidates were “duds” (if any were, they would have been far removed from the rest even in the first correlation-only plot, due to being rated “below my average” on almost all ballots).

2 Likes