So it seems clear that there’s some substantive discomfort with IRV. I’m uncomfortable, Tal is uncomfortable, the best thing that Tim has to say about IRV is that it’s not plurality, and the most passionate arguments in favor of IRV seem to be from people who don’t care and just want to get things resolved.
It sounds to me like we don’t have consensus here, and the timeline for the vote is going to slip. I’ve therefore moved PEP 8001 back to “Draft” status.
I’m not happy about doing this – I totally hear the arguments that we want to get this done ASAP. But the frustrating reality of dealing with groups of humans is that getting to agreement always takes an unreasonable amount of time and effort, and there are no magical shortcuts. When you have a BDFL you have the option to sometimes skip that, but IME these kinds of frustrating discussions are an inevitable component of every other form of governance, so we might as well start practicing now.
I’m particularly concerned by all this rhetoric about how the deadline is fixed and everyone has to get in line. I want to be done soon too! But trying to steamroller other core devs like this, and acting like some core devs get to resolve disagreements like this by pure fiat, is a really unhealthy precedent. I feel like some of us are so concerned about making sure it looks like we can work together, make decisions, and hold a legitimate vote, that they’re undermining our ability to actually work together, make decisions, or hold a legitimate vote.
Let’s try another poll here, just to gather some information. For each of the following options, imagine that PEP 8001 ended up using it. Would you feel comfortable filling out your ballot, and would you feel comfortable that the final result would be legitimate? If the answer to both questions is “yes”, please tick the corresponding box:
[EDIT: I screwed up the poll, but discourse doesn’t let me delete or edit it. So I’ve marked it read-only, and the real poll is DOWN HERE]
Pure Condorcet, with ties or “pathological cycles” thrown to the PSF board to resolve
Schulze, with ties thrown to the PSF board to resolve
Approval voting, with ties thrown to the PSF board to resolve
1-2-3, with ties thrown to the PSF board to resolve
Note: I’m fine with all 5 choices, but the poll only allowed picking 4. So I dropped “Schulze”: if there’s a Condorcet winner, it’s exactly the same as “Pure Condorcet” (which isn’t actually a voting method - it’s a property a given method may or may not satisfy). All the complication in Schulze is to figure out what happens when there’s not a Condorcet winner. And that’s a whole lot of complication.
Honestly, the “Pure Condorcet” and “Schulze” choices are the same thing: if “problem cases” are punted to the PSF Board, then the only part of Schulze that remains is seeing whether there is a Condorcet winner (which, if it exists, is necessarily unique). Which is presumably exactly the same as what “Pure Condorcet” does. Schulze without punting problem cases to the PSF Board would be a different choice.
“Pure Condorcet”: If there’s a Condorcet winner, pick that. If there isn’t – which means either a classic tie, or that we have a “Condorcet cycle” – then ask the PSF Board to resolve things. (Presumably by picking from the Smith set, if you want to get technical.)
“Schulze”: If there’s a Schulze winner, pick that. If there isn’t – which means a classic tie – then ask the PSF board to resolve things. (Presumably by picking one of the tied options.)
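For anyone who wants to see how little machinery the “Pure Condorcet” option needs, here is a minimal sketch in Python. The function name, the ballot format (a list of options, most preferred first), and the treatment of unranked options as tied for last are all assumptions made for illustration; nothing in PEP 8001 specifies them.

```python
from itertools import combinations

def condorcet_winner(ballots, candidates):
    """Return the option that beats every other head-to-head, or None.

    Assumption: options left off a ballot are ranked below all listed ones.
    """
    # wins[a][b] = number of ballots ranking a strictly above b
    wins = {a: {b: 0 for b in candidates if b != a} for a in candidates}
    for ballot in ballots:
        rank = {c: i for i, c in enumerate(ballot)}
        unranked = len(ballot)  # shared "last place" for unlisted options
        for a, b in combinations(candidates, 2):
            ra, rb = rank.get(a, unranked), rank.get(b, unranked)
            if ra < rb:
                wins[a][b] += 1
            elif rb < ra:
                wins[b][a] += 1
    for a in candidates:
        if all(wins[a][b] > wins[b][a] for b in wins[a]):
            return a
    return None  # tie or cycle: hand off to whoever breaks ties
```

A `None` result is the case that would be thrown to the PSF Board (or whichever tie-breaker is chosen).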
I’ve been thinking about all of this a bit, and I think we’re sort of being “bitten” by the fact that I don’t think there is a “bad” proposal to vote on here. People may have their preferences, but ultimately I think most people are going to be generally okay with any of the options for governance, and in that case they likely don’t care too much about how we pick it, because regardless of the outcome they’re OK. IOW, we could probably be proposing Random Ballot and most people generally would be fine with it.
This is ultimately why I think you have the silent majority not really voicing an opinion one way or another, because at some level the choice is academic with only a small ability to actually influence the outcome based on what voting system we choose (and little way to know a priori what those situations are going to be, which is why people are largely speaking in abstract).
Unfortunately, we have to pick some system, and although Random Ballot feels positively Monty Pythonesque, we probably need something with less randomness.
It seems we have the following high level options:
IRV
Pro: The PEP is already written.
Pro: Has real world use.
Con: Has severely weird pathological cases.
Con: Cannot use the Helios voting system the PSF uses.
Some Condorcet method
Pro: Will elect the choice that is most preferential to the majority of people.
Pro: Has real world use.
Con: Has potential for an extremely unlikely type of “tie” that is specific to it.
Con: Cannot use the Helios voting system the PSF uses.
Approval
Pro: Will elect the choice that the most people are OK with.
Pro: Has real world use (the PSF itself uses it).
Pro: Can use the Helios system that the PSF uses.
Con: Does not allow people to express preferences other than approve / disapprove.
STAR
Pro: Will elect the choice that maximizes the number of people who are OK with it (but may not pick the one the majority wants if there is a 2nd choice a super majority is OK with).
Pro: Allows people to express a full range of preferences, including “equal”.
Con: Little to no real world use.
Con: Cannot use the Helios voting system.
3-2-1
Pro: Will elect the choice that maximizes most people being OK with it.
Pro: Allows people to express preferences, including “equal” though at a coarse granularity.
Pro: Most OSS developers will be familiar with the idea of +1, +0, -1 voting.
Con: Little to no real world use.
Con: Cannot use the Helios voting system.
There’s also range voting, but it’s basically the same as approval, just with more granularity for people to express themselves (although it cannot use Helios).
I’ll be honest, I find IRV to be almost as undesirable as plurality. The biggest praise I can really give it is that it’s not plurality and the PEP is already written. The only reason I’m not really fighting harder against IRV is that I suspect the choice is probably academic and it doesn’t really matter much which system we choose.
That being said, I still think, if given the choice, we should strive to pick a better system because there is very little downside to doing so. I’m personally happy with any of the others that have been mentioned besides Borda. The rest of them are largely equal to me, with a variety of trade-offs, but I think they’re all roughly equal in terms of results, so it’s just down to preference.
First, thanks for starting a poll that fixes the issues with the initial one(s).
Look at it from the perspective of those of us who volunteered to try and find a solution at the dev sprints. People who chose to participate in that discussion did, we did our best to find a solution (and found one in the room when we discussed this), and then we are being told by a few people that it wasn’t good enough. Either side of this can feel “steamrolled” by the other, feeling like they are being ignored and not fully heard, but unfortunately we don’t even have a clear definition of consensus to really resolve this when it’s subjective in the end. We’re simply in a crappy situation with having to resolve this and there isn’t much we can do about it unless we can get everyone to participate and magically agree.
IOW everyone is frustrated in some regard by this situation, but everyone is doing their best to find an amicable solution.
Aside from words in a PEP, what actual investment do we have in IRV so far? For example, have we invested time/money/effort in designing IRV ballots? Time/money/effort in acquiring/developing IRV tallying software? Any sunk costs to implement it? Or is it still just aspirational?
I don’t know. That’s why I’m asking. This is an effort to address Donald’s:
That is, unless we’d have to throw away work to switch, what are the downsides to switching?
Personally, if I were tasked to tally the votes, and it switched from (say) IRV to STAR, or vice versa, at the last second, I’d shrug “fine - no problem”. They’re both easy to tally with brief, simple, clear Python code. Heck, for a vote of this size, the easy-to-tally methods could be done by hand.
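To give a sense of how brief such tallying code can be, here is a rough sketch of an IRV count. The function name and ballot format are made up for illustration, and the rule of eliminating all options tied for last place together is a simplifying assumption, not something PEP 8001 prescribes.

```python
from collections import Counter

def irv_winner(ballots):
    """Instant-runoff tally; each ballot lists options, favorite first."""
    remaining = {c for ballot in ballots for c in ballot}
    while remaining:
        # Each ballot counts for its highest-ranked option still in the race.
        tally = Counter({c: 0 for c in remaining})
        for ballot in ballots:
            top = next((c for c in ballot if c in remaining), None)
            if top is not None:
                tally[top] += 1
        active = sum(tally.values())  # ballots that have not been exhausted
        leader, votes = max(tally.items(), key=lambda kv: kv[1])
        if votes * 2 > active:
            return leader  # strict majority of the active ballots
        fewest = min(tally.values())
        losers = {c for c in remaining if tally[c] == fewest}
        if losers == remaining:
            return None  # everything left is tied; needs an outside tie-break
        # Assumption: options tied for last place are eliminated together.
        remaining -= losers
    return None
```

With four ballots ranking L > C > R, three ranking R > C > L, and two ranking C > L > R, this eliminates C first and elects L, even though C would beat either rival head-to-head.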
It’s not like you (Brett) appear to want IRV. It seems more that you’re opposed to change because … it would be a change. So my question is: a change to what? Words in the PEP, or is there actual investment in IRV infrastructure that would be lost?
If there is no infrastructure already in place, switching would leave us exactly where we already are: at ground zero. If that’s the case, the relentlessness of the opposition becomes hard to understand. The relentlessness of the opposition to IRV is easy to understand: IRV sucks, which has nothing to do with any of us.
Ya, I can live with it - but why is that necessary? The winner-so-far (“Pure Condorcet”) in the latest mostly-ignored poll is also fine by me. There’s actually little pointless bikeshedding in these messages: most people with some knowledge of these things agree that plurality, IRV, and Borda are poor choices, and that just about anything else would be significantly better. Donald and I seem to be the only vocal fans of range/score methods, but we’ve both said any of the Condorcet methods would be fine too. There isn’t a rally around any particular alternative because they’re all seen as better than IRV.
Nathaniel’s poll is a pure instance of approval voting, which is also better.
I think this is a pretty good example of why there is generally a moratorium on making decisions at in-person gatherings like the language summit or the core developer sprint.
You make the statement “those who chose to participate in that discussion”, which comes across like the people raising concerns had the option to participate, but simply chose not to until after it was “decided”. If I had been there, I would have raised the same concerns then, however it simply wasn’t possible for me to go (I even missed the PyPA sprint this weekend for similar reasons).
I don’t think that’s what you’re trying to say here, but that’s how the people arguing for IRV are generally coming across to me personally. None of them seem to even be particularly arguing for IRV for any reason other than that it wasn’t plurality and it was already decided at the core developer sprint to use it.
This is a good point. As far as I am aware so far the only sunk cost we’ve got for IRV is the PEP, and if that’s all it is, I can have a replacement PEP within say 24 hours that proposes one of the better options.
After a good night’s sleep, reading through the discussion and thinking some more, here are my thoughts:
A crucial criterion IMO for the chosen system is to be simple and clear. This means not only “how does one vote” but also “how is the winner selected”. IRV and Schulze both fail in this regard from my point of view.
Another crucial criterion is not to be esoteric. 3-2-1 and STAR both seem good and simple, but are too new and obscure IMO for this vote.
Approval voting is simple and common but the “yes”/“no” choice seems too restrictive for this vote. Personally, there are some options I expect to be in favor of, some I’d be okay with, and some I’d be against.
Most here seem to agree that IRV is not a very good option.
The cost of switching from IRV is indeed very low.
I wouldn’t be fine with a “Random Ballot”, and I suspect several others wouldn’t. I’d accept a governance system chosen by the community even if it wasn’t my preference, but not one chosen arbitrarily.
The Schulze system seems to check all of the boxes except simplicity, but the complexity is only needed to resolve ties and cycles. Donald’s suggestion of using “Some Condorcet method” seems to stem from this: Ranked voting and choosing the Condorcet winner seems like exactly what we want; let’s simply avoid the complexity of dealing with ties and cycles. Since a tie/cycle is very unlikely, this is very reasonable!
Therefore, I’m voting for Pure Condorcet, with ties or “pathological cycles” thrown to the PSF board to resolve, and not any other option.
The thing I like most about IRV is that I can mark a second choice without hurting my first choice. This is a pro for IRV.
A lot of the other systems like Approval, STAR, and “3-2-1” put me in the position of having to penalize my first choice if I want to say what my next preference is. So the system limits me from expressing my preference. This is a con for these systems.
I think the Pros and Cons should acknowledge this for each system.
Also, earlier Nathaniel said, “the most passionate arguments in favor of IRV seem to be from people who don’t care and just want to get things resolved.” But that’s not true for me. I do feel strongly about this and have a lot of real-world experience in these issues, and I thought I said that before, but unfortunately I don’t have the time or emotional energy for the debate. (I’ve been in debates like these many times before, and without fail they never resolve.) However, I feel like the point I made a couple times before seems to have been forgotten and not recorded in what I saw above, so I’m restating it again.
I’m uncomfortable with “ties thrown to the PSF board to resolve”. What kind of expertise or competence would they have to resolve the tie? They don’t actively follow or participate in discussions. If we need a particular person or body to resolve ties, I’d rather have @guido if he accepts it. It’s quite common for seniority to be the tie-breaker.
My understanding is that Condorcet systems also don’t have the property I mentioned – that expressing later preferences can’t hurt your higher choice. Some call this the “later-no-harm criterion.” Wikipedia lists which systems have this property and which don’t: https://en.wikipedia.org/wiki/Later-no-harm_criterion
I guess I’m uncomfortable participating. It looks like it’s using approval voting to help decide the voting method – before we decided on the method we should use. Approval isn’t a method I favor, and I’m not sure how the results will be used or interpreted, or who will be participating. It seems like it’s being organized by the people that are against IRV, so it seems like the results are likely to be skewed.
The reason I didn’t mention that, is because there are two related constraints a voting system can satisfy:
Later no harm, which basically states that once you’ve ranked something, ranking additional items behind it cannot hurt the chances of the items you’ve already ranked.
No favorite betrayal, which basically states that ranking something higher, should not hurt the chances of it being elected.
Unfortunately it’s widely accepted that these two properties are mutually exclusive, so while IRV “passes” Later no harm, it fails No favorite betrayal. The other options all go one way or another on that, with the only really novel one being 3-2-1, which technically fails both later no harm and no favorite betrayal, but its properties appear to make it unlikely to fail either of them in practice (though it could).
I didn’t call it out because I consider these two properties to be largely the same in terms of impact. Both of them mean that it’s possible for an honest ranking to hurt the chances of your top choice, just through different mechanisms.
It appears that you think later-no-harm is an important property, but that you don’t really care about no-favorite-betrayal, and my question would be why? I don’t mean that snarkily; I’m curious because, as I mentioned above, they both mean that an honest ranking can hurt the chances of your top choice, so why is one mechanism for that the be-all and end-all?
I had actually forgotten when I wrote the above that Condorcet also fails both of them, although it (well, the Schulze method at least) passes another, similar thing which all the other systems (except maybe 3-2-1, I’m not sure about it) fail, which is Strategy Free, which basically says that if everyone votes honestly, then the choice the majority prefers will win.
Approval: Later-no-harm and no-favorite-betrayal don’t apply since there is no ranking to be done; however, it suffers from an effect conceptually similar to failing later-no-harm, where approving your 2nd or 3rd choice can help them win over your first choice. It passes a conceptually similar analogue of no-favorite-betrayal, where approving your first choice cannot hurt it.
3-2-1: Fails later-no-harm and fails no-favorite-betrayal, but its design appears to make it exceedingly unlikely that either case actually happens.
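To make the approval effect described above concrete, here is a small sketch with an entirely hypothetical electorate (the numbers and the `approval_tally` helper are invented for illustration). It shows that approving a second choice never lowers your first choice’s count, but can lift the second choice past it.

```python
from collections import Counter

def approval_tally(ballots):
    """Each ballot is the set of options the voter approves of."""
    tally = Counter()
    for approved in ballots:
        tally.update(set(approved))
    return tally

# Hypothetical electorate: 4 voters favor A, 3 favor B, all bullet-voting.
bullet = [{"A"}] * 4 + [{"B"}] * 3
# Same electorate, but two A-supporters also approve their second choice B.
hedged = [{"A"}] * 2 + [{"A", "B"}] * 2 + [{"B"}] * 3

print(approval_tally(bullet).most_common())  # [('A', 4), ('B', 3)]
print(approval_tally(hedged).most_common())  # [('B', 5), ('A', 4)]
```

A’s total is 4 in both cases; what changes is only that B picks up the extra approvals.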
I think it’s probably a mistake to focus too much on specific criteria here, because as a number of impossibility theorems have stated, it’s basically impossible to get them all. Looking at the general quality of the outcomes is probably a far better mechanism than trying to pick which (roughly equally important/bad) criteria we’re going to care about, and which we’re not.
Better, I think, to look at the actual results the various election methods give.
One source of that is looking at how well different methods fare in voting simulations like those found at http://zesty.ca/voting/sim/. In every one of the simulations there, IRV’s position is basically “better than plurality, but worse than everything else”, with a special mention of Borda, which has its own brand of strange results when it comes to split votes.
Another mechanism we can look at is Voter Satisfaction Efficiency (VSE) (sometimes Voter Satisfaction Index or social utility efficiency), which is basically a measure of how well a particular system fares at giving the voters what they want, under certain conditions (100% honest votes, 100% strategic votes, and in between). In these systems generally you consider drawing names out of a hat to be 0%, and being able to read people’s minds and magically select the perfect candidate to be 100%. You can get some information at https://electology.github.io/vse-sim/VSE/.
That page graphs the various options under the different scenarios (if you go to the website and click the graph, there is an interactive version).
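As a rough illustration of the 0%/100% scale described above, VSE for a single election can be sketched as the winner’s average utility, rescaled so that a randomly drawn option scores 0 and the best possible option scores 1. The `vse` function and the utility table format are assumptions for illustration; the actual simulations average this kind of figure over many generated electorates.

```python
from statistics import mean

def vse(utilities, winner):
    """utilities is a list of per-voter dicts mapping option -> utility."""
    candidates = list(utilities[0])
    # Average utility each option would give the electorate if it won.
    avg = {c: mean(voter[c] for voter in utilities) for c in candidates}
    best = max(avg.values())            # the "mind-reader" outcome (100%)
    random_pick = mean(avg.values())    # names out of a hat (0%)
    if best == random_pick:
        raise ValueError("all options are equally good; VSE is undefined")
    return (avg[winner] - random_pick) / (best - random_pick)
```

Note that a method can score below 0 in a given election if it picks something worse than a random draw would on average.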
The scores for IRV range from ~79% to ~91% depending on how honest people are being in their voting.
The scores for Ranked Pairs (the simplest way to determine a Condorcet winner) are 87%-98%.
The scores for Schulze are 80%-90%.
Approval is a bit hard to model as a single system, because the underlying question becomes, at what level of utility do you approve of a choice vs disapprove. The above graph has two models, one is where you approve any choice that has “above average” utility (aka IdealApproval) and another where you assume 60% of voters are going to bullet vote and select only their preferred choice, and 40% will vote as in Ideal Approval:
For Ideal Approval, the scores range from 84% to 94%.
For 60% Bullet Approval, the scores range from 85% to 95%.
For score/range methods (aka rate choices 0 to N, winner is highest average rating), no matter what N is, the range is 84% to roughly 97%, though the larger N is, the slightly higher the top end becomes.
Star voting with a 0-10 rating has a range of 91%-98%.
3-2-1 has a range of 91% to 95%.
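For readers unfamiliar with the rated methods in this list, here is a minimal sketch of score/range voting and of STAR (“Score Then Automatic Runoff”: the two highest-scoring options go to a runoff decided by how many ballots rate one above the other). The function names and ballot format (a dict of option -> rating) are illustrative assumptions. Unrated options are counted as 0, so the highest total is also the highest average; ties fall to whichever option comes first in `candidates`, and `star_winner` needs at least two options.

```python
def score_totals(ballots, candidates):
    """Sum each option's ratings; unrated options count as 0 (assumption)."""
    return {c: sum(b.get(c, 0) for b in ballots) for c in candidates}

def score_winner(ballots, candidates):
    """Score/range voting: highest total (equivalently, highest average)."""
    totals = score_totals(ballots, candidates)
    return max(candidates, key=lambda c: totals[c])

def star_winner(ballots, candidates):
    """STAR: top two by total score, then a head-to-head runoff."""
    totals = score_totals(ballots, candidates)
    ranked = sorted(candidates, key=lambda c: totals[c], reverse=True)
    first, second = ranked[:2]
    prefer_first = sum(1 for b in ballots if b.get(first, 0) > b.get(second, 0))
    prefer_second = sum(1 for b in ballots if b.get(second, 0) > b.get(first, 0))
    return second if prefer_second > prefer_first else first
```

The two can disagree: if three voters rate A=10, B=6 and two voters rate A=0, B=10, B has the higher total (38 vs 30) and wins under score voting, but A wins the STAR runoff 3 to 2.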
If we say that we expect people to only vote honestly, and that they are unlikely to employ any sort of strategic voting, that gives us numbers like:
Ranked Pairs: 98.8%
Ideal Approval: 87.5%
60% Bullet Approval: 89.9%
Score/Range with a 0-10 rating: 96.8%
Star voting with a 0-10 rating: 98.3%
Neither of these methods of evaluating voting systems is a slam dunk, but I think it’s generally a good thing to try to select an option which:
Has a high VSE, particularly when people are honest voting (since we expect most, and maybe all people to vote honestly).
Does not have a scenario with a low VSE in case people decide not to vote honestly.
Of those, Schulze/Ranked Pairs gives the highest satisfaction when everyone is voting honestly, and it has no weirdness in the voting simulations; however, it has potential for strategic voters to bring the overall VSE down.
Approval voting has roughly the best voting simulations, but it has the interesting property that people are generally happier with the election results when they’re not voting honestly than when they are, and when voting honestly it’s generally worse than honest votes with IRV (although IRV is worse in the presence of tactical voting).
Star and Range voting have the second highest satisfaction (with Star voting basically being inline with RP/Schulze), but neither one has been graphed by Ka-Ping Yee and I don’t have similar graphs handy elsewhere. Range voting’s bottom end of the VSE is lower due to issues where a one-sided tactical voting can have an outsized impact on the election, whereas Star voting doesn’t suffer from that nearly as badly (Star voting’s improvement over range voting is specifically to eliminate that).
3-2-1 also isn’t graphed by Ka-Ping Yee, and I also don’t have similar graphs handy elsewhere. It has the interesting property that tactical voting has very minimal impact on how satisfied people are with the outcome (IOW, the grouping is tighter in the graphs), but it also has people happier when they are voting strategically rather than honestly, and its VSE is generally lower than other options.
Given all of that… My personal opinions are:
IRV can be ruled out because it performs poorly in simulations and its “bottom” end of VSE is one of the lowest we’re looking at.
Approval voting can be ruled out because its VSE numbers are less great than the other options, and because people are generally happier when voting tactically than when voting honestly.
As much as I like STAR/range voting, I’d say we can rule it out because we don’t have graphs available to show how it holds up in a variety of situations (though I believe it holds up well). It’s also possibly a harder sell to get people to rate their choices than rank them.
3-2-1: Similarly to STAR/range voting, I don’t have graphs for it and I’m honestly not sure how it performs. It also has the property that generally folks are more satisfied with tactical votes than honest votes (though the grouping is so tight it probably doesn’t matter) and overall people are just less satisfied with it than other options, so I think we can rule it out.
That leaves some method of Condorcet, which when everyone is voting honestly has the highest VSE, which makes some amount of sense since the Condorcet winner is the winner that everyone would pick in every two way match up. The difference between the Condorcet methods ultimately comes down to what happens when there isn’t a Condorcet winner.
Many proponents of instant-runoff voting (IRV) are attracted by the belief that if their first choice does not win, their vote will be given to their second choice; if their second choice does not win, their vote will be given to their third choice, etc. This sounds perfect, but it is not true for every voter with IRV. If someone voted for a strong candidate, and their 2nd and 3rd choices are eliminated before their first choice is eliminated, IRV gives their vote to their 4th choice candidate, not their 2nd choice. Condorcet voting takes all rankings into account simultaneously, but at the expense of violating the later-no-harm criterion and the later-no-help criterion. With IRV, indicating a second choice will never affect your first choice. With Condorcet voting, it is possible that indicating a second choice will cause your first choice to lose.
There are circumstances, as in the examples above, when both instant-runoff voting and the ‘first-past-the-post’ plurality system will fail to pick the Condorcet winner. In cases where there is a Condorcet Winner, and where IRV does not choose it, a majority would by definition prefer the Condorcet Winner to the IRV winner. Proponents of the Condorcet criterion see it as a principal issue in selecting an electoral system. They see the Condorcet criterion as a natural extension of majority rule. Condorcet methods tend to encourage the selection of centrist candidates who appeal to the median voter.