The thing I like most about IRV is that I can mark a second choice without hurting my first choice. This is a pro for IRV.
A lot of the other systems like Approval, STAR, and “3-2-1” put me in the position of having to penalize my first choice if I want to say what my next preference is. So the system limits me from expressing my preference. This is a con for these systems.
I think the Pros and Cons should acknowledge this for each system.
Also, earlier Nathaniel said, “the most passionate arguments in favor of IRV seem to be from people who don’t care and just want to get things resolved.” But that’s not true for me. I do feel strongly about this and have a lot of real-world experience in these issues, and I thought I said that before, but unfortunately I don’t have the time or emotional energy for the debate. (I’ve been in debates like these many times before, and they never resolve.) However, I feel like the point I made a couple of times before seems to have been forgotten and not recorded in what I saw above, so I’m restating it.
I’m uncomfortable with “ties thrown to the PSF board to resolve”. What kind of expertise or competence would they have to resolve the tie? They don’t actively follow or participate in discussions. If we need a particular person or body to resolve ties, I’d rather have @guido if he accepts it. It’s quite common for seniority to be the tie-breaker.
My understanding is that Condorcet systems also don’t have the property I mentioned – that expressing later preferences can’t hurt your higher choice. Some call this the “later-no-harm criterion.” Wikipedia lists which systems have this property and which don’t: https://en.wikipedia.org/wiki/Later-no-harm_criterion
I guess I’m uncomfortable participating. It looks like it’s using approval voting to help decide the voting method – before we decided on the method we should use. Approval isn’t a method I favor, and I’m not sure how the results will be used or interpreted, or who will be participating. It seems like it’s being organized by the people that are against IRV, so it seems like the results are likely to be skewed.
The reason I didn’t mention that is that there are two related constraints a voting system can satisfy:
Later no harm, which basically states that once you’ve ranked something, ranking additional items behind it cannot hurt the chances of the items you’ve already ranked.
No favorite betrayal, which basically states that ranking something higher should not hurt the chances of it being elected.
Unfortunately it’s widely accepted that these two properties are mutually exclusive, so while IRV “passes” Later no harm, it fails No favorite betrayal. The other options all go one way or the other on that. The only really novel one is 3-2-1, which technically fails both later no harm and no favorite betrayal, though its properties suggest that it’s unlikely to fail either of them in practice (but it could).
I didn’t call it out because I consider these two properties to be largely the same in terms of impact. Both of them mean that it’s possible for an honest ranking to hurt the chances of your top choice, just through different mechanisms.
It appears that you think later-no-harm is an important property, but that you don’t really care about no-favorite-betrayal, and my question would be why? I don’t mean that snarkily; I’m curious because, as I mentioned above, they both mean that an honest ranking can hurt the chances of your top choice, so why is one mechanism for that the be-all and end-all?
I had actually forgotten when I wrote the above that Condorcet also fails both of them, although it (well, the Schulze method at least) passes another, similar criterion which all the other systems (except maybe 3-2-1, I’m not sure about it) fail: Strategy Free, which basically says that if everyone votes honestly, then the choice the majority prefers will win.
Approval: Later-no-harm and no-favorite-betrayal don’t apply since there is no ranking to be done. However, it suffers from an effect conceptually similar to a later-no-harm failure, where approving your 2nd or 3rd choice can help them win over your first choice. It passes a conceptual analogue of no-favorite-betrayal, in that approving your first choice cannot hurt it.
3-2-1: Fails later-no-harm, fails no-favorite-betrayal, but as a benefit it appears to make it exceedingly unlikely that either case actually happens.
I think it’s probably a mistake to focus too much on specific criteria here, because as a number of impossibility theorems have stated, it’s basically impossible to get them all. Looking at the general quality of the outcomes is probably a far better mechanism than trying to pick which (roughly equally important/bad) criteria we’re going to care about, and which we’re not.
Better I think to look at the actual results the variety of elections give.
One source of that is looking at how well different methods fare in voting simulations like those found at http://zesty.ca/voting/sim/. In every one of the simulations there, IRV’s position is basically “better than plurality, but worse than everything else”, with a special mention of Borda, which has its own brand of strange results when it comes to split votes.
Another mechanism we can look at is Voter Satisfaction Efficiency (VSE) (sometimes Voter Satisfaction Index or social utility efficiency), which is basically a measure of how well a particular system fares at giving the voters what they want, under certain conditions (100% honest votes, 100% strategic votes, and in between). In these systems you generally consider drawing names out of a hat to be 0%, and being able to read people’s minds and magically select the perfect candidate to be 100%. You can get some information at https://electology.github.io/vse-sim/VSE/.
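To make the 0% and 100% anchors concrete, here is a minimal sketch of a VSE-style score for a single election. The function name `vse`, the utility numbers, and the voters are all made up for illustration; as I understand it, the real simulator averages over many simulated elections rather than scoring one.

```python
from statistics import mean

def vse(utilities, winner):
    """VSE-style score for one election (illustrative, not the simulator's code).

    utilities: list of per-voter dicts mapping candidate -> utility (made-up data)
    winner: the candidate the voting method picked
    """
    candidates = list(utilities[0])
    # Average utility each candidate delivers across all voters.
    avg = {c: mean(u[c] for u in utilities) for c in candidates}
    best = max(avg.values())          # "mind-reading" ideal pick, scored as 100%
    from_a_hat = mean(avg.values())   # expected result of a random pick, scored as 0%
    if best == from_a_hat:
        raise ValueError("all candidates are equally good; score is undefined")
    return (avg[winner] - from_a_hat) / (best - from_a_hat)

# Hypothetical voters with 0-10 utilities for three candidates.
voters = [
    {"A": 10, "B": 6, "C": 0},
    {"A": 0, "B": 6, "C": 10},
    {"A": 0, "B": 8, "C": 4},
]

print(vse(voters, "B"))  # B has the highest average utility, so it scores 1.0
print(vse(voters, "A"))  # negative: worse than drawing names out of a hat
```

Note that a method can score below 0% in a given election if it picks someone worse than the random-draw average.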
One interesting graph from electology would be:
Which graphs the variety of options under the different scenarios (If you go to the website, and click the graph there is an interactive version).
The scores for IRV range from ~79% to ~91% depending on how honest people are being in their voting.
The scores for Ranked Pairs (the simplest way to determine a Condorcet winner) are 87%-98%.
The scores for Schulze are 80%-90%.
Approval is a bit hard to model as a single system, because the underlying question becomes, at what level of utility do you approve of a choice vs disapprove. The above graph has two models, one is where you approve any choice that has “above average” utility (aka IdealApproval) and another where you assume 60% of voters are going to bullet vote and select only their preferred choice, and 40% will vote as in Ideal Approval:
For Ideal Approval, the scores range 84%-94%.
For 60% Bullet Approval, the scores range 85%-95%.
For score/range methods (aka rate choices 0toN, winner is highest average rating) no matter what N is, the range is 84% to roughly 97%, though the larger the N is, the slightly higher the top end becomes.
Star voting with a 0-10 rating has a range of 91%-98%.
3-2-1 has a range of 91% to 95%.
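The two approval models described above can be sketched roughly as follows. This is only my reading of "approve anything above average" and "bullet vote"; the function names, the seeded random mix, and the utility numbers are assumptions for illustration, not the simulator's actual code.

```python
import random
from collections import Counter
from statistics import mean

def ideal_approval_ballot(utility):
    """Approve every candidate whose utility is above this voter's own average."""
    threshold = mean(utility.values())
    return {c for c, u in utility.items() if u > threshold}

def bullet_ballot(utility):
    """Approve only the single most-preferred candidate."""
    return {max(utility, key=utility.get)}

def mixed_ballots(utilities, bullet_fraction=0.6, seed=0):
    """Each voter bullet votes with probability bullet_fraction (hypothetical mix)."""
    rng = random.Random(seed)
    return [
        bullet_ballot(u) if rng.random() < bullet_fraction else ideal_approval_ballot(u)
        for u in utilities
    ]

def approval_winners(ballots):
    """Candidates with the most approvals; more than one name means a tie."""
    counts = Counter(c for ballot in ballots for c in ballot)
    top = max(counts.values())
    return sorted(c for c, n in counts.items() if n == top)

# Hypothetical voters with 0-10 utilities for three candidates.
voters = [
    {"A": 10, "B": 6, "C": 0},
    {"A": 0, "B": 6, "C": 10},
    {"A": 0, "B": 8, "C": 4},
]

print(approval_winners([ideal_approval_ballot(u) for u in voters]))
print(approval_winners([bullet_ballot(u) for u in voters]))
```

With these made-up voters the "above average" ballots all include B, while pure bullet voting leaves a three-way tie, which shows how much the approval threshold assumption can change the outcome.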
If we say that we expect people to only vote honestly, and are unlikely going to employ any sort of strategic voting, that gives us numbers like:
Ranked Pairs: 98.8%
Ideal Approval: 87.5%
60% Bullet Approval: 89.9%
Score/Range with a 0-10 rating: 96.8%
Star voting with a 0-10 rating: 98.3%
Neither of these ways of evaluating voting systems is a slam dunk, but I think it’s generally a good thing to try to select an option which:
Has a high VSE, particularly when people are honest voting (since we expect most, and maybe all people to vote honestly).
Does not have a scenario with a low VSE in case people decide not to vote honestly.
Of those, Schulze/Ranked Pairs gives the highest satisfaction when everyone is voting honestly, and it has no weirdness in the voting simulations; however, it has potential for strategic voters to bring the overall VSE down.
Approval voting has roughly the best voting simulations, but it has the interesting property that people are generally happier with the election results when they’re not voting honestly than when they are, and when voting honestly it’s generally worse than honest votes with IRV (although IRV is worse in the presence of tactical voting).
Star and Range voting have the second highest satisfaction (with Star voting basically being inline with RP/Schulze), but neither one has been graphed by Ka-Ping Yee and I don’t have similar graphs handy elsewhere. Range voting’s bottom end of the VSE is lower due to issues where a one-sided tactical voting can have an outsized impact on the election, whereas Star voting doesn’t suffer from that nearly as badly (Star voting’s improvement over range voting is specifically to eliminate that).
3-2-1 also isn’t graphed by Ka-Ping Yee, and I also don’t have similar graphs handy elsewhere. It has the interesting property that tactical voting has very minimal impact on how satisfied people are with the outcome (IOW, the grouping is tighter in the graphs), but it also has people happier when they are voting strategically rather than honestly, and its VSE is generally lower than other options.
Given all of that… My personal opinions are:
IRV can be ruled out because it performs poorly in simulations and its “bottom” end of VSE is one of the lowest we’re looking at.
Approval voting can be ruled out because its VSE numbers are less great than the other options, and because people are generally happier when voting tactically than when voting honestly.
As much as I like STAR/range voting, I’d say we can rule it out because we don’t have graphs available to show how it holds up in a variety of situations (though I believe it holds up well). It’s also possibly a harder sell to get people to rate their choices than rank them.
Similarly to STAR/range voting, I don’t have graphs for 3-2-1 and I’m honestly not sure how it performs. It also has the property that generally folks are more satisfied with tactical votes than honest votes (though the grouping is so tight it probably doesn’t matter) and overall people are just less satisfied with it than other options, so I think we can rule it out.
That leaves some method of Condorcet, which when everyone is voting honestly has the highest VSE. That makes some amount of sense, since the Condorcet winner is the candidate that a majority would pick in every two-way matchup. The difference between the Condorcet methods ultimately comes down to what happens when there isn’t a Condorcet winner.
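As a rough sketch of what "wins every two-way matchup" means in practice, the snippet below looks for a Condorcet winner in a set of fully ranked ballots. The function name and the ballots are made up for illustration, and it deliberately does not implement Schulze or Ranked Pairs: it just returns `None` when there is no Condorcet winner, which is exactly the case where those methods differ.

```python
from itertools import combinations

def condorcet_winner(ballots):
    """Return the candidate who beats every other one-on-one, or None.

    ballots: list of rankings (most preferred first); assumes every ballot
    ranks every candidate.
    """
    candidates = sorted(set(ballots[0]))
    pairwise_wins = {c: 0 for c in candidates}
    for a, b in combinations(candidates, 2):
        # Count the ballots that rank a above b; the rest rank b above a.
        a_over_b = sum(1 for r in ballots if r.index(a) < r.index(b))
        b_over_a = len(ballots) - a_over_b
        if a_over_b > b_over_a:
            pairwise_wins[a] += 1
        elif b_over_a > a_over_b:
            pairwise_wins[b] += 1
    for c in candidates:
        if pairwise_wins[c] == len(candidates) - 1:
            return c
    return None  # a cycle or pairwise tie: no Condorcet winner

# Hypothetical ballots: B has the fewest first choices but wins both matchups.
ballots = [["A", "B", "C"]] * 4 + [["B", "A", "C"]] * 2 + [["C", "B", "A"]] * 3
print(condorcet_winner(ballots))  # B

# A three-way cycle (A beats B, B beats C, C beats A) has no Condorcet winner.
cycle = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]
print(condorcet_winner(cycle))  # None
```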
Many proponents of instant-runoff voting (IRV) are attracted by the belief that if their first choice does not win, their vote will be given to their second choice; if their second choice does not win, their vote will be given to their third choice, etc. This sounds perfect, but it is not true for every voter with IRV. If someone voted for a strong candidate, and their 2nd and 3rd choices are eliminated before their first choice is eliminated, IRV gives their vote to their 4th choice candidate, not their 2nd choice. Condorcet voting takes all rankings into account simultaneously, but at the expense of violating the later-no-harm criterion and the later-no-help criterion. With IRV, indicating a second choice will never affect your first choice. With Condorcet voting, it is possible that indicating a second choice will cause your first choice to lose.
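The "vote goes to the 4th choice" scenario can be reproduced with a small IRV tally. This is a minimal sketch under my own assumptions: the function name, the alphabetical tie-breaking rule for eliminations, and the ballots are all hypothetical.

```python
def irv(ballots):
    """Instant-runoff tally. Returns (winner, list of per-round tallies).

    ballots: list of rankings, most preferred first.
    Assumption: elimination ties are broken alphabetically, for illustration only.
    """
    remaining = {c for ballot in ballots for c in ballot}
    rounds = []
    while True:
        tally = {c: 0 for c in remaining}
        for ballot in ballots:
            # A ballot counts for its highest-ranked candidate still in the race.
            top = next((c for c in ballot if c in remaining), None)
            if top is not None:
                tally[top] += 1
        rounds.append(dict(tally))
        total = sum(tally.values())
        leader = max(sorted(remaining), key=lambda c: tally[c])
        if tally[leader] * 2 > total or len(remaining) == 1:
            return leader, rounds
        loser = min(sorted(remaining), key=lambda c: tally[c])
        remaining.remove(loser)

# Hypothetical election: the three S voters rank S > X > Y > Z > W.
ballots = (
    [["S", "X", "Y", "Z", "W"]] * 3
    + [["Z", "S", "X", "Y", "W"]] * 4
    + [["W", "Z", "S", "X", "Y"]] * 5
    + [["X", "W", "Z", "S", "Y"]] * 1
    + [["Y", "Z", "W", "S", "X"]] * 2
)
winner, rounds = irv(ballots)
for number, tally in enumerate(rounds, start=1):
    print(number, tally)
print("winner:", winner)
```

X and Y are eliminated in the first two rounds while S is still in the race, so when S is finally eliminated the S voters' ballots skip straight to Z, their 4th choice.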
There are circumstances, as in the examples above, when both instant-runoff voting and the ‘first-past-the-post’ plurality system will fail to pick the Condorcet winner. In cases where there is a Condorcet Winner, and where IRV does not choose it, a majority would by definition prefer the Condorcet Winner to the IRV winner. Proponents of the Condorcet criterion see it as a principal issue in selecting an electoral system. They see the Condorcet criterion as a natural extension of majority rule. Condorcet methods tend to encourage the selection of centrist candidates who appeal to the median voter.
Which we also had for this particular sprint. Unfortunately, there were no notes from this discussion to share (since our volunteer notetaker wasn’t interested in four hours of voting discussion) and the PEP took longer than hoped to be published.
Ideally, this discussion would have been ongoing during September, so that our plan of “those who really care can decide for those of us who don’t” was on schedule. Better late than never, though with the participants changing over time I don’t see how we can possibly get consensus out of this working group.
While it would have been nice to have this ongoing since September, PEP 8001 was posted on Oct 15th, immediately met with some people (myself and @njs) expressing concerns, declared Active by fiat on Oct 22nd, more concerns were raised both on python-committers and here, and now it’s being strongly suggested that we have to conclude this discussion today.
The messaging, as far as I saw, around PEP 8001 was that Raymond was going to write up a proposal and then it would be discussed. That’s fine, but by simply stating that, it meant that nobody was likely to undertake that effort otherwise, since it was already generally understood that a proposal would be forthcoming. Then that proposal wasn’t posted until 2 weeks prior to the “deadline”, and once people started raising concerns they were largely met with “well all systems have problems, and this is what we already wrote the PEP to be”. Then, since there wasn’t much time left to meet the “deadline”, the message became “You need to come up with consensus within a short period of time, even though no consensus exists on IRV, or we’re going with IRV”.
Thus while the decision wasn’t technically made at those sprints, the effect is that it practically was, because:
Competing proposals were discouraged by stating that there was going to be a single PEP, and there’d be discussion around it.
Discussion happened in a place where only a few people could be.
The PEP was then not published until there was very little time to change it.
Concerns were dismissed as not being serious enough to risk change this close to the “deadline”.
So far I’ve only seen one person who is actually pro-IRV (@cjerdonek). Everyone else seems to either be against IRV, or not care but not want to change what was already decided at the sprints. It’s hard to tell for sure, though, since it involves a bit of reading intentions into people’s posts.
Chris, I’ve also been careful to say “most”, with you being the unnamed exception.
Later-No-Harm (LNH) is just one of dozens of formal properties a given voting system may or may not satisfy. As you noted, it’s impossible for any system that satisfies the Condorcet criterion to satisfy LNH too, or vice versa. I think it’s fair to say that most people find the Condorcet criterion (“if some candidate beats all other candidates one-on-one, they’re the winner”) far more compelling.
Indeed, IRV is one of the only systems in actual use that satisfies LNH. Perhaps the only? Random Ballot satisfies LNH too, but that’s not exactly a good argument for it.
And it’s not true that adding preferences to an IRV ballot can’t “hurt” you in other ways. You can’t hurt your favorite, but you can hurt your lesser preferences’ chances of winning.
To me, the essence of your complaint is that if you’re truthful about your preferences in other systems, that may cause your favorite to lose under other systems. The fundamental problem is that I see that as a good thing: if, overall, society prefers some other candidate to my favorite, they probably should win. If, e.g., I slightly prefer A to B and am truthful about that, and B wins “because” I was truthful about that, no, they didn’t: B won because the voters as a whole, including but not limited to me, preferred B overall.
IRV intentionally blinds itself to the totality of information voters give, dribbling it in to the process one round at a time. This allows it to satisfy LNH (your lower preferences are invisible to it until your favorite is eliminated), but at quite a cost.
Just a nerdy gloss: part of the thinking is that if you contrive the rules to satisfy one of those, it’s likely it will fail the other one “badly”. Better to fail both “mildly” - minimize the worst-case harm across both, rather than eliminate all harm in just one case.
My expectation would be they would choose another voting system that didn’t result in a tie and have that make the decision. Making the ballots public and requiring they be fully ranked means there are alternative voting systems that could be used to calculate another winner easily (e.g. anything in that poll except Approval, but Borda could also be added to that list). And we can have the PEP explicitly state we are turning to the board to choose an alternative voting system that doesn’t lead to a tie based on the already cast ballots.
I would suggest we choose the backup voting system ourselves, but based on how this conversation has gone I don’t think that’s a necessarily easy thing to do.
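To illustrate how a different winner could be recomputed from ballots that are public and fully ranked, here is a minimal Borda count sketch. Borda is just the example named above, not a recommendation for the backup system; the function names and ballots are hypothetical.

```python
def borda_scores(ballots):
    """Borda count: with n candidates, a ballot gives n-1 points to its first
    choice, n-2 to its second, and so on down to 0 for its last choice."""
    candidates = set(ballots[0])
    n = len(candidates)
    scores = {c: 0 for c in candidates}
    for ranking in ballots:
        if len(ranking) != n or set(ranking) != candidates:
            raise ValueError("this sketch assumes every ballot is fully ranked")
        for position, candidate in enumerate(ranking):
            scores[candidate] += n - 1 - position
    return scores

def borda_winners(ballots):
    """Candidates with the top Borda score; more than one name means a tie."""
    scores = borda_scores(ballots)
    top = max(scores.values())
    return sorted(c for c, s in scores.items() if s == top)

# Hypothetical already-cast ballots, reused without asking anyone to vote again.
ballots = [["A", "B", "C"]] * 4 + [["B", "A", "C"]] * 2 + [["C", "B", "A"]] * 3
print(borda_scores(ballots))
print(borda_winners(ballots))
```

The requirement that ballots be fully ranked matters here: the sketch rejects partial rankings rather than guessing how to score them.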
I say give the poll until end-of-day today in hopes of at least pushing to double digit voter count and to see if there’s a bit more clarity in the top preference. Based on the outcome of that we can update the PEP tomorrow and have this all squared away so we can consider the PEP finished for November 1.
@malemburg please do make sure to vote in the poll in this thread (not sure how it would have shown up via email).
I had a look at the poll, but all entries say “with ties thrown to the
PSF board to resolve”. I would vote for “Approval voting”, but don’t
agree with getting the PSF board involved in core development or
having some other forced tie break mechanism.
As mentioned in my earlier reply, a tie means that we have to
resolve the issues with the competing PEPs and come up with
Since we can see who voted for what options, I think it would be fair to vote for the closest option, and then comment on where you differ from the option you actually voted for. IOW if we can tweak an option to be more satisfactory, then that seems reasonable to me. I don’t think the specific procedure there needs to be set in stone; the point is to give us some direction on how people are generally feeling about the variety of options.