As a person with both a science and a math background, I keep encountering arguments here (and elsewhere) that the 2016 polls were wrong, and then people link to sites like this one that estimate each candidate's "probability of winning." That mixes mathematical apples and oranges. So here is a brief math and probability lesson.
Polling is the act of sampling an audience about some attitude or (in this case) future behavior and then predicting the actions of the herd on the basis of the sample. Every poll has an associated margin of error that is, at least in part, a function of the size of the sample and the degree to which the sample is believed to represent the herd. So if the poll says, "A will get 44% of the vote and B will get 54% of the vote, with a margin of error of 2 percentage points," the poll is wrong if and only if the vote happens and A does NOT get between 42% and 46% of the vote and/or B does NOT get between 52% and 56%.
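To make the "function of the size of the sample" part concrete, here is a minimal sketch. It assumes a simple random sample, the textbook 95% confidence level (z = 1.96), and the worst case p = 0.5; real polls also adjust for weighting and how representative the sample is, which makes their margins larger than this formula alone gives. The function names are my own, just for illustration.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Sampling margin of error (as a fraction) for a simple random sample of size n.

    Assumes 95% confidence (z = 1.96) and worst-case p = 0.5 by default;
    it ignores weighting and other non-sampling error.
    """
    return z * math.sqrt(p * (1 - p) / n)

def within_margin(poll_share, actual_share, moe):
    """True if the actual result landed inside the poll's stated margin."""
    return abs(actual_share - poll_share) <= moe

# Roughly 2,400 respondents gives about a 2-point margin.
moe = margin_of_error(2401)
print(round(moe, 4))                      # 0.02
print(within_margin(0.44, 0.45, 0.02))    # True: 45% is inside 42-46%
print(within_margin(0.44, 0.47, 0.02))    # False: 47% is outside 42-46%
```

Note the square root: halving the margin of error takes roughly four times as many respondents.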
It is also important to note when the poll was taken. Most polls take the form of "if the vote were held today, who would you vote for?" Because people's views change, such a poll can only be construed to mean "on or about that day, if the vote had been held, this would have been the expected distribution." So when someone takes a table like the one on this page, points out that 180+ polls favored Clinton and Trump won anyway, and then claims "so the polls were wrong," they are misapplying the polls. You cannot take polls done in May and claim they were wrong because the vote in November came out differently. That's like telling my son "you didn't mow the grass!" because the grass is long again 2 weeks after he mowed it.
Another error is to take a poll that measures overall support (i.e., the popular vote) and apply it to the electoral college. Both of these are simply misuses of polling data. Indeed, Clinton won the popular vote by almost 3 million votes, and a popular-vote win is what most of those polls predicted.
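Here is a toy illustration of why a popular-vote poll can't be read as an electoral college forecast. The three states and all their numbers are completely made up, and it assumes simple winner-take-all in every state (ignoring the states that split electoral votes by district).

```python
# Hypothetical states: name -> (electoral votes, votes for A, votes for B)
STATES = {
    "P": (10, 900_000, 100_000),  # A wins big here
    "Q": (10, 480_000, 520_000),  # B wins narrowly
    "R": (10, 480_000, 520_000),  # B wins narrowly
}

def popular_vote(states):
    """Total votes for A and B across all states."""
    a = sum(v_a for _, v_a, _ in states.values())
    b = sum(v_b for _, _, v_b in states.values())
    return a, b

def electoral_vote(states):
    """Winner-take-all electoral votes (exact ties award nothing here)."""
    a = sum(ev for ev, v_a, v_b in states.values() if v_a > v_b)
    b = sum(ev for ev, v_a, v_b in states.values() if v_b > v_a)
    return a, b

print(popular_vote(STATES))    # (1860000, 1140000): A wins the popular vote
print(electoral_vote(STATES))  # (10, 20): B wins the electoral college
```

A national poll showing A ahead would be perfectly accurate here and still not tell you who wins the electoral college.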
Now what about the sites that gave Clinton an "85% chance of winning" (i.e., the first link above)? That is a probability based on someone's analysis of the polling data. It is not a poll, so calling it "polling data" is simply wrong. The claim "they got it wrong" is also in error. Consider the lottery. If someone actually wins the Powerball jackpot, does that prove the probability of a single ticket winning ISN'T 1 in 175M? Of course not. A probability is calculated by taking all of the outcomes considered a "success" (i.e., the winning number combination, and there is only one of those in each Powerball drawing) and dividing by the number of equally likely possible outcomes (i.e., the 175,223,510 possible number combinations under the 2012 to 2015 Powerball rules: 5 white balls from 59 plus 1 red ball from 35). That is where the 1 in 175M probability comes from. It is immutable (unless you change the Powerball rules).
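If you want to check the lottery arithmetic yourself, here is a short sketch. The function name is my own; the ball counts are the 2012 to 2015 rules (5 of 59 white, 1 of 35 red) plus the rules adopted in October 2015 (5 of 69 white, 1 of 26 red), which shows exactly what "changing the rules" does to the number.

```python
from fractions import Fraction
from math import comb

def jackpot_combinations(white_pool, white_drawn, red_pool):
    """Count of possible tickets: the order of the white balls doesn't matter,
    and one red ball is drawn separately from its own pool."""
    return comb(white_pool, white_drawn) * red_pool

combos_2012 = jackpot_combinations(59, 5, 35)  # rules in effect 2012-2015
combos_2015 = jackpot_combinations(69, 5, 26)  # rules adopted in October 2015

# One winning combination per drawing, divided by all equally likely outcomes
p_win = Fraction(1, combos_2012)

print(combos_2012)  # 175223510
print(combos_2015)  # 292201338
print(p_win)        # 1/175223510
```

Someone winning a given drawing doesn't change any of these numbers; only the rules do.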
Where does the election probability come from? Political pundits look at the state-by-state polling data (and district-by-district data for the states that apportion electoral votes by district), determine how solid each state is for one candidate or the other (i.e., is it within the margin of error, where it could go either way, or is it solidly for one candidate), assign each state a probability, and then work from those to what they believe is the overall probability. If they say 85% for A and 15% for B, and B actually wins, it does not mean they were wrong; it means B hit the lottery and cashed in on their 15%. In fact, from a single election there really is no clean way to determine whether their probabilities were right or wrong.
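For the curious, here is one simple way state probabilities can be rolled up into an overall number. This is only a sketch under a big assumption: that each state is independent of the others. Real forecasting models typically let polling errors move together across states, which generally makes upsets more likely than this sketch suggests. The three states, their probabilities, and their electoral votes are all made up.

```python
def win_probability(states, needed):
    """Probability that A gets at least `needed` electoral votes.

    states: list of (probability A wins the state, electoral votes).
    ASSUMES the states are independent; it builds the exact distribution
    of A's electoral vote total one state at a time.
    """
    dist = {0: 1.0}  # A's electoral votes -> probability
    for p, ev in states:
        new = {}
        for votes, prob in dist.items():
            new[votes + ev] = new.get(votes + ev, 0.0) + prob * p    # A wins state
            new[votes] = new.get(votes, 0.0) + prob * (1 - p)        # A loses state
        dist = new
    return sum(prob for votes, prob in dist.items() if votes >= needed)

# Hypothetical states: (chance A wins it, electoral votes)
STATES = [(0.9, 10), (0.5, 8), (0.2, 5)]
TOTAL = sum(ev for _, ev in STATES)
NEEDED = TOTAL // 2 + 1  # a majority: 12 of 23

p_a = win_probability(STATES, NEEDED)
print(round(p_a, 4))      # 0.55
print(round(1 - p_a, 4))  # 0.45: B's "lottery ticket"
```

Even with A favored in every close call, B still wins 45% of the time in this toy setup, and a B win on election night wouldn't prove the 55% figure was wrong.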
I'm not sure if there is another (easier?) way to explain this, but there is a LOT of misunderstanding about this issue.
OK - my mathematical little heart is sated for the day...