Random Samples Win!
Posted by Jeffrey Henning on Mon, Sep 29, 2008
The Christian Science Monitor reported in a blog post, "Obama: Big winner in debate says new poll": "The survey released from USA Today and Gallup show that of the 701 people polled 46 percent believe Barack Obama was the victor [in Friday night's debate] while 34 percent give the edge to John McCain."
In response to this, one commenter, Patrick Ludt, asked, "According to an AOL poll yesterday 52-48 said that McCain won the debate and that was 500,000 thousand people voting how do you explain that?????"
Another commenter, Turk, followed up with:
701 USA Today/GALLUP................46/34
500,000 MOST LIKELY VOTERS/AOL......52/48
YOU DO THE MATH!
Well, I'll do the math. First, the commenter was misremembering the survey: the AOL debate poll didn't have half a million responses but 40,082 (at the time of this writing), which went 53% for McCain, 41% for Obama and 6% calling it a draw, markedly different from the Gallup results. Still, I recognize that it is counterintuitive that a study with 40,000 respondents might be less accurate than a study with 700 respondents, but that is precisely the case. Here's why.
To do the math, you have to look at the underlying theory of sampling. The Gallup poll is a scientific, random sample and can be used to project to the U.S. population. The AOL poll is a self-selected convenience sample and can't be used to project anything except the composition of the AOL site visitors who vote in its polls. A scientific poll needs two main things to be valid: 1) randomness, meaning every member of the population has a known chance of being selected, in the simplest case an equal chance ("probability sampling"); and 2) external selection, meaning respondents are chosen to participate rather than deciding to take the poll themselves. The AOL poll had neither of these things.
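To make those two requirements concrete, here is a minimal Python sketch that runs both kinds of poll against a made-up electorate. Every number in it is a hypothetical assumption, not data from Gallup or AOL: 1,000,000 voters split exactly 50/50, with supporters of candidate A assumed to be twice as likely as B's supporters to click a "vote now" button (6% vs. 3%). The function name `run_polls` and the seed are illustrative only.

```python
import random

def run_polls(seed=2008):
    """Compare a simple random sample with a self-selected click-to-vote poll.

    Hypothetical setup: 1,000,000 voters split exactly 50/50, and A's
    supporters are assumed to volunteer twice as often as B's (6% vs. 3%).
    Returns (random_share, random_n, volunteer_share, volunteer_n).
    """
    rng = random.Random(seed)
    # True means "prefers candidate A"; the true share is exactly 0.5.
    population = [True] * 500_000 + [False] * 500_000

    # Probability sampling: every voter has the same chance of being picked.
    random_sample = rng.sample(population, 701)

    # Self-selection: each voter decides whether to take part, and (by
    # assumption) A's supporters are more eager to do so.
    volunteers = [v for v in population if rng.random() < (0.06 if v else 0.03)]

    def share(votes):
        return sum(votes) / len(votes)

    return share(random_sample), len(random_sample), share(volunteers), len(volunteers)

if __name__ == "__main__":
    r_est, r_n, v_est, v_n = run_polls()
    print(f"random sample: n={r_n:>6}  share for A = {r_est:.1%}")
    print(f"self-selected: n={v_n:>6}  share for A = {v_est:.1%}")
```

Under these assumptions the self-selected poll collects roughly 45,000 votes yet settles near 67% for A, because the eagerness gap, not the electorate, sets its result; piling on more volunteers only makes it more precisely wrong. The 701-person random sample will usually land within about 4 points of the true 50%.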
AOL site visitors who vote in AOL polls are not representative of the U.S. population, as illustrated by AOL's recent Presidential poll: 61% voted for McCain for President and 39% for Obama, with 272,939 votes. Just to put that in perspective, if that were the outcome of the election, McCain would outperform his hero, Ronald Reagan, who won 58.8% to 40.6% in 1984. The AOL electoral map has McCain winning 535 electoral votes to 3. No pundit is predicting a record landslide for McCain. So here is a survey of more than 270,000 respondents that is a poorer predictor of the national outcome than the 2,000-respondent surveys that independent polling organizations are running.
In case those numbers alone are not compelling, let's try a parable.
A marine biologist receives a grant to estimate the population of fish in a certain area of the sea. He commissions three fishermen to help him. One sails out, drops his net down 200 meters, and pulls up a catch with 90% tuna, 9% swordfish and 1% marine hatchetfish. The second fisherman sails out, drops his net down 1000 meters, and pulls up a catch with 9% tuna, 90% swordfish, and 1% marine hatchetfish. The third fisherman sails out, drops his net down 4000 meters and pulls up a catch with 1% tuna, 9% swordfish, and 90% marine hatchetfish. How many fish of each type live in the sea?
Got me, but I'm a market researcher by background, a fisher of men (and women) in random samples, not a fisher of fish. All I know is that different types of fish live at different depths of water. The problem with polling is that where you cast your nets determines what kind of results you get back. If you go fishing in red-state waters, you will get red-state voters; if you go fishing in the blue-state ocean, you will get blue-state voters. AOL.com is in red-state waters.
AOL's polls only get AOL visitors. Interrupting people to take a survey, either by telephone or by email, is better than posting a poll on a page and letting people vote. Calling back repeatedly when no one answers the phone, and sending scheduled email reminders to take the survey, improve statistical validity by making repeated attempts to reach the same selected people. That removes the selection bias that would creep in if only the people who responded to the first contact were counted.
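Why callbacks help can be sketched with a short deterministic calculation. The setup is hypothetical: in a randomly selected sample, half the people are easy to reach (a 60% chance of answering any given contact) and lean 40% toward candidate A, while the other half are hard to reach (20% per contact) and lean 60% toward A, so the true share is 50%. `estimate_after_attempts` is an illustrative name, not a standard polling function.

```python
def estimate_after_attempts(attempts):
    """Share favoring A among sampled people reached within `attempts` tries.

    Hypothetical setup: half the sample is easy to reach and leans 40% A,
    half is hard to reach and leans 60% A; the true share is 50%.
    """
    groups = [
        # (fraction of sample, per-attempt contact probability, share favoring A)
        (0.5, 0.6, 0.4),  # easy to reach
        (0.5, 0.2, 0.6),  # hard to reach
    ]
    reached = favor_a = 0.0
    for size, p_contact, share_a in groups:
        # Probability of being reached at least once in `attempts` tries.
        p_reached = 1 - (1 - p_contact) ** attempts
        reached += size * p_reached
        favor_a += size * p_reached * share_a
    return favor_a / reached

for k in (1, 3, 5, 10):
    print(f"{k:>2} attempt(s): estimated share for A = {estimate_after_attempts(k):.1%}")
```

Under these assumptions the estimate climbs from 45.0% with a single attempt to about 49.4% after ten, because each extra attempt mostly adds the hard-to-reach people who were missing after the first round.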
Random sampling may not be popular, but it is very accurate. As the old polling joke has it, "If you don't believe in random sampling, next time you go to the doctor for a blood test, have him take it all." You can debate who won on Friday, but you can't debate that scientific polls achieve great accuracy with comparatively few responses.
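The "comparatively few responses" point follows from the standard margin-of-error formula for a proportion estimated from a simple random sample, 1.96 * sqrt(p * (1 - p) / n) at 95% confidence. The sketch below applies it to the sample sizes mentioned in this post; real polls such as Gallup's also weight their data and account for design effects, so their published margins may differ somewhat. The function name `margin_of_error` is illustrative.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a proportion from a simple random sample of size n.

    p = 0.5 gives the widest (most conservative) margin. The formula assumes
    probability sampling; it says nothing about bias in a self-selected poll.
    """
    return z * math.sqrt(p * (1 - p) / n)

for n in (701, 2_000, 40_082):
    print(f"n={n:>6}: +/- {margin_of_error(n):.1%}")
```

This gives roughly plus or minus 3.7 points for 701 respondents, 2.2 for 2,000 and 0.5 for 40,082. Precision grows only with the square root of the sample size, and the formula presumes a random sample: applied to the 40,082-vote AOL poll it would suggest half-point precision, but the gap between AOL and Gallup comes from who chose to vote, which no sample size fixes.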