Tuesday, April 15, 2008

Polls, statistics and lies

Running a survey and computing statistics is like walking on a ridge, with bias as the two steeply sloping sides. Any study, however objective its intent, is rife with opportunities to skew the results. Intentionally or unintentionally, an experiment can be designed to yield the desired results. A truly objective experiment should start with a probabilistic method for selecting a sample, gather data without introducing bias among the participants, classify and summarize the data accurately, and present the data in a clear, honest and unambiguous fashion to support the decisions reached.

The Literary Digest predicted a landslide victory for the Republican Alf Landon in the 1936 presidential election, while the incumbent Franklin Roosevelt actually won by a wide margin. During the 1948 presidential election, polls predicted that Thomas Dewey would defeat Harry Truman; Truman won. More recently, polls predicted that Barack Obama would defeat Hillary Clinton in the 2008 New Hampshire primary, which Clinton won. When actual results do not match desired results, marketing machines tend to distort the facts to suit their objectives. A recent Business Week cover story discussed how manufacturers promote cholesterol-lowering drugs by presenting half-truths and seeming facts.

The Literary Digest tallied the 2.3 million responses it received from the 10 million people polled and arrived at its conclusion favoring Alf Landon. Peverill Squire, in his 1988 article "Why the 1936 Literary Digest Poll Failed" (Public Opinion Quarterly 52:125-133), offers an explanation. First, the sample was drawn from telephone directories and automobile registration records. It was thus not representative of the population: the poor, who backed Roosevelt, could not afford telephones and automobiles. Second, the response rate was less than 25%, and a Gallup poll that investigated the discrepancy found that a very high percentage of the non-respondents favored Roosevelt.
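The two effects Squire describes, a skewed sampling frame and uneven response, can be illustrated with a little arithmetic. The Python sketch below uses the 2.3 million and 10 million figures from the text for the response rate; everything else (the two income groups, their shares, support levels, frame coverage and response rates) is a made-up assumption chosen only to show the mechanism, not historical data.

```python
# Figures quoted in the post.
polled = 10_000_000
responded = 2_300_000
response_rate = responded / polled  # 0.23, i.e. below 25%


def biased_estimate(groups):
    """Return (true_support, polled_support) for one candidate.

    Each group is a dict with hypothetical keys:
      share    - fraction of the electorate in this group
      support  - fraction of the group backing the candidate
      in_frame - fraction of the group reachable through the sampling frame
      responds - fraction of those reached who return a ballot
    """
    true_support = sum(g["share"] * g["support"] for g in groups)
    # Weight of each group among the ballots actually returned.
    weights = [g["share"] * g["in_frame"] * g["responds"] for g in groups]
    polled_support = (
        sum(w * g["support"] for w, g in zip(weights, groups)) / sum(weights)
    )
    return true_support, polled_support


# Hypothetical numbers for illustration only, NOT historical data.
groups = [
    # lower income: mostly outside the frame, and less likely to respond
    {"share": 0.6, "support": 0.70, "in_frame": 0.2, "responds": 0.15},
    # higher income: mostly inside the frame, and more likely to respond
    {"share": 0.4, "support": 0.45, "in_frame": 0.9, "responds": 0.30},
]

true_support, polled_support = biased_estimate(groups)
print(f"response rate:  {response_rate:.0%}")   # 23%
print(f"true support:   {true_support:.0%}")    # 60%
print(f"polled support: {polled_support:.0%}")  # 49%
```

Under these assumed numbers a candidate with 60% real support appears to be losing, and mailing more ballots would not help: the error comes from who is reached and who answers, not from the size of the sample.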

In 1948, it was Gallup's turn to conclude incorrectly from its poll that Dewey would defeat Truman in the presidential election. Flawed sampling was to blame. First, pollsters adopted quota sampling, in which a fixed number of respondents was selected from each stratum of society, instead of random sampling. Second, pollsters got cocky and stopped polling a few weeks too soon. Finally, the polls were conducted over the telephone, and the poor, who didn't have access to telephones, swayed the election in Truman's favor. Popcorn and feed polls (items on sale were printed with donkeys and elephants, and buyers were urged to purchase the item printed with the symbol of their party) targeted the lower income strata of society, and such polls correctly indicated Truman as the winner. However, these polls were not published widely, while those conducted by Gallup and others had national reach through major newspapers. In the last few weeks leading up to election day, Truman's campaign style helped him reach out to the poorer voters, unlike Dewey's.

ABC, CNN, C-SPAN, Gallup, Reuters, USA Today, Washington Post, and Zogby polls all had Sen. Barack Obama winning the 2008 New Hampshire Democratic primary against Sen. Hillary Clinton. This was perhaps the biggest polling error since the Chicago Tribune prematurely proclaimed Dewey the winner of the 1948 presidential election. Once again, sampling was inaccurate, and the problem was compounded by a larger margin of error due to a significant 18% "undecided" vote in the exit polls. Poorer white voters tend not to participate in surveys, and they generally have an unfavorable view of black candidates. Women voters, who overwhelmingly supported Clinton, also outnumbered male voters in the primary, which could have upset the sampling.
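To get a rough feel for how much room an 18% undecided vote leaves, the Python sketch below computes the textbook 95% margin of error for a proportion and the share of undecided voters a trailing candidate would need in order to draw level. The sample size of 1,000 and the 6-point lead are assumptions for illustration, not figures from any of the polls named above, and the formula assumes a simple random sample, which real polls only approximate.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated
    from a simple random sample of size n (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)


def undecided_share_needed(lead, undecided):
    """Share of the undecided vote at which the trailing candidate draws level.

    If the trailing candidate wins a share s of the undecided voters and
    the leader wins the rest, the gap closes when (2s - 1) * undecided = lead.
    Any share above the returned value overtakes the leader.
    """
    return 0.5 + lead / (2 * undecided)


# Hypothetical sample size and lead; only the 18% comes from the text.
moe = margin_of_error(0.5, 1000)
needed = undecided_share_needed(0.06, 0.18)
print(f"margin of error: {moe:.3f}")                # 0.031
print(f"undecided share to tie: {needed:.3f}")      # 0.667
```

With these assumed numbers, a trailing candidate who wins about two thirds of the undecided voters erases a 6-point deficit, a swing that the quoted margin of error of roughly 3 points does not account for.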
