Crowds are wise enough to know when other people will get it wrong

Unexpected yet popular answers often turn out to be correct.

This article by Cathleen O’Grady was published by Ars Technica on 29 January 2017. O’Grady is Ars Technica’s contributing science reporter. She has a background in cognitive science and evolutionary linguistics.

Image credit: Flickr user Hsing Wei

The “wisdom of the crowd” is a simple approach that can be surprisingly effective at finding the correct answer to certain problems. For instance, if a large group of people is asked to estimate the number of jelly beans in a jar, the average of all the answers gets closer to the truth than individual responses. The algorithm is applicable to limited types of questions, but there’s evidence of real-world usefulness, like improving medical diagnoses.
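The jelly bean case can be sketched in a few lines of Python. The guesses and the true count below are made-up numbers chosen only for illustration, not data from any study.

```python
from statistics import mean

# Hypothetical jar and guesses (illustrative values, not real survey data).
true_count = 500
guesses = [320, 410, 450, 520, 560, 610, 700]

# The crowd's estimate is simply the average of all individual guesses.
crowd_estimate = mean(guesses)
crowd_error = abs(crowd_estimate - true_count)

# Count how many individuals were further from the truth than the crowd average.
individual_errors = [abs(g - true_count) for g in guesses]
beaten = sum(1 for e in individual_errors if e > crowd_error)

print(f"crowd estimate: {crowd_estimate}, error: {crowd_error}")
print(f"individuals beaten by the average: {beaten} of {len(guesses)}")
```

In this toy data set the overestimates and underestimates largely cancel, which is why the average lands closer to the truth than any single guess. Real crowds will not always be this tidy.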

This process has some pretty obvious limits, but a team of researchers at MIT and Princeton published a paper in Nature [Nature, 2017. DOI: 10.1038/nature21054] this week suggesting a way to make it more reliable: look for an answer that comes up more often than people think it will, and it’s likely to be correct.

As part of their paper, Dražen Prelec and his colleagues used a survey on capital cities in the US. Each question was a simple True/False statement with the format “Philadelphia is the capital of Pennsylvania.” The city listed was always the most populous city in the state, but that’s not necessarily the capital. In the case of Pennsylvania, the capital is actually Harrisburg, but plenty of people don’t know that.

The wisdom of crowds approach fails this question. The problem is that questions sometimes rely on people having unusual or otherwise specialized knowledge that isn’t shared by a majority of people. Because most people don’t have that knowledge, the crowd’s answer will be resoundingly wrong.

Previous tweaks have tried to correct for this problem by taking confidence into account. People are asked how confident they are in their answers, and higher weight is given to more confident answers. However, this only works if people are aware that they don’t know something—and this is often strikingly not the case.
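As a rough sketch of confidence weighting (one simple way to do it, not necessarily the scheme used in earlier studies), each answer's score can be the sum of the confidence ratings behind it. The function name and the response data are hypothetical.

```python
from collections import defaultdict

def confidence_weighted_vote(responses):
    """Pick the answer with the largest total confidence.

    responses: list of (answer, confidence) pairs, confidence in [0, 1].
    """
    totals = defaultdict(float)
    for answer, confidence in responses:
        totals[answer] += confidence
    return max(totals, key=totals.get)

# Made-up responses to "Philadelphia is the capital of Pennsylvania."
# Case 1: the mistaken majority is just as confident as the informed minority.
equally_confident = [("True", 0.8)] * 6 + [("False", 0.8)] * 4
# Case 2: the mistaken majority senses its own uncertainty.
aware_of_doubt = [("True", 0.4)] * 6 + [("False", 0.9)] * 4

print(confidence_weighted_vote(equally_confident))  # majority still wins
print(confidence_weighted_vote(aware_of_doubt))     # informed minority wins
```

The weighting only rescues the crowd in the second case, where the people who are wrong report lower confidence.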

In the case of the Philadelphia question, people who incorrectly answered “True” were about as confident in their answers as people who correctly answered “False,” so confidence ratings didn’t improve the algorithm. But when people were asked to predict what they thought the overall answer would be, there was a difference between the two groups: people who answered “True” thought most people would agree with them, because they didn’t know they were wrong. The people who answered “False,” by contrast, knew they had unique knowledge and correctly assumed that most people would answer incorrectly, predicting that most people would answer “True.”

Because of this, the group at large predicted that “True” would be the overwhelmingly popular answer. And it was—but not to the extent that they predicted. More people knew it was a trick question than the crowd expected. That discrepancy is what allows the approach to be tweaked. The new version looks at how people predict the population will vote, looks for the answer that people gave more often than those predictions would suggest, and then picks that “surprisingly popular” answer as the correct one.

To go back to our example: most people will think others will pick Philadelphia, while very few will expect others to name Harrisburg. But, because Harrisburg is the right answer, it’ll come up much more often than the predictions would suggest.
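The selection rule described above can be sketched as follows. This is a minimal illustration based on the article's description, not the authors' implementation; the function name, the input layout, and the survey numbers are all assumptions made for the example.

```python
def surprisingly_popular(votes, predictions):
    """Return the answer whose actual share most exceeds its predicted share.

    votes: list with one answer per respondent.
    predictions: list with one dict per respondent, mapping each answer to
        the share of the crowd that respondent expects to give it.
    """
    options = sorted(set(votes) | {a for p in predictions for a in p})
    # Share of respondents who actually gave each answer.
    actual = {a: votes.count(a) / len(votes) for a in options}
    # Average of everyone's predicted share for each answer.
    predicted = {
        a: sum(p.get(a, 0.0) for p in predictions) / len(predictions)
        for a in options
    }
    # Positive surprise means the answer was more popular than expected.
    surprise = {a: actual[a] - predicted[a] for a in options}
    return max(surprise, key=surprise.get), surprise

# Made-up responses to "Philadelphia is the capital of Pennsylvania."
# Six people answer "True" and expect 90% of the crowd to agree with them.
# Four people answer "False" but expect 70% of the crowd to say "True".
votes = ["True"] * 6 + ["False"] * 4
predictions = [{"True": 0.9, "False": 0.1}] * 6 + [{"True": 0.7, "False": 0.3}] * 4

winner, surprise = surprisingly_popular(votes, predictions)
print(winner, surprise)
```

With these illustrative numbers, "True" wins the plain popular vote with 60 percent, but the crowd predicted it would get about 82 percent. "False" gets 40 percent against a predicted 18 percent, so it is the surprisingly popular answer and the one the rule selects.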

Prelec and his colleagues constructed a statistical theorem suggesting that this process would improve matters and then tested it on a number of real-world examples. In addition to the state capitals survey, they used a general knowledge survey, a questionnaire asking art professionals and laypeople to assess the prices of certain artworks, and a survey asking dermatologists to assess whether skin lesions were malignant or benign.

Across the aggregated results from all of these surveys, the “surprisingly popular” (SP) algorithm had 21.3 percent fewer errors than a standard “popular vote” approach. In 290 of the 490 questions across all the surveys, they also assessed people’s confidence in their answers. The SP algorithm did better here, too: it had 24.2 percent fewer errors than an algorithm that chose confidence-weighted answers.

It’s easy to misinterpret the “wisdom of crowds” approach as suggesting that any answer reached by a large group of people will be the correct one. That’s not the case; the approach can pretty easily be undermined by social influences, like people being told how others had answered. These failings are a problem, because crowd wisdom could be a really useful tool, as demonstrated by its hypothetical uses in medical settings.

Improvements like these, then, contribute to sharpening the tool to the point where it could have robust real-world applications. “It would be hard to trust a method if it fails with ideal respondents on simple problems like [the capital of Pennsylvania],” the authors write. Fixing it so that it gets simple questions like these right is a big step in the right direction.