Crowds – Blinkers vs. Thinkers

January 6, 2015
Posted by Jay Livingston

Most of the wisdom-of-crowds posts in this blog have been about sports betting. The trouble there is that no matter how many people are betting, they have only two choices – the favorite or the underdog. To see whether the crowd is wiser than the experts, you’d need data on many, many games.

The original wisdom-of-crowds test was a weight-guessing contest, so the crowd had a virtually unlimited range of choices – not just Colts or Bengals but every weight from one pound on up.

Plymouth, England, 1906. On display is an ox, slaughtered and dressed. How much does it weigh? Fairgoers submitted their guesses. A statistician, Francis Galton, happened to be there and recorded the data. Galton was also a eugenicist, so he was certain that the guesses of the masses would be less accurate than those of the experts. But it turned out that the crowd, as a group, was far more accurate. The average of all the guesses (n=787) was within one pound of the actual weight (1,198 lbs). No individual guess came that close.

In a blog post many years ago, I mentioned (here) that I was going to try to replicate the study, with the students in my class replacing the fairgoers and a jar of M&Ms replacing the ox. I did, and it proved to be another failure to replicate. The class mean was way off, mostly because of one outlier – a girl whose guess was an order of magnitude higher than the others. Besides, the sample size, about 20 students, was too small.

Now, Erik Steiner, a geographer at Stanford, has gone Galton, using the coins his parents had been tossing into a jar for the last 27 years. Steiner crowdsourced guesses from the entire Internet: he posted the contest on the Stanford website, and then Wired reposted it.

Photo: Susie Steiner

He got 602 guesses,* not exactly the entire Internet, but enough for data analysis. Here is his summary:

I won’t bore you with the finer points of asymmetric non-parametric one-sample T tests, but let’s put it this way: The crowd was waaaay off.

The value of the coins in the jar was $379.54. The average of the guesses was $596.12 – a difference of $216.58, or 57%.

Steiner’s results don’t give much support to the crowd. But the experts, those who tried to be the smart money, were even shorter on wisdom.

. . . people who claim to have done some math were far less accurate (x̄ = $724.81) than those who made a snap judgment (x̄ = $525.02). This may explain why estimates submitted from .edu or gmail addresses were less accurate than guesses submitted from hotmail and yahoo addresses.

Here is Steiner’s chart of the data.


Steiner refers dismissively to “all that Gladwellian snap-judgment stuff.” But even he has to admit that the blinkers did better than the thinkers. In fact, the crowd median, rather than the mean, was pretty close to the actual value.  Without those thinkers who “actually did the math,” the median and mean would have been even closer to the mark.
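The mean-vs.-median point is easy to see in miniature. Here is a small sketch (the guess values are invented for illustration, not Steiner’s or Galton’s actual data) showing how a few wildly high “did-the-math” estimates drag the mean upward while leaving the median nearly untouched:

```python
import statistics

# Hypothetical guesses (in dollars) illustrating the pattern Steiner
# describes: snap judgments cluster near the true value ($379.54),
# while a few "did-the-math" overestimates drag the mean upward.
snap_guesses = [300, 340, 360, 375, 390, 410, 430, 455]
math_guesses = [700, 850, 1100]  # invented outliers for illustration

crowd = snap_guesses + math_guesses

mean_all = statistics.mean(crowd)       # pulled up by the outliers
median_all = statistics.median(crowd)   # barely moved by them
median_snap = statistics.median(snap_guesses)

print(f"mean of all guesses:   {mean_all:.2f}")    # 519.09
print(f"median of all guesses: {median_all:.2f}")  # 410.00
print(f"median, snap only:     {median_snap:.2f}") # 382.50
```

With the three outliers included, the mean overshoots by about $140 while the median shifts by less than $30 – the same asymmetry Steiner found, where dropping the mean in favor of the median brings the crowd much closer to the mark.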

(Steiner’s write-up, along with charts and links to the data, are at Wired – here.)

* I’m not sure what to make of these response rates. Steiner sent his query out to potentially the whole world, but his crowd turned out to be smaller than the one whose guesses Galton tallied that day at the fair. I guess it’s a matter of whose ox is scored.


Anonymous said...

What about overthinkers who bootstrap?

Jay Livingston said...

But how would Bunsen go about the original challenge – how many coins? It’s not about guessing the characteristic of a population based on a sample. Guessing the mean date is easier even if you don’t sample a few coins since you still have some sense of the age of coins in circulation (though the Price-Is-Right rule makes it harder). And why is it called “bootstrapping”?

Andrew Gelman said...

That's a pretty graph, but what was he thinking looking at the mean?? I guess that's what Galton did? I'd think the median would be the obvious summary.