Twenty-five Is Not a Random Number

June 21, 2009
Posted by Jay Livingston

Political scientists Bernd Beber and Alexandra Scacco have a simple test for electoral fraud in the Iranian election. Here are the results from Qom:
  • Ahmadinejad: 422,457
  • Karroubi: 2,314
  • Mousavi: 148,467
  • Rezaee: 16,297
Which digits are the important ones? The left-most ones, of course – Ahmadinejad’s roughly 420,000 to Mousavi’s 148,000.

But Beber and Scacco were interested in the right-most digits, the ones that we might throw out and round to zero. Here’s why:

When people try to make up numbers that appear to be random, they show certain preferences. Try it yourself. Think of any random number from 0 to 100. I’ll wait. Got your number? O.K. Chances are it’s an odd number that does not end in 5. More than likely, it does end in 7.*

In an honest vote count, about 10% of the final digits should be fives, and 10% should be sevens. If five is underrepresented and seven is overrepresented, that pattern suggests someone made up the numbers and tried to make them seem random.
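
To show the mechanics of that tally, here is a minimal Python sketch that counts final digits and compares each share with the 10% expected in an honest count. The function name `last_digit_shares` is made up for this example, and the four Qom totals above are far too few to test anything; they are used only because they are the numbers at hand.

```python
from collections import Counter

def last_digit_shares(counts):
    """Return {digit: share} for the final digit of each vote count."""
    tally = Counter(n % 10 for n in counts)
    total = len(counts)
    return {d: tally.get(d, 0) / total for d in range(10)}

# The four Qom totals quoted above (illustration only, not a real test).
qom = [422457, 2314, 148467, 16297]
shares = last_digit_shares(qom)
for digit, share in shares.items():
    print(f"{digit}: {share:.0%} (expected about 10%)")
```

A real check would feed in all of the provincial totals rather than one city's four numbers.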

Beber and Scacco looked at the 116 results (four candidates x 29 provinces) and . . .
The numbers look suspicious. We find too many 7s and not enough 5s in the last digit. We expect each digit (0, 1, 2, and so on) to appear at the end of 10 percent of the vote counts. But in Iran's provincial results, the digit 7 appears 17 percent of the time, and only 4 percent of the results end in the number 5. Two such departures from the average – a spike of 17 percent or more in one digit and a drop to 4 percent or less in another – are extremely unlikely. Fewer than four in a hundred non-fraudulent elections would produce such numbers.
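
One way to check a figure like "fewer than four in a hundred" is by simulation. The sketch below is not the authors' own calculation. It assumes that an honest count amounts to 116 independent last digits, each equally likely to be 0 through 9, and it takes "17 percent or more" and "4 percent or less" to mean at least 20 and at most 5 occurrences out of 116. Those cutoffs are an assumption, since rounding makes the percentages ambiguous, and the function name and trial count are purely illustrative.

```python
import random
from collections import Counter

def spike_and_drop_rate(n_counts=116, high=20, low=5, trials=20_000, seed=0):
    """Estimate how often uniform random last digits show both a spike and a drop.

    A trial "hits" when some digit appears at least `high` times and some
    digit appears at most `low` times among `n_counts` random digits.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        tally = Counter(rng.choices(range(10), k=n_counts))
        occurrences = [tally.get(d, 0) for d in range(10)]
        if max(occurrences) >= high and min(occurrences) <= low:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    print(f"Estimated share of honest elections: {spike_and_drop_rate():.3f}")
```

The estimate can be compared with the figure in the quote. Changing `high` and `low` shows how sensitive the answer is to where the cutoffs sit.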

In a second test, Beber and Scacco also looked at the last two digits.
Psychologists have also found that humans have trouble generating non-adjacent digits (such as 64 or 17, as opposed to 23) as frequently as one would expect in a sequence of random numbers.
Sure enough, the totals had fewer non-adjacent pairs than would be expected, especially in the province totals for Ahmadinejad. The two tests provide a fairly persuasive case for what most people think anyway – that the vote totals reported by the Iranian government were fabricated.
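
For a sense of the baseline in this second test, the sketch below enumerates all 100 equally likely two-digit endings (00 to 99) and sorts them into identical, adjacent, and non-adjacent pairs. The definition of "adjacent" is an assumption here, not something taken from Beber and Scacco: two digits count as adjacent if they differ by exactly 1, and an optional `wrap` flag also treats 0 and 9 as neighbors.

```python
def classify_pair(a, b, wrap=False):
    """Label a pair of digits as 'identical', 'adjacent', or 'non-adjacent'."""
    if a == b:
        return "identical"
    gap = abs(a - b)
    if gap == 1 or (wrap and gap == 9):
        return "adjacent"
    return "non-adjacent"

def pair_shares(wrap=False):
    """Share of each label among the 100 equally likely two-digit endings."""
    labels = [classify_pair(a, b, wrap) for a in range(10) for b in range(10)]
    return {name: labels.count(name) / 100
            for name in ("identical", "adjacent", "non-adjacent")}

print(pair_shares())           # 0 and 9 not treated as neighbors
print(pair_shares(wrap=True))  # 0 and 9 treated as neighbors
```

Under either definition, random totals should end in non-adjacent pairs well over half the time, which is the benchmark that made-up numbers tend to fall short of.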

Beber and Scacco report their research in the Washington Post here.

*Street magician David Blaine uses this same tendency in one of his mind reading tricks. A lot of people pick 37.

Hat tip to Joshua Tucker at The Monkey Cage, which has links to the electoral data.

9 comments:

  1. I'd be curious to see what happens when they apply this to unbiased elections.

  2. Beber and Scacco didn't just come up with this on the spot in response to Iran. It's the basis of their doctoral research, which uses the same techniques to look at elections in Sweden (honest) and Nigeria (not so much).

    Their WaPo article also looked at "the state-by-state vote counts for John McCain and Barack Obama in last year's U.S. presidential election. The frequencies of last digits in these election returns never rise above 14 percent or fall below 6 percent, a pattern we would expect to see in seventy out of a hundred fair elections."

  3. I've heard that some use the same logic to detect fraud in the reporting of results in science journals....

  4. I hope you can take a closer look at this and see that it's bogus. You say they apply a simple test -- can you actually state what that test is? Because it certainly isn't testing the null hypothesis that the last digit is uniformly distributed. Go and look at the Nigeria paper that you're talking about -- the authors say that fake random numbers will have an excess of 1s, 2s, and 3s and a dearth of 6s, 8s, 9s, and 0s (I believe). They don't mention the numbers 5 and 7 at all.

    Also, go download their 2008 dataset and look at the second-to-last digit. You'll see a similar pattern that's even more rare than the one in Iran.

    Lastly, you absolutely wouldn't expect 10% of each number in the last digit in a fair election. The odds of only having 11 or 12 occurrences of each number in a random, 116-number series are much lower than the odds of seeing what happened in Iran.

    If the authors had noticed that two digits, say 5 and 7, appeared too frequently, do you think they would've written the same article?

  5. The method used for detecting this fraud reminds me of Benford's Law.

  6. More scepticism about that analysis:

    http://www.analyticpolitics.org/2009/06/devil-is-in-statistics.html

  7. James, the guy you link to hints at some Bayesian analysis, but he doesn't say what priors we should use or even try to assess them.

    I've been asking around to people who know far more about statistics and probability than I do, and none of them can figure out a way to calculate the probability of getting one digit over- or under-represented by a count of 7 (i.e., in the 116 numbers, getting 19 or more or 5 or fewer). I agree with Zach that there’s some reason to be skeptical about the validity of the Beber-Scacco analysis. (I'm also skeptical about the official results of the election, but that's a different matter.)

  8. Jay - I don't know much about statistics and also couldn't figure out those stats analytically for what that's worth. The probabilities are high enough that it's trivial to get the answer by simulation, though. I'm not sure exactly what you're looking for, but running 100,000 simulations (for each condition), I get these results:
    Any digit, 19 or more times: 20737
    Any digit, 5 or fewer: 19774
    X>=19 & Y <=5: 6144

    The average number of occurrences is actually 11.6, so there's a lower chance of besting 12 by 7 than getting 7 fewer than 12. In Iran, there were 20 7s and 5 5s -- it's more rare to see a single number 20 times than a single number 5 times; the authors are right here about the combined probability being about 3.5% (fewer than 4 in 100 in the article).

    It's not too hard to come up with an expression with stupidly large binomial coefficients and exponents that should give the right probability for these things, but I have no clue how to compute them.

  9. Hi Aileen, thanks for the comment. It's always nice to have another reader among the select few.
