Replication and Bullshit

July 9, 2014
Posted by Jay Livingston

A bet is a tax on bullshit, says Marginal Revolution’s Alex Tabarrok (here). So is replication.

Here’s one of my favorite examples of both – the cold-open scene from “The Hustler” (1961). Charlie is proposing replication. Without it, he considers the effect to be random variation.



It’s a great three minutes of film, but to spare you the time, here’s the relevant exchange.

         CHARLIE
    You ought to take up crap shooting. Talk about luck!

         EDDIE
    Luck! Whaddya mean, luck?

         CHARLIE
    You know what I mean. You couldn't make that shot again in a million years.

         EDDIE
    I couldn’t, huh? Okay. Go ahead. Set ’em up the way they were before.

         CHARLIE
    Why?

         EDDIE
    Go ahead. Set ’em up the way they were before. Bet ya twenty bucks. Make that shot just the way I made it before.

         CHARLIE
    Nobody can make that shot and you know it. Not even a lucky lush.


After some by-play and betting and a deliberate miss, Eddie (aka Fast Eddie) replicates the effect, and we segue to the opening credits* confident that the results are indeed not random variation but a true indicator of Eddie’s skill.

But now Jason Mitchell, a psychologist at Harvard, has published a long throw-down against replication. (The essay is here.) Psychologists shouldn’t try to replicate others’ experiments, he says. And if they do replicate and find no effect, the results shouldn’t be published.  Experiments are delicate mechanisms, and you have to do everything just right. The failure to replicate results means only that someone messed up.

Because experiments can be undermined by a vast number of practical mistakes, the likeliest explanation for any failed replication will always be that the replicator bungled something along the way.  Unless direct replications are conducted by flawless experimenters, nothing interesting can be learned from them.


L. J. Zigerell, in a comment at Scatterplot, thinks that Mitchell may have gotten it switched around. Zigerell begins by quoting Mitchell:

“When an experiment succeeds, we can celebrate that the phenomenon survived these all-too-frequent shortcomings.”

But, actually, when an experiment succeeds, we can only wallow in uncertainty about whether a phenomenon exists, or whether a phenomenon appears to exist only because a researcher invented the data, because the research report revealed a non-representative selection of results, because the research design biased results away from the null, or because the researcher performed the experiment in a context in which the effect size for some reason appeared much larger than the true effect size.

It would probably be more accurate to say that replication is not so much a tax on bullshit as a tax on those other factors Zigerell mentions. But he left out one other possibility: that the experimenter hadn’t taken all the relevant variables into account.  The best-known of these unincluded variables is the experimenter himself or herself, even in this post-Rosenthal world. But Zigerell’s comment reminded me of my own experience in an experimental psych lab. A full description is here, but in brief, here’s what happened. The experimenters claimed that a monkey watching the face of another monkey on a small black-and-white TV monitor could read the other monkey’s facial expressions.  Their publications made no mention of something that should have been clear to anyone in the lab: that the monkey was responding to the shrieks and pounding of the other monkey – auditory signals that could be clearly heard even though the monkeys were in different rooms.

Imagine another researcher trying to replicate the experiment. She puts the monkeys in rooms where they cannot hear each other, and what they have is a failure to communicate. Should a journal publish her results? Should she have even tried to replicate in the first place?  In response, here are Mitchell’s general principles:


    •    Failed replications do not provide meaningful information if they closely follow original methodology.
    •    Replication efforts appear to reflect strong prior expectations that published findings are not reliable, and as such, do not constitute scientific output.
    •    The field of social psychology can be improved, but not by the publication of negative findings.
    •    Authors and editors of failed replications are publicly impugning the scientific integrity of their colleagues.


Mitchell makes research sound like a zero-sum game, with “mean-spirited” replicators out to win some easy money from a “lucky lush.” But often, the attempt to replicate is not motivated by skepticism and envy. Just the opposite. You hear about some finding, and you want to see where the underlying idea might lead.** So as a first step, to see if you’ve got it right, you try to imitate the original research. And if you fail to get similar results, you usually question your own methods.

My guess is that the arrogance Mitchell attributes to the replicators is more common among those who have gotten positive findings.  How often do they reflect on their experiments and wonder if it might have been luck or some other element not in their model?

----
* Those credits can be seen here – with the correct aspect ratio and a saxophone on the soundtrack that has to be Phil Woods. 

** (Update, July 10) DrugMonkey, a bio-medical research scientist, says something similar:
Trying to replicate another paper's effects is a compliment! Failing to do so is not an attack on the authors’ “integrity.” It is how science advances.  

Don’t Explain

July 3, 2014
Posted by Jay Livingston

Adam Kramer, one of the authors of the notorious Facebook study, has defended this research. Bad idea. Even when an explanation is done well, it’s not as good as a simple apology. And Kramer does not do it well. (His full post is here.)

OK so. A lot of people have asked me about my and Jamie and Jeff's recent study published in PNAS, and I wanted to give a brief public explanation.

“OK so.” That’s the way we begin explanations these days. It implies that this is a continuation of a conversation. Combined with the first-names-only reference to co-authors it implies that we’re all old friends here – me, you, Jamie, Jeff – picking up where we left off.

The reason we did this research is because we care about the emotional impact of Facebook and the people that use our product.

“We care.” This will persuade approximately nobody. Do you believe that Facebook researchers care about you? Does anyone believe that?

Regarding methodology, our research sought to investigate the above claim by very minimally deprioritizing a small percentage of content in News Feed (based on whether there was an emotional word in the post) for a group of people (about 0.04% of users, or 1 in 2500) for a short period (one week, in early 2012).

See, we inconvenienced only a handful of people – a teensy tiny 0.04%. Compare that with the actual publication, where the first words you see, in a box above the abstract, are these: 
We show, via a massive (N = 689,003) experiment on Facebook . . .[emphasis added]
The experiment involved editing posts that people saw. For some FB users, the researchers filtered out posts with negative words; other users saw fewer positive posts.
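
To put the two framings side by side, here is a quick back-of-the-envelope check. It is only a sketch: it assumes that Kramer’s “about 0.04%” and the paper’s N = 689,003 describe the same group of people, which his post does not say outright, and the variable names are mine.

```python
from fractions import Fraction

# Figures quoted above. Treating them as describing the same group of
# people is an assumption made for this illustration.
N_PARTICIPANTS = 689_003            # "massive (N = 689,003)" in the paper
SHARE_OF_USERS = Fraction(4, 10_000)  # Kramer's "about 0.04% of users"

# "0.04%" and "1 in 2500" are the same number stated two ways.
one_in = 1 / SHARE_OF_USERS

# If 689,003 people are 0.04% of the user base, the base must be this large.
implied_user_base = N_PARTICIPANTS / SHARE_OF_USERS

print(f"0.04% of users is 1 in {int(one_in)}")
print(f"Implied user base: {int(implied_user_base):,}")
```

Taken literally, the two figures imply a base of roughly 1.7 billion users. Kramer does say “about,” so that is an order of magnitude, not a headcount. The point is only that the same experiment is tiny as a percentage and massive as a number.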

Nobody's posts were “hidden,” they just didn’t show up on some loads of Feed. Those posts were always visible on friends’ timelines, and could have shown up on subsequent News Feed loads.

“Not hidden, they just didn’t show up.” I’m not a sophisticated Facebook user, so I don’t catch the distinction here. Anyway, all you had to do was guess which of your friends had posted things that didn’t show up and then go to their timelines. Simple.

Kramer then goes to the findings.

at the end of the day, the actual impact on people in the experiment was the minimal amount to statistically detect it

That’s true. At the end of the day, the bottom line – well, it is what it is. But you might not have realized how minuscule the effect was if you had read only the title of the article:
Experimental evidence of massive-scale emotional contagion through social networks [emphasis added]
On Monday, it was massive. By Thursday, it was minimal.

Finally comes a paragraph with the hint of an apology.

The goal of all of our research at Facebook is to learn how to provide a better service. Having written and designed this experiment myself, I can tell you that our goal was never to upset anyone.

I might have been more willing to believe this “Provide a better service” idea, but Kramer lost me at “We care.” Worse, Kramer follows it with “our goal was never to upset.” Well, duh. A drunk driver’s goal is to drive from the bar to his home. It’s never his goal to smash into other cars. Then comes the classic non-apology: it’s your fault.

I can understand why some people have concerns about it, and my coauthors and I are very sorry for the way the paper described the research and any anxiety it caused. In hindsight, the research benefits of the paper may not have justified all of this anxiety.

This isn’t much different from, “If people were offended . . .” implying that if people were less hypersensitive and more intelligent, there would be no problem. If only we had described the research in such a way that you morons realized what we were doing, you wouldn’t have gotten upset. Kramer doesn’t get it.

Here’s why I’m pissed off about this study.
  • First, I resent Facebook because of its power over us. It’s essentially a monopoly. I’m on it because everyone I know is on it. We are dependent on it.
  • Second, because it’s a monopoly, we have to trust it, and this experiment shows that Facebook is not trustworthy. It’s sneaky. People had the same reaction a couple of years ago when it was revealed that even after you logged out of Facebook, it continued to monitor your Internet activity.
  • Third, Facebook is using its power to interfere with what I say to my friends and they to me. I had assumed that if I posted something, my friends saw it.
  • Fourth, Facebook is manipulating my emotions. It matters little that they weren’t very good at it . . . this time. Yes, advertisers manipulate, but they don’t do so by screwing around with communications between me and my friends.
  • Fifth, sixth, seventh . . . I’m sure people can identify many other things in this study that exemplify the distasteful things Facebook does on a larger scale. But for now, it’s the only game in town.
And one more objection to Kramer’s justification. It is so tone-deaf, so oblivious to the likely reactions of people both to the research and the explanation, that it furthers the stereotype of the data-crunching nerd – a whiz with an algorithm but possessed of no interpersonal intelligence.

--------------
Earlier posts on apologies are here and here.

The title of this post is borrowed from a Billie Holiday song, which begins, “Hush now, don’t explain.” Kramer should have listened to Lady Day.

UPDATE, July 4
At Vox, Nilay Patel says many of these same things.  “What we're mad about is the idea of Facebook having so much power we don't understand — a power that feels completely unchecked when it’s described as ‘manipulating our emotions.’”  Patel is much better informed about how Facebook works than I am. He understands how Facebook decides which 20% of the posts in your newsfeed to allow through and which 80% (!) to delete. Patel also explains why my Facebook feed has so many of those Buzzfeed things like “18 Celebrities Who Are Lactose Intolerant."

Medicare Advantage – the Private Option

June 29, 2014
Posted by Jay Livingston

Healthcare stubbornly refuses to conform to conventional economic models, particularly the idea that competing private firms are more effective than government.  Medicare Advantage may be the latest example of privatization not working out the way it’s supposed to.

Medicare Advantage is part of George W. Bush’s Medicare Modernization Act (MMA) of 2003. Medicare, the original, is a single-payer system; the government pays doctors. Medicare Advantage is the private option – the government pays money to insurance companies, who in turn sell insurance plans for seniors.

The theory behind this privatization of Medicare was that it would bring more insurance companies into the market, and the competition among those companies would result in better and cheaper medical coverage.  Opponents of the MMA saw it as yet another instance of the Bush administration giving away money to business. 

Did the Medicare Advantage subsidies bring better results? We don’t have a randomized control study, but a provision of the MMA allows for a sort of natural experiment.  Counties in areas with a population of 250,000 or more got subsidies that were 10.5% greater than counties in areas under 250,000.  Three Wharton professors* compared the outcomes. 

One of the results comes right out of the Econ textbook: where subsidies were higher, more firms followed the money and entered the marketplace. They also enrolled more people.

The first key takeaway is that a firm’s decision to enter a market is highly responsive to how much the government pays. When the government pays more for private health insurance through Medicare, more insurers compete to offer that coverage.

But the important question is whether the money that brought companies into the marketplace went to cheaper and better medical care.  And if not, where did the money go?

Our findings indicate that we see more insurers enter and we see more people enroll, and we see more advertising expenditures. But we actually don’t see much better quality when you pay plans more. The question then naturally rises, “Where does the money seem to go?” And in a final empirical analysis, we try to see how much of it ripples through to profits of health insurers. And we see that a quite significant share of it does. [emphasis added].

This is not really surprising. For-profit firms want to make a profit. In theory (classical economic theory), they should make that profit by providing a better product. Unfortunately, that’s not what happened.

A second takeaway is that, at least given the many quality measures that we can look at, we don’t find a ton of evidence that paying plans substantially more leads to much better quality. . . .  We didn’t see a big improvement in quality. And we’re talking about billions of dollars in additional government spending as a result of this somewhat higher reimbursement in the places with a population of 250,000 or more.


Under Obamacare, reimbursements to Medicare Advantage will shrink. Reimbursements to Medicare Advantage have been 14% higher than those in traditional Medicare, and Obamacare aims to reduce that difference. Obama opponents have run scare ads, and of course the insurance companies have lobbied heavily against the reductions. But according to the Wharton study, the reductions will have little impact on seniors.

there are a number of changes that will take effect over the next several years as a result of the Affordable Care Act, better known as Obamacare. Chief among them is a reduction in the generosity of reimbursement of Medicare Advantage plans… our evidence suggests that the costs of those reimbursement cuts for consumers might not be so great after all.

---------------------------------

*Mark Duggan, Amanda Starc, and Boris Vabson, NBER paper “Who Benefits when the Government Pays More? Pass-Through in the Medicare Advantage.” The interview with Duggan is here.

Soccer and Status Politics

June 27, 2014
Posted by Jay Livingston

Ann Coulter nails it in her column on soccer.  Not the part about the rising interest in soccer signalling America’s  moral decay. That’s just her usual attempt to be provocative.  What Coulter gets right is that soccer is part of the cultural divide.  The question she raises is much bigger than whether soccer is an inferior sport to baseball or football. It’s “Whose country is this anyway?”

Though she doesn’t say so explicitly, Coulter frames soccer as a matter of status politics – the struggle for recognition, respect, and prestige among different groups. She sees the soccer demographic as a coalition of White liberals and immigrants of the past generation or two. The anti-soccer side comprises what Sarah Palin called “the real America” – non-urban, White, Protestant, nativist, Republican. That’s Coulter’s side, and she’s worried that in the long run, her side will lose.

We’ve seen this match-up before. In the late 19th and early 20th centuries, Prohibition provided a vehicle for “real Americans” to assert the virtue and predominance of their way of life over that of the immigrant, non-Protestant groups. The opposition to Obamacare (and just about any Obama policy) had pretty much the same roster.  (See an earlier post here.) In both cases, these groups felt a threat to their position of privilege.  The anti-Obama crowd is explicit about this sense of loss and threat. America is “our” country, “they” have taken it away, and we are going to take it back.  (See my “Repo Men” post from three years ago.)

Coulter is absolutely open about her nativism and xenophobia – none of this “America is a nation of immigrants” nonsense. Or as she says, “I promise you: No American whose great-grandfather was born here is watching soccer.” And one of the bullet points in her argument that soccer is a sign of moral decay is
  • It's foreign.
Followed by
  • Soccer is like the metric system, which liberals also adore because it's European.
(The metric system is simpler and more logical. But it’s used in all those foreign countries, and it’s used universally in science – two reasons for conservatives like Coulter to give it the red card.)

Maybe liberals do like soccer because it’s European, or more accurately international.  But it’s equally true that conservatives fear things because they are foreign.  They demand that the rest of the world become American.  In 2006, John Tierney, a conservative/libertarian writing for the Times, said (here), “Instead of us copying the rest of the world, the rest of the world could learn from us. Maybe they love soccer because they haven’t been given better alternatives.” *

To see what else the soccer coalition liked, I went to Google Correlate and entered “world cup.” Unfortunately, data for the current World Cup are not in, so most of the queries are from 2010. The map looks like what you would expect – the states where people Googled “World Cup” were the Northeast corridor and California. What’s more puzzling is that many of the highest correlates were for movies – Oscar nominees like “Avatar” and “The Hurt Locker,” but also movies liberals like – “Vicky Cristina Barcelona,” “Inception,” and “Eat, Pray, Love.” All these had correlation coefficients with “World Cup” of 0.87 or higher. Here are the results for “World Cup” and “Oscars 2010.”



The other highly correlated cluster of terms had a different theme:
  • hanukkah 2010 (0.8989)
  • passover 2010 (0.8972)
  • yom kippur 2010 (0.8950)
  • chanukah 2010 (0.8874)
Here are the graphics:



This does not necessarily mean that people who Googled “passover 2010” also Googled “World Cup.” It means only that in states where people Googled “passover 2010,” people also Googled “world cup.” In New York and California, for example, it might have been Jews looking for information about Passover while Hispanics Googled “World Cup.”
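
A toy example may make the distinction concrete. The numbers below are invented for illustration and are not Google Correlate data: five hypothetical states, each with two groups that do not overlap at all. One group searches only for term A, the other only for term B. The sketch then compares the state-level correlation with the number of individuals who searched for both.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical states: (group A members, group B members) per 100 residents.
# These counts are made up; the two groups are disjoint by construction.
STATES = {
    "State 1": (9, 30),
    "State 2": (7, 25),
    "State 3": (3, 12),
    "State 4": (2, 8),
    "State 5": (1, 4),
}

def simulate(states, per_state=100):
    """Return (state-level correlation, number of people who searched both)."""
    rates_a, rates_b, searched_both = [], [], 0
    for n_a, n_b in states.values():
        # One record per resident: (searched term A, searched term B).
        people = [(True, False)] * n_a + [(False, True)] * n_b
        people += [(False, False)] * (per_state - n_a - n_b)
        rates_a.append(sum(a for a, _ in people) / per_state)
        rates_b.append(sum(b for _, b in people) / per_state)
        searched_both += sum(1 for a, b in people if a and b)
    return pearson(rates_a, rates_b), searched_both

r, both = simulate(STATES)
print(f"state-level correlation: {r:.4f}")
print(f"individuals who searched both terms: {both}")
```

In this made-up world the state-level correlation is nearly perfect, yet not a single person searched for both terms. That is the most a map of state-level correlations can tell you: the two searches are common in the same places, not among the same people.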

Soccer, Jews, and moral decay.  This combination reminded me of something Coulter said in a 2007 interview with Donny Deutsch, who happens to be Jewish (the full transcript is here):


COULTER: Well, OK, take the Republican National Convention. People were happy. They're Christian. They're tolerant. They defend America, they —
DEUTSCH: Christian — so we should be Christian? It would be better if we were all Christian?
COULTER: Yes.
DEUTSCH: We should all be Christian?
COULTER: Yes. Would you like to come to church with me, Donny? . . . . .
COULTER: No, we think — we just want Jews to be perfected, as they say.
DEUTSCH: Wow, you didn't really say that, did you?
. . . . . .


DEUTSCH: Ann said she wanted to explain her last comment. So I'm going to give her a chance. So you don't think that was offensive?
COULTER: No. I'm sorry. It is not intended to be. I don't think you should take it that way, but that is what Christians consider themselves: perfected Jews.

Coulter didn’t mention soccer at the time, but perhaps that is yet another sign of how Jews are imperfect compared to Christians – they live in places where soccer is popular, places where small-town and suburban WASP conservatives are not so dominant. For Coulter, that’s not just imperfect, that’s moral decay.


----------------------
*In 2012, Marco Rubio, addressing the Republican convention, used nearly identical language – the same know-nothing arrogance – in speaking about Democratic proposals like Obamacare: “These are ideas that threaten to make America more like the rest of the world instead of making the rest of the world more like America.”