
A Behavioral Econ Lab Is Not a Restaurant

July 16, 2018
Posted by Jay Livingston

Great title for an article:

“We should totally open a restaurant: How optimism and overconfidence affect beliefs”

It will be in the August issue of the Journal of Economic Psychology. The link popped up in my Twitter feed this morning.


No, the failure rate for restaurants is not 90% in the first year as a 2003 American Express ad claimed. But most restaurants don’t make it to three years. So it’s only natural to ask about the people who think that their new restaurant will be among those that beat the odds. This was an article I wanted to read.

Imagine my surprise when I discovered that the article was not at all about people who started up a restaurant. True, the word restaurant appears 13 times in the article, plus another seven if you include restauranteur [sic – the preferred term is still restaurateur, no n]. But the data in the article are from a laboratory experiment where subjects try to guess whether a ball drawn from an urn will be white or black. No chefs brilliant but overweening, no surly waitstaff, no price-gouging suppliers, no unpredictable customers, no food, and no location, location, location. Just opaque jars with white balls and black balls.

The procedure is too complicated to summarize here – I’m still not sure I understand it – but the authors (Stephanie A. Heger and Nicholas W. Papageorge) want to distinguish, as the title of the article says, between optimism and overconfidence. Both are rosy perceptions that can make risky ventures seem less risky. Optimism looks outward; it overestimates the chances of success that are inherent in the external situation. Optimism would be the misperception that most restaurants survive for years and bring their owners wealth and happiness. Overconfidence, by contrast, looks inward; it is an inflated belief in one’s own abilities.

Both in the lab and probably in real life, there’s a strong correlation between optimism and overconfidence. People who were optimistic also overestimated their own abilities. (Not their ability to run a restaurant, remember, but their ability to predict white balls.) So it’s hard to know which process is really influencing decisions.

The big trouble is that the leap from lab to restaurant is a long one. It’s the same long leap that Cass Sunstein takes in using his experiment about “blaps” to conclude that New York Times readers would not choose a doctor who was a Republican. (See this earlier post.)

The Heger-Papageorge article left me hungry for an ethnography about real people starting a real restaurant. How did they estimate their chances of success, how did they size up the external conditions (the “market”), and how did they estimate their own abilities? How did those perceptions change over time from the germ of the idea (“You know, I’ve always thought I could . . .”) to the actual restaurant and everything in between — and what caused those perceptions to change? On these questions, the lab experiment has nothing to say.



But you’ve got to admit, it’s a great title. Totally.

R-E-S-P-E-C-T, Find Out What It Means to Me . . . Or Not

July 8, 2018
Posted by Jay Livingston

In the previous post, I wondered why Republican women surveyed by Pew saw Donald Trump as having “a great deal” or “a fair amount” of respect for women. One of the explanations I didn’t consider is that people don’t always answer the question that researchers are asking. The Pew survey asked dozens of questions. Several were about respect — how much respect does Trump have for women, men, Blacks, Hispanics, Evangelicals, and more. Others asked how believable Trump is, whether he keeps his business interests separate from his presidential decisions, whether he respects democratic institutions. (Results from the survey are here.)

But maybe to the people being interviewed, these were all the same question: Trump – good or bad?

Claude Fischer blogged recently (here) about this difference between questions researchers think they are asking and the questions people are actually responding to. Sometimes people give incorrect answers to basic factual questions. But it’s not that these respondents are ignorant.

an interesting fragment of respondents treat polls not as a quiz to be graded on but as an opportunity for what survey scholars have termed “expressiveness” and partisan “cheerleading.”

I would broaden this kind of poll responding to include “self-presentation” or, more simply, “sending a message.” That is, there are respondents who treat some factual questions not as chances to show what they know but as chances to tell the interviewer, or data analyst, or reader, or even themselves something more important than facts.

If expressing feelings or sending a message underlies people’s responses to factual questions, those same purposes should have even more importance when it comes to subjective judgments, like whether Trump has a lot of respect for women.

Fischer seems to side with the “sending a message” explanation. But that phrase suggests, to me at least, an intention to have some specific effect. For example, proponents of harsher criminal penalties claim that these will “send a message” to potential criminals. The obvious corollary is that these punishments will have an actual effect – less crime.

When pollsters call me, I’m often tempted to send a message. I consider what the implications of my answer will be when it’s reported in the survey and how that might affect politicians’ decisions. I’m even tempted to lie on demographic questions (age, income, party affiliation). Maybe my preferences will swing more weight coming from a young Independent.

But my hunch is that in most of the Pew questions about respect, people are not trying to influence policy. They’re just expressing a global feeling about Trump. The message, as Fischer says, is that they want others to know how they feel.        

Which is it — a deliberate strategy or an expression of sentiment? The trouble is that the only way to know what people are thinking when we ask them whether Trump respects women is to ask them and to listen to their answers instead of giving them four choices and then moving on to the next question. That is the great limitation of questionnaire surveys.

A Class of Rich People — Gallup Goes Marxist

June 10, 2018
Posted by Jay Livingston

Gallup asked “Do You Think the United States Benefits From Having a Class of Rich People, or Not?” Here are the results.



Gallup’s lede is that Democrats have grown more skeptical about the rich while Independents and Republicans haven’t changed their views. The other obvious conclusions from the survey are that Republicans think far more favorably of the rich and that Independents are closer to Democrats than to Republicans. (The Gallup summary is here.)

What surprised me is that Republicans would agree to even answer the question given that it was about “a class of rich people.” The true conservative would tell the Gallup interviewer, “There are no classes in America. We have only individuals; some of them get rich.” But overall, only 3% of the 1500 people surveyed refused to answer, though Gallup does not provide data on the political affiliation of these refuseniks.

Most of the time, when Americans talk about “class” they really mean “social status” – a scale based mostly on money which, therefore, has infinite gradations. A person with $100,000 is higher on the scale than is a person with $90,000. But “class” in the Gallup question implies a more Marxian definition — a group of people who share common economic interests and who act to secure those interests against the interests of other classes.

Unfortunately, we don’t know what Gallup’s respondents had in mind when they heard the question. Maybe Republicans, Independents, and Democrats interpreted the question differently.

What else could Gallup have asked?

“Does the US benefit from policies that allow some people to get very rich?” frames wealth as an individual matter with America as the land of unlimited opportunity.  A question like this would probably draw higher rates of agreement across the board.

“Do Americans in general benefit from policies that benefit the rich?” treats the rich more as a true class. It implies that some policies benefit one class, the rich, even though they might not benefit most people. This question might have fewer people agreeing.

I wonder what the results would be if Gallup asked both these questions.

Experiments and the Real World

May 26, 2018
Posted by Jay Livingston

Two days ago, the NY Times published an op-ed by Tali Sharot and Cass Sunstein, “Would You Go to a Republican Doctor?” It is based on a single social psychology experiment. That experiment does not involve going to the doctor. It does not involve anything resembling choices that people make in their real lives. I was going to blog about it, but Andrew Gelman (here) beat me to it and has done a much better and more thorough job than I could have done. Here, for example, is a quote from the op-ed and Gelman’s follow-up.

“Knowing a person’s political leanings should not affect your assessment of how good a doctor she is — or whether she is likely to be a good accountant or a talented architect. But in practice, does it?”

I followed the link to the research article and did a quick search. The words “doctor,” “accountant,” and “architect” appear . . . exactly zero times.

Gelman takes the article apart piece by piece. But when you put the pieces together, what you get is a picture of the larger problem with experiments. They are metaphors or analogies. They are clever and contrived. They can sharpen our view of the world outside the lab, the “real” world  — but they are not that world.

 “My love is like a red, red rose.” Well, yes, Bobby, in some ways she is. But she is not in fact a red, red rose.

Here is the world of the Sharot-Sunstein experiment.

We assigned people the most boring imaginable task: to sort 204 colored geometric shapes into one of two categories, “blaps” and “not blaps,” based on the shape’s features. We invented the term “blap,” and the participants had to try to figure out by trial and error what made a shape a blap. Unknown to the participants, whether a shape was deemed a blap was in fact random.

The 97 Mechanical Turkers in the experiment had to work with a partner (that is, they thought they would work with a partner – there was no actual collaboration and no actual partner). Players thought they would be paid according to how well they sorted blaps. The result:

[The players] most often chose to hear about blaps from co-players who were politically like-minded, even when those with different political views were much better at the task.

To repeat, despite the title of the article (“Would You Go to a Republican Doctor?”), this experiment was not about choosing a doctor. To get to New York Times readers choosing doctors, you have to make a long inferential leap from Mechanical Turkers choosing blap-sorters. Sharot-Sunstein are saying, “My partner in sorting ‘blaps’ is like a red, red rose – I mean, a doctor or an architect.” Well, yes, but . . . .

See the Gelman post for the full critique.

Full disclosure: my dentist has a MAGA hat in his office, and I’m still going back for a crown next month. A crown is like a hat in some ways, but not in others. 

Evidence at the Upshot

March 31, 2018
Posted by Jay Livingston

“Common sense” is not evidence. Neither is “what everyone knows” or, to use a source of data favored by our president, what “people say.”  That’s one of the first things students hear in the intro sociology course. Our discipline is empirical, we insist. It is evidence-based, and evidence is something that really happened. Often you have to actually count those things.

The Upshot is the “data-driven” site that the New York Times created to compete with FiveThirtyEight. Friday, an Upshot article about marriage, social class, and college had this lede,* a six-word graf.*
Princetonians like to marry one another.
The article, by Kevin Carey, showed that students from wealthier families are more likely to be married by their early thirties than are students from the bottom fifth of the income ladder. Carey argued that the cause was “assortative mating” – like marries like – and that the pattern holds even for graduates of the same elite school – Princeton, for example. Rich Princetonians marry other rich Princetonians, says Carey. Poor Princetonians remain unmarried. In their early thirties, only a third of them were married.


According to Carey, the sorting that leads to mating takes place in the “eating clubs” – Princeton’s version of fraternities and sororities. Acceptance into this or that club depends in part on social class, so as Carey sees it, “Eating clubs are where many upper-income marriages begin.”

It’s logical and it makes sense. The only trouble is that Carey provides no evidence for Tiger intermarriage. That 56% of rich Princeton alums who were married by age 32-34 – we don’t know who they married. Another rich Princetonian? Maybe, maybe not. We know only that they were married, not to whom.

Oh, wait. I said Carey provided no evidence. I take that back. Here’s the second graf.

Although the university is coy about the exact number of Tiger-Tiger marriages, Princeton tour guides are often asked about matrimonial prospects, and sometimes include apocryphal statistics — 50 percent! Maybe 75! — in their patter. With an insular campus social scene, annual reunions and a network of alumni organizations in most major cities, opportunities to find a special someone wearing orange and black are many.

You don’t have to have taken a methods course to know that this is not good evidence, or even evidence at all. What people say, and even logical reasons that something should happen, are not evidence that it does happen. Carey all but admits that he has no real data on Princeton intermarriage, but that doesn’t stop him from writing about it as though it’s a solid fact.

Is it? Five years ago, a Princeton alumna, president of the class of ’77, published a letter in The Daily Princetonian giving her 21st-century counterparts this bit of advice: “Find a husband on campus before you graduate.”

The reaction was swift and predictable. Some even thought that the Princetonian had run the piece as an April Fool’s joke. Besides, people these days typically do not get married till their late twenties – at least five years after they graduate. A lot can happen to that eating-club romance in those five years.

Let me be clear: the negative reaction to the letter and the median marriage age of the US population are not evidence that Princetonians are not marrying one another. But it’s just as good (or bad) as Carey’s evidence that they are.

---------------
*Using journalism jargon when I’m writing about journalism is one of my favorite affectations.

Connecting the Dots

March 22, 2018
Posted by Jay Livingston


Brilliance in science is sometimes a matter of simplifying – paring away complicated scientific techniques and seeing what non-scientists would see if they looked in the right place. That’s what Richard Feynman did when he dropped a rubber ring into a glass of ice water – a flash of brilliance that allowed everyone to understand what caused the space shuttle Challenger disaster.

Andrew Gelman isn’t Richard Feynman, but he did something similar in his blog post about an article that’s been getting much buzz, including at Buzzfeed, since it was posted at SSRN two weeks ago. The article is about Naloxone, the drug administered to people who have overdosed on heroin or opioids. It keeps them from dying.

The authors of the article, Jennifer Doleac and Anita Mukherjee, argue that while the drug may save lives in the immediate situation, it does not reduce overall drug deaths. Worse, the unintended consequences of the drug outweigh its short-run benefits. Those whose lives are saved go back to using drugs, committing crimes, and winding up in emergency rooms. In addition, a drug that will prevent overdose deaths “[makes] riskier opioid use more appealing.”

The title is “The Moral Hazard of Lifesaving Innovations: Naloxone Access, Opioid Abuse, and Crime.” (A moral hazard is something that encourages people to do bad things by protecting them from negative consequences.)

Naloxone access didn’t expand all at once. In 2013 fewer than ten states allowed it; the next year the number had doubled. In 2015, only nine states still did not allow its use. Doleac and Mukherjee used these time differences to look at bad outcomes (theft, death, ER admissions) before and after the introduction of Naloxone in the different states. Here are some of their graphs.



They conclude that “broadening Naloxone access led to more opioid-related ER visits.” As for deaths, “in some areas, particularly the Midwest, expanding Naloxone access has increased opioid-related mortality.”

There are reasons to be skeptical of the data, but let’s assume that the numbers – the points in the graph – are accurate. Even so, says Andrew Gelman (here), there’s still the question of how to interpret that array of points. Doleac and Mukherjee add lines and what I assume are confidence bands to clarify the trends. But do these added techniques clarify, or do they create a picture that is different from the underlying reality? Here’s Gelman:

The weird curvy lines are clearly the result of overfitting some sort of non-regularized curves. More to the point, if you take away the lines and the gray bands, I don’t see any patterns at all! Figure 4 just looks like a general positive trend, and figure 8 doesn’t look like anything at all. The discontinuity in the midwest is the big thing—this is the 14% increase mentioned in the abstract to the paper—but, just looking at the dots, I don’t see it.
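
Gelman’s point about “overfitting some sort of non-regularized curves” is easy to illustrate with a toy example. The sketch below is mine, not anything from the Doleac-Mukherjee paper: it fits unregularized polynomials to pure noise, and the more flexible the curve, the more “pattern” it appears to find.

    import numpy as np

    rng = np.random.default_rng(42)
    x = np.linspace(-1, 1, 60)
    y = rng.normal(0, 1, size=x.size)   # pure noise: no trend, no discontinuity

    def r_squared(y, y_hat):
        ss_res = np.sum((y - y_hat) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        return 1 - ss_res / ss_tot

    for degree in (1, 3, 9):
        coeffs = np.polyfit(x, y, degree)   # unregularized polynomial fit
        fit = np.polyval(coeffs, x)
        print(f"degree {degree}: R^2 = {r_squared(y, fit):.2f}")

    # The higher-degree curves "explain" more of the noise. Their wiggles are
    # artifacts of the fitting procedure, not patterns in the dots.

Drawing those fitted curves over the dots is exactly what can make a patternless scatter look like a finding.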


Are these graphs really an optical illusion, with the lines and shadings getting me to see something that isn’t really there? My powers of visualization are not so acute, so to see what Gelman meant about looking only at the dots, I erased the added lines and bands. Here is what the graphs looked like.


Like Gelman, I can’t see any clear patterns showing the effect of Naloxone. And as I read the reactions to the paper, I sense that its results are ambiguous enough to provide rich material for motivated perception. Conservatives and libertarians often start from the assumption that government attempts to help people only make things worse. The unintended-consequences crowd – Megan McArdle, for example (here) – take the paper at face value. Liberals Richard G. Frank, Keith Humphreys, and Harold A. Pollack (here) – who have done their own research on Naloxone – are more skeptical about the accuracy of the data.*

----------------

* This reminded me of a post I did in the first year of this blog.  It was about an editorial in the WSJ that included an utterly dishonest, ideologically motivated connect-the-dots line imposed on an array of points. The post is here.

Ass-Backwards Through the Gateway

March 11, 2018
Posted by Jay Livingston

Imagine that you’re a US Attorney on the drug beat. Your boss is Jeff Sessions, who has announced that he’s going to vigorously enforce laws against marijuana and use the federal law when state laws are more lax. Maybe you also think that weed is a dangerous drug. You do a little “research” and tweet out your findings.



This brief tweet might serve as an example of how not to do real research. The sample, which excludes people who have not gone to treatment centers, is hardly representative of all users. There’s researcher bias since the guy with the ax to grind is the one asking the questions. The respondents too (the drug counselors) no doubt feel some pressure to give the Sessions-politically-correct answer. They may also be selectively remembering their patients. 

But even without the obvious bias, this tweet makes an error that mars research on less contentious issues. It samples on the dependent variable. The use of heavier drugs (opioids, heroin, meth, etc.) is the dependent variable – the outcome you are trying to predict. Marijuana use is the independent variable – the one you use to make that prediction. Taking your sample from confirmed heroin/opioid addicts gets things backwards. To see if weed makes a difference, you have to compare weed users with those who do not use and then see how many in each group take up more serious drugs.

Here’s an analogy – back pain. Suppose that, thanks to advances in imaging (MRIs and the like), doctors find that many of the people who show up with back pain have spinal abnormalities, especially disk bulges and protrusions. These bulges must be the gateway to back pain. So the doctors start doing more surgeries to correct these bulges. These surgeries often fail to improve things.

The doctors were sampling on the dependent variable (back pain), not on the independent variable (disk bulges). The right way to find out if spinal abnormalities cause back pain is to take MRIs of all people, not just those who show up in the doctor’s office. Eventually, researchers started doing this and found that lots of people with spinal abnormalities did not pass through the gateway and on to back pain.
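
To make the sampling-on-the-dependent-variable problem concrete, here is a minimal simulation sketch. The numbers are hypothetical, not from the tweet or from any study: marijuana use is common, heroin use is rare, and in this made-up world the two are completely independent – no gateway at all.

    import numpy as np

    rng = np.random.default_rng(0)
    N = 1_000_000

    # Hypothetical world: 60% use marijuana, 1% use heroin, and the two
    # are statistically independent -- marijuana is NOT a gateway here.
    weed = rng.random(N) < 0.60
    heroin = rng.random(N) < 0.01

    # Sampling on the dependent variable: look only at the heroin users.
    print(f"P(used weed | heroin user)  = {weed[heroin].mean():.2f}")   # about 0.60

    # The comparison you actually need: forward rates for both groups.
    print(f"P(heroin | weed user)       = {heroin[weed].mean():.3f}")   # about 0.01
    print(f"P(heroin | never used weed) = {heroin[~weed].mean():.3f}")  # about 0.01

Ask only the treatment-center sample and “most of our heroin clients started with marijuana” sounds damning; compare forward from the independent variable and the gateway effect vanishes.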

The same problem often plagues explanations that try to reverse-engineer success. Find a bunch of highly effective people, then see what habits they share. Or look at some highly successful people (The Beatles, Bill Gates) and discover that early on in their careers they spent 10,000 hours working on their trade.


US Atty. Stuart’s tweet has all the advantages of anecdotal evidence and eyewitness testimony. It’s a good story, and it’s persuasive. But like anecdotal evidence and eyewitness testimony, it is frequently misleading or wrong. The systematic research – many studies over many years – shows little or no gateway effect of marijuana. No wonder US Attorney Stuart chose to ignore that research.*

----------------------------
* As Mark Kleiman has argued, even when a marijuana user does add harder drugs to his repertoire, the causes may have less to do with the drug itself than with the marketplace. The dealer you go to for your weed probably also carries heavier drugs and would be only too happy to sell them to you.  Legalizing weed so that it’s sold openly by specialty shops rather than by criminals may break that link to other drugs.

Algorithms and False Positives

September 13, 2017
Posted by Jay Livingston

Can face-recognition software tell if you’re gay?

Here’s the headline from The Guardian a week ago.


Yilun Wang and Michal Kosinski at Stanford’s School of Business have written an article showing that artificial intelligence – machines that can learn from their experiences – can develop algorithms to distinguish the gay from the straight. Kosinski goes farther. According to Business Insider,
He predicts that self-learning algorithms with human characteristics will also be able to identify:
  • a person’s political beliefs
  • whether they have high IQs
  • whether they are predisposed to criminal behaviour
When I read that last line, something clicked. I remembered that a while ago I had blogged about an Israeli company, Faception, that claimed its face recognition software could pick out the faces of terrorists, professional poker players, and other types. It all reminded me of Cesare Lombroso, the Italian criminologist. Nearly 150 years ago, Lombroso claimed that criminals could be distinguished by the shape of their skulls, ears, noses, chins, etc. (That blog post, complete with pictures from Lombroso’s book, is here.) So I was not surprised to learn that Kosinski had worked with Faception.

For a thorough (3000 word) critique of the Wang-Kosinski paper, see Greggor Mattson’s post at Scatterplot. The part I want to emphasize here is the problem of False Positives.

Wang-Kosinski tested their algorithm by showing a series of paired pictures from a dating site. In each pair, one person was gay, the other straight. The task was to guess which was which. The machine’s accuracy was roughly 80% – much better than guessing randomly and better than the guesses made by actual humans, who got about 60% right. (These are the numbers for photos of men only. The machine and humans were not as good at spotting lesbians. In my hypothetical example that follows, assume that all the photos are of men.)

But does that mean that the face-recognition algorithm can spot the gay person? The trouble with Wang-Kosinski’s gaydar test was that it created a world where half the population was gay. For each trial, people or machine saw one gay person and one straight.

Let’s suppose that the machine had an accuracy rate of 90%. Let’s also present the machine with a 50-50 world – 100 people, 50 gay and 50 straight. Looking at the 50 gays, the machine will guess correctly on 45. These are “True Positives.” It identified them as gay, and they were gay. But it will also classify 5 of the gay people as not-gay. These are the False Negatives.

It will have the same ratio of true and false for the not-gay population. It will correctly identify 45 of the not-gays (True Negatives), but it will guess incorrectly that 5 of these straight people are gay (False Positive).


It looks pretty good. But how well will this work in the real world, where the gay-straight ratio is nowhere near 50-50? Just what that ratio is depends on definitions. But to make the math easier, I’m going to use 5% as my estimate. In a sample of 1000, only 50 will be gay. The other 950 will be straight.

Again, let’s give the machine an accuracy rate of 90%. For the 50 gays, it will again have 45 True Positives and 5 False Negatives. But what about the 950 not-gays? It will be correct 90% of the time and identify 855 of them as not-gay (True Negatives). But it will also guess incorrectly that 10% are gay. That’s 95 False Positives.


The number of False Positives is more than double the number of True Positives. The overall accuracy may be 90%, but when it comes to picking out gays, the machine is wrong far more often than it’s right.
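
The arithmetic above is easy to check for any accuracy rate and any base rate. Here is a minimal sketch using the post’s illustrative numbers (these are not figures from the Wang-Kosinski paper), assuming the machine is equally accurate on gay and not-gay faces:

    def confusion_counts(n_total, base_rate, accuracy):
        """Expected counts for a classifier equally accurate on both groups."""
        positives = n_total * base_rate
        negatives = n_total - positives
        tp = positives * accuracy          # true positives
        fn = positives * (1 - accuracy)    # false negatives
        tn = negatives * accuracy          # true negatives
        fp = negatives * (1 - accuracy)    # false positives
        return tp, fn, tn, fp

    # The 50-50 world of 100 people, then the 5% world of 1,000 people.
    for n_total, base_rate in ((100, 0.50), (1000, 0.05)):
        tp, fn, tn, fp = confusion_counts(n_total, base_rate, 0.90)
        ppv = tp / (tp + fp)   # chance that someone flagged as gay actually is
        print(f"base rate {base_rate:.0%}: TP = {tp:.0f}, FP = {fp:.0f}, "
              f"P(right | flagged) = {ppv:.0%}")

At a 5% base rate, only about a third of the people the machine flags are flagged correctly, even though its overall accuracy is still 90%.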

The rarer the thing that you’re trying to predict, the greater the ratio of False Positives to True Positives. And those False Positives can have bad consequences. In medicine, a false positive diagnosis can lead to unnecessary treatment that is physically and psychologically damaging. As for politics and policy, think of the consequences if the government goes full Lombroso and uses algorithms for predicting “predisposition to criminal behavior.”

Somewhat Likely to Mess Up on the Likert Scale

May 27, 2017
Posted by Jay Livingston

Ipsos called last night, and I blew it. The interviewer, a very nice-sounding man in Toronto, didn’t have to tell me what Ipsos was, though he did, sticking with his script. I’d regularly seen their numbers cited (The latest “Reuters/Ipsos” poll shows Trump’s approve/disapprove at 37%/57%.)

The interviewer wanted to speak with someone in the household older than 18. No problem; I’m your man. After all, when I vote, I am a mere one among millions. The Ipsos sample, I figured, was only 1,000.  My voice would be heard.

He said at the start that the survey was about energy. Maybe he even said it was sponsored by some energy group. I wish I could remember.

After a few questions about whether I intended to vote in local elections and how often I got news from various sources (newspapers, TV, Internet), he asked how well-informed I was about energy issues. Again, I can’t remember the exact phrasing, but my Likert choices ranged from Very Well Informed to Not At All Informed.

I thought about people who are really up on this sort of thing – a guy I know who writes an oil industry newsletter, bloggers who post about fracking and earthquakes or the history of the cost of solar energy.  I feel so ignorant compared with them when I read about these things. So I went for the next-to-least informed choice. I think it was “not so well informed.”

“That concludes the interview. Thank you.”
“Wait a minute,” I said. “I don’t get to say what I think about energy companies? Don’t you want to know what bastards I think they are?”
“I’m sorry, we have to go with the first response.”
“I was being falsely modest.”
He laughed.
“The Koch brothers, Rex Tillerson, climate change, Massey Coal . . .”
He laughed again, but he wouldn’t budge. They run a tight ship at Ipsos.

Next time they ask, whatever the topic, I’m a freakin’ expert.

Imagine There’s a $5 Discount. It’s Easy If You Try. . . .

June 21, 2016
Posted by Jay Livingston

Reading Robert H. Frank’s new book Success and Luck, I came across this allusion to the famous Kahneman and Tversky finding about “framing.”

It is common . . . for someone to be willing to drive across town to save $10 on a $20 clock radio, but unwilling to do so to save $10 on a $1,000 television set.

Is it common? Do we really have data on crosstown driving to save $10? The research that I assume Frank is alluding to is a 1981 study by Daniel Kahneman and Amos Tversky. (pdf here ) Here are the two scenarios that Kahneman and Tversky presented to their subjects.

A.  Imagine that you are about to purchase a jacket for $125 and a calculator for $15. The calculator salesman informs you that the calculator you wish to buy is on sale for $10 at the other branch of the store, located 20 minutes drive away. Would you make the trip to the other store?

B. Imagine that you are about to purchase a calculator for $125 and a jacket for $15. The calculator salesman informs you that the calculator you wish to buy is on sale for $120 at the other branch of the store, located 20 minutes drive away. Would you make the trip to the other store?

The two are really the same: would you drive 20 minutes to save $5 on a calculator? But when the discount was on a $15 calculator, 68% of the subjects said they would make the 20-minute trip. When the $5 savings applied to the $125 calculator, only 29% said they’d make the trip.

The study is famous even outside behavioral economics, and rightly so. It points up one of the many ways that we are not perfectly rational when we think about money. But whenever I read about this result, I wonder: how many of those people actually did drive to the other store? The answer of course is none. There was no actual store, no $125 calculator, no $15 jacket. The subjects were asked to “imagine.” They were thinking about an abstract calculator and an abstract 20-minute drive, not real ones.*

But if they really did want a jacket and a calculator, would 60 of the 90 people really have driven the 20 minutes to save $5 on a $15 calculator? One of the things we have long known in social research is that what people say they would do is not always what they actually will do. And even if these subjects were accurate about what they would do, their thinking might be including real-world factors beyond just the two in the Kahneman-Tversky abstract scenario (20 minutes, $5). Maybe they were thinking that they might be over by that other mall later in the week, or that if they didn’t buy the $15 calculator right now, they could always come back to this same store and get it.

It’s surprising that social scientists who cite this study take the “would do” response at face value, surprising because another well-known topic in behavioral economics is the discrepancy between what people say they will do and what they actually do. People say that they will start exercising regularly, or save more of their income, or start that diet on Monday. Then Monday comes, and everyone else at the table is having dessert, and well, you know how it is.

In the absence of data on behavior, I prefer to think that these results tell us not so much what people will do. They tell us what people think a rational person in that situation would do. What’s interesting then is that their ideas about abstract economic rationality are themselves not so rational.

---------------------------
* I had the same reaction to another Kahneman study, the one involving “Linda,” an imaginary bank teller. (My post about that one, nearly four years ago, is here ). What I said of the Linda problem might also apply to the jacket-and-calculator problem: “It’s like some clever riddle or a joke – something with little relevance outside its own small universe. You’re never going to be having a real drink in a real bar and see, walking in through the door an Irishman, a rabbi, and a panda.”

The Face That Launched a Thousand False Positives

May 27, 2016
Posted by Jay Livingston

What bothered the woman sitting next to him wasn’t just that the guy was writing in what might have been Arabic (it turned out to be math). But he also looked like a terrorist. (WaPo story here.)


We know what terrorists look like. And now an Israeli company, Faception, has combined big data with facial recognition software to come up with this.


According to their Website:

Faception can analyze faces from video streams, cameras, or . . . databases. We match an individual with various personality traits or types such as an Extrovert, a person with High IQ, Professional Poker Player or a Terrorist.

My first thought was, “Oh my god, Lombroso.”

If you’ve taken Crim 101, you might remember that Lombroso, often called “the father of criminology,” had the idea that criminals were atavisms, throwbacks to earlier stages of human evolution, with different skull shapes and facial features. A careful examination of a person’s head and face could diagnose criminality – even the specific type of lawbreaking the criminal favored. Here is an illustration from an 1876 edition of his book. Can you spot the poisoner, the Neapolitan thief, the Piedmont forger?


Criminology textbooks still mention Lombroso, though rarely as a source of enlightenment. For example, one book concludes the section on Lombroso, “At this point, you may be asking: If Lombroso, with his ideas about criminal ears and jaws, is the ‘father of criminology,’ what can we expect of subsequent generations of criminologists?”

Apparently there’s just something irresistible in the idea that people’s looks reveal their character. Some people really do look like criminals, and some people look like cops.* Some look like a terrorist or a soccer mom or a priest. That’s why Hollywood still pays casting directors. After all, we know that faces show emotion, and most of us know at a glance whether the person we’re looking at is feeling happy, angry, puzzled, hurt, etc. So it’s only logical that a face will reveal more permanent characteristics. As Faception puts it, “According to social and life science research, our personality is determined by our DNA reflected in our face.” It’s not quite true, but it sounds plausible.

The problem with this technique is not the theory or science behind it, and probably not even its ability to pick out terrorists, brand promoters, bingo players, or any of their other dramatis personae in the Faception cast of characters. The problem is false positives. Even when a test is highly accurate, if the thing it’s testing for is rare, a positive identification is likely to be wrong. Mammograms, for example, have an accuracy rate as high as 90%. Each year, about 37 million women in the US are given mammograms. The number who have breast cancer is about 180,000. The 10% error rate means that of the 37 million women tested, 3.7 million will get results that are false positives. It also means that for the woman who does test positive, the likelihood that the diagnosis is wrong is 95%.**
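
The same base-rate arithmetic, run on the round numbers in that paragraph (treating the 90% figure as both sensitivity and specificity, which is a simplification):

    screened = 37_000_000
    with_cancer = 180_000
    accuracy = 0.90   # treated here as both sensitivity and specificity

    true_pos = with_cancer * accuracy                      # about 162,000
    false_pos = (screened - with_cancer) * (1 - accuracy)  # about 3.68 million

    ppv = true_pos / (true_pos + false_pos)
    print(f"Chance a positive mammogram is correct: {ppv:.1%}")   # roughly 4-5%

This is the calculation in the second footnote below, done a little less roughly.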

Think of these screening tests as stereotypes. The problem with stereotypes is not that they are wrong; without some grain of truth, they wouldn’t exist. The problem is that they have many grains of untruth – false positives. We have been taught to be wary of stereotypes not just because they denigrate an entire class of people but because in making decisions about individuals, those stereotypes yield a lot of false positives.  

Faception does provide some data on the accuracy of its screening. But poker champions and terrorists are rarer even than breast cancer. So even if the test can pick out the true terrorist waiting to board the plane, it’s also going to pick out a lot of bearded Italian economists jotting integral signs and Greek letters on their notepads.

(h/t Cathy O’Neil at MathBabe.org)
----------------------

* Some people look like cops. My favorite example is the opening of Richard Price’s novel Lush Life – four undercover cops, though the cover they are under is not especially effective.

The Quality of Life Task Force: four sweatshirts in a bogus taxi set up on the corner of Clinton Street alongside the Williamsburg Bridge off-ramp to profile the incoming salmon run; their mantra: Dope, guns, overtime; their motto: Everyone’s got something to lose. 
[...]
At the corner of Houston and Chrystie, a cherry-red Denali pulls up alongside them, three overdressed women in the backseat, the driver alone up front and wearing sunglasses.
The passenger-side window glides down. “Officers, where the Howard Johnson hotel at around here ...”
“Straight ahead three blocks on the far corner,” Lugo offers.
“Thank you.” [. . .]
The window glides back up and he shoots east on Houston.
“Did he call us officers?”
“It’s that stupid flattop of yours.”
“It’s that fuckin’ tractor hat of yours.”

It wasn’t the haircut or the hat. They just looked like cops.


** The probability that the diagnosis is correct is 5% – the 180,000 true positives divided by the 3.7 million false positives plus the 180,000 true positives – roughly 180,000 / 3,900,000. (I took this example from Howard Wainer’s recent book, Truth and Truthiness.)

Show, Don’t Tell

March 23, 2016
Posted by Jay Livingston

Can the mood of a piece of writing be graphed?

For his final project in Andrew Gelman’s course on statistical communication and graphics, Lucas Estevem created a “Text Sentiment Visualizer.” Gelman discusses it on his blog, putting the Visualizer through its paces with the opening of Moby Dick.


Without reading too carefully, I thought that the picture – about equally positive and negative – seemed about right. Sure, things ended badly, but Ishmael himself seemed like a fairly positive fellow. So I went to the Visualizer (here) and pasted in the text of one of my blogposts. It came out mostly negative. I tried another. Ditto. And another. The results were not surprising when I thought about what I write here, but they were sobering nevertheless. Gotta be more positive.

But how did the Visualizer know? What was its formula for sussing out the sentiment in a sentence? Could the Visualizer itself be a glum creature, tilted towards the dark side, seeing negativity where others might see neutrality? I tried other novel openings. Kafka’s Metamorphosis was entirely in the red, and Holden Caulfield looked to be at about 90%. But Augie March, not exactly a brooding or nasty type, scored about 75% negative. Joyce’s Ulysses came in at about 50-50.

To get a somewhat better idea of the scoring, I looked more closely at page one of The Great Gatsby. The Visualizer scored the third paragraph heavily negative – 17 out of 21 lines. But many of those lines had words that I thought would be scored as positive.

Did the Visualizer think that extraordinary gift, gorgeous, and successful were not such a good thing?

Feeling slightly more positive about my own negative scores, I tried Dr. Seuss. He too skewed negative.


What about A Tale of Two Cities? Surely the best of times would balance out the worst of times, and that famous opening paragraph would finish in a draw. But a line-by-line analysis came out almost all negative.


Only best, hope, and Heaven made it to the blue side.
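
Gelman’s post doesn’t spell out how the Visualizer scores a line, so here is a minimal sketch of the general approach – lexicon-based sentiment scoring – using NLTK’s off-the-shelf VADER analyzer. This is an assumption for illustration; the Visualizer itself may work quite differently. It scores the Dickens opening line by line, which is roughly the game I was playing above.

    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)   # one-time lexicon download
    analyzer = SentimentIntensityAnalyzer()

    opening = [
        "It was the best of times, it was the worst of times,",
        "it was the age of wisdom, it was the age of foolishness,",
        "it was the spring of hope, it was the winter of despair,",
    ]

    for line in opening:
        score = analyzer.polarity_scores(line)["compound"]   # -1 (negative) to +1 (positive)
        label = "negative" if score < 0 else "positive" if score > 0 else "neutral"
        print(f"{score:+.2f}  {label:8s}  {line}")

Whatever formula the Visualizer uses, the lesson is the same: the score for a line depends entirely on which words made it into the lexicon and with what weights.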

I’m not sure what the moral of the story is except that, as I said in a recent post, content analysis is a bitch.

Gelman is more on the positive side about the Visualizer. It’s “far from perfect,” but it’s a step in the right direction – i.e., towards visual presentation – and we can play around with it, as I’ve done here, to see how it works and how it might be improved. Or as Gelman concludes, “Visualization. It’s not just about showing off. It’s a tool for discovering and learning about anomalies.”

Race and Tweets

March 20, 2016
Posted by Jay Livingston


Nigger* is a racially charged word. And if you sort cities or states according to how frequently words like nigger turn up from them on Twitter, you’ll find large differences. In some states these words appear forty times more often than in others. But do those frequencies tell us about the local climate of race relations? The answer seems to be: it depends on who is tweeting.

In the previous post, I wondered whether the frequency of tweets with words like bitch, cunt, etc. tell us about general levels of misogyny in a state or city. Abodo.com, the Website that mapped the geography of sexist tweets, also had charts and maps showing both racially charged tweets (with words like “nigger”) and more neutral, politically correct, tweets (“African Americans” or “Black people”). Here are the maps of the two different linguistic choices.


West Virginia certainly looks like the poster state for racism – highest in “anti-Black” tweets, and among the lowest in “neutral or tolerant” tweets. West Virginia is 95% White, so it’s clear that we’re looking at how White people there talk about Blacks. That guy who sang about the Mountaineer State being almost heaven – I’m pretty sure he wasn’t a Black dude. Nevada too is heavily White (75%, Black 9%), but there, tweets with polite terms well outnumber those with slurs. Probably, Nevada is a less racist place than West Virginia.

But what about states with more Blacks? Maryland, about 30% Black, is in the upper range for neutral race-tweets, but it’s far from the bottom on “anti-Black” tweets. The same is true for Georgia and Louisiana, both about 30% Black. These states score high on both kinds – what we might call, with a hat-tip to Chris Rock, “nigger tweets” and “Black people tweets.” (If you are not familiar with Rock’s “Niggers and Black People,” watch it here.) If he had released this 8-minute stand-up routine as a series of tweets, and if Chris Rock were a state instead of a person, that state would be at the top in both categories – “anti-Black” and “neutral and tolerant.” How can a state or city be both?

The answer of course is that the meaning of nigger depends on who is using it.  When White people are tweeting about Blacks, then the choice of words probably tells us about racism. But when most of the people tweeting are Black, it’s harder to know. Here, for example, are Abodo’s top ten cities for “anti-Black tweets.”


Blacks make up a large percent of the population in most of these cities. The cities at the top of the list – Baltimore, Atlanta, and New Orleans – are over 50% Black. It’s highly unlikely that it’s the Whites there who are flooding Twitter with tweets teeming with “nigger, coon, dindu, jungle bunny, monkey, or spear chucker” – the words included in Abodo’s anti-Black tag.** If the tag had included niggas, the “anti-Black” count in these cities would have been even higher.

All this tells us is that Black people tweet about things concerning Black people. And since hip-hop has been around for more than thirty years, it shouldn’t surprise anyone that Blacks use these words with no slur intended. When I searched Twitter yesterday for nigger, the tweets I saw on the first page were all from Black people, and some of those tweets, rather than using the word nigger, were talking about the use of it. (Needless to say, if you search for niggas, you can scroll through many, many screens trying to find a tweet with a White profile picture.)



For some reason, Abodo refused to draw this obvious conclusion. They do say in another section of the article that  “anti-Hispanic slurs have largely not been reclaimed by Hispanic and Latino people in the way that the N-word is commonly used in black communities.” So they know what’s going on. But in the section on Blacks, they say nothing, tacitly implying that these “anti-Black” tweets announce an anti-Black atmosphere. But that’s true only if the area is mostly White. When those tweets are coming from Blacks, it’s much more complicated.

----------------------------

*Abodo backs away from using the actual word. They substitute the usual euphemism – “the N-word.” As I have said elsewhere in this blog, if you can’t say the word you’re talking about when you’re talking about it as a word, then the terrorists have won. In this view, I differ from another Jay (Smooth) whose views I respect. A third Jay (Z) has no problems with using the word. A lot.

** I confess, porch monkey and dindu were new to me, but then, I don’t get out much, at least not in the right circles. Abodo ignored most of the terms in the old SNL sketch with Richard Pryor and Chevy Chase.  (The available videos, last time I checked, are of low quality, but like Chris Rock’s routine, it is an important document that everyone interested in race and media should be familiar with. A link along with a partial transcript is in this earlier post.)

Content Analysis Is a Bitch

March 18, 2016
Posted by Jay Livingston

Can Twitter tell us about the climate of intolerance? Do the words in all those tweets reveal something about levels of racism and sexism? Maybe. But the language of intolerance – “hate speech” – can be tricky to read.

Abodo is a website for people seeking apartments – Zillow for renters – and it recently posted an article, “America’s Most P.C. and Prejudiced Places” (here), with maps and graphs of data from Twitter. Here, for example, are the cities with the highest rates of misogynistic tweets.


Unfortunately, Abodo does not say which words are in its formula for “derogatory language against women.” But Abodo does recognize that bitch might be a problem because “it is commonly used as profanity but not always with sexist intent.” Just to see what those uses might be, I searched for “bitch” on Twitter, but the results, if not overtly sexist, all referred to a female as a bitch.


Maybe it was New Orleans. I tried again adding “NOLA” as a search term and found one non-sexist bitch.


When Abodo ran their much larger database of tweets but excluded the word bitch from its misogyny algorithm, New Orleans dropped from first place to fourth, and Baton Rouge disappeared from the top ten. Several Northeast and Western cities now made the cut.


This tells us what we might have known if we’d been following Jack Grieve’s Twitter research (here) – that bitch is especially popular in the South.


The Twitter map of cunt is just the opposite. It appears far more frequently in tweets from the Northeast than from the South.


The bitch factor changes the estimated sexism of states as well as cities. Here are two maps, one with and one without bitch in its sexism screen.


With bitch out of the equation, Louisiana looks much less nasty, and the other Southeast states also shade more towards the less sexist green. The Northeast and West, especially Nevada, now look more misogynistic. A few states remain nice no matter how you score the tweets – Montana, Wyoming, Vermont – but they are among the least populous states so even with Twitter data, sample size might be a problem. Also note that bitch accounts for most of what Abodo calls sexist language. Without bitch, the rates range from 26 to 133 per 100,000 tweets. Add bitch to the formula and the range moves to 74 to 894 per 100,000.  That means that at least two-thirds of all the “derogatory language against women” on Twitter is the word bitch.
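
That two-thirds estimate comes straight from the reported ranges. A quick check of the arithmetic, using only the rates quoted above:

    # Rates of "derogatory language against women" per 100,000 tweets,
    # as reported by Abodo, with and without "bitch" in the word list.
    with_bitch    = {"low": 74, "high": 894}
    without_bitch = {"low": 26, "high": 133}

    for end in ("low", "high"):
        share = (with_bitch[end] - without_bitch[end]) / with_bitch[end]
        print(f"{end} end of the range: 'bitch' is {share:.0%} of flagged tweets")

Roughly two-thirds of the flagged tweets at the quiet end of the range, and more like 85 percent at the noisy end.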

There’s a further problem in using these tweets as an index of sexism. Apparently a lot of these bitch tweets are coming from women (if my small sample of tweets is at all representative). Does that mean that the word has lost some of its misogyny? Or, as I’m sure some will argue, do these tweets mean that women have become “self-hating”? This same question is raised, in spades, by the use of nigger. Abodo has data on that too, but I will leave it for another post.

Which Percentages, Which Bars

March 13, 2016
Posted by Jay Livingston

Whence Trumpismo, as ABC calls it (here), as though he were a Latin American dictator? Where does Trump get his support? Who are the voters that prefer Trump to the other candidates?

The latest ABC/WaPo poll, out today, has some answers. But it also has some bafflingly screwed-up ways of setting out the results. For example, the ABC write-up (by Chad Kiewiet De Jonge) says what many have been saying: “Trump’s support stems from economic discontent, particularly among working-class whites.” Appropriately, the poll asked people how they were doing economically – were they Struggling, Comfortable, or Moving Up?

That’s pretty clear: economic circumstances are the independent variable, candidate preference is the dependent variable. You compare these groups and find that Strugglers are far more likely to support Trump than are the folks who are better off. Instead, we get a chart percentaged the other way.



Instead of comparing people of different economic circumstances, it compares the supporters of the different candidates. And it doesn’t even do that correctly. If you want to compare Trump backers with Cruz and Rubio/Kasich backers, the candidates should be the columns. (The poll merged Rubio and Kasich supporters for purposes of sample size.) Here’s the same data. Which chart is easier to interpret?



This makes the comparison a bit easier.  The margin of error is about 5 points. So Trump supporters might be somewhat more likely to see their economic circumstances as a struggle.

There’s a similar problem with their analysis of authoritarianism. “It’s also been argued that people who are predisposed to value order, obedience and respect for traditional authority tend to be strongly attracted to Trump.” But instead of comparing the very authoritarian with the less so, ABC/WaPo again compares the supporters of Trump, Cruz, and Rubio/Kasich.



Instead of telling us who authoritarians prefer, this analysis tells us which candidate’s backers have a higher proportion of authoritarians. And again, even for that, it makes the answer hard to see. Same data, different chart.



Cruz supporters, not Trumpistas, are the most authoritarian, probably because of that old time religion, the kind that emphasizes respect for one’s elders. (For more on Cruz supporters and uncompassionate Christian conservatism, see this post.)

The poll has worthwhile data, and it gets the other charts right. The pdf lists Abt SRBI and Langer Research as having done the survey and analysis. To their credit, they present a regression model of the variables that is far more sophisticated than what the popular press usually reports. But come on guys, percentage on the independent variable.
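
The rule the post ends on – percentage on the independent variable – is easy to see in code. The counts below are made up for illustration (not the ABC/WaPo data); the point is only the direction of the normalization.

    import pandas as pd

    # Hypothetical counts: rows are economic circumstances (independent variable),
    # columns are candidate preference (dependent variable).
    counts = pd.DataFrame(
        {"Trump": [90, 60, 30], "Cruz": [50, 55, 45], "Rubio/Kasich": [40, 65, 75]},
        index=["Struggling", "Comfortable", "Moving up"],
    )

    # Percentage on the independent variable: within each economic group,
    # what share backs each candidate? This is the comparison you want.
    pct_by_circumstance = counts.div(counts.sum(axis=1), axis=0).round(2)

    # What the ABC chart did instead: within each candidate's supporters,
    # what share is struggling, comfortable, or moving up?
    pct_by_candidate = counts.div(counts.sum(axis=0), axis=1).round(2)

    print(pct_by_circumstance)
    print(pct_by_candidate)

The first table answers “whom do Strugglers prefer?”; the second answers only “what do Trump’s backers look like?” – a different, and here less useful, question.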

Margin of Error – Mostly Error

February 14, 2016
Posted by Jay Livingston

It’s the sort of social “science” I’d expect from Fox, not Vox. But today, Valentine’s Day, Vox (here) posted this map purporting to show the average amount people in each state spent on Valentine’s Day.



“What’s with North Dakota spending $108 on average, but South Dakota spending just $36?” asks Vox. The answer is almost surely: Error.

The sample size was 3,121. If they sampled each state in its proportion of the US population, the sample in each Dakota would be about n = 8 (I originally wrote n = 80 – see the update below). The source of the data, Finder, does not report any margins of error or standard deviations, so we can’t know. Possibly, a couple of guys in North Dakota who’d saved their oil-boom money and spent it on chocolates are responsible for that average. Idaho, Nevada, and Kansas – the only other states over the $100 mark – are also small-n. So are the states at the other end, the supposedly low-spending states (SD, WY, VT, NH, ME, etc.). So we can’t trust these numbers.
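
For a rough sense of why n ≈ 8 is hopeless, here is a back-of-the-envelope sketch. The population shares are approximate and the spread of individual spending is a pure assumption (Finder reports no standard deviations), so treat the output as an order-of-magnitude guess, nothing more.

    import math

    total_n = 3121
    nd_share = 760_000 / 323_000_000    # North Dakota's rough share of the US population
    n_nd = total_n * nd_share           # about 7 or 8 respondents

    assumed_sd = 80.0                   # hypothetical spread of individual spending, in dollars
    margin = 1.96 * assumed_sd / math.sqrt(n_nd)

    print(f"North Dakota sample size: about {n_nd:.0f}")
    print(f"95% margin of error on the state mean: about +/- ${margin:.0f}")

With a margin of error in the neighborhood of $50 or $60 on each state mean, the $108-versus-$36 Dakota gap is exactly the kind of difference that noise alone can produce.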

The sample in the states with large populations (NY, CA, TX, etc.) might have been as high as 300-400, possibly enough to make legitimate comparisons, but the differences among them are small – less than $20.

My consultant on this matter, Dan Cassino (he does a lot of serious polling), confirmed my own suspicions. “The study is complete bullshit.”

UPDATE February 24, 2016: Andrew Gelman (here) downloaded the data and did a far more thorough analysis, estimating the variation for each state. His graph of the states shows that even between the state with the highest mean and the state with the lowest, the uncertainty is too great to allow for any conclusions: “Soooo . . . we got nuthin’.”

Andrew explains why it’s worthwhile to do a serious analysis even on frivolous data like this Valentine-spending survey. He also corrects my order-of-magnitude overestimation of the North Dakota sample size. 

Too Good to Be True

January 26, 2016
Posted by Jay Livingston


Some findings that turn up in social science research look too good to be true, as when a small change in inputs brings a large change in outcomes. Usually the good news comes in the form of anecdotal evidence, but systematic research too can yield wildly optimistic results.

Anecdotal evidence?  Everyone knows to be suspicious, even journalists. A Lexis-Nexis search returns about 300 news articles just in this month where someone was careful to specify that claims were based on “anecdotal evidence” and not systematic research.

Everywhere else, the anecdotal-systematic scale of credibility is reversed. As Stalin said, “The death of a million Russian soldiers – that is a statistic. The death of one Russian soldier – that is a tragedy.” He didn’t bother to add the obvious corollary: a tragedy is far more compelling and persuasive than is a statistic.

Yet here is journalist Heather Havrilesky in the paper of record reviewing Presence, a new book by social scientist Amy Cuddy:

This detailed rehashing of academic research . . . has the unintended effect of transforming her Ph.D. into something of a red flag.

Yes, you read that correctly. Systematic research supporting an idea is a bright red warning sign.

Amy Cuddy, for those who are not among the millions who have seen her TED talk, is the social psychologist (Ph.D. Princeton) at the Harvard Business School who claims that standing in the Wonder Woman “power pose” for just two minutes a day will transform the self-doubting and timid into the confident, assertive, and powerful. Power posing even changes levels of hormones like cortisol and testosterone.


Havrilesky continues.

While Cuddy’s research seems to back up her claims about the effects of power posing, even more convincing are the personal stories sent to the author by some of the 28 million people who have viewed her TED talk. Cuddy scatters their stories throughout the book. . . .

Systematic research is OK for what it is, Havrilesky is saying, but the clincher is the anecdotal evidence. Either way, the results fall into the category of “Amazing But True.”

Havrilesky was unwittingly closer to the truth with that “seems” in the first clause. “Cuddy’s research seems to back up her claims . . . ” Perhaps, but research done by people other than Cuddy and her colleagues does not.  As Andrew Gelman and Kaiser Fung detail in Slate, the power-pose studies have not had a Wonder Woman-like resilience in the lab. Other researchers trying to replicate Cuddy’s experiments could not get similarly positive results.

But outside the tiny world of replication studies, Cuddy’s findings have had a remarkable staying power considering how fragile* the power-pose effect was. The problem is not just that the Times reviewer takes anecdotal evidence as more valid. It’s that she is unaware that contradictory research was available. Nor is she unique in this ignorance. It pervades reporting even in serious places like the Times. “Gee whiz science,” as Gelman and Fung call it, has a seemingly irresistible attraction, much like anecdotal evidence. Journalists and the public want to believe it; scientists want to examine it further.

Our point here is not to slam Cuddy and her collaborators. . . . And we are not really criticizing the New York Times or CBS News, either. . . . Rather, we want to highlight the yawning gap between the news media, science celebrities, and publicists on one side, and the general scientific community on the other. To one group, power posing is a scientifically established fact and an inspiring story to boot. To the other, it’s just one more amusing example of scientific overreach.

I admire Gelman and Fung’s magnanimous view. But I do think that those in the popular press who report about science should do a little skeptical fact-checking when the results seem too good to be true, for too often these results are in fact too good to be true.

---------------------
* “Fragile” is the word used by Joe Simmons and Uri Simonsohn in their review and replication of Cuddy’s experiments (here).

B is for Beauty Bias

January 6, 2016
Posted by Jay Livingston

The headlines make it pretty clear.
Attractive Students Get Higher Grades, Researchers Say

That’s from Newsweek. Slate copied Scott Jaschik’s piece, “Graded on Looks,” at Inside Higher Ed and gave it the title “Better-Looking Female Students Get Better Grades.”

But how much higher, how much better?

For female students, an increase of one standard deviation in attractiveness was associated with a 0.024 increase in grade (on a 4.0 scale).

The story is based on a paper by Rey Hernández-Julián and Christina Peters presented at the American Economic Association meetings. 

You can read the IHE article for the methodology. I assume it’s solid. But for me the problem is that I don’t know if the difference is a lot or if it’s a mere speck of dust – statistically significant dust, but a speck nevertheless. It’s like the 2007 Price-Wolfers research on fouls in the NBA. White refs were more likely to call fouls on Black players than on Whites. Andrew Gelman (here), who is to statistics what Steph Curry is to the 3-pointer, liked the paper, so I have reservations about my reservations. But the degree of bias it found came to this: if an all-Black NBA team played a very hypothetical all-White NBA team in a game refereed by Whites, the refs’ unconscious bias would result in one extra foul called against the all-Blacks. 

I have the same problem with this beauty-bias paper. Imagine a really good-looking girl, one whose beauty is 2½ standard deviations above the mean – the beauty equivalent of an IQ of 137. Her average-looking counterpart with similar performance in the course gets a 3.00 – a B. But the stunningly attractive girl winds up with a 3.06 – a B.

The more serious bias reported in the paper is the bias against unattractive girls.

For the least attractive third of women, the average course grade was 0.067 grade points below those earned by others.

It’s still not enough to lower a grade from B to B-, but perhaps the bias is greater against girls who are in the lower end of that lower third. The report doesn’t say.
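
Just to make the smallness concrete, here is a quick back-of-the-envelope sketch in Python. The 0.024 and 0.067 figures are the ones reported above; starting the average-looking student at a flat 3.00 is my own simplifying assumption, not anything in the paper.

    # Back-of-the-envelope: turn the reported coefficients into course grades.
    # The 3.00 baseline is an illustrative assumption, not a figure from the paper.
    effect_per_sd = 0.024          # reported grade-point change per SD of attractiveness
    penalty_bottom_third = 0.067   # reported penalty for the least attractive third
    baseline = 3.00                # an average-looking student's course grade (a B)

    very_attractive = baseline + 2.5 * effect_per_sd   # 2.5 SD above the mean
    least_attractive = baseline - penalty_bottom_third

    print(round(very_attractive, 2))    # 3.06, still a B
    print(round(least_attractive, 2))   # 2.93, still a B, not a B-

Run it and the two grades come out to 3.06 and 2.93: differences visible in the decimals, invisible on a transcript.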

Both these papers, basketball and beauty, get at something dear to the liberal heart – bias based on physical characteristics that the person has little power to change. And like the Implicit Association Test, they reveal that the evil may lurk even in the hearts and minds of those who think they are without bias. But if one foul in a game or a sixth of a plus or minus on a single course grade is all we had to worry about, I’d feel pretty good about the amount of bias in our world.

[Personal aside: the research I’d like to see would reverse the variables. Does a girl’s academic performance in the course affect her beauty score? Ask the instructor on day one to rate each student on physical attractiveness. Then ask him to rate them again at the end of the term. My guess is that the good students will become better looking.]

Men Are From Mars, Survey Respondents Are From Neptune

November 22, 2015
Posted by Jay Livingston

Survey researchers have long been aware that people don’t always answer honestly. In face-to-face interviews especially, people may mask their true opinion with the socially desirable response. Anonymous questionnaires have the same problem, though perhaps to a lesser degree. But self-administered surveys, especially the online variety, have the additional problem of people who either don’t take them seriously or treat them with outright contempt. Worse, as Shane Frederick (Yale, management) discovered, the proportion of “the random and the perverse” varies from item to item.

On open-ended questions, when someone answers “asdf” or “your mama,” as they did on an online survey Frederick conducted, it’s pretty clear that they are making what my professor in my methods class called “the ‘fuck you’ response.”

But what about multiple-choice items?
Is 8+4 less than 3? YES / NO
11% Yes.
Maybe 11% of Americans online really can’t do the math.  Or maybe all 11% were blowing off the survey. But then what about this item?

Were you born on the planet Neptune? YES / NO
17% Yes
Now the ranks of the perverse have grown by at least six percentage points, probably more. Non-responders, the IRB, and now the random and the perverse – I tell ya, survey researchers don’t get no respect.
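
For anyone who wants to see the arithmetic, here is a minimal sketch in Python. The percentages are the ones above; treating every “yes” to an impossible screener item as a non-serious response is a simplifying assumption of mine, not Frederick’s actual analysis.

    # Back-of-the-envelope: use impossible screener items to put a floor
    # under the share of random/perverse respondents.
    math_yes = 0.11      # share answering yes to "Is 8+4 less than 3?"
    neptune_yes = 0.17   # share claiming to have been born on Neptune

    # No honest respondent was born on Neptune, so that item sets a floor.
    floor_on_perverse = neptune_yes
    growth_between_items = neptune_yes - math_yes

    print(round(floor_on_perverse, 2))      # 0.17
    print(round(growth_between_items, 2))   # 0.06, the "six percentage points"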

----------------
Big hat tip to Andrew Gelman. I took everything in this post from his blog (here), where commenters try seriously to deal with the problem created by these kinds of responses.

Evidence vs. Bullshit – Mobster Edition

September 21, 2015
Posted by Jay Livingston

Maria Konnikova is a regular guest on Mike Pesca’s podcast “The Gist.” Her segment is called “Is That Bullshit.” She addresses pressing topics like
  • Compression sleeves – is that bullshit?
  • Are there different kinds of female orgasm?
  • Are artificial sweeteners bad for your health?
  • Does anger management work?
We can imagine all kinds of reasons why compression sleeves might work or why diet soda might be unhealthful, but if you want to know if it’s bullshit, you need good evidence. Which is what Konnikova researches and reports on.

Good evidence is also the gist of my class early in the semester. I ask students whether more deaths are caused each year by fires or by drownings. Then I ask them why they chose their answer. They come up with good reasons. Fires can happen anywhere – we spend most of our time in buildings, not so much on water. Fires happen all year round; drownings are mostly in the summer. A fire may kill many people, but group drownings are rare. The news reports a lot about fires, rarely about drownings. And so on.

The point is that for a good answer to the question, you need more than just persuasive reasoning. You need someone to count up the dead bodies. You need the relevant evidence.

“Why Do We Admire Mobsters?” asks Maria Konnikova in a recent New Yorker piece (here). She has some answers:
  • Prohibition: “Because Prohibition was hugely unpopular, the men who stood up to it [i.e., mobsters] were heralded as heroes, not criminals.” Even after Repeal, “that initial positive image stuck.”
  • In-group/ out-group: For Americans, Italian (and Irish) mobsters are “similar enough for sympathy, yet different enough for a false sense of safety. . .  Members of the Chinese and Russian mob have been hard to romanticize.”
  • Distance: “Ultimately the mob myth depends on psychological distance. . .  As painful events recede into the past, our perceptions soften. . . . Psychological distance allows us to romanticize and feel nostalgia for almost anything.”
  • Ideals: “We enjoy contemplating the general principles by which they are supposed to have lived: omertà, standing up to unfair authority, protecting your own.”
These are plausible reasons, but are they bullshit? Konnikova offers no systematic evidence for anything she says. Do we really admire mobsters? We don’t know. Besides, it would be better to ask: how many of us admire them, and to what degree? Either way, I doubt that we have good survey data on approval ratings for John Gotti. All we know is that mobster movies often sell a lot of tickets. Yet the relation between our actual lives (admiration, desires, behavior) and what we like to watch on screen is fuzzy and inconsistent.

It’s fun to speculate about movies and mobsters,* but without evidence all we have is at best speculation, at worst bullshit.

UPDATE:
In a message to me, Maria Konnikova says that there is evidence, including surveys, but that the New Yorker edited that material out of the final version of her article.

----------
* Nine years ago, in what is still one of my favorite posts on this blog, I speculated on the appeal of mafia movies (here). I had the good sense to acknowledge that I was speculating and to point out that our preferences in fantasyland had a complicated relation to our preferences in real life.